Econometric Theory/Regression versus Causation and Correlation

Statistical and Deterministic Relationships


A deterministic relationship is one in which there is an exact mathematical relationship, or dependence, between variables. An example from physics is Newton's law of gravity, $F = k\frac{m_1 m_2}{r^2}$, where the force $F$ is proportional to a constant $k$ and to the masses $m_1$ and $m_2$ of two objects, and inversely proportional to the square of the distance $r$ between them.
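To make the idea of an exact dependence concrete, here is a minimal Python sketch of the formula. It assumes SI units, in which the constant $k$ is the gravitational constant, approximately 6.674 × 10⁻¹¹ N·m²/kg²; the function name `gravitational_force` is only an illustrative choice.

```python
# Newton's law of gravity as a deterministic relationship:
# the same inputs always produce exactly the same output.
G = 6.674e-11  # the constant k in SI units, N*m^2/kg^2 (approximate value)


def gravitational_force(m1: float, m2: float, r: float, k: float = G) -> float:
    """Return F = k * m1 * m2 / r**2 (masses in kg, distance in metres)."""
    if r <= 0:
        raise ValueError("distance r must be positive")
    return k * m1 * m2 / r ** 2


# Doubling the distance divides the force by four (inverse-square law).
f1 = gravitational_force(5.0, 10.0, 2.0)
f2 = gravitational_force(5.0, 10.0, 4.0)
print(f1 / f2)
```

There is no error term anywhere: calling the function twice with the same arguments always returns the same value.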

A random, or stochastic, relationship allows for some variation around the relationship. This is where probability distributions will enter later on. The relationship may not be exact because of

  • measurement errors
  • reporting errors
  • computing errors
  • other influences

and so on. One example is crop yield relative to rainfall. We may not be able to measure the amount of rain accurately (measurement error), we may round it off to one decimal place (reporting error), there may be a bug in the software that totals the rainfall (computing error), and there may be other factors such as the quality of fertilisers, the quality of the soil and pollution (other influences).
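The sources of error listed above can be mimicked in a small simulation. This is a hypothetical sketch: the linear form, the parameter values `ALPHA` and `BETA`, and the error sizes are invented for illustration, and computing errors are not modelled.

```python
import random

# Hypothetical "true" relationship: yield = ALPHA + BETA * rainfall + other influences.
ALPHA = 2.0  # baseline yield in tonnes per hectare (invented value)
BETA = 0.01  # extra yield per mm of rain (invented value)


def observe_yield(true_rain_mm: float, rng: random.Random) -> tuple[float, float]:
    """Return (recorded rainfall, crop yield) for one hypothetical plot and season."""
    # Other influences (fertiliser, soil, pollution) lumped into one random term.
    other_influences = rng.gauss(0.0, 0.3)
    crop_yield = ALPHA + BETA * true_rain_mm + other_influences
    # Measurement error: the rain gauge reading is off by a random amount.
    measured_rain = true_rain_mm + rng.gauss(0.0, 5.0)
    # Reporting error: the reading is rounded to one decimal place.
    recorded_rain = round(measured_rain, 1)
    return recorded_rain, crop_yield


rng = random.Random(42)  # fixed seed so the run is reproducible
for _ in range(3):
    # The true rainfall is the same every time, yet the recorded pairs differ.
    print(observe_yield(500.0, rng))
```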

Regression versus Causation


Regression analysis deals with the dependence of one variable on others within a model, but it does not by itself imply causation. For example, we stated above that rainfall affects crop yield, and there are data that support this. However, we regard this as a one-way relationship only because common sense tells us that crop yield cannot affect rainfall; the regression itself does not tell us the direction. Any idea of cause and effect must therefore come from outside the statistics, from theory.

In short, we conclude that a statistical relationship does not imply causation.[1]
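The point can be seen numerically. In this sketch with hypothetical data for five seasons, least squares fits crop yield on rainfall and equally well fits rainfall on crop yield; both slopes are positive, and their product is the squared correlation, which is the same whichever way round the regression is run. Nothing in the arithmetic says which variable is the cause. The helper `ols_slope` is an illustrative name, not a library function.

```python
# Hypothetical data for five seasons: rainfall (mm) and crop yield (t/ha).
rain = [400.0, 450.0, 500.0, 550.0, 600.0]
crop = [5.9, 6.6, 7.1, 7.4, 8.0]


def ols_slope(x, y):
    """Least-squares slope from regressing y on x: S_xy / S_xx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


b_yield_on_rain = ols_slope(rain, crop)  # the direction our theory suggests
b_rain_on_yield = ols_slope(crop, rain)  # the reverse regression, equally computable
print(b_yield_on_rain, b_rain_on_yield)
# The product of the two slopes equals r**2, which does not depend on direction.
print(b_yield_on_rain * b_rain_on_yield)
```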

Regression versus Correlation


Correlation analysis is closely related to, but distinct from, regression analysis: it measures the degree of linear association between two variables. If we calculate the correlation between crop yield and rainfall, we might obtain an estimate of, say, 0.69. This suggests a reasonably strong linear association between them, but it does not prove that there is an exact mathematical relationship, let alone that one variable causes the other.
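The correlation coefficient meant here is usually the sample Pearson coefficient, $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. A minimal hand-written version follows; the data are hypothetical and are not meant to reproduce the 0.69 above. Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rainfall (mm) and crop yield (t/ha) observations.
rain = [400.0, 450.0, 500.0, 550.0, 600.0]
crop = [6.2, 6.0, 7.5, 6.9, 7.6]
print(pearson_r(rain, crop))
print(pearson_r(crop, rain))  # same value: correlation treats x and y symmetrically
```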

There is also a difference in how the two variables are regarded. In correlation analysis, both variables are assumed to be random, that is, to contain a random error. Both are treated on an equal footing, and no distinction is made between a dependent and an explanatory variable.

In regression analysis, by contrast, crop yield is the dependent variable and rainfall is the explanatory variable, according to our theory. The dependent variable is assumed to be random, while the explanatory (independent) variable is assumed to have no random component: its values are fixed in repeated sampling.

This will be important in {section on measurement}.
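The "fixed values" assumption can be illustrated by repeated sampling: the rainfall values stay the same in every sample, and only the yields are redrawn. This is a sketch under assumed values (`ALPHA`, `BETA`, `SIGMA` and the rainfall list are hypothetical); averaging the estimated slopes over many samples should give a number close to the assumed `BETA`.

```python
import random

# Hypothetical true model: yield = ALPHA + BETA * rainfall + u, with u ~ N(0, SIGMA**2).
ALPHA, BETA, SIGMA = 2.0, 0.01, 0.3  # invented illustration values
RAIN = [400.0, 450.0, 500.0, 550.0, 600.0]  # explanatory values, fixed in every sample


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def one_sample(rng):
    """Redraw only the random yields; the rainfall values stay the same."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in RAIN]


rng = random.Random(1)  # fixed seed for reproducibility
slopes = [ols_slope(RAIN, one_sample(rng)) for _ in range(2000)]
print(sum(slopes) / len(slopes))  # average estimate, close to the assumed BETA
```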







  • Gujarati, D. N. (2003). Basic Econometrics (4th ed., International Edition). McGraw-Hill Higher Education. pp. 22–24. ISBN 0-07-112342-3.
  1. Gujarati (2003), p. 23.