When an increase in one variable results in a decrease in another variable, the relationship is directly proportional?

Correlation and Causation

What are correlation and causation and how are they different?


Two or more variables are considered to be related, in a statistical context, if their values change so that as the value of one variable increases or decreases so does the value of the other variable (although it may be in the opposite direction).

For example, for the two variables "hours worked" and "income earned" there is a relationship between the two if the increase in hours worked is associated with an increase in income earned. If we consider the two variables "price" and "purchasing power", as the price of goods increases a person's ability to buy these goods decreases (assuming a constant income).

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.

Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.

Theoretically, the difference between the two types of relationships is easy to identify: an action or occurrence can cause another (e.g. smoking causes an increase in the risk of developing lung cancer), or it can correlate with another (e.g. smoking is correlated with alcoholism, but it does not cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.



Why are correlation and causation important?

The objective of much research or scientific analysis is to identify the extent to which one variable relates to another variable. For example:

  • Is there a relationship between a person's education level and their health?
  • Is pet ownership associated with living longer?
  • Did a company's marketing campaign increase their product sales?

These and other questions explore whether a correlation exists between the two variables. If there is a correlation, this may guide further research into whether one action causes the other. Understanding correlation and causality allows policies and programs that aim to bring about a desired outcome to be better targeted.

How is correlation measured?
For two variables, a statistical correlation is measured by the use of a Correlation Coefficient, represented by the symbol (r), which is a single number that describes the degree of relationship between two variables.

The coefficient's numerical value ranges from +1.0 to –1.0, which provides an indication of the strength and direction of the relationship.

If the correlation coefficient has a negative value (below 0) it indicates a negative relationship between the variables. This means that the variables move in opposite directions (i.e. when one increases the other decreases, or when one decreases the other increases).

If the correlation coefficient has a positive value (above 0) it indicates a positive relationship between the variables meaning that both variables move in tandem, i.e. as one variable decreases the other also decreases, or when one variable increases the other also increases.

Where the correlation coefficient is 0 this indicates there is no linear relationship between the variables (one variable can remain constant while the other increases or decreases).
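As a minimal sketch of how the sign and size of r behave, the Python snippet below computes the coefficient for three made-up data sets: one rising in a straight line, one falling in a straight line, and one U-shaped. The function name `pearson_r` and all values are illustrative assumptions, not part of the original material. The U-shaped case gives r = 0 even though the second variable is fully determined by the first, because r only reflects a straight-line trend.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Illustrative data only (assumed values).
xs = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * x + 1 for x in xs]    # moves in tandem with xs
falling = [10 - 3 * x for x in xs]  # moves opposite to xs
u_shaped = [x * x for x in xs]      # depends on xs, but not linearly

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_u_shaped = pearson_r(xs, u_shaped)

print(r_rising, r_falling, r_u_shaped)
```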

While the correlation coefficient is a useful measure, it has its limitations:

Correlation coefficients are usually associated with measuring a linear relationship.


For example, if you compare hours worked and income earned for a tradesperson who charges an hourly rate for their work, there is a linear (or straight line) relationship since with each additional hour worked the income will increase by a consistent amount.

If, however, the tradesperson charges based on an initial call out fee and an hourly fee which progressively decreases the longer the job goes for, the relationship between hours worked and income would be non-linear, where the correlation coefficient may be closer to 0.
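The two pricing schemes can be compared with a short Python sketch. The rates, the call out fee and the names used here are assumptions chosen for illustration. With a flat hourly rate r is 1. With the tapering fee schedule assumed below r drops below 1 but stays fairly high, because income still rises with every hour. How far r moves toward 0 depends on how sharply the relationship bends away from a straight line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


hours = list(range(1, 9))

# Scheme 1 (assumed): flat rate of 60 per hour, so income is a straight line.
flat_income = [60 * h for h in hours]

# Scheme 2 (assumed): call out fee plus an hourly fee that shrinks over time.
CALL_OUT_FEE = 80
hourly_fees = [60, 50, 40, 30, 20, 10, 5, 5]
tapering_income = []
total = CALL_OUT_FEE
for fee in hourly_fees:
    total += fee
    tapering_income.append(total)

r_flat = pearson_r(hours, flat_income)
r_tapering = pearson_r(hours, tapering_income)

print(f"flat rate: r = {r_flat:.3f}")
print(f"tapering:  r = {r_tapering:.3f}")
```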


Care is needed when interpreting the value of 'r'. It is possible to find correlations between many variables; however, the relationships can be due to other factors and have nothing to do with the two variables being considered.
For example, sales of ice cream and sales of sunscreen can increase and decrease across a year in a systematic manner, but this relationship would be due to the effects of the season (i.e. hotter weather sees an increase in people wearing sunscreen as well as eating ice cream) rather than to any direct relationship between sales of sunscreen and ice cream.
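The seasonal effect can be sketched with simulated data. In the Python snippet below, temperature drives both sales series and nothing links the two series directly. Every number, name and the random seed is an assumption made for illustration, not real sales data. The raw correlation between the two series is strong, but once the straight-line effect of temperature is removed from each series, the leftover correlation is close to 0.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient using deviations from the mean."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed seasonal temperature curve over one year (degrees Celsius).
temperature = [22.5 + 12.5 * math.sin(2 * math.pi * d / 365) for d in range(365)]

# Both sales series depend on temperature plus independent noise.
ice_cream = [5 * t + rng.gauss(0, 10) for t in temperature]
sunscreen = [3 * t + rng.gauss(0, 8) for t in temperature]

r_raw = pearson_r(ice_cream, sunscreen)
r_adjusted = pearson_r(residuals(ice_cream, temperature),
                       residuals(sunscreen, temperature))

print(f"raw correlation:            {r_raw:.2f}")
print(f"after removing temperature: {r_adjusted:.2f}")
```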

The correlation coefficient should not be used to say anything about a cause-and-effect relationship. By examining the value of 'r', we may conclude that two variables are related, but that 'r' value does not tell us if one variable was the cause of the change in the other.

How can causation be established?

Causality is an area of statistics that is commonly misunderstood and misused, in the mistaken belief that because the data shows a correlation there is necessarily an underlying causal relationship.

The use of a controlled study is the most effective way of establishing causality between variables. In a controlled study, the sample or population is split in two, with both groups being comparable in almost every way. The two groups then receive different treatments, and the outcomes of each group are assessed.

For example, in medical research, one group may receive a placebo while the other group is given a new type of medication. If the two groups have noticeably different outcomes, the different experiences may have caused the different outcomes.
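A controlled study of this kind can be sketched as a small simulation. The Python snippet below assumes a hypothetical outcome score, a made-up treatment effect of 5 points and random assignment of participants to the two groups. None of these figures come from a real trial. Because assignment is random, the two groups are comparable on average, so the difference in mean outcomes estimates the effect of the treatment.

```python
import random


def run_trial(rng, n_participants=1000, true_effect=5.0):
    """Randomly split participants in two and return the difference
    in mean outcome (treated group minus placebo group)."""
    participants = list(range(n_participants))
    rng.shuffle(participants)  # random assignment keeps groups comparable
    treated = set(participants[: n_participants // 2])

    treated_outcomes, placebo_outcomes = [], []
    for person in range(n_participants):
        baseline = rng.gauss(50, 2)  # assumed outcome without treatment
        if person in treated:
            treated_outcomes.append(baseline + true_effect)
        else:
            placebo_outcomes.append(baseline)

    mean_treated = sum(treated_outcomes) / len(treated_outcomes)
    mean_placebo = sum(placebo_outcomes) / len(placebo_outcomes)
    return mean_treated - mean_placebo


effect_estimate = run_trial(random.Random(1))
null_estimate = run_trial(random.Random(2), true_effect=0.0)

print(f"estimated effect with treatment:    {effect_estimate:.2f}")
print(f"estimated effect with no treatment: {null_estimate:.2f}")
```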

Due to ethical reasons, there are limits to the use of controlled studies; it would not be appropriate to use two comparable groups and have one of them undergo a harmful activity while the other does not. To overcome this situation, observational studies are often used to investigate correlation and causation for the population of interest. The studies can look at the groups' behaviours and outcomes and observe any changes over time.

The objective of these studies is to provide statistical information to add to the other sources of information that would be required for the process of establishing whether or not causality exists between two variables.




An inverse correlation, also known as negative correlation, is a contrary relationship between two variables such that when the value of one variable is high then the value of the other variable is probably low.

For example, with variables A and B, as A has a high value, B has a low value, and as A has a low value, B has a high value. In statistical terminology, an inverse correlation is often denoted by the correlation coefficient "r" having a value between -1 and 0, with r = -1 indicating perfect inverse correlation.

  • Inverse (or negative) correlation is when two variables in a data set are related such that when one is high the other is low.
  • Even though two variables may have a strong negative correlation, this does not necessarily imply that the behavior of one has any causal influence on the other.
  • The relationship between two variables can change over time and may have periods of positive correlation as well.

Two sets of data points can be plotted on a graph on an x and y-axis to check for correlation. This is called a scatter diagram, and it represents a visual way to check for a positive or negative correlation. The graph below illustrates a strong inverse correlation between two sets of data points plotted on the graph.

[Figure: scatter diagram showing a strong inverse correlation. Image by Sabrina Jiang © Investopedia 2021]

Correlation can be calculated between variables within a set of data to arrive at a numerical result, the most common of which is known as Pearson's r. When r is less than 0, this indicates an inverse correlation. Here is an arithmetic example calculation of Pearson's r, with a result that shows an inverse correlation between two variables.

Assume an analyst needs to calculate the degree of correlation between the X and Y in the following data set with seven observations on the two variables:

  • X: 55, 37, 100, 40, 23, 66, 88
  • Y: 91, 60, 70, 83, 75, 76, 30

There are three steps involved in finding the correlation. First, add up all the X values to find SUM(X), add up all the Y values to find SUM(Y) and multiply each X value with its corresponding Y value and sum them to find SUM(X,Y):

$$\text{SUM}(X) = 55 + 37 + 100 + 40 + 23 + 66 + 88 = 409$$

$$\text{SUM}(Y) = 91 + 60 + 70 + 83 + 75 + 76 + 30 = 485$$

$$\text{SUM}(X,Y) = (55 \times 91) + (37 \times 60) + \dotso + (88 \times 30) = 26{,}926$$

The next step is to take each X value, square it and sum up all these values to find SUM(X^2). The same must be done for the Y values:

$$\text{SUM}(X^2) = (55^2) + (37^2) + (100^2) + \dotso + (88^2) = 28{,}623$$

$$\text{SUM}(Y^2) = (91^2) + (60^2) + (70^2) + \dotso + (30^2) = 35{,}971$$

Noting there are seven observations, n, the following formula can be used to find the correlation coefficient, r:

$$r = \frac{n \times \text{SUM}(X,Y) - \text{SUM}(X) \times \text{SUM}(Y)}{\sqrt{\left[n \times \text{SUM}(X^2) - \text{SUM}(X)^2\right] \times \left[n \times \text{SUM}(Y^2) - \text{SUM}(Y)^2\right]}}$$

In this example, the correlation is:

  • $r = \frac{7 \times 26{,}926 - (409 \times 485)}{\sqrt{(7 \times 28{,}623 - 409^2) \times (7 \times 35{,}971 - 485^2)}}$
  • $r = -9{,}883 \div 23{,}414$
  • $r = -0.42$

The two data sets have a correlation of -0.42, which is called an inverse correlation because it is a negative number.
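The arithmetic above can be checked with a few lines of Python. This is only a sketch that follows the same sums-based formula using the seven observations given. The variable names are illustrative choices.

```python
from math import sqrt

# The seven observations from the worked example.
X = [55, 37, 100, 40, 23, 66, 88]
Y = [91, 60, 70, 83, 75, 76, 30]
n = len(X)

# Step 1: plain sums and the sum of products.
sum_x = sum(X)                               # 409
sum_y = sum(Y)                               # 485
sum_xy = sum(x * y for x, y in zip(X, Y))    # 26,926

# Step 2: sums of squares.
sum_x2 = sum(x * x for x in X)               # 28,623
sum_y2 = sum(y * y for y in Y)               # 35,971

# Step 3: apply the formula for r.
numerator = n * sum_xy - sum_x * sum_y       # -9,883
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(numerator, round(denominator), round(r, 2))  # -9883 23414 -0.42
```

The numerator is negative, which is what makes the final coefficient negative.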

Inverse correlation tells you that when one variable is high, the other tends to be low. Correlation analysis can reveal useful information about the relationship between two variables, such as how the stock and bond markets often move in opposite directions.

The correlation coefficient is often used in a predictive manner to estimate metrics like the risk reduction benefits of portfolio diversification and other important data. If the returns on two different assets are negatively correlated, then they can balance each other out if included in the same portfolio.
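One common way to express this balancing effect is the standard two-asset portfolio variance formula, in which the correlation between the two assets' returns appears directly. The Python sketch below assumes two hypothetical assets, each with 20% volatility and held in equal weights. The function name and every figure are illustrative assumptions, not market data.

```python
from math import sqrt


def portfolio_volatility(weight_a, vol_a, vol_b, correlation):
    """Volatility of a two-asset portfolio.

    variance = (wa*sa)^2 + (wb*sb)^2 + 2*wa*wb*rho*sa*sb
    """
    weight_b = 1.0 - weight_a
    variance = ((weight_a * vol_a) ** 2
                + (weight_b * vol_b) ** 2
                + 2 * weight_a * weight_b * correlation * vol_a * vol_b)
    # Guard against tiny negative values caused by floating-point rounding.
    return sqrt(max(variance, 0.0))


# Hypothetical assets: both 20% volatility, held 50/50.
for rho in (1.0, 0.0, -0.42, -1.0):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.2f} -> portfolio volatility {vol:.3f}")
```

Under these assumptions, the lower the correlation, the lower the combined volatility, and a correlation of -1 removes it entirely.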

In financial markets, a well-known example of an inverse correlation is probably the one between the U.S. dollar and gold. As the U.S. dollar depreciates against major currencies, the dollar price of gold is generally observed to rise, and as the U.S. dollar appreciates, gold declines in price.

Two points need to be kept in mind with regard to a negative correlation. First, the existence of a negative correlation, or positive correlation for that matter, does not necessarily imply a causal relationship. Even though two variables have a very strong inverse correlation, this result by itself does not demonstrate a cause-and-effect relationship between the two.

Second, when dealing with time series data, such as most financial data, the relationship between two variables is not static and can change over time. This means the variables may display an inverse correlation during some periods and a positive correlation during others. Because of this, using the results of correlation analysis to extrapolate the same conclusion to future data carries a high degree of risk.
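A rolling correlation is one simple way to see this instability. The Python sketch below uses two made-up series of twelve observations and a six-observation window. The data, the window length and the function names are assumptions for illustration. The early windows show a perfect inverse correlation and the late windows a perfect positive one, while the correlation over the whole sample is 0 and hides both regimes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient using deviations from the mean."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rolling_correlation(xs, ys, window):
    """Correlation over each consecutive block of `window` observations."""
    return [pearson_r(xs[i:i + window], ys[i:i + window])
            for i in range(len(xs) - window + 1)]


# Made-up series: B falls while A rises, then both rise together.
series_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
series_b = [12, 11, 10, 9, 8, 7, 7, 8, 9, 10, 11, 12]

rolling = rolling_correlation(series_a, series_b, window=6)
full_sample = pearson_r(series_a, series_b)

print([round(r, 2) for r in rolling])
print(round(full_sample, 2))
```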