What are the bounds of the correlation coefficient? Pearson's correlation criterion

Various features can be related to each other.

There are two types of relationship between them:

  • functional;
  • correlational.

"Correlation", translated literally, means nothing more than a relationship.
In the case of a correlation, several values of one characteristic correspond to several values of another characteristic. As examples, consider the established correlations between:

  • the length of the legs, neck, beak of such birds as herons, cranes, storks;
  • indicators of body temperature and heart rate.

For most biomedical processes, the presence of this type of connection has been statistically proven.

Statistical methods make it possible to establish the fact that features are interdependent. Special calculations then yield correlation coefficients (measures of association).

Such calculations are called correlation analysis. It is carried out to confirm that 2 variables (random variables) depend on each other, a dependence expressed by the correlation coefficient.

Using the correlation method makes it possible to solve several problems:

  • identifying the presence of a relationship between the analyzed parameters;
  • solving forecasting problems: knowing that a correlation exists makes it possible to predict the behavior of one parameter from the behavior of another, correlated parameter;
  • carrying out classification based on selecting features that are independent of each other.

For variables:

  • on an ordinal scale, the Spearman coefficient is calculated;
  • on an interval scale, the Pearson coefficient.

These are the most commonly used coefficients; besides them, there are others.

The value of the coefficient can be either positive or negative.

In the first case, as the value of one variable increases, an increase in the second is observed. With a negative coefficient, the pattern is reversed.

What is the correlation coefficient for?

Random variables related to each other can have relationships of quite different natures. The relationship is not necessarily functional, the case in which there is a direct dependence between the values. Most often, both quantities are affected by a whole set of diverse factors; when some of these factors are common to both quantities, related patterns form.

This means that a statistically proven relationship between two quantities does not confirm that the cause of the observed changes has been established. As a rule, the researcher should conclude that there are two interrelated consequences of a common cause.

Correlation Coefficient Properties

This statistical characteristic has the following properties:

  • the value of the coefficient ranges from -1 to +1; the closer it is to the extreme values, the stronger the positive or negative linear relationship between the parameters, while a zero value indicates the absence of correlation between the features;
  • a positive value indicates that as one feature increases, the other increases as well (positive correlation);
  • a negative value indicates that as one feature increases, the other decreases (negative correlation);
  • a value approaching the extreme points (-1 or +1) indicates a very strong linear relationship;
  • the feature values can change while the coefficient stays constant;
  • the correlation coefficient is dimensionless;
  • the presence of a correlation does not by itself confirm a causal relationship.

Correlation coefficient values

The strength of the correlation can be characterized using the Chaddock scale, in which each qualitative characteristic corresponds to a range of numerical values.

In the case of a positive correlation with the value:

  • 0-0.3 - the correlation is very weak;
  • 0.3-0.5 - weak;
  • 0.5-0.7 - medium strength;
  • 0.7-0.9 - high;
  • 0.9-1 is a very high correlation strength.

The scale can also be used for negative correlation. In this case, the quality characteristics are replaced by the opposite ones.

You can also use a simplified Chaddock scale, which distinguishes only 3 gradations of correlation strength:

  • very strong - indicators ± 0.7 - ± 1;
  • average - indicators ± 0.3 - ± 0.699;
  • very weak - indicators 0 - ± 0.299.

This statistical indicator makes it possible not only to test the assumption of a linear relationship between features but also to establish its strength.

Correlation coefficient types

Correlation coefficients can be classified by sign and value:

  • positive;
  • null;
  • negative.

Depending on the analyzed values, the coefficient is calculated:

  • Pearson;
  • Spearman;
  • Kendall;
  • Fechner sign coefficient;
  • coefficient of concordance (multiple rank correlation).

The Pearson correlation coefficient is used to establish direct relationships between the absolute values of variables. In this case, the distributions of both series of variables should approach normal, the compared series must contain the same number of observations, and the variables must be measured on an interval or ratio scale.

Advantages of the method:

  • accurate establishment of the correlation strength;
  • comparison of quantitative features.

There are a few disadvantages to using Pearson's linear correlation coefficient:

  • the method is not robust to outliers in the numerical values;
  • it can determine the correlation strength only for a linear relationship; for other types of relationship between variables, regression analysis methods should be used.

Rank correlation is determined by the Spearman method, which makes it possible to study the relationship between phenomena statistically. This coefficient measures the actually existing degree of parallelism between two quantitatively expressed series of features and estimates the closeness of the identified connection.

Advantages of the method:

  • it does not require precise determination of the correlation strength;
  • the compared indicators may have either quantitative or attributive values;
  • it accommodates series of characteristics with open-ended value ranges.

Spearman's method belongs to nonparametric analysis, so there is no need to check that the feature is normally distributed. In addition, it allows comparison of indicators expressed on different scales: for example, the number of erythrocytes in a given volume of blood (a continuous scale) versus an expert judgment expressed in points (an ordinal scale).

The efficiency of the method suffers when there is a large difference between the values of the compared quantities. The method is also ineffective when the measured quantity has an uneven distribution of values.
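As an illustration of Spearman's approach, the coefficient can be sketched in a few lines: rank each series (ties receive the average rank), then apply Pearson's formula to the ranks. The data below are hypothetical stand-ins for the erythrocyte/expert-score example, not figures from the text.

```python
# A sketch of Spearman's rank correlation: rank each series (ties receive
# the average rank), then apply Pearson's formula to the ranks.
# The data are hypothetical stand-ins for the erythrocyte/expert-score example.
from math import sqrt

def ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

erythrocytes = [4.2, 4.8, 5.1, 3.9, 4.5]  # continuous scale (hypothetical)
expert_score = [2, 4, 5, 1, 3]            # ordinal scale, in points
rho = spearman(erythrocytes, expert_score)  # the two rankings agree perfectly here
```

Because the ranks of the two hypothetical series coincide exactly, ρ comes out as 1; with real data it would fall somewhere between -1 and 1.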

Step by step calculation of correlation coefficient in Excel

The calculation of the correlation coefficient involves the sequential execution of a number of mathematical operations.

The formula for calculating the Pearson coefficient shows how laborious this process is when performed manually.
Using the capabilities of Excel speeds up finding the coefficient several times over.

It is enough to follow a simple algorithm:

  • enter the basic information: a column of x values and a column of y values;
  • select the "Formulas" tab in the toolbar;
  • in the opened tab, select "Insert function fx";
  • in the dialog box that opens, select the statistical function "CORREL", which calculates the correlation coefficient between 2 data sets;
  • in the window that opens, enter the following data: array 1 - the range of values of column x (the data must be selected), array 2 - the range of values of column y;
  • press "OK"; the result of the calculation appears in the "value" line;
  • draw a conclusion regarding the presence of a correlation between the 2 data sets and its strength.
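The calculation Excel's CORREL performs can be reproduced directly. Below is a minimal Python sketch; the xs/ys columns are hypothetical stand-ins for the two spreadsheet ranges.

```python
# The same calculation Excel's CORREL performs, as a minimal Python sketch.
# The xs/ys columns below are hypothetical stand-ins for two spreadsheet ranges.
from math import sqrt

def correl(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
r = correl(xs, ys)  # close to +1: a strong positive linear relationship
```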
The purpose of correlation analysis is to estimate the strength of the relationship between random variables (features) that characterize some real process.
Tasks of correlation analysis:
a) Measuring the degree of association (closeness, strength, intensity) of two or more phenomena.
b) Selecting the factors that have the most significant impact on the effective feature, based on measuring the degree of association between phenomena. Factors significant in this respect are then used in regression analysis.
c) Detecting unknown causal relationships.

The forms in which relationships manifest themselves are very diverse. The most common types are the functional (complete) and the correlational (incomplete) connection.
A correlational connection manifests itself on average, over mass observations, when a given value of the independent variable corresponds to a whole series of probable values of the dependent variable. A connection is called functional if each value of the factor feature corresponds to a well-defined, non-random value of the effective feature.
The correlation field serves as a visual representation of the correlation table. It is a graph where X values are plotted on the abscissa, Y values on the ordinate, and the (X, Y) combinations are shown as points. The location of the points indicates whether a connection is present.
Closeness indicators make it possible to characterize how the variation of the effective feature depends on the variation of the factor feature.
A better indicator of the closeness of a correlation is the linear correlation coefficient. When calculating it, one takes into account not only the signs of the deviations of individual values from the mean but also the magnitudes of those deviations.

The key issues of this topic are the regression equation relating the effective indicator to the explanatory variable, the least squares method for estimating the parameters of the regression model, analysis of the quality of the resulting regression equation, and construction of confidence intervals for predicting values of the effective indicator from the regression equation.

Example 2


System of normal equations:
a·n + b·∑x = ∑y
a·∑x + b·∑x² = ∑xy
For our data, the system of equations has the form:
30a + 5763b = 21460
5763a + 1200261b = 3800360
Expressing a from the first equation and substituting into the second, we get b = -3.46, a = 1379.33.
Regression equation:
y = -3.46 x + 1379.33
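The worked example can be verified numerically. The sketch below solves the same pair of normal equations by Cramer's rule, using the sums given in the text (n = 30, ∑x = 5763, ∑x² = 1200261, ∑y = 21460, ∑xy = 3800360).

```python
# Verifying the worked example: solve the normal equations
#   a*n  + b*Sx  = Sy
#   a*Sx + b*Sxx = Sxy
# by Cramer's rule, with the sums given in the text.
n, sx, sxx, sy, sxy = 30, 5763, 1200261, 21460, 3800360

det = n * sxx - sx * sx            # determinant of the system
a = (sy * sxx - sx * sxy) / det    # intercept
b = (n * sxy - sx * sy) / det      # slope
# a ≈ 1379.33, b ≈ -3.46, matching the regression equation in the text
```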

2. Calculation of the parameters of the regression equation.
Selected averages.



Sample variances:


Standard deviation


1.1. Correlation coefficient
Covariance.

We calculate the indicator of the closeness of the connection. This indicator is the sample linear correlation coefficient, which is calculated by the formula:

r_xy = cov(x, y) / (σ_x·σ_y)
The linear correlation coefficient takes values ​​from –1 to +1.
The connections between features can be weak or strong (close). Their criteria are assessed on the Chaddock scale:
0.1 < |r_xy| < 0.3: weak;
0.3 < |r_xy| < 0.5: moderate;
0.5 < |r_xy| < 0.7: noticeable;
0.7 < |r_xy| < 0.9: high;
0.9 < |r_xy| < 1: very high;
In our example, the connection between the factor X and the result Y is high and inverse.
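The Chaddock gradations above are easy to wrap in a small helper. The function below is an illustrative sketch, not part of the original method.

```python
# An illustrative helper (not part of the original method) that maps a
# correlation coefficient to the Chaddock gradations listed above.
def chaddock(r):
    a = abs(r)
    if a >= 0.9:
        strength = "very high"
    elif a >= 0.7:
        strength = "high"
    elif a >= 0.5:
        strength = "noticeable"
    elif a >= 0.3:
        strength = "moderate"
    elif a >= 0.1:
        strength = "weak"
    else:
        strength = "no linear relationship"
    direction = "inverse" if r < 0 else "direct"
    return strength, direction

# The example's coefficient r = -0.74 falls in the "high" band, inverse direction.
```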
In addition, the linear pairwise correlation coefficient can be determined through the regression coefficient b:

r_xy = b·(σ_x/σ_y)
1.2. Regression equation (estimation of the regression equation).

The linear regression equation is y = -3.46 x + 1379.33

Coefficient b = -3.46 shows the average change in the effective indicator (in units of y) per unit increase or decrease of the factor x. In this example, when x increases by 1 unit, y decreases by an average of 3.46.
Coefficient a = 1379.33 formally shows the predicted level of y at x = 0, but this is meaningful only if x = 0 is close to the sampled values.
If x = 0 is far from the sampled x values, a literal interpretation can lead to incorrect results; even if the regression line describes the observed sample quite accurately, there is no guarantee that this holds when extrapolating to the left or to the right.
Substituting the appropriate x values into the regression equation, one can determine the aligned (predicted) values of the effective indicator y(x) for each observation.
The sign of the regression coefficient b determines the relationship between y and x (if b > 0, the relationship is direct; otherwise, it is inverse). In our example, the relationship is inverse.
1.3. Elasticity coefficient.
It is undesirable to use regression coefficients (b in our example) to assess the influence of factors on the effective indicator directly when the effective indicator y and the factor indicator x differ in units of measurement.
For these purposes, elasticity coefficients and beta coefficients are calculated.
The average elasticity coefficient E shows by how many percent, on average across the population, the result y will change from its average value when the factor x changes by 1% from its average value.
The coefficient of elasticity is found by the formula:


The elasticity coefficient is less than 1 in absolute value. Therefore, when X changes by 1%, Y changes by less than 1%. In other words, the influence of X on Y is not substantial.
The beta coefficient shows by what fraction of its standard deviation the value of the effective indicator will change, on average, when the factor indicator changes by one of its standard deviations, with the remaining independent variables fixed at a constant level:

That is, an increase in x by one standard deviation S_x will lead, on average, to a decrease in the mean value of Y by 0.74 of its standard deviation S_y.
1.4. Approximation error.
Let us estimate the quality of the regression equation using the average approximation error, the average deviation of the calculated values from the actual ones:


Since the error is less than 15%, this equation can be accepted as the regression model.
Analysis of variance.
Analysis of variance is aimed at analyzing the variance of the dependent variable:
∑(y_i - ȳ)² = ∑(y(x) - ȳ)² + ∑(y_i - y(x))²
where
∑(y_i - ȳ)² is the total sum of squared deviations;
∑(y(x) - ȳ)² is the sum of squares due to regression ("explained" or "factorial");
∑(y_i - y(x))² is the residual sum of squares.
Theoretical correlation ratio for a linear relationship is equal to the correlation coefficient r xy.
For any form of dependence, the tightness of the connection is determined using multiple correlation coefficient:

This coefficient is universal, as it reflects the closeness of the relationship and the accuracy of the model, and can also be used for any form of relationship between variables. When constructing a one-factor correlation model, the multiple correlation coefficient is equal to the pair correlation coefficient r xy.
1.6. Determination coefficient.
The square of the (multiple) correlation coefficient is called the coefficient of determination, which shows the proportion of variation in the effective trait explained by the variation in the factor trait.
Most often, giving an interpretation of the coefficient of determination, it is expressed as a percentage.
R² = r_xy² ≈ 0.5413 (squaring the unrounded coefficient, r_xy ≈ -0.74)
That is, in 54.13% of cases changes in x lead to a change in y. In other words, the accuracy of the fit of the regression equation is average. The remaining 45.87% of the variation in Y is explained by factors not accounted for in the model.
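The example's summary measures can be rechecked from the values quoted in the text (b = -3.46, a = 1379.33, r ≈ -0.74, and the sums n = 30, ∑x = 5763, ∑y = 21460); a minimal sketch:

```python
# Rechecking the example's summary measures from values quoted in the text:
# regression y = -3.46x + 1379.33, correlation r ≈ -0.74,
# and the sums n = 30, Sx = 5763, Sy = 21460.
b, a = -3.46, 1379.33
r = -0.74

# Coefficient of determination: share of the variation in y explained by x.
r2 = r ** 2  # ≈ 0.55 with the rounded r (the text's 0.5413 uses the unrounded r)

# Average elasticity E = b * (x_mean / y_mean): percent change in y per 1% change in x.
x_mean = 5763 / 30
y_mean = 21460 / 30
E = b * x_mean / y_mean  # |E| < 1, so the influence of x on y is modest
```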


The correlation coefficient (or linear correlation coefficient) is denoted "r" (in rare cases "ρ") and characterizes the linear correlation (that is, a relationship with both a magnitude and a direction) between two or more variables. The value of the coefficient lies between -1 and +1, so the correlation can be either positive or negative. A correlation coefficient of -1 indicates a perfect negative correlation; +1 indicates a perfect positive correlation. Otherwise, the two variables show a positive correlation, a negative correlation, or no correlation. The correlation coefficient can be calculated by hand, with free online calculators, or with a good graphing calculator.

Steps

Calculating the correlation coefficient manually

Collect data. Before you start calculating the correlation coefficient, study your pairs of numbers. It is best to write them down in a table, arranged vertically or horizontally. Label each row or column "x" and "y".

    • For example, given four pairs of values ​​(numbers) of the variables "x" and "y". You can create the following table:
      • x || y
      • 1 || 1
      • 2 || 3
      • 4 || 5
      • 5 || 7
  1. Calculate the arithmetic mean "x". To do this, add up all the x values, and then divide the result by the number of values.

    Find the arithmetic mean "y". To do this, follow the same steps, that is, add up all the y values, and then divide the sum by the number of values.

    Calculate the standard deviation of "x". After calculating the means of "x" and "y", find the standard deviations of these variables. The (sample) standard deviation is calculated using the following formula:

    σ_x = √( ∑(x - μ_x)² / (n - 1) )
    Calculate the standard deviation "y". Follow the steps outlined in the previous step. Use the same formula, but plug in the y values.

    Write down the basic formula for calculating the correlation coefficient. This formula includes the means, standard deviations, and the number (n) of pairs of numbers of both variables. The correlation coefficient is denoted as "r" (in rare cases as "ρ"). This article uses a formula to calculate the Pearson correlation coefficient.

    You have calculated the means and standard deviations of both variables, so you can use the formula to calculate the correlation coefficient. Recall that "n" is the number of pairs of values ​​for both variables. Other values ​​have been calculated earlier.

    • In our example, the calculations will be written like this:
    • ρ = (1/(n - 1))·∑[((x - μ_x)/σ_x)·((y - μ_y)/σ_y)]
    • ρ = (1/3)·[((1 - 3)/1.83)·((1 - 4)/2.58) + ((2 - 3)/1.83)·((3 - 4)/2.58) + ((4 - 3)/1.83)·((5 - 4)/2.58) + ((5 - 3)/1.83)·((7 - 4)/2.58)]
    • ρ = (1/3)·((6 + 1 + 1 + 6)/4.721)
    • ρ = (1/3)·2.965
    • ρ = 0.988
  2. Analyze the result. In our example, the correlation coefficient is 0.988. This value in some way characterizes a given set of pairs of numbers. Pay attention to the sign and magnitude of the value.

    • Since the value of the correlation coefficient is positive, there is a positive correlation between the variables "x" and "y". That is, as the value of "x" increases, the value of "y" also increases.
    • Since the value of the correlation coefficient is very close to +1, the values ​​of the variables "x" and "y" are highly correlated. If you put points on the coordinate plane, they will be located close to some straight line.
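The manual steps above translate directly into code. The sketch below repeats the calculation for the example pairs; the small difference from 0.988 comes from rounding the standard deviations to 1.83 and 2.58 in the manual version.

```python
# The manual steps above, repeated in code for the example pairs.
from math import sqrt

x = [1, 2, 4, 5]
y = [1, 3, 5, 7]
n = len(x)

mean_x = sum(x) / n   # 3.0
mean_y = sum(y) / n   # 4.0
# Sample standard deviations (n - 1 in the denominator), ≈ 1.83 and ≈ 2.58
sd_x = sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
sd_y = sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
# Pearson's formula: average product of the normalized deviations
r = sum(((p - mean_x) / sd_x) * ((q - mean_y) / sd_y)
        for p, q in zip(x, y)) / (n - 1)
# r ≈ 0.99; the text's 0.988 comes from rounding the standard deviations
```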

    Using Online Calculators to Calculate the Correlation Coefficient

    1. Find a calculator on the Internet to calculate the correlation coefficient. This coefficient is often calculated in statistics. If there are many pairs of numbers, it is almost impossible to calculate the correlation coefficient manually. Therefore, there are online calculators to calculate the correlation coefficient. In a search engine, enter "correlation coefficient calculator" (without the quotes).

      Enter data. Check the instructions on the website to enter the correct data (pairs of numbers). It is imperative to enter the appropriate pairs of numbers; otherwise, you will get the wrong result. Please be aware that different websites have different input formats.

      • For example, at http://ncalculators.com/statistics/correlation-coefficient-calculator.htm, the values ​​of the variables x and y are entered in two horizontal lines. The values ​​are separated by commas. That is, in our example, the values ​​"x" are entered like this: 1,2,4,5, and the values ​​"y" like this: 1,3,5,7.
      • On another site, http://www.alcula.com/calculators/statistics/correlation-coefficient/, data is entered vertically; in this case, do not confuse the corresponding pairs of numbers.
    2. Calculate the correlation coefficient. After entering the data, simply click on the "Calculate", "Calculate" or similar button to get the result.

    Using a graphing calculator

    1. Enter data. Take a graphing calculator, go into statistical calculation mode and select the "Edit" command.

      • Different calculators require different keys to be pressed. This article discusses the Texas Instruments TI-86 calculator.
      • To enter the statistical calculation mode, press - Stat (above the "+" key). Then press F2 - Edit.
    2. Delete the previously saved data. Most calculators keep the statistics you enter until you erase them. To avoid confusing old data with new ones, first delete any stored information.

      • Use the arrow keys to move the cursor and highlight the 'xStat' heading. Then press Clear and Enter to clear all values ​​entered in the xStat column.
      • Use the arrow keys to highlight the 'yStat' heading. Then press Clear and Enter to clear all values ​​entered in the yStat column.
    3. Enter the initial data. Use the arrow keys to move the cursor to the first cell under the heading "xStat". Enter the first value and press Enter. At the bottom of the screen, “xStat (1) = __” is displayed, with the entered value replacing a space. After you press Enter, the entered value will appear in the table, and the cursor will move to the next line; this will display "xStat (2) = __" at the bottom of the screen.

      • Enter all the values ​​for the variable "x".
      • After entering all the values ​​for x, use the arrow keys to navigate to the yStat column and enter the values ​​for y.
      • After entering all pairs of numbers, press Exit to clear the screen and exit the aggregation mode.
    4. Calculate the correlation coefficient. It characterizes how close the data is to a certain straight line. The graphing calculator can quickly determine the suitable straight line and calculate the correlation coefficient.

      • Click Stat - Calc. On the TI-86, press - -.
      • Select the Linear Regression function. On the TI-86, press the function key labeled "LinR". The line "LinR _" will be displayed on the screen with a blinking cursor.
      • Now enter the names of two variables: xStat and yStat.
        • On TI-86, open the list of names; to do this, press - -.
        • The available variables are displayed on the bottom line of the screen. Select xStat (you will probably need to press F1 or F2 to do this), enter a comma, and then select yStat.
        • Press Enter to process the entered data.

Step 3. Finding the relationship between data

Linear correlation

The last stage in studying the connections between phenomena is assessing the closeness of the connection using correlation indicators. This stage is very important for identifying dependencies between factor and effective features and, consequently, for the possibility of diagnosing and predicting the phenomenon under study.

Diagnosis (from the Greek diagnosis, "recognition") is the determination of the essence and characteristics of the state of an object or phenomenon on the basis of its comprehensive study.

Forecast (from the Greek prognosis, "foresight, prediction") is any specific prediction or judgment about the state of a phenomenon in the future (a weather forecast, the outcome of an election, etc.). A forecast is a scientifically grounded hypothesis about the probable future state of the studied system, object, or phenomenon and the indicators characterizing this state. Forecasting is the development of a forecast: special scientific research into the specific prospects of the development of a phenomenon.

Let's remember the definition of correlation:

Correlation is the relationship between random variables, expressed in the fact that the distribution of one quantity depends on the value of the other.

A correlation is observed not only between quantitative but also qualitative features. There are various methods and indicators for assessing the closeness of relationships. We will focus only on the linear pair correlation coefficient, which is used when there is a linear relationship between the random variables. In practice it is often necessary to determine the level of connection between random variables of unequal dimensions, so it is desirable to have a dimensionless characteristic of this connection. Such a characteristic (measure of connection) is the linear correlation coefficient r_xy, which is determined by the formula

where , .

With suitable shorthand notation, the following expression can be obtained for calculating the correlation coefficient

.

Introducing the concept of the normalized deviation, which expresses the deviation of the correlated values from the mean in fractions of the standard deviation:

t_x = (x - x̄)/σ_x,  t_y = (y - ȳ)/σ_y,

the expression for the correlation coefficient takes the form

r_xy = (1/n)·∑ t_x·t_y.

If the correlation coefficient is calculated from the totals of the initial random variables in the calculation table, it can be found using the formula

r_xy = (n·∑xy - ∑x·∑y) / √((n·∑x² - (∑x)²)·(n·∑y² - (∑y)²)).

Linear correlation coefficient properties:

1). The correlation coefficient is a dimensionless quantity.

2). |r| ≤ 1, i.e. -1 ≤ r ≤ 1.

3). r(aX, bY) = r(X, Y) for positive constants a, b: the value of the correlation coefficient will not change if all values of the random variables X and Y are multiplied (or divided) by a positive constant.

4). r(X + a, Y + b) = r(X, Y) for constants a, b: the value of the correlation coefficient will not change if all values of the random variables X and Y are increased (or decreased) by a constant.

5). There is a relationship between the correlation coefficient and the regression coefficient: b = r_xy·(σ_y/σ_x), where b is the slope of the regression of y on x.

The values ​​of the correlation coefficients can be interpreted as follows:

Quantitative criteria for assessing the tightness of communication:

For forecasting purposes, values with |r| > 0.7 are usually used.

The correlation coefficient allows us to conclude that there is a linear relationship between two random variables, but does not indicate which of the values ​​determines the change in the other. In fact, the relationship between two random variables can exist without a causal relationship between the quantities themselves, since a change in both random variables can be caused by a change (influence) of the third one.

The correlation coefficient r_xy is symmetric with respect to the random variables X and Y. This means that for determining the correlation coefficient it makes no difference which of the quantities is independent and which is dependent.

Significance of the correlation coefficient

Even for independent variables, the correlation coefficient may turn out to be nonzero due to random scatter of measurements or due to a small sample of random variables. Therefore, the significance of the correlation coefficient should be checked.

The significance of the linear correlation coefficient is checked using Student's t-test:

t = r·√(n - 2) / √(1 - r²).

If t > t_cr(P, n-2), then the linear correlation coefficient is significant and, therefore, the statistical relationship between X and Y is also significant.

For the convenience of calculations, tables of the confidence limits of the correlation coefficient have been compiled for various numbers of degrees of freedom f = n - 2 (two-sided test) and different significance levels α = 0.1, 0.05, 0.01, and 0.001. The correlation is considered significant if the calculated correlation coefficient exceeds the confidence limit of the correlation coefficient for the given f and α.

For large n and α = 0.01, the confidence limit of the correlation coefficient can be calculated using the approximate formula

.
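As a sketch of the significance check, the t-statistic can be computed directly. The critical value used below (t_cr ≈ 2.048, two-sided, α = 0.05, f = 28) is a standard table value assumed here for illustration; r ≈ -0.74 and n = 30 are taken from the worked example.

```python
# A sketch of the significance check for the example's coefficient:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a table value.
from math import sqrt

def t_statistic(r, n):
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r, n = -0.74, 30          # values from the worked example
t = abs(t_statistic(r, n))
t_cr = 2.048              # two-sided table value for alpha = 0.05, f = 28 (assumption)
significant = t > t_cr    # True: the correlation is statistically significant
```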

The correlation coefficient measures the degree of relationship between two variables. Calculating it shows whether there is a relationship between two data sets. Unlike regression, correlation does not predict the values of quantities. However, calculating the coefficient is an important step in preliminary statistical analysis. For example, suppose we find that the correlation coefficient between the level of foreign direct investment and the rate of GDP growth is high. This suggests that, to ensure prosperity, it is necessary to create a favorable climate specifically for foreign entrepreneurs. Not such an obvious conclusion at first glance!

Correlation and causality

Perhaps no statistical measure has become so firmly entrenched in our lives. The correlation coefficient is used in all areas of social knowledge. Its main danger is that its high values are often exploited to convince people and make them believe certain conclusions. In reality, however, a strong correlation does not at all indicate a causal relationship between the quantities.

Correlation coefficient: Pearson and Spearman formula

There are several main indicators that characterize the relationship between two variables. Historically, the first is Pearson's linear correlation coefficient. It is taught in school. It was developed by K. Pearson and G. U. Yule on the basis of the work of Francis Galton. This coefficient reflects the linear relationship between quantitative variables. It always lies between -1 and +1. A negative value indicates an inverse relationship; if the coefficient is zero, there is no linear relationship between the variables; a positive value indicates a direct relationship between the studied quantities. Spearman's rank correlation coefficient simplifies the calculations by building a hierarchy (ranking) of the variable values.

Relationships between variables

Correlation helps answer two questions. First, is the relationship between the variables positive or negative? Second, how strong is the dependence? Correlation analysis is a powerful tool for obtaining this important information. It is easy to see that household income and expenses rise and fall in proportion; this relationship is considered positive. Conversely, when the price of a product rises, demand for it falls; this relationship is called negative. The values of the correlation coefficient lie in the range between -1 and 1. Zero means there is no relationship between the studied quantities. The closer the obtained value is to the extremes, the stronger the relationship (negative or positive). A coefficient between -0.1 and 0.1 indicates an absence of dependence. It should be understood that such a value indicates only the absence of a linear connection.

Application features

Using either indicator involves certain assumptions. First, the presence of a strong association does not mean that one quantity determines the other; a third quantity may well determine both. Second, even a high Pearson coefficient does not indicate a causal relationship between the studied variables. Third, the coefficient captures only linear relationships. Correlation is suited to meaningful quantitative data (e.g., air pressure, air temperature) rather than categories such as gender or favorite color.
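
The third point is easy to demonstrate: a perfect but nonlinear dependence such as y = x² on a sample symmetric around zero gives a Pearson coefficient of exactly zero, even though y is fully determined by x. A minimal sketch:

```python
from statistics import mean
from math import sqrt

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# y is completely determined by x, yet the dependence is not linear:
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]
r = pearson(x, y)  # 0.0: Pearson detects no linear relationship here
```

This is why a near-zero coefficient must be read as "no linear relationship," never as "no relationship at all."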

Multiple correlation coefficient

Pearson and Spearman examined the relationship between two variables. But what if there are three or even more? This is where the multiple correlation coefficient comes in. For example, gross national product is influenced not only by foreign direct investment but also by the state's monetary and fiscal policy, as well as the level of exports. The growth rate and volume of GDP result from the interaction of a number of factors. It should be understood, however, that the multiple correlation model rests on a number of simplifications and assumptions. First, multicollinearity among the predictors is assumed to be absent. Second, the relationship between the dependent variable and the influencing variables is assumed to be linear.
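
For the special case of one dependent variable and two predictors, the multiple correlation coefficient has a closed form in terms of the three pairwise Pearson coefficients; the sketch below uses that textbook formula (function names are mine). Note how the denominator 1 - r₁₂² blows up as the predictors become collinear, which is exactly the multicollinearity assumption mentioned above.

```python
from statistics import mean
from math import sqrt

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def multiple_r(y, x1, x2):
    """Multiple correlation of y with two predictors x1 and x2,
    computed from the pairwise correlations:
    R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r_sq = (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)
    return sqrt(r_sq)
```

As a sanity check, if y is exactly the sum of the two predictors, R comes out as 1; R is also never smaller than the strongest pairwise correlation with y.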

Areas of Use of Correlation and Regression Analysis

This method of finding the relationship between quantities is widely used in statistics. It is most often resorted to in three main cases:

  1. To test a hypothesized causal relationship between two variables. The researcher hopes to find a linear relationship and derive a formula describing the relationship between the quantities; their units of measurement can differ.
  2. To check whether there is a relationship between two quantities. In this case, neither variable is designated as dependent, and it may turn out that some other factor determines the values of both.
  3. To derive a regression equation. Once derived, numbers can be substituted into it to predict the value of the unknown variable.
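
The third use case, deriving an equation and substituting numbers into it, can be sketched with an ordinary least-squares fit of a straight line y = a + b·x (the data below are made-up illustration values, not from the article):

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns the pair (a, b)."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

def predict(a, b, x_new):
    """Substitute a new x into the derived equation."""
    return a + b * x_new

# Hypothetical sample lying exactly on y = 2x + 1:
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
forecast = predict(a, b, 10)  # the fitted line gives 2*10 + 1 = 21
```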

Man in search of a causal relationship

Consciousness is arranged in such a way that we feel compelled to explain the events happening around us. A person is always looking for connections between the picture of the world he lives in and the information he receives. Often the brain creates order out of chaos: it can easily see a causal relationship where there is none. Scientists have to train themselves to overcome this tendency, because the ability to assess connections between data objectively is essential in an academic career.

Media bias

Consider how the presence of a correlation can be misinterpreted. A group of UK students with behavioral problems were asked whether their parents smoked, and the study was then published in a newspaper. The results showed a strong correlation between parental smoking and their children's delinquency. The professor who conducted the study even suggested printing a warning about this on cigarette packs. However, there are several problems with this conclusion. First, correlation does not indicate which of the quantities is independent, so it is entirely possible that the parents' habit is caused by the children's disobedience. Second, it cannot be said with certainty that both problems are not due to some third factor, such as low family income. Finally, the emotional aspect of the original findings should be noted: the professor who conducted the research was an ardent opponent of smoking, so it is not surprising that he interpreted his results this way.

Conclusions

Misinterpreting correlation as a causal relationship between two variables can lead to embarrassing research errors. The problem is that this tendency lies at the very foundation of human consciousness, and many marketing tricks exploit it. Understanding the difference between causation and correlation lets you analyze information rationally, both in everyday life and in your professional career.
