Least-squares data fitting. Function approximation by the least squares method; function approximation with MathCAD

Suppose an experiment gives n points

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n) (1)

(see figure). It is required to find the equation of the line

y = ax + b. (2)

For each point, the deviation $ax_i + b - y_i$ shows how far the line passes from that point; the smaller these numbers are in absolute value, the better the straight line (2) is chosen. As a characteristic of the accuracy of the fit of line (2), we can take the sum of squares

$S = \sum_{i=1}^{n} (ax_i + b - y_i)^2$.

The conditions for a minimum of S are

$\partial S / \partial a = 0$, (6)
$\partial S / \partial b = 0$. (7)

Carrying out the differentiation, equations (6) and (7) can be written as follows:

$a \sum x_i^2 + b \sum x_i = \sum x_i y_i$, (8)
$a \sum x_i + bn = \sum y_i$. (9)

From equations (8) and (9) it is easy to find a and b from the experimental values $x_i$ and $y_i$. The line (2) defined by equations (8) and (9) is called the line obtained by the method of least squares (the name emphasizes that the sum of squares S has a minimum). Equations (8) and (9), from which the straight line (2) is determined, are called the normal equations.
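As a quick illustration, here is a minimal Python sketch (not from the original text; names are mine) that forms and solves the normal equations (8)-(9) by Cramer's rule:

```python
# A minimal sketch: fit y = a*x + b by solving the normal equations (8)-(9).

def fit_line(xs, ys):
    n = len(xs)
    sx = sum(xs)                                  # sum of x_i
    sy = sum(ys)                                  # sum of y_i
    sxx = sum(x * x for x in xs)                  # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x_i * y_i
    # Normal equations:  a*sxx + b*sx = sxy ;  a*sx + b*n = sy
    det = sxx * n - sx * sx
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b
```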

There is a simple and general way of writing the normal equations. Using the experimental points (1) and equation (2), we can write the system of equations for a and b:

$y_1 = ax_1 + b$,
$y_2 = ax_2 + b$,
...
$y_n = ax_n + b$. (10)

Multiply the left and right sides of each of these equations by the coefficient of the first unknown a (that is, by $x_1, x_2, \ldots, x_n$) and add the resulting equations; the result is the first normal equation (8).

Multiply the left and right sides of each of these equations by the coefficient of the second unknown b, i.e. by 1, and add the resulting equations; the result is the second normal equation (9).

This method of obtaining normal equations is general: it is suitable, for example, for the function

$y = kx$,

where k is a constant value that must be determined from the experimental data (1).

The system of equations for k can be written as $y_i = kx_i$, $i = 1, \ldots, n$; multiplying each equation by its coefficient $x_i$ and summing gives the single normal equation $k \sum x_i^2 = \sum x_i y_i$.

Find line (2) by the least squares method.

Solution. For the n = 6 experimental points we find:

$\sum x_i = 21$, $\sum y_i = 46.3$, $\sum x_i^2 = 91$, $\sum x_i y_i = 179.1$.

We write down equations (8) and (9):

$91a + 21b = 179.1$,
$21a + 6b = 46.3$.

From here we find

$a \approx 0.974$, $b \approx 4.307$.
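The arithmetic can be checked directly from the quoted sums. A sketch; note that the number of points is not stated explicitly in the source, and n = 6 is inferred (x = 1, ..., 6 gives sum x = 21 and sum x^2 = 91, matching the totals above):

```python
# Re-checking the worked example; n = 6 is an inferred assumption.
n, sx, sy, sxx, sxy = 6, 21.0, 46.3, 91.0, 179.1
det = n * sxx - sx * sx              # 6*91 - 21^2 = 105
a = (n * sxy - sx * sy) / det        # (1074.6 - 972.3)/105 ~ 0.974
b = (sxx * sy - sx * sxy) / det      # (4213.3 - 3761.1)/105 ~ 4.307
print(a, b)
```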

Estimating the accuracy of the least squares method

Let us estimate the accuracy of the method for the linear case, when equation (2) holds.

Let the experimental values $x_i$ be exact, and let the experimental values $y_i$ contain random errors with the same variance $\sigma^2$ for all i.

Let us introduce the notation

$\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$. (16)

Then the solutions of equations (8) and (9) can be represented in the form

$a = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}$, (17)
$b = \bar{y} - a\bar{x}$, (18)
where
$\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$. (19)
From equation (17) we find
$D(a) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$. (20)
Similarly, from equation (18) we obtain
$D(b) = D(\bar{y}) + \bar{x}^2 D(a)$, (21)
because
$D(\bar{y}) = \frac{\sigma^2}{n}$. (22)
From equations (21) and (22) we find
$D(b) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right)$. (23)

Equations (20) and (23) give an estimate of the accuracy of the coefficients determined by equations (8) and (9).

Note that the coefficients a and b are correlated. Simple transformations give their correlation moment:

$\operatorname{cov}(a, b) = -\bar{x}\, D(a) = -\frac{\bar{x}\,\sigma^2}{\sum (x_i - \bar{x})^2}$.

From here, for the example above, we find:

0.072 at x = 1 and x = 6,

0.041 at x = 3.5.

Literature

Shor, Ya. B. Statistical Methods of Analysis and Quality and Reliability Control. Moscow: Gosenergoizdat, 1962. 552 p. (see pp. 92-98).

This book is intended for a wide range of engineers (at research institutes, design bureaus, test sites, and factories) involved in determining the quality and reliability of electronic equipment and other mass-produced industrial products (mechanical engineering, instrument making, artillery, etc.).

The book shows how the methods of mathematical statistics are applied to the processing and evaluation of test results that determine the quality and reliability of the tested products. For the reader's convenience, the necessary background from mathematical statistics is provided, as well as a large number of auxiliary mathematical tables that facilitate the necessary calculations.

The presentation is illustrated by a large number of examples drawn from radio electronics and artillery technology.

The least squares method

In the final lesson of this topic, we will get acquainted with the most famous application of functions of several variables, one that finds the widest use in various fields of science and practice: physics, chemistry, biology, economics, sociology, psychology, and so on, and so forth. By the will of fate I often have to deal with the economy, so today I will issue you a ticket to an amazing country called Econometrics =) ... How can you not want it?! It's very good there - you just need to make up your mind! ... But what you probably definitely want is to learn how to solve problems by the least squares method. And especially diligent readers will learn to solve them not only faultlessly, but also VERY FAST ;-) But first, the general problem statement + a related example:

Suppose that in some subject area we study indicators that have a quantitative expression. At the same time, there is every reason to believe that indicator y depends on indicator x. This assumption can be a scientific hypothesis or be based on elementary common sense. Let's leave science aside, however, and explore more mouth-watering areas - namely, grocery stores. Let us denote by:

x - the retail space of a grocery store, sq. m,
y - the annual turnover of the grocery store, million rubles.

It is absolutely clear that the larger the area of the store, the greater its turnover will be in most cases.

Suppose that after observing / experimenting / calculating / dancing with a tambourine, we have numerical data at our disposal:

With grocery stores, I think everything is clear: $x_1$ is the area of the 1st store, $y_1$ its annual turnover, $x_2$ the area of the 2nd store, $y_2$ its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials - a fairly accurate estimate of turnover can be obtained by means of mathematical statistics. However, let's not get distracted - the course in commercial espionage is already paid =)

Tabular data can also be written as points and depicted in the familiar Cartesian coordinate system.

Let's answer an important question: how many points are needed for a qualitative study?

The more, the better. The minimum acceptable set consists of 5-6 points. In addition, with a small amount of data, "anomalous" results must not get into the sample. For example, a small elite store may earn orders of magnitude more than "its colleagues", thereby distorting the general pattern that we need to find!



To put it quite simply: we need to choose a function whose graph passes as close as possible to the points. Such a function is called the approximating (approximation = "drawing near") or theoretical function. Generally speaking, an obvious "contender" immediately appears - a high-degree polynomial whose graph passes through ALL the points. But this option is complicated and often simply wrong (the graph will "wiggle" all the time and poorly reflect the main trend).

Thus, the sought function must be simple enough and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called the least squares method. First, let's look at its essence in general terms. Let some function approximate the experimental data:


How do we evaluate the accuracy of this approximation? Let us calculate the differences (deviations) between the experimental and functional values (study the drawing). The first thought that comes to mind is to estimate how large the sum of these differences is, but the problem is that the differences can be negative (for example, $y_i - f(x_i) < 0$ at some points), and under such summation the deviations will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it suggests itself to take the sum of the moduli of the deviations:

$\sum_{i=1}^{n} |y_i - f(x_i)|$

(for those who suddenly don't know: $\sum$ is the summation sign, and $i$ is an auxiliary "counter" variable that runs from 1 to $n$).

Approximating the experimental points with different functions, we will get different values of this sum, and obviously, where this sum is smaller, that function is more accurate.

Such a method exists and is called the method of least moduli. However, in practice a much wider use has been gained by the least squares method, in which possible negative values are eliminated not by the modulus but by squaring the deviations:

$S = \sum_{i=1}^{n} (y_i - f(x_i))^2 \to \min$,

after which efforts are directed at selecting a function such that the sum of the squared deviations is as small as possible. This, in fact, is where the name of the method comes from.
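To make the two accuracy measures concrete, here is a small Python sketch; the points and the candidate line are made up for illustration:

```python
# Comparing the two measures discussed above for a candidate function f:
# the sum of absolute deviations vs. the sum of squared deviations.
xs, ys = [1, 2, 3, 4], [3.1, 5.2, 6.8, 9.1]   # made-up experimental points

def f(x):
    return 2 * x + 1          # a candidate approximating function

sum_abs = sum(abs(y - f(x)) for x, y in zip(xs, ys))
sum_sq = sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
print(sum_abs, sum_sq)        # the smaller, the better the fit
```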

And now we return to another important point: as noted above, the selected function should be quite simple - but there are also many such functions: linear ($y = ax + b$), hyperbolic ($y = a + b/x$), exponential ($y = ae^{bx}$), logarithmic ($y = a + b\ln x$), quadratic ($y = ax^2 + bx + c$), etc. And, of course, here one immediately wants to "reduce the field of activity". Which class of functions should be chosen for research? A primitive but effective technique:

- The easiest way is to plot the points on the drawing and analyze their arrangement. If they tend to lie along a straight line, then you should look for the equation of a straight line $y = ax + b$ with optimal values of a and b. In other words, the task is to find SUCH coefficients that the sum of the squared deviations is the smallest.

If the points are located, for example, along a hyperbola, then it is a priori clear that a linear function will give a poor approximation. In this case, we look for the most "favorable" coefficients for the hyperbola equation $y = a + b/x$ - those that give the minimum sum of squares.

Now note that in both cases we are talking about functions of two variables, whose arguments are the parameters of the sought dependencies: $S(a, b)$.

In essence, we need to solve a standard problem: to find the minimum of a function of two variables.

Let's recall our example: suppose the "store" points tend to lie along a straight line, and there is every reason to believe a linear dependence of turnover on retail space. Let's find SUCH coefficients "a" and "b" that the sum of the squared deviations $S(a, b) = \sum_{i=1}^{n} (y_i - (ax_i + b))^2$ is the smallest. Everything is as usual: first, the 1st-order partial derivatives. According to the linearity rule, you can differentiate directly under the summation sign:

If you want to use this material for a term paper or thesis, I will be very grateful for a link in your list of sources; you will find such detailed calculations in few places:

Let's compose the standard system:

$\begin{cases} \sum 2(ax_i + b - y_i)x_i = 0, \\ \sum 2(ax_i + b - y_i) = 0. \end{cases}$

We cancel the "two" in each equation and, in addition, "break up" the sums:

$\begin{cases} a\sum x_i^2 + b\sum x_i = \sum x_i y_i, \\ a\sum x_i + \sum b = \sum y_i. \end{cases}$

Note: analyze on your own why "a" and "b" can be taken outside the summation sign. By the way, formally this can also be done with the sum: $\sum b = bn$.

Let's rewrite the system in an "applied" form:

$\begin{cases} a\sum x_i^2 + b\sum x_i = \sum x_i y_i, \\ a\sum x_i + bn = \sum y_i, \end{cases}$

after which the algorithm for solving our problem begins to emerge:

Do we know the coordinates of the points? We do. Can we find the sums? Easily. We compose the simplest system of two linear equations in two unknowns ("a" and "b"). We solve the system, for example, by Cramer's method, obtaining a stationary point. Checking the sufficient condition for an extremum, we can verify that at this point the function S attains exactly a minimum. The check involves additional calculations, so we will leave it behind the scenes (see the sketch below; the full argument is also given in the proof at the end of the article). We draw the final conclusion:
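The "behind the scenes" check can be stated compactly. A sketch of my own, under the standard assumption that not all $x_i$ coincide:

```python
# The Hessian of S(a, b) = sum((a*x_i + b - y_i)^2) has constant entries
#   S_aa = 2*sum(x_i^2), S_ab = 2*sum(x_i), S_bb = 2*n,
# so by Sylvester's criterion the stationary point is a minimum whenever
# the x_i are not all identical.
def stationary_point_is_minimum(xs):
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Both leading principal minors of the Hessian must be positive.
    return 2 * sxx > 0 and 4 * (n * sxx - sx * sx) > 0

print(stationary_point_is_minimum([1, 2, 3, 4, 5]))  # True
```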

The function with these coefficients approximates the experimental points best of all (at least compared with any other linear function). Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the paired linear regression equation.

The problem under consideration is of great practical importance. In the situation of our example, the equation allows you to predict what turnover ("y") the store will have at one or another value of retail space (one or another value of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it - all calculations are at the level of the school curriculum for grades 7-8. In 95 percent of cases, you will be asked to find just a linear function, but at the very end of the article I will show that it is no more difficult to find the equations of the optimal hyperbola, exponential, and some other functions.

In fact, it remains to hand out the promised goodies - so that you learn to solve such examples not only accurately but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experimental) data. Make a drawing on which to plot, in a Cartesian rectangular coordinate system, the experimental points and the graph of the approximating function. Find the sum of the squared deviations between the empirical and theoretical values. Find out whether the proposed exponential function would (from the point of view of the least squares method) approximate the experimental points better.

Note that the "x" values are natural numbers, and this has a characteristic meaningful sense, which I will talk about a little later; but they, of course, can also be fractional. In addition, depending on the content of a particular problem, both "x" and "y" values can be wholly or partially negative. Well, we have been given a "faceless" problem, and we begin its solution:

We find the coefficients of the optimal function as the solution of the system:

$\begin{cases} a\sum x_i^2 + b\sum x_i = \sum x_i y_i, \\ a\sum x_i + bn = \sum y_i. \end{cases}$

For a more compact notation, the "counter" variable can be omitted, since it is already clear that the summation runs from 1 to n.

It is more convenient to calculate the required sums in tabular form:

Calculations can be carried out on a microcalculator, but it is much better to use Excel - both faster and without errors.

Thus, we obtain the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But this is luck - in practice, systems are often not a gift, and in such cases Cramer's method saves the day:

The determinant $\Delta \neq 0$, which means that the system has a unique solution.

Let's check. I understand that you don't want to, but why skip errors where they can be completely avoided? We substitute the found solution into the left-hand side of each equation of the system:

The right-hand sides of the corresponding equations are obtained, which means the system is solved correctly.

Thus, the required approximating function: of all linear functions, it is the one that approximates the experimental data in the best way.

Unlike the direct dependence of the store's turnover on its area, the dependence found here is inverse (the principle "the more - the less"), and this fact is immediately revealed by the negative slope. The function tells us that when a certain indicator increases by 1 unit, the value of the dependent indicator decreases on average by 0.65 units. As they say, the higher the price of buckwheat, the less of it is sold.

To plot the graph of the approximating function, we find two of its values:

and make the drawing:

The constructed line is called a trend line (namely, a linear trend line; in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think this term needs no additional comments.

Let's calculate the sum of the squared deviations between the empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the "crimson" segments (two of which are so small that they cannot even be seen).

Let's summarize the calculations in a table:

They can again be done by hand; just in case, I will give an example for the 1st point:

but it is much more efficient to proceed in the already familiar way:

Let's repeat: what is the meaning of the obtained result? Of all linear functions, this one has the smallest value of S; that is, in its family it is the best approximation. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function approximates the experimental points better?

Let's find the corresponding sum of squared deviations - to distinguish them, I will denote it by the letter "epsilon". The technique is exactly the same:


And again, just in case, the calculations for the 1st point:

In Excel we use the standard function EXP (see the Excel Help for the syntax).

Conclusion: $\varepsilon > S$, which means that the exponential function approximates the experimental points worse than the straight line.

But here it should be noted that "worse" does not yet mean "bad". I have now plotted this exponential function - and it also passes close to the points, so much so that without an analytical study it is difficult to say which function is more accurate.

This completes the solution, and I return to the question of natural values of the argument. In various studies, usually economic or sociological, natural "x" values number months, years, or other equal time intervals. Consider, for example, a problem of this kind:

We have the following data on the retail turnover of a store for the first half of the year:

Using analytical alignment along a straight line, determine the turnover for July.

Yes, no problem: we number the months 1, 2, 3, 4, 5, 6 and use the usual algorithm, as a result of which we obtain an equation - the only difference is that when it comes to time, the letter "t" is usually used (although this is not critical). The resulting equation shows that in the first half of the year, turnover grew by an average of 27.74 units per month. We get the forecast for July (month no. 7), in monetary units.
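A sketch of this month-numbering forecast; the turnover figures below are placeholders, since the article's data table is not reproduced here:

```python
# Month-numbering forecast: fit y = slope*t + intercept, predict t = 7.
t = [1, 2, 3, 4, 5, 6]                           # months of the half-year
y = [120.0, 148.0, 175.0, 203.0, 231.0, 259.0]   # hypothetical turnover
n, st, sy = len(t), sum(t), sum(y)
stt = sum(v * v for v in t)
sty = sum(a * b for a, b in zip(t, y))
slope = (n * sty - st * sy) / (n * stt - st * st)   # growth per month
intercept = (sy - slope * st) / n
print(slope)                    # average monthly growth of turnover
print(slope * 7 + intercept)    # forecast for July (month no. 7)
```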

And such problems are legion. Those who wish can use an additional service, namely my Excel calculator (demo version), which solves the analyzed problem almost instantly! A working version of the program is available in exchange or for a token.

At the end of the lesson - brief information on finding dependencies of some other types. Actually, there is nothing special to tell, since the principal approach and the solution algorithm remain the same.

Let's assume that the arrangement of the experimental points resembles a hyperbola. Then, to find the coefficients of the best hyperbola $y = a + b/x$, you need to find the minimum of the function $S(a, b) = \sum (y_i - a - b/x_i)^2$ - those who wish can carry out the detailed calculations and arrive at a similar system:

From a formal and technical point of view, it is obtained from the "linear" system (let's denote it with an "asterisk") by replacing "x" with $1/x$. Well, and the sums $\sum \frac{1}{x_i}$, $\sum \frac{1}{x_i^2}$, $\sum \frac{y_i}{x_i}$ are then calculated, after which the optimal coefficients "a" and "b" are a stone's throw away.

If there is every reason to believe that the points are located along a logarithmic curve $y = a + b\ln x$, then to find the optimal values of a and b we seek the minimum of the function $S(a, b) = \sum (y_i - a - b\ln x_i)^2$. Formally, in system (*), "x" must be replaced by $\ln x$:

When carrying out calculations in Excel, use the function LN. I admit, it would not be difficult for me to create calculators for each of the cases under consideration, but it will still be better if you "program" the calculations yourself. The lesson videos will help.

With an exponential dependence $y = ae^{bx}$, the situation is a little more complicated. To reduce the matter to the linear case, we take the logarithm of the function and use the properties of the logarithm:

$\ln y = \ln a + bx$.

Now, comparing the obtained function with the linear function, we come to the conclusion that in system (*), "y" must be replaced by $\ln y$, and the free coefficient by $\ln a$. For convenience, we denote $c = \ln a$.

Note that the system is solved for c and b, and therefore, after finding the roots, you must not forget to find the coefficient itself: $a = e^c$.
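A sketch of the three substitutions just described; the data points are made up, and the helper follows the "linear" system (*):

```python
# Linearizing substitutions: hyperbola, logarithmic curve, exponential.
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope*x + intercept."""
    n, sx, sy = len(xs), sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0, 2.1, 1.6, 1.3]                           # hypothetical points

b_h, a_h = fit_line([1 / x for x in xs], ys)        # hyperbola y = a + b/x
b_l, a_l = fit_line([math.log(x) for x in xs], ys)  # y = a + b*ln(x)
b_e, c = fit_line(xs, [math.log(y) for y in ys])    # ln y = ln a + b*x
a_e = math.exp(c)                                   # exponential y = a*e^(b*x)
```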

To approximate the experimental points with the optimal parabola $y = ax^2 + bx + c$, one should find the minimum of a function of three variables $S(a, b, c) = \sum (y_i - ax_i^2 - bx_i - c)^2$. After performing the standard actions, we get the following "working" system (see the sketch below):

$\begin{cases} a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 = \sum x_i^2 y_i, \\ a\sum x_i^3 + b\sum x_i^2 + c\sum x_i = \sum x_i y_i, \\ a\sum x_i^2 + b\sum x_i + cn = \sum y_i. \end{cases}$
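A sketch of the parabola's three-equation system, solved numerically; the data points are placeholders:

```python
# Normal equations for the parabola y = a*x^2 + b*x + c.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.9, 9.8, 17.1, 26.4])        # hypothetical data

A = np.array([
    [np.sum(x**4), np.sum(x**3), np.sum(x**2)],
    [np.sum(x**3), np.sum(x**2), np.sum(x)],
    [np.sum(x**2), np.sum(x),    float(len(x))],
])
rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])
a, b, c = np.linalg.solve(A, rhs)                # optimal coefficients
```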

Yes, of course, there are more sums here, but with your favorite application there are no difficulties at all. And finally, I'll tell you how to quickly check and build the desired trend line in Excel: create a scatter chart, select any of the points with the mouse, right-click and choose "Add Trendline". Then select the chart type and, on the "Options" tab, activate the option "Display Equation on chart". OK

As always, I would like to end the article with some beautiful phrase, and I almost typed "Be in trend!" But I changed my mind in time. And not because it is cliched. I don't know about anyone else, but I don't at all want to follow the promoted American and especially European trend =) Therefore, I wish each of you to stick to your own line!

http://www.grandars.ru/student/vysshaya-matematika/metod-naimenshih-kvadratov.html

The least squares method is one of the most widespread and most developed methods for estimating the parameters of linear econometric models, owing to its simplicity and efficiency. At the same time, certain caution should be exercised when using it, since models built with it may not satisfy a number of quality requirements on their parameters and, as a result, may not reflect the patterns of the process "well enough".

Let us consider the procedure for estimating the parameters of a linear econometric model by the least squares method in more detail. In general form, such a model can be represented by equation (1.2):

$y_t = a_0 + a_1 x_{1t} + \ldots + a_n x_{nt} + \varepsilon_t$.

The initial data for estimating the parameters $a_0, a_1, \ldots, a_n$ are the vector of values of the dependent variable $y = (y_1, y_2, \ldots, y_T)'$ and the matrix of values of the independent variables,

in which the first column, consisting of ones, corresponds to the free coefficient of the model.

The least squares method got its name from the basic principle that the parameter estimates obtained from it must satisfy: the sum of squares of the model errors should be minimal.

Examples of solving problems using the least squares method

Example 2.1. A trading enterprise has a network of 12 stores, information on whose activities is presented in Table 2.1.

The company's management would like to know how the size of the annual turnover depends on the retail space of the store.

Table 2.1

Store number | Annual turnover, RUB mln | Retail space, thousand m2
1 | 19.76 | 0.24
2 | 38.09 | 0.31
3 | 40.95 | 0.55
4 | 41.08 | 0.48
5 | 56.29 | 0.78
6 | 68.51 | 0.98
7 | 75.01 | 0.94
8 | 89.05 | 1.21
9 | 91.13 | 1.29
10 | 91.26 | 1.12
11 | 99.84 | 1.29
12 | 108.55 | 1.49

Least squares solution. Let us denote by $y_t$ the annual turnover of the t-th store, mln rubles, and by $x_{1t}$ the retail space of the t-th store, thousand m2.

Figure 2.1. Scatter plot for Example 2.1

To determine the form of the functional relationship between the variables, we build a scatter diagram (Fig. 2.1).

Based on the scatter diagram, we can conclude that the annual turnover depends positively on the retail space (i.e., y grows with x). The most appropriate form of the functional relationship is linear.

The information for further calculations is presented in Table 2.2. Using the least squares method, we estimate the parameters of the linear one-factor econometric model $y_t = a_0 + a_1 x_{1t} + \varepsilon_t$.

Table 2.2

t | y_t | x_1t | y_t^2 | x_1t^2 | x_1t * y_t
1 | 19.76 | 0.24 | 390.4576 | 0.0576 | 4.7424
2 | 38.09 | 0.31 | 1450.8481 | 0.0961 | 11.8079
3 | 40.95 | 0.55 | 1676.9025 | 0.3025 | 22.5225
4 | 41.08 | 0.48 | 1687.5664 | 0.2304 | 19.7184
5 | 56.29 | 0.78 | 3168.5641 | 0.6084 | 43.9062
6 | 68.51 | 0.98 | 4693.6201 | 0.9604 | 67.1398
7 | 75.01 | 0.94 | 5626.5001 | 0.8836 | 70.5094
8 | 89.05 | 1.21 | 7929.9025 | 1.4641 | 107.7505
9 | 91.13 | 1.29 | 8304.6769 | 1.6641 | 117.5577
10 | 91.26 | 1.12 | 8328.3876 | 1.2544 | 102.2112
11 | 99.84 | 1.29 | 9968.0256 | 1.6641 | 128.7936
12 | 108.55 | 1.49 | 11783.1025 | 2.2201 | 161.7395
Sum | 819.52 | 10.68 | 65008.554 | 11.4058 | 858.3991
Mean | 68.29 | 0.89 | | |

Thus,

$\hat{a}_1 = \frac{12 \cdot 858.3991 - 10.68 \cdot 819.52}{12 \cdot 11.4058 - 10.68^2} = 67.8871$, $\hat{a}_0 = 68.29 - 67.8871 \cdot 0.89 \approx 7.87$.

Consequently, with an increase in retail space by 1 thousand m2, other things being equal, the average annual turnover increases by 67.8871 million rubles.
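The estimate can be re-derived directly from the totals of Table 2.2; a short Python check:

```python
# Re-deriving the Example 2.1 estimates from the Table 2.2 totals.
n = 12
sy, sx = 819.52, 10.68
sxx, sxy = 11.4058, 858.3991
a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # ~67.8871, as in the text
a0 = sy / n - a1 * sx / n                        # ~7.87
print(a1, a0)
```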

Example 2.2. The company's management noticed that the annual turnover depends not only on the store's retail space (see Example 2.1) but also on the average number of visitors. The relevant information is presented in Table 2.3.

Table 2.3

Solution. Let us denote by $x_{2t}$ the average number of visitors per day to the t-th store, thousand people.

To determine the form of the functional relationship between the variables, we build a scatter diagram (Fig. 2.2).

Based on the scatter diagram, we can conclude that the annual turnover depends positively on the average number of visitors per day (i.e., y grows with $x_2$). The form of the functional dependence is linear.

Fig. 2.2. Scatter plot for Example 2.2

Table 2.4

t | x_2t | x_2t^2 | y_t * x_2t | x_1t * x_2t
1 | 8.25 | 68.0625 | 163.02 | 1.98
2 | 10.24 | 104.8575 | 390.0416 | 3.1744
3 | 9.31 | 86.6761 | 381.2445 | 5.1205
4 | 11.01 | 121.2201 | 452.2908 | 5.2848
5 | 8.54 | 72.9316 | 480.7166 | 6.6612
6 | 7.51 | 56.4001 | 514.5101 | 7.3598
7 | 12.36 | 152.7696 | 927.1236 | 11.6184
8 | 10.81 | 116.8561 | 962.6305 | 13.0801
9 | 9.89 | 97.8121 | 901.2757 | 12.7581
10 | 13.72 | 188.2384 | 1252.0872 | 15.3664
11 | 12.27 | 150.5529 | 1225.0368 | 15.8283
12 | 13.92 | 193.7664 | 1511.016 | 20.7408
Sum | 127.83 | 1410.44 | 9160.9934 | 118.9728
Mean | 10.65 | | |

In general, it is necessary to determine the parameters of the two-factor econometric model

$y_t = a_0 + a_1 x_{1t} + a_2 x_{2t} + \varepsilon_t$.

The information required for further calculations is presented in Table 2.4.

Let us estimate the parameters of the linear two-factor econometric model using the least squares method.

Thus, $\hat{a}_1 = 61.6583$ and $\hat{a}_2 = 2.2748$.

The coefficient estimate $\hat{a}_1 = 61.6583$ shows that, other things being equal, with an increase in retail space by 1 thousand m2, the annual turnover will increase by an average of 61.6583 million rubles.

The coefficient estimate $\hat{a}_2 = 2.2748$ shows that, other things being equal, with an increase in the average number of visitors by 1 thousand people per day, the annual turnover will increase by an average of 2.2748 million rubles.
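A sketch reproducing Example 2.2 with numpy's least-squares solver; y and x1 are taken from Table 2.2, x2 from Table 2.4, with rows assumed to correspond to t = 1, ..., 12:

```python
# Two-factor OLS for Example 2.2 via numpy.
import numpy as np

y = np.array([19.76, 38.09, 40.95, 41.08, 56.29, 68.51,
              75.01, 89.05, 91.13, 91.26, 99.84, 108.55])
x1 = np.array([0.24, 0.31, 0.55, 0.48, 0.78, 0.98,
               0.94, 1.21, 1.29, 1.12, 1.29, 1.49])
x2 = np.array([8.25, 10.24, 9.31, 11.01, 8.54, 7.51,
               12.36, 10.81, 9.89, 13.72, 12.27, 13.92])

X = np.column_stack([np.ones_like(x1), x1, x2])   # first column of ones
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # [a0, a1, a2]; a1, a2 should match the quoted 61.66 and 2.27
```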

Example 2.3. Using the information presented in Tables 2.2 and 2.4, estimate the parameter of the one-factor econometric model

$\tilde{y}_t = a_1 \tilde{x}_{2t} + \varepsilon_t$,

where $\tilde{y}_t$ is the centered value of the annual turnover of the t-th store, million rubles, and $\tilde{x}_{2t}$ is the centered value of the average daily number of visitors to the t-th store, thousand people (see Examples 2.1-2.2).

Solution. The additional information required for the calculations is presented in Table 2.5.

Table 2.5

$y_t - \bar{y}$ | $x_{2t} - \bar{x}_2$ | $(x_{2t} - \bar{x}_2)^2$ | $(y_t - \bar{y})(x_{2t} - \bar{x}_2)$
-48.53 | -2.40 | 5.7720 | 116.6013
-30.20 | -0.41 | 0.1702 | 12.4589
-27.34 | -1.34 | 1.8023 | 36.7084
-27.21 | 0.36 | 0.1278 | -9.7288
-12.00 | -2.11 | 4.4627 | 25.3570
0.22 | -3.14 | 9.8753 | -0.6809
6.72 | 1.71 | 2.9156 | 11.4687
20.76 | 0.16 | 0.0348 | 3.2992
22.84 | -0.76 | 0.5814 | -17.413
22.97 | 3.07 | 9.4096 | 70.4503
31.55 | 1.62 | 2.6163 | 51.0267
40.26 | 3.27 | 10.6766 | 131.5387
Sum | | 48.4344 | 431.0566

Using formula (2.35), we obtain

$\hat{a}_1 = \frac{\sum \tilde{x}_{2t} \tilde{y}_t}{\sum \tilde{x}_{2t}^2} = \frac{431.0566}{48.4344} \approx 8.90$.

Thus, $\hat{a}_1 \approx 8.90$.

http://www.cleverstudents.ru/articles/mnk.html


Ordinary Least Squares (OLS) is a mathematical method used to solve various problems, based on minimizing the sum of squared deviations of certain functions from the desired variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find solutions of ordinary (not overdetermined) nonlinear systems of equations, and to approximate point values of some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.


History

Until the beginning of the 19th century, scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, particular methods were used that depended on the type of equations and on the wit of the calculators, so different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to apply the method, and Legendre (1805) independently discovered and published it under its modern name (French: méthode des moindres carrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen, and others.

The essence of the least squares method

Let $x$ be a set of $n$ unknown variables (parameters), and let $f_i(x)$, $i = 1, \ldots, m$, $m > n$, be a set of functions of these variables. The task is to choose values of $x$ so that the values of these functions are as close as possible to certain values $y_i$. In essence, we are talking about "solving" the overdetermined system of equations $f_i(x) = y_i$, $i = 1, \ldots, m$, in the indicated sense of maximal proximity of the left and right sides of the system. The essence of the least squares method is to take as the "measure of proximity" the sum of squares of the deviations $|f_i(x) - y_i|$ of the left and right sides. Thus, the essence of OLS can be expressed as follows:

$\sum_i e_i^2 = \sum_i (y_i - f_i(x))^2 \rightarrow \min_x$.

If the system of equations has a solution, then the minimum of the sum of squares equals zero, and exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined - that is, loosely speaking, the number of independent equations is greater than the number of sought variables - then the system has no exact solution, and the least squares method allows one to find some "optimal" vector $x$ in the sense of maximal proximity of the vectors $y$ and $f(x)$, or maximal proximity of the deviation vector $e$ to zero (proximity understood in the sense of Euclidean distance).

Example - a system of linear equations

In particular, the least squares method can be used to "solve" a system of linear equations

$Ax = b$,

where $A$ is a rectangular matrix of size $m \times n$, $m > n$ (that is, the number of rows of the matrix $A$ is greater than the number of sought variables).

In the general case, such a system of equations has no solution. Therefore, this system can be "solved" only in the sense of choosing a vector $x$ that minimizes the "distance" between the vectors $Ax$ and $b$. To do this, one can apply the criterion of minimizing the sum of squared differences of the left and right sides of the equations of the system, that is, $(Ax - b)^T (Ax - b) \rightarrow \min_x$. It is easy to show that solving this minimization problem leads to the following system of equations:

$A^T A x = A^T b \Rightarrow x = (A^T A)^{-1} A^T b$.
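A short Python sketch of this "solution" of an overdetermined system; the numbers are illustrative:

```python
# "Solving" an overdetermined linear system Ax = b in the OLS sense.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])      # m = 3 equations, n = 2 unknowns
b = np.array([1.1, 1.9, 3.2])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # numerically safer route
```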

OLS in regression analysis (data fit)

Let there be $n$ values of some variable $y$ (these can be results of observations, experiments, etc.) and corresponding values of variables $x$. The task is to approximate the relationship between $y$ and $x$ by some function $f(x, b)$ known up to some unknown parameters $b$ - that is, in fact, to find the best values of the parameters $b$ that bring the values $f(x, b)$ as close as possible to the actual values $y$. In fact, this reduces to the case of "solving" an overdetermined system of equations with respect to $b$:

$f(x_t, b) = y_t$, $t = 1, \ldots, n$.

In regression analysis, and in econometrics in particular, probabilistic models of the relationship between variables are used:

$y_t = f(x_t, b) + \varepsilon_t$,

where $\varepsilon_t$ are the so-called random errors of the model.

Accordingly, the deviations of the observed values $y$ from the model values $f(x, b)$ are assumed already in the model itself. The essence of OLS (ordinary, classical) is to find parameters $b$ for which the sum of squared deviations (errors; for regression models they are often called regression residuals) $e_t$ is minimal:

$\hat{b}_{OLS} = \arg\min_b RSS(b)$,

where $RSS$ (Residual Sum of Squares) is defined as:

$RSS(b) = e^T e = \sum_{t=1}^n e_t^2 = \sum_{t=1}^n (y_t - f(x_t, b))^2$.

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case, one speaks of nonlinear least squares (NLS or NLLS - Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function $RSS(b)$ by differentiating it with respect to the unknown parameters $b$, equating the derivatives to zero, and solving the resulting system of equations:

$\sum_{t=1}^n (y_t - f(x_t, b)) \frac{\partial f(x_t, b)}{\partial b} = 0$.

OLS for Linear Regression

Let the regression dependence be linear:

$y_t = \sum_{j=1}^k b_j x_{tj} + \varepsilon_t = x_t^T b + \varepsilon_t$.

Let $y$ be the column vector of observations of the explained variable, and let $X$ be the $(n \times k)$ matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model is:

$y = Xb + \varepsilon$.

Then the vector of estimates of the explained variable and the vector of regression residuals will be

$\hat{y} = Xb$, $e = y - \hat{y} = y - Xb$.

Accordingly, the sum of squares of the regression residuals will be

$RSS = e^T e = (y - Xb)^T (y - Xb)$.

Differentiating this function with respect to the parameter vector $b$ and equating the derivatives to zero, we obtain a system of equations (in matrix form):

$(X^T X) b = X^T y$.

In expanded matrix form, this system of equations looks like this:

$\begin{pmatrix} \sum x_{t1}^2 & \sum x_{t1}x_{t2} & \sum x_{t1}x_{t3} & \ldots & \sum x_{t1}x_{tk} \\ \sum x_{t2}x_{t1} & \sum x_{t2}^2 & \sum x_{t2}x_{t3} & \ldots & \sum x_{t2}x_{tk} \\ \sum x_{t3}x_{t1} & \sum x_{t3}x_{t2} & \sum x_{t3}^2 & \ldots & \sum x_{t3}x_{tk} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sum x_{tk}x_{t1} & \sum x_{tk}x_{t2} & \sum x_{tk}x_{t3} & \ldots & \sum x_{tk}^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_k \end{pmatrix} = \begin{pmatrix} \sum x_{t1}y_t \\ \sum x_{t2}y_t \\ \sum x_{t3}y_t \\ \vdots \\ \sum x_{tk}y_t \end{pmatrix},$

where all sums are taken over all admissible values of $t$.

If a constant is included in the model (as usual), then $x_{t1} = 1$ for all $t$; therefore, in the upper left corner of the matrix of the system of equations stands the number of observations $n$, in the remaining elements of the first row and first column stand simply the sums of the values of the variables $\sum x_{tj}$, and the first element of the right-hand side of the system is $\sum y_t$.

The solution of this system of equations gives the general formula of OLS estimates for the linear model:

$\hat{b}_{OLS} = (X^T X)^{-1} X^T y = \left( \frac{1}{n} X^T X \right)^{-1} \frac{1}{n} X^T y = V_x^{-1} C_{xy}$.

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when divided by n, arithmetic means appear instead of sums). If the data in the regression model are centered, then in this representation the first matrix is the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix is the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of OLS estimates for models with a constant: the line of the constructed regression passes through the center of gravity of the sample data, that is, the equality holds:

$\bar{y} = \hat{b}_1 + \sum_{j=2}^k \hat{b}_j \bar{x}_j$.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also an OLS estimate - it satisfies the criterion of the minimum sum of squared deviations from it.

The simplest special cases

In the case of paired linear regression $y_t = a + b x_t + \varepsilon_t$, when the linear dependence of one variable on another is estimated, the calculation formulas are simplified (one can do without matrix algebra). The system of equations has the form:

$\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \overline{xy} \end{pmatrix}.$

From here it is easy to find the estimates of the coefficients:

$\begin{cases} \hat{b} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2}, \\ \hat{a} = \bar{y} - \hat{b}\bar{x}. \end{cases}$

Although in the general case the model with a constant is preferable, in some cases it is known from theoretical considerations that the constant $a$ should be zero. For example, in physics the relationship between voltage and current has the form $U = I \cdot R$; measuring the voltage and the current strength, it is necessary to estimate the resistance. In this case, we are talking about the model $y = bx$. Here, instead of a system of equations, we have the single equation

$\left( \sum x_t^2 \right) b = \sum x_t y_t$.

Consequently, the formula for estimating the single coefficient has the form

$\hat{b} = \frac{\sum_{t=1}^n x_t y_t}{\sum_{t=1}^n x_t^2} = \frac{\overline{xy}}{\overline{x^2}}$.
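A sketch of exactly this no-constant model, with made-up current/voltage measurements:

```python
# Estimating a resistance R in the model U = I*R (y = b*x, no constant).
currents = [0.5, 1.0, 1.5, 2.0]   # x: current I, in amperes (made up)
voltages = [1.1, 2.0, 3.1, 3.9]   # y: voltage U, in volts (made up)

R = (sum(i * u for i, u in zip(currents, voltages))
     / sum(i * i for i in currents))   # b = sum(x*y) / sum(x^2), ~2 ohms
```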

Polynomial model case

If the data are fitted by a polynomial regression function of one variable $f(x) = b_0 + \sum_{i=1}^k b_i x^i$, then, perceiving the powers $x^i$ as independent factors for each $i$, one can estimate the parameters of the model on the basis of the general formula for estimating the parameters of a linear model. To do this, it is sufficient to take into account in the general formula that with such an interpretation $x_{ti} x_{tj} = x_t^i x_t^j = x_t^{i+j}$ and $x_{tj} y_t = x_t^j y_t$. Consequently, the matrix equations in this case take the form:

$\begin{pmatrix} n & \sum_n x_t & \ldots & \sum_n x_t^k \\ \sum_n x_t & \sum_n x_t^2 & \ldots & \sum_n x_t^{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_n x_t^k & \sum_n x_t^{k+1} & \ldots & \sum_n x_t^{2k} \end{pmatrix} \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix} = \begin{bmatrix} \sum_n y_t \\ \sum_n x_t y_t \\ \vdots \\ \sum_n x_t^k y_t \end{bmatrix}.$
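In code, this "powers as factors" view amounts to running linear OLS on a Vandermonde matrix; a sketch with placeholder data:

```python
# Polynomial fitting as linear OLS on a Vandermonde matrix.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.0, 9.2, 19.1, 33.2])        # hypothetical data
k = 2                                            # polynomial degree

X = np.vander(x, k + 1, increasing=True)         # columns: 1, x, x^2
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # b[0] + b[1]*x + b[2]*x^2
```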

Statistical properties of OLS estimates

First of all, we note that for linear models, OLS estimates are linear estimates, as follows from the formula above. For OLS estimates to be unbiased, it is necessary and sufficient to fulfill the most important condition of regression analysis: the mathematical expectation of the random error conditional on the factors must equal zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of the random errors is zero, and
  2. the factors and the random errors are independent random variables.

The second condition - the condition of exogeneity of the factors - is fundamental. If this property is not satisfied, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining qualitative estimates in this case). In the classical case, a stronger assumption is made - the determinism of the factors, as opposed to the randomness of the error - which automatically implies the exogeneity condition. In the general case, for consistency of the estimates it is sufficient that the exogeneity condition holds together with the convergence of the matrix $V_x$ to some nondegenerate matrix as the sample size increases to infinity.

In order for, in addition to consistency and unbiasedness, the (ordinary) least squares estimates to be efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold: constant (identical) variance of the random errors in all observations, and the absence of correlation between the errors of different observations.

These assumptions can be formulated for the covariance matrix of the vector of random errors: $V(\varepsilon) = \sigma^2 I$.

A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent, and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE - Best Linear Unbiased Estimator - is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates equals:

$V(\hat{b}_{OLS}) = \sigma^2 (X^T X)^{-1}$.

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, has minimal variance); that is, in the class of linear unbiased estimates, the OLS estimates are the best. The diagonal elements of this matrix - the variances of the coefficient estimates - are important parameters of the quality of the obtained estimates. However, it is impossible to calculate the covariance matrix, since the variance of the random errors is unknown. It can be proved that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity:

$s^2 = RSS / (n - k)$.

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence of the variances of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients.
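A compact sketch of this estimation chain (coefficients, error variance, standard errors); the function name is mine:

```python
# OLS with the unbiased variance estimate s^2 = RSS/(n - k) and the
# resulting estimated covariance matrix of the coefficients.
import numpy as np

def ols_with_errors(X, y):
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)     # OLS estimates
    e = y - X @ b                             # residuals
    s2 = (e @ e) / (n - k)                    # estimate of sigma^2
    cov_b = s2 * np.linalg.inv(X.T @ X)       # estimated covariance matrix
    se_b = np.sqrt(np.diag(cov_b))            # standard errors
    return b, s2, se_b
```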

It should be noted that if the classical assumptions are not met, OLS parameter estimates are not the most efficient. In this case, one can minimize a generalized quadratic form of the residuals $e^T W e$, where $W$ is some symmetric positive definite weight matrix. Ordinary OLS is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known, for symmetric matrices (or operators) there exists a decomposition $W = P^T P$. Therefore, this functional can be represented as $e^T P^T P e = (Pe)^T Pe = e_*^T e_*$; that is, this functional can be represented as the sum of squares of certain transformed "residuals". Thus, we can distinguish a whole class of least squares methods - LS-methods (Least Squares).

It has been proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors), the most efficient estimates (in the class of linear unbiased estimates) are those of so-called generalized least squares (GLS - Generalized Least Squares): the LS-method with weight matrix equal to the inverse covariance matrix of the random errors, $W = V_\varepsilon^{-1}$.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

$\hat{b}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$.

The covariance matrix of these estimates will accordingly be equal to

$V(\hat{b}_{GLS}) = (X^T V^{-1} X)^{-1}$.

In fact, the essence of GLS consists in a certain (linear) transformation (P) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data, the random errors already satisfy the classical assumptions.

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors), we have so-called Weighted Least Squares (WLS). In this case, the weighted sum of squares of the model residuals is minimized; that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation: $e^T W e = \sum_{t=1}^n \frac{e_t^2}{\sigma_t^2}$. In fact, the data are transformed by weighting the observations (dividing by a value proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
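A sketch of exactly this weighting transformation (the function name and interface are mine):

```python
# Weighted least squares: scale each observation by 1/std of its error,
# then apply ordinary OLS to the weighted data.
import numpy as np

def wls(X, y, sigma):
    w = 1.0 / np.asarray(sigma)        # weights ~ 1 / std of each error
    Xw = X * w[:, None]                # scale the rows of X
    yw = y * w                         # scale y the same way
    return np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
```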

Let us approximate the function with a polynomial of degree 2. To do this, we calculate the coefficients of the normal system of equations:

Let us compose the normal least squares system, which has the form:

The solution of the system is easy to find:

Thus, the polynomial of the 2nd degree is found:


Example 3. Derivation of the normal system of equations for finding the parameters of an empirical dependence.

Let us derive the system of equations for determining the coefficients a and b of the function that performs the root-mean-square approximation of a given function by points. Let us compose the function $S(a, b)$ and write down the necessary extremum condition for it:

Then the normal system takes the form:

We have obtained a linear system of equations for the unknown parameters a and b, which is easily solved.


Example.

Experimental data on the values of the variables x and y are given in the table.

As a result of their alignment, the function was obtained.

Using the least squares method, approximate these data with a linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the least squares method (LSM).

The task is to find the coefficients of the linear dependence for which the function of two variables a and b

$F(a, b) = \sum_{i=1}^n (y_i - (ax_i + b))^2$

takes the smallest value. That is, for the found a and b, the sum of squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, the solution of the example reduces to finding the extremum of a function of two variables.

Derivation of the formulas for finding the coefficients.

A system of two equations in two unknowns is composed and solved. We find the partial derivatives of the function $F(a, b)$ with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, by the substitution method or Cramer's method) and obtain formulas for finding the coefficients by the least squares method:

$a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$, $b = \frac{\sum y_i - a\sum x_i}{n}$.

For these a and b, the function takes the smallest value. The proof of this fact is given below, at the end of the page.

That's the whole least squares method. The formula for finding the parameter a contains the sums $\sum x_i$, $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2$ and the quantity n - the amount of experimental data. We recommend calculating the values of these sums separately.

The coefficient b is found after calculating a.

It's time to recall the original example.

Solution.

In our example, n = 5. We fill in the table for the convenience of calculating the sums that enter the formulas for the desired coefficients.

The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row for each number i.

The values in the fifth row of the table are obtained by squaring the values of the 2nd row for each number i.

The values in the last column of the table are the sums of the values across the rows.

We use the formulas of the least squares method to find the coefficients a and b. We substitute into them the corresponding values from the last column of the table:

Hence, y = 0.165x + 2.184 is the required approximating straight line.

It remains to find out which of the lines, y = 0.165x + 2.184 or the previously aligned function, better approximates the original data - that is, to make an estimate using the least squares method.

Estimation of the error of the least squares method.

To do this, you need to calculate the sums of squared deviations of the original data from each of these lines; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

Since the first sum is smaller, the straight line y = 0.165x + 2.184 approximates the original data better.

Graphical illustration of the least squares method.

Everything is perfectly visible on the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is the given aligned function, and the pink dots are the original data.

What is it all for, what are all these approximations for?

I personally use them for solving data smoothing problems and for interpolation and extrapolation problems (in the original example, one might be asked to find the value of the observed quantity y at x = 3 or at x = 6 using the OLS method). But we'll talk about this in more detail later in another section of the site.


Proof.

For the function to take the smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function $F(a, b)$ be positive definite. Let's show this.

The second-order differential has the form:

$d^2F = \frac{\partial^2 F}{\partial a^2}\, da^2 + 2\frac{\partial^2 F}{\partial a \partial b}\, da\, db + \frac{\partial^2 F}{\partial b^2}\, db^2.$

That is,

$\frac{\partial^2 F}{\partial a^2} = 2\sum x_i^2$, $\frac{\partial^2 F}{\partial a \partial b} = 2\sum x_i$, $\frac{\partial^2 F}{\partial b^2} = 2n$.

Therefore, the matrix of the quadratic form has the form

$M = \begin{pmatrix} 2\sum x_i^2 & 2\sum x_i \\ 2\sum x_i & 2n \end{pmatrix},$

and the values of the elements do not depend on a and b.

Let us show that the matrix is positive definite. For this, the corner minors must be positive.

Corner minor of the first order: $2\sum x_i^2 > 0$. The inequality is strict, since the points do not coincide. In what follows, we will mean this.

Corner minor of the second order:

$4\left( n\sum x_i^2 - \left( \sum x_i \right)^2 \right) > 0.$

The inequality $n\sum x_i^2 - \left( \sum x_i \right)^2 > 0$ can be proved by the method of mathematical induction.

Conclusion: the found values a and b correspond to the smallest value of the function $F(a, b)$ and are therefore the required parameters for the least squares method.


    Developing a forecast using the least squares method. An example of solving the problem

Extrapolation is a method of scientific research based on extending past and present trends, patterns, and relationships to the future development of the object of forecasting. Extrapolation methods include the moving average method, the exponential smoothing method, and the least squares method.

The essence of the least squares method consists in minimizing the sum of squared deviations between the observed and calculated values. The calculated values are found from a fitted equation, the regression equation. The smaller the distance between the actual values and the calculated values, the more accurate the forecast based on the regression equation.

A theoretical analysis of the essence of the phenomenon under study, whose change is reflected by the time series, serves as the basis for choosing a curve. Considerations about the nature of growth of the series levels are sometimes taken into account. Thus, if output is expected to grow in arithmetic progression, smoothing is performed with a straight line; if growth turns out to be exponential, smoothing should be done with an exponential function.

The working formula of the least squares method: Y_(t+1) = a·X + b, where t + 1 is the forecast period; Y_(t+1) is the forecast indicator; a and b are coefficients; X is the time index.

The coefficients a and b are calculated by the following formulas:

`a = frac(n sum X Y_f - sum X sum Y_f)(n sum X^2 - (sum X)^2)`, `b = frac(sum Y_f - a sum X)(n)`,

where Y_f are the actual values of the time series and n is the number of levels in the time series.
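A small Python sketch of these formulas, using a hypothetical time series (not the data of the task below):

```python
# Sketch of the working formula Y_(t+1) = a*X + b on hypothetical data.
levels = [1.6, 1.55, 1.5, 1.48, 1.45, 1.43]   # hypothetical Y_f values
times = list(range(1, len(levels) + 1))        # time index X = 1, 2, ...

n = len(levels)
sum_t, sum_y = sum(times), sum(levels)
sum_ty = sum(t * y for t, y in zip(times, levels))
sum_t2 = sum(t * t for t in times)

a = (n * sum_ty - sum_t * sum_y) / (n * sum_t2 - sum_t ** 2)
b = (sum_y - a * sum_t) / n

# Forecasts for the next three periods
for step in range(1, 4):
    print(f"forecast t+{step}: {a * (n + step) + b:.3f}")
```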

    Smoothing of time series by the least squares method serves to reflect the patterns of development of the phenomenon under study. In the analytical expression of the trend, time is considered as an independent variable, and the levels of the series act as a function of this independent variable.

    The development of a phenomenon does not depend on how many years have passed since the starting moment, but on what factors influenced its development, in which direction and with what intensity. Hence, it is clear that the development of a phenomenon in time appears as a result of the action of these factors.

Correctly establishing the type of curve, i.e. the type of analytical dependence on time, is one of the most difficult tasks of pre-forecast analysis.

The type of function describing the trend, whose parameters are determined by the least squares method, is in most cases selected empirically, by constructing a number of functions and comparing them with one another by the value of the root-mean-square error, calculated by the formula:

`sigma = sqrt(frac(sum (Y_f - Y_r)^2)(n - p))`,

where Y_f are the actual values of the time series; Y_r are the calculated (smoothed) values of the time series; n is the number of levels in the time series; p is the number of parameters in the formulas describing the trend (the development tendency).
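A matching sketch of this error formula; the actual and smoothed values are hypothetical, with p = 2 for a linear trend:

```python
import math

# Root-mean-square error of the trend fit, corrected for p parameters.
y_actual = [1.6, 1.55, 1.5, 1.48, 1.45, 1.43]   # hypothetical Y_f
y_smooth = [1.58, 1.55, 1.52, 1.49, 1.46, 1.43]  # hypothetical Y_r
n, p = len(y_actual), 2

sigma = math.sqrt(
    sum((yf - yr) ** 2 for yf, yr in zip(y_actual, y_smooth)) / (n - p)
)
print(f"sigma = {sigma:.4f}")
```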

Disadvantages of the least squares method:

• when one tries to describe the economic phenomenon under study by a mathematical equation, the forecast is accurate only for a short period of time, and the regression equation should be recalculated as new information becomes available;
• the complexity of selecting the regression equation, which is manageable when standard computer programs are used.

    An example of using the least squares method to develop a forecast

Task. There are data characterizing the unemployment rate in the region, %.

• Build a forecast of the unemployment rate in the region for November, December and January using the following methods: moving average, exponential smoothing, least squares.
    • Calculate the errors of the obtained predictions using each method.
    • Compare the results obtained, draw conclusions.

    Least squares solution

    To solve the problem, we will draw up a table in which we will make the necessary calculations:

ε = 28.63 / 10 = 2.86%, so the forecast accuracy is high.

Conclusion: comparing the results obtained by the moving average method, the exponential smoothing method and the least squares method, we can say that the average relative error of the exponential smoothing calculations falls within the 20-50% range. This means that the forecast accuracy in this case is only satisfactory.

In the first and third cases the forecast accuracy is high, since the average relative error is less than 10%. But the moving average method gave more reliable results (forecast for November: 1.52%, for December: 1.53%, for January: 1.49%), since the average relative error with this method is the smallest, 1.13%.



    OLS program


    Data and approximation y = a + b x

i - experimental point number;
x_i - value of the fixed parameter at point i;
y_i - value of the measured parameter at point i;
ω_i - weight of the measurement at point i;
y_i,calc - value of y at point i calculated from the regression;
Δy_i - difference between the measured and the calculated value of y at point i;
S_(x_i) - estimate of the error in x_i when measuring y at point i.

    Data and approximation y = k x

The table for this fit has the same columns: i, x_i, y_i, ω_i, y_i,calc, Δy_i, S_(x_i).


Instructions for using the OLS online program.

In the data field, enter the `x` and `y` values from one experimental point on each separate line. Values must be separated by whitespace (a space or a tab).

The third value can be the weight of the point, `w`. If a point weight is not specified, it is equal to one. In the overwhelming majority of cases the weights of the experimental points are unknown or not calculated, i.e. all experimental data are treated as equivalent. Sometimes the weights in the studied range of values are definitely not equivalent and can even be calculated theoretically. For example, in spectrophotometry weights can be calculated by simple formulas, although this is mostly neglected to reduce labor costs.

Data can be pasted from the clipboard from an office-suite spreadsheet such as Excel from Microsoft Office or Calc from OpenOffice. To do this, select the data range in the spreadsheet, copy it to the clipboard, and paste the data into the data field on this page.

For calculation by the least squares method, at least two points are required to determine the two coefficients: `b`, the tangent of the angle of inclination of the line, and `a`, the value the line cuts off on the `y` axis.

To estimate the errors of the calculated regression coefficients, more than two experimental points must be specified.
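For readers who want to reproduce the weighted fit offline, here is a sketch of how per-point weights commonly enter the normal equations for y = a + b·x. The weight convention (each squared residual multiplied by w_i) is an assumption here; the program's internal convention may differ.

```python
# Sketch of a weighted fit y = a + b*x: each squared residual is
# multiplied by the point's weight w_i (assumed convention).
xs = [0.1, 0.2, 0.3, 0.4]
ys = [0.12, 0.21, 0.33, 0.41]
ws = [1.0, 1.0, 0.5, 2.0]   # hypothetical weights

sw = sum(ws)
swx = sum(w * x for w, x in zip(ws, xs))
swy = sum(w * y for w, y in zip(ws, ys))
swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
swx2 = sum(w * x * x for w, x in zip(ws, xs))

# Weighted normal equations: a*sw + b*swx = swy; a*swx + b*swx2 = swxy
d = sw * swx2 - swx ** 2
a = (swy * swx2 - swx * swxy) / d
b = (sw * swxy - swx * swy) / d
print(f"a = {a:.4f}, b = {b:.4f}")
```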


The larger the number of experimental points, the more accurate the statistical estimate of the coefficients (due to the decrease of the Student coefficient) and the closer the estimate is to that of the general population.

Obtaining values at each experimental point is often labor intensive, so a compromise number of experiments is frequently chosen: one that gives a tractable estimate without excessive labor costs. As a rule, for a linear least-squares dependence with two coefficients, the number of experimental points is chosen in the region of 5-7.

    Brief theory of the method of least squares for linear dependence

Suppose we have a set of experimental data in the form of pairs of values [`y_i`, `x_i`], where `i` is the number of an experimental measurement from 1 to `n`; `y_i` is the value of the measured quantity at point `i`; `x_i` is the value of the parameter we set at point `i`.

As an example, consider Ohm's law. By changing the voltage (potential difference) between sections of an electrical circuit, we measure the current passing through that section. Physics gives us the experimentally found dependence:

`I = U / R`,
where `I` is the current; `R` the resistance; `U` the voltage.

In this case, `y_i` is the measured current value, and `x_i` is the voltage value.

As another example, consider the absorption of light by a solution of a substance. Chemistry gives us the formula:

`A = ε l C`,
where `A` is the optical density of the solution; `ε` is the molar absorption coefficient of the dissolved substance; `l` is the path length of the light through the cuvette with the solution; `C` is the concentration of the dissolved substance.

In this case, `y_i` is the measured value of the optical density `A`, and `x_i` is the value of the concentration of the substance that we set.

We will consider the case when the relative error in setting `x_i` is much smaller than the relative error in measuring `y_i`. We will also assume that all measured values `y_i` are random and normally distributed, i.e. obey the normal distribution law.

In the case of a linear dependence of `y` on `x`, we can write the theoretical dependence:
`y = a + b x`.

From a geometric point of view, the coefficient `b` is the tangent of the angle of inclination of the line to the `x` axis, and the coefficient `a` is the value of `y` at the point of intersection of the line with the `y` axis (at `x = 0`).

    Finding the parameters of the regression line.

In an experiment, the measured values `y_i` cannot lie exactly on the theoretical straight line because of measurement errors, which are always present in real life. Therefore, the linear equation must be represented by a system of equations:
`y_i = a + b x_i + ε_i` (1),
where `ε_i` is the unknown error of measuring `y` in the `i`-th experiment.

Dependence (1) is also called a regression, i.e. a statistically significant dependence of one quantity on another.

The task of reconstructing the dependence is to find the coefficients `a` and `b` from the experimental points [`y_i`, `x_i`].

To find the coefficients `a` and `b`, the least squares method (OLS) is usually used. It is a special case of the maximum likelihood principle.

    Let us rewrite (1) as `ε_i = y_i - a - b x_i`.

Then the sum of the squared errors will be
`Φ = sum_(i=1)^n ε_i^2 = sum_(i=1)^n (y_i - a - b x_i)^2`. (2)

    The principle of OLS (least squares method) is to minimize the sum (2) with respect to the parameters `a` and` b`.

The minimum is reached when the partial derivatives of the sum (2) with respect to the coefficients `a` and `b` are equal to zero:
`frac(partial Φ)(partial a) = frac(partial sum_(i=1)^n (y_i - a - b x_i)^2)(partial a) = 0`
`frac(partial Φ)(partial b) = frac(partial sum_(i=1)^n (y_i - a - b x_i)^2)(partial b) = 0`

Expanding the derivatives, we obtain a system of two equations with two unknowns:
`sum_(i=1)^n (2a + 2b x_i - 2y_i) = sum_(i=1)^n (a + b x_i - y_i) = 0`
`sum_(i=1)^n (2b x_i^2 + 2a x_i - 2x_i y_i) = sum_(i=1)^n (b x_i^2 + a x_i - x_i y_i) = 0`

We expand the brackets, move the sums that do not depend on the sought coefficients to the other side, and obtain a system of linear equations:
`sum_(i=1)^n y_i = a n + b sum_(i=1)^n x_i`
`sum_(i=1)^n x_i y_i = a sum_(i=1)^n x_i + b sum_(i=1)^n x_i^2`

    Solving the resulting system, we find the formulas for the coefficients `a` and` b`:

`a = frac(sum_(i=1)^n y_i sum_(i=1)^n x_i^2 - sum_(i=1)^n x_i sum_(i=1)^n x_i y_i)(n sum_(i=1)^n x_i^2 - (sum_(i=1)^n x_i)^2)` (3.1)

`b = frac(n sum_(i=1)^n x_i y_i - sum_(i=1)^n x_i sum_(i=1)^n y_i)(n sum_(i=1)^n x_i^2 - (sum_(i=1)^n x_i)^2)` (3.2)

These formulas have solutions when `n > 1` (a line can be drawn through at least 2 points) and when the determinant `D = n sum_(i=1)^n x_i^2 - (sum_(i=1)^n x_i)^2 != 0`, i.e. when the points `x_i` in the experiment are not all the same (i.e. when the line is not vertical).
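Formulas (3.1) and (3.2) translate directly into code. A sketch in Python, with hypothetical Ohm's-law-style data:

```python
def ols_line(xs, ys):
    """Coefficients of y = a + b*x by formulas (3.1) and (3.2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    d = n * sx2 - sx ** 2          # determinant D; requires distinct x_i
    a = (sy * sx2 - sx * sxy) / d  # (3.1)
    b = (n * sxy - sx * sy) / d    # (3.2)
    return a, b

# Hypothetical data: y ~ measured current, x ~ set voltage
a, b = ols_line([1.0, 2.0, 3.0, 4.0], [0.52, 0.98, 1.51, 2.02])
print(f"y = {a:.3f} + {b:.3f} x")
```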

    Estimation of the errors of the coefficients of the regression line

For a more accurate estimate of the errors of the coefficients `a` and `b`, it is desirable to have a larger number of experimental points. When `n = 2`, the errors of the coefficients cannot be estimated, because the approximating line passes through two points unambiguously.

The error of a random variable `V` is determined by the law of error accumulation:
`S_V^2 = sum_(i=1)^p (frac(partial f)(partial z_i))^2 S_(z_i)^2`,
where `p` is the number of parameters `z_i` with errors `S_(z_i)` that affect the error `S_V`;
`f` is the function expressing the dependence of `V` on the `z_i`.

Let us write down the law of error accumulation for the errors of the coefficients `a` and `b`:
`S_a^2 = sum_(i=1)^n (frac(partial a)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^n (frac(partial a)(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^n (frac(partial a)(partial y_i))^2`,
`S_b^2 = sum_(i=1)^n (frac(partial b)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^n (frac(partial b)(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^n (frac(partial b)(partial y_i))^2`,
since `S_(x_i)^2 = 0` (we stipulated earlier that the error of `x` is negligible).

`S_y^2 = S_(y_i)^2` is the error (variance, squared standard deviation) of the measurement of `y`, assuming that the error is uniform for all values of `y`.

Substituting the formulas for `a` and `b` into the obtained expressions, we get

`S_a^2 = S_y^2 frac(sum_(i=1)^n (sum_(i=1)^n x_i^2 - x_i sum_(i=1)^n x_i)^2)(D^2) = S_y^2 frac((n sum_(i=1)^n x_i^2 - (sum_(i=1)^n x_i)^2) sum_(i=1)^n x_i^2)(D^2) = S_y^2 frac(sum_(i=1)^n x_i^2)(D)` (4.1)

`S_b^2 = S_y^2 frac(sum_(i=1)^n (n x_i - sum_(i=1)^n x_i)^2)(D^2) = S_y^2 frac(n (n sum_(i=1)^n x_i^2 - (sum_(i=1)^n x_i)^2))(D^2) = S_y^2 frac(n)(D)` (4.2)

In most real experiments, the value of `S_y` is not measured directly: this would require several parallel measurements (experiments) at one or several points of the design, which increases the time (and possibly the cost) of the experiment. Therefore, it is usually assumed that the deviations of `y` from the regression line can be considered random. The estimate of the variance of `y` in this case is calculated by the formula:

`S_y^2 = S_(y,rest)^2 = frac(sum_(i=1)^n (y_i - a - b x_i)^2)(n-2)`.

The divisor `n - 2` appears because the number of degrees of freedom has been reduced by the two coefficients calculated from the same sample of experimental data.

This estimate is also called the residual variance about the regression line, `S_(y,rest)^2`.
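A sketch combining the residual variance with formulas (4.1) and (4.2); the data and the coefficients are hypothetical:

```python
import math

def ols_errors(xs, ys, a, b):
    """Residual variance S_(y,rest)^2 and coefficient errors S_a, S_b."""
    n = len(xs)
    sx = sum(xs)
    sx2 = sum(x * x for x in xs)
    d = n * sx2 - sx ** 2                     # determinant D
    # Residual variance about the regression line, divisor n - 2
    s_y2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    s_a = math.sqrt(s_y2 * sx2 / d)           # (4.1)
    s_b = math.sqrt(s_y2 * n / d)             # (4.2)
    return s_y2, s_a, s_b

# Hypothetical data and coefficients (e.g. from ols_line above)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.4, 2.5, 2.7, 2.8, 3.0]
s_y2, s_a, s_b = ols_errors(xs, ys, a=2.23, b=0.15)
print(f"S_y,rest^2 = {s_y2:.5f}, S_a = {s_a:.4f}, S_b = {s_b:.4f}")
```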

The significance of the coefficients is assessed by Student's criterion:

`t_a = frac(|a|)(S_a)`, `t_b = frac(|b|)(S_b)`

If the calculated criteria `t_a`, `t_b` are less than the tabulated criterion `t(P, n-2)`, the corresponding coefficient is considered not to differ significantly from zero with the given probability `P`.

To assess the quality of the description of the linear relationship, `S_(y,rest)^2` can be compared with the variance about the mean, `S_(bar y)^2`, using Fisher's criterion.

`S_(bar y)^2 = frac(sum_(i=1)^n (y_i - bar y)^2)(n-1) = frac(sum_(i=1)^n (y_i - (sum_(i=1)^n y_i)/n)^2)(n-1)` is the sample estimate of the variance of `y` about the mean.

To assess the effectiveness of the regression equation in describing the dependence, the Fisher coefficient is calculated:
`F = S_(bar y)^2 / S_(y,rest)^2`,
which is compared with the tabulated Fisher coefficient `F(P, n-1, n-2)`.

If `F > F(P, n-1, n-2)`, the difference between describing the dependence `y = f(x)` by the regression equation and describing it by the mean is considered statistically significant with probability `P`; i.e., the regression describes the dependence better than the scatter of `y` about the mean.
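A sketch of the two significance checks, using tabulated quantiles from scipy; the numeric statistics here are hypothetical placeholders:

```python
from scipy import stats

# Compare computed t and F statistics with tabulated critical values
# (assuming a two-sided t criterion at confidence P).
P = 0.95
n = 7
t_a, t_b = 3.2, 8.5   # hypothetical |a|/S_a and |b|/S_b
F = 12.4              # hypothetical S_(bar y)^2 / S_(y,rest)^2

t_crit = stats.t.ppf(1 - (1 - P) / 2, n - 2)
f_crit = stats.f.ppf(P, n - 1, n - 2)

print("a significant:", t_a > t_crit)
print("b significant:", t_b > t_crit)
print("regression better than the mean:", F > f_crit)
```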


Least squares method

The least squares method is understood as the determination of the unknown parameters a, b, c, ... of an adopted functional dependence

    y = f (x, a, b, c, ...),

which would provide the minimum of the mean square (variance) error

`sigma^2 = frac(1)(n) sum_(i=1)^n (y_i - f(x_i, a, b, c, ...))^2`, (24)

where x_i, y_i are the pairs of numbers obtained from the experiment.

Since the condition for an extremum of a function of several variables is that its partial derivatives equal zero, the parameters a, b, c, ... are determined from the system of equations:

`(partial sigma^2)/(partial a) = 0`; `(partial sigma^2)/(partial b) = 0`; `(partial sigma^2)/(partial c) = 0`; ... (25)

It must be remembered that the least squares method is used to select the parameters after the form of the function y = f(x) has been chosen.

    If from theoretical considerations it is impossible to draw any conclusions about what the empirical formula should be, then one has to be guided by visual representations, primarily a graphical representation of the observed data.

In practice, one is most often limited to the following types of functions:

1) linear: `y = ax + b`;

2) quadratic: `y = ax^2 + bx + c`,

as compared in the sketch below.
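A short sketch, with hypothetical data, of fitting both forms with numpy and comparing them by criterion (24):

```python
import numpy as np

# Fit a line and a parabola to the same (hypothetical) data and compare
# the mean square error (24) of each.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 5.1, 6.3])

for deg, name in [(1, "linear"), (2, "quadratic")]:
    coeffs = np.polyfit(x, y, deg)        # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)
    mse = np.mean(resid ** 2)             # criterion (24)
    print(f"{name}: coeffs={np.round(coeffs, 3)}, mse={mse:.5f}")
```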

    Example.

Experimental data for the values of the variables x and y are given in a table.

As a result of smoothing them, a second approximating function was obtained.

Using the least squares method, approximate these data by the linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) fits the experimental data. Make a drawing.

    The essence of the method of least squares (OLS).

The task is reduced to finding the coefficients of the linear dependence at which the function of two variables a and b, `F(a, b) = sum_(i=1)^n (y_i - (a x_i + b))^2`, takes its smallest value. That is, for the found a and b, the sum of squared deviations of the experimental data from the found straight line is smallest. This is the whole point of the least squares method.

    Thus, the solution of the example is reduced to finding the extremum of a function of two variables.

    Derivation of formulas for finding coefficients.

A system of two equations with two unknowns is composed and solved: we find the partial derivatives of the function F(a, b) with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, by substitution) and obtain formulas for finding the coefficients by the least squares method (OLS).

For these a and b, the function F(a, b) takes its smallest value; the proof of this fact was given above in the Proof section. The worked solution of this example coincides with the walkthrough given earlier on this page.
