In least squares approximation we are tasked with fitting a straight line

$$y = mx + b$$

to a series of experimentally observed points

$$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n).$$
Corresponding to each observed value of x there are actually two values of y: the observed value $y_{obs}$ and the value predicted by the straight line, $m x_{obs} + b$. We call the difference $y_{obs} - (m x_{obs} + b)$ a deviation. Each deviation measures the amount by which the predicted value falls short of the observed value (a negative deviation means the prediction overshoots). The set of all such deviations,

$$D_1 = y_1 - (m x_1 + b),\quad D_2 = y_2 - (m x_2 + b),\quad D_3 = y_3 - (m x_3 + b),\quad \ldots,\quad D_n = y_n - (m x_n + b),$$

gives an indication of how closely the line y = mx + b fits the data. For example, consider the graph shown below:
The graph shows a line of the form F = mA + b, where F is the frequency of solar flares in a sunspot region and A is the area of that region. The line drawn is the best fit to the scattered data $(A_1, F_1), (A_2, F_2), (A_3, F_3), \ldots, (A_n, F_n)$.
We say a line is a perfect fit if and only if all of the deviations are zero, i.e. $D_1 = D_2 = D_3 = \cdots = D_n = 0$.
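As a minimal sketch in Python, the deviations for a candidate line can be computed straight from their definition, and a perfect fit is simply the case where every one of them is zero. The function names `deviations` and `is_perfect_fit`, the tolerance `tol`, and the sample points are hypothetical, chosen only for illustration; a small tolerance is assumed because floating-point arithmetic rarely produces exact zeros.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def deviations(points: List[Point], m: float, b: float) -> List[float]:
    """Return D_i = y_i - (m*x_i + b) for each observed point (x_i, y_i)."""
    return [y - (m * x + b) for x, y in points]


def is_perfect_fit(points: List[Point], m: float, b: float, tol: float = 1e-12) -> bool:
    """A line is a perfect fit when every deviation is zero (within tol)."""
    return all(abs(d) <= tol for d in deviations(points, m, b))


# Hypothetical data lying exactly on the line y = 2x + 1.
pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
print(deviations(pts, 2.0, 1.0))      # [0.0, 0.0, 0.0]
print(is_perfect_fit(pts, 2.0, 1.0))  # True
# Dropping the intercept makes every prediction fall short by 1.
print(deviations(pts, 2.0, 0.0))      # [1.0, 1.0, 1.0]
```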
The problem then is to find the line which best fits a given set of data.
In general, for a straight line that comes close to fitting all of the observed points, some of the D's will be positive and some negative, so they can partially cancel. Their squares $D_i^2$, however, are never negative, so we form the sum

$$f(m,b) = (y_1 - m x_1 - b)^2 + (y_2 - m x_2 - b)^2 + \cdots + (y_n - m x_n - b)^2.$$
This sum of squares of the deviations depends on the choice of m and b; it is never negative, and it is zero only when m and b produce a line that fits perfectly. In effect, the method of least squares says: take as the line of best fit y = mx + b the one for which the sum of squares of the deviations,

$$f(m,b) = D_1^2 + D_2^2 + D_3^2 + \cdots + D_n^2,$$

is a minimum. At that minimum both partial derivatives vanish, so we solve

$$\frac{\partial f}{\partial m} = 0, \qquad \frac{\partial f}{\partial b} = 0.$$
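Carrying out the two differentiations for a general data set, rather than for a particular one, gives a pair of linear equations often called the normal equations. The short derivation below is a sketch in the notation above, with every sum running over i = 1, ..., n:

$$f(m,b) = \sum_{i=1}^{n} (y_i - m x_i - b)^2$$

$$\frac{\partial f}{\partial m} = -2\sum_{i=1}^{n} x_i\,(y_i - m x_i - b) = 0, \qquad \frac{\partial f}{\partial b} = -2\sum_{i=1}^{n} (y_i - m x_i - b) = 0$$

$$m\sum x_i^2 + b\sum x_i = \sum x_i y_i, \qquad m\sum x_i + n\,b = \sum y_i$$

Provided the x values are not all equal (so that $n\sum x_i^2 - (\sum x_i)^2 \neq 0$), solving this 2×2 system gives

$$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \frac{\sum y_i - m\sum x_i}{n}.$$

The second condition also shows that at the best fit the deviations sum to zero, since $\partial f/\partial b = -2\sum D_i$.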
Example problem: Using the method of least squares, find the straight line that best fits the points

(0, 1), (1, 3), (2, 2), (3, 4), (4, 5).
Solution:
We proceed by first compiling the table below with relevant inputs:
Then

$$f(m,b) = \sum D_i^2 = 55 - 30b + 5b^2 - 78m + 20mb + 30m^2,$$

$$\frac{\partial f}{\partial m} = -78 + 20b + 60m,$$

$$\frac{\partial f}{\partial b} = -30 + 10b + 20m.$$
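The expansion of f(m,b) above can be checked mechanically. Expanding each term $(y_i - m x_i - b)^2$ and collecting like terms gives $f(m,b) = \sum y_i^2 - 2b\sum y_i + n b^2 - 2m\sum x_i y_i + 2mb\sum x_i + m^2\sum x_i^2$, so the coefficients depend on the data only through a handful of sums. The Python sketch below (the helper names `expand_sum_of_squares`, `f_direct` and `f_expanded` are hypothetical) computes those coefficients for the example's five points, reads off the two partial derivatives, and spot-checks the expansion against a direct evaluation at an arbitrary (m, b).

```python
def expand_sum_of_squares(points):
    """Coefficients of f(m, b) = sum (y - m*x - b)^2 written as
    const + b*b_coef + b^2*bb_coef + m*m_coef + m*b*mb_coef + m^2*mm_coef."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    syy = sum(y * y for _, y in points)
    return {"const": syy, "b": -2 * sy, "b^2": n,
            "m": -2 * sxy, "mb": 2 * sx, "m^2": sxx}


def f_direct(points, m, b):
    """Sum of squared deviations, evaluated term by term."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


def f_expanded(c, m, b):
    """The same quantity, evaluated from the expanded polynomial."""
    return (c["const"] + c["b"] * b + c["b^2"] * b * b
            + c["m"] * m + c["mb"] * m * b + c["m^2"] * m * m)


points = [(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)]
c = expand_sum_of_squares(points)
print(c)  # {'const': 55, 'b': -30, 'b^2': 5, 'm': -78, 'mb': 20, 'm^2': 30}

# Partial derivatives as (constant, coefficient of b, coefficient of m).
df_dm = (c["m"], c["mb"], 2 * c["m^2"])  # -78 + 20b + 60m
df_db = (c["b"], 2 * c["b^2"], c["mb"])  # -30 + 10b + 20m
print(df_dm, df_db)  # (-78, 20, 60) (-30, 10, 20)

# Spot-check the expansion at an arbitrary point (m, b) = (2, -1).
print(f_direct(points, 2, -1), f_expanded(c, 2, -1))  # 14 14
```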
The values of m and b for which f(m,b) has a minimum must satisfy the simultaneous equations

$$\frac{\partial f}{\partial m} = 0:\quad 20b + 60m = 78,$$

$$\frac{\partial f}{\partial b} = 0:\quad 10b + 20m = 30.$$
We solve by subtracting the bottom equation from the top one:
20b + 60m = 78
10b + 20m = 30
------------------
10b + 40m = 48
Subtracting the bottom equation, 10b + 20m = 30, from this result once more eliminates b, leaving 20m = 18, so m = 18/20 = 9/10 = 0.9.
Then 10b + 20(9/10) = 30,
or 10b = 30 - 18 = 12, so b = 12/10 = 1.2.
This gives the line of best fit: y = 0.9x + 1.2.
Its graph, shown below, passes among the data points:
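The hand calculation can be reproduced in a few lines. For any data set, the two conditions ∂f/∂m = 0 and ∂f/∂b = 0 reduce to the linear system $m\sum x_i^2 + b\sum x_i = \sum x_i y_i$, $m\sum x_i + nb = \sum y_i$, which has a closed-form solution. The sketch below (Python, standard library only; `least_squares_line` and `sum_of_squares` are hypothetical names) solves that system with exact fractions, recovers m = 9/10 and b = 6/5 for the example, and checks that nudging either parameter away from the solution increases f.

```python
from fractions import Fraction


def least_squares_line(points):
    """Solve  m*Sxx + b*Sx = Sxy  and  m*Sx + b*n = Sy  exactly; return (m, b)."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) * x for x, _ in points)
    sxy = sum(Fraction(x) * y for x, y in points)
    det = n * sxx - sx * sx  # zero when all x values coincide (or no points)
    if det == 0:
        raise ValueError("all x values are equal; the slope is undetermined")
    m = (n * sxy - sx * sy) / det
    b = (sy - m * sx) / n  # back-substitute into the second equation
    return m, b


def sum_of_squares(points, m, b):
    """f(m, b): the sum of squared deviations y_i - (m*x_i + b)."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


points = [(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)]
m, b = least_squares_line(points)
print(m, b)                # 9/10 6/5
print(float(m), float(b))  # 0.9 1.2
best = sum_of_squares(points, m, b)
print(best)                # 19/10

# Nudging m and/or b away from the solution should only increase f.
for dm, db in [(Fraction(1, 10), 0), (0, Fraction(1, 10)), (Fraction(-1, 10), Fraction(1, 10))]:
    assert sum_of_squares(points, m + dm, b + db) > best
```

Exact `Fraction` arithmetic is used here only so the printed results match the fractions in the hand calculation; the same formulas work with ordinary floats, up to rounding.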
Suggested Problems:
1) Obtain the line: y = mx + b which best fits the following data points:
(0.10, 0.10), (0.20, 0.20), (0.30, 0.30), (0.40, 0.40), (0.50, 0.50)
2) Apply the method of least squares to obtain the line y = mx + b which best fits the points: (0,1), (1,2), (2,3)
3) In examining the frequency F of subflares within sunspot regions of area A*, the following table of data is obtained:
Apply the method of least squares to obtain the line F = mA + b which best fits the points.
* In millionths of a solar hemisphere.