In least squares approximation, the task is to fit a straight line:

y = mx + b

to a series of experimentally observed points, one for each of the observed values of x, e.g.

(x_{1}, y_{1}), (x_{2}, y_{2}), (x_{3}, y_{3}), ..., (x_{n}, y_{n})

Corresponding to each of the observed values of x there are actually two values of y: the observed value y_{obs} and the value predicted by the straight line, m x_{obs} + b. We call the difference, y_{obs} - (m x_{obs} + b), a deviation. Each such deviation measures the amount by which the predicted value falls short of the observed value. Then the set of all such deviations, e.g.

D_{1} = y_{1} - (m x_{1} + b), D_{2} = y_{2} - (m x_{2} + b),

D_{3} = y_{3} - (m x_{3} + b), ..., D_{n} = y_{n} - (m x_{n} + b)

gives an indication of the closeness of fit of the line y = mx + b to the data. For example, in the case of the graph shown below:

We have a graph in the form F = mA + b, where F is the given frequency for a solar flare in a sunspot region and A is the area associated with the region. The line drawn then represents the best fit to the assorted data: (A_{1}, F_{1}), (A_{2}, F_{2}), (A_{3}, F_{3}), ..., (A_{n}, F_{n}).

We say the line shown is a perfect fit if and only if all of the deviations are zero, i.e. D_{1} = 0, D_{2} = 0, D_{3} = 0, ..., D_{n} = 0.
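As a rough illustration of these definitions, the sketch below computes the deviations D_{i} = y_{i} - (m x_{i} + b) for a trial line and checks the perfect-fit condition. The function names `deviations` and `is_perfect_fit`, the tolerance `tol`, and the two small data sets are assumptions made for this example only; they are not taken from the flare data discussed above.

```python
def deviations(points, m, b):
    """Return D_i = y_i - (m*x_i + b) for each observed point (x_i, y_i)."""
    return [y - (m * x + b) for x, y in points]


def is_perfect_fit(points, m, b, tol=1e-12):
    """True when every deviation is zero (within a small tolerance)."""
    return all(abs(d) <= tol for d in deviations(points, m, b))


# Hypothetical points lying exactly on y = 2x + 1
on_line = [(0, 1), (1, 3), (2, 5)]
# Hypothetical points where the last one lies off that line
off_line = [(0, 1), (1, 3), (2, 2)]

print(deviations(on_line, 2, 1), is_perfect_fit(on_line, 2, 1))
print(deviations(off_line, 2, 1), is_perfect_fit(off_line, 2, 1))
```

For the second data set the last deviation is 2 - (2·2 + 1) = -3, so the trial line y = 2x + 1 is not a perfect fit to it.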

The problem then is to find the line which best fits a given set of data.

In general, for a straight line which comes close to fitting all of the observed points, some of the Ds will be positive and some negative. However, the *squares* (D^{2}) are never negative, so we form their sum:

f(m,b) = (y_{1} - (m x_{1} + b))^{2} + (y_{2} - (m x_{2} + b))^{2} + ... + (y_{n} - (m x_{n} + b))^{2}

This sum of the squares of the deviations depends on the choice of m and b, but is never negative and can only be zero if m and b have values which produce a straight line that is a perfect fit. The method of least squares then says in effect: take as the line y = mx + b of best fit that for which the sum of squares of the deviations:

f(m,b) = D_{1}^{2} + D_{2}^{2} + D_{3}^{2} + ... + D_{n}^{2}

is a minimum. This means solving the equations:

∂f/∂m = 0, ∂f/∂b = 0
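Written out for general data, these two conditions reduce to a pair of linear equations in m and b, commonly called the normal equations. The following is a sketch of that step, using the index i = 1, ..., n for the data points (the index notation is introduced here for compactness and does not appear in the text above):

```latex
\begin{align*}
\frac{\partial f}{\partial m} &= -2\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr) = 0
  &&\Longrightarrow\quad m\sum_{i=1}^{n} x_i^{2} + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \\
\frac{\partial f}{\partial b} &= -2\sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr) = 0
  &&\Longrightarrow\quad m\sum_{i=1}^{n} x_i + n\,b = \sum_{i=1}^{n} y_i
\end{align*}
```

Because f(m,b) is a sum of squares, the stationary point found this way is a minimum whenever the x values are not all identical.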

*Example problem:* Find the straight line that best fits the points:

(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)

using the method of least squares.

*Solution*:

We proceed by first compiling the table below with relevant inputs:

Then:

f(m,b) = Σ D^{2} = 55 - 30b + 5b^{2} - 78m + 20mb + 30m^{2}

∂f/∂m = -78 + 20b + 60m

∂f/∂b = -30 + 10b + 20m
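The coefficients in f(m,b) can be checked by expanding each squared deviation, (y - mx - b)^{2} = y^{2} - 2by + b^{2} - 2mxy + 2mbx + m^{2}x^{2}, and summing term by term over the five points. The short sketch below does this; the function name `expand_sum_of_squares` and the dictionary keys are labels chosen for this illustration.

```python
def expand_sum_of_squares(points):
    """Coefficients of f(m, b) = sum of (y - (m*x + b))**2, as a polynomial in m and b."""
    n = len(points)
    sx = sum(x for x, _ in points)       # sum of x
    sy = sum(y for _, y in points)       # sum of y
    sxx = sum(x * x for x, _ in points)  # sum of x^2
    sxy = sum(x * y for x, y in points)  # sum of x*y
    syy = sum(y * y for _, y in points)  # sum of y^2
    return {
        "const": syy,    # constant term
        "b": -2 * sy,    # coefficient of b
        "b^2": n,        # coefficient of b^2
        "m": -2 * sxy,   # coefficient of m
        "mb": 2 * sx,    # coefficient of m*b
        "m^2": sxx,      # coefficient of m^2
    }


points = [(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)]
coeffs = expand_sum_of_squares(points)
print(coeffs)
# {'const': 55, 'b': -30, 'b^2': 5, 'm': -78, 'mb': 20, 'm^2': 30}
```

Differentiating the resulting polynomial with respect to m and to b gives the two partial derivatives quoted above.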

The values of m and b for which f(m,b) has a minimum must satisfy the simultaneous equations:

∂f/∂m = 0, i.e. 20b + 60m = 78

∂f/∂b = 0, i.e. 10b + 20m = 30

Then solving by subtracting the bottom line from the top:

20b + 60m = 78

10b + 20m = 30

------------------

10b + 40m = 48

Subtracting the equation 10b + 20m = 30 once more eliminates b, from which we then obtain: 20m = 18 or m = 18/20 = 9/10 = 0.9

Then: 10b + 20 (9/10) = 30

Or: 10 b = 30 - 18 = 12 or b = 12/10 = 1.2
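The hand calculation can be cross-checked numerically. The sketch below solves the two conditions ∂f/∂m = 0 and ∂f/∂b = 0 in closed form for any data set; the function name `fit_line` is an assumption made for this example, and the closed-form expressions are the standard solution of the two simultaneous equations rather than a step shown in the text.

```python
import math


def fit_line(points):
    """Least squares slope m and intercept b for the line y = m*x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        # All x values coincide, so no unique line of the form y = m*x + b exists
        raise ValueError("x values must not all be equal")
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b


points = [(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)]
m, b = fit_line(points)
print(f"m = {m:.3f}, b = {b:.3f}")

# Both simultaneous equations from the worked example should hold
assert math.isclose(20 * b + 60 * m, 78)
assert math.isclose(10 * b + 20 * m, 30)
```

The same function can be used to check answers to the suggested problems, though working them by hand first is the point of the exercise.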

This leads to the best-fit line: y = 0.9x + 1.2

The graph of this line, shown below, passes among the data points:

*Suggested Problems*:

1) Obtain the line: y = mx + b which best fits the following data points:

(0.10, 0.10), (0.20, 0.20), (0.30, 0.30), (0.40, 0.40), (0.50, 0.50)

2) Apply the method of least squares to obtain the line y = mx + b which best fits the points: (0,1), (1,2), (2,3)

3) In examining the frequency F of subflares within regions of sunspot area (A)* the following table of data is obtained:

Apply the method of least squares to obtain the line F = mA + b which best fits the points.

* In millionths of a solar hemisphere.
