Wednesday, July 26, 2023

An Introduction To Numerical Analysis (5): Least Squares Approximation

In least squares approximation, the task is to fit a straight line:

y = mx   + b
 

to a series of experimentally observed points, one for each observed value of x, e.g.

$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n)$

Corresponding to each observed value of x there are two values of y: the observed value $y_{obs}$ and the value predicted by the straight line, $m x_{obs} + b$. We call the difference $y_{obs} - (m x_{obs} + b)$ a deviation. Each deviation measures the amount by which the predicted value falls short of the observed value. The set of all such deviations, e.g.

$D_1 = y_1 - (m x_1 + b), \quad D_2 = y_2 - (m x_2 + b)$

$D_3 = y_3 - (m x_3 + b), \quad \ldots, \quad D_n = y_n - (m x_n + b)$

gives an indication of the closeness of fit of the line y = mx + b to the data.  For example, in the case of the graph shown below:


The graph has the form F = mA + b, where F is the frequency of solar flares in a sunspot region and A is the area of that region. The line drawn represents the best fit to the data: $(A_1, F_1),\ (A_2, F_2),\ (A_3, F_3),\ \ldots,\ (A_n, F_n)$.

We say the line shown is a perfect fit if and only if all of the deviations are zero, i.e. $D_1 = 0,\ D_2 = 0,\ D_3 = 0,\ \ldots,\ D_n = 0$.
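As a minimal sketch of the definitions above, the deviations for any trial line can be computed directly. The data points are those of the worked example in this post; the function names `deviations` and `is_perfect_fit`, the tolerance `tol`, and the trial line y = x + 1 are assumptions introduced only for illustration.

```python
# Data points from the worked example in this post.
points = [(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)]

def deviations(points, m, b):
    """Return D_i = y_i - (m*x_i + b) for every observed point."""
    return [y - (m * x + b) for x, y in points]

def is_perfect_fit(points, m, b, tol=1e-12):
    """A line is a perfect fit only if every deviation is zero (within tol)."""
    return all(abs(d) <= tol for d in deviations(points, m, b))

# Hypothetical trial line y = 1*x + 1, chosen only for illustration.
print(deviations(points, 1, 1))      # [0, 1, -1, 0, 0]
print(is_perfect_fit(points, 1, 1))  # False
```

Note that the trial line produces both a positive and a negative deviation, so simply adding the deviations would hide the misfit.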

The problem then is to find the line which best fits a given set of data.

In general, for a straight line which comes close to fitting all of the observed points, some of the Ds will be positive and some negative. However, the squares $D_i^2$ are never negative, so we form the sum:

$f(m,b) = (y_1 - (m x_1 + b))^2 + (y_2 - (m x_2 + b))^2 + \ldots + (y_n - (m x_n + b))^2$

This sum of the squares of the deviations depends on the choice of m and b, but it is never negative and can only be zero if m and b have values which produce a straight line that is a perfect fit.   The method of least squares then says in effect:  Take as the line y = mx + b of best fit that for which the sum of squares of the deviations:

$f(m,b) = D_1^2 + D_2^2 + D_3^2 + \ldots + D_n^2$

is a minimum.  This means solving the equations:

$\partial f/\partial m = 0, \quad \partial f/\partial b = 0$
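For reference, the two conditions can be written out for general data. This is a sketch of the standard derivation, not a step taken in the post itself; all sums run over i = 1, ..., n.

```latex
\begin{aligned}
f(m,b) &= \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial f}{\partial m} &= -2 \sum x_i \,(y_i - m x_i - b) = 0
  \;\Longrightarrow\; m \sum x_i^2 + b \sum x_i = \sum x_i y_i \\
\frac{\partial f}{\partial b} &= -2 \sum (y_i - m x_i - b) = 0
  \;\Longrightarrow\; m \sum x_i + n\,b = \sum y_i
\end{aligned}
```

These are two linear equations in the two unknowns m and b, so they can be solved by ordinary elimination.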

Example problem:  Find the straight line that best fits the points:

(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)

using the method of least squares.

Solution:

We proceed by first compiling the table below with relevant inputs:

Then:

$f(m,b) = \sum D_i^2 = 55 - 30b + 5b^2 - 78m + 20mb + 30m^2$

$\partial f/\partial m = -78 + 20b + 60m$

$\partial f/\partial b = -30 + 10b + 20m$
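The coefficients of this polynomial can be cross-checked from the data. The sketch below expands the sum of $(y - mx - b)^2$ term by term; the function name `expand_sum_of_squares` and the dictionary keys are assumptions made for this illustration.

```python
# Data points from the example problem.
points = [(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)]

def expand_sum_of_squares(points):
    """Coefficients of f(m,b) = sum (y - m*x - b)^2, keyed by monomial."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    sum_yy = sum(y * y for _, y in points)
    return {
        "1": sum_yy,        # constant term: sum of y^2
        "b": -2 * sum_y,    # from the cross term -2*b*y
        "b^2": n,           # one b^2 per data point
        "m": -2 * sum_xy,   # from the cross term -2*m*x*y
        "mb": 2 * sum_x,    # from the cross term +2*m*b*x
        "m^2": sum_xx,      # from (m*x)^2
    }

print(expand_sum_of_squares(points))
# {'1': 55, 'b': -30, 'b^2': 5, 'm': -78, 'mb': 20, 'm^2': 30}
```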

The values of m and b for which f(m,b) has a minimum must satisfy the simultaneous equations:

$\partial f/\partial m = 0: \quad 20b + 60m = 78$

$\partial f/\partial b = 0: \quad 10b + 20m = 30$

Solving by elimination, subtract the second equation from the first:

20b + 60m  = 78

10b + 20m  =  30

------------------

10b + 40m = 48

Subtracting the second equation once more eliminates b and gives 20m = 18, or m = 18/20 = 9/10 = 0.9

Then: 10b + 20(9/10) = 30

Or: 10b = 30 - 18 = 12, or b = 12/10 = 1.2

This leads to the best fit line: y = 0.9 x  +  1.2
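The hand calculation can be checked with a short script. This is a minimal sketch that solves the same pair of linear conditions in closed form, using exact fractions to avoid rounding; the function name `least_squares_line` is an assumption for illustration, not something defined in the post.

```python
from fractions import Fraction

def least_squares_line(points):
    """Return (m, b) minimizing the sum of (y - (m*x + b))^2 over the points."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) * x for x, _ in points)
    sxy = sum(Fraction(x) * y for x, y in points)
    det = n * sxx - sx * sx
    if det == 0:
        # All x values coincide, so no unique line of the form y = mx + b exists.
        raise ValueError("x values must not all be equal")
    m = (n * sxy - sx * sy) / det
    b = (sy - m * sx) / n
    return m, b

m, b = least_squares_line([(0, 1), (1, 3), (2, 2), (3, 4), (4, 5)])
print(m, b)  # 9/10 6/5
```

The result 9/10 and 6/5 agrees with m = 0.9 and b = 1.2 found by elimination.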

The graph of this best-fit line, shown below, passes among the data points:


Suggested Problems:  

1)  Obtain the line: y = mx + b which best fits the following data points:

(0.10, 0.10),  (0.20, 0.20), (0.30, 0.30), (0.40, 0.40), (0.50, 0.50)  

2) Apply the method of least squares to obtain the line y = mx + b which best fits the points:  (0,1), (1,2), (2,3)

3) In examining the frequency F of subflares within regions of sunspot area (A)* the following table of data is obtained:

Apply the method of least squares to obtain the line F = mA + b which best fits the points


* In millionths of a solar hemisphere.

