Statistics/Curve fitting
When evaluating data that have been collected, patterns often appear, such as a slope of -1 in a scatter plot of ray optics data. The goal is then often to find a mathematical function that "fits" the data, that is, a function whose values are close to the observed values of the dependent variable at the corresponding values of the independent variable. The usual method is referred to as "least squares", for reasons explained later.
Sales Example
A store sells whatsits at P=3.49 each, and the average number of whatsits sold per day (the volume) is V=100. Therefore the total money received per day is T = P times V = 349.00. If the price is reduced then, maybe, more whatsits will be sold, but T may turn out to be more or less. Obviously if P=0 then T will also be zero. The following table shows the results at three prices:
P      V     T
2.99   130   388.70
3.29   123   404.67
3.49   100   349.00
Obviously the "best" price is somewhere between 2.99 and 3.49. Curve fitting provides an equation for T versus P for each of the many available models, so that the models can be compared.
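As a quick check of the arithmetic, the T column of the sales table can be recomputed from P and V. The following is a minimal Python sketch; the variable names are illustrative and not part of the original example.

```python
# Price P and average daily volume V from the sales table.
prices = [2.99, 3.29, 3.49]
volumes = [130, 123, 100]

# Total money received per day, T = P * V, rounded to cents.
totals = [round(p * v, 2) for p, v in zip(prices, volumes)]

# The tried price with the highest T (only among the three prices tried).
best_total, best_tried_price = max(zip(totals, prices))

for p, v, t in zip(prices, volumes, totals):
    print(f"{p:.2f} {v:4d} {t:8.2f}")
print("highest T among tried prices:", best_tried_price)
```

This only ranks the three prices that were actually tried. Finding a better price between them requires a fitted curve.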
Linear model
The linear model is based on the "best" straight line. Using a calculator that can do regression, we find that the line closest to the above data on a graph of T versus P is
- T = 605.268605263 - 68.9289473684 * P, and the correlation is shown as about 60% for this model (a correlation coefficient of about -0.61, negative because the line slopes downward).
Let us examine it in more detail:
P      Actual T   Calculated T      Difference        Difference²
2.99   388.70     399.17105263159   -10.4710526316    109.642943214
3.29   404.67     378.49236842106    26.1776315789    685.268395081
3.49   349.00     364.70657894738   -15.7065789474    246.696622231
Adding the differences, we find that their sum is nearly zero, as it must be for the "best" straight line. Because positive and negative differences cancel, their plain sum cannot tell us how close the line is to the points. Squaring a negative number always gives a positive number, so the SUM OF SQUARES of the differences gives us an indication of the GOODNESS OF FIT. Here the SUM OF SQUARES is 1041.60796053. We can compare the different models in this way, finally selecting the model that has the LEAST SQUARES.
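The difference table and the SUM OF SQUARES can be reproduced with a few lines of Python. This is a sketch only: the function name sum_of_squares and the two "nearby" lines are assumptions introduced for illustration, to show that moving the line away from the fitted one increases the sum of squares.

```python
prices = [2.99, 3.29, 3.49]
actual = [388.70, 404.67, 349.00]

def sum_of_squares(a, b):
    """Sum of squared differences between actual T and the line a + b*P."""
    return sum((t - (a + b * p)) ** 2 for p, t in zip(prices, actual))

# Coefficients of the regression line quoted in the text.
a_fit, b_fit = 605.268605263, -68.9289473684

# Difference = actual T minus calculated T, as in the table.
differences = [t - (a_fit + b_fit * p) for p, t in zip(prices, actual)]
sum_of_differences = sum(differences)
sse_fit = sum_of_squares(a_fit, b_fit)

# Hypothetical nearby lines, chosen only for comparison.
sse_shifted = sum_of_squares(a_fit + 5.0, b_fit)
sse_tilted = sum_of_squares(a_fit, b_fit + 1.0)

print("sum of differences:", sum_of_differences)
print("sum of squares, fitted line: ", sse_fit)
print("sum of squares, shifted line:", sse_shifted)
print("sum of squares, tilted line: ", sse_tilted)
```

Both altered lines give a larger sum of squares than the fitted line, which is what "least squares" means.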
If you do NOT have a calculator or a computer that can do regression, then the line can be calculated by hand as follows.
Calculation of the least squares line to fit the given points:
LOOKING FOR a and b in the equation of the straight line y=a+b*x:
We have, in the above example:
x      x²        y         y²            xy
2.99   8.9401    388.70    151087.69     1162.213
3.29   10.8241   404.67    163757.8089   1331.3643
3.49   12.1801   349.00    121801        1218.01
----   -------   -------   -----------   ---------
9.77   31.9443   1142.37   436646.4989   3711.5873
We have:
n = number of points = 3
ax=average of x=9.77/3=3.256667 (rounded)
ay=average of y=1142.37/3=380.79
x1=sum of x=9.77
x2=sum of x²=31.9443
y1=sum of y=1142.37
y2=sum of y²=436646.4989
s1=sum of xy=3711.5873
z1=s1-(x1*y1/n)=3711.5873-(9.77*1142.37/3)=-8.731
z2=x2-(x1²/n)=31.9443-(9.77²/3)=0.1266666667
b=z1/z2=-68.9289473682
a=ay-b*ax=380.79-(-68.9289473682)*(9.77/3)=605.268605263
Thus we have y=605.268605263-68.9289473682*x as the best line to fit the given points of this example.
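The hand calculation can be followed step by step in Python. The sketch below keeps the names used in the text (n, x1, x2, y1, y2, s1, z1, z2, a, b). The quantity z3 and the correlation coefficient r are additions for illustration, using the standard formula r = z1 / sqrt(z2 * z3).

```python
import math

xs = [2.99, 3.29, 3.49]
ys = [388.70, 404.67, 349.00]

n = len(xs)                               # number of points
x1 = sum(xs)                              # sum of x
x2 = sum(x * x for x in xs)               # sum of x squared
y1 = sum(ys)                              # sum of y
y2 = sum(y * y for y in ys)               # sum of y squared
s1 = sum(x * y for x, y in zip(xs, ys))   # sum of x*y

z1 = s1 - x1 * y1 / n
z2 = x2 - x1 ** 2 / n
z3 = y2 - y1 ** 2 / n   # not in the text; needed only for the correlation

b = z1 / z2                 # slope
a = y1 / n - b * (x1 / n)   # intercept: a = ay - b*ax
r = z1 / math.sqrt(z2 * z3) # correlation coefficient

print("b =", b)
print("a =", a)
print("r =", r)
```

Working with unrounded sums avoids the loss of accuracy that occurs when intermediate values such as the average of x are rounded to three decimals.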
Parabolic Model
If we have n points with distinct x values, then a polynomial of degree (n-1) will fit these n points exactly. In this example we are given 3 points, so a polynomial of the 2nd degree (a parabola) should give us an exact fit. The calculator provides the equation
T = (-663.1666666653)*P² + 4217.91999999*P - 6294.10448332, giving us
P      Actual T   Calculated T     Difference
2.99   388.70     388.6999999956   4.4E-9 = zero plus rounding error
3.29   404.67     404.6699999951   4.9E-9 = zero plus rounding error
3.49   349.00     348.999999995    5.0E-9 = zero plus rounding error
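That is a perfect fit: the SUM OF SQUARES is zero apart from rounding error, so the LEAST SQUARES criterion indicates that this model be used. Note that the exact fit is guaranteed here, because a parabola has three coefficients and there are only three points.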
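The coefficients of the parabola can also be obtained without a regression calculator. The sketch below uses Newton's divided differences, which is one of several equivalent ways to pass a parabola through three points and is not necessarily what the calculator does internally. The names c0, c1, c2 and the final step, locating the vertex of the parabola as a candidate "best" price, are additions for illustration.

```python
# The three data points (P, T).
p0, p1, p2 = 2.99, 3.29, 3.49
t0, t1, t2 = 388.70, 404.67, 349.00

# Newton divided differences.
d01 = (t1 - t0) / (p1 - p0)
d12 = (t2 - t1) / (p2 - p1)

# Coefficients of T = c2*P^2 + c1*P + c0.
c2 = (d12 - d01) / (p2 - p0)
c1 = d01 - c2 * (p0 + p1)
c0 = t0 - c1 * p0 - c2 * p0 ** 2

def parabola(p):
    """Value of the fitted parabola at price p."""
    return c2 * p * p + c1 * p + c0

# Differences between actual and calculated T; zero up to rounding error.
residuals = [t - parabola(p) for p, t in [(p0, t0), (p1, t1), (p2, t2)]]

# Because c2 is negative, the parabola has its maximum at the vertex.
vertex_price = -c1 / (2 * c2)

print(c2, c1, c0)
print(residuals)
print("price at the vertex of the parabola:", vertex_price)
```

Under this model the vertex lies between 2.99 and 3.49. With only three observations it should be read as a rough estimate rather than a reliable optimum.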
Other models
Some of the many other models are based on the exponential function, logarithms, and various transformations of the independent and/or the dependent variable(s). The "best fit" is usually the one that provides the LEAST SQUARES. Weighting of the data can also be used when some points on a graph are more important than others (end points, for example).
- Caution: For curve fitting, some calculators may require consecutive, equally spaced values of the independent variable. Always compare the original graph with the "fitted" graph.
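Weighting can be illustrated with the straight-line model. The sketch below minimises the weighted sum of squares, sum of w*(y - a - b*x)², by solving the two normal equations directly. The function name weighted_line and the weights 2, 1, 2 (giving the end points double importance) are hypothetical choices made for this illustration only.

```python
def weighted_line(xs, ys, ws):
    """Return (a, b) minimising sum(w * (y - a - b*x)**2)."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    b = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a = (swy - b * swx) / sw
    return a, b

xs = [2.99, 3.29, 3.49]
ys = [388.70, 404.67, 349.00]

# Equal weights reproduce the ordinary least squares line.
a_eq, b_eq = weighted_line(xs, ys, [1, 1, 1])

# Hypothetical weights: the end points count twice as much.
ws = [2, 1, 2]
a_w, b_w = weighted_line(xs, ys, ws)

print("equal weights:   ", a_eq, b_eq)
print("weighted (2,1,2):", a_w, b_w)
```

With equal weights the result matches the linear model found earlier. With unequal weights the line moves closer to the points that carry more weight.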