A common statistical problem is to describe a relationship between two measurements that is not linear (not a straight line). When such a relationship can be defined mathematically (e.g. y = x^{2}), the variables can be transformed using the programs in the Numerical Transformation Program Page, so that the relatively simple linear relationship is retained.
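The idea of transforming a variable to recover a straight line can be illustrated with a small sketch. It assumes made-up data that follow y = x^{2} exactly, and the helper name fit_line is hypothetical; it is not part of the programs mentioned above.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Made-up data that follow y = x^2 exactly (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]

# A straight line fitted to the raw x only approximates the curve ...
a_raw, b_raw = fit_line(xs, ys)

# ... but after transforming x to x^2 the relationship is a straight line
# with intercept 0 and slope 1.
xs_squared = [x ** 2 for x in xs]
a_t, b_t = fit_line(xs_squared, ys)
```

With real data the transformed fit would not be exact, but the same linear regression tools apply once the transformation has been made.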
Often, however, a curved relationship appears regular and consistent, but no mathematical definition of it is available, and an empirical "best fit" algorithm, such as the polynomial curve fitting from the Curve Fitting Program Page, is required. The polynomial curve fit uses the formula y = a + b_{1}x + b_{2}x^{2} + b_{3}x^{3} + b_{4}x^{4} + .... As each increase in power bends the relationship into a sharper curve, the combination of all the coefficients can produce a curve of potentially any level of complexity. In biosocial science, however, curve fitting beyond the third power is seldom necessary or meaningful.
Curve fitting can be accomplished with multiple regression, as described in the Multiple Regression Explained Page: the single x variable is transformed into x^{2}, x^{3}, x^{4}, and so on, and the combination is subjected to multiple regression analysis. Curve fitting has been used successfully in laboratories to define the relationship between the result of a test (e.g. the depth of a color reaction) and the amount of a chemical (e.g. sugar) present.
The problem with curve fitting, when more than the mean values of the fit are required, is the difficulty of assigning a variance and a 95% confidence interval to the fitted curve. The least squares statistics are seldom useful here, as each of the coefficients has its own variation, and these are difficult to combine. An even more difficult issue is that, for many biological measurements, variance increases with the scale of measurement, so that the 95% confidence interval around y widens as the x value increases.
Altman (see reference) described a two stage procedure that solves this problem. In the first stage, the standard curve fitting for the mean value is carried out. In the second stage, the distance between the y of each data point and the mean y from the curve fit is obtained, and its absolute value is used to perform another curve fit, so that a variable 95% confidence interval can be defined. The program in the Curve Fitting Program Page uses Altman's algorithm, and it can be used as follows.
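As an aside, the two stages described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the code of the program: the function names (solve, polyfit, polyval, two_stage_fit) and the data are made up, and both fits use ordinary least squares on powers of x. The scaling of the fitted absolute residuals by sqrt(pi/2), which converts a mean absolute deviation into a standard deviation when residuals are normally distributed, is also an assumption; the text above does not say whether the program applies it.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least squares coefficients [a, b1, ..., b_degree] of
    y = a + b1*x + b2*x^2 + ..., via the normal equations."""
    k = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(A, b)

def polyval(coefs, x):
    """Evaluate a polynomial given coefficients in increasing power order."""
    return sum(c * x ** p for p, c in enumerate(coefs))

def two_stage_fit(xs, ys, mean_degree=3, sd_degree=1):
    # Stage 1: polynomial fit for the mean of y.
    mean_coefs = polyfit(xs, ys, mean_degree)
    # Stage 2: fit the absolute distances from the mean curve.
    abs_resid = [abs(y - polyval(mean_coefs, x)) for x, y in zip(xs, ys)]
    mad_coefs = polyfit(xs, abs_resid, sd_degree)
    # Assumption: scale mean absolute deviation to SD (normal residuals).
    sd_coefs = [c * math.sqrt(math.pi / 2) for c in mad_coefs]
    return mean_coefs, sd_coefs

# Made-up data for illustration only (not the example from this page).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [12.0, 25.5, 31.0, 38.5, 41.0, 52.5, 66.0, 95.5]
mean_coefs, sd_coefs = two_stage_fit(xs, ys)
```

The defaults (power of 3 for the mean, power of 1 for the standard deviation) are chosen only for illustration. In practice a library routine such as numpy.polyfit would normally replace the hand-written solver; plain Python is used here to keep the sketch self-contained.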
We will fit the mean y value to the power of 3, and the standard deviation to the power of 1. The results are as follows.
The output is to the right. The first table is the curve for the mean value, and here y = 7 + 23.53x - 6.49x^{2} + 0.69x^{3}. This is followed by the regression line for the standard deviation, SD = 1.67 + 2.33x, which defines the standard deviation from the curve fitted mean for any x value. By combining the two formulae, we obtain the two equations that can be used to draw the 95% confidence interval lines. From the first table, the curve of the mean is y = 7 + 23.5317x - 6.4881x^{2} + 0.6944x^{3}.
From the second table, the standard deviation from the mean curve is SD = 3.7247 + 10.2753x. The 95% confidence interval is mean ± 1.96SD, so by combining the two fitted lines, we can obtain the upper and lower 95% CI lines, as shown in the table to the right. These are the upper line, mean + 1.96SD, and the lower line, mean - 1.96SD, with mean and SD taken from the two fitted formulae.
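The combination of a mean curve and an SD line into upper and lower 95% CI lines can be sketched as below. The function names ci_lines and polyval and the coefficient values are placeholders chosen for easy arithmetic, not the values from the tables; coefficients are listed in increasing power of x.

```python
def polyval(coefs, x):
    """Evaluate a polynomial given coefficients in increasing power order."""
    return sum(c * x ** p for p, c in enumerate(coefs))

def ci_lines(mean_coefs, sd_coefs, z=1.96):
    """Return (lower, upper) coefficient lists for mean -/+ z*SD."""
    n = max(len(mean_coefs), len(sd_coefs))
    # Pad the shorter polynomial with zeros so terms line up by power.
    m = list(mean_coefs) + [0.0] * (n - len(mean_coefs))
    s = list(sd_coefs) + [0.0] * (n - len(sd_coefs))
    lower = [mi - z * si for mi, si in zip(m, s)]
    upper = [mi + z * si for mi, si in zip(m, s)]
    return lower, upper

# Placeholder coefficients (hypothetical, for illustration only):
# mean = 10 + 5x - x^2 + 0.1x^3 and SD = 1 + 0.5x
mean_coefs = [10.0, 5.0, -1.0, 0.1]
sd_coefs = [1.0, 0.5]
lower, upper = ci_lines(mean_coefs, sd_coefs)

# The 95% CI at a chosen x is then read off the two lines.
x = 2.0
ci_at_x = (polyval(lower, x), polyval(upper, x))
```

Because the SD line here is of lower power than the mean curve, only the constant and the x coefficients differ between the upper and lower lines; the higher power terms are shared with the mean curve.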
The data points and their deviation from the mean line are then presented. The abbreviations are:
Altman DG (1993) Constructing age-related reference centiles using absolute residuals. Statistics in Medicine 12(10):917-924
