 StatTools : Discriminant Analysis Explained
Introduction

The Discriminant Analysis programs in StatTools provide only the basic calculations that create and use the Discriminant model, and this explanation page discusses only those features that support the use of this basic set of programs. Users requiring more detail can follow the trail from the references provided. Those wishing to acquire an in-depth understanding of the subject should attend the appropriate courses, which usually require one semester of study at the Masters level in a tertiary institution.

Mathematically, Discriminant Analysis has some similarity to Multiple Regression. Both are based on the Least Squares method, and both assume that the data are parametric (Normally distributed measurements). In Multiple Regression, the independent variables are binary or at least ordinal, and the dependent variable is a parametric measurement. In Discriminant Analysis, the independent variables are parametric, while the dependent variable represents groups which are not related.

The use of Discriminant Analysis differs from that of Multiple Regression. Multiple Regression is used to model how a particular measurement is affected by, and in many cases can be predicted from, its associated independent variables. Discriminant Analysis is usually used to classify and separate individuals into pre-conceived groups.

Organization of Programs

All procedures of Discriminant Analysis offered by StatTools are placed into 3 program pages according to how they may be used. These are briefly described here, but explored in greater detail in their own panels on this page.

The Discriminant Analysis (Analysis Using Reference Data) Program Page is provided for the initial analysis using a set of reference data. If successful, this produces a model by which future individuals can be correctly classified.
This is used by researchers wishing to develop a clinical classification system.

The Discriminant Analysis (Use of Coefficients on New Data) Program Page is provided to apply the statistical results obtained from the reference data to future and independent sets of data. This is used by clinicians who, having accepted the validity of the model already produced, wish to classify new cases as they present.

The Discriminant Analysis (Plotting Discriminant Map) Program Page is provided for the research assistant or secretary, to produce graphic representations of the results produced by the two previous programs.

Technical Issues

The programs from StatTools are assembled from information available in the public domain (see references), and the results have been tested against those produced by SPSS. Although the results are numerically the same as those from SPSS (other than small rounding errors), they differ in the following manner:

Many safety checks (e.g. whether the independent variables are truly parametric) and many intermediary results (covariance tables) are not presented by StatTools.

StatTools uses the correlation matrix and not the covariance matrix. This means all measurements are converted to z values (z=(value-mean)/SD), so they have mean=0 and SD=1 before calculation. The coefficients thus created are called Standardized Discriminant Coefficients.

The means and Standard Deviations for each variable used in the Discriminant Analysis (Analysis Using Reference Data) Program Page are calculated from the data.

The means and Standard Deviations for each variable used in the Discriminant Analysis (Use of Coefficients on New Data) Program Page should be those from the reference data that produced the coefficients. Users do not always adhere to this, and use either the data being processed or some estimate of the population means and Standard Deviations.

The results are produced to 4 decimal places by default.
This is usually unnecessary, as most reports of Discriminant Functions and probabilities use 2 decimal places.

The function values, being created from normalized values, are also normalized (mean=0 and SD=1), and are not related to the units of measurement used in the input data.

SPSS produces the probability estimates using the Maximum Likelihood method. This is also presented by StatTools, with the addition of a Bayesian model that incorporates the a priori probability and a loss function.
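
The z value conversion described under Technical Issues can be illustrated with a minimal Python sketch. It is not the StatTools code: the function names, the reference data, and the coefficients are all hypothetical, and the use of the sample Standard Deviation (n-1 denominator) is an assumption. It shows only that a new case is standardized with the means and Standard Deviations of the reference data before the Standardized Discriminant Coefficients are applied.

```python
from statistics import mean, stdev

def reference_stats(columns):
    """Return a (mean, SD) pair for each variable of the reference data.

    `columns` holds one list of values per variable. The sample SD
    (n-1 denominator) is an assumption made for this sketch.
    """
    return [(mean(col), stdev(col)) for col in columns]

def standardize(case, stats):
    """Convert one case to z values: z = (value - mean) / SD."""
    return [(x - m) / s for x, (m, s) in zip(case, stats)]

def discriminant_score(case, stats, coefficients):
    """Sum of standardized coefficient times z value over all variables."""
    z = standardize(case, stats)
    return sum(c * zi for c, zi in zip(coefficients, z))

# Hypothetical reference data: two variables, five cases each.
reference = [
    [150.0, 160.0, 170.0, 180.0, 190.0],  # variable 1
    [50.0, 60.0, 70.0, 80.0, 90.0],       # variable 2
]
stats = reference_stats(reference)

# Made-up standardized coefficients, for illustration only.
coefficients = [0.8, -0.3]

# A new case is standardized with the REFERENCE means and SDs,
# not with statistics taken from the new data set.
new_case = [180.0, 60.0]
score = discriminant_score(new_case, stats, coefficients)
print(round(score, 2))  # reported to 2 decimal places
```

Keeping the reference means and Standard Deviations in one `stats` object and passing it to every later call makes it harder to standardize new data by its own statistics, which is the departure the text warns about.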
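
The difference between the Maximum Likelihood probability and a probability that incorporates the a priori probability can be sketched as below. This is an illustration under stated assumptions, not the method used by StatTools or SPSS: it assumes each group is Normally distributed with unit variance around its centroid in the space of the function values, and the function name, centroids, and priors are hypothetical. The loss function is left out because the text does not specify its form.

```python
import math

def group_probabilities(scores, centroids, priors=None):
    """Probability of membership in each group for one case.

    scores    : function values for the case, one per discriminant function
    centroids : dict mapping group name to its centroid (same length as scores)
    priors    : dict mapping group name to a priori probability; when omitted,
                all groups are weighted equally, so the result depends on the
                likelihoods alone
    Assumes unit-variance Normal likelihoods around each centroid.
    """
    groups = list(centroids)
    if priors is None:
        priors = {g: 1.0 / len(groups) for g in groups}
    weights = {}
    for g in groups:
        # Squared distance from the case to the group centroid.
        d2 = sum((s - c) ** 2 for s, c in zip(scores, centroids[g]))
        weights[g] = priors[g] * math.exp(-0.5 * d2)
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Hypothetical centroids for two groups on a single discriminant function.
centroids = {"A": [-1.0], "B": [1.0]}

# A case lying midway between the two centroids.
equal = group_probabilities([0.0], centroids)
weighted = group_probabilities([0.0], centroids, priors={"A": 0.8, "B": 0.2})

print({g: round(p, 2) for g, p in equal.items()})
print({g: round(p, 2) for g, p in weighted.items()})
```

For a case equidistant from both centroids the likelihoods cancel, so the second result simply reproduces the a priori probabilities. This shows how strongly the priors can influence classification when the function values themselves are uninformative.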