This page provides explanations and support for the most basic of Bayesian probability algorithms: the classification of individuals into alternative groups according to observed attributes. The basic development of a model from a set of reference data is carried out in the Bayesian Classification (Analysis of Reference Data) Program Page, and the adaptation of that model to suit local circumstances in the Bayesian Classification (Adjust Reference Table) Program Page.
The next panel takes the user through the calculations of the two programs, using their default example data. This is followed by a panel giving an introduction to Bayesian concepts.
Example 1 : To establish the model using reference data
Please Note : The example data are computer generated to demonstrate the procedures and are not real. Also, a much larger reference data set would be required to establish a stable model; the small sample is used so that it can be displayed on the web page. This example follows the procedures used in the Bayesian Classification (Analysis of Reference Data) Program Page, and uses the default example data of that page. We would like to establish a method of classifying Europeans into 3 ethnic types, Italian, French, and German, based on two observations: whether they have dark hair, and whether they have brown eyes. Four combinations are therefore possible: dark hair and brown eyes (++), dark hair and not brown eyes (+-), not dark hair but brown eyes (-+), and neither dark hair nor brown eyes (--). We carefully choose a sample of people truly representing these three ethnic groups, and observe their hair and eye colors, creating the data set shown in the table to the right.
Step 1 : A table of counts is established. The rows are the patterns and the columns are the groups, and the number of cases with each pattern in each group is listed, as shown in the table to the right.
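The tallying in Step 1 can be sketched as follows. The raw observations below are invented stand-ins, since the page's actual example table is not reproduced in this text version:

```python
# Sketch of Step 1: tallying observation patterns per group.
# The observations are hypothetical, not the page's default data.
from collections import Counter

# Each record is (pattern, group); pattern codes as defined above,
# e.g. "++" = dark hair and brown eyes.
observations = [
    ("++", "Italian"), ("++", "Italian"), ("+-", "Italian"),
    ("++", "French"),  ("-+", "French"),  ("--", "French"),
    ("--", "German"),  ("--", "German"),  ("-+", "German"),
]

counts = Counter(observations)  # keyed by (pattern, group)

patterns = ["++", "+-", "-+", "--"]
groups = ["Italian", "French", "German"]

# Rows are patterns, columns are groups, as in the page's count table.
table = {p: {g: counts[(p, g)] for g in groups} for p in patterns}
for p in patterns:
    print(p, table[p])
```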
Step 2 : The creation of the relative frequency table. This contains the probability of having a particular pattern in each of the groups, and is also known as the reference P(pattern|group) table, or more generically the P(x|j) table. It is calculated by dividing the count in each cell by the total count in its group (the column total). The results are shown in the table to the left. The P(pattern|group) table represents the model we have created, from which we can create decision tools to use on future independent sets of data. Different software packages use different formats to represent this model; StatTools uses the table as calculated by the Bayesian Classification (Analysis of Reference Data) Program Page.

Step 3 : The creation of the first decision making table, the Maximum Likelihood Table. This is discussed as part of Example 2.

Example 2 : The adjustment of the reference P(pattern|group) table to create decision tools

This discussion supports the procedures in the Bayesian Classification (Adjust Reference Table) Program Page and uses its default example data.
We begin by using the reference P(pattern|group) table created by the analysis of a set of reference data, as shown in the table to the left.
Step 1 : The creation of the Maximum Likelihood Table, which shows the probability of belonging to a group given the observations available. It is also called the P(group|pattern) or P(j|x) table. The Maximum Likelihood probability is calculated by dividing each probability in the P(pattern|group) table by the sum across all groups (the row total). The results are shown in the table to the right.
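The two normalizations described so far (column-wise for the reference table, row-wise for the Maximum Likelihood table) can be sketched together. The counts below are invented for illustration, since the page's own tables are not reproduced in this text:

```python
# Sketch of Step 2 of Example 1 and Step 1 of Example 2.
# The counts are hypothetical, not the page's default data.
patterns = ["++", "+-", "-+", "--"]
groups = ["Italian", "French", "German"]
counts = {
    "++": [4, 2, 1],
    "+-": [3, 2, 1],
    "-+": [2, 3, 2],
    "--": [1, 3, 6],
}

# P(pattern|group): divide each cell by its column (group) total.
col_totals = [sum(counts[p][j] for p in patterns) for j in range(len(groups))]
p_x_given_j = {
    p: [counts[p][j] / col_totals[j] for j in range(len(groups))]
    for p in patterns
}

# Maximum Likelihood P(group|pattern): divide each cell of the
# reference table by its row total.
ml = {p: [v / sum(p_x_given_j[p]) for v in p_x_given_j[p]] for p in patterns}

for p in patterns:
    print(p, [round(v, 3) for v in p_x_given_j[p]],
             [round(v, 3) for v in ml[p]])
```

Each column of `p_x_given_j` sums to 1 (a probability distribution over patterns within a group), while each row of `ml` sums to 1 (a probability distribution over groups for a given pattern).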
Step 2 : The construction of the Bayesian Probability Table, taking into consideration the apriori probability of belonging to each of the groups. The table is also called P(group|pattern,π) or P(j|x,π). The Maximum Likelihood calculation is based on the assumption that the probability of being in any of the groups is the same, apart from the observed characteristics. This is seldom the case in reality. If we were to take our model to Rome, to Paris, or to Dresden, the probability of someone being Italian, French, or German would be very different even before we observed the characteristics. Such a probability, the apriori probability (π), needs to be taken into account. The program takes π into consideration by using an array of apriori indicators, which contains the relative probabilities of belonging to each group. The values entered by the user can be in any measurement (number of cases, probabilities, ratios), and the program normalizes these values into probabilities before calculation. The default example in the Bayesian Classification (Adjust Reference Table) Program Page is "1 1 1", indicating that the apriori probabilities in the 3 groups are the same (normalized to 0.33 each). The results of the calculation are then the same as those from the Maximum Likelihood table.
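A sketch of how the apriori array is folded in: the user's raw values are normalized to probabilities, each P(pattern|group) entry is weighted by its group's prior, and each row is re-normalized. The P(pattern|group) values here are invented for illustration; with an equal-prior array such as "1 1 1" the result reproduces the Maximum Likelihood table, while an unequal array (for example the "1 2 4" array discussed on this page) shifts probability toward the favoured group:

```python
# Sketch of Step 2: weighting the reference table by apriori probabilities.
# The reference probabilities below are hypothetical, not the page's table.

def apply_prior(p_x_given_j, prior):
    pi = [v / sum(prior) for v in prior]  # normalize raw prior values
    out = {}
    for pattern, row in p_x_given_j.items():
        weighted = [pi[j] * row[j] for j in range(len(row))]
        total = sum(weighted)
        out[pattern] = [w / total for w in weighted]  # re-normalize the row
    return out

p_x_given_j = {"++": [0.4, 0.2, 0.1], "--": [0.1, 0.3, 0.6]}

equal  = apply_prior(p_x_given_j, [1, 1, 1])  # same as Maximum Likelihood
skewed = apply_prior(p_x_given_j, [1, 2, 4])  # unequal apriori array
print(equal["--"], skewed["--"])
```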
If we were to use the reference patterns in, say, Zurich, a predominantly German speaking part of Switzerland, we might find that, for each Italian in town, there are 2 Frenchmen and 4 Germans, so the apriori array is "1 2 4": the probability of being German is twice that of being French and four times that of being Italian. If we add such an apriori array to the calculation, the program first normalizes "1 2 4" to proportions of "0.14 0.29 0.57", meaning the apriori probabilities are 14% Italian, 29% French, and 57% German. The Bayesian Probability table taking these apriori probabilities into consideration is shown to the right. Because the probability of being German is greater, all those without brown eyes are classified as German, while the probability (certainty) of classifying cases into the other groups is reduced.

Step 3 : The construction of the Bayesian Probability Table, taking into consideration both the apriori probability of belonging to each group and a cost function for error. The table is also called P(group|pattern,π,cost) or P(j|x,π,cost). The cost function for a group conceptually represents a measurement of cost or loss if a case erroneously fails to be assigned to that group. An obvious example is the diagnosis of a swelling on the face: it can be a bruise, an infection, or a cancer. To miss a cancer when there is one would be much more serious (a greater cost) than missing an infection, and that in turn more serious than missing a bruise. Common practice is to include the cost function after including the apriori probabilities; if cost is to be considered without apriori, the apriori array can be given equal values for all groups. The unit for cost can be any measurement, in money, time, or arbitrary units of judgement; the program normalizes the array into fractions before use.
The default example for costs in the Bayesian Classification (Adjust Reference Table) Program Page is "1 1 1", indicating that there is no cost difference between the groups. However, if we were looking desperately for an Italian interpreter in Zurich for an important function, missing an Italian might cost 4 times as much as missing a Frenchman or a German, and we might use a cost array such as "4 1 1", which the program normalizes to "0.67 0.17 0.17"; the results are shown in the table to the right. We would now assign anyone with brown eyes as Italian, and the rest as German. This is because the probability of being German is greater, but missing an Italian incurs a greater cost. We would not assign anyone to be French at all.
Bayesian Probability
Bayesian probability is based on a relatively simple premise: if we know the probability of a set of observations in each situation, then, when presented with that set of observations, we can calculate the probabilities of the alternative situations. In StatTools the term pattern is used to represent one or more observations, and group is used to represent situations. More formally put :
The 3 most common Bayesian functions are therefore :
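The formulas themselves appear as images on the original page and are not reproduced in this text version. Reconstructed from the calculations described in the examples above, the three functions are presumably:

```latex
% Reference table: relative frequency of pattern x in group j,
% where n_{xj} is the count of pattern x in group j and n_j the group total
P(x \mid j) = \frac{n_{xj}}{n_j}

% Maximum Likelihood: row-normalization of the reference table
P(j \mid x) = \frac{P(x \mid j)}{\sum_k P(x \mid k)}

% Bayesian probability with apriori array \pi
P(j \mid x, \pi) = \frac{\pi_j \, P(x \mid j)}{\sum_k \pi_k \, P(x \mid k)}
```

The cost adjustment of Step 3 extends the third function in the same way, multiplying each term by the normalized cost fraction for its group.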
The Bayesian Classification Algorithm

The algorithm, as described in the example panel of this page and carried out in the Bayesian Classification (Analysis of Reference Data) Program Page and the Bayesian Classification (Adjust Reference Table) Program Page, represents the most basic Bayesian model. This model is often used to introduce students to Bayesian concepts. The advantages of using this model are :
The disadvantages are :
References

Wikipedia. History and basic theoretical consideration of Bayes' Theorem.
Wikipedia. Modern adaptation and use of Bayesian probability, terminologies, and some formulae.
Overall JE and Klett CJ (1972) Applied Multivariate Analysis. McGraw Hill Series in Psychology. McGraw Hill Book Company, New York. Library of Congress No. 73-147164, ISBN 07-047935-6. p.329-344. This is where I got the algorithm from.