Please note: the data presented in all course material for the statistical module are computer-generated to demonstrate the methodologies, and should not be confused with actual clinical information.
This page provides a discussion of Probability, the theoretical basis for all the subsequent content and calculations in the statistics module.
Although the subjects are discussed in some detail, students need not memorize all the explanations or the histories of development. Students should try to follow the logic, so that they can understand how to interpret the results of statistical calculations. Students should, however, understand and remember a number of key technical terms explained in this page, specifically:
Why Probability
Normal & t Distribution
Why Probability and Statistics

The scientific approach that characterises western civilisation is based on reproducible empirical observations. The idea is that repeatedly observed relationships or differences are more likely to reflect reality. Another way of saying this is that a proposition cannot be accepted unless it is supported by repeated observations. The problem with repeated observations is that the results, though often similar, are not always the same. Experience compels us to abandon the idea that something is either true or false. Rather, we increasingly see true and false merely as extremes, while most of reality is a continuum in between. Similarly, when we consider a scale (e.g. how tall a man is), we can only state an approximation, a range that most observations would fit in. The uncertainty of reality therefore needs to be approached in a consistent and logical way. Probability is a measurement of how likely things are to occur, and is one of the ways to represent uncertainty. Statistics is the set of tools to handle probability.

How do we handle Probability

Clinicians often present probability as a percent. Mathematicians and statisticians, however, usually use a number between 0 and 1. This module will try to familiarize students with both notations, and will use either; a probability of 0.25 and 25% mean the same thing.

Probability can be established by observation. For example, if we examine all the children in a class, and there are 20 boys and 15 girls, then we can conclude that the probability of being a girl in that class is 15/(20+15) = 15/35 = 0.43 or 43%.

Probability can also be calculated from a theoretical construct. For example, if we toss a coin, the result can only be one of the two outcomes of heads or tails, so the probability of getting a head is 1/2, 0.5, or 50%. Similarly, if we roll a die, the result can only be 1 of the 6 numbers, so the probability of obtaining any particular number is 1/6 = 0.17 or 17%.
By default, all results produced by programs in this module have a precision of 4 decimal places. However, by convention probability is presented to a precision of 2 decimal places, or at most 3 (e.g. 0.254 or 25.4%). Students should therefore edit the results of calculations to conform to common practice.
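As a minimal sketch, the class example above can be computed and rounded in Python; the formatting choices below are ordinary Python conventions, used here only to illustrate the two precisions and the percent notation, and are not the module's own programs.

```python
boys, girls = 20, 15
p_girl = girls / (boys + girls)   # 15/35 = 0.4285714...
print(f"{p_girl:.4f}")            # 0.4286 - the 4-decimal precision of the programs
print(f"{p_girl:.2f}")            # 0.43   - the conventional 2-decimal presentation
print(f"{p_girl:.0%}")            # 43%    - the same probability as a percent
```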
Mean
Ancient Phoenician traders carried their merchandise by boat, and often overloaded their boats to make more money. In a storm, some of the merchandise would be thrown overboard to save the boat. A common practice was to compensate the traders who lost their merchandise by contributions from those who did not. A sophisticated method of calculating how to do this was developed, and the system was called "Havara". The term Havara evolved through the centuries, and eventually became "average".

There are 3 presentations of average: the mode, which is the most common value; the median, which is the value that divides all the values into two equal groups; and the mean, which is a mathematical function where mean = sum of all values / number of values. Statisticians sometimes use the median, but most commonly use the mean. This module will use the mean to represent average.

Normal Distribution

The astronomer Gauss tried to measure distances between stars, and noticed that it was difficult to reproduce his measurements exactly. However, his measurements clustered around a central value, being more common near the mean and less common further away from it. He then noticed that this pattern applied whenever he made measurements of anything, so he named it the Normal distribution.

Standard Deviation

De Moivre derived a formula for the Normal distribution curve in mathematical terms, so that the various components of the Normal distribution could be handled mathematically. The mean (abbreviated to μ) became the measure of central tendency, and the Standard Deviation (abbreviated to SD or σ) a measure of dispersion.

Probability of z

Fisher used calculus to calculate the area under the Normal distribution curve.
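The three presentations of average can be illustrated with Python's standard statistics module; the values below are made up purely for illustration.

```python
import statistics

values = [1, 2, 2, 3, 4, 4, 4, 7]   # hypothetical values for illustration
print(statistics.mode(values))      # 4     (the most common value)
print(statistics.median(values))    # 3.5   (divides the values into two equal groups)
print(statistics.mean(values))      # 3.375 (sum of all values / number of values = 27/8)
```

Note that the three averages of the same data can differ, which is why it matters that this module consistently uses the mean.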
Fisher argued that, if the total area under the curve represents all the possibilities (probability = 1 or 100%), then the area beyond a defined distance from the mean represents the probability of having a value greater than that distance. Fisher standardized the distance from the mean and called it the Standard Deviate z, which is expressed in units of Standard Deviation (SD or σ). The concept of the Standard Deviation becomes very useful, as any value in a distribution of known mean and SD can be translated to z, where z = (value − mean) / SD, and the relationship between z and probability is constant. For example, we know that the probability of having a z value >1.65 is 0.05 (5%), and >1.96 is 0.025 (2.5%). The program in StatPgm 1. Probability of z and t provides a calculation relating z and probability.

95% Confidence Interval, One and two tail model

As statistics is the science of handling uncertainties, an expression of a measurement is the range of likely values, the most common form of which is the 95% confidence interval. This means that, if we make repeated observations, we expect the observed value to be within this interval 95% of the time. We can calculate this easily using the relationship between z and probability. There are two ways of doing this.
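The constant relationship between z and the one-tailed probability can be sketched with the standard library's error function. This is a generic approximation of the Normal tail area, not the module's StatPgm program.

```python
from math import erf, sqrt

def upper_tail_p(z):
    """One-tailed probability P(Z > z) under the standard Normal curve."""
    return 0.5 * (1 - erf(z / sqrt(2)))

print(upper_tail_p(1.65))   # ≈ 0.05, the one-tailed value quoted above
print(upper_tail_p(1.96))   # ≈ 0.025 in one tail (0.05 over both tails)
```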
The problem with the Normal distribution is that it works best when the sample size is very large; the probability distribution becomes increasingly wider as the sample size becomes smaller, as shown in the diagram to the left. Gosset, who called himself Student, derived a correction of the probability estimate according to sample size and called it t, and this became known as Student's t. Student's t allows the use of a small number of measurements to estimate what may be true of the whole population. This forms the basis of modern inferential statistics, where a small number of observations are made and the results are generalized to the wider population. The t distribution curve is wider than the Normal one. Therefore, a larger area (or higher probability) of being greater than a particular deviate is obtained compared with the Normal distribution. This difference varies with sample size (degrees of freedom), such that the probability of t approaches that of z as the sample size increases towards infinity. Conceptually, this is represented by the diagram to the left. With infinite degrees of freedom (i.e., a very large sample size), the one-tailed t and z have the same value for a particular probability, but with fewer cases, t must be larger than z to obtain the same probability. The relationship between t, the degrees of freedom, and probability can be calculated using the program in StatPgm 1. Probability of z and t.
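A rough illustration of how the t tail probability exceeds the Normal one when the degrees of freedom are few, using simple numerical integration of the t density. This is an illustrative approximation, not the StatPgm program.

```python
from math import exp, lgamma, sqrt, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, hi=60.0, steps=100_000):
    """P(T > t), approximated by trapezoidal integration from t up to hi."""
    width = (hi - t) / steps
    area = 0.5 * (t_pdf(t, df) + t_pdf(hi, df))
    for i in range(1, steps):
        area += t_pdf(t + i * width, df)
    return area * width

# The same deviate carries a larger tail probability with few degrees of freedom:
print(t_upper_tail(1.96, 5))     # ≈ 0.054
print(t_upper_tail(1.96, 1000))  # ≈ 0.025, close to the Normal (z) value
```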
To calculate the 95% confidence interval for a set of observations therefore requires the following steps.
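The steps can be sketched as: compute the sample mean, the sample Standard Deviation, and the Standard Error of the mean, then look up the two-tailed 95% t value for n − 1 degrees of freedom and take mean ± t × SE. A minimal Python sketch with hypothetical data follows; the value 2.262 is the tabulated two-tailed 95% t for 9 degrees of freedom.

```python
from math import sqrt
import statistics

observations = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]  # hypothetical data, n=10
n = len(observations)
mean = statistics.mean(observations)      # step 1: sample mean = 13.5
sd = statistics.stdev(observations)       # step 2: sample Standard Deviation
se = sd / sqrt(n)                         # step 3: Standard Error of the mean = 0.5
t_crit = 2.262   # step 4: two-tailed 95% t for n - 1 = 9 degrees of freedom
lower, upper = mean - t_crit * se, mean + t_crit * se   # step 5: mean ± t × SE
print(lower, upper)                       # ≈ 12.37 to 14.63
```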
Sample mean and Standard Error of the mean
After establishing the concept of Standard Deviation, Fisher went on to develop the idea of the Standard Error of the Mean (SE for short). He argued that the true mean is difficult to find, as this would require measuring everyone in a population, or measuring an infinite number of times. The mean value obtained in a set of observations is therefore only the sample mean, an estimate of the underlying true mean, and this would vary from sample to sample. An estimate of this variation is called the Standard Error of the Mean (SE). Conceptually, the Standard Error represents the Standard Deviation of the mean values if repeated samples of the same size were taken: the mean value is calculated for each repeated sample from the population, and the SD of these mean values equals the SE of the mean.

Difference between two means and its Standard Error

Extending the argument of sample means, Fisher argued that the difference between two means is itself a mean, and the Standard Error of this difference can be estimated from the Standard Deviations of the two groups.

The null hypothesis

If all the arguments up to this point are accepted, and if the difference between two means is a true reflection of population differences, then the probability of any theoretical difference can be calculated using the z value, where z = (theoretical difference − observed difference) / Standard Error of the difference. Fisher then proposed the null hypothesis. He asked, given the observed difference and its Standard Error, what is the probability that this represents a theoretical difference of null (0)? This is the probability of z, where z = (0 − Difference) / Standard Error of the difference. In other words, the probability of z is the probability that there is no difference between the groups, as shown in the diagram to the right.
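The argument can be sketched numerically. The group summaries below are hypothetical, and the formula SE_diff = sqrt(SE1² + SE2²) is the usual large-sample estimate of the Standard Error of a difference between two independent means.

```python
from math import erf, sqrt

# Hypothetical summary data for two groups (mean, Standard Deviation, size)
mean1, sd1, n1 = 13.5, 1.6, 50
mean2, sd2, n2 = 12.6, 1.5, 50

se1 = sd1 / sqrt(n1)                  # Standard Error of each mean
se2 = sd2 / sqrt(n2)
diff = mean1 - mean2                  # observed difference between means
se_diff = sqrt(se1 ** 2 + se2 ** 2)   # Standard Error of the difference
z = (diff - 0) / se_diff              # deviation of the difference from null (0)
p_two_tail = 1 - erf(abs(z) / sqrt(2))   # two-tailed probability under the null
print(z, p_two_tail)                  # ≈ 2.90 and ≈ 0.004
```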
Probability of Type I Error, α or p

Fisher, being a mathematician, presented the null hypothesis in terms of a mathematical proof, in the following series of arguments.
Probability of Type II Error, β, and Power

Fisher's α worked very well in industry, being useful for comparing a new method of manufacturing to an existing one. When α is low (p<0.05), a decision that the observed difference is real can be made. The problem arises when α is high (p>0.05), as failure to reject the null hypothesis does not mean accepting it; we cannot decide that the difference is zero, so no statistical conclusion can be drawn. To fix this, Pearson proposed an additional Alternative Hypothesis, that the difference is not null, which is shown to the left. Following Fisher's style of argument, he called the error of wrongly rejecting the alternative hypothesis the Type II Error, and abbreviated the probability of a Type II Error to beta (β). With this proposal, any difference found between two groups can reject the null hypothesis according to α and reject the alternative hypothesis according to β. Although Pearson's initial proposal was theoretically sound, it was not practical, as a non-null value can be anything between −∞ and +∞ except 0. To make his theory work, Pearson proposed the following, shown in the diagram to the right.
The term Power is often used instead of the probability of Type II Error, where Power = (1 − β). Conceptually, power represents the probability of detecting a difference if it really exists.

The 95% confidence interval of the difference

Since 1980, researchers have increasingly become doubtful about Pearson's model, as on many occasions the research results obtained using this model were unstable and could not be replicated. There are two related reasons for this failure.
Introduction
Most statistical calculations are based on the assumption of parametric data: that the measurements are continuous and Normally distributed, with a range between −∞ and +∞. In reality, these assumptions are seldom true. However, the data are usually close enough to being parametric that any error of assumption is trivial and does not seriously affect the interpretation of results. This section discusses situations that depart from the assumptions of parametric statistics to the extent that the data must be adjusted or handled in a different manner.

Data requiring transformation to become Normally distributed

These are measurements which are continuous but, in their original form, not Normally distributed. By transforming the values mathematically, they can be altered to become Normally distributed.
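A sketch of such a transformation, using hypothetical positively skewed (log-normally distributed) data: taking logarithms removes most of the skew. The skewness function below is a simple illustrative measure, not the module's own program.

```python
import random
import statistics
from math import log

def skewness(xs):
    """Simple sample skewness: mean cubed deviation / SD cubed."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(1)
# Hypothetical positively skewed measurements (log-normally distributed)
raw = [random.lognormvariate(0, 1) for _ in range(2000)]
logged = [log(x) for x in raw]

print(skewness(raw))     # strongly positive: long right tail
print(skewness(logged))  # close to 0: roughly Normally distributed after the log
```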
Proportions

Although the individual risk of each case requires a difficult transformation, the proportion in a group or cluster of data can be transformed into a mean and Standard Error, and subjected to analysis as if it were parametric. The proportion (p) itself can be treated as if it were a mean, and the Standard Error of the proportion = sqrt(p(1−p)/n), where n is the sample size.

Ordinal data

These are measurements which are ordered, in that 3>2>1, but the intervals are not constant or defined (3−2 is not necessarily the same as 2−1). The data usually have no definable distribution pattern. A typical example is the Likert score, where 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, and 5=strongly agree. Here the difference between strongly agree and agree (5−4) is not the same as the difference between neutral and disagree (3−2). Ordinal data require special mathematical treatment, and this module provides 3 examples of calculations for ordinal data.
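Using the class example from earlier in the page, the proportion and its Standard Error can be computed as follows; the 95% interval shown uses the Normal approximation (z = 1.96), which is a simplifying assumption.

```python
from math import sqrt

girls, n = 15, 35              # the class example from earlier in this page
p = girls / n                  # proportion = 0.43
se = sqrt(p * (1 - p) / n)     # Standard Error of the proportion ≈ 0.084
lower, upper = p - 1.96 * se, p + 1.96 * se   # approximate 95% confidence interval
print(p, se, lower, upper)
```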
Statistical power is the probability of detecting a significant difference. It depends on the variation (Standard Deviation) of the data and the sample size.
With larger sample sizes, statistics is able to reject the null hypothesis with smaller differences. The diagram to the left shows an example of this. In this diagram, the Standard Deviation of the data is 10, and the difference between the two mean values is 2 (one fifth of the Standard Deviation). The 95% confidence intervals of the difference for different sample sizes are shown in this Forest plot. It can be seen that, for this combination of Standard Deviation and difference, 250 cases per group are required to demonstrate statistical significance.

The importance of sample size

The previous example shows that an appropriate sample size must be used in order not to produce misleading results. A sample size that is too small will fail to detect a real difference even when it is there. A sample size that is unnecessarily large will waste resources, place research subjects under unnecessary risk and discomfort, and inconvenience colleagues and helpers. Increasingly, therefore, a failure to estimate the appropriate sample size is considered a sign of poorly conceived and conducted research, resulting in refusal by regulators to approve the research, by funding bodies to fund it, and by journal editors to publish the results.

Estimation of sample size

In principle, sample size requirements are estimated based on the following 4 parameters.
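As a sketch of how such parameters combine (assuming they are the conventional four: significance level α, power, Standard Deviation, and the difference to be detected), the standard formula for comparing two means is n = 2((z_α + z_β) × SD / difference)² cases per group. This is the textbook formula, not necessarily the exact method used by the module's programs.

```python
from math import ceil

def n_per_group(sd, diff, z_alpha=1.96, z_beta=0.8416):
    """Cases per group to compare two means.
    Defaults assume the conventional two-tailed alpha of 5% (z = 1.96)
    and 80% power (z = 0.8416)."""
    return ceil(2 * ((z_alpha + z_beta) * sd / diff) ** 2)

# The scenario from the Forest plot example: SD = 10, difference = 2
print(n_per_group(sd=10, diff=2))   # 393 cases per group for 80% power
```

Note that this is larger than the 250 per group of the Forest plot example: 250 per group makes a difference of 2 just reach significance, while 393 per group gives an 80% chance (power) of detecting it.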
