Parametric Paired t Test
Nonparametric Wilcoxon PSRT
The paired difference is a powerful and commonly used model in clinical research and quality control. By examining the difference in outcomes from different causes in the same individual, variation between individuals is greatly reduced, so a smaller difference can be detected with a small sample.
In clinical research, questions such as whether husbands are older than their wives, whether boys are heavier than girls in non-identical twins, or whether the effects of two different treatments for the same condition differ in the same patient all use the paired difference model.
The paired difference is also commonly used in quality control, with each pair evaluating a measurement against a standard. Questions such as whether the waiting time for operations exceeds a benchmark, or whether blood loss in operations exceeds that expected, also use the paired difference model.
StatTools provides three programs for evaluating paired differences: the parametric paired t test with the 95% confidence interval of the paired difference, the nonparametric Wilcoxon Paired Signed Rank Test, and the Permutation Test. The algorithms for calculation are in the Paired Difference Programs Page, and each test is discussed in its own panel.
Paired Difference Sample Size Example
Parametric paired difference calculates the difference between the pairs of values (d = v1 - v2), summarizes these as n, mean, Standard Deviation, and Standard Error of the mean. It then evaluates the mean and its Standard Error against the null hypothesis that mean = 0.
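The summary described above can be sketched in a few lines of Python. This is a minimal illustration rather than the StatTools algorithm: the function name `paired_summary` and the birth weights are hypothetical, and the Standard Deviation is assumed to be the sample (n - 1) form. The p value and confidence interval additionally need the t distribution with n - 1 degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_summary(v1, v2):
    """Summarize paired differences d = v1 - v2 and test mean(d) against 0."""
    d = [a - b for a, b in zip(v1, v2)]   # paired differences
    n = len(d)
    mean_d = mean(d)
    sd_d = stdev(d)                        # sample SD (n - 1 denominator), an assumption
    se_d = sd_d / sqrt(n)                  # Standard Error of the mean difference
    t = mean_d / se_d                      # t statistic for H0: mean difference = 0
    return {"n": n, "mean": mean_d, "sd": sd_d, "se": se_d, "t": t, "df": n - 1}

# Hypothetical birth weights in grams, made up for illustration only
first = [2510, 2390, 2705, 2280, 2600]
second = [2450, 2480, 2610, 2300, 2520]
print(paired_summary(first, second))
```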
Two tests are performed.
The Sample Size for Mean of Paired Difference Explanations and Tables Page provides 4 programs for sample size issues related to the parametric paired comparison. The theory and interpretation of these programs are discussed generally in the Sample Size Introduction and Explanation Page, and only details specific to paired differences are discussed here.
Two programs are available to assist in the planning phase of a research project
Two programs are available to assist in the evaluation of the data in the analysis phase of a research project
The following example uses computer generated data to demonstrate the processes.
We wish to know, in twin deliveries, whether the first twin is bigger or smaller than the second twin. The paired difference is diff = wt_twin1 - wt_twin2.
Using Sample Size for Mean of Paired Difference Explanations and Tables Page, and providing a Standard Deviation of paired differences of 150g, we obtained the table as shown to the right.
With a sample size of 15 pairs, further increases in sample size will only reduce the confidence interval of the difference by 10g per pair (5%). We therefore decided that it would not be cost efficient in a pilot study to exceed this, and conducted a pilot study using 15 pairs of twin deliveries.
The pilot study indicated that there was no insurmountable barrier to mounting a successful project, and that the Standard Deviation of the paired differences did not contradict our initial estimate of 150g. We therefore decided to proceed with the main project.
Sample size Estimation. We calculate the sample size requirement as follows.
Using the first two columns in the Paired t Test in the Paired Difference Programs Page , the results, rounded to the nearest gram, are as follows.
Mean Difference = -101
Standard Deviation of Difference = 456
Standard Error of Mean = 105
t Test : t = -0.9659 df = 19 p(α) = 0.35
95% Confidence Interval of Paired Difference (two tail) = -320 to 118
The immediate conclusion is that there is no significant difference related to the order of birth. The results are, however, confusing, as the mean difference found was 101g, which exceeds the critical value determined at the time of planning.
Returning to the Sample Size for Mean of Paired Difference Program Page , the power estimation shows this result to have a power of 0.15, far short of the 0.8 we stipulated during planning. The reason for this lack of power is that the Standard Deviation of the paired difference, 456g, was far greater than the 150g envisaged at the time of planning.
At this point, a decision is required on how to proceed. The options, as explained in the Probability Introduction and Explanation Page, are as follows.
If we choose option 2, we will either settle for an inconclusive result, or recalculate the sample size based on the observed Standard Deviation. Detecting a paired difference of 100g when the Standard Deviation is 456g, with α=0.05 and power=0.8, requires 166 pairs in a two tail model.
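The recalculated sample size can be checked with a short sketch. The StatTools algorithm is not reproduced on this page, so the code below assumes a common textbook approximation: the normal-theory formula with a small-sample correction of z²/2 added. The function name `pairs_needed` and its parameters are hypothetical. Under these assumptions the calculation agrees with the 166 pairs quoted above.

```python
from math import ceil
from statistics import NormalDist

def pairs_needed(diff, sd, alpha=0.05, power=0.8, two_tail=True):
    """Approximate number of pairs to detect a mean paired difference `diff`.

    Assumed formula (not taken from StatTools):
        n = ((z_alpha + z_beta) / (diff / sd))**2 + z_alpha**2 / 2
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tail else z(1 - alpha)
    z_beta = z(power)
    effect = diff / sd                       # standardized effect size
    n = ((z_alpha + z_beta) / effect) ** 2 + z_alpha ** 2 / 2
    return ceil(n)                           # round up to whole pairs

print(pairs_needed(100, 456))                # difference 100g, SD 456g
```

The same function can be reused for any other combination of difference, Standard Deviation, α, and power, for example when planning around the 150g Standard Deviation assumed before the pilot study.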
The Wilcoxon Paired Signed Rank test is a nonparametric equivalent of the Paired t test. The procedures are as follows.
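As a rough illustration, the usual textbook form of the calculation can be sketched in Python: pairs with no change are dropped, the absolute differences are ranked (tied values share the average rank), and the ranks are summed separately for positive and negative differences. The function name `signed_rank_sums` and the scores are hypothetical, and the lookup of significance from the rank sums is not included.

```python
def signed_rank_sums(before, after):
    """Return (n, W_plus, W_minus) for paired scores, using d = after - before."""
    diffs = [b - a for a, b in zip(before, after) if b != a]  # drop zero differences
    ordered = sorted(diffs, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # find the run of tied absolute differences starting at position i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return len(diffs), w_plus, w_minus

# Hypothetical headache scores (0 to 3) before and after treatment
before = [3, 2, 3, 1, 2, 3]
after = [1, 2, 2, 2, 0, 0]
print(signed_rank_sums(before, after))
```

The smaller of the two rank sums is the statistic normally compared against tables, or against a normal approximation when the number of non-zero pairs is large.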
Sample size for Wilcoxon Paired Signed Rank Test is discussed in Sample Size for Mean of Paired Difference Explanations and Tables Page and not repeated here.
The data in the examples are made up to demonstrate the methods.
We wish to study whether a new analgesic is effective in relieving headaches.
We ask the subjects to describe their headache as none (0), some (1), moderate (2), and severe (3), a 4 point scale (0 to 3), before and after administering the analgesics, and use the paired differences to evaluate the analgesics.
As the paired difference can be from -3 to +3, the range is 6. We can therefore estimate the standard deviation as 6/3.92 = 1.53. We would like our data to be able to detect a paired difference of 1, so our effect size = 1/1.53 = 0.65
We set the power of the study to 0.8. As the Wilcoxon test has an asymptotic efficiency of about 0.955 relative to the t test, we use a power of 0.8/0.955=0.84 in the calculation of our sample size.
We set α=0.05, power=0.84, and diff/SD=0.65. Using the table of sample size for the paired t test in Sample Size for Mean of Paired Difference Explanations and Tables Page, we find the sample size to be 23 subjects (pairs).
This is shown in the table to the right, and the table of counts constructed from this is shown in the table to the left.
There were 8 subjects whose headache scores did not change (0), and these are not included in the table of counts.
On the negative side, there were:
We can therefore conclude that headaches decreased significantly after receiving the analgesic.
The Permutation Tests are the most basic of statistical tests, from which other models have developed. StatTools presents two models, the significance test for paired differences presented in Paired Difference Programs Page , and the significance test comparing two groups presented in Unpaired Difference Programs Page .
The general principle is that, in a randomly allocated study, each value obtained could have been in either of the paired measurements. The test consists of calculating every possible permutation of the data and examining the results. If the result from the original data is near the extremes (e.g. below the 5th percentile or above the 95th percentile in a one tail model), then a decision can be made that it is unlikely to have arisen under the null hypothesis and is therefore statistically significant.
The advantages of using the Permutation tests are :
The disadvantages of using the tests are related to the computation intensity required, both in memory use and in computation time. The number of permutations is 2^n, where n is the number of pairs. Computation time therefore increases exponentially with increasing sample size, and a large dataset may either crash the program when available RAM is exhausted, or make the computation unacceptably long.
The Permutation Test is therefore ideal for handling small sets of interval data with uncertain distributions. With larger sample size, the more common non-parametric (Wilcoxon PSRT or Mann-Whitney U Test) and parametric (Paired t test) tests should be preferred.
In theory, the Permutation Test can cope with any number of pairs. However, a probability of <0.05 is not possible with fewer than 6 pairs unless the differences are uniformly in one direction, and computation will take an unacceptably long time with 22 pairs or more.
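These limits follow from simple arithmetic, sketched below. The sketch assumes no zero differences and obtains the two tail probability by doubling the one tail value; the function name `smallest_p` is hypothetical. When every difference is in the same direction, the observed sum is the single most extreme of the 2^n permutations, which gives the smallest probability the test can report.

```python
def smallest_p(n_pairs, two_tail=True):
    """Smallest achievable probability with n_pairs non-zero paired differences."""
    n_perm = 2 ** n_pairs          # every pair can be swapped or left as it is
    p_one_tail = 1 / n_perm        # only the observed arrangement is this extreme
    return min(1.0, 2 * p_one_tail) if two_tail else p_one_tail

for n in (4, 5, 6, 16, 22):
    print(n, 2 ** n, smallest_p(n, two_tail=False), smallest_p(n))
```

The second column of the output also shows how quickly the number of permutations grows, which is the source of the long computation time with larger samples.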
The mathematical argument of the Permutation Test is as follows
Step 1. The paired difference (v1-v2) for each case is calculated, as shown in the table to the right. The first 2 columns are the paired measurements from the data, and the third column is the difference. There are 16 pairs, and the sum of the paired differences is -3.11.
Step 2. Calculating the mathematics of permutation. Given 16 pairs, there are 2^16 = 65536 possible combinations. In a two tail model at α<0.05, the most extreme 0.025 (2.5%) of the permuted sums lie in each tail. One would therefore expect 0.025 x 65536 = 1638 values at each extreme which can be considered unlikely, and therefore significantly different from null.
Step 3. The sum of differences is calculated for every permutation, and each is compared with the sum from the original data to determine whether it is less than, the same as, or greater than it. Of the 65536 permutations, 22622 values are less than -3.11 and 42880 values are greater than -3.11. Both counts are more than the 1638 that defines the decision border for the α=0.05 two tail model. We therefore conclude that the null hypothesis cannot be rejected, or that the paired difference is not statistically significant.
Looking at it another way, there are 22623 values less than -3.11, so -3.11 is the 22624th value from the minimum, or the 22624/65536 x 100 = 34.52th percentile of all possible values, which is neither below the 2.5th percentile nor above the 97.5th percentile required by an α of 0.05 in the two tail model. In other words, the probability of Type I Error (α) = 0.35 for the one tail model, doubled to 0.69 for the two tail model.
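The three steps can be sketched as an exhaustive sign-flip calculation in Python. This is an illustration only, not the StatTools program: the function name `paired_permutation_test` and the data are hypothetical, ties with the observed sum are counted in the tail (an assumption), and the two tail probability is taken as double the one tail value, following the description above.

```python
from itertools import product

def paired_permutation_test(diffs, tol=1e-9):
    """Enumerate all 2^n sign assignments of the paired differences."""
    observed = sum(diffs)                       # Step 1: sum of paired differences
    less = equal = greater = 0
    for signs in product((1, -1), repeat=len(diffs)):   # Step 2: all 2^n permutations
        total = sum(s * d for s, d in zip(signs, diffs))
        if total < observed - tol:              # Step 3: compare with the observed sum
            less += 1
        elif total > observed + tol:
            greater += 1
        else:
            equal += 1
    n_perm = 2 ** len(diffs)
    tail = less if observed < 0 else greater    # tail on the side of the observed sum
    p_one = (tail + equal) / n_perm
    return {"sum": observed, "less": less, "equal": equal, "greater": greater,
            "p_one_tail": p_one, "p_two_tail": min(1.0, 2 * p_one)}

# Hypothetical paired differences (v1 - v2), made up for illustration
diffs = [-1.2, 0.4, -0.9, 0.3, -0.7, 1.1, -0.5, -0.6]
print(paired_permutation_test(diffs))
```

Because the loop visits every one of the 2^n sign assignments, the running time doubles with each additional pair, so the sketch is only practical for small samples.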
Paired t Test :
Armitage P. Statistical Methods in Medical Research (1971). Blackwell Scientific Publications. Oxford. P.189-207
Sample Size :
Wilcoxon Paired Signed Rank Test :
Permutation Test for Paired Differences :