Introduction
1 Probability(z&t)
2 One Group
3 Two Groups
4 MetaAnalysis
5 Tests
6 Graphics
7 Utilities

What is StatPgm?
StatPgm consists of a number of web-based computer programs that perform all the statistical calculations used in the module.

Why StatPgm?
The reasons for providing StatPgm to students are as follows.
Does StatPgm have any issues?
Students should be aware of the following issues.
Are there other statistical and graphics programs that students can use?
All StatPgm programs are modified from those on the Dept. O & G statistical website, StatTools. The programs on that site are designed more for researchers and statisticians, so more options and output are provided. However, there are over 150 pages on that site, so students will have to search for the specific program needed.

SPSS is probably the most popular and easy-to-use commercial statistical package available, but it is expensive unless the student has a university license. The older versions do not have sample size estimations, and the graphic facilities are difficult to use. More recent versions may be better.

STATA, MatLab, and SAS are the most comprehensive and powerful packages, and are preferred by mathematicians and statisticians. The learning curve is, however, steep, and the packages are very expensive.

All graphics produced by MacroPlot in StatPgm can also be produced by a combination of procedures in Excel and Powerpoint, and in many instances it is easier to use Excel than StatPgm.

What will the rest of this page discuss?
The rest of this page discusses how to use the StatPgm programs, how to format the data entry, and, briefly, what the results represent. Detailed discussion of theory, when to use the procedures, and other statistical considerations is left to the content pages for individual research situations.
StatPgm 1. Probability of z and t provides the following procedures
Procedure 1.a. Probability of z calculates probability from z, and z from probability.
z is shorthand for the Standard Deviate (z = (Value - mean) / Standard Deviation), and has a mean of 0 and a Standard Deviation of 1. In a Normally distributed measurement, the probability of being z Standard Deviations away from the mean can thus be estimated. Likewise, from any probability, the z value can be calculated.
Please note that this program calculates the probability of a z value further away from 0, so a positive and a negative z of the same size produce the same probability, and a probability greater than 0.5 produces the same z as 1 - that probability. In other words, a z of 1.65 and a z of -1.65 both result in a probability close to 0.05, and probabilities of 0.05 and 0.95 both result in a z close to 1.65.

Procedure 1.b. Probability of t calculates probability from t and degrees of freedom, and t from probability and degrees of freedom.
The Normal distribution is based on the assumption of infinite (or at least large) sample size. As the sample size decreases, results based on z become increasingly inaccurate. Student's t, most commonly referred to as just t, is z corrected for the degrees of freedom, which is a function of sample size.
As t is a correction of z for small sample size, the value of t approaches z as sample size increases towards infinity. With a sample size of 300, t is the same as z to 2 decimal places.
Please note that this program calculates the probability of a t further away from 0, so a positive and a negative t of the same size produce the same probability, and a probability greater than 0.5 produces the same t as 1 - that probability. Please also note that two results are produced, for the one and two tail models. The two tail model splits the probability outside the t value equally between the two ends of the distribution.

Procedure 1.c. Values and Percentiles calculates percentiles from values, and values from percentiles, in a set of Normally distributed measurements when the sample size, mean, and Standard Deviation are available. The calculation is based on the one tail t distribution.
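The z calculations in Procedure 1.a can be sketched with the Python standard library. This is an illustration only, not the StatPgm code; the function names are hypothetical, and it covers z only, because the standard library has no t distribution.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, Standard Deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def tail_probability(z: float) -> float:
    """One-tail probability of a value further from 0 than z.

    The sign of z is ignored, so +z and -z give the same probability.
    """
    return 1.0 - STANDARD_NORMAL.cdf(abs(z))


def z_from_probability(p: float) -> float:
    """Positive z whose one-tail probability is p.

    A probability above 0.5 is folded to 1 - p, so p and 1 - p give the same z.
    """
    p = min(p, 1.0 - p)
    return STANDARD_NORMAL.inv_cdf(1.0 - p)


print(tail_probability(1.65), tail_probability(-1.65))
print(z_from_probability(0.05), z_from_probability(0.95))
```

Both probabilities come out close to 0.05, and both z values come out close to 1.65, in line with the description of Procedure 1.a.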
StatPgm 2a : Survey for Means and Proportions
StatPgm 2b : Correlation and Regression
StatPgm 2a. One Group : Survey for Means and Proportions provides survey statistics to estimate population Standard Deviation, means and population proportions
StatPgm 2a, i. Survey for a Standard Deviation performs two calculations.
Data input for sample size is the 95% confidence interval of tolerable error, measured as a percentage of the Standard Deviation to be found. In the example, we wish to establish a Standard Deviation with a tolerable error of ±5% of that Standard Deviation. The sample size required is 770 measurements.
Data input to estimate the error is the sample size used. The 95% confidence interval of error, measured as a percentage of the Standard Deviation found, is then calculated. In the example provided, 770 measurements were made, and the 95% confidence interval of the error is the Standard Deviation ± 5% of that Standard Deviation.

StatPgm 2a, ii. Survey for a Mean performs two calculations.
Data input for sample size is the Standard Deviation and the error to be tolerated. In the example, we wish to estimate the birth weight of babies. We anticipate a Standard Deviation of 450g, and we want our results to have a 95% confidence interval of ± 100g. The sample size required is 81.
Sample size used and Standard Deviation observed are used to estimate the 95% confidence interval of the mean obtained. In the example, we weighed 81 babies, and found the mean to be 3500g and the Standard Deviation to be 450g. The 95% confidence interval calculated from the sample size of 81 and Standard Deviation of 450 is ±99.5. The 95% confidence interval of the birth weight we estimated is therefore 3500±99.5, or 3400.5g to 3599.5g.

StatPgm 2a, iii. Survey for a Proportion performs two calculations.
Data input for sample size is the proportion anticipated or to be detected, and the 95% confidence interval of error to be tolerated. In the example, we wish to estimate the Caesarean Section rate in our hospital. We suspect it to be around 20% (0.2), and we want the accuracy to be within ±5% (0.05). The sample size required is 246 births.
Sample size and the proportion positive are used to estimate the 95% confidence interval of the proportion observed. In the example, we examined 250 births and found 55 Caesarean Sections. The Caesarean Section rate is 0.22 (22%), and the 95% confidence interval is ±0.0514. The 95% confidence interval for the Caesarean Section rate observed is therefore 0.22±0.0514, that is 0.1686 to 0.2714, or 16.9% to 27.1%.
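The two proportion calculations in 2a, iii can be sketched as follows. The sketch assumes the usual Normal approximation with z = 1.96, which reproduces the figures in the example; whether StatPgm uses exactly this method is an assumption, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # conventional z value for a 95% confidence interval (assumed)


def sample_size_for_proportion(proportion: float, error: float) -> int:
    """Sample size needed to estimate a proportion to within +/- error."""
    n = Z_95 ** 2 * proportion * (1.0 - proportion) / error ** 2
    return math.ceil(n)


def proportion_confidence_interval(positives: int, n: int):
    """Observed proportion, its 95% half-width, and the interval limits."""
    p = positives / n
    half_width = Z_95 * math.sqrt(p * (1.0 - p) / n)
    return p, half_width, p - half_width, p + half_width


print(sample_size_for_proportion(0.2, 0.05))
print(proportion_confidence_interval(55, 250))
```

With the inputs from the example, the sample size is 246, and the interval for 55 Caesarean Sections in 250 births is 0.22 ± 0.0514 after rounding to 4 decimal places.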
StatPgm 2b. One Group : Correlation and Regression provides 4 calculations: the sample size to estimate a parametric Correlation Coefficient, the parametric and nonparametric Correlation Coefficients, and regression analysis.
The sample size required to estimate a parametric Correlation Coefficient ρ, using the default values of α=0.05 and power=0.8, depends on a nominated value of ρ that is relevant to the researcher. In our example, the nominated value of ρ is 0.6, and the sample sizes are 16 cases for the one tail model and 19 for the two tail model.
For the parametric Pearson's Correlation Coefficient ρ, the data entry is a table with two columns. Each row holds the data from one case, and the two columns are the x and y values to be used in the calculations. In our example, the crown rump length (column 1) and head circumference (column 2) of 20 babies (20 rows) were used for calculating Pearson's correlation. Two results of statistical significance are provided.
For the nonparametric Spearman's Correlation Coefficient, the data entry is a table with two columns. Each row holds the data from one case, and the two columns are the x and y values to be used in the calculations. In our example, the two columns are Likert Scores (1 to 5) in response to two questions put to 8 women after childbirth. The first question solicits perceptions of painful labour, the second perceptions of quality of care. The Spearman's correlation is ρ = 0.7975, p<0.05. The conclusion is that a significant correlation exists between the severity of pain in labour and the perception of quality of care in women immediately after childbirth.
For Regression analysis, the data entry is a table with two columns, each row holding the data from one case. The first column is the value of x, the independent variable, and the second is the value of y, the dependent variable. In our example, the x variable is gestation at birth in weeks, and the y variable is birth weight in grams. The Regression Coefficient is 230g per week and the constant is -5585g.
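To show how the regression result is read, the coefficient and constant from the example can be put into the line y = constant + coefficient × x. The sketch below only applies the two reported numbers; the function name and the gestation of 40 weeks are chosen for illustration.

```python
REGRESSION_COEFFICIENT = 230.0  # grams per week of gestation, from the example
CONSTANT = -5585.0              # grams, from the example


def predicted_birth_weight(gestation_weeks: float) -> float:
    """Birth weight in grams predicted by the fitted regression line."""
    return CONSTANT + REGRESSION_COEFFICIENT * gestation_weeks


print(predicted_birth_weight(40))
```

At 40 weeks the line predicts 230 × 40 - 5585 = 3615g. A prediction is only meaningful within the range of gestations covered by the data.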
MacroPlot of the data is provided for parametric correlation and regression analysis, consisting of the following
3.a. Two Measurements
3.b. Two Proportions
3.c. Two Regressions
StatPgm 3a. Compare Two Measurements provides calculations comparing two groups of measurements, parametric and nonparametric
StatPgm 3a i. Sample Size for Comparing two means
Data entry requires an estimate of the background, population, or within group Standard Deviation of the measurement concerned, and the difference the user thinks matters clinically. The result is the sample size per group, assuming that the two groups are of equal size. Two results are produced, for the one and two tail models.
Example : We wish to compare birth weight between boys and girls. We expect the Standard Deviation of birth weight to be 400g, and we think a difference of 100g or more would be clinically meaningful. Based on the default Probability of Type I Error α=0.05 and a power (1-β) of 0.8, the sample size is 89 per group for the one tail model and 113 for the two tail model. If we want to know whether boys are heavier than girls, but are not interested in whether the reverse is true (one tail model), we need to weigh 89 boys and 89 girls. If we have no preconceived idea which sex is heavier, and wish to see whether a difference exists either way, we need to weigh 113 boys and 113 girls.

StatPgm 3a ii and iii. Compare two sets of parametric measurements
Two modes of data entry are available
Example : The height of 24 women who delivered by Caesarean section and 25 who delivered normally are compared.
When the raw data input mode is used, the data are also plotted. Management of the plotted bitmap is discussed in Graphic Editor and MacroPlot : Explanation and Help.

StatPgm 3a iii. Compare two sets of nonparametric measurements
Data entry consists of a 2 column table
The program first creates a table of frequencies of responses from the two groups, then calculates the Mann-Whitney U Test and its Probability of Type I Error (α). In this case Mann-Whitney U had a negative value, meaning the scores from group 1 are lower than those from group 2. α is reported as p=n.s. (not significant). The conclusion is that, although doctors agreed with the statement less than midwives did, the difference was not statistically significant.
The things to note :
StatPgm 3b. Compare Two Proportions provides a number of procedures for comparing two proportions
StatPgm 3b i. Sample Size for Comparing two proportions
Data input consists of the two expected proportions the researcher proposes to compare.
Example : A controlled trial compares the use and non-use of an oxytocic to manage the third stage of labour. We know from past experience that, without oxytocics, 6% (0.06) of women had post-partum haemorrhage. We will consider the oxytocic to be effective if it can reduce post-partum haemorrhage to 3% (0.03) or less. Using the default Probability of Type I Error α=0.05 and a power (1-β) of 0.8, the sample size is 590 per group for the one tail test and 749 per group for the two tail test. As we are only interested in whether the oxytocic reduces post-partum haemorrhage, we chose the one tail model, so we randomized 2x590=1180 cases into two groups for the trial.

StatPgm 3b ii. Comparing two proportions
Data input consists of 4 counts: the number of cases with positive attributes in the two groups, and the number of cases with negative attributes in the two groups. In the program, the two groups are represented by the two columns, with positive attributes in the top row and negative attributes in the second row.
In the example used in the program, for positive attributes there are 10 cases in the first group and 18 cases in the second group. For negative attributes, there are 50 cases in the first group and 222 cases in the second group. In other words, group 1 is 10 / (10+50) = 0.1667 (16.7%) positive, and group 2 is 18 / (18+222) = 0.075 (7.5%) positive.
There are 5 calculations available for comparing the two proportions
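Returning to the sample size calculation in 3b i, the sketch below uses a common textbook formula for comparing two proportions, which pools the variance under the null hypothesis. It reproduces the figures in the oxytocic example, but whether StatPgm uses exactly this formula is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float, alpha: float = 0.05,
                                power: float = 0.8,
                                two_tail: bool = False) -> int:
    """Sample size per group to detect a difference between p1 and p2."""
    tail_alpha = alpha / 2 if two_tail else alpha
    z_alpha = NormalDist().inv_cdf(1.0 - tail_alpha)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    # Standard deviation terms under the null (pooled) and the alternative.
    sd_null = math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    sd_alt = math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    n = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)


print(sample_size_two_proportions(0.06, 0.03))                 # one tail
print(sample_size_two_proportions(0.06, 0.03, two_tail=True))  # two tail
```

For proportions of 0.06 and 0.03 this gives 590 per group for the one tail model and 749 per group for the two tail model.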
StatPgm 3c. Compare Two Regressions
Data input is a table with 3 columns
Results : Most of the tabulated results are background data. The important ones are
Data input is a table of 2 columns
Heterogeneity : Two related tests are carried out
Publication Bias : The Rank Correlation Test shows z = 0.49, p = 0.31, not statistically significant. The conclusion is that no significant publication bias was detected.
Combined Data using the Fixed Effect Model : shows the effect size (difference in blood loss between the two groups in mls) = -8.0, Standard Error = 11.8, 95% CI = -31.0 to 15.1
Combined Data using the Random Effect Model : shows the effect size (difference in blood loss between the two groups in mls) = 53.2, Standard Error = 74.1, 95% CI = -92.0 to 198.3
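The 95% confidence intervals of the combined effect sizes can be checked from each effect size and its Standard Error. The sketch assumes the usual Normal approximation, effect ± 1.96 × Standard Error; whether StatPgm calculates its intervals in exactly this way is an assumption, and the function name is hypothetical.

```python
def confidence_interval_95(effect: float, standard_error: float,
                           z: float = 1.96):
    """Lower and upper 95% limits for an effect size (Normal approximation)."""
    half_width = z * standard_error
    return effect - half_width, effect + half_width


print(confidence_interval_95(-8.0, 11.8))
print(confidence_interval_95(53.2, 74.1))
```

An effect size of -8.0 with Standard Error 11.8 gives about -31.1 to 15.1, and an effect size of 53.2 with Standard Error 74.1 gives about -92.0 to 198.4. Small differences from the program output are to be expected, as the program works from unrounded values.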
Binary Tests
Receiver Operator Characteristics
StatPgm 5a i. Qualities of a Binary Test Predicting a Binary Outcome
Input data consists of 4 counts
StatPgm 5a ii. Bayesian Conversion of Pre-test to Post-test Probability Using Likelihood Ratio
Data input uses 2 numbers: the prior or Pre-test probability, which is the probability of the outcome before we know the test result, and the Likelihood Ratio.
Example : If 2% of our pregnancies have Placenta Previa (Pre-test = 0.02), then the presence of Ante-partum Haemorrhage (with a Likelihood Ratio for Test Positive of 2.4) will predict the risk of Placenta Previa to be 0.047 (4.7%). In the absence of Ante-partum Haemorrhage (with a Likelihood Ratio for Test Negative of 0.65), the risk of Placenta Previa is 0.013 (1.3%).
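The conversion in 5a ii follows the standard odds form of Bayes' theorem: convert the Pre-test probability to odds, multiply by the Likelihood Ratio, and convert back to a probability. A minimal sketch, with a hypothetical function name:

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Post-test probability from Pre-test probability and Likelihood Ratio."""
    pre_test_odds = pre_test / (1.0 - pre_test)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Placenta Previa example: Pre-test probability 0.02.
print(round(post_test_probability(0.02, 2.4), 3))   # Test Positive
print(round(post_test_probability(0.02, 0.65), 3))  # Test Negative
```

With the numbers from the example, this gives 0.047 when Ante-partum Haemorrhage is present and 0.013 when it is absent.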
StatPgm 5b contains two related procedures. The first takes raw data for analysis, and produces a table that is suitable as data entry for the second procedure, ROC analysis proper.
StatPgm 5b i. Create Table for ROC Analysis from 2 Columns of Data Data entry consists of a two column table.
StatPgm 5b ii. Receiver Operator Characteristics (ROC) Analysis Data entry is in the form created by the procedure 5b.i, and is a table with 3 columns
The main result is the ROC value and its Standard Error. In this example, ROC = 0.78, SE = 0.07, 95% confidence interval = 0.65 to 0.91. As the null value for ROC is 0.5 and the 95% confidence interval does not include the null value, the result is statistically significant. The details are presented in the table, showing True and False Positive and Negative Rates, and Likelihood Ratios for Test Positive and Negative, for cut off diagnostic values throughout the full range of the test. A graphic plot of the ROC, along with the MacroPlot codes, is also presented. Editing of the plot is discussed in Graphic Editor and MacroPlot : Explanation and Help.
StatPgm 6. Graphic Editor contains the following plotting procedures
StatPgm 7. Supportive Utilities provides 4 supportive programs that may help users to edit and prepare data for statistical analysis
StatPgm 7 i. Calculate Mean and SD from a Column of Data. Takes a single column of measurements, and calculates sample size (n), mean, Standard Deviation, 95% confidence interval of the measurements, and 95% confidence interval of the mean.
StatPgm 7 ii. Create Array of counts from a Column of Data. The data can be numbers or labels, but each entry must be a single word (no gap).
StatPgm 7 iii. Create Table of Counts from 2 Columns of Data. The data can be numbers or single words (labels) with no gaps. Column 1 becomes the rows and column 2 the columns of the count table.
StatPgm 7 iv. Create Transposed Table of Counts from 2 Columns of Data. This does the same thing as 7 iii, except that column 1 becomes the columns and column 2 the rows.
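The counting utilities 7 ii to 7 iv can be sketched in a few lines. This is an illustration of what the utilities produce, not the StatPgm code; the function names and the small data set are hypothetical.

```python
from collections import Counter


def array_of_counts(column):
    """Count how often each value or label occurs in one column (7 ii)."""
    return dict(sorted(Counter(column).items()))


def table_of_counts(column1, column2):
    """Cross-tabulate two columns: column 1 as rows, column 2 as columns (7 iii)."""
    pair_counts = Counter(zip(column1, column2))
    row_labels = sorted(set(column1))
    column_labels = sorted(set(column2))
    return {r: {c: pair_counts[(r, c)] for c in column_labels}
            for r in row_labels}


# Hypothetical data: one row per case, single-word labels only.
delivery = ["CS", "Normal", "Normal", "CS", "Normal"]
sex = ["Boy", "Girl", "Boy", "Boy", "Girl"]

print(array_of_counts(delivery))
print(table_of_counts(delivery, sex))
print(table_of_counts(sex, delivery))  # transposed table, as in 7 iv
```

Swapping the two arguments gives the transposed table, which is all that separates 7 iv from 7 iii.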