Please note: the data presented in all course material for the statistical module are
generated by computer to demonstrate the methodologies, and should not be confused with
actual clinical information.
Introduction
Exercises
Exercise 2 contains exercises on the statistics of correlation and regression.
The theoretical basis for the Normal distribution, the 95% confidence interval, and sample size is discussed in
Contents_1. Probability.
Discussions on correlation and regression can be found in Contents_2b. Correlation and Regression.
Programs for calculating probabilities can be found in StatPgm 1. Probability of z and t, and programs for correlation and regression in StatPgm 2b. One Group : Correlation and Regression.
The program StatPgm 7. Supportive Utilities is also available to help convert raw data to means and Standard Deviations, or to percentages.
The Microsoft Office package of Word, Excel, and Powerpoint, or similar software, should be open during the exercise. Excel is a useful tool for manipulating data, Powerpoint is useful for editing graphics, and the results should be copied to and edited in a Word file.
Correlation and Regression
Questions 1_1 : Sample Size
 Create a table of the sample sizes required for correlation coefficients from 0.1 to 0.9, in steps of 0.1, using a
probability of Type I error (α) of 0.05 and a power of 80%, for both the one tail and two tail models.
Answers 1_1
 Sample size for correlation coefficient, α=0.05, power=80%, one and two tail
α  Power  Correlation coefficient ρ  Sample size (one tail)  Sample size (two tail) 
0.05  0.8  0.1  617  782 
0.05  0.8  0.2  153  193 
0.05  0.8  0.3  67  84 
0.05  0.8  0.4  37  46 
0.05  0.8  0.5  23  29 
0.05  0.8  0.6  16  19 
0.05  0.8  0.7  11  13 
0.05  0.8  0.8  8  9 
0.05  0.8  0.9  6  7 
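The table above can be approximated with a short script. This is a minimal sketch that assumes the common Fisher z approximation, n = ((zα + zβ) / atanh(ρ))² + 3, rounded up; the course program's exact method is not stated here, so the function name and the rounding rule are assumptions, and the results should agree with the table only to within about one case.

```python
import math
from statistics import NormalDist

def sample_size_for_correlation(rho, alpha=0.05, power=0.80, tails=2):
    """Approximate n needed to detect correlation rho (Fisher z approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)   # critical value for the chosen tail model
    z_beta = z.inv_cdf(power)                # quantile matching the requested power
    fisher_z = math.atanh(rho)               # 0.5 * ln((1 + rho) / (1 - rho))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

# Correlation coefficients 0.1 to 0.9 in steps of 0.1
table = [(k / 10,
          sample_size_for_correlation(k / 10, tails=1),
          sample_size_for_correlation(k / 10, tails=2))
         for k in range(1, 10)]

for rho, one_tail, two_tail in table:
    print(f"{rho:.1f}  {one_tail:4d}  {two_tail:4d}")
```

Rounding up is a conservative choice; a program that applies a small-sample correction or rounds to the nearest integer can give values one lower.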
Questions 1_2 : Non parametric Correlation (Spearman)
Wait Respond Wait Respond Wait Respond
2:Soon 3:OK 4:Long 2:Sad 3:Norm 3:OK
4:Long 3:OK 3:Norm 1:VSad 5:Delay 1:VSad
3:Norm 2:Sad 4:Long 4:Glad 1:Short 3:OK
4:Long 5:VGlad 4:Long 4:Glad 1:Short 5:VGlad
2:Soon 2:Sad 3:Norm 2:Sad 3:Norm 3:OK
4:Long 3:OK 4:Long 2:Sad 3:Norm 4:Glad
2:Soon 3:OK 5:Delay 2:Sad 2:Soon 2:Sad
3:Norm 4:Glad 5:Delay 3:OK 3:Norm 3:OK
3:Norm 3:OK 2:Soon 3:OK 3:Norm 2:Sad
2:Soon 2:Sad 3:Norm 4:Glad 3:Norm 3:OK
1:Short 3:OK 3:Norm 3:OK 3:Norm 2:Sad
1:Short 5:VGlad 2:Soon 2:Sad 2:Soon 3:OK
4:Long 3:OK 4:Long 4:Glad 5:Delay 2:Sad
4:Long 3:OK 3:Norm 3:OK 3:Norm 3:OK
1:Short 5:VGlad 3:Norm 3:OK 4:Long 2:Sad
1:Short 5:VGlad 3:Norm 2:Sad 2:Soon 4:Glad
3:Norm 1:VSad 4:Long 2:Sad

 In a survey of the outpatient clinic, the waiting time for patients between arrival and being seen is classified as
1:Short (<10min), 2:Soon (<20min), 3:Norm (<30min), 4:Long (<1hr), and 5:Delay (>=1hr).
The patient's emotional response is classified as 1:VSad (very unhappy), 2:Sad (unhappy), 3:OK (neither happy nor unhappy),
4:Glad (happy), and 5:VGlad (very happy). The table above contains data from such a survey.
 Produce a table in which the rows represent waiting time and the columns the patient's response, and each cell contains the
number of cases with that waiting time and response.
 Calculate the Spearman's Correlation Coefficient and its statistical significance.
 Interpret the results in terms of whether a longer waiting time is related to less happy patients.
Answers 1_2
 From the survey of waiting time and patient's response
 1:VSad  2:Sad  3:OK  4:Glad  5:VGlad 
1:Short  0  0  2  0  4 
2:Soon  0  4  4  1  0 
3:Norm  2  5  9  3  0 
4:Long  0  4  4  3  1 
5:Delay  1  2  1  0  0 
 Spearman Correlation Coefficient : n=50, ρ=-0.253, p(1 tail)=0.025 to 0.05, p(2 tail)=0.05 to 0.1
 The Spearman's Correlation Coefficient (ρ) is -0.253; the negative sign means that the longer the wait, the less happy the patient
ρ is not statistically significant in the two tail model (p<0.1 but >0.05)
ρ is statistically significant in the one tail model (p<0.05 but >0.025)
 As we are interested only in whether longer waiting time is related to less happy patients, and not in whether
longer waiting time is related to happier patients, the one tail model is appropriate
 The conclusion is that this set of data and results supports the hypothesis that longer waiting time is associated with
less happy patients
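The Spearman calculation can be checked with a short script. This is a minimal sketch, not the course program: it rebuilds the 50 cases from the cell counts in the answer table, assigns average ranks to ties, and computes Spearman's ρ as the Pearson correlation of the ranks. The helper names (`midranks`, `pearson`) and the t approximation with n-2 degrees of freedom are assumptions of this sketch. With both scales coded 1 to 5 as in the question, the coefficient comes out negative (longer wait, lower response score), with a magnitude of about 0.253.

```python
import math

# Cell counts from the answer table: rows = waiting time 1..5, columns = response 1..5
counts = [
    [0, 0, 2, 0, 4],   # 1:Short
    [0, 4, 4, 1, 0],   # 2:Soon
    [2, 5, 9, 3, 0],   # 3:Norm
    [0, 4, 4, 3, 1],   # 4:Long
    [1, 2, 1, 0, 0],   # 5:Delay
]

# Expand the table back into one (wait, response) pair per patient
pairs = [(w + 1, r + 1)
         for w, row in enumerate(counts)
         for r, c in enumerate(row)
         for _ in range(c)]

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(values)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1           # rank of the first tied value
        last = first + order.count(v) - 1    # rank of the last tied value
        rank_of[v] = (first + last) / 2
    return [rank_of[v] for v in values]

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

wait = [p[0] for p in pairs]
resp = [p[1] for p in pairs]
n = len(pairs)
rho = pearson(midranks(wait), midranks(resp))   # Spearman = Pearson on the ranks
t = rho * math.sqrt((n - 2) / (1 - rho ** 2))   # t approximation, df = n - 2
print(f"n={n}, rho={rho:.3f}, t={t:.2f}")
```

The t statistic can then be compared with tabulated critical values of t for 48 degrees of freedom to place p in the ranges quoted in the answer.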
Questions 1_3 : Parametric Correlation and Regression
cm Min cm Min cm Min
162 201 164 165 166 170
161 39 159 239 158 281
158 215 167 345 157 176
155 312 163 202 163 302
163 159 158 404 161 236
164 426 156 407 161 187
162 120 164 136 160 244
162 347 159 345 160 304
162 131 157 262 163 143
161 311 155 311 163 339
159 325 157 226 157 291
158 219 159 315 166 188
159 465 164 20 163 15
163 99 161 526 157 262
162 311 162 238 159 412
164 269 164 301 161 346
161 388 161 215

 A survey was conducted to evaluate whether shorter women have longer labour. The height of the mother is measured in cm,
and the duration of labour in minutes (Min). The results of the survey are presented in the table above.
 Calculate the Pearson's correlation coefficient relating height and duration of labour, its 95% confidence interval, and
whether shorter height is associated with longer labour.
 Create a Forest Plot to demonstrate the 95% confidence intervals, both one and two tail.
 Calculate the regression formula with which the duration of labour can be predicted from the height of the mother.
 Tabulate the expected duration of labour for mothers who are 155cm to 165cm tall, at 1cm intervals.
 Create a scatter plot showing the data points and the regression line.
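The numerical parts of this question can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course program: the 50 pairs are transcribed from the survey table, the confidence limits use Fisher's z transformation with a standard error of 1/sqrt(n-3), and the helper name `fisher_ci` is hypothetical. Because the program's exact method and the precision of the underlying data are not stated, the values obtained this way may differ slightly from any published answer.

```python
import math
from statistics import NormalDist

# (height in cm, duration of labour in min), read row by row from the survey table
data = [
    (162, 201), (164, 165), (166, 170), (161, 39), (159, 239), (158, 281),
    (158, 215), (167, 345), (157, 176), (155, 312), (163, 202), (163, 302),
    (163, 159), (158, 404), (161, 236), (164, 426), (156, 407), (161, 187),
    (162, 120), (164, 136), (160, 244), (162, 347), (159, 345), (160, 304),
    (162, 131), (157, 262), (163, 143), (161, 311), (155, 311), (163, 339),
    (159, 325), (157, 226), (157, 291), (158, 219), (159, 315), (166, 188),
    (159, 465), (164, 20), (163, 15), (163, 99), (161, 526), (157, 262),
    (162, 311), (162, 238), (159, 412), (164, 269), (164, 301), (161, 346),
    (161, 388), (161, 215),
]

n = len(data)
x = [h for h, _ in data]
y = [m for _, m in data]
mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

r = sxy / math.sqrt(sxx * syy)     # Pearson correlation coefficient
slope = sxy / sxx                  # least-squares slope (min per cm)
intercept = my - slope * mx        # least-squares intercept (min)

def fisher_ci(r, n, z_crit):
    """Confidence limits for r via Fisher's z transformation (an assumed method)."""
    z, se = math.atanh(r), 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

norm = NormalDist()
two_tail = fisher_ci(r, n, norm.inv_cdf(0.975))
# One tail model: the interval runs from -1 up to this single upper limit
one_tail_upper = fisher_ci(r, n, norm.inv_cdf(0.95))[1]

print(f"n={n}, r={r:.4f}")
print(f"95% CI two tail: {two_tail[0]:.4f} to {two_tail[1]:.4f}")
print(f"95% CI one tail: -1 to {one_tail_upper:.4f}")
print(f"Duration (min) = {intercept:.0f} + ({slope:.2f}) x height (cm)")
```

A negative `r` whose upper confidence limit stays below 0 corresponds to shorter mothers having longer labour in this sample.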
Answers 1_3
 In a survey of maternal height and duration of labour
 Correlation Coefficient ρ = -0.3035, 95% confidence interval = -1 to -0.0726 (one tail), and -0.5366 to -0.0275 (two tail)
 As the research hypothesis is whether shorter women are associated with longer labour, and is not concerned with whether
shorter women are associated with shorter labour, the one tail model is appropriate
 As the one tail 95% confidence interval (-1 to -0.0726) does not include the null value of 0, the correlation is statistically significant,
so that the data and results support the hypothesis that shorter women are associated with longer labour
 The regression formula is Duration of labour (min) = 2143 - 11.72 × height (cm)
Ht(cm)  Duration (min) 
155  326 (5hr26min) 
156  315 (5hr15min) 
157  303 (5hr3min) 
158  291 (4hr51min) 
159  280 (4hr40min) 
160  268 (4hr28min) 
161  256 (4hr16min) 
162  244 (4hr4min) 
163  233 (3hr53min) 
164  221 (3hr41min) 
165  209 (3hr29min) 
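The table of expected durations follows directly from the regression formula. As a minimal sketch, taking the intercept as 2143 and the height coefficient as -11.72 (the negative sign is implied by the tabulated values, which fall as height rises), the rows can be reproduced as follows; the function name `predicted_minutes` is illustrative only.

```python
def predicted_minutes(height_cm, intercept=2143.0, slope=-11.72):
    """Predicted duration of labour (min) from the reported regression line."""
    return intercept + slope * height_cm

rows = []
for height in range(155, 166):               # 155cm to 165cm at 1cm intervals
    minutes = round(predicted_minutes(height))
    hours, mins = divmod(minutes, 60)        # convert to hours and minutes
    rows.append((height, minutes, f"{hours}hr{mins}min"))

for height, minutes, label in rows:
    print(f"{height}  {minutes} ({label})")
```

Each additional centimetre of height lowers the predicted duration by just under 12 minutes, so the table spans about two hours between 155cm and 165cm.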
