OBGYN MSc Stat Module (2016-2018) : Questions and Answers

December 3rd 2017

I understand the one tail 95% confidence interval, but I remain confused about how to interpret it, in particular which tail to use to support or not support the research hypothesis

I will demonstrate with the following example. Let's say we want to know whether boys weigh more at birth than girls, and we have the results of 3 studies

Study   Group 1 (Boys)        Group 2 (Girls)       Diff    One tail 95% CI
        n     mean   SD       n     mean   SD               Left tail   Right tail
1       100   3500   420      110   3450   410      50      <145        >-45
2       105   3800   405      120   3600   412      200     <290        >110
3       92    3400   400      105   3550   406      -150    <-55        >-405

The research hypothesis is that boys are heavier than girls. Given that group 1 are boys and group 2 girls, the research hypothesis is group 1 > group 2. We therefore use the right tail.

In studies 1 (>-45) and 3 (>-405), the right tail 95% confidence intervals overlap the null value (0), so the results do not support the research hypothesis. In study 2 (>110), the right tail 95% confidence interval does not overlap the null value, so the results of study 2 support the research hypothesis

If you are still confused, one way to clarify the situation is to draw the 95% confidence intervals as a Forest Plot. I will present a formal diagram (to the right), but you can do this quickly using paper and pencil. Please note the following

  1. The decision to use the right tail was made from the start, and not after examining the figures. In this case, the hypothesis is that group 1 > group 2, so the right tail is looked at. The 3 left tails, in black, are of no interest
  2. Only when the right tail does not overlap the null value does the result support the research hypothesis
  3. In study 1, the right tail (red) overlaps the null value, so the result does not support the research hypothesis
  4. In study 2, the right tail (blue) does not overlap null, so the result supports the research hypothesis
  5. In study 3, the right tail (red) overlaps null, so the result does not support the research hypothesis
The common misinterpretation occurs in situations like study 3, where one of the tails (the left tail, in black) does not overlap null, and the beginner thinks this represents statistical significance and therefore supports the research hypothesis. Unfortunately, this is the wrong tail, and it would only be important if our hypothesis were that boys weigh less than girls (group 1 < group 2). The important thing to remember is that, in the one tail model, selecting the correct tail comes first, and the correct tail depends on the research hypothesis. Only after the correct tail is determined should we consider whether the 95% confidence interval overlaps null.
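
The rule above (choose the tail from the hypothesis first, then check the null value) can be sketched in a few lines of Python. This is only an illustration: the function name `supports_hypothesis` and the labels "greater" and "less" are made up for this sketch and are not part of the module's programs. The bounds are the ones reported in the table of the 3 studies.

```python
def supports_hypothesis(direction, left_tail_upper, right_tail_lower, null_value=0.0):
    """Return True if the one tail 95% CI chosen by the hypothesis excludes null.

    direction        : "greater" for group 1 > group 2, "less" for group 1 < group 2
    left_tail_upper  : the "<" limit of the left tail interval (-inf to this value)
    right_tail_lower : the ">" limit of the right tail interval (this value to +inf)
    """
    if direction == "greater":
        # Hypothesis group 1 > group 2: only the right tail matters.
        return right_tail_lower > null_value
    if direction == "less":
        # Hypothesis group 1 < group 2: only the left tail matters.
        return left_tail_upper < null_value
    raise ValueError("direction must be 'greater' or 'less'")


# (left tail limit, right tail limit) as reported for each study
studies = {1: (145, -45), 2: (290, 110), 3: (-55, -405)}

# The research hypothesis is boys (group 1) > girls (group 2), fixed in advance.
results = {study: supports_hypothesis("greater", left, right)
           for study, (left, right) in studies.items()}
print(results)
```

Note that study 3 would only count as support if the hypothesis had been "less" from the start, which is exactly the wrong-tail trap described above.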

July 25th 2017

I do understand one and two tail, as well as the 95% confidence interval. However I keep getting the calculations wrong. Is there a simple approach for me to use to get the right answers.

Let me use the example data in StatPgm_3a_2Measurements.php, pgm 3aii

  • Data
    Grp     n    mean    sd
    grp 1   24   153.9   3.1
    grp 2   25   157.1   2.8
  • Results
    • Difference = mean1 - mean2 = 153.9 - 157.1 = -3.2
    • Standard Error (SE) = 0.8
    • 95% CI (two tail)
      • t for two tail = 2.0
      • 95% CI = (difference - t x SE) to (difference + t x SE) = (-3.2 - 2.0 x 0.8) to (-3.2 + 2.0 x 0.8) = -4.9 to -1.5 (with minor rounding errors)

        As the whole of the 95% confidence interval does not overlap the null value (0), we can conclude that a significant difference exists

    • 95% CI (one tail)
      • t for 1 tail = 1.7
      • The right tail, to be used if the hypothesis is "difference >0". The 95% CI excludes the 5% on the left, so it is > 5th percentile = > difference - t x SE = >-3.2 - 1.7 x 0.8 = >-4.6

        This set of data shows the difference >-4.6, which overlaps 0, so we cannot conclude the difference is greater than 0

      • The left tail, to be used if the hypothesis is "difference <0". The 95% CI excludes the 5% on the right, so it is < 95th percentile = < difference + t x SE = <-3.2 + 1.7 x 0.8 = <-1.8

        This set of data shows the difference <-1.8, which does not overlap 0, so we can conclude the difference is less than 0

  • The things to check
    • Make sure the t value you used to calculate the confidence interval is the correct one, as there is a t for one tail and a different t for two tail
    • The two tail 95% CI is easy, as it is (difference - t x SE) to (difference + t x SE)
    • The one tail 95% CI is equally easy when you are familiar with it, but a bit counter-intuitive for the beginner, because the left/right, +/-, and </> are not aligned and have to be carefully placed
      • Conventionally, the difference is group 1 - group 2, so the one tail hypothesis of grp 1 < grp 2 is the same as difference < 0. The 95% confidence interval to use here is -∞ to (difference + t x SE), or <(difference + t x SE). If this limit is <0 then the result is significant. If it is >0 then the result is not significant
      • On the other hand, the one tail hypothesis of grp 1 > grp 2 is the same as difference > 0. The 95% confidence interval to use here is (difference - t x SE) to +∞, or >(difference - t x SE). If this limit is >0 then the result is significant. If it is <0 then the result is not significant
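
As a way of checking the placement of +/- and </>, the arithmetic can be written out as a short Python sketch. The function name `confidence_limits` is made up for this illustration; the inputs are the rounded values quoted in the example (difference -3.2, SE 0.8, t 2.0 for two tail and 1.7 for one tail), so the results may differ in the first decimal place from figures calculated with full precision.

```python
def confidence_limits(difference, se, t_two_tail, t_one_tail):
    """Return the two tail interval and the two one tail limits."""
    two_tail = (difference - t_two_tail * se, difference + t_two_tail * se)
    # Hypothesis "difference > 0": interval is right_tail_lower to +infinity
    right_tail_lower = difference - t_one_tail * se
    # Hypothesis "difference < 0": interval is -infinity to left_tail_upper
    left_tail_upper = difference + t_one_tail * se
    return two_tail, right_tail_lower, left_tail_upper


two_tail, right_tail_lower, left_tail_upper = confidence_limits(-3.2, 0.8, 2.0, 1.7)

print("two tail: %.1f to %.1f" % two_tail)
print("difference > 0 : >%.1f, significant: %s" % (right_tail_lower, right_tail_lower > 0))
print("difference < 0 : <%.1f, significant: %s" % (left_tail_upper, left_tail_upper < 0))
```

The only decisions the user has to make are which t value goes with which model, and which of the two one tail limits matches the hypothesis.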

July 25th 2017

What is the null value

The null value is defined by Fisher as the value representing no difference in the null hypothesis

  • In comparing two means, the null value is 0
  • In comparing two ratios, the null value is 1
  • In Receiver Operator Characteristics, the null value is 0.5
  • In asking whether the complication rate is more than 5%, the null value is 5%

    July 11th 2017

    Why are the results published on the teaching and example pages sometimes slightly different from those I get when I do the calculations myself

    As I have previously explained, computers have different processors, so calculations are precise to different numbers of decimal places. On top of this, statistical calculations often use multiple iterations (repeated calculations to obtain the best approximation). Depending on the machine and how the program is written, therefore, results may differ slightly, by anything up to 1-2%. This usually shows up as a difference of 1 to 5 in sample size calculations, and as differences at the second or third decimal place in precision results. Students should not be alarmed by these minor differences.

    July 6th 2017

    In what way do t and z values differ

    Both z and t are calculated the same way: both t and z = (value - mean) / (Standard Deviation or Standard Error). Both mean the number of Standard Deviations (or Standard Errors) from the mean.

    z was first devised by Fisher, who mathematically assumed that he was dealing with a population, every one involved, or very large numbers.

    t was devised later (by someone who called himself Student), as a correction for z when the data is from a sample (not everyone), or when the number of observations (sample size) is small. The reason for its development was so that conclusions can be drawn with few observations (small sample size)

    When the sample size is infinite (everyone), z and t (one tail) have the same value. As sample size decreases, the probability value from t becomes larger than the probability value from z for the same number of Standard Errors from the mean. When the degrees of freedom (sample size - 1) are fewer than 400, the difference is big enough to be noticed.
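
The effect of sample size on the probability value can be seen with a small Python sketch. This is an illustration only, not how the module's programs work: the function names are made up, the t probability is approximated by numerical integration (Simpson's rule) of the standard t density, and the value 2.0 and the degrees of freedom shown are arbitrary choices.

```python
import math
from statistics import NormalDist


def t_density(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=2000):
    """One tail probability P(T > x) for x >= 0, by Simpson's rule on 0 to x."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


def z_upper_tail(x):
    """One tail probability P(Z > x) for the standard normal distribution."""
    return 1.0 - NormalDist().cdf(x)


# Same distance from the mean (2.0), different degrees of freedom
for df in (10, 50, 400):
    print("t, df = %3d : p = %.4f" % (df, t_upper_tail(2.0, df)))
print("z           : p = %.4f" % z_upper_tail(2.0))
```

Running it shows the one tail probability from t shrinking towards the probability from z as the degrees of freedom increase.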

    July 6th 2017

    Why do I get different results when I enter the data with different number of decimal point precision

    Most modern computers use 64 bit (64 0/1) processors. This means that in multiplication and division, the numbers are accurate to more than 14 decimal points. For outputting results, such precision is both unnecessary and confusing, so most statistical programs truncate the results to a default number of decimal points. In the programs for the module, all output is truncated to 4 decimal places, even when this is more than many situations need.

    The calculations start with the numbers entered as data. Numbers entered with different decimal point precision are interpreted by the computer as different values. For example, 1.2 means 1.20000000000 and 1.22 means 1.22000000000. To the user they are the same with a trivial difference in precision, but to the computer they are completely different values, and in complex calculations the differences from high precision calculation accumulate, so the results become different.

    Using the computer to perform any calculation therefore requires consideration of precision. Both in entering the data and in presenting the results, the number of decimal points in precision should be no more than adequate for the purpose. For example, there is no point using any decimal points in birth weight when babies are weighed to the nearest 10g, and no point in using more than 1 decimal point in height when most heights are measured to the nearest half cm.
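
A very small Python sketch makes the point. The four heights (in metres) are hypothetical values invented for this illustration; the same measurements are entered once with 2 decimal points and once with 1.

```python
from statistics import mean

# Hypothetical heights in metres, entered with 2 decimal points
heights_2dp = [1.22, 1.27, 1.34, 1.38]
# The same measurements entered with only 1 decimal point
heights_1dp = [round(h, 1) for h in heights_2dp]

mean_2dp = mean(heights_2dp)
mean_1dp = mean(heights_1dp)

print("entered with 2 decimal points:", heights_2dp, "mean = %.4f" % mean_2dp)
print("entered with 1 decimal point :", heights_1dp, "mean = %.4f" % mean_1dp)
```

To the computer the two data sets are simply different numbers, so the means differ, and in longer calculations such differences carry through to the final result.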

    June 1st 2017

    In difference between two means, the one tail model, two 95% confidence intervals are provided. Which one should I choose

    It depends on the research question. I will illustrate with the following example

    Group   n    mean   SD
    Boys    50   3550   320
    Girls   50   3400   305

    We are comparing the birthweight in grams of boys and girls, with the data as shown above.

    Using StatPgm 3aii from StatPgm_3a_2Measurements.php, the 95% confidence intervals (boys - girls) are:

    • One tail : <=254 or >=46
    • Two tail : 26 to 274
    If the research question is whether there is a difference either way, boys heavier than girls or girls heavier than boys, then the two tail model is appropriate, and the answer is that boys are 26g to 274g heavier than girls (2.5th to 97.5th percentile of the difference). Given that the 95% confidence interval does not overlap the null value (0), we can conclude that a significant difference exists: boys are found to be heavier than girls

    If the research question is whether boys are heavier than girls, then the one tail model is used, with the exclusion of the left tail, so that the 95% confidence interval (5th percentile to 100th percentile) is >=46 (46g to +∞). Given that this interval does not overlap the null value (0), we can conclude that boys are significantly heavier than girls

    If the research question is whether boys are lighter than girls, again the one tail model is used, with the exclusion of the right tail, so that the 95% confidence interval (0th percentile to 95th percentile) is <=254 (-∞ to 254g). Given that this interval overlaps the null value (0), we cannot conclude that boys are significantly lighter than girls

    Put this in another way, the <= interval is used to test group 1 (boys) < (lighter than) group 2 (girls), and the >= interval is used to test group 1 (boys) > (heavier than) group 2 (girls)
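
For those who want to check the figures from the summary data, the following Python sketch reproduces them. It rests on assumptions that are not stated on this page: the Standard Error is calculated from the pooled variance of the two groups (StatPgm 3aii may calculate it differently), and the t values for 98 degrees of freedom (1.984 for two tail, 1.661 for one tail) are taken from a standard t table. The function name `pooled_se` is made up for this illustration.

```python
import math


def pooled_se(n1, sd1, n2, sd2):
    """Standard Error of the difference between two means (pooled variance)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# t values for 50 + 50 - 2 = 98 degrees of freedom (assumed, from a t table)
T_TWO_TAIL = 1.984
T_ONE_TAIL = 1.661

difference = 3550 - 3400                 # boys - girls
se = pooled_se(50, 320, 50, 305)

two_tail = (difference - T_TWO_TAIL * se, difference + T_TWO_TAIL * se)
heavier_limit = difference - T_ONE_TAIL * se   # ">=" limit, tests boys > girls
lighter_limit = difference + T_ONE_TAIL * se   # "<=" limit, tests boys < girls

print("Two tail : %.0f to %.0f" % two_tail)
print("One tail : <=%.0f or >=%.0f" % (lighter_limit, heavier_limit))
print("Boys heavier than girls:", heavier_limit > 0)
print("Boys lighter than girls:", lighter_limit < 0)
```

Each research question picks out exactly one of the three results, and the choice is made before looking at the numbers.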