Introduction
The assessment exercise for the BSc. Reproductive Science consists of a series of questions that the students must correctly answer. The reasoning, guidelines, and marking guides are presented in the Guidelines to Assessment page.
This page contains the questions to be answered, which are common to all students. Each student, however, will receive a unique set of data, in the form of an Excel file, to be used to answer these questions.
The assessment is in two parts.
 Part 1 covers all basic statistical procedures taught in the module, excluding metaanalysis and statistics of prediction. When all exercises in Part 1 are successfully completed, the student will have passed the module with a grade of C or D.
 Part 2 contains more complex exercises, sometimes requiring more than one statistical procedure and sometimes requiring some handling of the data. Part 2 also includes questions related to metaanalysis and statistics of prediction. When all exercises are completed in Part 1 and Part 2, the student will be considered for the grade of A or B.
Students should place their answers in a Word or pdf file, write protect the contents with a password, then submit it attached to an email to Prof. Chang. Students are reminded that
 There is no limit to the number of submissions, but the number of submissions required before passing will affect grading. Specifically
 A student will be graded D if more than 3 submissions are required before passing Part 1
 A student will have priority in being considered for grade B if both Part 1 and 2 are passed with earlier submissions
 A student will only be considered for grade A if both Part 1 and 2 are passed with the first submission
 Students may submit their answers as soon after July 2015 as they choose, but are reminded that
 A student who has not successfully passed Part 1 by the end of March 2018 will be graded F for failure.
 30% of those graded C are awarded B or better, and 30% of these are awarded A. Once qualified for consideration, the higher grades are awarded in order of the time of the final submission until the quota is filled.
Finally, be aware that all data used in assessment, in fact, all data used in the statistics module, are computer generated and not real
Question 1.1 : Produce a Pie Chart
The student is required to produce a pie chart using the data in worksheet A in his/her unique Excel file
The table contains the numbers of different outcomes in IVF cycles from a unit over 1 year
The minimum acceptable standards of the pie chart produced are as follows
 Each wedge of the pie must have a different color or shading
 The color or shading must be labelled with
 The group the wedge is associated with. The wording need not be exact, but should be clearly interpretable
 The percent of total. The percentage should be accurate to 1 decimal point, e.g. 12.3%
 One wedge of the pie (it does not matter which one) should be separated from the whole pie
Question 1.2 : Produce a Fixed Interval Bar Chart
The student is required to produce a bar chart using the data in worksheet A in his/her unique Excel file
The table contains the numbers of different outcomes from IVF from a unit over 1 year
The minimum acceptable standards of the bar chart produced are as follows
 The widths of the bars are the same in the chart, and the intervals between the bars are the same in the chart
 The width of the bars is approximately the same as the interval between them
 The x axis
 What the x axis represents should be named
 The name for each bar is clearly labelled. The names need not be the same as that in the data but must be clearly interpretable
 If a name is too long, it should be split up into two rows
 The center of the name should align with the center of the bar
 The y axis
 What the y axis represents should be named
 The y axis should be the number of cases and not percentages
 The major intervals should be regular, in multiples of 2, 5, 10, or 100
 The minor intervals should be immediately interpretable
Question 1.3 : Sample Size and Precision of Proportions
The student is required to produce a table of sample size and precision estimations, using the data in worksheet A in his/her unique Excel file
The table should be constructed as follows
 The rows
 The top row should contain the labels for each column
 The subsequent rows are for each outcome from IVF listed in the data
 The columns, counting from the left
 Column 1 is the label for outcomes from IVF
 Column 2 is the number of cases, copied from the data
 Column 3 is the percentage of the total
 Column 4 is the 95% confidence interval of the percentage, in terms of percent
 Column 5 is the sample size required in a survey to confirm the percentage, with a 95% confidence interval of ±1%
 Column 6 is the sample size required in a survey to confirm the percentage, with a 95% confidence interval of ±5%
The minimum acceptable standards of the table
 All labels must be clearly interpretable
 All percentages have precision up to 1 decimal point. e.g. 12.3%
 All columns and rows are clearly separated and aligned.
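The calculations behind columns 3 to 6 can be sketched as follows. This is a minimal illustration that assumes the normal approximation for the confidence interval of a proportion; the function name `proportion_row` and the counts are hypothetical and are not taken from any worksheet, and the module may have taught a different method.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% multiplier, about 1.96


def proportion_row(count, total, margins=(0.01, 0.05)):
    """Return the proportion, its 95% CI, and the sample size for each +/- margin.

    Assumes the normal approximation: p +/- z * sqrt(p * (1 - p) / n).
    """
    p = count / total
    half_width = Z95 * math.sqrt(p * (1 - p) / total)
    # Sample size so that the 95% CI half-width equals the margin
    sizes = [math.ceil(Z95 ** 2 * p * (1 - p) / m ** 2) for m in margins]
    return p, (p - half_width, p + half_width), sizes


# Hypothetical counts for illustration only
counts = {"Live birth": 30, "Miscarriage": 10, "No pregnancy": 60}
total = sum(counts.values())
for label, count in counts.items():
    p, (low, high), (n1, n5) = proportion_row(count, total)
    print(f"{label:<14}{count:>5}{p:>8.1%}  {low:.1%} to {high:.1%}{n1:>8}{n5:>6}")
```

Sample sizes are rounded upwards, since a fractional case cannot be recruited.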
Question 1.4 : Normally Distributed Measurements
The student is required to analyse a set of normally distributed measurements and tabulate the results, using the ultrasound measurement of endometrial thickness in mm in worksheet B.
The student should analyse the data and produce a table showing the following results
 Sample size
 Mean
 Standard Deviation
 95% confidence interval of the measurement
 Standard Error of the mean
 Precision of the mean, the ± error of mean at the 95% confidence level
 95% confidence interval of the mean
The minimum acceptable standards of the table
 The tables should be two columns, the label and the value
 Labels must be clearly interpretable
 Values should have precision of 1 decimal point. e.g. 3500.3
 All columns and rows are clearly separated and aligned.
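The quantities in the list above are related to each other as in the following sketch. It assumes the normal (z) multiplier of about 1.96 for the 95% level; a t-based multiplier may be preferred for small samples. The function name `summarize` and the thickness values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize(values, level=0.95):
    """Summary statistics for a sample assumed to be normally distributed."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    n = len(values)
    m = mean(values)
    sd = stdev(values)          # sample Standard Deviation (n - 1 denominator)
    se = sd / math.sqrt(n)      # Standard Error of the mean
    precision = z * se          # +/- error of the mean at the chosen level
    return {
        "n": n,
        "mean": m,
        "sd": sd,
        "ci_measurement": (m - z * sd, m + z * sd),
        "se": se,
        "precision": precision,
        "ci_mean": (m - precision, m + precision),
    }


# Hypothetical endometrial thickness values in mm, for illustration only
thickness = [8.0, 9.0, 10.0, 11.0, 12.0]
for label, value in summarize(thickness).items():
    print(label, value)
```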
Question 1.5 : Calculation of percentile values
Based on the results of analysis in Question 1.4, the student is required to produce a table showing the value for the 5th, 10th, 90th, and 95th percentile
The minimum acceptable standards of the table
 The table should be two columns, the label and the value
 Labels must be clearly interpretable
 Values should have precision of 1 decimal point. e.g. 3500.3
 All columns and rows are clearly separated and aligned.
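One way to obtain percentile values from a mean and Standard Deviation, assuming the measurements are normally distributed, is sketched here. The function name `percentile_values` and the example mean and Standard Deviation are hypothetical.

```python
from statistics import NormalDist


def percentile_values(mean, sd, percentiles=(5, 10, 90, 95)):
    """Value at each percentile of a normal distribution with this mean and SD."""
    dist = NormalDist(mean, sd)
    return {p: dist.inv_cdf(p / 100) for p in percentiles}


# Hypothetical mean and Standard Deviation, for illustration only
for p, value in percentile_values(10.0, 2.0).items():
    print(f"{p}th percentile: {value:.1f}")
```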
Question 1.6 : Calculation of sample size for estimating means
Based on the results of analysis in Question 1.4, the student is required to produce a table showing the sample size required in a survey to confirm the mean value with an error at half of current error, at the current error, and at double the current error
The minimum acceptable standards of the table
 The table should be two columns, the label and the sample size
 Labels must be clearly interpretable
 All columns and rows are clearly separated and aligned.
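A sketch of the sample size calculation, assuming the normal approximation n = (z × SD / error)², follows. The function name `sample_size_for_mean` and the Standard Deviation and error used are hypothetical. Under this formula, halving the error multiplies the sample size by about four, and doubling it divides the sample size by about four.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd, error, level=0.95):
    """Cases needed so that the CI half-width of the mean equals `error`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / error) ** 2)


# Hypothetical SD and current +/- error of the mean, for illustration only
sd, current_error = 2.0, 0.5
for label, factor in (("Half", 0.5), ("Current", 1.0), ("Double", 2.0)):
    n = sample_size_for_mean(sd, current_error * factor)
    print(f"{label} current error: {n}")
```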
Question 1.7 : Calculation of t scores and percentile of values
Based on the data used and results of analysis in Question 1.4, the student is required to produce a table showing the t score (number of Standard Deviations from the mean) and percentile for each value
 The table should have 3 columns
 Column 1 shows the value from the data
 Column 2 shows the t score, t = (value - mean) / Standard Deviation
 Column 3 shows the percentile value
The minimum acceptable standards of the table
 The data is shown as obtained
 t scores should have a 2 decimal point precision
 Percentile should be to the nearest whole number
 All columns and rows are clearly separated and aligned.
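The conversion from a value to its t score and percentile can be sketched as below, assuming the percentile is read from the standard normal distribution. The function name `t_score_and_percentile` and the example values are hypothetical.

```python
from statistics import NormalDist


def t_score_and_percentile(value, mean, sd):
    """Return (t score to 2 decimals, percentile to the nearest whole number)."""
    t = (value - mean) / sd
    percentile = round(100 * NormalDist().cdf(t))
    return round(t, 2), percentile


# Hypothetical values with mean 10 and Standard Deviation 2
for value in (6.0, 10.0, 12.0):
    print(value, *t_score_and_percentile(value, 10.0, 2.0))
```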
Question 1.8. Correlation Analysis
The student is required to perform correlation analysis, using blood mercury concentration and sperm count data in worksheet C.
The data is in two columns. The first column is blood mercury level in ng/ml, and the second column sperm count in millions / ml.
The student should analyse the data and produce a table showing the following results
 Assuming mercury concentration and sperm count are normally distributed, the Pearson's correlation Coefficient ρ
 The 95% confidence interval of ρ, calculated using Fisher's Z Transformation, both one and two tails
The minimum acceptable standards of the table
 The tables should be two columns, the label and the value
 Labels must be clearly interpretable
 Values should have precision of 4 decimal point. e.g. 0.3456
 For 95% confidence interval, the two tail model, the range of the interval is required
 For 95% confidence interval, the one tail model, the ranges for the left tail and the right tail are required
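A minimal sketch of Pearson's correlation coefficient and its confidence interval by Fisher's Z Transformation follows. The function names and the dictionary keys are hypothetical, and the way the two one tail intervals are named here (lower bound only, upper bound only) is an assumption about what the left and right tails are meant to be.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence intervals of r using z = atanh(r) and SE = 1 / sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    z_two = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96
    z_one = NormalDist().inv_cdf(level)            # about 1.645
    return {
        "two_tail": (math.tanh(z - z_two * se), math.tanh(z + z_two * se)),
        "one_tail_lower_bound": (math.tanh(z - z_one * se), 1.0),
        "one_tail_upper_bound": (-1.0, math.tanh(z + z_one * se)),
    }


# Hypothetical r and sample size, for illustration only
for label, (low, high) in fisher_ci(0.5, 28).items():
    print(f"{label}: {low:.4f} to {high:.4f}")
```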
Question 1.9. Sample size for estimating Correlation Coefficient
Based on the correlation coefficient ρ obtained in Question 1.8, create a table showing the sample size required (one and two tail model), to estimate correlation coefficients with the value obtained in Question 1.8, half its value, and a third its value.
The minimum acceptable standards of the table
 The tables should be 3 columns, ρ, sample size (1 tail), and sample size (two tail)
 In addition to the row with labels, the table should have 3 rows, for ρ obtained in Question 1.8, ρ/2, and ρ/3
Question 1.10. Regression Analysis
The student is required to perform regression analysis, using mercury concentration and sperm count data in worksheet C.
The data is in two columns. The first column are mercury concentration in ng/ml, and the second column sperm count in millions/ml.
The student should carry out a regression analysis and obtain the formula y = a + bx, where x is mercury concentration in ng/ml, and y sperm count in millions / ml.
Both the constant a and regression coefficient b should have precision to 4 decimal points.
Based on the regression formula obtained, the student should calculate the mean sperm count for mercury concentrations from 35 to 65 ng/ml, at 5 ng/ml intervals.
Sperm count, in millions / ml, should be calculated to the 1 decimal point precision.
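The regression and the predicted values can be sketched with ordinary least squares as below. The function name `linear_regression` and the data are hypothetical and not taken from worksheet C.

```python
def linear_regression(xs, ys):
    """Least squares constant a and coefficient b for y = a + bx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical mercury (ng/ml) and sperm count (millions / ml) values
mercury = [30.0, 40.0, 50.0, 60.0, 70.0]
sperm = [62.0, 55.0, 51.0, 44.0, 38.0]
a, b = linear_regression(mercury, sperm)
print(f"y = {a:.4f} + {b:.4f}x")
for x in range(35, 70, 5):  # 35 to 65 ng/ml at 5 ng/ml intervals
    print(x, f"{a + b * x:.1f}")
```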
Question 1.11. Scatter Plot and Regression Line
The student is required to produce a scatter plot, using mercury concentration and sperm count data in worksheet C.
The data is in two columns. The first column are mercury concentration in ng/ml, and the second column sperm count in millions/ml.
The minimum standard of the plot are as follows
 The x axis should represent mercury concentration (ng/ml)
 The y axis sperm count (millions / ml)
 Both x and y intervals are clearly marked and labelled.
 A regression line, extending from minimum to maximum x values should be drawn.
Question 1.12 : Nonparametric Correlation : Spearman's Correlation Coefficient
The data in Worksheet D is a two column table of sperm quality measurements.
 Column 1 is the sperm motility grading, judging at least 60% of the sperms in a sample, a=linear progression, b=nonlinear progression, c=nonprogressive movement, d=no movement
 Column 2 is the proportion of sperms with abnormal morphology. 1: 0%-10%, 2: 10%-30%, 3: 30%-70%, 4: 70%-100%
Assuming that motility grading and morphology percentage are ordered but not normally distributed, the student is required to construct a table correlating the two Likert scales. The student is also required to calculate the nonparametric Spearman's Correlation Coefficient and its statistical significance, and to interpret the results.
The standards required for the table are as follows
 The rows represent motility grading
 The columns represent proportion with abnormal morphology
 Each cell represents the number of cases with that combination of motility grading and proportion abnormal
 The rows and columns are clearly labelled
The standards required for the results of analysis are
 The Spearman's Correlation Coefficient should have a precision of 4 decimal places
 The Probability of Type I Error should have a precision of 2 decimal places, or "not significant" if p>0.05
 Student should conclude whether a significant correlation exists between the two parameters of sperm normality, and if so, whether the correlation is positive or negative.
Question 1.13 : Comparing two parametric measurements
The table in worksheet E is a two column table. Each row represents data from an IVF pregnancy. Column 1 represents the culture medium used for the embryo (A or B), and column 2 the weight at birth (grams).
The data is used to test the hypothesis that culture in medium A resulted in babies that weigh more than that cultured in medium B
The student is required to analyse the data and produce the following results
 The sample size (n), mean, and Standard Deviation of birth weight for babies from each culture medium
 The difference in mean birth weight between the two methods of culture
 The 95% confidence interval of the difference, the two tail, the left of the one tail, and the right of the one tail
 Whether there is a significant difference between the babies from the two groups (two tail conclusion)
 Whether babies from culture medium A are significantly heavier than those from culture medium B (one tail,right)
 Whether babies from culture medium B are significantly heavier than those from culture medium A (one tail,left)
Birth weight should be presented to the nearest gram
Question 1.14 : Sample size comparing two parametric measurements
Based on the difference between the two means obtained in question 1.13, the student is required to calculate the sample size (per group) for comparing that difference, if the within group Standard Deviation is 350g, 400g, or 450g. The sample size required for both the one and two tail models should be calculated
Question 1.15 : Data Plot comparing two groups
Based on the data in worksheet E
The student is required to produce a data plot showing the relationship between the two groups.
The standards of the plot are as follows
 The x axis represents the two groups
 The y axis represents birth weight
 Both x and y axis should be clearly marked and labelled.
 All data points should be seen. Where there are 2 or more cases in a group with the same value, the data points should be shifted slightly, so that no data point is completely obscured.
 The mean, 95% confidence interval of measurements, and 95% confidence interval of the means in the two groups should be marked.
Question 1.16 : Nonparametric comparison of two measurements
The data in worksheet F is a two column table.
 The rows represent data from each woman receiving artificial insemination (AI) for infertility treatment
 Column 1 represents whether AI resulted in pregnancy (1:Success) or no pregnancy resulted (2:Failed)
 Column 2 represents age of women in 4 groups (1: <30, 2: 30-34, 3: 35-39, 4: 40+).
The student is required to analyse the data and produce the following
 A table of frequencies of age group in the two groups. The standard of the table are as follows
 The top row contains labels for success or failure of AI
 The left most column contains labels for age groups
 The cells contain the number of women for that combination of results and age group, and the percentage of the total of that age group (column total)
 The labels must be clear and immediately interpretable
 The percentage should have a precision to 1 decimal point
 The U value in the Mann-Whitney U Test, comparing the two outcome groups (success and failure of AI), and the probability of the Type I Error
 An interpretation
 Whether there is a significant difference in age between the two groups, and if so, which group has older women.
 An explanation for the conclusions
Question 1.17 : Risk Difference
The data in worksheet G is from a controlled trial comparing the use or non-use of hormonal support following embryo transfer in IVF.
 Cases are randomly allocated to two groups, group 1 to receive hormonal support and group 2 controls not receiving hormonal support.
 The results are designated as + for success with live birth and - for failure with no pregnancy or early abortion.
 The research hypothesis to be tested is that hormonal support results in a higher proportion of live births.
The data is a 2 column table
 The rows represent data from each woman in the trial
 Column 1 represents the treatment group, hormonal support (1:Treatment) or no hormonal support (2:Control)
 Column 2 represents outcome, + for success with live birth or - for failure with no live birth.
The student is required to analyse the data comparing the two proportions (risk difference), and produce the following results
 The risk (proportion) of success in the two groups
 The difference in risks (proportions) between the two groups
 The 95% confidence interval of the difference, the two tails interval, the left tail and right tail of the one tail model
 The numbers needed to treat to change the outcome of a single case
 Interpret the statistical results in terms of the research hypothesis
The standard of the answers are as follows
 Proportions (risks) are to be presented as percentage, with one decimal point precision
 Numbers needed to treat are to be presented rounded upwards to the next whole number. e.g. 2.1 rounded upwards to 3
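The risk difference, its confidence intervals, and the number needed to treat can be sketched as follows, assuming the normal approximation for the difference between two proportions. The function name `risk_difference`, the dictionary keys, and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def risk_difference(success_1, n_1, success_2, n_2, level=0.95):
    """Risk difference (group 1 minus group 2), CIs, and number needed to treat."""
    p1, p2 = success_1 / n_1, success_2 / n_2
    rd = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    z_two = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96
    z_one = NormalDist().inv_cdf(level)            # about 1.645
    # Number needed to treat is rounded upwards to the next whole number
    nnt = math.ceil(1 / abs(rd)) if rd != 0 else math.inf
    return {
        "risk_1": p1,
        "risk_2": p2,
        "difference": rd,
        "two_tail": (rd - z_two * se, rd + z_two * se),
        "one_tail_lower_limit": rd - z_one * se,
        "one_tail_upper_limit": rd + z_one * se,
        "nnt": nnt,
    }


# Hypothetical counts: 45 of 100 treated and 30 of 100 controls had live births
result = risk_difference(45, 100, 30, 100)
print(f"Risks: {result['risk_1']:.1%} and {result['risk_2']:.1%}")
print(f"Difference: {result['difference']:.1%}, NNT: {result['nnt']}")
```

If the lower limit of the relevant interval is above zero, the data would be consistent with the research hypothesis of a higher proportion of live births with treatment.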
Question 1.18 : Odds Ratio
The data from worksheet H are from a number of retrospective studies to test the research hypothesis that those who failed to have a live birth following IVF were more likely to be smokers.
Women with successful IVF resulting in live births (LB+) and failure resulting in no live birth (LB-) were recruited, and asked whether they smoked (SM+ or SM-) at the time they had their IVF treatment.
The table in worksheet H contains results from a number of studies, where the number of women with IVF success or failure and whether they smoked are listed.
 Column 1 (LB+,SM+) are the number of women that smoked and had live births after IVF
 Column 2 (LB+,SM-) are the number of women that did not smoke and had live births after IVF
 Column 3 (LB-,SM+) are the number of women that smoked and had no live births after IVF
 Column 4 (LB-,SM-) are the number of women that did not smoke and had no live births after IVF
The student is required to perform analysis using Odds Ratio for each of the studies, and produce a table showing the following
 The odds of smoking in the success and failure groups
 The Log(Odds Ratio) and its Standard Error
 The 95% confidence interval of the Odds Ratio
 Interpret the results in terms of the research hypothesis, whether those who had no live birth after IVF were more likely to have been smokers
The standard of the table are as follows
 The rows and columns are clearly labelled.
 Each row represents results from a study
 Odds and ratios should have a precision of 4 decimal places
 Interpretation should be whether the results "supported" or "not supported" the research hypothesis
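The calculations for a single study can be sketched as below. The function name `odds_ratio` and the counts are hypothetical. The direction of the ratio (odds of smoking in the failure group divided by odds of smoking in the success group) is an assumption chosen to match the research hypothesis, and the Standard Error uses the usual formula sqrt(1/a + 1/b + 1/c + 1/d).

```python
import math
from statistics import NormalDist


def odds_ratio(lbp_smp, lbp_smn, lbn_smp, lbn_smn, level=0.95):
    """Odds of smoking per group, log odds ratio with SE, and CI of the odds ratio.

    Arguments follow columns 1 to 4: (LB+,SM+), (LB+,SM-), (LB-,SM+), (LB-,SM-).
    """
    odds_success = lbp_smp / lbp_smn
    odds_failure = lbn_smp / lbn_smn
    log_or = math.log(odds_failure / odds_success)
    se = math.sqrt(1 / lbp_smp + 1 / lbp_smn + 1 / lbn_smp + 1 / lbn_smn)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return odds_success, odds_failure, log_or, se, ci


# Hypothetical counts for one study, for illustration only
odds_s, odds_f, log_or, se, (low, high) = odds_ratio(10, 40, 20, 30)
supported = low > 1  # whole two tail interval above the null value of 1
print(f"{odds_s:.4f} {odds_f:.4f} {log_or:.4f} {se:.4f} {low:.4f} to {high:.4f}")
print("supported" if supported else "not supported")
```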
Question 2.1 : Plot Data Distribution
Based on the data used and results of analysis in Question 1.7, the student is required to summarize then produce a distribution plot of the t scores. The student may carry this out in any way he/she chooses providing the final plot is produced. However, the following sequence is suggested to assist the students that have initial difficulties
 Each t score is converted to the nearest whole number -2, -1, 0, 1, and 2
 The whole number scores are summarized into a table of counts for each value
 The number of cases in each whole number is converted to a percentage of total
 A bar plot is produced with the x axis being the whole number t value, and the y axis the percentage
Only the plot needs to be shown. However the student may also show the intermediary calculations so that errors (if any) can be traced.
The minimum acceptable standards of the plot
 The plot is a Bar Chart
 The x axis represents t values in groups of nearest whole number with each bar labelled
 The y axis is the percent of total for each bar, marked with major intervals of 10% and minor intervals of 2% or 5%
 The bars have the same width and are clearly separated from each other
Question 2.2 : Metaanalysis
Using the results obtained in question 1.18, perform a metaanalysis to answer the following questions
 Whether the studies are heterogeneous or not
 Whether publication bias should be suspected or not
 Combine the data to produce the summary effect size and its 95% confidence interval, using the Random Effect Model
 Provide an interpretation of the collective results as to whether the conclusions should be accepted or whether significant flaws exist in the data used.
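A possible sketch of the Random Effect Model, starting from the Log(Odds Ratio) and Standard Error of each study, follows. It assumes the DerSimonian and Laird estimate of between-study variance, which may not be the method taught in the module. The function name `random_effects` and the study values are hypothetical. The sketch reports Cochran's Q and the I² percentage but does not compute the probability for the heterogeneity test, and it does not address publication bias.

```python
import math
from statistics import NormalDist


def random_effects(log_ors, ses, level=0.95):
    """Pool log odds ratios with DerSimonian-Laird random effects weights."""
    w = [1 / se ** 2 for se in ses]                       # fixed effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]           # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "q": q, "df": df, "i2": i2, "tau2": tau2,
        "odds_ratio": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
    }


# Hypothetical Log(Odds Ratio) and Standard Error values from four studies
summary = random_effects([0.9, 0.4, 1.2, 0.1], [0.45, 0.30, 0.50, 0.35])
print(summary)
```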
Question 2.3 : Forest Plot
Using the results obtained in question 2.2, produce a Forest Plot to represent the data and the combined Effect size.
The plot should have the following standards
 The x axis should be the Odds ratio and not log(odds ratio)
 The x axis should be adequately marked and scaled
 Each study should be represented by a mark for the odds ratio, and a line marking the 95% confidence interval
 The marks for the data and the summary effect should be different
 There should be a vertical line to mark the null value of 1
Question 2.4 : Prediction and Diagnosis
A study is carried out to assess the quality of prediction using previous pregnancy to predict successful (live birth) IVF treatment.
The table in worksheet I contains the results of such a study
 Column 1 are those with successful outcomes (live birth LB+), and column 2 failures (no live birth LB-)
 Row 1 are those with one or more previous pregnancy (PP+), and row 2 those never pregnant (PP-)
The student is required to analyse the quality of prediction, using the previous pregnancy to predict live birth outcome. The standards required are as follows
 The True and False Positive and Negative Rates, in percentages, to a precision of 1 decimal place
 The Likelihood Ratios for Test Positive and Negative, to a precision of 4 decimal places
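The rates and Likelihood Ratios can be derived from the 2 by 2 table as sketched below, treating previous pregnancy (PP+) as test positive and live birth (LB+) as outcome positive. The function name `prediction_quality`, the dictionary keys, and the counts are hypothetical.

```python
def prediction_quality(pp_lb, pp_nolb, nopp_lb, nopp_nolb):
    """Rates and Likelihood Ratios from the 2 by 2 table.

    Arguments are the cells (PP+,LB+), (PP+,LB-), (PP-,LB+), (PP-,LB-).
    """
    tpr = pp_lb / (pp_lb + nopp_lb)          # True Positive Rate
    fnr = 1 - tpr                            # False Negative Rate
    fpr = pp_nolb / (pp_nolb + nopp_nolb)    # False Positive Rate
    tnr = 1 - fpr                            # True Negative Rate
    return {
        "tpr": tpr, "fnr": fnr, "fpr": fpr, "tnr": tnr,
        "lr_positive": tpr / fpr,
        "lr_negative": fnr / tnr,
    }


# Hypothetical counts, for illustration only
quality = prediction_quality(30, 20, 20, 30)
for rate in ("tpr", "fpr", "tnr", "fnr"):
    print(rate, f"{quality[rate]:.1%}")
print(f"LR+ {quality['lr_positive']:.4f}  LR- {quality['lr_negative']:.4f}")
```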
Question 2.5 : Posttest Probability
Using the results obtained in question 2.4, predict the probability of a live birth outcome in the following situations
 For those with one or more previous pregnancy (PP+)
 In an IVF service where the overall pregnancy rate is 25%
 In an IVF service where the overall pregnancy rate is 45%
 For those who have never been pregnant (PP-)
 In an IVF service where the overall pregnancy rate is 25%
 In an IVF service where the overall pregnancy rate is 45%
Percentage should be to a precision of 1 decimal place
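The post-test probability is usually obtained by converting the pre-test probability to odds, multiplying by the Likelihood Ratio, and converting back to a probability. A minimal sketch follows; the function name `posttest_probability` and the Likelihood Ratio values are hypothetical and are not results from any worksheet.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Post-test probability from a pre-test probability and a Likelihood Ratio."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical Likelihood Ratios for test positive (PP+) and test negative (PP-)
lr_positive, lr_negative = 1.5, 0.6667
for rate in (0.25, 0.45):
    print(f"Overall rate {rate:.0%}: "
          f"PP+ {posttest_probability(rate, lr_positive):.1%}, "
          f"PP- {posttest_probability(rate, lr_negative):.1%}")
```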
Question 2.6 : Receiver Operator Characteristics (ROC)
A study is carried out using the duration of infertility (years) to predict success in IVF in terms of live births. The results of the study are in the table in worksheet J
 Column 1 is the duration of infertility in years
 Column 2 is the outcome of treatment, live birth (LB+) or no live birth (LB-)
The student is required to analyse the data and produce the following
 The area under the Receiver Operator Characteristics curve (θ) and its 95% confidence interval
 A 3 row table showing the common cut off values
 The rows are where
 The test has maximum accuracy, where the Youden Index is maximum
 The test can be used as a screening tool, where the ratio True Positive Rate/True Negative Rate is closest to 3
 The test can be used as an action decision tool, where the ratio True Negative Rate/True Positive Rate is closest to 3
 The columns are for values to be listed, including
 The cut off value of duration of infertility, in years to 1 decimal place precision
 The False and True Positive Rates, in percent to 1 decimal place precision
 The Likelihood Ratios for test positive and test negative, to 4 decimal place precision
 Discuss how these cut off values are to be used clinically, if the student is in charge of the treatment
Question 2.7 : Plotting Receiver Operator Characteristics (ROC)
Using the results obtained from Question 2.6, the student is required to produce a plot for the Receiver Operator Characteristics. The standards required are
 The x axis is the False Positive Rate
 The y axis is the True Positive Rate
 Both axis must be clearly marked and labelled. The rates can either be in percent or as a number from 0 to 1
 The ROC curve is drawn
 A diagonal line joining where the two rates are 0 and are 1
 Data points (circular or square) where
 The test has maximum accuracy, where the Youden Index is maximum
 The test can be used as a screening tool, where the ratio True Positive Rate/True Negative Rate is closest to 3
 The test can be used as an action decision tool, where the ratio True Negative Rate/True Positive Rate is closest to 3
Question 2.8 : Complex Posttest Probabilities
Using the results of analysis from questions 2.4 and 2.6, the student is required to calculate the probability of success resulting in live birth after IVF under the following circumstances
 In an IVF unit with an overall success rate of 25%
 where duration of infertility is less than that where Youden Index is maximum
 When the patient had one or more previous pregnancy
 When the patient had no previous pregnancy
 where duration of infertility is greater than that where Youden Index is maximum
 When the patient had one or more previous pregnancy
 When the patient had no previous pregnancy
 In an IVF unit with an overall success rate of 45%
 where duration of infertility is less than that where Youden Index is maximum
 When the patient had one or more previous pregnancy
 When the patient had no previous pregnancy
 where duration of infertility is greater than that where Youden Index is maximum
 When the patient had one or more previous pregnancy
 When the patient had no previous pregnancy
The results should be in a clearly labelled table where
 Column 1 is the overall success Rate
 Column 2 is the duration of infertility
 Column 3 is whether the patient had a previous pregnancy
 Column 4 is the probability of live birth as result
 Probabilities should be in percent, with 1 decimal point precision
Question 2.9 : Covariance Analysis
Worksheet K contains a table of 3 columns
 Column 1 Success with live birth (LB+) or failure with no live birth (LB-) following IVF
 Column 2 Patient's age in years
 Column 3 Duration of infertility (years)
The student is required to perform a covariance analysis on the data and produce the following results
 The mean and Standard Deviation of the duration of infertility for success (LB+) and failure (LB-)
 The difference in age between successes (LB+) and failures (LB-), and its 95% confidence interval
 The difference in age between successes (LB+) and failures (LB-), and its 95% confidence interval, after adjusting for the effects of age on the duration of infertility
 Comment and draw conclusions from the results of analysis
Age and duration of infertility should be presented as years with a precision of 1 decimal point
Question 2.10 : Complex x/y data plot
The student is required to plot the data from worksheet K, to demonstrate the complex relationship between success/failure, age, and duration of infertility.
The plot should conform to the following standards
 The data from success and failure should be separated by colors and positions so they can be clearly seen
 Three regression lines should be drawn, one for each outcome, and one for the two outcomes combined
 The x and y axes should be clearly marked and labelled in units of multiples of 2, 5, or 10
 The colors for outcomes and regression lines should be clearly labelled
