This program implements the algorithm described by Chen (see reference), which was originally designed for epidemiological
surveys to detect an increase in congenital malformations. The program can also be used more generally to detect an increase in a
prevalence or proportion above its expected value, and it can handle very low proportions.
The program is less a calculation of sample size requirements than a design for an
epidemiological survey that continuously monitors for an increase in the proportion of index cases. It is included in the
sample size section only because it does not fit anywhere else.
Description of the program
 Four (4) parameters are used to calculate the sample sizes required.
 The baseline proportion π or π_{0} is the proportion from which the increase departs.
It is usually the normal or expected proportion before any increase occurs. In Chen's paper the
proportion is expressed as 1 in so many cases, but in these programs it is entered as a decimal number, so that
1 in 10,000 is expressed as 0.0001.
 The increased proportion to detect, π_{1}. This stays in the background and is not entered or used directly.
Because the algorithm is designed to detect a large increase from a low baseline proportion, the
increase is defined instead as a multiple γ of the baseline, so that
γ = π_{1} / π_{0}.
An increase from 1 in 10,000 (π_{0}=0.0001) to 5 in 10,000 (π_{1}=0.0005) is a 5-fold
increase, so that γ = 5.
 Probability of signalling an alarm P represents the sensitivity of the detection. In her paper Chen recommends that this
should be a high probability close to 1, and suggests using 0.99 for all calculations.
 Average number of cases between false alarms at the baseline proportion, α_{0}, represents the false positive rate.
In Chen's paper this is expressed as 1 false alarm over a period of time, given so many cases per unit of time.
These programs translate it into the simpler and more fundamental parameter of 1 false alarm per so many
cases observed.
For example, Chen used 1 false alarm per 20 years in a unit that delivers 400 babies a month. This is the same
as 1 false alarm per 96,000 (20 × 12 × 400) babies, which is the value used in these calculations.
 Output of calculations :
 Criteria for signalling an alarm consist of two values. n is the number of consecutive sets (a set is all the cases
between two positive cases), and η_{1} is the threshold number of cases for each of these sets. An alarm is signalled
when there are n consecutive sets, each of which has fewer than η_{1} cases. In the example given,
the baseline abnormality rate is 3 in 10,000 (π_{0}=0.0003), a 7-fold increase is to be detected
(π_{1}=0.0021, or γ=7), the probability of signalling is P=0.99, and the false positive rate must be less than 1 in 96,000 babies.
With these settings, an alarm is raised if there are 4 (n) consecutive sets between abnormal babies, each containing
fewer than 475 (η_{1}) normal babies.
 Average number of cases needed for a true alarm (α_{1}) represents the number of cases needed to
detect the increase after it has occurred, and is another measure of the sensitivity of the model.
In the example, an average of 1815 cases is required after the abnormality rate increases from 0.0003 to 0.0021.
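The alarm rule described above can be sketched in a few lines of Python. This is a minimal illustration, not StatTools' own code: the function name `chen_alarm` is hypothetical, cases are assumed to arrive as a sequence of booleans (True for an index case), and the first set is assumed to start when monitoring begins. A set counts as short when it holds fewer than η_{1} normal cases, and any set that is not short resets the count.

```python
from typing import Iterable, Optional


def chen_alarm(outcomes: Iterable[bool], n: int, eta1: int) -> Optional[int]:
    """Hypothetical sketch of the sets-based alarm rule.

    outcomes: True for an index (positive) case, False for a normal case.
    n:        number of consecutive short sets needed to signal an alarm.
    eta1:     a set is "short" if it holds fewer than eta1 normal cases.

    Returns the 0-based position of the case at which the alarm is
    signalled, or None if no alarm is raised. Monitoring stops at the
    first alarm (an assumption made for this sketch).
    """
    normals_in_set = 0  # normal cases seen since the last positive case
    short_run = 0       # number of consecutive short sets so far
    for position, is_positive in enumerate(outcomes):
        if not is_positive:
            normals_in_set += 1
            continue
        # A positive case closes the current set.
        if normals_in_set < eta1:
            short_run += 1
        else:
            short_run = 0
        normals_in_set = 0
        if short_run >= n:
            return position
    return None


# One long set of 500 normals, then four short sets of 10 normals each.
stream = [False] * 500 + [True] + ([False] * 10 + [True]) * 4
print(chen_alarm(stream, n=4, eta1=475))  # 544: the fourth short set closes here
```

Stopping at the first alarm is a choice made for the sketch; a working surveillance system would also need a policy for restarting monitoring after an alarm.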
Technical consideration :
The algorithm in this calculation is designed specifically to detect relatively large
increases from a very low base proportion, where approaches based on the normal and binomial distributions are prone to error, so
the more appropriate geometric distribution is used. Chen provided examples from 1 in 1,000 (π_{0}=0.001)
to 1 in 10,000 (π_{0}=0.0001). At this level some of the more complicated calculations can be bypassed
and approximations used, which makes computation much easier. These approximations become increasingly inappropriate
as the proportion approaches 0.5, and algorithms based on the binomial distribution should then be used.
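To illustrate why very low proportions make approximations convenient, the sketch below compares the exact geometric probability that a set holds fewer than t normal cases, 1 − (1 − π)^{t}, with the familiar small-π approximation 1 − e^{−πt}. It is an illustration of the general point only, assuming this kind of approximation; it is not necessarily the approximation Chen used, and the function names are hypothetical.

```python
import math


def p_short_exact(pi: float, t: int) -> float:
    """Exact geometric probability that a set holds fewer than t normal
    cases, i.e. that a positive case occurs within the first t cases."""
    return 1.0 - (1.0 - pi) ** t


def p_short_approx(pi: float, t: int) -> float:
    """Small-pi approximation, replacing (1 - pi)**t with exp(-pi * t)."""
    return 1.0 - math.exp(-pi * t)


# A very low proportion (as in the worked example) versus a proportion of 0.5.
for pi, t in [(0.0003, 475), (0.5, 2)]:
    exact = p_short_exact(pi, t)
    approx = p_short_approx(pi, t)
    print(f"pi={pi}, t={t}: exact={exact:.5f}, approx={approx:.5f}")
```

At π = 0.0003 the two values differ by less than 0.0001, whereas at π = 0.5 they differ by more than 0.1.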
Differences between these calculations and Chen's paper :
The following differences should be noted. Users
are particularly encouraged to read the original paper by Chen and draw their own conclusions about whether the differences in these
calculations are acceptable.
 Chen's paper is clinically orientated, specifically towards the detection of malformations in babies. Its calculations
are therefore based on time intervals (a false alarm once in 20 years), which are then translated by multiplying by the birth rate,
so that the statistical calculations are ultimately based on the number of cases (normal and abnormal babies).
The calculations in StatTools leave time out altogether and are based entirely on the number of cases. This simplifies
presentation, and also makes the program flexible enough to use in situations other than fetal malformation.
 Chen's paper was published in 1978, before computers became ubiquitous. Its calculations focussed on being easy to
do by hand, and approximations were used whenever possible. Rounding was also coarser than the precision
modern computers are capable of. As a result, the output from StatTools programs is often slightly different from that
in Chen's paper. For example, using π_{0}=0.0003, η_{0} = (1 − π_{0})/π_{0} = 3332 (formula 2.1),
but Chen used an approximation of 3330 instead. An intermediary parameter, k = −ln(1 − P)/γ = 0.6579 to 4 decimal
places, was given in Chen's paper as 0.66, and the final α_{1} was presented as 1807 in the paper instead of
the 1815 obtained by StatTools using greater precision. The opinion from StatTools is that these differences are
caused by approximations and different rounding conventions, and are trivial and inconsequential to statistical decisions.
However, users should decide for themselves whether these differences are acceptable.
 Chen's paper has two parts. The first detects an increase using continuous monitoring of a single source of data.
The second gathers data from multiple sources and analyses them at regular intervals to see whether an increase has occurred.
StatTools has translated only the first part, as it is simple to carry out. The second part requires additional
parameters, such as the false positive rate in terms of time and the time interval for periodic inspection of the data.
Users may wish to consult the original publication and translate this second part themselves.
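The intermediate figures quoted above can be reproduced directly from the formulas in the text. This minimal sketch assumes natural logarithms for k (the quoted value of 0.6579 requires them) and stops at η_{0} and k, because the text does not give the formulas for n, η_{1} and α_{1}. The variable names are illustrative only.

```python
import math

# Inputs from the worked example in the text.
pi0 = 0.0003             # baseline proportion, 3 in 10,000
gamma = 7                # fold increase to detect, gamma = pi1 / pi0
p_alarm = 0.99           # probability of signalling an alarm, P
alpha0 = 20 * 12 * 400   # 1 false alarm per 20 years at 400 births a month

pi1 = gamma * pi0                       # increased proportion to detect
eta0 = (1 - pi0) / pi0                  # formula 2.1: eta0 = (1 - pi0) / pi0
k = -math.log(1 - p_alarm) / gamma      # intermediary parameter k (natural log)

print(f"pi1    = {pi1:.4f}")
print(f"alpha0 = {alpha0}")
print(f"eta0   = {eta0:.0f}  (Chen: 3330)")
print(f"k      = {k:.4f}  (Chen: 0.66)")
```

Running it gives π_{1} = 0.0021, α_{0} = 96000, η_{0} = 3332 and k = 0.6579, alongside Chen's rounded 3330 and 0.66.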
References
Chen R (1978) A Surveillance System for Congenital Malformations.
Journal of the American Statistical Association 72: pp. 323-327
