Systematic Review is a method of searching, selecting, cataloguing, and summarising research results to produce evidence to guide
clinical decision making. Metaanalysis is the mathematical component of Systematic Review, providing algorithms to make
sense of, and draw conclusions from, the large amount of information obtained.
Metaanalysis itself is a very large and developing subject, and this page covers only metaanalysis of Effect Size (ES) and its Standard Error (SE), mostly as applied to comparisons between two groups. Metaanalysis for prediction studies is presented separately in the Metaanalysis for Predictions Explanation Page. The next panel discusses metaanalysis as it is commonly used, analysing and combining sets of data from similar studies from multiple sources. The algorithms used are those presented in the Metaanalysis for Comparing Two Unpaired Groups Program Page, and the example used to demonstrate the procedures is the default example data from that page.
Data Input
Heterogeneity
Publication Bias
Summary Effect Size
Funnel Plot with Trim and Fill Procedure
The data input consists of a two column table, each row from a separate study, column 1 being the Effect Size, and column 2
its Standard Error. The assumption is that the Effect Size is a population based normally distributed measurement.
The Numerical Transformation Program Page and the Create Effect Size for Metaanalysis Program Page provide calculations for converting the common statistical calculations to the Effect Sizes that can be used for metaanalysis.
The table to the right shows the different types of data that can be analysed and combined using metaanalysis.

Example

The example uses the default data from the Metaanalysis for Comparing Two Unpaired Groups Program Page. Please note that the data are computer generated to demonstrate the procedures and to assist explanation, and do not represent any real clinical information.

A common obstetric problem is that of miscarriage. There was a proposition that miscarriage is caused by hormonal dysfunction in early pregnancy, and that providing a supplement of hormones in early pregnancy may reduce the miscarriage rate. We wish to reach a conclusion on this proposition by conducting a metaanalysis on the available data.

Eight controlled trials were reviewed, and the results of the trials are listed in the table to the right. In each trial, Group 1 received the hormonal supplement (Treatment Group), and Group 2 did not (Control Group). The numbers of pregnancies that aborted (Outcome Positive) and that went on to live birth (Outcome Negative) were observed, and the Risk Difference and its Standard Error were calculated using the Create Effect Size for Metaanalysis Program Page. The last two columns, the Risk Differences and their Standard Errors, are then used in the program in the Metaanalysis for Comparing Two Unpaired Groups Program Page for metaanalysis.
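As an illustration of how a Risk Difference and its Standard Error can be obtained from the four counts of a trial, the following is a minimal sketch. It assumes the usual large sample formula for the Standard Error of a difference between two proportions, and that the difference is taken as Group 1 minus Group 2; the function name and the counts in the usage line are hypothetical and are not taken from the example table.

```python
import math

def risk_difference(pos1, neg1, pos2, neg2):
    """Risk Difference (Group 1 minus Group 2) and its Standard Error.

    pos = number with Outcome Positive, neg = number with Outcome Negative.
    Assumes the standard large sample formula for two independent proportions.
    """
    n1 = pos1 + neg1
    n2 = pos2 + neg2
    p1 = pos1 / n1          # risk in Group 1 (treatment)
    p2 = pos2 / n2          # risk in Group 2 (control)
    rd = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, se

# Hypothetical counts, for illustration only
rd, se = risk_difference(10, 90, 20, 80)
print(round(rd, 4), round(se, 4))
```

A negative Risk Difference under this sign convention means fewer miscarriages in the treatment group.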
Heterogeneity describes whether the studies included in the metaanalysis can be regarded as representing results of similar
studies, and can be combined to produce a summary conclusion. When no significant heterogeneity exists, data may be combined.
When minor heterogeneity exists, some investigation to exclude causes, and statistical adjustments to conclusions, may be necessary.
Where major heterogeneity exists, combining the data should not proceed without detailed investigation, as the heterogeneity
itself may be an interesting lead to aspects of the subject that have hitherto not been considered.
The following programs for heterogeneity are offered in the Metaanalysis for Comparing Two Unpaired Groups Program Page.

The Q Test is the oldest test of homogeneity, and is nearly universally used. It is a Chi Square Test of goodness of fit, testing whether the studies included in the metaanalysis can be considered to be from the same population. A significant Chi Square (p<0.05) indicates that significant heterogeneity exists. In the example provided, the Q Test shows Chi Sq=10.21, p=0.18, so a conclusion that no significant heterogeneity exists can be made.

The I^{2} Test is derived from the Q Test, and partitions the total variance in the data into that between studies and that within studies, I^{2} being the percent of total variance attributable to differences between studies. Altman (see references) argued that I^{2} allows a more nuanced interpretation of heterogeneity than the simple significant or not significant statement offered by the Q Test. This allows a researcher to decide whether further investigation or partition of the data is necessary, according to the aims of the metaanalysis. Altman suggested that I^{2}<30% can be considered minor and probably acceptable, and that I^{2}>70% should be considered major and must be resolved in some way before metaanalysis can proceed. In between, some judgement should be exercised as to how to proceed, and usually this means a statistical adjustment using the Random Effect Model (see combining data). In the example provided, I^{2}=31.4%, on the margin between minor and moderate. This indicates that some in depth analysis would be useful if the data available contain sufficient information for this purpose, and that, when combining the data, the Random Effect Model should be used (see panel on combining data).

The z Test calculates the mean and Standard Deviation of all the data in the metaanalysis, then recalibrates each effect size in terms of the number of Standard Errors (z) away from the mean. Cut off values of 1.96 (95% confidence interval) or 2 (rounded value for extreme) are used, so that an effect further than these from the overall mean can be considered heterogeneous. In the example provided, study 3 has a z of 2.1 Standard Deviations from the mean, indicating that this study is possibly heterogeneous from all the others.

The Radial Plot is a regression analysis of the z statistic (ES / SE) of each study against a measure of its precision (1/Standard Error). The regression line is drawn, with lines at ± 2 Standard Deviations, and the z statistics are plotted in relation to these lines. Studies that are more than 2z away from the regression line are then considered heterogeneous. In the example provided, studies 2, 3, and 5 deviate significantly from the regression line, and can be considered heterogeneous to the remaining studies.

Overview

Most published metaanalyses use the Q Test as a check on heterogeneity, then move on to combine the data, so the Q Test can be considered the standard and the first test to use. Altman encourages the use of the I^{2} Test, which allows the researcher to make more nuanced decisions in the face of marginal levels of heterogeneity. The z Test and the Radial Plot allow detailed examination of every study in the dataset, so they are useful for decisions on how to exclude or partition the studies when dealing with severely heterogeneous datasets. They are usually not presented in metaanalysis reports as general conclusions about heterogeneity.
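The Q Test and I^{2} described in this panel can be sketched as follows. This is a minimal illustration assuming inverse variance weights (1/SE^2) and the usual definition I^{2} = (Q - df)/Q; the function names and the two study dataset are hypothetical. The resulting Q would be compared against a Chi Square distribution with k - 1 degrees of freedom.

```python
def i_squared(q, df):
    """Percent of total variance attributable to between study differences."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q * 100.0)

def q_test(es, se):
    """Cochran's Q, its degrees of freedom, and I-squared.

    es: list of effect sizes, se: list of their Standard Errors.
    Assumes inverse variance weighting.
    """
    w = [1.0 / s ** 2 for s in se]
    mean = sum(wi * e for wi, e in zip(w, es)) / sum(w)   # weighted mean effect
    q = sum(wi * (e - mean) ** 2 for wi, e in zip(w, es))
    df = len(es) - 1
    return q, df, i_squared(q, df)

# The Q reported for the example (10.21 with 8 studies, so df = 7)
print(round(i_squared(10.21, 7), 1))

# Hypothetical two study dataset, for illustration only
print(q_test([0.0, 0.2], [0.1, 0.1]))
```

Applying `i_squared` to the Q value reported for the example reproduces the I^{2} of about 31.4% quoted in the text.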
Publication Bias is the concept that, either through ignorance or for entrepreneurial reasons, many researchers conduct substandard
research, mostly in terms of insufficient sample size. The results are then offered for publication if a significant result
is obtained, but shelved, or rejected by editors, when significant results are absent. As a result of this tendency, the
scientific literature contains an excess of significant results.
Although the concept is generally accepted, the problem is that the extent of the bias is not knowable. It can only be estimated from assumptions, using what information is available in the metaanalysis dataset. Although a number of tests are available, they are all based on the idea that, if publication bias exists, the excess significant results tend to come from studies with smaller sample sizes, which therefore have larger Standard Errors or variances.

The Radial Plot is used to identify both heterogeneity and publication bias. The constant in the regression formula represents the presence of bias. In the example provided, the constant a=2.0045, SEa=0.9945, z=2.0156, p=0.0219. A significant level of publication bias is therefore indicated.

The Rank Correlation between Standardized Effect Size and its variance is based on the argument that, if publication bias exists, then the smaller studies with larger variances are more likely to have a larger effect size, so that a correlation would exist between these two parameters. The values are weight adjusted and ranked before the correlation is calculated, and a significant correlation signifies the presence of publication bias. In the example provided, the z value for Rank Correlation is 1.7321, p=0.0416. A significant level of publication bias is therefore indicated.

The Rosenthal File Drawer is based on a calculation of the failsafe N, the number of unpublished null results necessary to render the presented dataset nonsignificant, and compares this with the tolerance level. If the failsafe N is less than the tolerance level, then publication bias is considered likely. There is confusion in the literature over whether the one or two tail calculation for the failsafe N should be used. Rosenthal's original paper (see references) used the one tail test, but the manual by Sutton et al. (see references) suggested that the two tail test is more appropriate. StatTools provides calculations for both, but interprets according to the one tail N, as suggested by Rosenthal. In the example provided, the tolerable number is 50 and the failsafe N (one tail) is 5, suggesting that publication bias should be suspected.

Funnel Plot and the Trim and Fill Procedure is a complex and controversial topic, and is discussed in its own panel on this page.

Overview

All of the procedures provide an estimate of possible publication bias, but none is entirely satisfactory. The Rank Correlation is the easiest to understand, but it lacks power. The other two procedures are more powerful, but intuitively difficult to understand.
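The Rosenthal failsafe N described in this panel can be sketched as follows. This is a minimal illustration assuming Rosenthal's combined z formula, with each study's z taken as ES / SE, a one tail critical value of 1.645, and the tolerance level 5k + 10 for k studies (which gives 50 for the eight studies of the example). The function names and the data in the usage line are hypothetical, and the program page may differ in detail.

```python
def failsafe_n(es, se, z_crit=1.645):
    """Rosenthal's failsafe N (one tail by default).

    Number of unpublished null studies needed to bring the combined
    result down to the critical z. Assumes each study z = ES / SE.
    """
    k = len(es)
    z_sum = sum(e / s for e, s in zip(es, se))
    n_fs = z_sum ** 2 / z_crit ** 2 - k
    return max(0.0, n_fs)   # a negative value means no null studies are needed

def tolerance_level(k):
    """Rosenthal's tolerance level for k studies."""
    return 5 * k + 10

print(tolerance_level(8))                    # eight studies, as in the example
print(failsafe_n([2.0, 2.0], [1.0, 1.0]))    # hypothetical data
```

Passing `z_crit=1.96` gives the two tail version mentioned in the text.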
If, after testing the data for heterogeneity and publication bias, it is concluded that combining the data is valid, a number of procedures can be used to combine the data and produce a summary effect size and its Standard Error. Amongst these, the two most common
procedures are the Fixed Effect Model and the Random Effect Model.
In the example provided, using the Fixed Effect Model, the Summary Effect=0.0115, and Standard Error=0.0126. Using the Random Effect Model, Summary Effect=0.0193, and Standard Error=0.017. These are nearly the same because there is little heterogeneity in the data.

Displaying Results

A Forest Plot is the most common method of displaying the results of metaanalysis, as shown in the plot on the right. The central tendency (round dot) and its 95% confidence interval from each study are displayed, and the combined summary effect (square dot) and its 95% confidence interval are placed at the bottom.
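The two models can be sketched as follows. This is a minimal illustration assuming inverse variance weights for the Fixed Effect Model and the DerSimonian and Laird estimate of between study variance for the Random Effect Model; the text does not state which estimator the program page uses, so that choice is an assumption, as are the function names and the two study dataset.

```python
import math

def fixed_effect(es, se):
    """Inverse variance weighted summary effect and its Standard Error."""
    w = [1.0 / s ** 2 for s in se]
    summary = sum(wi * e for wi, e in zip(w, es)) / sum(w)
    return summary, math.sqrt(1.0 / sum(w))

def random_effect(es, se):
    """Random effect summary using the DerSimonian-Laird tau-squared (assumed)."""
    w = [1.0 / s ** 2 for s in se]
    fixed, _ = fixed_effect(es, se)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, es))
    df = len(es) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]      # weights widened by tau2
    summary = sum(wi * e for wi, e in zip(w_star, es)) / sum(w_star)
    return summary, math.sqrt(1.0 / sum(w_star))

# Hypothetical data, for illustration only
print(fixed_effect([0.0, 0.2], [0.1, 0.1]))
print(random_effect([0.0, 0.2], [0.1, 0.1]))
```

When Q does not exceed its degrees of freedom, the between study variance is set to zero and the two models give the same result, which is why the two summaries are close when heterogeneity is small.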
Please Note : Funnel Plot with Trim and Fill produces two plots, the Funnel Plot, and the Forest Plot with adjusted Summary
Effect. The last two buttons in the Metaanalysis for Comparing Two Unpaired Groups Program Page
therefore activate the same procedures and
produce the same numerical results. They differ only in that one produces the Funnel Plot and the other the Forest Plot. The reason is
to make graphical editing less confusing.
The Funnel Plot is a visual method to evaluate the existence of publication bias. The plot has the effect size as its x axis, and the inverse of the Standard Error (1/SE) as its y axis. If there is no bias, one would expect the larger studies (those with smaller Standard Errors, and therefore near the top of the plot) to cluster near the mean effect size value, while the smaller studies (those with larger Standard Errors, and therefore near the bottom of the plot) spread towards the extreme ends of the effect size values. The plot should therefore resemble an inverted funnel. If publication bias exists, then the smaller studies without significant findings would be missing, and the plot would become asymmetrical. Such a plot is intuitively easy to understand and interpret, but the down side is that it depends on subjective interpretation.

The funnel plot using the example provided is shown above and to the left. It is obvious that the right side of the funnel is missing, and therefore considerable publication bias exists.

Trim and Fill is a procedure to replace what might have been the studies that were left out to cause the publication bias. The idea is to take the study with the most extreme effect size on the Funnel, and create a data point equidistant from the mean on the other side of the Funnel. This is performed one data point at a time, and the mean value recalculated, until the funnel is no longer lopsided. The results are as shown in the plot to the right. The two most extreme data points on the left of the plot are replicated on the right side (triangular dots).

Adjusted Summary Effect

After Trim and Fill, the new Summary Effect, using the Random Effect Model and including the replicated "fill" data points, can be calculated. The result is Effect Size=0.0134, and Standard Error=0.0183. The final Forest Plot is shown to the left. The round dots represent the original data, the triangles the data points added from Trim and Fill, and the square the adjusted Effect Size.

Overview

Funnel Plot plus Trim and Fill is a powerful tool to enable completion of a metaanalysis in the face of probable publication bias. However, the method is only valid if the underlying assumptions are valid: that the imbalance in extreme effect size studies with wide Standard Errors is in fact due to an underlying publication bias, and that replicating these values on the other side of the null position can, at least approximately, correct this bias. These assumptions cannot be easily made.
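The "fill" step described above, where an extreme data point is reflected to the other side of the funnel, can be sketched as follows. This is a deliberately simplified illustration: it reflects a given number of the leftmost points about a given centre and keeps their Standard Errors, and it does not estimate how many studies are missing or recalculate the centre iteratively, which the full procedure does. The function name and data are hypothetical.

```python
def fill_mirror(es, se, center, n_fill):
    """Reflect the n_fill leftmost studies about `center`.

    Returns a list of (effect size, Standard Error) pairs for the
    replicated points; each keeps the Standard Error of its original.
    """
    # Sort studies by effect size so the most extreme left points come first
    ordered = sorted(zip(es, se), key=lambda pair: pair[0])
    return [(2 * center - e, s) for e, s in ordered[:n_fill]]

# Hypothetical data, for illustration only
es = [-0.3, -0.1, 0.0, 0.05]
se = [0.1, 0.08, 0.03, 0.04]
print(fill_mirror(es, se, center=0.0, n_fill=2))
```

The replicated points would then be appended to the original data before the Summary Effect is recalculated.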
If significant heterogeneity or publication bias is suspected during a metaanalysis, or when the results differ from what is
expected, researchers often wish to investigate the structures within the dataset, either to identify invalid studies for exclusion,
to divide the data into subsets for analysis, or to find information hitherto unidentified. Metaregression is one of the tools used in such analysis.
Metaregression uses the regression concept, but the dependent variable is not a measurement of research subjects within any study, but the weighted effect size of each study. The independent variable is some population or environmental parameter that may influence the effect size. In the example provided in Sutton's text book, the efficacy of immunisation was found to depend on the distance between the research site and the equator, in that studies further away from the tropics had a greater effect size. Heterogeneity between studies could be explained by this geographic variable.

Other parameters can also be examined. For example, epidemics become less virulent with time, so the mortality rate may decrease in successive studies. The outcomes from complex surgical and medical procedures may improve with the learning curve, so effectiveness may improve with time. Preventive therapies may be influenced by age, so the effect size of trials may vary with the age of the study population.

Data Entry

The data is a table with 3 columns.
Analysis : The algorithm performs a regression analysis, using the effect size weighted by its Standard Error as the dependent variable (in this example the Risk Difference in each study), against the environmental parameter (in this example the order of publication). The variation is partitioned into that attributable to regression and the residual, and both are tested for statistical significance using the Chi Square Test.

In the example used, for regression, chi sq=7.4147, df=2, p=0.0245. This shows that a significant regression exists, confirming our suspicion that the effect size changed with the order of publication. For the residual, chi sq=3.6235, df=6, p=0.7275, indicating that, other than that attributable to regression, no heterogeneity remains between the studies.

The regression analysis can be further plotted, as shown in the plot to the right. Each circle represents one of the studies, its diameter indicating the weight of its data in the analysis (the larger the circle, the greater the weight, and the smaller the Standard Error). It can be seen that the later studies had much greater weight than the earlier ones, confirming our hypothesis.
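The weighted regression described above can be sketched as follows. This is a minimal illustration assuming weighted least squares with inverse variance weights (1/SE^2) and the common textbook partition of heterogeneity into a regression component (1 degree of freedom for the slope) and a residual (k - 2 degrees of freedom). The degrees of freedom for regression reported on this page are different, so the program page evidently uses another convention for that component; the function name and data are hypothetical.

```python
def meta_regression(x, es, se):
    """Weighted least squares of effect size on one study level covariate.

    Returns slope, intercept, Q for regression, and Q for the residual.
    Assumes inverse variance weights.
    """
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw     # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, es)) / sw    # weighted mean effect
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, es))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    q_reg = slope ** 2 * sxx                              # explained by regression
    q_res = sum(wi * (yi - intercept - slope * xi) ** 2
                for wi, xi, yi in zip(w, x, es))          # left unexplained
    return slope, intercept, q_reg, q_res

# Hypothetical data: x is order of publication, for illustration only
print(meta_regression([1, 2, 3], [0.1, 0.2, 0.3], [0.1, 0.1, 0.1]))
```

Each Q component would be compared against a Chi Square distribution with its own degrees of freedom.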
The same regression analysis can be repeated, using the total sample size of each study as the environmental variable (x), as shown in the table to the left. The plot is shown to the left and, for regression, chi sq=5.2493, df=2, p=0.0725. This shows that a significant regression does not exist at the p<0.05 level. For the residual, chi sq=5.7889, df=6, p=0.4472, indicating that, other than that attributable to regression, no heterogeneity remains between the studies.

For an introduction to Metaanalysis of Correlation Coefficients, and a description of the algorithms : http://www.statisticshell.com/docs/meta.pdf
For a comparison between the Hunter Schmidt and Hedges Olkin algorithms : http://www.statsdirect.com/help/default.htm#meta_analysis/correlation.htm
For references to original descriptions of the algorithm
Publication Bias
For calculations of Combined Summary Effect Size using Fixed and Random Effect Models :
For metaregression
