Multiple treatment comparison meta-analyses: a step forward into complexity

Correspondence: Edward Mills, Faculty of Health Sciences, University of Ottawa, 43 Templeton St, Ottawa, Ontario K1H 8M5, Canada, Tel +1 778 317 8530, Fax +1 613 562 5149, Email edward.mills@uottawa.ca

Received 2011 May 26

Copyright © 2011 Mills et al, publisher and licensee Dove Medical Press Ltd.

This is an Open Access article which permits unrestricted noncommercial use, provided the original work is properly cited.

Abstract

Meta-analysis has become increasingly useful for clinical and policy decision making. A recent development in meta-analysis, multiple treatment comparison (MTC) meta-analysis, provides inferences on the comparative effectiveness of interventions that may never have been compared directly in clinical trials. This new approach may be confusing for clinicians and methodologists, and it raises specific challenges in certain areas of medicine. This article addresses the methodological concepts of MTC meta-analysis, including heterogeneity, choice of model, and adequacy of sample size. We also address domain-specific challenges, including the baseline risks of patient populations. We conclude that MTC meta-analysis is a useful tool for comparative effectiveness research that requires further study, as its utility and transparency will likely determine its uptake by the research and clinical community.

Keywords: network, multiple treatment comparison, mixed treatment comparison, meta-analysis

Introduction

New methods of evaluating the relative effectiveness of competing interventions may provide unique opportunities for comparative effectiveness research. As meta-analysis grows in popularity, so too do the complexity of its methods and the questions it aims to answer. 1 An increasingly common challenge for decision makers is to infer which of several competing interventions is likely to be most effective. This is particularly difficult when the interventions have not been compared directly (head to head) in well-conducted randomized clinical trials (RCTs). Inferring their relative effect without such head-to-head trials is referred to as an indirect comparison. 2

Although meta-analysis has been used in clinical medicine since the 1980s 3 , 4 and became common in the 1990s, possibly owing to the establishment of the Cochrane Collaboration, 5 methods to refine meta-analysis, reduce bias, and improve its conduct have developed slowly. 1 Standard meta-analyses have typically investigated the effect of an intervention against a control, usually a placebo or another active intervention. However, such an analysis provides no inference about the relative effect of one intervention versus another intervention with which it has not been compared directly in an RCT. The adjusted indirect comparison, first reported by Bucher et al, 6 provided initial methods for making indirect comparisons and has since been extended to multiple treatment comparison (MTC) meta-analysis, which offers more sophisticated methods for quantitatively addressing indirect comparisons of several competing interventions.

The MTC approach, based on methods developed by several investigators, 7 – 9 generalizes standard pair-wise meta-analysis of drug A versus drug B trials to data structures that include, for example, A versus B and B versus C trials but no A versus C trial ( Figure 1 ). MTC requires a network of pair-wise comparisons that connects each intervention to every other intervention, either directly or through intermediate comparators, so the approach can be applied only to connected networks of RCTs. It has two important roles: i) strengthening inference about the relative efficacy of two treatments by including both direct and indirect comparisons of these treatments, and ii) facilitating simultaneous inference regarding all treatments, so that they can be compared, and potentially ranked, together. 7 The MTC approach has several advantages over other indirect comparison approaches, such as those proposed by Bucher et al 6 and Song et al, 10 as it can handle large numbers of indirect comparisons in a single analysis and can improve statistical power by combining direct and indirect evidence. 11 , 12
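
The connectivity requirement can be checked mechanically before any model is fitted. The sketch below is a minimal illustration, not part of any MTC software: it treats each pair-wise comparison evaluated in at least one RCT as an edge of an undirected graph and uses breadth-first search to test whether every treatment can be reached from every other. The function name and input format are assumptions made for illustration.

```python
from collections import defaultdict, deque


def is_connected(comparisons, treatments=()):
    """Return True if every treatment is linked to every other treatment
    through a chain of directly compared pairs.

    comparisons -- iterable of (treatment, treatment) pairs, one per
                   comparison evaluated in at least one RCT
    treatments  -- optional full list of treatments of interest, so that a
                   treatment with no trials at all is reported as disconnected
    """
    graph = defaultdict(set)
    for t in treatments:
        graph.setdefault(t, set())
    for a, b in comparisons:
        graph[a].add(b)
        graph[b].add(a)
    if not graph:
        return False

    # Breadth-first search from an arbitrary starting treatment
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(graph)


# A vs B and B vs C trials, no A vs C trial: still a connected network
print(is_connected([("A", "B"), ("B", "C")]))  # True
# Two islands of evidence: A and B are never linked to C or D
print(is_connected([("A", "B"), ("C", "D")]))  # False
```

Passing the full list of treatments matters in practice: an intervention of interest that appears in no eligible trial cannot be placed in the network at all.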

Figure 1

Direct and indirect comparisons. Circled letters represent trial arms of drug A (A), drug B (B), drug C (C), and placebo (P). Solid lines represent direct trials; dotted lines represent indirect comparisons. Example 1: direct comparison of drug A and drug B. Example 2: adjusted indirect comparison in which drug A and drug B have not been evaluated directly. Example 3: multiple treatment comparison in which drug A and drug C, drug B and placebo, and drug C and placebo have not been evaluated directly.

However, despite the sophistication and desirability of a network of compared trials, 13 the MTC approach is hampered by several important concerns. First, it is a relatively new approach that is most commonly conducted in a Bayesian framework and requires familiarity with Bayesian software (eg, WinBUGS [WinBUGS Project, Cambridge, UK] and R2BUGS [R2BUGS project, Columbia University, NY]). Second, the assumptions underlying the MTC approach are more complex than those of standard pair-wise meta-analysis, and they are typically not well defined. Finally, MTC outputs may be misinterpreted, as assessments of heterogeneity and statistical power are not commonly performed, giving the analysis a "black box" quality. If these concerns can be overcome, MTCs are a powerful tool for decision making in medicine.

The aim of this article is to describe some of the current challenges of MTC for readers who are familiar with meta-analysis. We illustrate the novelty and challenges of this approach in oncology, although its use is not limited to any specific field of medicine. We chose oncology because it is a well-funded area of medicine that frequently reports clinical trials and regularly produces major clinical advances. We then describe in more detail some of the assumptions underlying MTC methods and their interpretation. Finally, we discuss specific challenges that readers and the methodological community may need to address if MTC is to be widely understood.

Multiple treatment comparison meta-analyses in oncology

Despite the high profile and large number of clinical trials in oncology, relatively few MTC meta-analyses have been conducted in this field. This is likely for two reasons: first, MTC is a new and sophisticated approach to meta-analysis that has yet to gain much popularity in the general academic community, most likely because of its statistical complexity; and second, conducting MTC in cancer raises unique, disease-specific challenges owing to both a rapidly changing therapeutic armamentarium and an evolving understanding of the disease and its underlying risks. Using a systematic search of the medical literature with the search terms “(network OR multiple treatment comparison OR mixed treatment comparison)” and “meta-analysis”, up to January 2010, we identified six published MTC meta-analyses in oncology ( Table 1 ). As the table shows, these analyses range from simple to very complex. In this article we discuss the challenges of interpreting and conducting clinically relevant MTC analyses, along with some solutions.

Table 1

Characteristics of published multiple treatment comparison (MTC) analyses in oncology

| Author, year | Condition | Period | Number of trials* | Number of interventions | Number of trials used in MTC | Median RCT sample size (IQR) | Minimum/maximum number of RCTs in arms (minimum/maximum patients in arms) | Interventions collapsed into class effects | Outcomes | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Kyrgiou et al, 18 2006 | Ovarian cancer | 1971–2006 | 198 | 120 | 60 | 103 (53–234) | 2/11 (NA) | Yes | Overall survival | Combination of both first- and second-line treatments |
| Golfinopoulos et al, 57 2007 | Colorectal cancer | 1967–2007 | 242 | 137 | 40 | 152 (81–283) | 1/9 (28/4566) | Yes | Overall survival, disease progression | Combination of first-, second-, and third-line treatments. Patient status improved by 6.2% per decade |
| Golfinopoulos et al, 17 2009 | Cancers of unknown site | 1980–2009 | 10 | 10 | 10 | 73 (49–87) | 1/5 (17/170) | Yes | Overall survival | Eight trials of untreated patients, two trials unknown |
| Mauri et al, 19 2008 | Advanced breast cancer | 1971–2007 | 370 | 22 | 172 | 141 (87–262) | 1/153 (NA) | Yes | Overall survival | Assessed interventions to classification of “older combinations” |
| Hawkins et al, 58 2009 | Nonsmall cell lung cancer | 2000–2007 | 6 | 4 | 6 | 731 (651–1257) | 1/4 (104/1692) | No | Overall survival | Only second-line treatments |
| Griffin et al, 59 2006 | Ovarian cancer | 2004 alone | 3 | 3 | 3 | NA | 2/2 (NA) | No | Overall survival, progression-free survival | |

Abbreviations: IQR, interquartile range; NA, not applicable; RCT, randomized controlled trial.

Issues of methods

Assumptions of an MTC analysis

When conducting a standard pair-wise meta-analysis of RCTs comparing two interventions, we assume that included trials are broadly similar in terms of interventions tested and the expected direction of intervention effects across included patient populations. This similarity assumption is also required when conducting an MTC analysis aiming to compare more than two interventions.

In addition to trial similarity, similarity of effect sizes is a concern in standard pair-wise meta-analysis of RCTs. The most common methods for pooling studies in a meta-analysis are the fixed- and random-effects models. The fixed-effect model assumes that the true effect size is the same across studies and that the observed variability results from chance alone; this is commonly referred to as the statistical homogeneity assumption. 14 Although several interpretations exist, 15 the random-effects model usually assumes that there may be genuine diversity in the results of different trials owing to differences in study and patient characteristics, so a between-study variance component is incorporated into the calculations to capture this diversity (commonly referred to as statistical heterogeneity). When the estimated between-study variance is zero, the fixed- and random-effects approaches coincide. Otherwise, the random-effects approach yields wider confidence intervals (CIs) for the relative intervention effect and is thus considered more clinically conservative. 16 Although a random-effects approach explicitly models between-study heterogeneity, it does not explain it. Attempts to explain between-study heterogeneity must rely on meta-regression, a technique for examining whether relevant trial-level covariates modify the relative intervention effect.
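
To make the fixed- versus random-effects distinction concrete, the sketch below pools per-trial estimates on an additive scale (eg, log odds ratios) with inverse-variance weights. For the between-study variance τ² it uses the DerSimonian-Laird moment estimator, which is one common choice and an assumption here, since the text does not prescribe an estimator. The trial values in the example are invented purely for illustration.

```python
import math


def pool(effects, variances):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling.

    effects   -- per-trial estimates on an additive scale (eg, log odds ratios)
    variances -- their within-trial variances
    Returns a dict with both pooled estimates and 95% CIs, plus tau^2.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se_fixed = math.sqrt(1.0 / sum(w))

    # Cochran's Q: weighted squared deviations around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimate, truncated at zero

    # Random-effects weights add the between-study variance to each trial
    w_re = [1.0 / (v + tau2) for v in variances]
    random_ = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_random = math.sqrt(1.0 / sum(w_re))

    z = 1.96
    return {
        "fixed": (fixed, fixed - z * se_fixed, fixed + z * se_fixed),
        "random": (random_, random_ - z * se_random, random_ + z * se_random),
        "tau2": tau2,
    }


# Hypothetical log odds ratios and within-trial variances for four trials
print(pool([-0.5, 0.1, -0.3, 0.4], [0.04, 0.05, 0.03, 0.06]))
```

With these heterogeneous inputs the estimated τ² is positive, so the random-effects interval is wider than the fixed-effect one. If τ² is estimated as zero, the two sets of weights, and hence the two answers, are identical.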

In an MTC analysis, the assumptions made about statistical heterogeneity are of prime importance, because methods for assessing heterogeneity in this setting are not yet established and the conventional measures of heterogeneity (eg, τ² or I²) have no established network-wide equivalent. Note that in this setting, statistical heterogeneity must be considered for each possible pair-wise comparison of interventions.

Clinical heterogeneity may induce statistical heterogeneity. In a recent MTC analysis involving 60 RCTs of cancers of unknown primary site published between 1971 and 2006, 17 the populations ranged from poor-risk patients who had received previous therapy to, as time progressed, favorable- and intermediate-risk patients (a 6% improvement in performance status per decade). Clinicians will therefore need to judge for themselves whether the underlying risk of events is sufficiently similar across time. This issue appears to extend across diseases: an MTC examining breast cancer, 19 including trials from 1971 to 2007, demonstrated changing disease risks over time. This may reflect cointerventions that have improved outcomes for patients with breast cancer since 1971. 20

Methodological heterogeneity may also induce statistical heterogeneity. Therefore, in addition to intervention and clinical similarities, the MTC analysis requires an assumption of similarity on methodological grounds. In particular, are trials measuring a similar estimate of effect? Is the length of follow-up sufficiently similar? Are adjuvant therapies considered? Were any trials stopped early? 21 – 23 Are doses of the intervention sufficiently similar? In many cases, differences across trials do not result in meaningful discrepancies in pooled results, and an MTC should not be any more conservative in terms of inclusion criteria than any other meta-analysis. 24 However, without consideration of these issues, it may be impossible to determine whether and where biases are affecting results. Song et al 25 have demonstrated that pooled indirect comparisons may, in some circumstances, provide less biased estimates of treatment effects than pooled direct (head-to-head) comparisons.

To summarize, at least three issues of combinability need to be considered: a homogeneity assumption for each meta-analysis, a similarity assumption for individual comparisons, and a consistency assumption for the combination of evidence from different sources.
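
The consistency assumption can be probed, at least for a single closed loop of evidence, by comparing a direct estimate with the indirect estimate obtained through a common comparator. The sketch below assumes independent evidence sources, approximately normal log odds ratios, and hypothetical input values; it computes the difference and an approximate z test in the spirit of the Bucher approach cited earlier, and it is not a network-wide inconsistency model.

```python
import math


def loop_inconsistency(d_ab, v_ab, d_ac, v_ac, d_bc, v_bc):
    """Compare direct and indirect estimates of A versus B.

    d_* are pooled log odds ratios and v_* their variances:
    d_ab from head-to-head A vs B trials, d_ac and d_bc from trials
    of A vs C and B vs C, respectively.
    Returns (difference, z, two-sided p-value).
    """
    d_indirect = d_ac - d_bc  # indirect A vs B via common comparator C
    v_indirect = v_ac + v_bc  # variances add for independent sources
    diff = d_ab - d_indirect
    z = diff / math.sqrt(v_ab + v_indirect)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, z, p


# Hypothetical inputs: direct A vs B, then A vs C and B vs C
print(loop_inconsistency(-0.20, 0.02, -0.50, 0.03, -0.10, 0.04))
```

Because such tests usually have low power, a nonsignificant result does not establish consistency; it only fails to detect inconsistency.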

Assessing the heterogeneity of included trials in an MTC analysis

As with standard pair-wise meta-analysis, the assumption that trials include similar populations, methodological approaches, and interventions should be assessed using both visual assessment and, where possible, an assessment of statistical homogeneity. As no formal statistical tools exist for evaluating statistical heterogeneity in an MTC analysis, we suggest several possible steps here.

The first step involves assessing the statistical heterogeneity for each direct pair-wise comparison before conducting the MTC analysis. Specifically, for pair-wise comparisons where sufficient direct evidence is available, one can compute measures of between-study (statistical) heterogeneity in the context of standard pair-wise meta-analyses (eg, I²). Because MTC typically assumes that statistical heterogeneity is constant between different pair-wise treatment comparisons, one could contrast these measures of heterogeneity across the relevant treatment comparisons to get a sense of whether or not this assumption is tenable. This approach is of limited use when the measures of heterogeneity are computed from a small number of studies, as these measures would likely be unreliable.
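
This first step can be sketched as follows: trials are grouped by the comparison they inform, and Cochran's Q and I² = max(0, (Q − df)/Q) are computed within each group under a fixed-effect model. The tuple-based input format and the example values are hypothetical. Comparisons informed by a single trial are flagged rather than given an I² value, in keeping with the caution above about small numbers of studies.

```python
from collections import defaultdict


def i_squared_by_comparison(trials):
    """trials: iterable of (comparison, effect, variance) tuples.

    Returns {comparison: (k, Q, I2)}; Q and I2 are None when k < 2.
    """
    groups = defaultdict(list)
    for comparison, y, v in trials:
        groups[comparison].append((y, v))

    out = {}
    for comparison, rows in groups.items():
        k = len(rows)
        if k < 2:
            out[comparison] = (k, None, None)  # too few trials to assess
            continue
        w = [1.0 / v for _, v in rows]
        pooled = sum(wi * y for wi, (y, _) in zip(w, rows)) / sum(w)
        q = sum(wi * (y - pooled) ** 2 for wi, (y, _) in zip(w, rows))
        i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
        out[comparison] = (k, q, i2)
    return out


# Hypothetical log odds ratios and variances, labelled by comparison
trials = [
    ("A vs B", -0.30, 0.04), ("A vs B", -0.25, 0.05), ("A vs B", -0.35, 0.04),
    ("A vs C", -0.80, 0.03), ("A vs C", 0.20, 0.04), ("A vs C", -0.10, 0.03),
    ("B vs C", -0.40, 0.06),
]
for comparison, (k, q, i2) in i_squared_by_comparison(trials).items():
    print(comparison, k, q, i2)
```

A marked contrast between comparisons, as between the hypothetical A vs B and A vs C groups here, would cast doubt on the common-heterogeneity assumption.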

A second strategy for gauging whether to take between-study (statistical) heterogeneity into account in an MTC analysis is to fit both a fixed-effect and a random-effects MTC model to the data and then compare the resulting fits using a measure of model fit adjusted for model complexity (eg, the deviance information criterion). 29 Whereas the fixed-effect MTC model assumes that there is no between-study heterogeneity, the random-effects MTC model allows for between-study heterogeneity but assumes that this heterogeneity is constant across the different pair-wise treatment comparisons. If no substantial difference can be detected between the two model fits (eg, if the deviance information criteria of the two models differ by no more than three points), heterogeneity may be low.
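
For reference, the deviance information criterion is usually defined from posterior draws as shown below, where D̄ is the posterior mean of the deviance, D(θ̄) is the deviance evaluated at the posterior means of the parameters, and p_D is the effective number of parameters. Lower values indicate a better trade-off between fit and complexity; the three-point threshold mentioned above is a rule of thumb rather than a formal test.

```latex
\mathrm{DIC} = \bar{D} + p_D, \qquad
p_D = \bar{D} - D(\bar{\theta}), \qquad
\Delta\mathrm{DIC} = \mathrm{DIC}_{\text{fixed}} - \mathrm{DIC}_{\text{random}},
\qquad \lvert \Delta\mathrm{DIC} \rvert \le 3 \;\Rightarrow\; \text{no substantial difference in fit}
```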

In some situations, it may be possible to relax the assumption of constant between-study heterogeneity across distinct pair-wise intervention comparisons and consider a random-effects MTC model that allows this heterogeneity to be different across these comparisons. 29 The latter type of model could be compared against the random-effects MTC model introduced previously via the deviance information criterion to determine which assumption is more sensible for the data: constant or nonconstant between-study heterogeneity across pair-wise intervention comparisons. This would constitute the third strategy for evaluating heterogeneity in an MTC analysis.

The influence of methodological approach

With indirect and MTC comparisons in their infancy, it is not surprising that there has been little comparison of how the different approaches influence the estimation of relative intervention effects. The adjusted indirect comparison constructs an indirect estimate of the relative effect of two interventions A and B from RCTs comparing each of them against a common comparator C (ie, A vs C and B vs C). 6 MTC, in contrast, incorporates direct comparisons of A versus B alongside the indirect comparisons (A vs C and B vs C) to strengthen inference. 7 MTC is statistically more flexible and allows several analyses to be incorporated at once. It is, however, often more complicated to implement and validate than adjusted indirect comparisons.
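
The adjusted indirect comparison described above reduces to simple arithmetic on the log scale: the indirect log odds ratio for A versus B is the difference between the pooled log odds ratios for A versus C and B versus C, and, because the two sets of trials are independent, their variances add. The function below is a minimal sketch of that calculation; its name and the odds ratios and standard errors in the example are invented for illustration.

```python
import math


def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc):
    """Adjusted indirect comparison of A versus B via common comparator C.

    Inputs are pooled log odds ratios (and standard errors) for A vs C and
    B vs C, each from its own set of randomized trials.
    Returns (odds ratio, lower 95% limit, upper 95% limit) for A vs B.
    """
    log_or_ab = log_or_ac - log_or_bc
    # Variances add because the two evidence sources are independent
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    lo, hi = log_or_ab - 1.96 * se_ab, log_or_ab + 1.96 * se_ab
    return math.exp(log_or_ab), math.exp(lo), math.exp(hi)


# Hypothetical pooled results: OR 0.70 for A vs C, OR 0.85 for B vs C
or_ab, lower, upper = bucher_indirect(math.log(0.70), 0.10, math.log(0.85), 0.12)
print(f"Indirect OR for A vs B: {or_ab:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The indirect standard error here (about 0.16) exceeds both input standard errors, which is one reason indirect evidence tends to be imprecise.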

The impact of choosing one approach over the other depends on the data available, particularly when both direct and indirect evidence exist. O’Regan et al 11 have recently provided a comprehensive review of the influence of the different approaches in seven scenarios, each involving a different number of trials with direct and indirect comparisons. Their findings demonstrate that, depending on the evidence, adjusted indirect comparisons and MTC can provide different results. When all direct pair-wise comparisons involve a common comparator (ie, star-shaped networks of interventions), they found the results of the two approaches to be similar. However, as the network of trial evidence becomes more complex, MTC is more appropriate, as it incorporates more evidence, often reducing the variance of the results. These findings provide a starting point for selecting the most appropriate approach.
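
The gain in precision from adding indirect to direct evidence can be illustrated, in the simplest case of a single A-B-C loop under a fixed-effect model and the consistency assumption, by an inverse-variance weighted average of the direct and indirect estimates. A full MTC model estimates all comparisons jointly, so this is only a simplified sketch; the function name and input values are hypothetical.

```python
def mixed_estimate(direct, var_direct, indirect, var_indirect):
    """Fixed-effect inverse-variance combination of a direct and an
    indirect estimate of the same comparison (eg, log odds ratios).

    Returns (combined estimate, combined variance).
    """
    w_d, w_i = 1.0 / var_direct, 1.0 / var_indirect
    est = (w_d * direct + w_i * indirect) / (w_d + w_i)
    var = 1.0 / (w_d + w_i)  # always smaller than either input variance
    return est, var


# Hypothetical: direct A vs B log OR -0.20 (variance 0.02);
# indirect A vs B log OR -0.40 (variance 0.07) from A vs C and B vs C trials
print(mixed_estimate(-0.20, 0.02, -0.40, 0.07))
```

Even a fairly imprecise indirect estimate lowers the combined variance below that of the direct evidence alone, which is the mechanism behind the variance reduction described above.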

Information size

Precision and adequacy of sample size

An increasingly recognized weakness of pair-wise meta-analysis is inadequate power or precision to confirm or refute important intervention effects when only a few studies with a small number of events are available. 30 A growing body of work has provided evidence that about 15%–30% of such meta-analyses are prone to spurious inferences in the form of false-positive results or important overestimates of treatment effects. 31 – 34 This is particularly problematic for MTC analyses, in which several interventions are assessed and authors may choose to rank interventions according to probabilities (eg, the probability that each intervention is the most effective). For example, if three treatments, A, B, and C, are compared in an MTC and each pair has been compared head to head in only a few RCTs, there is a considerable risk that one of the three pooled head-to-head comparisons will over- or underestimate the comparative treatment effect through the play of chance (imprecision). The problem is smaller if the indirect evidence adds enough precision to “correct” the spurious estimate. Unfortunately, indirect evidence is typically very imprecise. Glenny et al 2 have recommended a rule of thumb that four RCTs contributing to an indirect comparison are required to approximately match the precision that a single direct (head-to-head) trial would contribute. When the number of trials is unbalanced across comparisons (eg, three trials compare A with B, but nine trials compare A with C), four trials are likely to be an underestimate. 27 Hence, a spurious result due to imprecision within one treatment comparison is likely to contaminate the overall inferences drawn from the MTC, 34 and the precision of the network as a whole may be undermined by a few imprecise comparisons.
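
The rule of thumb attributed to Glenny et al above follows from variance arithmetic. Assuming, for illustration only, that every trial has the same variance and that a fixed-effect model applies, pooling n trials divides that variance by n, and the variance of the indirect estimate is the sum of the two pooled variances. The hypothetical helper below converts an indirect comparison into the number of direct trials of equal precision.

```python
def direct_trial_equivalents(n_first, n_second):
    """Number of direct (head-to-head) trials giving the same precision as
    an indirect comparison built from n_first trials in one comparison and
    n_second trials in the other.

    Assumes every trial has the same variance (sigma^2, which cancels) and
    fixed-effect pooling, so a pooled estimate from n trials has variance
    sigma^2 / n and the indirect variance is the sum of the two.
    """
    var_indirect = 1.0 / n_first + 1.0 / n_second  # in units of sigma^2
    return 1.0 / var_indirect


for n1, n2 in [(2, 2), (3, 9)]:
    equivalent = direct_trial_equivalents(n1, n2)
    print(f"{n1} + {n2} trials ~ {equivalent:.2f} direct trial(s); "
          f"{(n1 + n2) / equivalent:.2f} indirect trials per direct trial")
```

With a balanced split, four indirect trials match one direct trial; with the unbalanced three-versus-nine split from the text, about 5.3 trials are needed per direct-trial equivalent, consistent with the statement that four is an underestimate.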

Trial-level challenges

The issue of crossing over

In any clinical study, we should aim to evaluate patient-important outcomes. For many diseases, patients and their families are most concerned about quality of life and mortality. Oncology is one of the most heavily funded fields of medicine, and many large trials are published. However, overall survival may be an elusive endpoint in cancer clinical trials. Although arguably objective and easy to measure, it requires extended patient follow-up and can be confounded by disease progression that may be unrelated to the site-specific cancer. 35 – 37 Further, as new therapies may be effective along a continuum of the disease, patient survival may be influenced by whether patients received subsequent therapies after randomization. In cancer clinical trials, it is common for a failing patient to cross over (also called crossing in) to either the intervention under investigation or another salvage therapy, thus obscuring the effect of the study drug on overall survival. 36 , 38 As a result, most RCTs are underpowered to assess overall survival. 36

The greatest challenge to using overall survival as a primary endpoint is that clinicians typically want to exhaust treatments in order to sustain a patient’s life, regardless of participation in a clinical trial. A patient who does not respond to the trial intervention may seek or be given the alternative study drug or another existing or experimental treatment. However, when a patient crosses over to the intervention treatment, the extent of any carryover effect, or the contribution of the new treatment to hastening mortality, cannot be known. In addition, the patient’s mortality status is frequently removed from the group to which the patient was assigned. The following example illustrates this concern. In a trial of tamoxifen versus letrozole as first-line treatment for postmenopausal patients with endocrine-responsive advanced breast cancer, letrozole was significantly more effective than tamoxifen for response rate and time to tumor progression. 39 However, there were no important differences in overall survival between the study arms. Yet when a sensitivity analysis censored patients who crossed over to the other study drug, the results indicated that letrozole was associated with a survival benefit. 40

Although crossing over is common in cancer clinical trials, methods for handling it in analyses of overall survival, in individual RCTs and in meta-analyses, are not well established. 35 , 41 A clinical trial may treat a crossover as a treatment failure (included in a progression-of-disease analysis) or may exclude the patient from analysis, thereby losing the benefits of the intention-to-treat principle. 42 Even though these are RCTs, crossing over may not occur randomly across arms, as one treatment may be genuinely more beneficial than the other. In meta-analysis, the inclusion of crossed-over patients represents an important challenge. Principally, the benefits of randomization are lost for the patients who crossed over. If they are few in a moderate to large trial, the effect may be small; if they are many, the overall survival analysis in a meta-analysis could be seriously biased. Further, the patient’s previous treatment (period A) may affect the response to the second treatment (period B), akin to the carryover concept in a crossover trial. 43 The survival endpoint may remain valid if the trialists aim to determine which first-line regimen or strategy (ie, which drug to start with) provides an overall survival benefit. However, this becomes increasingly confusing as patients switch from drug A to drug B or to drugs C/D. A common approach is to censor these patients in the survival analysis. However, this will bias the survival assessment, as the patient’s true time to progression is replaced by the time points of patients who were free of disease when that patient was censored, favoring the less effective intervention. 41 , 44 A meta-analyst aiming to calculate a relative risk based on event rates would typically not have this information from a publication, and pooling hazard ratios may improperly ignore this issue.

It seems appropriate for meta-analyses to report whether crossed-over patients could introduce bias, just as they would report large losses to follow-up. 45 Some study designs and opinions do not permit patients who fail first-line therapy to cross over to the intervention arm. 41 , 46 However, this creates dilemmas for clinicians and trialists, especially regarding patients with a poor survival prognosis or those for whom alternative experimental drugs may exist.

Issues pertaining to the use of patients’ characteristics

The need for evidence to inform decisions about health technologies has been a leading motivation for the development of indirect and MTC approaches. It is therefore not surprising that pragmatic decision makers have sought to identify characteristics or subgroups of patients in whom technologies may offer greater benefit or improved safety.

As described previously, conventional meta-analysis requires included trials to be sufficiently homogeneous, but MTC approaches additionally require that trials be similar with respect to moderators of the relative treatment effect. 25 Song et al 10 state that the average relative effect estimated in placebo-controlled trials of one therapy should be generalizable to patients in placebo-controlled trials of an alternative therapy, and vice versa. Where heterogeneity between sources of evidence is present, meta-regression and subgroup analysis can play a greater role than in conventional meta-analysis by enabling comparisons between “similar” groups. 10 However, the pitfalls of subgroup analysis, meta-regression, and meta-analysis acknowledged in the conventional meta-analysis literature 47 , 48 carry through to indirect and MTC approaches; consequently, authors are advised to use predefined characteristics and to interpret the results cautiously.

We did not find examples of meta-regression in MTC analyses in oncology. However, one of the authors has used meta-regression within an MTC examining the effectiveness of biologic agents in rheumatoid arthritis. 49 In this example, baseline disease duration, a characteristic known to be related to the effectiveness of biologics versus standard treatments, was included in the MTC as a meta-regression covariate. Although including the meta-regression did not change the statistical significance of the results, it did shift the odds ratios to suggest that the three tumor necrosis factor antagonists etanercept, infliximab, and adalimumab were very similar, a result that has long been suspected in the wider literature ( Figure 2 ).
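
The idea of meta-regression can be sketched in its simplest pair-wise, fixed-effect form: trial-level effect estimates (eg, log odds ratios) are regressed on a trial-level covariate by weighted least squares with inverse-variance weights. This is not the model used in the rheumatoid arthritis example, where the covariate was built into the MTC itself; the function name and the disease-duration values are hypothetical, and a random-effects version would add τ² to each within-trial variance.

```python
import math


def meta_regression(effects, variances, covariate):
    """Fixed-effect meta-regression of effect on one trial-level covariate.

    Weighted least squares with weights 1/variance, treating the
    within-trial variances as known.
    Returns (intercept, slope, standard error of slope).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Hypothetical trials: mean baseline disease duration (years) and log odds ratio
duration = [1.0, 3.0, 6.0, 10.0]
log_or = [1.10, 0.95, 0.80, 0.55]
var = [0.05, 0.04, 0.06, 0.05]
b0, b1, se = meta_regression(log_or, var, duration)
print(f"log OR = {b0:.2f} + ({b1:.3f}) x duration (SE of slope {se:.3f})")
```

A slope whose interval excludes zero would suggest that the covariate modifies the relative effect, subject to the usual cautions about ecological bias and small numbers of trials.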