A clinical algorithm-based approach to fractional polynomial (FP) model selection that added criteria based on face validity, predictive accuracy, and expert opinion improved the plausibility of survival outcomes in patients with renal cell carcinoma (RCC) who were receiving first-line treatment compared with models selected on statistical fit alone, according to data from a network meta-analysis (NMA).1
Results from the NMA, which were presented during the 2022 Genitourinary Cancers Symposium, also suggested that using the algorithm in indications beyond first-line RCC will require personalization based on prior knowledge of the treatments, their mechanisms of action, and the criteria of interest for that specific indication.
“Applying this decision algorithm more broadly may also improve the accuracy and relevance of indirect comparisons of oncology treatments by boosting the predictive accuracy of selected model estimates and better aligning with clinical expectations,” wrote lead study author, Bradley McGregor, MD, clinical director of the Lank Center for Genitourinary Oncology, senior physician at Dana-Farber Cancer Institute, and instructor in medicine at Harvard Medical School, and colleagues, in the poster.
Determining the cost-effectiveness (CE) of cancer treatments requires estimating long-term survival beyond what has been evaluated thus far, but no single trial has compared all therapeutic options that are relevant for evaluating CE in first-line RCC treatment. The differences in mechanism of action between single-agent tyrosine kinase inhibitors (TKIs) and immunotherapy/TKI combinations lead to differences in survival trends and produce non-proportional hazards over time.
“This variability over time necessitates indirect treatment comparisons accounting for time-varying hazards, such as NMA using FP,” the study authors wrote. “FP NMA entails estimating models with 1 or 2 parameters, which are called first-order and second-order models. Each of these parameters is assigned its own polynomial with a power selected from a range of prespecified possible polynomial powers. This results in a wide variety of power combinations, leading to a vast pool of candidate models. A single ideal model is then selected from this pool of available models.”
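To make the quoted description concrete, the sketch below evaluates a first- or second-order FP on the log-hazard scale. It assumes the formulation commonly used in FP NMA, in which a power of 0 is read as ln(t) and a repeated power contributes a t^p × ln(t) term; the poster does not spell out its parameterization, and the function name and coefficient values here are illustrative only.

```python
import math

def fp_log_hazard(t, betas, powers):
    """Log-hazard at time t > 0 for a first- or second-order fractional polynomial.

    betas  = (b0, b1) or (b0, b1, b2)   intercept plus one coefficient per power
    powers = (p1,)    or (p1, p2)
    Assumed conventions: t**0 is replaced by ln(t); if p1 == p2 the second
    term becomes t**p * ln(t).
    """
    def term(p):
        return math.log(t) if p == 0 else t ** p

    value = betas[0] + betas[1] * term(powers[0])
    if len(powers) == 2:
        p1, p2 = powers
        second = term(p1) * math.log(t) if p1 == p2 else term(p2)
        value += betas[2] * second
    return value

# Illustrative coefficients only (not estimates from the NMA).
print(fp_log_hazard(12.0, (-3.0, 0.02, 1.5), (1, -2)))
```

Because the powers can be negative or zero, the hazard is only defined for t > 0, and different power pairs can produce very different shapes from the same small set of coefficients.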
Clinical plausibility and validation against available information serve as important drivers of FP model selection, but statistical fit criteria used for the selection of these models, such as the deviance information criterion (DIC), do not account for clinical data. As such, they only illustrate the overall statistical fit of the model to the trial findings that have been reported; this can lead to models that demonstrate unlikely hazard trends over time and lack face validity.
For this analysis, investigators developed an algorithm intended to improve the FP NMA model selection process by factoring in not just statistical goodness of fit, but also predictive accuracy and clinical plausibility. The algorithm was applied to make indirect comparisons of overall survival (OS) and progression-free survival (PFS) among first-line RCC treatment options.
The treatment landscape for frontline RCC has evolved from the use of single-agent TKIs to novel immunotherapy combinations. Data from pivotal phase 3 trials such as the JAVELIN study (NCT02684006), the CLEAR trial (NCT02811861), the CheckMate-9ER trial (NCT03141177), and the KEYNOTE-426 trial (NCT02853331) have supported the approvals of avelumab (Bavencio) plus axitinib (Inlyta), pembrolizumab (Keytruda) plus lenvatinib (Lenvima), nivolumab (Opdivo) plus cabozantinib (Cabometyx), and pembrolizumab plus axitinib, respectively.
Data from these combinations were utilized in the algorithm to examine the current fractional polynomial NMA model selection process. Sunitinib (Sutent) was the common comparator arm across all the trials considered in the analysis, and all these data were used to form a connected network of evidence for OS and PFS outcomes.
Investigators reconstructed synthetic PFS and OS findings from JAVELIN, CLEAR, and KEYNOTE-426, and considered individual-level patient data from CheckMate-9ER from the September 2020 database lock. Forty-four models were considered for the examination of the data, and they represented several parameter combinations.
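The count of 44 is what one would get from the conventional FP power set, although the report does not list the powers that were used. As a minimal sketch under that assumption, the snippet below enumerates every first-order model and every second-order pair (repeated powers allowed) from the set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}.

```python
from itertools import combinations_with_replacement

# Assumption: the conventional fractional polynomial power set.
# The source does not state which powers were prespecified.
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

first_order = [(p,) for p in POWERS]
# Unordered pairs, including repeated powers such as (-2, -2).
second_order = list(combinations_with_replacement(POWERS, 2))
candidates = first_order + second_order

print(len(first_order), len(second_order), len(candidates))  # 8 36 44
```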
For the OS network, investigators calculated absolute modeled outcomes by leveraging the sunitinib arm from CheckMate-9ER as an anchor treatment to which relative treatment effects were applied; this was done because the trial was recently conducted and was noted to be reflective of current clinical practice regarding subsequent treatments. For the PFS network, investigators pooled all data collected in the sunitinib arms from all trials to balance significant differences observed among the survival trends of the individual sunitinib data.
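The anchoring step can be pictured as adding relative treatment effects to the anchor arm's coefficients on the log-hazard scale and then converting the resulting hazard into a survival curve. The sketch below does this for a first-order FP only; the function names, the time grid, and all coefficient values are hypothetical and are not taken from the analysis.

```python
import math

def fp_term(t, p):
    # Assumed FP convention: a power of 0 is read as ln(t).
    return math.log(t) if p == 0 else t ** p

def anchored_log_hazard(anchor, effect, p):
    """First-order FP log-hazard for a treatment.

    anchor = (b0, b1): coefficients of the anchor (reference) arm
    effect = (d0, d1): treatment vs anchor differences on the log-hazard scale
    """
    b0, b1 = anchor[0] + effect[0], anchor[1] + effect[1]
    return lambda t: b0 + b1 * fp_term(t, p)

def survival_curve(log_hazard, times):
    """S(t) = exp(-cumulative hazard), with the cumulative hazard built by
    the trapezoid rule. Survival is relative to the first grid point."""
    surv, cum = [], 0.0
    prev_t = prev_h = None
    for t in times:
        h = math.exp(log_hazard(t))
        if prev_t is not None:
            cum += 0.5 * (h + prev_h) * (t - prev_t)
        surv.append(math.exp(-cum))
        prev_t, prev_h = t, h
    return surv

months = [0.5 * k for k in range(1, 121)]   # 0.5 to 60 months
anchor = (math.log(0.05), 0.0)              # made-up anchor-arm coefficients
effect = (math.log(0.6), 0.0)               # made-up relative effect

s_ref = survival_curve(anchored_log_hazard(anchor, (0.0, 0.0), 0), months)
s_trt = survival_curve(anchored_log_hazard(anchor, effect, 0), months)
print(round(s_ref[-1], 3), round(s_trt[-1], 3))
```

The design point is that every absolute curve inherits the shape of the anchor arm, so the choice of anchor (a single recent trial for OS, a pooled arm for PFS) directly affects how well each trial's own data are reproduced.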
Model selection was conducted either solely based on statistical fit criteria (DIC-based approach), or based on a selection algorithm utilizing a priori criteria (predictive accuracy against trial data), face validity, and clinical plausibility of survival beyond the observed trial period, as well as statistical fit criteria. Because of the lack of relevant long-term survival data from the studies, the fits and long-term survival extrapolations of the clinical algorithm-based models were shared with 2 oncologists who specialized in the treatment of patients with RCC to assess their clinical plausibility.
The DIC-based approach selected the second-order model with p1 = -2; p2 = -2 for PFS, and the second-order model with p1 = 1; p2 = -2 for OS. Over the long term, both models resulted in clinically implausible survival extrapolations, according to the investigators. Notably, results showed that the PFS with pembrolizumab plus axitinib lacked face validity, as outcomes across immunotherapy/TKI combinations were expected to be comparable; however, the model displayed favorable PFS outcomes with pembrolizumab plus axitinib compared with avelumab plus axitinib over 20 months.
Regarding OS, the trial data were noted to be heterogeneous, and the Kaplan-Meier curves for pembrolizumab plus lenvatinib and sunitinib crossed twice, whereas the curves in other trials crossed once or not at all. Although a single functional form was fitted to all trials considered, the form was poorly reflective of the CLEAR findings and modeled OS for pembrolizumab/lenvatinib did not adequately reflect trial outcomes. Investigators posited that this was because of preferential fitting to the other trials or overfitting to hazard trends within the findings.
Furthermore, the DIC-based approach selected an OS model that estimated extreme hazards for pembrolizumab plus lenvatinib and avelumab plus axitinib during the initial months, resulting in rapidly declining survival curves. Long-term OS with sunitinib was also found to lack face validity, showing a plateau.
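The contrast between the 2 selection routes can be sketched as follows. This is not the investigators' algorithm: the ordering of the checks, the error threshold, the field names, and all numbers are hypothetical, chosen only to show how screening on predictive accuracy, face validity, and expert-judged plausibility before ranking by DIC can lead to a different pick than DIC alone.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    powers: tuple            # FP powers of the candidate model
    dic: float               # statistical fit; lower is better
    prediction_error: float  # hypothetical summary of error vs observed trial data
    face_valid: bool         # hypothetical flag: modeled curves look sensible
    expert_plausible: bool   # hypothetical flag: clinicians judged extrapolations plausible

def select_by_dic(models):
    """DIC-based approach: best statistical fit wins."""
    return min(models, key=lambda m: m.dic)

def select_by_algorithm(models, max_error):
    """Screen on the additional criteria first, then rank survivors by DIC."""
    viable = [m for m in models
              if m.prediction_error <= max_error
              and m.face_valid and m.expert_plausible]
    return min(viable, key=lambda m: m.dic) if viable else None

# Invented values for illustration only.
models = [
    Candidate((-1, 0), 1000.0, 0.04, False, False),
    Candidate((0, 1), 1006.0, 0.03, True, True),
    Candidate((0.5,), 1012.0, 0.09, True, True),
]

print(select_by_dic(models).powers)                        # (-1, 0)
print(select_by_algorithm(models, max_error=0.05).powers)  # (0, 1)
```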
Six viable models for PFS and 2 for OS were identified using the clinical algorithm-based approach, considering predictive accuracy and clinical plausibility for survival extrapolations. The selected model predicted the highest median PFS with pembrolizumab plus lenvatinib, at 23.5 months, as well as nivolumab plus cabozantinib, at 19.2 months. The lowest predicted median PFS was for sunitinib, at 10.2 months.
Additionally, the models for PFS were found to overestimate outcomes for most treatments examined, and this was thought to be partly because of the anchoring of the absolute outcomes to a pooled sunitinib arm; doing this led to overestimations for treatments with poorer-performing comparator sunitinib arms. The OS model was selected in line with clinical expert opinion that the use of immunotherapy following treatment with sunitinib would lead to favorable outcomes vs trial data, and likely cause immunotherapy plus TKI vs sunitinib survival curves to cross over time. Notably, OS outcomes for the immunotherapy/TKI combinations were similar.
The investigators noted that although DIC-based fit is still an effective measure for evaluating the fit of FP models to observed trial data, it can still generate clinically implausible results. The utilization of additional model selection criteria improved the clinical plausibility of survival outcomes vs statistical fit alone.
The study authors noted that using a sunitinib arm pooled from the individual sunitinib arms of all available trials resulted in over- and underpredictions by the PFS models under both selection approaches. This occurred because the relative effectiveness of treatments was estimated based on each trial’s sunitinib arm, whereas absolute survival outcomes were estimated based on the pooled sunitinib data spanning the trials.
“In our study, FP models generally performed imperfectly against some of the trial data,” study authors wrote. “This likely reflects heterogeneity between trial populations, differing mechanism of actions, and subsequent treatment patterns and highlights the need for further research on the integration of adjustment for within-trial and cross-trial heterogeneity in fitting FP models to survival data.”