Investigators have developed a machine learning algorithm that could eventually facilitate earlier cancer detection via smaller blood draws.
Investigators at City of Hope in Duarte, California, in collaboration with the Translational Genomics Research Institute (TGen), a precision medicine research organization located in Phoenix, Arizona, that is part of City of Hope, have developed a machine learning algorithm that could eventually facilitate earlier cancer detection via smaller blood draws.1
Findings from a study published in Science Translational Medicine demonstrated that the algorithm, Alu Profile Learning Using Sequencing (A-Plus), achieved a sensitivity of 40.5% across 11 cancer types in the validation cohort at a specificity of 98.5%. When A-Plus was combined with aneuploidy and 8 common protein biomarkers, the cancer detection rate rose to 51% at a specificity of 98.9%. In total, A-Plus was applied to 7615 samples from 5178 individuals: 2073 samples from patients with solid tumors and 5542 samples from individuals without cancer.2
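For readers less familiar with these metrics, the short Python sketch below shows how sensitivity and specificity are calculated from test outcomes. The counts are hypothetical and were chosen only so that the ratios reproduce the reported 40.5% and 98.5%; they are not the study's actual counts, and the function names are illustrative.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of samples from patients with cancer that the test calls positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of samples from individuals without cancer that the test calls negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts, chosen only so the ratios match the reported rates.
# They are NOT the study's actual counts.
example_sensitivity = sensitivity(true_positives=81, false_negatives=119)
example_specificity = specificity(true_negatives=985, false_positives=15)

# The false positive rate is the complement of specificity.
false_positive_rate = 1 - example_specificity

print(f"sensitivity: {example_sensitivity:.1%}")
print(f"specificity: {example_specificity:.1%}")
print(f"false positive rate: {false_positive_rate:.1%}")
```

A specificity of 98.5% therefore corresponds to a false positive rate of about 1.5% among individuals without cancer.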
“The first conclusion is that we are able to catch [cancer] at high specificity with a very low false positive rate,” Kamel Lahouel, PhD, assistant professor, TGen Integrated Cancer Genomics Division, and the study’s co-first author said in an interview with OncLive. “Other tests have been published with similar performance, even though most usually report higher false positive rates. Assuming that this is at least as good as competing methods, one important message is that this technology uses much less input DNA. It’s an amplicon-based approach, you start with a very small amount of DNA and you make several copies of it. The workflow is also simpler here compared with shallow whole genome sequencing.”
To conduct their study, investigators first used an approach called RealSeqS to amplify Alu elements, which are short interspersed nuclear elements (SINEs) of approximately 300 base pairs. They then applied A-Plus, a supervised machine learning approach, to identify differences in normalized read depth at RealSeqS loci between normal and cancer samples.2
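The idea of comparing normalized read depth can be illustrated with a minimal Python sketch. It assumes the simplest possible normalization (each amplicon's reads divided by the sample's total reads); the study's actual normalization is not described here, and the function names and toy counts are hypothetical.

```python
def normalize_read_depth(read_counts):
    """Convert one sample's raw per-amplicon read counts to fractions of its total reads."""
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("sample has no reads")
    return [count / total_reads for count in read_counts]


def depth_differences(sample_profile, reference_profile):
    """Per-amplicon difference between a sample profile and a reference profile."""
    return [s - r for s, r in zip(sample_profile, reference_profile)]


# Toy read counts for 4 hypothetical amplicons (illustration only).
shallow_normal = [10, 20, 30, 40]      # low sequencing depth
deep_normal = [100, 200, 300, 400]     # same profile, 10 times the depth
cancer_like = [10, 40, 30, 20]         # reads redistributed across amplicons

reference = normalize_read_depth(shallow_normal)

# After normalization, sequencing depth alone no longer creates a difference.
same_profile = depth_differences(normalize_read_depth(deep_normal), reference)

# A redistribution of reads across amplicons does show up as a difference.
shifted_profile = depth_differences(normalize_read_depth(cancer_like), reference)

print(same_profile)
print(shifted_profile)
```

In a supervised setting, per-amplicon values like these would serve as the input features on which a classifier is trained to separate cancer from normal samples.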
“The technology is measuring cell-free DNA fragmentation patterns. Basically, you need to sequence cell-free DNA,” Lahouel explained. “RealSeqS amplifies approximately 350,000 amplicons in the genome called repetitive Alu elements. It takes chunks of these repetitive elements and makes multiple copies of them. The classical approach, which is shallow whole genome sequencing when it comes to cell-free DNA, amplifies the full genome without predefined spots. So you lose the fact that you do not sequence all the genome, but what you gain is that on these specific Alu elements, you end up with coverage that is about 28-times higher than the coverage of shallow whole genome sequencing. A-Plus takes [approximately 120,000 useful amplicons] and uses them as features to classify cancer vs normal [tissue].”
Cohort 1 of the study consisted of 315 samples from patients with cancer (n = 202) and 400 samples from individuals without cancer (n = 354) and was used as the feature selection and model training group. Cohort 2 was the biomarker integration and threshold determination group, consisting of 852 samples from patients with cancer (n = 704) and 1402 samples from individuals without cancer (n = 958). The validation cohort, cohort 3, was made up of 1167 samples from patients with cancer (n = 1167) and 1793 samples from individuals without cancer (n = 1793). Finally, the reproducibility cohort, cohort 4, examined 1539 individuals without cancer and 147 patients with cancer.
Samples in cohort 1 included breast (n = 35), colorectal (n = 69), esophageal (n = 22), liver (n = 28), lung (n = 52), ovarian (n = 22), pancreatic (n = 30), and stomach (n = 57) cancers. In cohort 2, samples consisted of breast (n = 251), colorectal (n = 311), esophageal (n = 30), liver (n = 19), lung (n = 81), ovarian (n = 25), pancreatic (n = 84), and stomach (n = 51) cancers. In cohort 3, samples included breast (n = 70), colorectal (n = 100), esophageal (n = 56), head and neck (n = 100), kidney (n = 84), lung (n = 137), ovarian (n = 119), pancreatic (n = 173), prostate (n = 92), stomach (n = 99), and uterine (n = 133) cancers.
Additional findings from cohort 2 revealed that the A-Plus score corresponding to 99% specificity among control samples was 0.28. Sensitivity was highest for samples from patients with esophageal cancer (86%; CI, 68%-96%) and stomach cancer (87%; CI, 73%-94%), and lowest for samples from patients with breast cancer (34%; CI, 28%-40%). Additionally, the A-Plus model with 60 principal components outperformed models with 15, 30, 90, and 240 principal components.
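Choosing a score cutoff that corresponds to a target specificity among control samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the investigators' procedure: it assumes a sample is called positive when its score exceeds the cutoff, and the function names and toy scores are hypothetical.

```python
import math


def threshold_for_specificity(control_scores, target_specificity):
    """Smallest observed control score that keeps at least the target share
    of controls at or below the cutoff (that is, called negative)."""
    if not control_scores:
        raise ValueError("need at least one control score")
    if not 0 < target_specificity <= 1:
        raise ValueError("target_specificity must be in (0, 1]")
    ordered = sorted(control_scores)
    # Rounding guards against floating-point noise in the product.
    needed_at_or_below = math.ceil(round(target_specificity * len(ordered), 9))
    return ordered[needed_at_or_below - 1]


def observed_specificity(control_scores, threshold):
    """Share of controls scoring at or below the cutoff."""
    negatives = sum(1 for score in control_scores if score <= threshold)
    return negatives / len(control_scores)


# Toy control scores: 100 evenly spaced values (illustration only).
toy_control_scores = [i / 100 for i in range(100)]

toy_threshold = threshold_for_specificity(toy_control_scores, 0.99)
toy_specificity = observed_specificity(toy_control_scores, toy_threshold)

print(toy_threshold, toy_specificity)
```

Fixing the cutoff on one cohort and then reporting performance on a separate validation cohort, as the study design describes for cohorts 2 and 3, avoids tuning the threshold on the same samples used to measure specificity.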
In cohort 3, the multi-analyte test incorporating A-Plus, aneuploidy, and proteins detected more than two-thirds of cancers of the esophagus, pancreas, ovary, stomach, and colorectum at an observed specificity of 98.9%. Most cancers in cohort 3 were detected relatively early: only a few had distant metastatic lesions observable at the time of sample acquisition. With the multi-analyte classifier, sensitivity averaged 57% in stage I, 60% in stage II, and 65% in stage III across cancer types.
Lahouel noted that the study was limited by potential biases introduced by variables independent of cancer, such as ethnicity, gender, technical noise, and processing conditions. He added that the reproducibility and stability of the test will need to be validated beyond what was done in cohorts 3 and 4 to show that they are not affected by these and other sources of bias.
A clinical trial is planned for summer 2024 to compare the fragmentomics blood testing approach with standard of care in adult patients between the ages of 65 and 75. The prospective study will aim to determine the efficacy of the biomarker panel in detecting early-stage cancer.1