Big Data–Based Tools Support Cancer Clinical Decision-Making

November 20, 2020
Lynne Lederman, PhD

Clinicians are increasingly looking to big data and electronic medical records (EMR) data integration tools in support of clinical decision-making questions and processes for improved access and care for their patients with cancer.

Mia Levy, MD, PhD, spoke about such cancer clinical decision-making tools in the era of precision medicine and big data at the Association for Molecular Pathology (AMP) 2020 Annual Meeting and Expo.1 She described clinical decision support workflows for interpreting tumor next-generation sequencing (NGS) results for clinical management, knowledge resources, and secondary uses of NGS data for clinical trial design and operational optimization.

Clinical decision support for the care of patients with cancer often starts with tumor NGS and the resulting laboratory-interpreted data, including identification of somatic and germline variants. When laboratories first began reporting NGS data, the data were not integrated into the EMR. Portals were created for data access, and results were sometimes reported via email. These interpreted reports, portals, and emails did not include patient history, yet contained a lot of information not relevant to patient care.

Integrating NGS with diagnostic data within the EMR is now possible and supports just-in-time clinical decisions, such as referral to appropriate clinical trials. Molecular tumor boards have also been created to capture patient history in addition to sequencing data, adding value to the pathology report and helping to interpret results collaboratively for clinical management.

Clinical decisions supported by tumor NGS data include which treatments to consider (both on-label and off-label options when a potentially actionable mutation has been reported) and referral to clinical trials. NGS data can also suggest treatments to avoid, such as those unlikely to be effective, and support continuing treatments that have proved optimal or to which the tumor is responding.

Classes of knowledge base assertions made based on tumor NGS data include:

  • Functional: alteration of protein function
  • Frequency: how often a biomarker is observed in a specific disease
  • Therapeutic: efficacy, or lack thereof, of a treatment for a disease harboring a biomarker
  • Prognostic: effect of a specific biomarker on prognosis in a specific disease
  • Diagnostic: association of diagnoses with molecular alterations
  • Clinical trial eligibility criteria: in the context of the disease, biomarker, and treatment

An example of the therapeutic class is sensitivity to osimertinib (Tagrisso) in patients with non–small cell lung cancer (NSCLC) based on the presence of the EGFR T790M mutation.

Biomarkers used in assertions include the typical gene variants (point mutations, insertions, deletions, gene amplifications, fusions, and rearrangements), cytogenetic findings, and others. Levy explained that it is important to be able to consider the implications of combinations of alterations (AND/OR/NOT), not just single alterations.

An example of a therapeutic assertion in NSCLC is the EGFR L858R mutation, which confers sensitivity to erlotinib (Tarceva) in metastatic disease. When the EGFR mutations L858R and T790M co-occur (“AND”), the combination confers resistance to erlotinib in metastatic disease. For colon cancer, sensitivity to cetuximab treatment requires the absence (“NOT”) of any KRAS or NRAS mutation in exons 2, 3, or 4.
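
To make the AND/OR/NOT logic concrete, here is a minimal Python sketch of how such assertions could be encoded as small expression trees and evaluated against the variants on a report. The class names (Variant, Has, MutatedInExons, And, Or, Not) and the example tumors are hypothetical and are not drawn from any knowledge base Levy described. One added assumption: the sensitivity rule is written as L858R AND NOT T790M so that it does not fire alongside the resistance rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Variant:
    """One reported alteration (hypothetical, greatly simplified record)."""
    gene: str
    alteration: str
    exon: int


Tumor = FrozenSet[Variant]


@dataclass(frozen=True)
class Has:
    """True when the tumor carries this specific alteration."""
    gene: str
    alteration: str

    def holds(self, tumor: Tumor) -> bool:
        return any(v.gene == self.gene and v.alteration == self.alteration for v in tumor)


@dataclass(frozen=True)
class MutatedInExons:
    """True when the tumor carries any alteration in the listed exons of a gene."""
    gene: str
    exons: Tuple[int, ...]

    def holds(self, tumor: Tumor) -> bool:
        return any(v.gene == self.gene and v.exon in self.exons for v in tumor)


@dataclass(frozen=True)
class And:
    parts: Tuple[Expr, ...]

    def holds(self, tumor: Tumor) -> bool:
        return all(p.holds(tumor) for p in self.parts)


@dataclass(frozen=True)
class Or:
    parts: Tuple[Expr, ...]

    def holds(self, tumor: Tumor) -> bool:
        return any(p.holds(tumor) for p in self.parts)


@dataclass(frozen=True)
class Not:
    part: Expr

    def holds(self, tumor: Tumor) -> bool:
        return not self.part.holds(tumor)


Expr = Union[Has, MutatedInExons, And, Or, Not]

# Assertions paraphrased from the examples in the text (metastatic disease assumed).
EGFR_L858R, EGFR_T790M = Has("EGFR", "L858R"), Has("EGFR", "T790M")
# Assumption: NOT T790M added so the sensitivity and resistance rules are mutually exclusive.
ERLOTINIB_SENSITIVE = And((EGFR_L858R, Not(EGFR_T790M)))
ERLOTINIB_RESISTANT = And((EGFR_L858R, EGFR_T790M))
RAS_EXONS = (2, 3, 4)
CETUXIMAB_SENSITIVE = Not(Or((MutatedInExons("KRAS", RAS_EXONS),
                              MutatedInExons("NRAS", RAS_EXONS))))

# Hypothetical tumors for illustration only.
nsclc_a: Tumor = frozenset({Variant("EGFR", "L858R", 21)})
nsclc_b: Tumor = frozenset({Variant("EGFR", "L858R", 21), Variant("EGFR", "T790M", 20)})
crc_ras_wild_type: Tumor = frozenset()
crc_kras_g12d: Tumor = frozenset({Variant("KRAS", "G12D", 2)})

print(ERLOTINIB_SENSITIVE.holds(nsclc_a), ERLOTINIB_RESISTANT.holds(nsclc_b))  # True True
print(CETUXIMAB_SENSITIVE.holds(crc_ras_wild_type),
      CETUXIMAB_SENSITIVE.holds(crc_kras_g12d))  # True False
```

Keeping each assertion as data rather than hard-coded if/else branches is what lets a knowledge base add combinations without changing the evaluator.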

Enrollment in many clinical trials is complicated by different criteria, including biomarkers, for different arms. The National Cancer Institute (NCI) Match trial has more than 40 arms, with more being added. One arm includes patients whose tumors carry BRAF V600E/K/R/D mutations but not KRAS, NRAS, or HRAS mutations. It includes solid tumors and lymphomas but excludes melanoma, colorectal cancer, and papillary thyroid cancer, even though these tumors may also harbor BRAF mutations. Patients must have received first-line therapy, but not with a BRAF inhibitor.
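
A multi-arm criterion like this one can be sketched as a screening function that returns the reasons a patient fails, rather than a bare yes/no. The encoding below is a hypothetical simplification of the arm as described in this article, not the protocol's actual eligibility text; field names such as prior_braf_inhibitor are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical encoding of the BRAF arm described above, not official protocol text.
BRAF_V600_ALTERATIONS = {"V600E", "V600K", "V600R", "V600D"}
RAS_GENES = {"KRAS", "NRAS", "HRAS"}
ELIGIBLE_CATEGORIES = {"solid tumor", "lymphoma"}
EXCLUDED_HISTOLOGIES = {"melanoma", "colorectal cancer", "papillary thyroid cancer"}


@dataclass(frozen=True)
class Patient:
    category: str                           # "solid tumor" or "lymphoma"
    histology: str                          # e.g. "cholangiocarcinoma"
    mutations: FrozenSet[Tuple[str, str]]   # (gene, alteration) pairs from the NGS report
    prior_lines: int                        # number of prior lines of therapy
    prior_braf_inhibitor: bool


def screen_braf_arm(p: Patient) -> List[str]:
    """Return the reasons a patient fails the arm; an empty list means eligible."""
    reasons: List[str] = []
    if not any(g == "BRAF" and a in BRAF_V600_ALTERATIONS for g, a in p.mutations):
        reasons.append("no BRAF V600E/K/R/D mutation")
    if any(g in RAS_GENES for g, _ in p.mutations):
        reasons.append("KRAS/NRAS/HRAS mutation present")
    if p.category not in ELIGIBLE_CATEGORIES:
        reasons.append("not a solid tumor or lymphoma")
    if p.histology in EXCLUDED_HISTOLOGIES:
        reasons.append(f"excluded histology: {p.histology}")
    if p.prior_lines < 1:
        reasons.append("no prior first-line therapy")
    if p.prior_braf_inhibitor:
        reasons.append("prior BRAF inhibitor")
    return reasons


# Hypothetical patients for illustration.
eligible = Patient("solid tumor", "cholangiocarcinoma",
                   frozenset({("BRAF", "V600E")}), 1, False)
melanoma = Patient("solid tumor", "melanoma",
                   frozenset({("BRAF", "V600E")}), 1, True)

print(screen_braf_arm(eligible))  # []
print(screen_braf_arm(melanoma))  # ['excluded histology: melanoma', 'prior BRAF inhibitor']
```

Returning reasons instead of a boolean makes it easier for a human prescreener to see which criterion excluded a patient.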

Levy said, “Oftentimes when we are doing clinical trial matching from an interpretive report, we really only have this [NGS] information available to us, and this information is really downstream, and sometimes we get false positives in clinical trial matching when we don’t take into account the other eligibility and inclusion and exclusion criteria.”

Her group published a trial curation workflow that creates structured assertions covering disease-biomarker eligibility criteria, therapeutic context, and treatment cohorts; the results are publicly available online. In a massive effort that included manual curation of biomarker-driven trials among 67,479 recruiting cancer-related clinical trials, the group annotated 5045 trials and identified the unique biomarkers used in clinical trials. These were most frequently genomic, with protein, cytogenetic, and other biomarkers occurring much less frequently.2

The availability of data affects the sensitivity and specificity of this kind of analysis. With only NGS and pathology data and no clinical information, trials that are not actually appropriate for a patient are likely to be identified. To avoid this, Levy said, laboratories are trying to get more clinical history into their reports.
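
One way to see why missing clinical data produces false-positive matches is to evaluate each criterion with three possible outcomes: met, not met, or unknown. The sketch below assumes a hypothetical trial with three criteria and invented field names. A naive matcher that treats "unknown" as "met" flags a report-only record as a match, whereas a three-valued matcher routes it to prescreening instead.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


class Result(Enum):
    MET = "met"
    NOT_MET = "not met"
    UNKNOWN = "unknown"  # the data needed to evaluate the criterion are missing


# Hypothetical criteria for an illustrative biomarker-selected trial.
CRITERIA: List[Tuple[str, Callable[[Any], bool]]] = [
    ("braf_v600", lambda v: v is True),
    ("prior_lines", lambda v: v >= 1),
    ("prior_braf_inhibitor", lambda v: v is False),
]


def evaluate(record: Record, field: str, rule: Callable[[Any], bool]) -> Result:
    """Evaluate one criterion, returning UNKNOWN when the field is absent."""
    if record.get(field) is None:
        return Result.UNKNOWN
    return Result.MET if rule(record[field]) else Result.NOT_MET


def naive_match(record: Record) -> bool:
    """Treats missing data as satisfying the criterion, which inflates matches."""
    return all(evaluate(record, f, r) is not Result.NOT_MET for f, r in CRITERIA)


def classify(record: Record) -> str:
    """Separates definite matches from records that still need human review."""
    results = [evaluate(record, f, r) for f, r in CRITERIA]
    if Result.NOT_MET in results:
        return "excluded"
    if Result.UNKNOWN in results:
        return "possible match: needs prescreening"
    return "match"


report_only = {"braf_v600": True}  # NGS report with no clinical history
with_history = {"braf_v600": True, "prior_lines": 2, "prior_braf_inhibitor": True}

print(naive_match(report_only), classify(report_only))    # True possible match: needs prescreening
print(naive_match(with_history), classify(with_history))  # False excluded
```

Once clinical history is added, the same patient drops out, which is the false positive the report-only view could not catch.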

A secondary use of NGS data, beyond clinical decision support for individual patients, is to optimize clinical trial accrual. Levy asked, “What is the most expensive trial to open at a particular site?” Her answer: “The trial that accrues 0 subjects. There is significant expense to opening clinical trials. It takes about 3 patients accrued to recoup expenses from opening.”

Given that only 5% to 6% of adult patients with cancer participate in clinical trials, Levy said, “We need to think about how data can improve trial design.” Opportunities to use big data and decision-making tools to improve clinical trial accrual exist across the entire scope of trials.

One opportunity is to consider the landscape of clinical trials. An analysis of more than 800 breast cancer trials by class of agent, despite the difficulty of getting data from public databases, showed that most trials are in metastatic disease and that kinase inhibitors, cytotoxic agents, and hormonal agents are the agents most frequently tested.

Another way to look at the data is to analyze the top biomarkers used in trials. In breast cancer, the most frequent involve hormone receptor-positive, HER2-negative or HER2-positive, or triple-negative disease, with only small numbers of novel biomarkers.

From the perspective of individual drugs, there are 701 open trials of pembrolizumab (Keytruda) across a range of tumor types. The large number of trials reflects the fact that the best biomarkers are not yet known. Levy suggested that following the evolution of biomarkers in immunotherapy should help clarify the landscape when planning the next trials.

Levy said that copying and pasting eligibility criteria from existing trials when planning new trials is a mistake because it excludes too many people. The NCI, American Society of Clinical Oncology (ASCO), and Friends of Cancer Research recommend removing restrictions (eg, HIV and other viral infection status and organ function limits) to increase enrollment.

Real-world data (RWD) and big data from EMR can assist in modeling eligibility criteria. Public and private organizations are addressing this issue, including ASCO’s CancerLinQ (Learning Intelligence Network for Quality), the American Association for Cancer Research (AACR) Project GENIE (Genomics Evidence Neoplasia Information Exchange), and PCORI (Patient-Centered Outcomes Research Institute).

Interestingly, RWD also played a role in expanding the indication of palbociclib (Ibrance) to men with breast cancer; the drug had been used off-label in men after its initial FDA approval for use only in women.

The feasibility of matching patients to NCI Match trial arms was studied using AACR Project GENIE NGS data from 8 institutions (5 US, 3 international).3 Of more than 18,000 samples, up to several hundred would be eligible for some arms of the trial, whereas for other arms, anywhere from none to fewer than 10 would be eligible. Because some arms were unlikely to enroll patients, Levy said this thought experiment is important: public databases can be used during trial design to refine criteria at a high level and make sure there will be enough patients to enroll.
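
At its core, this kind of feasibility exercise counts, for each arm, how many samples in a cohort satisfy the arm's rules and flags arms that fall below an accrual threshold. In the sketch below, the samples, arm rules, and threshold are all toy assumptions, not the GENIE data or the actual NCI Match criteria.

```python
from typing import Any, Callable, Dict, List, Tuple

Sample = Dict[str, Any]

# Toy cohort, NOT real GENIE data: each sample has a histology and a set of alterations.
SAMPLES: List[Sample] = [
    {"histology": "NSCLC", "alterations": {"EGFR L858R"}},
    {"histology": "cholangiocarcinoma", "alterations": {"BRAF V600E"}},
    {"histology": "glioma", "alterations": {"BRAF V600E"}},
    {"histology": "melanoma", "alterations": {"BRAF V600E"}},
    {"histology": "breast cancer", "alterations": {"PIK3CA H1047R"}},
]

# Hypothetical, simplified arm rules keyed by arm name.
ARMS: Dict[str, Callable[[Sample], bool]] = {
    "BRAF V600E (non-melanoma)": lambda s: ("BRAF V600E" in s["alterations"]
                                            and s["histology"] != "melanoma"),
    "PIK3CA mutation": lambda s: any(a.startswith("PIK3CA") for a in s["alterations"]),
    "NTRK fusion": lambda s: any(a.startswith("NTRK") for a in s["alterations"]),
}


def arm_feasibility(samples: List[Sample],
                    arms: Dict[str, Callable[[Sample], bool]],
                    min_eligible: int) -> Tuple[Dict[str, int], List[str]]:
    """Count eligible samples per arm and flag arms unlikely to accrue."""
    counts = {name: sum(1 for s in samples if rule(s)) for name, rule in arms.items()}
    flagged = sorted(name for name, n in counts.items() if n < min_eligible)
    return counts, flagged


counts, flagged = arm_feasibility(SAMPLES, ARMS, min_eligible=2)
print(counts)   # {'BRAF V600E (non-melanoma)': 2, 'PIK3CA mutation': 1, 'NTRK fusion': 0}
print(flagged)  # ['NTRK fusion', 'PIK3CA mutation']
```

Run against a large registry cohort, the same counting step shows early which arms are unlikely to recoup their opening costs.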

Another way to assist clinical trial design is to rethink how control arms are designed. Traditionally, trials compare an intervention arm with a standard of care (SOC) arm. Could SOC arms be supplemented with historical controls or data from prior trials in which that SOC was used, or even consist of prospective data collected across the country? Such “synthetic” controls could decrease clinical trial costs, but they will work only if everyone shares data.

Patients frequently look for trials using consumer-facing clinical trial search tools, some disease specific and some general. One caveat is that clinical trial registries may lack arm- and site-specific status, and information in clinical trial knowledge bases about whether trials are open and accruing can lag behind. Because each arm may have a different status at each study site, patients could be devastated to find out that specific arms are not open to them.
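
The status problem is easiest to see in a data model. In this hypothetical sketch, a trial carries one registry-level status plus a status for each (site, arm) pair; the trial ID, sites, and arms are invented, and real registries and knowledge bases structure this information differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Trial:
    trial_id: str
    registry_status: str  # single trial-level status, as a registry might display it
    # (site, arm) -> "open" or "closed": the detail registries may lack
    arm_site_status: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def open_arms(self, site: str) -> List[str]:
        """Arms actually accepting patients at one site."""
        return sorted(arm for (s, arm), status in self.arm_site_status.items()
                      if s == site and status == "open")


trial = Trial(
    trial_id="HYPOTHETICAL-001",
    registry_status="Recruiting",
    arm_site_status={
        ("Site A", "Arm 1"): "open",
        ("Site A", "Arm 2"): "closed",
        ("Site B", "Arm 2"): "open",
    },
)

print(trial.registry_status)      # Recruiting
print(trial.open_arms("Site A"))  # ['Arm 1']
print(trial.open_arms("Site B"))  # ['Arm 2']
```

A patient who sees only "Recruiting" and needs Arm 2 at Site A would be disappointed; the per-site, per-arm view surfaces that before referral.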

Clinical decision support tools for clinical trial matching include interpretive reports for NGS testing, online portal tools, and tools integrated into the EMR. Levy said that clinical trial matching workflows are complicated and that humans are essential for prescreening patients for clinical trials. “It’s important to share data so we can learn from the experience of every patient,” she concluded.

Some of the discussion after the session involved how to monetize the process. Levy’s institution does not charge for molecular tumor boards. Instead, they have a molecular consult clinic for patients, which is a reimbursable model to provide sustainability to the program. The current widespread use of telehealth is providing the opportunity to set up clinics for patients who cannot travel, benefitting both patient and clinician.


1. Levy M. Cancer Clinical Decision Making Tools in the Era of Precision Medicine & Big Data. Presented at: the virtual Association for Molecular Pathology Annual Meeting; November 16-20, 2020.

2. Jain M, Mittendorf KF, Holt M, et al. The My Cancer Genome clinical trial data model and trial curation workflow. J Am Med Inform Assoc. 2020;27(7):1057-1066. doi:10.1093/jamia/ocaa066

3. AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 2017;7(8):818-831. doi:10.1158/2159-8290.CD-17-0151