Data Mining Being Harnessed to Power Precision Medicine


Clifford A. Hudis, MD

Every time you check out at the supermarket with your customer loyalty card, a computer records your purchases and links them to your demographic profile.[1] Whenever you use a debit or credit card to make a purchase, whether online or at a bricks-and-mortar store, that purchase becomes part of your consumer record.[1] Facebook tracks your 'likes,' friends, and click habits,[2] and Google monitors your Internet activity so it can send you targeted advertisements.[3]

Some companies, such as Acxiom Corporation, exist for the sole purpose of obtaining as much information as they can about as many consumers as possible and scanning it for trends relevant to their clients.[4] Government agencies also gather and sift through large amounts of electronic information about telephone and e-mail activity, looking for potential terrorists.[5] This widespread technology, which marketers and governments consider essential, is called data mining, and it is making its way into everyday oncology practice.

The physician-led American Society of Clinical Oncology (ASCO) has partnered with the SAP software company on a "big-data" initiative intended to improve cancer care.[6] The ASCO subsidiary CancerLinQ will gather patient data from large oncology practices and cancer centers and analyze the information using the SAP HANA platform. Although it sounds impersonal, a basic goal of the platform is to enhance individualized care by relying on the collective experiences of other patients.

The CancerLinQ Web site[7] outlines 3 scenarios of how the tool might be used in practice: (1) physicians can search the database to compare their treatment plan for a patient with treatment plans used in similar cases; (2) physicians can enter their patient’s characteristics into the database to identify a less conventional treatment plan that proved effective in patients similar to their patient; or (3) physicians can use the database to review an area of performance in their own practice.

Researchers can also explore the database to generate new hypotheses for future research. CancerLinQ will be able to generate instant reports from its database of patient records based on user-specified criteria, such as age, genetic alterations, or comorbidities, which researchers can analyze for correlations. At least 15 oncology groups have already agreed to supply ASCO with the records of 500,000 patients.[6] In turn, ASCO expects to deploy the first iteration of CancerLinQ to these centers by the end of 2015.
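To make the idea of a criteria-based report concrete, here is a minimal Python sketch. It does not represent the CancerLinQ or SAP HANA interface, which the sources cited here do not describe. The record fields (`age`, `alterations`, `comorbidities`, `responded`), the sample records, and the helper `cohort_report` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # All fields are hypothetical; a real registry schema would differ.
    age: int
    alterations: set = field(default_factory=set)
    comorbidities: set = field(default_factory=set)
    responded: bool = False


def cohort_report(records, min_age=None, max_age=None,
                  alteration=None, comorbidity=None):
    """Filter records by user-specified criteria and summarize response."""
    cohort = [
        r for r in records
        if (min_age is None or r.age >= min_age)
        and (max_age is None or r.age <= max_age)
        and (alteration is None or alteration in r.alterations)
        and (comorbidity is None or comorbidity in r.comorbidities)
    ]
    responders = sum(r.responded for r in cohort)
    # Avoid dividing by zero when no record matches the criteria.
    rate = responders / len(cohort) if cohort else None
    return {"patients": len(cohort), "responders": responders,
            "response_rate": rate}


# Made-up records for illustration only; not real patient data.
records = [
    PatientRecord(62, {"HER2"}, {"diabetes"}, True),
    PatientRecord(71, {"HER2"}, set(), False),
    PatientRecord(55, {"BRCA1"}, {"diabetes"}, True),
    PatientRecord(68, {"HER2"}, {"diabetes"}, True),
]

report = cohort_report(records, min_age=60, alteration="HER2")
print(report)
```

A summary like this can only suggest a correlation within the selected cohort. As the article notes, its role would be to generate hypotheses for later research, not to confirm them.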

Also in oncology, the Multiple Myeloma Research Foundation is collaborating with GNS Healthcare, a developer of computer algorithms and architecture, to analyze genetic profiles for 1000 patients with multiple myeloma from the CoMMpass study to identify drivers of the disease process and potential therapeutic strategies.[8]

Meanwhile, Flatiron Health and the National Comprehensive Cancer Network (NCCN), which develops treatment guidelines for various subtypes of cancer, announced their own joint initiative to aggregate data from electronic health records.[9] Flatiron Health will allow NCCN member institutions to use its proprietary OncoAnalytics cloud-based software to mine the data for trends in care and adherence to NCCN Clinical Practice Guidelines. Member institutions and commercial stakeholders will also be able to analyze the data for epidemiologic information, treatment and reimbursement patterns, and other aspects of care.

Other medical partnerships are being formed to implement data-mining initiatives. Columbia University Medical Center announced a partnership with Biogen Idec to create a gene sequencing and analytics facility.[10] Biogen will analyze genetic records of Columbia’s patients for undiscovered connections between genes or pathways and their relationship to the natural history of a disease.

Although researchers have been mining insurance plan data, prescribing information, and the Medicare database for years, the patient-level health details that these new initiatives will make available are unprecedented. All these groups guarantee their commitment to protecting patient privacy, but similar efforts to use big data for health care-driven goals have caused concern.

In 2014, the National Health Service (NHS) in England announced a program to compile health records from general practices throughout the United Kingdom in a central database and share them with the Health and Social Care Information Center.[11] The NHS assured the public that the data would include only details such as age, location, and gender, but privacy advocates expressed concern that someone could use another database to pair the demographic information with a specific individual. The program was delayed after a document leaked from the NHS revealed that the agency was also worried that enough information might be available to expose patients' identities.
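The pairing that privacy advocates worried about can be sketched in a few lines of Python. This is an illustration of the general idea only, not a description of the NHS data or of any real attack. The released records, the named external list, and every value in them are invented, and the sketch assumes the external list covers everyone in the released data.

```python
from collections import Counter

# Hypothetical "de-identified" release: (age, location, gender, diagnosis).
released = [
    (34, "Leeds", "F", "asthma"),
    (34, "Leeds", "F", "diabetes"),
    (71, "Bath", "M", "myeloma"),
    (52, "York", "F", "breast cancer"),
]

# Hypothetical external list that includes names, such as a public register.
external = [
    ("A. Example", 71, "Bath", "M"),
    ("B. Example", 34, "Leeds", "F"),
    ("C. Example", 52, "York", "F"),
]

# Count how many records share each demographic combination in each source.
released_sizes = Counter((age, loc, sex) for age, loc, sex, _ in released)
external_sizes = Counter((age, loc, sex) for _, age, loc, sex in external)

# A combination held by exactly one record in both sources pairs
# one named person with one health record.
reidentified = {}
for name, age, loc, sex in external:
    key = (age, loc, sex)
    if released_sizes[key] == 1 and external_sizes[key] == 1:
        diagnosis = next(d for a, l, s, d in released if (a, l, s) == key)
        reidentified[name] = diagnosis

print(reidentified)
```

In this invented data, two people are paired with a diagnosis because their combination of age, location, and gender is unique, while the two records that share a combination stay ambiguous. The smaller the group sharing a combination, the weaker the protection that removing names provides.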

The accuracy of the information in the database, and of the conclusions drawn from it, is another concern about using data mining to guide care. None of the groups announcing big data collaborations has explained whether or how information will be verified for accuracy before it enters the database. Also, in an era of medicine where physicians are being driven to adopt evidence-based practices, inviting them to rely on circumstantial evidence to guide treatment decisions seems contradictory.

However, ASCO seems confident that the CancerLinQ initiative will succeed. In a press release, ASCO’s recent former president Clifford A. Hudis, MD, said, “CancerLinQ will help improve cancer care by delivering the latest information to doctors no matter where they practice so that patients can receive high-quality, state-of-the-art care regardless of where they live.”[6] He added that CancerLinQ allows physicians and researchers to learn valuable lessons about care from the “97% of adult patients who do not currently participate in clinical trials.”


  1. Ferguson D. How supermarkets get your data—and what they do with it. The Guardian. Published June 8, 2013. Accessed January 25, 2015.
  2. Kirk J. Facebook ‘stalker’ tool uses Graph Search for powerful data mining. PC World. Published October 17, 2013. Accessed January 25, 2015.
  3. Epstein R. Google’s gotcha: 15 ways Google monitors you. US News & World Report. Published May 10, 2013. Accessed January 25, 2015.
  4. Singer N. Mapping, and sharing, the consumer genome. New York Times. Published June 16, 2012. Accessed January 25, 2015.
  5. Savage C, Peters JW. Bill to restrict NSA data collection blocked in vote by Senate Republicans. New York Times. Published November 18, 2014. Accessed January 25, 2015.
  6. ASCO teams with multinational software corporation, SAP, to develop CancerLinQ [press release]. Published January 21, 2015. Accessed January 26, 2015.
  7. ASCO CancerLinQ. How it works. Accessed January 26, 2015.
  8. The Multiple Myeloma Research Foundation and GNS Healthcare announce collaboration to identify potential new multiple myeloma therapies [press release]. Published January 7, 2015. Accessed January 26, 2015.
  9. NCCN and Flatiron Health announce collaboration to launch novel oncology outcomes database [press release]. Published January 8, 2015. Accessed January 26, 2015.
  10. Biogen Idec and Columbia University Medical Center to conduct collaborative genetics research [press release]. Published January 9, 2015. Accessed January 26, 2015.
  11. du Preez D. UK’s faces more delays but new evidence shows support for controversial big data project. Published December 16, 2014. Accessed January 26, 2015.