Self-Taught AI Accurately Predicts Lung Cancer Recurrence and Survival


A self-taught AI tool that accurately diagnoses adenocarcinoma and predicts cancer recurrence has been developed and shown to outperform traditional methods.

Aristotelis Tsirigos, PhD

A new study by researchers at NYU Langone Health's Perlmutter Cancer Center and the University of Glasgow found that a self-taught artificial intelligence (AI) tool can accurately diagnose cases of adenocarcinoma.1

The AI-powered computer program is based on data from nearly half a million tissue images. The researchers report that by incorporating structural features of tumors from 452 patients with adenocarcinoma, drawn from the more than 11,000 patients in the United States National Cancer Institute's Cancer Genome Atlas, the program provides an unbiased, detailed, and reliable second opinion for patients and oncologists regarding cancer presence and prognosis. This includes the likelihood and timing of the cancer's return.

Notably, the program is self-taught, meaning that it independently identified which structural features were statistically most significant when assessing disease severity and which had the greatest impact on tumor recurrence.

“This is one of the AI tools that you may have heard about. The difference with this tool is that it is self-taught. That is important because typically, when you train a machine learning model, you need to know what the diagnosis is in the medical space. But that requires quite a bit of effort from the pathologist side,” explained Aristotelis Tsirigos, PhD, study co-senior investigator, professor in the Departments of Pathology and Medicine at NYU Grossman School of Medicine and Perlmutter Cancer Center, and co-director of precision medicine and director of its Applied Bioinformatics Laboratories, in an interview with Targeted Oncology™.

Lungs anterior view: © 7activestudio -

“Here, we decided to do it in a completely unsupervised way, which means the algorithm would have to teach itself what the important parts of the image are so it could go ahead and do the diagnostics. But this tool can be used in different contexts for different diseases. We are focused on lung cancer, but of course, it is applicable to different types of cancer,” he continued.

The study, which was published in Nature Communications,2 showed that the AI tool accurately distinguished between similar lung cancers, including adenocarcinoma and squamous cell cancers, 99% of the time. Tsirigos explained that the tool was 72% accurate at predicting the likelihood and timing of the cancer's return after therapy. This is better than the 64% accuracy seen with pathologists who directly examined the same patients' tumor images.1

“Seventy-two percent is not perfect, so there is room for improvement. But it is definitely a big improvement over the standard that is used right now,” added Tsirigos.

Researchers analyzed lung adenocarcinoma tissue slides from the Cancer Genome Atlas and identified 46 key characteristics. They referred to these as histomorphological phenotype clusters, which included features from normal and diseased tissue. A subset of these clusters was statistically linked to either the early return of the cancer or to long-term survival.
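One way to picture how clusters like these could describe a patient is to summarize a slide by the share of its image tiles that fall into each cluster. The following is a minimal sketch under that assumption, not the researchers' published code; the function name `cluster_composition` and the tile labels are hypothetical and made up for illustration.

```python
from collections import Counter

NUM_CLUSTERS = 46  # number of phenotype clusters reported in the article


def cluster_composition(tile_clusters, num_clusters=NUM_CLUSTERS):
    """Return the fraction of a slide's tiles assigned to each cluster.

    tile_clusters is a list of integer cluster labels (0..num_clusters-1),
    one per image tile. How tiles get their labels is out of scope here.
    """
    if not tile_clusters:
        raise ValueError("slide has no tiles")
    counts = Counter(tile_clusters)
    total = len(tile_clusters)
    return [counts.get(c, 0) / total for c in range(num_clusters)]


# Hypothetical slide: made-up cluster labels for eight tiles.
example_tiles = [3, 3, 7, 12, 3, 7, 45, 3]
composition = cluster_composition(example_tiles)
print(composition[3], composition[7])
```

A vector like `composition` gives each slide a fixed-length description that can then be tested for statistical links to recurrence or survival.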

Subsequently, these results were validated using additional testing on tissue images from 276 patients who were treated for adenocarcinoma at NYU Langone from 2006 to 2021.

According to the press release, the researchers aim to use the histomorphological phenotype learning (HPL) algorithm to assign each patient a score between 0 and 1 that reflects their statistical chance of survival and tumor recurrence for up to 5 years. Because the program is self-learning, Tsirigos says that it will become more accurate as more data are added over time.
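To illustrate what a score between 0 and 1 might look like in practice, the sketch below passes a weighted sum of a patient's cluster fractions through a logistic function. This is an assumption made for illustration only: the article does not say which statistical model produces the score, and the function `risk_score`, the weights, and the example values are all hypothetical.

```python
import math


def risk_score(cluster_fractions, weights, bias=0.0):
    """Map per-cluster tile fractions to a score strictly between 0 and 1.

    Uses a logistic function as a stand-in model (an assumption, not the
    method described in the study).
    """
    if len(cluster_fractions) != len(weights):
        raise ValueError("need one weight per cluster fraction")
    z = bias + sum(w * x for w, x in zip(weights, cluster_fractions))
    # Numerically stable logistic: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Made-up example with three clusters: positive weights raise the score,
# negative weights lower it.
fractions = [0.6, 0.3, 0.1]
weights = [2.0, -1.0, 0.5]
score = risk_score(fractions, weights)
print(round(score, 3))
```

With all weights set to zero the score is 0.5, which makes it easy to see how each cluster pushes a patient's score up or down from a neutral starting point.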

“Now, you can improve this further by incorporating more data, either on the imaging side, so train the algorithm with more data, or use additional data, meaning, for example, demographic data, or the age of the patient, or perhaps the sex of the patient on birth, or perhaps in mutations of the tumor. There is definitely additional data that we could incorporate into the algorithm to make it better. But the point of this study was to show that just by just looking at an image, perhaps using AI, [you can] improve on what pathologists can do alone,” said Tsirigos.

The team plans to continue developing HPL-like programs for other cancers, including breast, ovarian, and colorectal cancers. It also plans to include other data, such as hospital electronic health records about other illnesses and diseases, income, and home ZIP code, to improve the tool's accuracy.

Further, the programming code has been published online to build the trust of the public. Plans are also in the works to make the new HPL tool fully available upon completion of further testing.

“We are definitely looking into a different future for precision medicine. With all the data that we are accumulating, it prepares us for big discoveries, but also big changes in clinical practice... If we can make doctors' lives easier and facilitate certain things, we give them more time to work with the patient,” added Tsirigos.

"Self-taught" AI tool helps to diagnose and predict severity of common lung cancer. News release. NYU Langone Health System. June 11, 2024. Accessed June 11, 2024.
Claudio Quiros A, Coudray N, Yeaton A, et al. Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unannotated pathology slides. Nat Commun. 2024;15(1):4596. Published 2024 Jun 11. doi:10.1038/s41467-024-48666-7