A machine learning algorithm analyzing digital whole-slide images of diagnostic bone marrow biopsies differentiated prefibrotic primary myelofibrosis from essential thrombocythemia with 92.3% accuracy.
A machine learning algorithm examining digital whole-slide images (WSI) of diagnostic bone marrow biopsies distinguished prefibrotic primary myelofibrosis (pre-PMF) from essential thrombocythemia (ET) with 92.3% accuracy, according to data presented at the 2023 ASH Annual Meeting.
The algorithm was trained on a dataset that was split evenly between pre-PMF and ET and contained 32,226 patient-derived WSI. In the subsequent validation cohort, the algorithm's sensitivity for differentiating between pre-PMF and ET was 66.6% and its specificity was 100%. The positive predictive value was 100% and the negative predictive value was 90.9%.
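The reported validation figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: the article does not publish a confusion matrix, so the counts are an assumption inferred from the reported percentages and from the validation cohort described later in the article (6 patients with pre-PMF and 20 with ET), with pre-PMF treated as the positive class. Under that assumption, 4 of the 6 pre-PMF cases and all 20 ET cases would have been classified correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; pre-PMF is assumed to be the positive class."""
    tp: int  # pre-PMF called pre-PMF
    fn: int  # pre-PMF called ET
    tn: int  # ET called ET
    fp: int  # ET called pre-PMF

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# ASSUMPTION: counts inferred from the reported percentages, not stated in the source.
validation = ConfusionMatrix(tp=4, fn=2, tn=20, fp=0)

metrics = {
    "sensitivity": validation.sensitivity(),
    "specificity": validation.specificity(),
    "ppv": validation.ppv(),
    "npv": validation.npv(),
    "accuracy": validation.accuracy(),
}

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, the accuracy works out to 24 of 26 (92.3%) and the negative predictive value to 20 of 22 (90.9%), matching the reported figures. The sensitivity is 4 of 6, or about two-thirds, which the article reports as 66.6%.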
“It's my idea that algorithms such as the one I'm presenting is a clinical decision support tool for physicians to help as a companion, potentially maybe as a screening but with the overriding concept where the physician can say this doesn't make any sense,” said lead author Andrew Srisuwananukorn, MD, assistant professor at The Ohio State University Comprehensive Cancer Center. “I think that it's a physician's job to be aware that these algorithms are coming, and we must know how to critique them. I don't view it right now as something we should be fearing, but I do think we should be smart on how we incorporate it into our practice.”
Overlapping disease characteristics between pre-PMF and ET complicate the accurate diagnosis of these diseases and other myeloproliferative neoplasms, which all share a common JAK/STAT activation along with JAK2, CALR, or MPL mutations. The World Health Organization defined pre-PMF as a separate disease type in 2016, given its connection to constitutional symptoms, major hemorrhage, and progression to myelofibrosis or leukemia, which are characteristics not commonly seen with ET.
“Differentiating between these 2 diseases can be quite challenging, as the diagnostic criteria for ET and pre-PMF both rely on similar characteristics, including clinical and laboratory abnormalities, mutational profiling, and assessment of their bone marrow biopsies, which can be subjective,” said Srisuwananukorn. “There's high interobserver variability among pathologists to make these diagnoses. Consensus varies widely between 50% and 100%, so being that large, there's a pressing need for improved diagnostics to differentiate between these 2 diseases.”
The algorithm was trained using WSI from 200 patients diagnosed at the University of Florence in Italy (100 each with pre-PMF and ET). The findings were then validated using WSI from 26 patients enrolled at the Moffitt Cancer Center, 6 with pre-PMF and 20 with ET. Slides were digitized using Aperio AT2 slide scanners and tessellated at 10x magnification for model training.
The scripts used for the algorithm were written using Slideflow, an open-source AI framework available on GitHub that was developed by James M. Dolezal, MD, another author on the ASH paper. Training was completed using the Minerva High Performance Computer at Mount Sinai. The final product, Srisuwananukorn noted, can run on a standard laptop and is ready for use. “We're able to produce a prediction on a whole slide in roughly 6 seconds,” he said. “Our model workflow is open source and was developed by our research team at the University of Chicago, and you can use it today.”
In the initial training set, the model achieved a high level of performance, with an area under the receiver operating characteristic curve (AUROC) of 0.90 ± 0.04. This was maintained in the validation set, with an AUROC of 0.90. The accuracy of 92.3% was achieved following threshold optimization.
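As a rough illustration of how an AUROC and a decision threshold relate, the sketch below computes both on a handful of made-up scores. Everything in it is an assumption for illustration: the scores and labels are synthetic, not study data, and the article does not say which criterion the investigators used to pick their threshold. Here the threshold is simply the one that maximizes accuracy.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(scores, labels):
    """Return the (threshold, accuracy) pair with the highest accuracy.

    A case is called positive when its score is >= the threshold.
    On ties, the lowest such threshold is kept.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        preds = [int(s >= t) for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# SYNTHETIC example data (hypothetical): 1 = pre-PMF, 0 = ET.
scores = [0.91, 0.78, 0.55, 0.40, 0.62, 0.30, 0.22, 0.15, 0.48, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

area = auroc(scores, labels)
threshold, accuracy = best_threshold(scores, labels)
print(f"AUROC: {area:.3f}")
print(f"Best threshold: {threshold:.2f} (accuracy {accuracy:.1%})")
```

The AUROC summarizes ranking quality across all possible thresholds, while accuracy depends on the single threshold chosen, which is why a model's accuracy is reported only after a threshold has been fixed.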
The investigators also scrutinized the results to examine how the algorithm was arriving at its conclusions, using heat maps generated from the analyzed images. In this experiment, they found that the algorithm was focusing on areas of high biological interest. Specifically, it focused on bone marrow cellularity as opposed to areas of fat, bone, or background tissue. “This AI algorithm seems to be biologically reasonable and again this is an important step as we use these AI algorithms in clinical practice,” concluded Srisuwananukorn.