A machine learning model equipped with only data on people’s age, smoking duration and the number of cigarettes smoked per day can predict lung cancer risk and identify who needs lung cancer screening, according to a new study publishing October 3rd in the open access journal PLOS Medicine by Thomas Callender of University College London, UK, and colleagues.
Lung cancer is the most common cause of cancer death worldwide, with poor survival in the absence of early detection. Screening for lung cancer among those at highest risk could reduce lung cancer deaths by nearly a quarter, but the ideal way to determine the high-risk population has been unclear. The current standard-of-care model of lung cancer risk requires 17 variables, few of which are routinely available in electronic health records.
In the new study, researchers used data on 216,714 ever-smokers from the UK Biobank cohort and 26,616 ever-smokers participating in the US National Lung Screening Trial to develop new models of lung cancer risk.
The resulting machine learning model used three predictors: age, smoking duration and pack-years (a measure of lifetime smoking exposure, in which one pack-year equals smoking 20 cigarettes a day for one year). From these, it estimated a person's risk of both developing lung cancer and dying of lung cancer over the next five years. The researchers then tested the model on a third dataset, not used in its development, from the US Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial. The model predicted lung cancer incidence with 83.9% sensitivity and lung cancer deaths with 85.5% sensitivity. At equivalent specificity, every version of the model was more sensitive than the risk prediction formulas currently in use.
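The headline comparison above, sensitivity at an equivalent specificity, can be illustrated with a short Python sketch. It assumes each model outputs a risk score and that anyone scoring at or above a threshold is flagged for screening; it takes the specificity achieved by a reference model and then finds the lowest threshold at which the new model matches it. The function names and the eight-person dataset are hypothetical, made up for illustration; this is not the study's code or data, and the paper's exact evaluation procedure may differ.

```python
from typing import List, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when everyone scoring >= threshold is flagged."""
    pairs = list(zip(scores, labels))
    positives = [s for s, y in pairs if y == 1]  # people who developed the outcome
    negatives = [s for s, y in pairs if y == 0]  # people who did not
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    # Sensitivity: share of cases correctly flagged as high risk.
    sensitivity = sum(s >= threshold for s in positives) / len(positives)
    # Specificity: share of non-cases correctly left below the threshold.
    specificity = sum(s < threshold for s in negatives) / len(negatives)
    return sensitivity, specificity


def sensitivity_at_specificity(scores: Sequence[float], labels: Sequence[int],
                               target_spec: float) -> Tuple[float, float, float]:
    """Highest sensitivity achievable while keeping specificity >= target_spec."""
    # Raising the threshold can only lower sensitivity and raise specificity,
    # so the first qualifying threshold in ascending order is the best one.
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        sens, spec = sens_spec(scores, labels, t)
        if spec >= target_spec:
            return sens, spec, t
    raise ValueError("target_spec must be at most 1")


# Hypothetical toy data: 1 = developed lung cancer, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
reference_scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.1, 0.3, 0.45]
new_scores = [0.8, 0.65, 0.7, 0.6, 0.1, 0.2, 0.3, 0.25]

# Reference model at its own (assumed) cut-off of 0.5.
ref_sens, ref_spec = sens_spec(reference_scores, labels, threshold=0.5)
# New model, thresholded to match the reference model's specificity.
new_sens, new_spec, new_threshold = sensitivity_at_specificity(
    new_scores, labels, ref_spec)

print(f"reference: sensitivity={ref_sens:.3f}, specificity={ref_spec:.3f}")
# → reference: sensitivity=0.667, specificity=0.800
print(f"new model: sensitivity={new_sens:.3f}, specificity={new_spec:.3f}")
# → new model: sensitivity=1.000, specificity=0.800
```

Fixing specificity first keeps the comparison fair: both models wrongly flag the same share of people who never develop the disease, so any difference in sensitivity reflects how many true cases each one catches.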
Callender adds, “We know that screening for those who have a high chance of developing lung cancer can save lives. With machine learning, we’ve been able to substantially simplify how we work out who is at high risk, presenting an approach that could be an exciting step in the direction of widespread implementation of personalised screening to detect many diseases early.”
#####
In your coverage, please use this URL to provide access to the freely available paper in PLOS Medicine: http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1004287
Citation: Callender T, Imrie F, Cebere B, Pashayan N, Navani N, van der Schaar M, et al. (2023) Assessing eligibility for lung cancer screening using parsimonious ensemble machine learning models: A development and validation study. PLoS Med 20(10): e1004287. https://doi.org/10.1371/journal.pmed.1004287
Author Countries: United Kingdom, United States
Funding: This work was supported by the Wellcome Trust (222890/Z/21/Z to TC), the National Science Foundation (1722516 to FI and MvdS), the Medical Research Council (MR/T02481X/1 to NN and MR/W025051/1 to SMJ) and Cancer Research UK (EDDCPGM\100002 to SMJ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Journal: PLoS Medicine
DOI: 10.1371/journal.pmed.1004287
Method of Research: Computational simulation/modeling
Subject of Research: Not applicable
COI Statement: Competing interests: see manuscript