In external temporal validation, the model predicted depression with 70.63% accuracy. Depression is a common comorbidity in COPD and increases the risk of poor outcomes.
A machine learning (ML) model developed using data from a large nationally representative Chinese cohort demonstrated strong predictive performance for identifying depression risk among adults with chronic obstructive pulmonary disease (COPD), according to study findings published in the Journal of Affective Disorders.1
In the study, conducted by a team from the Department of Respiratory and Critical Care Medicine at the Affiliated Hospital of Guangdong Medical University in Zhanjiang, China, investigators developed an extreme gradient boosting (XGBoost) model that achieved 70.63% accuracy for predicting depression in this vulnerable population when tested on an external temporal validation cohort.1 The tool addresses a critical need for early identification and management of depression in COPD, a comorbidity seen in up to 42% of individuals with stable disease and in nearly all adults with COPD during acute exacerbations.2 According to research the authors cite, depression in this population is associated with more exacerbations, more hospitalizations, and higher 30-day mortality.3
According to first author Xuanna Zhao and colleagues, ML methods offer significant advantages over traditional statistical approaches to predicting depression, in particular the ability to model complex nonlinear relationships and to select relevant features automatically.
“Our study fills a gap in the literature on machine learning’s role in linking COPD and depression risk,” they wrote. “While the comorbidity is well-known, traditional predictive methods struggle to capture it. The incorporation of machine learning algorithms…not only enhances predictive accuracy but also broadens the multidimensional understanding of underlying risk factors.”1
Zhao et al also note that previous depression prediction models in COPD have been constrained by small sample sizes and limited variable sets. The new XGBoost model, by contrast, draws on a large national cohort and was validated in a temporally distinct cohort.
The investigators analyzed data from 2921 adults aged 45 to 85 years with self-reported, physician-diagnosed COPD. Depressive symptoms were defined as a score of 10 or greater on the 10-item Center for Epidemiologic Studies Depression Scale (CESD-10); 49.7% of the cohort met this threshold. Researchers extracted 36 demographic, behavioral, psychological, and health-related variables and used least absolute shrinkage and selection operator (LASSO) regression to narrow the candidate predictors of depression to 11 variables, including sex, self-perceived health status, life satisfaction, sleep duration, pain, activities of daily living (ADL) score, history of falls, and several common comorbidities (arthritis, kidney disease, and digestive disease).
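LASSO narrows a long list of candidate variables by adding an L1 penalty that pushes the coefficients of weak predictors to exactly zero; the variables left with nonzero coefficients are the ones "selected." The sketch below is a minimal, hypothetical illustration of that mechanism using squared-error LASSO fitted by cyclic coordinate descent on features assumed to be already centered and scaled. The article does not say which LASSO variant, penalty strength, or software the authors used (a binary outcome such as depression status would more typically be modeled with L1-penalized logistic regression), and the function names here are invented for illustration.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(
    X: list[list[float]], y: list[float], lam: float, n_iter: int = 200, tol: float = 1e-10
) -> list[float]:
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Hypothetical sketch: assumes the columns of X and the vector y are already
    centered (so no intercept is fitted) and that no column is all zeros.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Per-feature scale term (1/n) * sum_i x_ij^2
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual (fit without feature j)
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:  # stop once no coefficient moves meaningfully
            break
    return b


def selected_features(names: list[str], coefs: list[float]) -> list[str]:
    """Keep the predictors whose coefficients were not shrunk to zero."""
    return [name for name, c in zip(names, coefs) if c != 0.0]
```

In a toy dataset where the outcome depends only on the first feature, a penalty of `lam=0.5` leaves the first coefficient positive and sets the second to exactly 0.0, so `selected_features` keeps only the first feature. Larger penalties drop more variables, which is how a 36-variable pool can shrink to a shorter list.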
Zhao and colleagues trained 6 ML algorithms (logistic regression, support vector machine, multilayer perceptron, LightGBM, XGBoost, and random forest) on a training set comprising 70% of the data. They evaluated model performance on an internal test set and on an external temporal validation cohort (n = 933) drawn from the 2013 wave of the China Health and Retirement Longitudinal Study.
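The development protocol described here (fit on a 70% training split, check on the held-out internal test set, then check again on a cohort from a different survey year that the models never saw) can be sketched with a simple split helper. This is an illustrative, hypothetical function: the article does not say whether the split was stratified by depression status or how randomization was seeded, so a plain seeded random split is assumed.

```python
import random


def train_test_split(records: list, train_frac: float = 0.7, seed: int = 42) -> tuple[list, list]:
    """Shuffle a copy of the records and cut it into training and held-out test portions.

    Hypothetical helper: a simple (non-stratified) seeded random split.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

Applied to a list of 2921 record IDs, for example, it would return 2045 training and 876 test records. A temporal validation cohort would be kept entirely outside this split and used only for evaluation, which is what makes it a test of generalization to a different time period.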
The XGBoost model consistently outperformed the other 5 algorithms. In internal testing, it achieved an area under the receiver operating characteristic curve (AUROC) of .811 (95% CI, .790-.829), accuracy of 78.91%, sensitivity of 77.31%, precision of 79.74%, specificity of 80.51%, and an F1 score of 78.5%, according to the study.1 In the temporal validation set, which included diverse population characteristics, XGBoost achieved the highest accuracy (70.63%), sensitivity (59.05%), and F1 score (63.17%) of the models tested. It also had the best precision, specificity, and AUROC in the validation set, "highlighting its superior generalizability and stability across different populations," Zhao et al wrote.
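The threshold-based metrics quoted here all derive from a single confusion matrix (counts of true and false positives and negatives), while AUROC summarizes how well the model ranks cases above non-cases across every possible threshold. The sketch below uses the standard textbook definitions; the function names are hypothetical, and the article does not describe the study's own evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall, true-positive rate
    specificity: float  # true-negative rate
    precision: float    # positive predictive value
    f1: float           # harmonic mean of precision and sensitivity


def f1_from(precision: float, sensitivity: float) -> float:
    """F1 score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute threshold-based metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=sensitivity,
        specificity=specificity,
        precision=precision,
        f1=f1_from(precision, sensitivity),
    )


def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise version is O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a consistency check on the reported figures, `f1_from(0.7974, 0.7731)` returns about 0.785, matching the internal-test F1 score of 78.5% computed from the reported precision and sensitivity.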
In interpretability analysis, the researchers found that the strongest contributors to elevated depression risk in adults with COPD were lower life satisfaction, poorer self-rated health, reduced physical function, short sleep duration, and pain. Female sex, disability, and a history of falls further increased risk. Conversely, better self-perceived health and greater life satisfaction were associated with a lower risk of depressive symptoms.
Investigators acknowledged several limitations, including reliance on self-reported measures for both COPD diagnosis and depression, the absence of spirometric data, and the retrospective design. They also noted that further external validation in clinical settings is needed, as is prospective evaluation of interventions triggered by risk predictions.
To facilitate clinical translation, Zhao and team note that the XGBoost model could be deployed online, allowing health care professionals to perform individualized depression risk assessments at the point of care. Such a tool could support timely mental health interventions that can "help alleviate depressive symptoms, prevent the development of a vicious cycle between COPD and depression, and significantly enhance overall outcomes," the team wrote.