Findings highlight a semi-supervised machine-learning approach designed to work around incomplete diagnostic labeling in Alzheimer's disease using routine clinical data.
Alzheimer’s disease (AD) remains substantially underdiagnosed in routine clinical care, particularly among racially and ethnically underrepresented populations. In a new study published in npj Digital Medicine, investigators from UCLA report that a semi-supervised artificial intelligence model applied to electronic health records (EHRs) can identify patients with likely undiagnosed Alzheimer’s disease while achieving more equitable performance across racial and ethnic groups than conventional supervised approaches.1

Using EHR data from more than 115,000 adults receiving care at UCLA Health, the researchers developed a semi-supervised positive unlabeled learning framework (SSPUL) designed to address two persistent challenges in dementia prediction: incomplete diagnostic labeling and algorithmic bias. Across non-Hispanic White, non-Hispanic African American, Hispanic Latino, and East Asian populations, the model demonstrated substantially higher sensitivity and better overall discrimination than standard supervised machine-learning models, while also minimizing differences in performance between groups.1
Corresponding author Timothy S Chang, MD, PhD, assistant professor of neurology at the David Geffen School of Medicine at UCLA, and colleagues frame their work within a well-documented discrepancy between AD prevalence estimates from longitudinal cohort studies and diagnoses recorded in real-world clinical data.2 Claims- and EHR-based identification captures only a fraction of true cases, and underdiagnosis is particularly pronounced among Black, Hispanic, and Asian American patients.3 Chang et al cite research demonstrating structural inequities, differences in access to care, cultural stigma, and clinician bias as factors contributing to delayed or missed diagnoses.
Although prior studies have used EHR data to predict AD, most rely on supervised learning models trained on existing diagnoses. According to the authors, this approach assumes that diagnostic labels represent an unbiased ground truth, an assumption that does not hold in dementia care. Models trained solely on documented diagnoses risk perpetuating existing disparities rather than mitigating them.1
To overcome these limitations, the investigators built SSPUL on positive unlabeled learning, which is well suited to EHR data, where the absence of a diagnosis does not reliably indicate absence of disease. Rather than treating all unlabeled patients as disease-free, the framework allows a subset of unlabeled individuals to be classified as likely positives based on clinical patterns.
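For readers unfamiliar with positive unlabeled learning, the minimal Python sketch below illustrates the general idea rather than the authors' SSPUL implementation: a first-pass model is trained with documented AD diagnoses as positives and all other patients as unlabeled, and the highest-scoring unlabeled patients are then promoted to likely positives before the model is refit. The promote_fraction parameter and the logistic regression base learner are illustrative assumptions, not details taken from the study.

```python
# Minimal positive-unlabeled (PU) learning sketch; not the authors' SSPUL code.
# Patients with a documented AD diagnosis are "positive"; everyone else is
# treated as unlabeled rather than negative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pu_two_step(X, y_labeled, promote_fraction=0.05):
    """X: feature matrix; y_labeled: 1 = documented AD, 0 = unlabeled.
    promote_fraction is a hypothetical tuning knob, not a value from the study."""
    # Step 1: naive model that treats all unlabeled patients as negatives.
    base = LogisticRegression(max_iter=1000).fit(X, y_labeled)

    # Step 2: score the unlabeled pool and promote the top fraction to positives.
    unlabeled_idx = np.where(y_labeled == 0)[0]
    scores = base.predict_proba(X[unlabeled_idx])[:, 1]
    n_promote = int(promote_fraction * len(unlabeled_idx))
    promoted = unlabeled_idx[np.argsort(scores)[-n_promote:]]

    y_adjusted = y_labeled.copy()
    y_adjusted[promoted] = 1  # likely undiagnosed cases

    # Step 3: refit on the adjusted labels.
    final = LogisticRegression(max_iter=1000).fit(X, y_adjusted)
    return final, promoted
```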
The study included more than 115,000 adults aged 65 to 90 years receiving care at UCLA Health. The model incorporated a broad range of diagnostic codes and healthcare utilization features rather than a narrow, expert-curated list of risk factors. To promote equity, the researchers applied both pre-processing and post-processing bias mitigation strategies, including race- and ethnicity-specific prevalence estimates and classification thresholds optimized to balance benefit across groups.1
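As an illustration of the post-processing step, the sketch below picks a separate decision threshold for each racial and ethnic group so that every group reaches roughly the same sensitivity on validation data. Equalizing sensitivity and the target_sensitivity value are assumptions made for this example; the study's exact optimization criterion is described in the paper.

```python
# Post-processing sketch: choose a group-specific decision threshold so each
# group reaches approximately the same target sensitivity.
import numpy as np

def group_thresholds(scores, y_true, groups, target_sensitivity=0.80):
    """scores: model probabilities; y_true: validation labels;
    groups: group label per patient. target_sensitivity is hypothetical."""
    thresholds = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)
        if mask.sum() == 0:
            continue
        pos_scores = np.sort(scores[mask])
        # Lowest cutoff that still captures the target share of positives.
        cutoff_idx = int((1.0 - target_sensitivity) * len(pos_scores))
        thresholds[g] = pos_scores[cutoff_idx]
    return thresholds

def predict_with_group_thresholds(scores, groups, thresholds):
    # A patient is flagged when their score exceeds their own group's cutoff.
    return np.array([scores[i] >= thresholds[groups[i]]
                     for i in range(len(scores))])
```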
Model performance was evaluated across non-Hispanic White, non-Hispanic African American, Hispanic Latino, and East Asian patients using repeated train–test splits. Predictions were validated using proxy AD diagnoses and medications, as well as genetic data, including polygenic risk scores and APOE ε4 allele counts, in a biobank-linked subset of patients.1
Across all racial and ethnic groups, SSPUL outperformed supervised baseline models in identifying patients with likely Alzheimer’s disease. Sensitivity ranged from 0.77 to 0.81 with SSPUL, compared with 0.39 to 0.53 for supervised models. The semi-supervised approach also achieved higher area under the precision–recall curve and better calibration, indicating more reliable probability estimates.
The authors emphasized that the gains in performance did not come at the expense of equity. SSPUL demonstrated the lowest cumulative parity loss across groups, reflecting smaller differences in sensitivity, precision, and related metrics between underrepresented populations and non-Hispanic White patients. In contrast, supervised models tended either to under-identify cases in minority populations or to show greater variability in performance across groups.
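The paper defines cumulative parity loss precisely; as a rough illustration, the sketch below sums the absolute gaps in sensitivity and precision between each group and a non-Hispanic White reference group. The choice of those two metrics and of absolute differences is an assumption made for this example.

```python
# Illustrative "cumulative parity loss": summed absolute gaps between each
# group's metrics and those of a reference group.
import numpy as np

def sensitivity(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn) if (tp + fn) else np.nan

def precision(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fp) if (tp + fp) else np.nan

def cumulative_parity_loss(y_true, y_pred, groups, reference="Non-Hispanic White"):
    metrics = (sensitivity, precision)
    ref_mask = groups == reference
    ref_values = [m(y_true[ref_mask], y_pred[ref_mask]) for m in metrics]
    loss = 0.0
    for g in np.unique(groups):
        if g == reference:
            continue
        g_mask = groups == g
        for m, ref_v in zip(metrics, ref_values):
            loss += abs(m(y_true[g_mask], y_pred[g_mask]) - ref_v)
    return loss
```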
Feature analyses showed that the model relied on both expected neurologic indicators, such as memory loss, delirium, and mild cognitive impairment, and non-neurologic signals, including healthcare utilization patterns and conditions such as decubitus ulcers. Patients predicted to have undiagnosed Alzheimer's disease also demonstrated genetic risk profiles consistent with Alzheimer's susceptibility, supporting the clinical plausibility of the model's predictions.1
Among the study's limitations, the authors acknowledged its reliance on a single academic health system, which may limit generalizability. In addition, AD status was inferred using proxy measures rather than gold-standard clinical assessments. Polygenic risk scores are developed primarily from European-ancestry cohorts, they noted, leading to less consistent genetic validation in some racial and ethnic groups. Finally, residual bias in EHR documentation cannot be ruled out.1
Limitations of the study notwithstanding, the authors argue the findings carry broader implications: machine-learning models explicitly designed to address bias can improve both accuracy and equity, offering a practical path toward more timely and fair recognition of Alzheimer's disease in routine clinical care.