Investigating Factors Influencing Disease Progression in Patients With Non-Alcoholic Fatty Liver Disease
DOI:
https://doi.org/10.14740/jocmr6424
Keywords:
Cluster analysis, Non-alcoholic fatty liver disease, Electronic health records, Unsupervised learning, Clinical phenotypes
Abstract
Background: With no approved pharmacological treatments for non-alcoholic fatty liver disease (NAFLD) in Taiwan, identifying protective and risk factors is crucial for preventing disease progression. Given the clinical heterogeneity of NAFLD, this study aimed to identify clinically meaningful NAFLD phenotypes using electronic medical records (EMRs) and unsupervised clustering, stratify risk across different clusters, identify factors associated with disease progression, and derive a parsimonious set of predictors for high-risk phenotypes.
Methods: This retrospective cohort study was conducted in three steps with iterative model training. In step 1, patients diagnosed with NAFLD were identified, all relevant patient data were extracted, and clustering analysis was performed using the k-prototype algorithm. In step 2, survival analysis and Cox regression were applied to stratify risk across clusters. In step 3, Lasso regression, logistic regression, and receiver operating characteristic (ROC) curve analysis were used to identify potential protective and risk factors associated with NAFLD and to derive a parsimonious set of predictors for high-risk phenotypes across different risk strata.
Results: In step 1, analysis of 6,023 patients identified four distinct phenotypic clusters; the first cluster had the most severe disease and the second the least severe. In step 2, among 4,998 patients, the first cluster had the highest risk for all outcomes, with a median survival of 3.06 years, significantly different from the other clusters. There was no significant difference in risk between the second and third clusters. In step 3, a comparison of the highest-risk and lowest-risk clusters identified 17 potential variables.
Conclusions: Using multiple analytical models, this study identified 17 potential risk factors associated with NAFLD progression. Their combined assessment may inform future risk stratification and hypothesis generation. Further validation is required before clinical application.
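The k-prototype clustering named in the Methods handles mixed numeric and categorical EMR fields by combining two distance terms. The sketch below illustrates that general dissimilarity (squared Euclidean distance on numeric fields plus a weight times the count of categorical mismatches) and the step that assigns a record to its nearest prototype. It is not the study's implementation: the function names, the example fields, and the weight `gamma = 1.0` are all assumptions made for illustration, and the numeric values are assumed to be already scaled.

```python
from typing import List, Sequence, Tuple

# A prototype is a (numeric values, categorical values) pair.
Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float,
) -> float:
    """Squared Euclidean distance on numeric fields plus
    gamma times the number of mismatched categorical fields."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def assign_to_prototype(
    num: Sequence[float],
    cat: Sequence[str],
    prototypes: List[Prototype],
    gamma: float = 1.0,  # assumed weight; the abstract does not report one
) -> int:
    """Return the index of the prototype closest to the record."""
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical example: two scaled lab values and two categorical fields.
prototypes: List[Prototype] = [
    ((0.0, 0.0), ("no", "male")),
    ((1.0, 1.0), ("yes", "female")),
]
patient_num = (0.9, 0.8)
patient_cat = ("yes", "male")
cluster_index = assign_to_prototype(patient_num, patient_cat, prototypes)
```

In the full algorithm this assignment step alternates with updating each prototype (means for numeric fields, modes for categorical fields) until assignments stop changing. A larger `gamma` gives categorical fields more influence on cluster membership.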
License
Copyright (c) 2026 The authors

This work is licensed under a Creative Commons Attribution 4.0 International License.






