Abstract
Diabetes and its complications are recognized worldwide as a major public health threat. Predicting diabetic complications is regarded as a highly effective way to increase the survival rate of diabetic patients. While many studies use medical images and structured medical records, very little effort has been devoted to applying data mining techniques to unstructured textual medical records, such as admission and discharge records. Moreover, existing approaches overlook the similarities among medical records, which could potentially improve the accuracy of prediction models. In this paper, we propose an approach for diabetic complication prediction based on a similarity-enhanced latent Dirichlet allocation (seLDA) model. Specifically, after data preprocessing we first estimate the similarity between textual medical records, and then we perform seLDA-based diabetic complication topic mining under similarity constraints. Finally, we construct a prediction model by solving a multilabel classification problem with support vector machines (SVMs). The experimental results show that our approach outperforms the conventional LDA-based approach in similarity indices by 22.49%. Additionally, our approach shows significant improvements in prediction accuracy over four other representative seLDA-based approaches, namely random forests (RF), k-nearest neighbors (KNN), logistic regression (LR) and deep neural networks (DNNs).
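The first step described in the abstract, estimating the similarity between textual medical records, can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors use, so cosine similarity over term-frequency vectors is an assumption made here for illustration only; the function names and the toy records are likewise hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a record and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty record is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(records):
    """Pairwise similarity between all records (assumed measure: cosine)."""
    vectors = [Counter(tokenize(r)) for r in records]
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy records, invented for illustration only.
records = [
    "patient admitted with diabetic nephropathy and proteinuria",
    "discharge note: diabetic nephropathy, proteinuria improved",
    "blurred vision on admission, diabetic retinopathy suspected",
]
sim = similarity_matrix(records)
```

In a pipeline of the kind the abstract outlines, a matrix like `sim` would be the input that constrains topic mining; how seLDA actually incorporates those constraints is not specified in the abstract and is not attempted here.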
| Original language | English |
|---|---|
| Pages (from-to) | 12-24 |
| Number of pages | 13 |
| Journal | Information Sciences |
| Volume | 499 |
| State | Published - Oct 2019 |
| Externally published | Yes |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Diabetic complication prediction
- Latent Dirichlet allocation
- Multilabel classification
- Similarity enhancement
- Topic mining
Fingerprint
Dive into the research topics of 'Diabetic complication prediction using a similarity-enhanced latent Dirichlet allocation model'. Together they form a unique fingerprint.