Comparing three data mining algorithms for identifying associated risk factors of Type 2 Diabetes

  • Maryam Tayefi Department of Modern Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran. Biochemistry of Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
  • Habibollah Esmaeily Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
  • Majid Ghayour-Mobarhan Department of Modern Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran. Biochemistry of Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
  • Ali Reza Amirabadizadeh Medical Toxicology and Drug Abuse Research Center (MTDRC), School of Medicine, Birjand University of Medical Sciences, Birjand, Iran.
Keywords: Artificial neural network, Support vector machine, Logistic regression method, Type 2 diabetes

Abstract

Introduction: The prevalence and global health burden of type 2 diabetes (T2DM) are increasing, which concerns health service providers and health administrators. The current study aims to develop and compare statistical models for identifying risk factors associated with T2DM. The three methods investigated are artificial neural network (ANN), support vector machine (SVM) and multivariate logistic regression (MLR), applied to demographic, anthropometric and biochemical characteristics of a sample of 9528 individuals from the city of Mashhad.

Methods: The statistical methods used in this study are also known as machine learning algorithms, and they require the available data to be divided into training and testing datasets. We randomly selected 70% of the cases (6654 cases) for training and reserved the remaining 30% (2874 cases) for testing. The three methods are compared using the receiver operating characteristic (ROC) curve.
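The split-and-compare workflow in the Methods paragraph can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `train_test_split_indices` and `roc_auc` are hypothetical, the test-set size is passed explicitly so that the reported counts (6654 training and 2874 testing cases, about 69.8% and 30.2% of 9528) are reproduced, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by tracing the ROC curve point by point.

```python
import random


def train_test_split_indices(n_cases, n_test, seed=0):
    """Randomly partition case indices 0..n_cases-1 into train and test sets.

    Hypothetical helper: a simple (unstratified) random split with a fixed seed.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, which equals the area under the empirical ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Reproduce the reported split sizes for 9528 individuals.
train, test = train_test_split_indices(9528, 2874)
print(len(train), len(test))  # 6654 2874

# Toy example: hypothetical labels (1 = T2DM) and model scores, not study data.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.833
```

The toy labels and scores are invented for illustration; with real data, `scores` would be each model's predicted probability of T2DM on the test set. The pairwise loop is quadratic in the number of cases, which is still fast at this test-set size.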

Results: The prevalence of T2DM in our population is 14%. The ANN model has 78.7% accuracy, 63.1% sensitivity and 81.2% specificity. The corresponding values are 76.8%, 64.5% and 78.9% for SVM, and 77.7%, 60.1% and 80.5% for MLR. The area under the ROC curve (AUC) is 0.71 for ANN, 0.73 for SVM and 0.70 for MLR.
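The three reported metrics are linked by simple arithmetic: accuracy is a prevalence-weighted average of sensitivity and specificity. The sketch below uses hypothetical helper names and assumes the test-set prevalence is close to the 14% population prevalence; it computes the metrics from confusion-matrix counts and checks the reported figures for internal consistency.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of T2DM cases correctly flagged
    specificity = tn / (tn + fp)  # share of non-diabetic cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy = prevalence * sensitivity + (1 - prevalence) * specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Reported test-set figures from the abstract: (sensitivity, specificity, accuracy).
reported = {
    "ANN": (0.631, 0.812, 0.787),
    "SVM": (0.645, 0.789, 0.768),
    "MLR": (0.601, 0.805, 0.777),
}

# Assumption: test-set prevalence is approximately the population value of 14%.
for model, (sens, spec, acc) in reported.items():
    implied = accuracy_from_rates(sens, spec, prevalence=0.14)
    print(f"{model}: reported {acc:.3f}, implied {implied:.3f}")
```

With a prevalence of 0.14, each implied accuracy agrees with the reported value to within about 0.001. The identity also explains why accuracy tracks specificity so closely here: with only 14% of cases diabetic, specificity carries 86% of the weight.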

Conclusion: Overall, ANN, which achieved the highest accuracy and specificity, performs better than the other two models and can be used effectively to identify risk factors associated with T2DM.

 

Published
2017-11-29
Section
Conference proceedings and abstracts
