Review of Naïve Bayes Classification Techniques Applied to Diabetes Data Challenges
Keywords:
Naïve Bayes, Diabetes Prediction, Machine Learning, Medical Data Mining
Abstract
Diabetes mellitus is one of the most widespread chronic diseases worldwide, and its early detection has become a critical research focus in medical data mining. Machine learning algorithms play an essential role in identifying high-risk individuals and supporting clinical decision-making through predictive models. Among these algorithms, the Naïve Bayes classifier has been widely studied for diabetes prediction due to its simplicity, computational efficiency, and transparent probabilistic framework. This review explores the theoretical foundations of Naïve Bayes, including its assumption of feature independence, and evaluates its application to widely used medical repositories such as the Pima Indian Diabetes Dataset. The discussion highlights both strengths, such as ease of implementation and adaptability, and limitations, including sensitivity to class imbalance, missing values, and correlated features. Furthermore, the paper compares Naïve Bayes with alternative classifiers such as Decision Trees, Support Vector Machines, and Neural Networks. The findings suggest that while Naïve Bayes does not always outperform advanced models, it remains a valuable tool when efficiency and interpretability are prioritized.
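To make the feature-independence assumption concrete, the following is a minimal sketch of a Gaussian Naïve Bayes classifier written from scratch in Python. It is an illustration only and not the method of any study covered by this review. The function names `fit_gaussian_nb` and `predict` are hypothetical, the Gaussian likelihood is an assumed modelling choice, and the six toy records are invented for demonstration; they are not taken from the Pima dataset, although plasma glucose and body mass index are among its attributes.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature Gaussian parameters per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Population variance plus a small floor to avoid division by zero.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class label with the highest log-posterior score."""

    def log_score(prior, means, variances):
        # Independence assumption: per-feature log-likelihoods are summed,
        # which corresponds to multiplying the per-feature probabilities.
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=lambda label: log_score(*model[label]))


# Hypothetical toy records: (plasma glucose, body mass index); 1 = diabetic.
X = [(85, 22.0), (90, 24.5), (95, 23.0), (150, 33.0), (160, 35.5), (170, 31.0)]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict(model, (88, 23.5)))
print(predict(model, (155, 34.0)))
```

Because each feature contributes its own term to the score, two strongly correlated features are in effect counted twice, which is one way the correlated-feature limitation noted above can arise. In practice a library implementation with proper validation would normally be preferred over a hand-written version like this one.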
License
Copyright (c) 2018 International Journal of Engineering, Science and Humanities

This work is licensed under a Creative Commons Attribution 4.0 International License.