An Enhanced Ensemble Machine Learning Framework with Explainable AI for Cyber Threat and Network Outlier Detection

Anjali, Dr. Vineet Agarwal

Keywords:

Intrusion Detection System; LightGBM; XGBoost; Random Forest; MLP; LIME; CICIDS2017; Imbalanced Classification; Explainable AI; Cybersecurity

Abstract

Intrusion detection systems (IDS) are a cornerstone of modern cybersecurity infrastructure. Traditional machine learning approaches for IDS, including artificial neural networks (ANNs), often suffer from class imbalance, limited generalization, and insufficient interpretability. This paper presents an enhanced ensemble machine learning framework that combines Random Forest (RF), XGBoost, LightGBM, and a Multi-Layer Perceptron (MLP) deep learning classifier with a comprehensive preprocessing pipeline, evaluated on the CICIDS2017 benchmark dataset. The preprocessing pipeline consists of duplicate removal, IQR-based outlier clipping, Yeo-Johnson power transformation, standard scaling, and Principal Component Analysis (PCA) retaining 95% of the explained variance. Experimental results show that LightGBM achieves the highest accuracy, 99.81%, with an AUC-ROC of 0.9998, a Matthews Correlation Coefficient (MCC) of 0.9832, and a training time of only 4.7 seconds. To address model transparency, Local Interpretable Model-agnostic Explanations (LIME) are applied to the best-performing model to provide feature-level explanations of its decisions. Comparative evaluation shows that the proposed framework outperforms both a baseline ANN (92% accuracy) and prior state-of-the-art methods. These results indicate that ensemble methods combined with principled preprocessing and explainability mechanisms can substantially improve both the effectiveness and the trustworthiness of cyber threat detection.
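The preprocessing steps listed in the abstract can be sketched as a scikit-learn pipeline. This is a minimal sketch under stated assumptions, not the authors' code: the use of scikit-learn, the 1.5 × IQR clipping multiplier, the hypothetical helpers `drop_duplicate_rows`, `IQRClipper`, and `build_pipeline`, and the Random Forest settings are all illustrative. Because the abstract lists Yeo-Johnson and standard scaling as separate steps, `PowerTransformer` is run with `standardize=False` ahead of a `StandardScaler`, and `PCA(n_components=0.95)` keeps the smallest number of components that explain at least 95% of the variance.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler


def drop_duplicate_rows(X, y):
    """Remove exact duplicate (features, label) rows, keeping first occurrences in order.

    Assumes X contains only finite values (inf/NaN cleaning is a separate step).
    """
    Xy = np.column_stack([X, y])
    _, first_idx = np.unique(Xy, axis=0, return_index=True)
    keep = np.sort(first_idx)
    return X[keep], y[keep]


class IQRClipper(TransformerMixin, BaseEstimator):
    """Clip each feature to [Q1 - k*IQR, Q3 + k*IQR]; bounds are learned in fit()."""

    def __init__(self, k=1.5):
        self.k = k  # assumed multiplier; the abstract does not state the value

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1, q3 = np.percentile(X, [25, 75], axis=0)  # per-feature quartiles
        iqr = q3 - q1
        self.lower_ = q1 - self.k * iqr
        self.upper_ = q3 + self.k * iqr
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


def build_pipeline(random_state=0):
    """Clipping -> Yeo-Johnson -> scaling -> PCA(95%) -> classifier (hypothetical setup)."""
    return Pipeline([
        ("iqr_clip", IQRClipper(k=1.5)),
        ("yeo_johnson", PowerTransformer(method="yeo-johnson", standardize=False)),
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=0.95, svd_solver="full")),
        # lightgbm.LGBMClassifier or xgboost.XGBClassifier could replace this step.
        ("clf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ])
```

Duplicates are removed before the train/test split, while the pipeline itself should be fitted on the training split only, so that the IQR bounds, the per-feature Yeo-Johnson λ values, the scaler statistics, and the PCA basis are never estimated from test traffic.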
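For reference, the Yeo-Johnson power transformation named in the abstract maps each feature value x, given a parameter λ, as follows. Unlike Box-Cox, it is defined for zero and negative inputs. The abstract does not state how λ was chosen; scikit-learn's `PowerTransformer`, for example, estimates one λ per feature by maximum likelihood.

```latex
\psi(\lambda, x) =
\begin{cases}
\dfrac{(x+1)^{\lambda} - 1}{\lambda}, & x \ge 0,\ \lambda \ne 0, \\[6pt]
\ln(x+1), & x \ge 0,\ \lambda = 0, \\[6pt]
-\dfrac{(1-x)^{2-\lambda} - 1}{2-\lambda}, & x < 0,\ \lambda \ne 2, \\[6pt]
-\ln(1-x), & x < 0,\ \lambda = 2.
\end{cases}
```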
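The abstract reports MCC alongside accuracy because accuracy alone can look excellent on imbalanced traffic. A minimal sketch of the binary MCC, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), assuming a two-class (benign vs. attack) confusion matrix; the abstract does not say whether its MCC is binary or multiclass, and the helper names `mcc` and `accuracy` are illustrative.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Any empty row/column makes MCC undefined; return 0.0 (scikit-learn's convention).
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, with 990 benign flows and 10 attacks, a detector that catches 2 attacks and raises no false alarms (`tp=2, tn=990, fp=0, fn=8`) scores 0.992 accuracy but an MCC of only about 0.45, which is why MCC is the more informative figure for imbalanced intrusion data.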
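LIME explains a single prediction by fitting a simple, locally weighted linear model to the black-box model's outputs on perturbed copies of the instance being explained. The sketch below illustrates that idea with NumPy only. It is a simplified stand-in, not the `lime` package's `LimeTabularExplainer` and not necessarily how the authors configured LIME. Assumptions: Gaussian perturbations scaled by a per-feature standard deviation, no feature discretization, an unregularized weighted least-squares fit (the `lime` package defaults to ridge regression), and the hypothetical helper name `lime_style_explanation`. The default kernel width of 0.75·√d and the square-rooted exponential kernel mirror the `lime` tabular defaults.

```python
import numpy as np


def lime_style_explanation(predict_fn, x, scale, num_samples=5000, kernel_width=None, seed=0):
    """Return local linear coefficients of predict_fn around instance x.

    predict_fn: maps an (n, d) array to an (n,) array of scores,
                e.g. the predicted probability of the attack class.
    scale:      per-feature standard deviation (e.g. from the training set).
    Coefficients are per one standard deviation of each feature.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = np.asarray(scale, dtype=float)
    d = x.size
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)

    # Perturb x in standardized units: sample = x + z * scale, z ~ N(0, I).
    z = rng.normal(size=(num_samples, d))
    samples = x + z * scale
    y = np.asarray(predict_fn(samples), dtype=float)

    # Proximity weights: square-rooted exponential kernel on standardized distance.
    dist = np.linalg.norm(z, axis=1)
    w = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))

    # Weighted least squares with an intercept column, solved by row-scaling with sqrt(w).
    A = np.column_stack([np.ones(num_samples), z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept
```

For a fitted classifier pipeline, `predict_fn` could be `lambda X: pipe.predict_proba(X)[:, 1]`. Explaining the full pipeline rather than the bare classifier keeps the explanation in terms of the original flow features instead of PCA components, and the coefficient magnitudes can then be ranked to list the most influential features.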

References

M. Alawida, A. E. Omolara, O. I. Abiodun, and A. Al-Rajab, "A deeper look into cybersecurity issues in the wake of COVID-19: A survey," J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 10, pp. 8176–8206, 2022.

Cybersecurity Ventures, "Cybercrime to Cost the World $10.5 Trillion Annually by 2025," Cybercrime Magazine, 2020.

Y. Li and Q. Liu, "A comprehensive review study of cyber-attacks and cyber security: Emerging trends and recent developments," Energy Rep., vol. 7, pp. 8176–8186, Nov. 2021.

R. Kaur, D. Gabrijelcic, and T. Klobucar, "Artificial intelligence for cybersecurity: Literature review and future research directions," Inf. Fusion, vol. 97, p. 101804, Apr. 2023.

T. S. Oyinloye, M. O. Arowolo, and R. Prasad, "Enhancing cyber threat detection with an improved artificial neural network model," Data Sci. Manag., vol. 8, pp. 107–115, 2025.

G. Apruzzese, P. Laskov, E. Montes de Oca et al., "The role of machine learning in cybersecurity," Digital Threats: Res. Pract., vol. 4, no. 1, pp. 1–38, 2023.

N. Ahmed, A. Ngadi, J. M. Sharif et al., "Network threat detection using machine/deep learning in SDN-based platforms: A comprehensive analysis," Sensors, vol. 22, no. 20, p. 7896, 2022.

M. Ahsan, K. E. Nygard, R. Gomes et al., "Cybersecurity threats and their mitigation approaches using machine learning—A review," J. Cybersecur. Priv., vol. 2, no. 3, pp. 527–555, 2022.

J. Lee, J. Kim, I. Kim, and K. Han, "Cyber threat detection based on artificial neural networks using event profiles," IEEE Access, vol. 7, pp. 165607–165626, Nov. 2019.

A. Yazdinejad, M. Kazemi, R. M. Parizi et al., "An ensemble deep learning model for cyber threat hunting in industrial internet of things," Digit. Commun. Netw., vol. 9, no. 1, pp. 101–110, 2023.

K. Simran, P. Balakrishna, R. Vinayakumar et al., "Deep learning approach for enhanced cyber threat indicators in Twitter stream," in Proc. SSCC 2019, CCIS 1208, pp. 135–145, Springer, 2020.

I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proc. ICISSP, pp. 108–116, 2018. [Dataset: https://www.unb.ca/cic/datasets/ids-2017.html]

M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proc. ACM SIGKDD, pp. 1135–1144, 2016.

M. F. A. Razak, N. B. Anuar, F. Othman et al., "Bio-inspired for features optimization and malware detection," Arab. J. Sci. Eng., vol. 43, pp. 6963–6979, Dec. 2018.

Q. Liu, P. Li, W. Zhao et al., "A survey on security threats and defensive techniques of machine learning: A data driven view," IEEE Access, vol. 6, pp. 12103–12117, Feb. 2018.

A. Basit, M. Zafar, X. Liu et al., "A comprehensive survey of AI-enabled phishing attacks detection techniques," Telecommun. Syst., vol. 76, pp. 139–154, Oct. 2020.

G. Ke, Q. Meng, T. Finley et al., "LightGBM: A highly efficient gradient boosting decision tree," in Proc. NeurIPS, vol. 30, pp. 3146–3154, 2017.

T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proc. ACM SIGKDD, pp. 785–794, 2016.

L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.

M. Pawlicki, M. Choras, R. Kozik et al., "On the impact of network data balancing in cybersecurity applications," in Proc. ICCS 2020, LNCS 12140, pp. 196–210, 2020.
