Impact of Feature Engineering and Parameter Optimization on Machine Learning-Based Vulnerability Exploit Prediction

Authors

  • Deepanshu Sharma, Dr. Inderpal Singh Oberoi

Keywords

Vulnerability Exploit Prediction • Machine Learning • Feature Engineering • Hyperparameter Optimization • CVSS Metrics • Code Complexity • Bayesian Optimization • Grid Search • Exploit-Prone Vulnerabilities

Abstract

Machine learning (ML) has emerged as an effective approach for predicting exploit-prone software vulnerabilities, but prediction quality depends heavily on how features are selected and transformed and how model parameters are tuned. This study investigates the impact of feature engineering and hyperparameter optimization on the performance of ML models that predict the exploitability of software vulnerabilities. We define a comprehensive set of vulnerability parameters, including CVSS metrics, code complexity indicators, and temporal attributes. Using public vulnerability datasets (e.g., NVD and Exploit-DB), we compare baseline models against enhanced models that incorporate engineered features and hyperparameters optimized via Grid Search and Bayesian Optimization. The enhanced models show significant improvements in accuracy, F1-score, and AUC. These findings demonstrate that careful engineering of vulnerability parameters, combined with systematic parameter search, substantially improves exploit prediction performance, offering practical guidance for vulnerability management and security automation.
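The baseline-versus-tuned comparison the abstract describes can be sketched with scikit-learn. This is a minimal illustration, not the authors' actual pipeline: the data here is synthetic (standing in for NVD/Exploit-DB-derived features such as CVSS scores, code complexity, and vulnerability age), and the model and hyperparameter grid are assumptions chosen only to show the Grid Search mechanism.

```python
# Sketch: baseline vs. grid-searched random forest on synthetic
# "vulnerability" features (stand-ins for CVSS, complexity, age).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic dataset: 500 vulnerabilities, 8 numeric features, and a
# binary "exploited" label. A real study would use curated features.
X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: default hyperparameters, no tuning.
base = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc_base = roc_auc_score(y_te, base.predict_proba(X_te)[:, 1])

# Tuned: exhaustive search over a small hyperparameter grid,
# selecting the configuration with the best cross-validated AUC.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    scoring="roc_auc", cv=3,
).fit(X_tr, y_tr)
auc_tuned = roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1])

print(f"baseline AUC={auc_base:.3f}  tuned AUC={auc_tuned:.3f}")
```

Bayesian Optimization would replace the exhaustive grid with a sequential, model-guided search over the same hyperparameter space (e.g., via `scikit-optimize`'s `BayesSearchCV`), which typically needs fewer evaluations on larger grids.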

How to Cite

Deepanshu Sharma, Dr. Inderpal Singh Oberoi. (2024). Impact of Feature Engineering and Parameter Optimization on Machine Learning-Based Vulnerability Exploit Prediction. International Journal of Engineering Science & Humanities, 14(3), 93–100. Retrieved from https://www.ijesh.com/j/article/view/458
