Imbalanced Data NearMiss for Comparison of SVM and Naive Bayes Algorithms

Wawan Gunawan; Yudo Devianto; Anggi Puspita Sari

Authors

Wawan Gunawan, Universitas Mercu Buana
Yudo Devianto, Universitas Mercu Buana
Anggi Puspita Sari, Universitas Bina Sarana Informatika

Keywords:

HIV, Imbalance, K-Fold, Classification, ODHA (people living with HIV/AIDS)

Abstract

This study aims to improve the diagnosis, management, and prevention of HIV/AIDS by using classification algorithms. The dataset consists of 707,379 records and 89 columns. Preprocessing includes removing irrelevant attributes, resolving inconsistencies, and balancing the data with the NearMiss undersampling method, which yields a balanced proportion of reactive and non-reactive HIV cases. The balanced data is then split into training and test sets at four ratios: 60:40, 70:30, 80:20, and 90:10. Two classification models, Naive Bayes and SVM, are evaluated on Accuracy, Precision, Recall, and F1-Score. SVM achieves its highest accuracy of 82.6% with a 90:10 split and 6-fold cross-validation, and 82.2% with a 60:40 split and 5-fold cross-validation. Naive Bayes reaches its highest accuracy, 61.1%, with a 60:40 split.
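To illustrate the balancing step, the following is a minimal sketch of NearMiss undersampling in plain Python. The abstract does not state which NearMiss variant the study used, so the NearMiss-1 rule is assumed here: keep the majority-class samples whose mean distance to their k nearest minority-class samples is smallest. The function name `nearmiss1`, the parameter `k`, and the toy data are all hypothetical and are not taken from the study.

```python
import math


def nearmiss1(X, y, majority_label, minority_label, k=3):
    """Undersample the majority class with the NearMiss-1 heuristic.

    Keeps the majority samples whose mean distance to their k nearest
    minority samples is smallest, until both classes have equal size.
    Assumes at least one minority sample is present.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [x for x, label in zip(X, y) if label == majority_label]
    k = min(k, len(minority))

    def mean_dist_to_nearest_minority(point):
        # Euclidean distances from this majority point to every minority point
        dists = sorted(math.dist(point, m) for m in minority)
        return sum(dists[:k]) / k

    # Keep as many majority samples as there are minority samples
    kept = sorted(majority, key=mean_dist_to_nearest_minority)[:len(minority)]
    X_bal = minority + kept
    y_bal = [minority_label] * len(minority) + [majority_label] * len(kept)
    return X_bal, y_bal


# Toy data (hypothetical): 2 "reactive" (1) and 6 "non-reactive" (0) points.
X = [(0.0, 0.0), (1.0, 0.0),
     (0.5, 0.5), (1.5, 0.5), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
y = [1, 1, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = nearmiss1(X, y, majority_label=0, minority_label=1, k=2)
```

On the toy data, the two majority points closest to the minority cluster are retained and the distant ones are dropped, leaving two samples per class. In practice, the imbalanced-learn library provides a `NearMiss` class in `imblearn.under_sampling` with `version` and `n_neighbors` parameters; whether the study used that implementation is not stated in the abstract.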
