Feature Selection and Imbalanced Data Handling Using RFECV and ADASYN


Irfan Pratama
Albert Yakobus Chandra
Putri Taqwa Presetyaningrum

Abstract

The data mining process operates on the data that is available; if a dataset is incomplete, the results of data mining become suboptimal. Several data conditions need to be addressed before the data mining stage. One of them is the imbalanced class condition, in which the distribution of data across classes is not proportional. As a way to make the classification process more efficient, feature selection meets this need because its output is a dataset with fewer attributes than before. To address the imbalanced class problem, this study uses ADASYN to balance the class proportions in the dataset, while RFECV is used as the feature selection method to improve the efficiency of the classification process. Evaluation of the classification results shows that the dataset with feature selection yields better results than the dataset without feature selection. This is evidenced by comparing the best classification accuracies: the CART method achieved 85.1% on the data processed without feature selection, while the Bagging k-NN method achieved 88% on the dataset with feature selection. It can therefore be concluded that feature selection can improve classification accuracy.
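The pipeline the abstract describes, oversampling the minority class and then selecting features with cross-validated recursive feature elimination, can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the paper's actual experiment: ADASYN itself ships in the separate imbalanced-learn package, so a plain random-oversampling stand-in is used here to keep the example self-contained, and a decision tree stands in for CART.

```python
# Sketch of the RFECV + oversampling pipeline (assumes scikit-learn).
# NOTE: the random oversampling below is a stand-in for ADASYN, which
# lives in the imbalanced-learn package (imblearn.over_sampling.ADASYN).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic data: roughly 90% majority, 10% minority class.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           weights=[0.9, 0.1], random_state=42)

# Naive oversampling stand-in for ADASYN: duplicate minority samples
# at random until both classes have the same size.
rng = np.random.default_rng(42)
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

# RFECV: recursive feature elimination with cross-validated selection of
# the best number of features, wrapped around a CART-style decision tree.
selector = RFECV(DecisionTreeClassifier(random_state=42),
                 cv=StratifiedKFold(5), scoring="accuracy")
selector.fit(X_bal, y_bal)
print("selected features:", selector.n_features_)

# Classify on the reduced feature set and report mean CV accuracy.
acc = cross_val_score(DecisionTreeClassifier(random_state=42),
                      selector.transform(X_bal), y_bal, cv=5).mean()
print(f"CV accuracy on reduced data: {acc:.3f}")
```

Swapping the oversampling block for `ADASYN(random_state=42).fit_resample(X, y)` from imbalanced-learn, and the tree for a Bagging k-NN ensemble, would bring the sketch closer to the configurations compared in the paper.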


Article Details

How to Cite
Pratama, I., Chandra, A., & Presetyaningrum, P. (2022). Seleksi Fitur dan Penanganan Imbalanced Data menggunakan RFECV dan ADASYN. Jurnal Eksplora Informatika, 11(1), 38-49. https://doi.org/10.30864/eksplora.v11i1.578
