Abstract
Creating an effective classifier in the presence of imbalanced data is a challenging task. The objective of this work was to apply machine learning techniques to automatically identify review articles, given the imbalanced representation of publication types among publications. As a contribution in that direction, we develop a hybrid ensemble algorithm called Balanced MultiBoost (BMB). The presented algorithm provides an efficient alternative to existing algorithms by combining the strengths of the MultiBoost ensemble with a sampling technique. To demonstrate the effectiveness of BMB, we compared its performance with five existing algorithms on four established metrics: precision, recall, F1-measure, and AUC-ROC. For the comparison, we used two customized datasets extracted from the Medline citation database. These datasets contain 19,299 examples for 2005 and 19,200 examples for 2006, with imbalance ratios of 1:6 and 1:7, respectively. The results show that BMB is a powerful ensemble solution for identifying minority examples in a text corpus.
Authors
- Dr. Ghulam Mustafa, Assistant Professor, Department of Information Technology, University of the Punjab, Gujranwala Campus, Punjab, Pakistan
- Dr. Naveed Jhamat, Assistant Professor, Department of Information Technology, University of the Punjab, Gujranwala Campus, Punjab, Pakistan
- Dr. Khurram Shahzad, Assistant Professor, Punjab University College of Information Technology, University of the Punjab, Lahore, Punjab, Pakistan
Keywords
Class Imbalance Learning, Ensemble Learning, Expert System, Machine Learning, Medline Abstract Classification, Performance Evaluation
DOI Number
10.35484/pssr.2021(5-I)38
Page Nos
491-504
Volume & Issue
Volume 5, Issue 1