Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification

dc.coverage: DOI: 10.1016/j.eswa.2024.123149
dc.creator: Vairetti, Carla
dc.creator: Assadi, José Luis
dc.creator: Maldonado, Sebastián
dc.date: 2024
dc.date.accessioned: 2026-01-05T21:05:50Z
dc.date.available: 2026-01-05T21:05:50Z
dc.description: Imbalanced classification is a well-known challenge faced by many real-world applications. This issue occurs when the distribution of the target variable is skewed, leading to a prediction bias toward the majority class. With the arrival of the Big Data era, there is a pressing need for efficient solutions to solve this problem. In this work, we present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework. Both procedures are performed on the same pass over the data, conferring efficiency to the technique. The SMOTENN method is complemented with an efficient implementation of the neighborhoods related to the minority samples. Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets while achieving positive results on large datasets with reduced running times. [eng]
dc.identifier: https://investigadores.uandes.cl/en/publications/ae7689ae-2984-44b3-9410-00a8f74a5fcf
dc.identifier.uri: https://repositorio.uandes.cl/handle/uandes/62145
dc.language: eng
dc.rights: info:eu-repo/semantics/restrictedAccess
dc.source: vol. 246 (2024), date: 2024-07-15, p. 1-11
dc.subject: Big data
dc.subject: Imbalanced classification
dc.subject: Intelligent undersampling
dc.subject: MapReduce
dc.subject: SMOTE
dc.title: Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification [eng]
dc.type: Article [eng]
dc.type: Artículo [spa]