FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification

dc.coverageDOI: 10.1016/j.patcog.2021.108511
dc.creatorMaldonado, Sebastián
dc.creatorVairetti, Carla
dc.creatorFernandez, Alberto
dc.creatorHerrera, Francisco
dc.date2022
dc.date.accessioned2026-01-05T21:20:16Z
dc.date.available2026-01-05T21:20:16Z
dc.description: The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known resampling strategy that has been successfully used to deal with the class-imbalance problem, one of the most challenging pattern recognition tasks of the last two decades. In this work, we claim that SMOTE has an important issue when defining the neighborhood used to create new minority samples: the Euclidean distance may not be suitable in high-dimensional settings. Our hypothesis is that a weighted metric that does not assume all features are equally important could improve performance in the presence of noisy or redundant variables. Along these lines, we present a novel SMOTE-like method that uses the weighted Minkowski distance to define the neighborhood of each minority-class example. This yields a better-defined neighborhood because it prioritizes the features that are most relevant to the classification task. A complementary advantage of the proposal is that it performs feature selection, since attributes can be discarded when their corresponding weights fall below a given threshold. Our experiments on 42 class-imbalance datasets show the virtues of the proposed SMOTE variant: it achieves the best predictive performance when compared with the traditional SMOTE approach and other recent variants in both low- and high-dimensional settings, and it handles issues such as class overlap and hubness adequately without increasing the complexity of the method. [eng]
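The idea described in the abstract (weighted Minkowski neighborhoods plus weight-threshold feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the feature weights are obtained (the subject list mentions OWA operators), so the sketch simply takes a weight vector `w` as input. It also assumes the weighted Minkowski form (sum_j w_j |a_j - b_j|^p)^(1/p), which is the form used by SciPy's `minkowski` with `w`, and it uses standard SMOTE linear interpolation. The names `weighted_minkowski` and `fw_smote_sketch` and the parameters `k`, `p`, `threshold` and `seed` are hypothetical.

```python
import numpy as np


def weighted_minkowski(a, b, w, p=2.0):
    """Assumed weighted Minkowski distance: (sum_j w_j * |a_j - b_j|**p) ** (1/p)."""
    return float(np.sum(w * np.abs(a - b) ** p) ** (1.0 / p))


def fw_smote_sketch(X_min, w, n_new, k=5, p=2.0, threshold=0.0, seed=0):
    """Hypothetical feature-weighted SMOTE sketch (not the published FW-SMOTE code).

    X_min: minority-class samples, shape (n, d); w: non-negative feature weights, shape (d,).
    Returns (synthetic samples restricted to the kept features, boolean mask of kept features).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    w = np.asarray(w, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")

    # Feature selection: discard attributes whose weight is below the threshold.
    keep = w >= threshold
    Xs, ws = X_min[:, keep], w[keep]

    # Pairwise weighted Minkowski distances among minority samples.
    D = np.array([[weighted_minkowski(a, b, ws, p) for b in Xs] for a in Xs])
    np.fill_diagonal(D, np.inf)  # a sample is never its own neighbor

    # k nearest minority neighbors of each sample under the weighted metric.
    k = min(k, n - 1)
    neighbors = np.argsort(D, axis=1)[:, :k]

    # Standard SMOTE step: interpolate between a random sample and one of its neighbors.
    synthetic = np.empty((n_new, Xs.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)
        nb = neighbors[base, rng.integers(k)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic[i] = Xs[base] + gap * (Xs[nb] - Xs[base])
    return synthetic, keep


# Usage example on toy data with illustrative weights.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(8, 4))
w_toy = np.array([0.9, 0.05, 0.6, 0.3])
S_toy, keep_toy = fw_smote_sketch(X_toy, w_toy, n_new=20, k=3, threshold=0.1, seed=2)
```

With `threshold=0.1`, the second feature (weight 0.05) is dropped, and neighbors are found using only the remaining weighted features. A feature with a small weight therefore barely affects which samples count as "close". A plain Euclidean SMOTE is the special case in which all weights equal 1 and `p=2`.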
dc.identifier: https://investigadores.uandes.cl/en/publications/68e80212-3b69-4c3b-b8c2-7f7d8e6eeb7e
dc.identifier.uri: https://repositorio.uandes.cl/handle/uandes/68909
dc.language: eng
dc.rights: info:eu-repo/semantics/openAccess
dc.source: vol. 124 (2022)
dc.subject: Data resampling
dc.subject: Feature selection
dc.subject: Imbalanced data classification
dc.subject: OWA Operators
dc.subject: SMOTE
dc.title: FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification [eng]
dc.type: Article [eng]