Hamish Burke | 2025-04-03
Related to: #bigData
Feature Selection
- Select relevant features
Terms
- Classical
- select m features where
- Idealised
- Find minimally sized feature subset that is necessary and sufficient
- Improve classification accuracy/reduce complexity
- Approximating the original class distribution
- Feature Selection Bias
- Caused by putting the test set through the FS pipeline, so the reported accuracy is optimistically biased
- K-fold cross validation
- When doing FS, put only the training folds through the pipeline, k separate times (so you're never selecting features on the test data)
- Use the average test accuracy as the final performance
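A minimal sketch of keeping feature selection inside each fold, in plain Python. Everything named here is an assumption for illustration and not part of these notes: the helpers `rank_features`, `centroid_accuracy` and `cross_validate`, the class-mean-gap filter, the nearest-centroid classifier, and the synthetic data.

```python
import random
from typing import List


def avg(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def rank_features(X, y, m: int) -> List[int]:
    """Filter step (assumed): keep the m features with the largest class-mean gap."""
    def gap(j: int) -> float:
        pos = avg(row[j] for row, label in zip(X, y) if label == 1)
        neg = avg(row[j] for row, label in zip(X, y) if label == 0)
        return abs(pos - neg)
    return sorted(range(len(X[0])), key=gap, reverse=True)[:m]


def centroid_accuracy(X_tr, y_tr, X_te, y_te, feats) -> float:
    """Nearest-centroid classifier (assumed) using only the selected features."""
    cents = {
        c: [avg(r[j] for r, l in zip(X_tr, y_tr) if l == c) for j in feats]
        for c in (0, 1)
    }

    def predict(row) -> int:
        dist = {c: sum((row[j] - v) ** 2 for j, v in zip(feats, cents[c]))
                for c in (0, 1)}
        return min(dist, key=dist.get)

    return avg(predict(r) == l for r, l in zip(X_te, y_te))


def cross_validate(X, y, k: int, m: int) -> float:
    fold_scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        # Selection only ever sees the training rows of this fold.
        feats = rank_features(X_tr, y_tr, m)
        fold_scores.append(centroid_accuracy(X_tr, y_tr, X_te, y_te, feats))
    return avg(fold_scores)  # average test accuracy over the k folds


# Synthetic data (assumed): feature 0 tracks the label, features 1-3 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[10 * label + rng.uniform(-1, 1)] + [rng.uniform(-1, 1) for _ in range(3)]
     for label in y]
print(cross_validate(X, y, k=5, m=1))
```

The biased version would call `rank_features(X, y, m)` once on all rows before splitting, which lets the test rows influence which features are kept.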
Single Feature Ranking
- Use an algorithm to measure the importance of each feature individually
E.g. for decision trees:
- How often a feature is used for splits can be used as a measure of its importance
- Can plot number of features used vs accuracy
- Find the number of features at which accuracy is at its maximum
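A rough sketch of the ranking-then-curve idea in plain Python. The split counts and the accuracy lookup are made-up stand-ins (in practice the counts would come from a fitted tree and the accuracy from training a classifier); the function names are hypothetical too.

```python
from typing import Callable, List, Sequence


def rank_by_importance(importance: Sequence[float]) -> List[int]:
    """Feature indices ordered from most to least important."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)


def accuracy_curve(ranking: List[int],
                   evaluate: Callable[[List[int]], float]) -> List[float]:
    """Accuracy when using the top-1, top-2, ..., top-n ranked features."""
    return [evaluate(ranking[:k]) for k in range(1, len(ranking) + 1)]


def best_feature_count(curve: List[float]) -> int:
    """Smallest number of features that reaches the peak accuracy."""
    return curve.index(max(curve)) + 1


# Hypothetical numbers: how often each feature was used for a split in a tree,
# and a stand-in for "train and score a classifier on these features".
split_counts = [3, 12, 0, 7]
toy_accuracy = {1: 0.70, 2: 0.85, 3: 0.85, 4: 0.80}

ranking = rank_by_importance(split_counts)
curve = accuracy_curve(ranking, lambda subset: toy_accuracy[len(subset)])
print(ranking, curve, best_feature_count(curve))
```

`curve` is what would be plotted against the number of features; taking the first peak prefers the smaller subset when two sizes tie.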
Types
Filter Approach
- No learning algorithm during feature selection
- So very fast
Wrapper Approach
- Feature subsets are evaluated by training a classifier on them
- Usually better performance than filter
Embedded Approach
- Build a tree
- Look at built tree, select the most used features
- Performance in-between filter and wrapper
Methods
Sequential Forward Selection (SFS)
SFS works best when the optimal subset is small
- Heuristic, Greedy search
- Nesting problem: once a feature is added, it can never be removed
- Start with empty set
- Select the next best feature
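The SFS steps above can be sketched in a few lines of Python. The `criterion` callback and the toy scoring function are assumptions for illustration (any subset-scoring function, e.g. a classifier's validation accuracy, could be plugged in).

```python
from typing import Callable, FrozenSet, List

Criterion = Callable[[FrozenSet[int]], float]


def sfs(n_features: int, criterion: Criterion, m: int) -> List[int]:
    """Greedy forward selection: grow the subset one feature at a time."""
    if not 0 <= m <= n_features:
        raise ValueError("m must be between 0 and n_features")
    selected: List[int] = []  # start with the empty set
    while len(selected) < m:
        remaining = [f for f in range(n_features) if f not in selected]
        # Add the feature whose inclusion gives the highest criterion value.
        best = max(remaining, key=lambda f: criterion(frozenset(selected + [f])))
        selected.append(best)  # never revisited later: the nesting problem
    return selected


# Toy criterion (hypothetical): feature 0 is the best on its own,
# but features 1 and 2 are much stronger together.
WEIGHTS = [9, 8, 7]


def toy_criterion(subset: FrozenSet[int]) -> int:
    bonus = 10 if {1, 2} <= subset else 0
    return sum(WEIGHTS[f] for f in subset) + bonus


print(sfs(3, toy_criterion, 2))  # [0, 1]
```

In this toy setup the best pair is `{1, 2}` (score 25), but SFS commits to feature 0 first and ends on `[0, 1]` (score 17), which is the nesting problem in action.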
Sequential Backward Selection (SBS)
SBS is best when the optimal subset is large
- Start with all features selected
- Nesting problem: once a feature is removed, it can never be re-added
- Iteratively remove the worst feature from the feature subset
- Requires computing the criterion for every subset of size n-1 at the first iteration
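A matching sketch of SBS in Python, under the same assumptions as any subset-scoring setup: the `criterion` callback and the toy scoring function below are hypothetical.

```python
from typing import Callable, FrozenSet, List

Criterion = Callable[[FrozenSet[int]], float]


def sbs(n_features: int, criterion: Criterion, m: int) -> List[int]:
    """Greedy backward selection: shrink the full set one feature at a time."""
    if not 0 <= m <= n_features:
        raise ValueError("m must be between 0 and n_features")
    selected = list(range(n_features))  # start with all features selected
    while len(selected) > m:
        # The "worst" feature is the one whose removal leaves the best subset.
        # On the first pass this scores all n subsets of size n-1.
        worst = max(selected, key=lambda f: criterion(frozenset(selected) - {f}))
        selected.remove(worst)  # never re-added later: the nesting problem
    return selected


# Toy criterion (hypothetical): feature 0 is the best on its own,
# but features 1 and 2 are much stronger together.
WEIGHTS = [9, 8, 7]


def toy_criterion(subset: FrozenSet[int]) -> int:
    bonus = 10 if {1, 2} <= subset else 0
    return sum(WEIGHTS[f] for f in subset) + bonus


print(sbs(3, toy_criterion, 2))  # [1, 2]
print(sbs(3, toy_criterion, 1))  # [1]
```

Here SBS finds the best pair `[1, 2]`, but when asked for a single feature it returns `[1]` even though feature 0 scores higher alone, because feature 0 was discarded in the first step.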
Bi-directional Search (BDS)
- Runs SFS and SBS at the same time; to make both searches converge on the same solution:
- Features already selected by SFS are not removed by SBS
- Features already removed by SBS are not added by SFS
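One possible way to sketch BDS in Python, alternating one SFS step and one SBS step until the two subsets meet. The alternation order, the stopping rule (stop when both sets are equal) and the toy criterion are assumptions, not something these notes specify.

```python
from typing import Callable, FrozenSet, List

Criterion = Callable[[FrozenSet[int]], float]


def bds(n_features: int, criterion: Criterion) -> List[int]:
    """Alternate SFS and SBS steps until both searches hold the same subset."""
    forward = set()                     # SFS side: starts empty
    backward = set(range(n_features))   # SBS side: starts full
    while forward != backward:
        # SFS step: only features not already removed by SBS may be added.
        candidates = sorted(backward - forward)
        best = max(candidates, key=lambda f: criterion(frozenset(forward | {f})))
        forward.add(best)
        if forward == backward:
            break
        # SBS step: only features not already selected by SFS may be removed.
        candidates = sorted(backward - forward)
        worst = max(candidates, key=lambda f: criterion(frozenset(backward - {f})))
        backward.remove(worst)
    return sorted(forward)


# Toy criterion (hypothetical): feature 0 is the best on its own,
# but features 1 and 2 are much stronger together.
WEIGHTS = [9, 8, 7]


def toy_criterion(subset: FrozenSet[int]) -> int:
    bonus = 10 if {1, 2} <= subset else 0
    return sum(WEIGHTS[f] for f in subset) + bonus


print(bds(3, toy_criterion))  # [0, 1]
```

The two restrictions keep `forward` a subset of `backward` at every step, so the gap between them shrinks each iteration and the loop must terminate.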
Plus-L, Minus-R Selection (LRS)
- Generalisation of SFS and SBS
- If L>R, LRS starts from the empty set
- Then adds L features
- And removes R features
- If R>L, LRS starts from the full set
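A hedged Python sketch of Plus-L, Minus-R selection. The notes do not say when to stop, so the stopping rule here (stop as soon as the second phase of a cycle lands on exactly `m` features, then trim or grow greedily if a cycle overshoots) is an assumption, as are the `criterion` callback and the toy scoring function.

```python
from typing import Callable, FrozenSet, List

Criterion = Callable[[FrozenSet[int]], float]


def lrs(n_features: int, criterion: Criterion, m: int, L: int, R: int) -> List[int]:
    if L == R or min(L, R) < 0:
        raise ValueError("L and R must be non-negative and different")
    if not 0 < m < n_features:
        raise ValueError("m must satisfy 0 < m < n_features")

    forward = L > R  # L > R: start empty; R > L: start from the full set
    selected = set() if forward else set(range(n_features))

    def add_best() -> None:  # one SFS step
        candidates = [f for f in range(n_features) if f not in selected]
        if candidates:
            selected.add(max(candidates,
                             key=lambda f: criterion(frozenset(selected | {f}))))

    def remove_worst() -> None:  # one SBS step
        if selected:
            selected.remove(max(sorted(selected),
                                key=lambda f: criterion(frozenset(selected - {f}))))

    if forward:
        while len(selected) < m:
            for _ in range(L):
                add_best()
            for _ in range(R):
                remove_worst()
                if len(selected) == m:
                    break
        while len(selected) > m:  # assumed clean-up if a cycle overshoots m
            remove_worst()
    else:
        while len(selected) > m:
            for _ in range(R):
                remove_worst()
            for _ in range(L):
                add_best()
                if len(selected) == m:
                    break
        while len(selected) < m:  # assumed clean-up if a cycle undershoots m
            add_best()
    return sorted(selected)


# Toy criterion (hypothetical): feature 0 is the best on its own,
# but features 1 and 2 are much stronger together.
WEIGHTS = [9, 8, 7]


def toy_criterion(subset: FrozenSet[int]) -> int:
    bonus = 10 if {1, 2} <= subset else 0
    return sum(WEIGHTS[f] for f in subset) + bonus


print(lrs(3, toy_criterion, m=2, L=2, R=1))  # [1, 2]
print(lrs(3, toy_criterion, m=1, L=1, R=2))  # [0]
```

With `L=1, R=0` this reduces to SFS and with `L=0, R=1` to SBS. On the toy criterion the backtracking lets LRS reach `[1, 2]` for two features and `[0]` for one, which plain SFS and SBS respectively miss because of the nesting problem.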