main

Main Execution Script

Main execution script for benchmarking.

SToG.main.main()[source]: Main execution function.

Overview

The SToG.main module provides the entry point for running complete feature selection benchmarks with all implemented methods.

Main Function

SToG.main.main()[source]: Main execution function.

The SToG.main.main() function:

Loads all available benchmark datasets
Initializes the comprehensive benchmark
Runs feature selection with all methods
Compares results with scikit-learn L1 regularization
Prints summary statistics

Running from Command Line

Execute the entire benchmarking pipeline:

python -m SToG.main

Or from Python:

from SToG.main import main
main()

What Gets Executed

Load Datasets - Breast Cancer (UCI dataset) - Wine (UCI dataset) - Synthetic High-Dimensional (MADELON-like) - Synthetic Correlated (custom)
Run Benchmarks - Tests all 5 methods (STG, STE, Gumbel, CorrelatedSTG, L1) - Searches optimal lambda for each method - Reports accuracy and sparsity
Compare with Scikit-learn - Tests LogisticRegression with L1 penalty - Compares feature selection results
Print Summary - Tabular results for each dataset - Method rankings by accuracy - Recommendations for different use cases

Output Example

======================================================================
                     Breast Cancer Dataset
======================================================================

STG        | Accuracy: 95.67% | Selected:  8/30 (73.3% sparse)
STE        | Accuracy: 95.08% | Selected: 10/30 (66.7% sparse)
Gumbel     | Accuracy: 96.04% | Selected:  9/30 (70.0% sparse)
Correlated | Accuracy: 96.04% | Selected:  9/30 (70.0% sparse)
L1         | Accuracy: 94.29% | Selected: 12/30 (60.0% sparse)
sklearn L1 | Accuracy: 94.50% | Selected: 14/30 (53.3% sparse)

======================================================================
                 Synthetic High-Dimensional Dataset
======================================================================

STG        | Accuracy: 98.33% | Selected:  7/100 (93.0% sparse)
STE        | Accuracy: 97.50% | Selected:  8/100 (92.0% sparse)
Gumbel     | Accuracy: 98.33% | Selected:  7/100 (93.0% sparse)
Correlated | Accuracy: 98.33% | Selected:  6/100 (94.0% sparse)
L1         | Accuracy: 96.67% | Selected: 12/100 (88.0% sparse)
sklearn L1 | Accuracy: 96.00% | Selected: 15/100 (85.0% sparse)

Customizing Benchmarks

To modify benchmarking behavior, edit or extend the main function:

from SToG import ComprehensiveBenchmark, DatasetLoader

def custom_benchmark():
    """Custom benchmarking with specific settings."""
    loader = DatasetLoader()

    # Select specific datasets
    datasets = [
        loader.create_synthetic_high_dim(),
        loader.create_synthetic_correlated(),
    ]

    # Run with GPU if available
    import torch
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    benchmark = ComprehensiveBenchmark(device=device)
    benchmark.run_benchmark(datasets)

if __name__ == '__main__':
    custom_benchmark()

Performance Tips

Use GPU for faster training: device='cuda'
Reduce epochs for quick testing: modify trainer defaults
Subset datasets for quick validation: load only 1-2 datasets
Parallel processing would require modifying benchmark class