=======
trainer
=======

Training Utilities
==================

.. automodule:: SToG.trainer
   :members:
   :undoc-members:
   :show-inheritance:

FeatureSelectionTrainer
=======================

.. autoclass:: SToG.trainer.FeatureSelectionTrainer
   :members:
   :undoc-members:
   :show-inheritance:
   :special-members: __init__

Overview
--------

The :class:`SToG.trainer.FeatureSelectionTrainer` handles the joint optimization
of a classification model and a feature selector. It implements:

- **Two-optimizer approach** - Separate optimizers for the model and the selector
- **Early stopping** - Validation-based stopping with configurable patience
- **Gradient clipping** - Prevents gradient explosion
- **History tracking** - Records per-epoch metrics for later analysis
- **Model checkpointing** - Saves the best model state

Architecture
~~~~~~~~~~~~

.. code-block:: text

   Input Data
        │
        ├─> Selector (Feature Gates)
        │          │
        │   [Gate Parameters]
        │
        └─> Model (Classifier)
                   │
           [Model Parameters]
                   │
             Output Logits
                   │
   Classification Loss + Regularization Loss
                   │
            ┌──────┴──────┐
            │             │
          Model        Selector
        Optimizer     Optimizer
        (lr=0.001)    (lr=0.01)
            │             │
            └──────┬──────┘
                   │
          Update Parameters

Joint Loss Function
~~~~~~~~~~~~~~~~~~~

The trainer optimizes:

.. math::

   \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}}(\mathbf{f}, \mathbf{g}) + \lambda \Omega(\mathbf{g})

where:

- :math:`\mathcal{L}_{\text{task}}` is the classification loss (``CrossEntropyLoss``)
- :math:`\Omega(\mathbf{g})` is the regularization term contributed by the selector
- :math:`\lambda` controls the sparsity-accuracy trade-off

Two-Optimizer Strategy
~~~~~~~~~~~~~~~~~~~~~~

**Model Optimizer:**

- Lower learning rate (default: 0.001)
- Updates the classification parameters :math:`\mathbf{f}`
- Learns from the task loss

**Selector Optimizer:**

- Higher learning rate (default: 0.01)
- Updates the gate parameters :math:`\mathbf{g}`
- Learns from the task loss plus the regularization loss
- The 10x higher learning rate lets the gates adapt faster than the classifier

Early Stopping
~~~~~~~~~~~~~~

Early stopping monitors the validation loss: training halts once the loss has not
improved for a configurable number of consecutive epochs (the patience), and the
best checkpointed model state is kept.
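
Example: Joint Training Step
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To make the joint loss and the two-optimizer split concrete, here is a minimal
PyTorch sketch of one training step. It is illustrative only and does not
reproduce the internals of :class:`SToG.trainer.FeatureSelectionTrainer`; the
``GateSelector`` module, its ``regularization()`` hook, the choice of Adam, and
the ``max_grad_norm`` value are all assumptions, not part of the SToG API.

.. code-block:: python

   import torch
   import torch.nn as nn

   class GateSelector(nn.Module):
       """Stand-in feature selector: elementwise sigmoid gates (illustrative)."""

       def __init__(self, n_features):
           super().__init__()
           self.gates = nn.Parameter(torch.zeros(n_features))

       def forward(self, x):
           return x * torch.sigmoid(self.gates)   # apply gates g to the features

       def regularization(self):
           # One possible Omega(g): an L1-style penalty pushing gates toward zero.
           return torch.sigmoid(self.gates).sum()

   def train_step(model, selector, batch, model_opt, selector_opt,
                  lam=0.1, max_grad_norm=1.0):
       """One joint update of L_total = L_task + lam * Omega(g)."""
       x, y = batch
       logits = model(selector(x))                          # classifier f on gated features
       task_loss = nn.functional.cross_entropy(logits, y)   # L_task
       total_loss = task_loss + lam * selector.regularization()

       model_opt.zero_grad()
       selector_opt.zero_grad()
       total_loss.backward()   # Omega(g) only touches g, so f sees L_task alone

       # Gradient clipping on both parameter groups prevents gradient explosion.
       torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
       torch.nn.utils.clip_grad_norm_(selector.parameters(), max_grad_norm)

       model_opt.step()
       selector_opt.step()
       return total_loss.item()

   # Two optimizers with the documented learning-rate split (optimizer class assumed).
   model = nn.Linear(20, 3)                 # stand-in classifier f
   selector = GateSelector(20)
   model_opt = torch.optim.Adam(model.parameters(), lr=0.001)
   selector_opt = torch.optim.Adam(selector.parameters(), lr=0.01)

   batch = (torch.randn(8, 20), torch.randint(0, 3, (8,)))
   loss = train_step(model, selector, batch, model_opt, selector_opt)

Because :math:`\Omega(\mathbf{g})` depends only on the gate parameters, the model
optimizer effectively sees just the task loss while the selector optimizer sees
both terms, matching the split described above.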
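
Example: Early Stopping Loop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The early-stopping behaviour described above (validation-based, configurable
patience, best-state checkpointing) follows the standard pattern sketched below.
The ``train_one_epoch`` and ``validate`` callables and the default values are
hypothetical placeholders, not part of the SToG API.

.. code-block:: python

   import copy
   import torch.nn as nn

   def fit(model: nn.Module, train_one_epoch, validate,
           max_epochs=100, patience=10):
       """Train until the validation loss stops improving for `patience` epochs."""
       best_val = float("inf")
       best_state = None
       bad_epochs = 0
       history = {"train_loss": [], "val_loss": []}    # history tracking

       for epoch in range(max_epochs):
           train_loss = train_one_epoch()              # one pass of joint updates
           val_loss = validate()                       # hypothetical validation hook

           history["train_loss"].append(train_loss)
           history["val_loss"].append(val_loss)

           if val_loss < best_val:
               best_val = val_loss
               best_state = copy.deepcopy(model.state_dict())  # checkpoint best model
               bad_epochs = 0
           else:
               bad_epochs += 1
               if bad_epochs >= patience:              # patience exhausted
                   break

       if best_state is not None:
           model.load_state_dict(best_state)           # restore the best checkpoint
       return history

Restoring the best ``state_dict`` at the end and returning the ``history``
dictionary correspond to the model-checkpointing and history-tracking features
listed in the overview.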