# References

This section lists academic references for the methods implemented in SToG.

## Primary References
### Stochastic Gating (STG)

Yamada, Y., Lindenbaum, O., Negahban, S., & Kluger, Y. (2020). “Feature Selection using Stochastic Gates.” In International Conference on Machine Learning (ICML) (pp. 10648-10659). https://proceedings.mlr.press/v119/yamada20a/yamada20a.pdf

- Original stochastic gating method
- Gaussian-based continuous relaxation
- Foundational work for the library
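As a hedged sketch (not SToG's actual API), the Gaussian-based relaxation from Yamada et al. (2020) can be written in plain Python; `mus` stands for the learnable gate means and `sigma` for the fixed noise scale, both names chosen here for illustration:

```python
import math
import random

def stochastic_gates(mus, sigma=0.5, seed=0):
    # z_d = max(0, min(1, mu_d + eps_d)) with eps_d ~ N(0, sigma^2):
    # a clipped-Gaussian relaxation of hard Bernoulli feature gates.
    rng = random.Random(seed)
    return [min(1.0, max(0.0, mu + rng.gauss(0.0, sigma))) for mu in mus]

def expected_open_gates(mus, sigma=0.5):
    # P(z_d > 0) = Phi(mu_d / sigma), where Phi is the standard normal
    # CDF; the sum is the sparsity regularizer (expected number of
    # selected features) used in the paper.
    return sum(0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))
               for mu in mus)
```

During training the regularizer is added to the task loss with a weight that trades accuracy against sparsity; at test time gates are typically thresholded to a hard 0/1 selection.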
### Straight-Through Estimator (STE)

Bengio, Y., Léonard, N., & Courville, A. (2013). “Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.” arXiv preprint arXiv:1308.3432. https://arxiv.org/abs/1308.3432

- Gradient approximation through discrete operations
- Enables backpropagation through binarization
Courbariaux, M., Bengio, Y., & David, J.-P. (2015). “BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations.” In Advances in Neural Information Processing Systems (NIPS) (pp. 3123-3131). https://arxiv.org/abs/1511.00363

- Application of the STE to networks with binarized weights
- Practical implementation details
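The STE idea can be illustrated with a minimal sketch (illustrative only, not SToG code): the forward pass applies a hard threshold whose true derivative is zero almost everywhere, while the backward pass treats it as the identity so gradients still flow:

```python
def binarize(x):
    # Forward: hard threshold. Its exact derivative is zero almost
    # everywhere, so plain backprop would stall at this operation.
    return 1.0 if x > 0.0 else 0.0

def ste_grad(upstream, x, clip=1.0):
    # Backward (straight-through): pass the upstream gradient as if the
    # threshold were the identity, zeroing it where |x| falls outside
    # the clipping range (the "hard tanh" variant used in practice).
    return upstream if abs(x) <= clip else 0.0
```

In an autodiff framework this corresponds to overriding the backward rule of the binarization op; the clipping keeps pre-activations from drifting without ever receiving a corrective gradient.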
### Gumbel-Softmax

Jang, E., Gu, S., & Poole, B. (2017). “Categorical Reparameterization with Gumbel-Softmax.” In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1611.01144

- Gumbel trick for categorical distributions
- Temperature-annealed softmax
Maddison, C. J., Mnih, A., & Teh, Y. W. (2017). “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables.” In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1611.00712

- Concrete (Gumbel-Softmax) distribution
- Continuous relaxation theory
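A minimal sketch of the Gumbel-Softmax sample described in these papers (illustrative, plain Python rather than the tensor version used in practice): perturb the logits with Gumbel(0, 1) noise, then apply a temperature-scaled softmax.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, seed=0):
    # Gumbel(0, 1) noise: g = -log(-log(U)), U ~ Uniform(0, 1).
    # As tau -> 0 the sample approaches a one-hot vector; larger tau
    # gives smoother, more uniform samples (temperature annealing
    # typically starts high and decays during training).
    rng = random.Random(seed)
    noisy = [(l - math.log(-math.log(rng.random() or 1e-12))) / tau
             for l in logits]
    m = max(noisy)                       # subtract max for stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the noise is independent of the logits, gradients can flow through the softmax to the logits, which is what makes the relaxation useful for learning discrete gates.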
### Feature Selection Overview

Guyon, I., & Elisseeff, A. (2003). “An Introduction to Variable and Feature Selection.” Journal of Machine Learning Research, 3(Mar), 1157-1182. http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf

- Comprehensive feature selection survey
- Classical methods and evaluation
Kohavi, R., & John, G. H. (1997). “Wrappers for Feature Subset Selection.” Artificial Intelligence, 97(1-2), 273-324. https://doi.org/10.1016/S0004-3702(97)00043-X

- Wrapper methods for feature selection
- Cross-validation approaches
## Implementation References
- PyTorch Documentation
- NumPy Documentation
- scikit-learn Documentation
## Citing SToG
If you use SToG in your research, please cite:
```bibtex
@software{stog2025,
  title  = {SToG: Stochastic Gating for Feature Selection},
  author = {Eynullayev, A. and Rubtsov, D. and Firsov, S. and Karpeev, G.},
  year   = {2025},
  url    = {https://github.com/intsystems/SToG}
}
```