References

This section lists academic references for the methods implemented in SToG.

Primary References

Stochastic Gating (STG)

[Yamada2020]

Yamada, Y., Lindenbaum, O., Negahban, S., & Kluger, Y. (2020). “Feature Selection using Stochastic Gates.” In International Conference on Machine Learning (ICML) (pp. 10648-10659). https://proceedings.mlr.press/v119/yamada20a/yamada20a.pdf

  • Original stochastic gating method

  • Gaussian-based continuous relaxation (see the sketch below)

  • Foundational work for the library
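
A minimal sketch of the Gaussian-based gate from [Yamada2020], assuming PyTorch; the class name and defaults are illustrative, not the SToG API:

import torch
import torch.nn as nn

class StochasticGate(nn.Module):
    """Per-feature Gaussian-based stochastic gate (illustrative sketch)."""

    def __init__(self, n_features, sigma=0.5):
        super().__init__()
        self.mu = nn.Parameter(0.5 * torch.ones(n_features))  # learnable gate means
        self.sigma = sigma  # fixed noise scale

    def forward(self, x):
        # Inject Gaussian noise during training; use the mean at eval time.
        eps = torch.randn_like(self.mu) if self.training else torch.zeros_like(self.mu)
        # Hard clip of the shifted Gaussian: z_d = max(0, min(1, mu_d + sigma * eps_d))
        z = torch.clamp(self.mu + self.sigma * eps, 0.0, 1.0)
        return x * z

    def l0_penalty(self):
        # Expected number of open gates: sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma)
        return torch.distributions.Normal(0.0, 1.0).cdf(self.mu / self.sigma).sum()

The penalty is added to the task loss with a regularization weight, so gates whose means drift below zero switch their features off in expectation.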

Straight-Through Estimator (STE)

[Bengio2013]

Bengio, Y., Léonard, N., & Courville, A. (2013). “Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.” arXiv preprint arXiv:1308.3432. https://arxiv.org/abs/1308.3432

  • Gradient approximation through discrete operations (sketched below)

  • Enables backpropagation through binarization

[Courbariaux2015]

Courbariaux, M., Bengio, Y., & David, J.-P. (2015). “BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations.” In Advances in Neural Information Processing Systems (NIPS) (pp. 3123-3131). https://arxiv.org/abs/1511.00363

  • Application of STE to binarized networks

  • Practical implementation details
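
A minimal sketch of the straight-through trick from [Bengio2013], again in PyTorch (the function name is illustrative): the forward pass applies a hard 0/1 threshold, while the backward pass reuses the gradient of the underlying sigmoid.

import torch

def binary_gate_ste(logits):
    probs = torch.sigmoid(logits)
    hard = (probs > 0.5).float()  # non-differentiable 0/1 decision
    # Forward value equals `hard`; the detach() cancels in the forward pass
    # but routes the gradient through `probs` in the backward pass.
    return hard + probs - probs.detach()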

Gumbel-Softmax

[Jang2017]

Jang, E., Gu, S., & Poole, B. (2017). “Categorical Reparameterization with Gumbel-Softmax.” In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1611.01144

  • Gumbel trick for categorical distribution

  • Temperature-annealed softmax (see the sketch below)

[Maddison2017]

Maddison, C. J., Mnih, A., & Teh, Y. W. (2017). “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables.” In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1611.00712

  • Concrete (Gumbel-Softmax) distribution

  • Continuous relaxation theory
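
A minimal sketch of the sampling procedure from [Jang2017] and [Maddison2017] (the function name is illustrative):

import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    # Gumbel(0, 1) noise via the inverse-CDF trick: -log(-log(U)), U ~ Uniform(0, 1);
    # the clamp guards against log(0).
    u = torch.rand_like(logits).clamp(1e-9, 1 - 1e-9)
    gumbel = -torch.log(-torch.log(u))
    # Temperature-annealed softmax: as tau -> 0 the samples approach
    # one-hot draws from Categorical(softmax(logits)).
    return F.softmax((logits + gumbel) / tau, dim=-1)

PyTorch also ships this as torch.nn.functional.gumbel_softmax; its hard=True variant combines the relaxation with the straight-through trick from the previous subsection.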

Correlated Features

[Yuan2010]

Yuan, M., Lin, Y., & Bi, W. (2010). “Robust Estimation of Structured Sparsity.” The Annals of Statistics, 38(6), 3242-3274. https://doi.org/10.1214/10-AOS815

  • Group sparsity for correlated features (see the sketch below)

  • Structured variable selection

[Jiang2017]

Jiang, B., Wang, S., & Zhu, S. (2017). “Data Poisoning Attacks Against Multi-Party Machine Learning.” In International Conference on Machine Learning (ICML).

  • Handling correlated feature groups
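
A minimal sketch of a classical group-lasso penalty in the spirit of the structured-sparsity work above (the function name and group layout are hypothetical):

import torch

def group_lasso_penalty(w, groups):
    # Sum of scaled L2 norms over feature groups; the sqrt(|g|) factor keeps
    # groups of different sizes on a comparable footing. Whole groups of
    # correlated features are driven to zero together.
    return sum(len(g) ** 0.5 * w[list(g)].norm(p=2) for g in groups)

w = torch.randn(6, requires_grad=True)
penalty = group_lasso_penalty(w, groups=[(0, 1, 2), (3, 4), (5,)])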

Feature Selection Overview

[Guyon2003]

Guyon, I., & Elisseeff, A. (2003). “An Introduction to Variable and Feature Selection.” Journal of Machine Learning Research, 3(Mar), 1157-1182. http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf

  • Comprehensive feature selection survey

  • Classical methods and evaluation

[Kohavi1997]

Kohavi, R., & John, G. H. (1997). “Wrappers for Feature Subset Selection.” Artificial Intelligence, 97(1-2), 273-324. https://doi.org/10.1016/S0004-3702(97)00043-X

  • Wrapper methods for feature selection (example below)

  • Cross-validation approaches
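
For contrast with the gradient-based gates above, a minimal wrapper-style baseline in the spirit of [Kohavi1997], using scikit-learn (dataset, model, and hyperparameters are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
# Greedy forward selection, scoring each candidate subset by 5-fold CV.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=10,
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected features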

Implementation References

PyTorch Documentation

https://pytorch.org/docs/stable/

NumPy Documentation

https://numpy.org/doc/stable/

scikit-learn Documentation

https://scikit-learn.org/stable/

Citing SToG

If you use SToG in your research, please cite:

@software{stog2025,
  title={SToG: Stochastic Gating for Feature Selection},
  author={Eynullayev, A. and Rubtsov, D. and Firsov, S. and Karpeev, G.},
  year={2025},
  url={https://github.com/intsystems/SToG}
}