NES via Bayesian Sampling

To reduce the prohibitive computational cost of standard NES, one can use Neural Ensemble Search via Bayesian Sampling.

It utilizes training a Supernet with uniform path sampling to share weights across different model architectures. A variational posterior over architectures \(p_\alpha(\mathcal{A}) \approx p(\mathcal{A}|\mathcal{D})\) is learned via ELBO minimization.

Ensemble member architectures can then be sampled from the variational posterior using two methods:

Monte-Carlo Sampling: Simple random sampling from the posterior.
SVGD-RD: Stein Variational Gradient Descent with Regularized Diversity. This uses controlled optimization of the set of architectures with the following objective:

\[ q^* = \arg\min_{q\in\mathcal{Q}} \text{KL}(q\|p) + n\delta\mathbb{E}_{x, x' \sim q}[k(x, x')] \]

This repulsive force mathematically ensures that the sampled architectures are highly diverse.

Yao Shu et al. "Neural Ensemble Search via Bayesian Sampling" (2022)