Practical Variational Inference
Variational inference approximates the true posterior over the weights \(w\) with a diagonal Gaussian \(q_\theta(w)\). The objective is to minimize the variational free energy (the negative of the Evidence Lower Bound, ELBO):

\[
\mathcal{F}(\theta) = \mathrm{KL}\big(q_\theta(w)\,\|\,p(w)\big) - \mathbb{E}_{q_\theta(w)}\big[\log p(\mathcal{D} \mid w)\big].
\]
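With a standard-normal prior \(p(w) = \mathcal{N}(0, I)\) and a diagonal Gaussian posterior, the KL term has a closed form. A minimal sketch (the function name and log-variance parameterization are illustrative choices, not from the papers):

```python
import numpy as np

def kl_diag_gaussian_std_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Closed form per dimension: 0.5 * (exp(log_var) + mu^2 - 1 - log_var),
    summed over all weight dimensions.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

When the posterior equals the prior (zero means, unit variances), the KL term vanishes, which is a useful sanity check for an implementation.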
To optimize the expected log-likelihood term efficiently, one can use the Local Reparameterization Trick (LRT). Instead of sampling the weights directly, which introduces high variance in the gradient estimates, LRT samples the pre-activations.
For a linear layer with input \(X\), weight means \(M\), and weight variances \(V\), each entry of the pre-activation matrix \(\Gamma\) is Gaussian:

\[
\Gamma \sim \mathcal{N}\big(XM^\top,\; X^{2} V^\top\big) \quad \text{(elementwise)},
\]

where \(X^{2}\) denotes the elementwise square. We therefore sample \(\Gamma = XM^\top + \varepsilon \odot \sqrt{X^{2} V^\top}\), where \(\varepsilon \sim \mathcal{N}(0, I)\), allowing stable, low-variance backpropagation.
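The sampling step above can be sketched as a forward pass (a minimal NumPy version; the function name and the convention that \(M\) and \(V\) have shape `(out_features, in_features)` are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def lrt_linear(X, M, V):
    """Local Reparameterization Trick for a linear layer.

    X: (batch, in_features) inputs
    M: (out_features, in_features) weight means
    V: (out_features, in_features) weight variances
    Returns one sample of the pre-activations, shape (batch, out_features).
    """
    mean = X @ M.T            # pre-activation means: X M^T
    var = (X**2) @ V.T        # pre-activation variances: X^2 V^T (elementwise squares)
    eps = rng.standard_normal(mean.shape)
    return mean + eps * np.sqrt(var)
```

Because the noise is drawn per pre-activation rather than per weight, each example in the batch effectively sees an independent weight sample, which is what reduces the gradient variance relative to sampling one weight matrix for the whole batch.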
References

Alex Graves, "Practical Variational Inference for Neural Networks" (2011).
Diederik P. Kingma, Tim Salimans, and Max Welling, "Variational Dropout and the Local Reparameterization Trick" (2015).