Usage

You can use our library to tune almost all hyperparameters (see below) in your own code. The HyperOptimizer interface is very similar to PyTorch's Optimizer.

It supports the key methods (a plain-PyTorch reference snippet follows the list):

  1. step to perform an optimization step over the parameters (or hyperparameters, see below)

  2. zero_grad to zero out the parameter gradients (same as optimizer.zero_grad())
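For reference, here is a minimal, self-contained snippet of the standard torch.optim.Optimizer pattern that HyperOptimizer mirrors (this is plain PyTorch, not library code):

```python
import torch
from torch import nn

# The plain torch.optim.Optimizer pattern that HyperOptimizer mirrors.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

loss.backward()        # compute parameter gradients
optimizer.step()       # optimization step over parameters
optimizer.zero_grad()  # zero out the parameter gradients
```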

We provide demo experiments with each implemented method in this notebook. They work as follows (a loop sketch follows the list):

  1. Get the next batch from the train dataloader

  2. Run the forward and backward passes on the computed loss

  3. hyper_optimizer.step(loss) does a model parameter step and, once enough inner steps have accumulated, a hyperparameter step (computes the hypergradients, performs the optimization step, and zeroes the hypergradients)

  4. hyper_optimizer.zero_grad() zeroes the model parameter gradients (same as optimizer.zero_grad())
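Put together, a training loop following steps 1–4 might look like the sketch below. The HyperOptimizer construction, the toy train_loader, and the L2 term are placeholders/assumptions for illustration; only the step(loss) and zero_grad() calls follow the steps above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 1)
l2_coef = torch.tensor(1e-3, requires_grad=True)            # a continuous hyperparameter
inner_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
train_loader = DataLoader(TensorDataset(torch.randn(128, 10), torch.randn(128, 1)), batch_size=32)

# Placeholder construction: the import path and constructor signature of
# HyperOptimizer are assumptions here; consult the library docs for the real ones.
hyper_optimizer = HyperOptimizer(inner_opt, hyperparams=[l2_coef])

for x, y in train_loader:                                    # 1. get the next train batch
    loss = nn.functional.mse_loss(model(x), y)               # 2. forward ...
    loss = loss + l2_coef * sum(p.pow(2).sum() for p in model.parameters())
    loss.backward()                                          #    ... and backward
    hyper_optimizer.step(loss)                               # 3. parameter step (+ hyperstep when due)
    hyper_optimizer.zero_grad()                              # 4. zero the model parameter gradients
```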

Optimizer vs. HyperOptimizer: the step method

Gradient-based hyperparameter optimization interleaves hyper-optimization steps with the optimization of the model parameters. Thus, we combine the Optimizer step method with inner_steps, which each method defines.

For example, T1T2 does NOT use any inner steps, so parameters and hyperparameters are optimized in alternating steps. The Neumann method, in contrast, performs several inner optimization steps over the model parameters before it takes a hyperstep.
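As a rough sketch of that interleaving (illustrative scheduling only, not the library's implementation; it reuses the names from the loop above, and compute_hypergradient is a hypothetical, method-specific helper):

```python
import torch

# Illustrative scheduling only. inner_steps is defined by the chosen method:
# 1 for T1T2 (alternate every step), several for Neumann-style methods.
inner_steps = 5
hyper_opt = torch.optim.Adam([l2_coef], lr=1e-2)        # optimizer over hyperparameters

for t, (x, y) in enumerate(train_loader, start=1):
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    inner_opt.step()                                     # inner step over model parameters
    inner_opt.zero_grad()
    if t % inner_steps == 0:                             # hyperstep every inner_steps iterations
        l2_coef.grad = compute_hypergradient(l2_coef)    # hypothetical helper
        hyper_opt.step()                                 # optimization step over the hyperparameter
        hyper_opt.zero_grad()                            # zero the hypergradient
```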

See more details here.

Supported hyperparameter types

The HyperOptimizer logic is well suited to almost all CONTINUOUS hyperparameter types (continuity is required for gradient-based methods):

  1. Model hyperparameters (e.g., gate coefficients)

  2. Loss hyperparameters (e.g., L1/L2 regularization; both kinds are illustrated in the sketch after this list)
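As an illustration of both kinds, the snippet below defines a gate coefficient inside the model and an L2 coefficient inside the loss as ordinary tensors with requires_grad=True. This is plain PyTorch; how such tensors are then passed to HyperOptimizer is left to the library docs.

```python
import torch
from torch import nn

# (1) A model hyperparameter: a continuous gate mixing two branches.
# (2) A loss hyperparameter: an L2-regularization coefficient.
branch_a, branch_b = nn.Linear(10, 1), nn.Linear(10, 1)
gate = torch.tensor(0.0, requires_grad=True)         # model hyperparameter
l2_coef = torch.tensor(1e-3, requires_grad=True)     # loss hyperparameter

x, y = torch.randn(32, 10), torch.randn(32, 1)
alpha = torch.sigmoid(gate)                           # keep the gate coefficient in (0, 1)
pred = alpha * branch_a(x) + (1 - alpha) * branch_b(x)

params = list(branch_a.parameters()) + list(branch_b.parameters())
loss = nn.functional.mse_loss(pred, y) + l2_coef * sum(p.pow(2).sum() for p in params)
```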

However, it currently does not support learning rate tuning (or supports it, but without sufficient testing). We plan to improve this functionality in future releases, so stay tuned!