Probabilistic Metrics
Metrics for evaluating the calibration and uncertainty quality of predicted class probabilities.
bensemble.metrics
brier_score
Computes the Brier Score for multi-class classification.
The Brier Score is the mean squared difference between the predicted probability distribution and the one-hot encoded true label. Lower is better.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `probs` | `Tensor` | Predicted probabilities of shape `[Batch, Num_classes]`. | required |
| `targets` | `Tensor` | Ground truth class indices of shape `[Batch]`. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The Brier score. |
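
To make the definition concrete, here is a minimal, dependency-free sketch of the computation on plain Python lists. The name `brier_score_sketch` is hypothetical and is not part of `bensemble`. It assumes the squared differences are summed over classes and then averaged over the batch, which is a common multi-class convention; the library may normalise differently (for example, by also dividing by the number of classes).

```python
def brier_score_sketch(probs, targets):
    """Illustrative Brier score on plain lists (not the bensemble implementation).

    probs:   list of per-sample probability lists, shape [Batch][Num_classes]
    targets: list of integer class indices, shape [Batch]
    """
    total = 0.0
    for row, target in zip(probs, targets):
        # Squared distance between the predicted distribution and the one-hot label.
        # Assumption: sum over classes, then average over the batch.
        total += sum(
            (p - (1.0 if k == target else 0.0)) ** 2 for k, p in enumerate(row)
        )
    return total / len(probs)


probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 2]
score = brier_score_sketch(probs, targets)
```

In this example the first sample is confidently correct and contributes little, while the second puts most of its mass on the wrong class and dominates the average.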
Source code in bensemble/metrics.py
expected_calibration_error
Computes the Expected Calibration Error (ECE).
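Divides the confidence range into `n_bins` bins and, for each bin, takes the absolute difference between the model's accuracy and its average confidence. The ECE is the sum of these differences, each weighted by the fraction of samples in its bin. Lower is better (0.0 means perfectly calibrated).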
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `probs` | `Tensor` | Predicted probabilities of shape `[Batch, Num_classes]`. | required |
| `targets` | `Tensor` | Ground truth class indices of shape `[Batch]`. | required |
| `n_bins` | `int` | Number of bins. | `15` |
Returns:
| Type | Description |
|---|---|
| `float` | The ECE score. |
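
The binning procedure described above can be sketched in plain Python as follows. The name `ece_sketch` is hypothetical and is not part of `bensemble`. The sketch assumes equal-width bins over [0, 1], takes the confidence to be the largest predicted probability, and places a value on a bin boundary in the upper bin; the library's boundary handling may differ.

```python
def ece_sketch(probs, targets, n_bins=15):
    """Illustrative ECE on plain lists (not the bensemble implementation)."""
    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    correct = [0] * n_bins      # number of correct predictions per bin
    count = [0] * n_bins        # number of samples per bin

    for row, target in zip(probs, targets):
        confidence = max(row)               # confidence = top predicted probability
        prediction = row.index(confidence)  # predicted class = argmax
        # Assumption: equal-width bins; confidence 1.0 goes into the last bin.
        b = min(int(confidence * n_bins), n_bins - 1)
        conf_sum[b] += confidence
        correct[b] += int(prediction == target)
        count[b] += 1

    n = len(probs)
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        accuracy = correct[b] / count[b]
        avg_confidence = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - avg_confidence)
    return ece


probs = [[0.95, 0.05], [0.85, 0.15], [0.65, 0.35], [0.25, 0.75]]
targets = [0, 1, 0, 1]
ece = ece_sketch(probs, targets, n_bins=5)
```

With five bins, the two most confident samples share a bin with 50% accuracy at about 0.9 confidence, and the other two share a bin with 100% accuracy at about 0.7 confidence, so both bins contribute a calibration gap.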
Source code in bensemble/metrics.py
negative_log_likelihood
Computes the Negative Log-Likelihood (NLL) for predicted probabilities.
This is a strictly proper scoring rule. Lower is better.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `probs` | `Tensor` | Predicted probabilities of shape `[Batch, Num_classes]`. | required |
| `targets` | `Tensor` | Ground truth class indices of shape `[Batch]`. | required |
| `eps` | `float` | Small value to prevent `log(0)`. | `1e-8` |
Returns:
| Type | Description |
|---|---|
| `float` | The average NLL over the batch. |
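
As a rough illustration, the average NLL is the mean of the negative log of the probability assigned to the true class. The sketch below uses plain Python lists; `nll_sketch` is a hypothetical name and is not part of `bensemble`. It assumes `eps` is added to the probability before taking the log; the library may instead clamp the probability to a minimum of `eps`.

```python
import math


def nll_sketch(probs, targets, eps=1e-8):
    """Illustrative average NLL on plain lists (not the bensemble implementation)."""
    total = 0.0
    for row, target in zip(probs, targets):
        # Assumption: eps is added to the true-class probability to avoid log(0).
        total -= math.log(row[target] + eps)
    return total / len(probs)


# A uniform two-class prediction costs about log(2) per sample.
uniform_nll = nll_sketch([[0.5, 0.5]], [0])

# A zero probability on the true class stays finite thanks to eps.
worst_case_nll = nll_sketch([[0.0, 1.0]], [0])
```

The second call shows why `eps` matters: without it, a single sample with zero probability on the true class would make the whole batch average infinite.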
Source code in bensemble/metrics.py
reliability_diagram
Computes data points needed to plot a Reliability Diagram.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `probs` | `Tensor` | Predicted probabilities of shape `[Batch, Num_classes]`. | required |
| `targets` | `Tensor` | Ground truth class indices of shape `[Batch]`. | required |
| `n_bins` | `int` | Number of bins. | `15` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, list]` | A dictionary of per-bin lists: `'confidences'` (average confidence in each bin), `'accuracies'` (average accuracy in each bin), and `'proportions'` (fraction of samples in each bin). |
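
The shape of the returned dictionary can be illustrated with a small plain-Python sketch. The name `reliability_points_sketch` is hypothetical and is not part of `bensemble`. It assumes equal-width bins over [0, 1] and reports 0.0 for the confidence and accuracy of empty bins; the library may treat bin boundaries and empty bins differently.

```python
def reliability_points_sketch(probs, targets, n_bins=15):
    """Illustrative per-bin statistics on plain lists (not the bensemble implementation)."""
    conf_sum = [0.0] * n_bins
    correct = [0] * n_bins
    count = [0] * n_bins

    for row, target in zip(probs, targets):
        confidence = max(row)
        # Assumption: equal-width bins; confidence 1.0 goes into the last bin.
        b = min(int(confidence * n_bins), n_bins - 1)
        conf_sum[b] += confidence
        correct[b] += int(row.index(confidence) == target)
        count[b] += 1

    n = len(probs)
    return {
        # Assumption: empty bins are reported as 0.0.
        "confidences": [conf_sum[b] / count[b] if count[b] else 0.0 for b in range(n_bins)],
        "accuracies": [correct[b] / count[b] if count[b] else 0.0 for b in range(n_bins)],
        "proportions": [count[b] / n for b in range(n_bins)],
    }


probs = [[0.95, 0.05], [0.85, 0.15], [0.65, 0.35], [0.25, 0.75]]
targets = [0, 1, 0, 1]
points = reliability_points_sketch(probs, targets, n_bins=5)
```

To draw the diagram, plot `accuracies` against `confidences` and compare with the diagonal, where accuracy equals confidence; `proportions` indicates how much weight each bin carries.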