decent_bench.metrics.metric_library#

Collection of pre-defined table and plot metrics.

class decent_bench.metrics.metric_library.Regret(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Global regret.

Table:: Global regret using the agents’/clients’ final x.
Plot:: Global regret (y-axis) per iteration (x-axis).

Global regret is defined as:

\[\frac{1}{N} \sum_i (f_i(\mathbf{\bar{x}}) - f_i(\mathbf{x}^\star))\]

where \(f_i\) is agent i’s local cost function, \(\mathbf{\bar{x}}\) is the mean x across all \(N\) agents, and \(\mathbf{x}^\star\) is the optimal x defined in the problem.
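As a rough illustration of the formula above, the following sketch evaluates global regret with plain Python lists. The quadratic local costs, the helper names (local_cost, mean_vector, global_regret), and the sample data are assumptions made for this example; they are not part of decent_bench.

```python
def local_cost(x, c):
    """Hypothetical local cost f_i(x) = 0.5 * ||x - c_i||^2."""
    return 0.5 * sum((xj - cj) ** 2 for xj, cj in zip(x, c))


def mean_vector(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def global_regret(agent_xs, centers, x_star):
    """(1/N) * sum_i (f_i(x_bar) - f_i(x_star)), with x_bar the mean of agent_xs."""
    x_bar = mean_vector(agent_xs)
    gaps = [local_cost(x_bar, c) - local_cost(x_star, c) for c in centers]
    return sum(gaps) / len(centers)


centers = [[0.0, 0.0], [2.0, 0.0]]   # one center per agent (assumed data)
x_star = mean_vector(centers)        # minimizer of the summed quadratics: [1.0, 0.0]
agent_xs = [[1.0, 1.0], [3.0, 1.0]]  # agents' current x, so x_bar = [2.0, 1.0]

regret = global_regret(agent_xs, centers, x_star)
print(regret)  # 1.0
```

Every local cost is evaluated at the same point, the mean x, so the result is a single scalar rather than one value per agent.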

Note

Available only when problem.x_optimal is provided.

description: str = 'regret'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.
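The (available, reason) contract can be sketched in isolation as follows. ToyProblem is a hypothetical stand-in for BenchmarkProblem that models only x_optimal, and the free function is_available is written for illustration; neither reflects the library's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyProblem:
    """Hypothetical stand-in for BenchmarkProblem (only x_optimal is modelled)."""
    x_optimal: Optional[List[float]] = None


def is_available(problem: ToyProblem) -> Tuple[bool, Optional[str]]:
    """Return (True, None) when computable, else (False, human-readable reason)."""
    if problem.x_optimal is None:
        return False, "problem.x_optimal is not provided"
    return True, None


print(is_available(ToyProblem()))                 # (False, 'problem.x_optimal is not provided')
print(is_available(ToyProblem(x_optimal=[0.0])))  # (True, None)
```

Returning a reason string alongside the flag lets a caller report why a metric was skipped instead of failing silently.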

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.GradientNorm(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Global gradient norm.

Table:: Gradient norm using the agents’/clients’ final x.
Plot:: Gradient norm (y-axis) per iteration (x-axis).

Gradient norm is defined as:

\[\| \frac{1}{N} \sum_i \nabla f_i(\mathbf{\bar{x}}) \|\]

where \(N\) is the number of agents, \(f_i\) is agent i’s local cost function, and \(\mathbf{\bar{x}}\) is the mean x across all agents.
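A minimal sketch of this definition, assuming quadratic local costs f_i(x) = 0.5 * ||x - c_i||^2 (so the local gradient is x - c_i) and the Euclidean norm. The helper names and data are hypothetical and not taken from decent_bench.

```python
import math


def local_gradient(x, c):
    """Gradient of the assumed local cost 0.5 * ||x - c||^2, i.e. x - c."""
    return [xj - cj for xj, cj in zip(x, c)]


def mean_vector(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def gradient_norm(agent_xs, centers):
    """Euclidean norm of the average local gradient, evaluated at the mean x."""
    x_bar = mean_vector(agent_xs)
    mean_grad = mean_vector([local_gradient(x_bar, c) for c in centers])
    return math.sqrt(sum(g * g for g in mean_grad))


centers = [[0.0, 0.0], [2.0, 0.0]]   # assumed data, one center per agent
agent_xs = [[1.0, 1.0], [3.0, 1.0]]  # x_bar = [2.0, 1.0]

grad_norm = gradient_norm(agent_xs, centers)  # mean gradient is [1.0, 1.0]
```

The gradients are averaged before the norm is taken, so local gradients that cancel each other at the mean x yield a value of zero.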

description: str = 'gradient norm'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.XError(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Distance to optimal solution.

Table:: Distance to optimal solution using the mean of the agents’/clients’ final x.
Plot:: Distance to optimal solution (y-axis) per iteration (x-axis).

X error is defined as:

\[\|\mathbf{\bar{x}} - \mathbf{x}^\star\|\]

where \(\mathbf{\bar{x}}\) is the mean x across all agents/clients, and \(\mathbf{x}^\star\) is the optimal x defined in the problem.
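The definition above can be sketched with the standard library alone. The Euclidean norm, the helper names, and the sample data are assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def x_error(agent_xs, x_star):
    """Euclidean distance between the mean of the agents' x and the optimum."""
    return math.dist(mean_vector(agent_xs), x_star)


agent_xs = [[0.0, 0.0], [6.0, 8.0]]  # x_bar = [3.0, 4.0]
x_star = [0.0, 0.0]

error = x_error(agent_xs, x_star)
print(error)  # 5.0
```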

Note

Available only when problem.x_optimal is provided.

description: str = 'x error'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.ConsensusError(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Distance to consensus.

Table:: Distance of the agents’/clients’ states from their current average.
Plot:: Distance to consensus (y-axis) per iteration (x-axis).

The consensus error per agent/client is defined as:

\[\{ \|\mathbf{x}_i - \bar{\mathbf{x}}\|, \|\mathbf{x}_j - \bar{\mathbf{x}}\|, \ldots \}\]

where \(\mathbf{x}_i\) is agent/client i’s current state, \(\bar{\mathbf{x}}\) is the average of all agents’/clients’ states, and \(\| \cdot \|\) is the 2-norm.
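Unlike the scalar metrics, this definition yields one value per agent/client. A minimal sketch with hypothetical helper names and sample data:

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def consensus_errors(agent_xs):
    """2-norm distance of each agent's state from the average state."""
    x_bar = mean_vector(agent_xs)
    return [math.dist(x, x_bar) for x in agent_xs]


agent_xs = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]  # average state is [1.0, 1.0]
errors = consensus_errors(agent_xs)               # one distance per agent
```

When all agents hold the same state, every entry of the returned list is zero.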

See also

RuntimeConsensusError for the runtime version.

description: str = 'consensus error'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.XUpdates(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of x iterations/updates.

Table:: Number of x iterations/updates per agent.
Plot:: Number of x iterations/updates (y-axis) per iteration (x-axis). Will be a flat line as the number of x iterations/updates is only calculated at the end of the trial, not per iteration.

description: str = 'nr x updates'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[int][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.FunctionCalls(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of function calls.

Table:: Number of function calls per agent.
Plot:: Number of function calls (y-axis) per iteration (x-axis). Will be a flat line as the number of function calls is only calculated at the end of the trial, not per iteration.

Note

The count can be a floating-point number when EmpiricalRiskCost is used with a batch size other than the full dataset size.
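One way a call count could become fractional is if each mini-batch evaluation is credited as batch_size / dataset_size of a full call. This accounting rule is an assumption made purely for illustration; the source does not state how decent_bench counts batched evaluations.

```python
def fractional_call_count(batch_sizes, dataset_size):
    """Sum of per-call fractions under the ASSUMED rule batch_size / dataset_size."""
    return sum(batch / dataset_size for batch in batch_sizes)


# Three evaluations on batches of 25 samples from a 100-sample dataset.
calls = fractional_call_count([25, 25, 25], dataset_size=100)
print(calls)  # 0.75
```

Under this assumed rule, full-dataset evaluations each contribute exactly 1, which recovers an integer count.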

description: str = 'nr function calls'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.GradientCalls(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of gradient calls.

Table:: Number of gradient calls per agent.
Plot:: Number of gradient calls (y-axis) per iteration (x-axis). Will be a flat line as the number of gradient calls is only calculated at the end of the trial, not per iteration.

Note

The count can be a floating-point number when EmpiricalRiskCost is used with a batch size other than the full dataset size.

description: str = 'nr gradient calls'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.HessianCalls(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of Hessian calls.

Table:: Number of Hessian calls per agent.
Plot:: Number of Hessian calls (y-axis) per iteration (x-axis). Will be a flat line as the number of Hessian calls is only calculated at the end of the trial, not per iteration.

Note

The count can be a floating-point number when EmpiricalRiskCost is used with a batch size other than the full dataset size.

description: str = 'nr Hessian calls'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.ProximalCalls(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of proximal calls.

Table:: Number of proximal calls per agent.
Plot:: Number of proximal calls (y-axis) per iteration (x-axis). Will be a flat line as the number of proximal calls is only calculated at the end of the trial, not per iteration.

description: str = 'nr proximal calls'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.SentMessages(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of sent messages.

Table:: Number of sent messages per agent. For federated networks, this includes the server.
Plot:: Number of sent messages (y-axis) per iteration (x-axis). Will be a flat line as the number of sent messages is calculated at the end of the trial, not per iteration.

description: str = 'nr sent messages'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.ReceivedMessages(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of received messages.

Table:: Number of received messages per agent. For federated networks, this includes the server.
Plot:: Number of received messages (y-axis) per iteration (x-axis). Will be a flat line as the number of received messages is calculated at the end of the trial, not per iteration.

description: str = 'nr received messages'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.SentMessagesDropped(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Number of sent messages dropped.

Table:: Number of sent messages dropped per agent. For federated networks, this includes the server.
Plot:: Number of sent messages dropped (y-axis) per iteration (x-axis). Will be a flat line as the number of sent messages dropped is calculated at the end of the trial, not per iteration.

description: str = 'nr sent messages dropped'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.Accuracy(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Accuracy of the agents’/clients’ predictions.

Table:

Accuracy of the agents’/clients’ final x.

Plot:

Accuracy (y-axis) per iteration (x-axis).

Accuracy is calculated as the mean accuracy across agents/clients, where each agent’s/client’s accuracy is calculated using its recorded x at that iteration.

Only available for EmpiricalRiskCost and integer targets.

Accuracy measures the proportion of correct predictions:

\[\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}\]

where TP, TN, FP, and FN are true positives, true negatives, false positives, and false negatives, respectively.
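The following sketch computes accuracy per agent and then averages across agents, as described above. It assumes each agent's predictions are already available as a list of integer labels; the helper name and data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [0, 1, 1, 0]
per_agent_predictions = [
    [0, 1, 0, 0],  # agent 0: 3 of 4 correct
    [0, 1, 1, 0],  # agent 1: 4 of 4 correct
]

per_agent_accuracy = [accuracy(y_true, pred) for pred in per_agent_predictions]
mean_accuracy = sum(per_agent_accuracy) / len(per_agent_accuracy)
print(per_agent_accuracy, mean_accuracy)  # [0.75, 1.0] 0.875
```

Counting matching labels is equivalent to (TP + TN) divided by the total number of samples.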

Note

Available only when:

problem.test_data is provided,
all agent costs are EmpiricalRiskCost,
target labels are integer-valued.

description: str = 'accuracy'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.MSE(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Mean squared error of the agents’/clients’ predictions.

Table:

Mean squared error of the agents’/clients’ final x.

Plot:

Mean squared error (MSE) (y-axis) per iteration (x-axis).

MSE is calculated as the mean MSE across agents/clients, where each agent’s/client’s MSE is calculated using its recorded x at that iteration.

Only available for EmpiricalRiskCost.

MSE measures the average squared difference between predictions and true values:

\[\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2\]

where \(\hat{y}_i\) are the predicted values, \(y_i\) are the true values, and \(n\) is the number of samples.
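A minimal sketch of the formula, followed by the across-agent averaging described above. It assumes each agent's predictions are given directly as lists of floats; the helper name and data are hypothetical.

```python
def mse(y_true, y_pred):
    """Average squared difference between predictions and true values."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0]
per_agent_predictions = [
    [1.0, 2.0, 5.0],  # agent 0: squared errors 0, 0, 4
    [0.0, 2.0, 3.0],  # agent 1: squared errors 1, 0, 0
]

per_agent_mse = [mse(y_true, pred) for pred in per_agent_predictions]
mean_mse = sum(per_agent_mse) / len(per_agent_mse)
```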

Note

Available only when problem.test_data is provided and all agent costs are EmpiricalRiskCost.

description: str = 'mse'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.Precision(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Precision of the agents’/clients’ predictions.

Table:

Precision of the agents’/clients’ final x.

Plot:

Precision (y-axis) per iteration (x-axis).

Precision is calculated as the mean precision across agents/clients, where each agent’s/client’s precision is calculated using its recorded x at that iteration.

Only available for EmpiricalRiskCost and integer targets.

Precision measures the proportion of positive predictions that are correct:

\[\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}\]

where TP is the number of true positives and FP is the number of false positives.
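For the binary case the formula can be sketched as below. Treating label 1 as the positive class and returning 0.0 when there are no positive predictions are both assumptions of this example; the source does not say how multi-class targets or empty denominators are handled.

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP) for one positive class; 0.0 if nothing was predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumed convention for an undefined ratio
    return tp / (tp + fp)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]  # TP = 2, FP = 1

value = precision(y_true, y_pred)
```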

Note

Available only when:

problem.test_data is provided,
all agent costs are EmpiricalRiskCost,
target labels are integer-valued.

description: str = 'precision'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.Recall(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Recall of the agents’/clients’ predictions.

Table:

Recall of the agents’/clients’ final x.

Plot:

Recall (y-axis) per iteration (x-axis).

Recall is calculated as the mean recall across agents/clients, where each agent’s/client’s recall is calculated using its recorded x at that iteration.

Only available for EmpiricalRiskCost and integer targets.

Recall measures the proportion of actual positives that are correctly identified:

\[\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}\]

where TP is the number of true positives and FN is the number of false negatives.
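A binary-case sketch of the formula. Treating label 1 as the positive class and returning 0.0 when there are no actual positives are assumptions of this example, not documented library behaviour.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN) for one positive class; 0.0 if there are no actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        return 0.0  # assumed convention for an undefined ratio
    return tp / (tp + fn)


y_true = [1, 1, 1, 0]
y_pred = [1, 0, 0, 0]  # TP = 1, FN = 2

value = recall(y_true, y_pred)
```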

Note

Available only when:

problem.test_data is provided,
all agent costs are EmpiricalRiskCost,
target labels are integer-valued.

description: str = 'recall'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.Loss(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Loss of the agents’/clients’ predictions.

Table:

Loss of the agents’/clients’ final x.

Plot:

Loss (y-axis) per iteration (x-axis).

Loss is calculated as the mean loss across agents/clients, where each agent’s/client’s loss is calculated using its recorded x at that iteration.

description: str = 'loss'#

compute(network: NetworkMetricsView, _: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.ClientDriftFromServer(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Distance between client local models and the server model.

Table:: Distance of the clients’ final states from the final server state.
Plot:: Client drift from server (y-axis) per iteration (x-axis).

The client drift per client is defined as:

\[\{ \|\mathbf{x}_i - \mathbf{x}_s\|, \|\mathbf{x}_j - \mathbf{x}_s\|, \ldots \}\]

where \(\mathbf{x}_s\) is the current server state.
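A minimal sketch of the per-client drift, assuming the Euclidean norm and plain lists for the client and server states. The helper name and data are hypothetical.

```python
import math


def client_drift(client_xs, server_x):
    """Distance of each client's state from the server state, one value per client."""
    return [math.dist(x, server_x) for x in client_xs]


server_x = [0.0, 0.0]
client_xs = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]

drift = client_drift(client_xs, server_x)
print(drift)  # [1.0, 2.0, 5.0]
```

The reference point here is the server state rather than the clients' average, which is what distinguishes this quantity from the consensus error.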

Note

Available only for FedNetwork.

description: str = 'client drift from server'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.FractionSelectedClients(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Fraction of clients selected by the federated algorithm to perform local training.

Table:: Fraction of selected clients over the algorithm run.

Note

Available only for FedNetwork.

description: str = 'fraction selected clients'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, _: BenchmarkProblem, __: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.ServerMSE(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Mean squared error of the server model’s predictions.

Table:: Mean squared error of the final server x.
Plot:: Server MSE (y-axis) per iteration (x-axis).

Note

Available only for FedNetwork with problem.test_data and empirical-risk client costs.

description: str = 'server mse'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values

class decent_bench.metrics.metric_library.ServerAccuracy(fmt: str = '.2e', x_log: bool = False, y_log: bool = True)[source]#

Bases: Metric

Accuracy of the server model’s predictions.

Table:: Accuracy of the final server x.
Plot:: Server accuracy (y-axis) per iteration (x-axis).

Note

Available only for FedNetwork with problem.test_data, empirical-risk client costs, and integer-valued targets.

description: str = 'server accuracy'#

is_available(problem: BenchmarkProblem) → tuple[bool, str | None][source]#

Check whether this metric can be computed for the given problem.

Override in subclasses that have availability preconditions (e.g. requiring problem.x_optimal or problem.test_data). The default implementation always reports the metric as available.

Parameters:: problem – the benchmark problem being evaluated
Returns:: A tuple (available, reason). When available is True, reason is None. When available is False, reason contains a human-readable explanation.

compute(network: NetworkMetricsView, problem: BenchmarkProblem, iteration: int) → list[float][source]#

Evaluate the metric on the results of a trial.

Parameters:

network – the snapshotted network view being evaluated.
problem – the benchmark problem being evaluated
iteration – the iteration at which to compute the metric, or -1 to use the agents’ final x

Returns:

a sequence of metric values