decent_bench.benchmark

decent_bench.benchmark.benchmark(algorithms: list[Algorithm[Network]], benchmark_problem: BenchmarkProblem, *, n_trials: int = 30, max_processes: int | None = 1, progress_step: int | None = 100, show_speed: bool = False, show_trial: bool = False, checkpoint_manager: CheckpointManager | None = None, runtime_metrics: list[RuntimeMetric] | None = None, log_level: int = logging.INFO) → BenchmarkResult

Benchmark decentralized algorithms.

Parameters:

algorithms – algorithms to benchmark
benchmark_problem – problem to benchmark on; it defines the network topology, cost functions, and communication constraints.
n_trials – number of times to run each algorithm on the benchmark problem. Running more trials improves the statistical results; at least 30 trials are recommended for the central limit theorem to apply.
max_processes – maximum number of processes to use when running trials. Multiprocessing improves performance but can get in the way when debugging or using a profiler. Set to 1 to disable multiprocessing, or to None to use ProcessPoolExecutor’s default. If your algorithm is very lightweight, you may want to set this to 1 to avoid the multiprocessing overhead.
progress_step – if provided, the progress bar will step every progress_step iterations. When provided, each algorithm’s task total becomes n_trials * ceil(algorithm.iterations / progress_step). If None, the progress bar uses 1 unit per trial.
show_speed – whether to show speed (iterations/second) in the progress bar.
show_trial – whether to show which trials are currently running in the progress bar.
checkpoint_manager – if provided, will be used to save checkpoints during execution.
runtime_metrics – optional list of RuntimeMetric to compute and plot during algorithm execution. Each metric will open a plot window for each trial showing live updates. Useful for early stopping if convergence is not happening. Disabled by default. When using multiprocessing (max_processes > 1), each trial will open its own plot windows in separate processes.
log_level – minimum level to log, e.g. logging.INFO.

Returns:

BenchmarkResult containing the results of the benchmark.

Important

Multiprocessing with certain frameworks (e.g., PyTorch) can lead to unexpected behavior because of how those frameworks handle multiprocessing, so it is best avoided when benchmarking algorithms that use them. If you do combine the two, make sure you understand the potential issues and have taken appropriate measures. Decent-Bench attempts to detect whether any cost function is a PyTorchCost and warns the user accordingly. Multiprocessing is mostly intended for NumPy-based implementations; to check whether it works with your specific algorithm and setup, set max_processes to a value other than 1. See the description of max_processes for the available options.

Note

If progress_step is too small, performance may degrade because the progress bar is updated too often.
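
As a rough illustration of the task total described for progress_step, the sketch below evaluates n_trials * ceil(algorithm.iterations / progress_step) for a few settings. The helper progress_total is hypothetical and not part of decent_bench; it only mirrors the formula stated above.

```python
import math
from typing import Optional


def progress_total(n_trials: int, iterations: int, progress_step: Optional[int]) -> int:
    """Progress-bar units for one algorithm (hypothetical helper, not library code)."""
    if progress_step is None:
        # One unit per trial when no step size is given.
        return n_trials
    return n_trials * math.ceil(iterations / progress_step)


coarse = progress_total(n_trials=30, iterations=1000, progress_step=100)
fine = progress_total(n_trials=30, iterations=1000, progress_step=1)
per_trial = progress_total(n_trials=30, iterations=1000, progress_step=None)
print(coarse, fine, per_trial)
```

With progress_step=1 the bar would be updated a hundred times more often than with the default of 100, which is the overhead the note warns about.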

Raises:

ValueError – If the checkpoint directory is not empty when initializing the CheckpointManager.
ValueError – If any two algorithms share the same name.

decent_bench.benchmark.resume_benchmark(checkpoint_manager: CheckpointManager, increase_iterations: int = 0, increase_trials: int = 0, create_backup: bool = True, *, max_processes: int | None = 1, progress_step: int | None = 100, show_speed: bool = False, show_trial: bool = False, runtime_metrics: list[RuntimeMetric] | None = None, log_level: int = logging.INFO) → BenchmarkResult

Resume a benchmark from an existing checkpoint directory.

Parameters:

checkpoint_manager – CheckpointManager instance to load checkpoints from. Must contain valid checkpoints and metadata from a previous benchmark run. Progress will be loaded from the latest checkpoints and the benchmark will resume from there.
increase_iterations – number of iterations to add to each algorithm’s existing iteration count. This allows you to extend the benchmark run and collect more data points for the metrics. The additional iterations will be added on top of the existing iterations defined in the checkpoint metadata for each algorithm. If set to 0 (default), the benchmark will resume with the same number of iterations as defined in the checkpoint.
increase_trials – number of additional trials to run for each algorithm. This allows you to increase the statistical significance of the benchmark results by collecting more trials. If set to 0 (default), the benchmark will resume with the same number of trials as defined in the checkpoint.
create_backup – whether to create a backup of the existing checkpoint directory before resuming. It is recommended to set this to True to avoid accidental data loss, as resuming will modify the checkpoint directory by adding new checkpoints and metadata. If True, a backup will be created with the name {checkpoint_manager.checkpoint_dir}_backup_{timestamp}.zip before resuming.
max_processes – maximum number of processes to use when running trials. Multiprocessing improves performance but can get in the way when debugging or using a profiler. Set to 1 to disable multiprocessing, or to None to use ProcessPoolExecutor’s default. If your algorithm is very lightweight, you may want to set this to 1 to avoid the multiprocessing overhead.
progress_step – if provided, the progress bar will step every progress_step iterations. When provided, each algorithm’s task total becomes n_trials * ceil(algorithm.iterations / progress_step). If None, the progress bar uses 1 unit per trial.
show_speed – whether to show speed (iterations/second) in the progress bar.
show_trial – whether to show which trials are currently running in the progress bar.
runtime_metrics – optional list of RuntimeMetric to compute and plot during algorithm execution. Each metric will open a plot window for each trial showing live updates. Useful for early stopping if convergence is not happening. Disabled by default. When using multiprocessing (max_processes > 1), each trial will open its own plot windows in separate processes.
log_level – minimum level to log, e.g. logging.INFO

Returns:

BenchmarkResult containing the results of the benchmark.

Important

Note

If progress_step is too small, performance may degrade because the progress bar is updated too often.

Raises:

ValueError – If the checkpoint directory does not exist, is empty, or contains invalid metadata.
ValueError – If increase_iterations or increase_trials is negative.
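
The sketch below illustrates two behaviors described above: rejecting negative increments and writing a backup archive named {checkpoint_dir}_backup_{timestamp}.zip. It is not the library's implementation. The helper names, the timestamp format, and the placeholder file metadata.json are all assumptions made for the example.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def validate_increase(increase_iterations: int = 0, increase_trials: int = 0) -> None:
    """Reject negative increments (hypothetical helper)."""
    if increase_iterations < 0 or increase_trials < 0:
        raise ValueError("increase_iterations and increase_trials must be non-negative")


def backup_checkpoint_dir(checkpoint_dir: Path) -> Path:
    """Zip the directory next to itself as {checkpoint_dir}_backup_{timestamp}.zip."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    base = checkpoint_dir.parent / f"{checkpoint_dir.name}_backup_{timestamp}"
    # make_archive appends ".zip" to the base name and returns the archive path.
    archive = shutil.make_archive(str(base), "zip", root_dir=str(checkpoint_dir))
    return Path(archive)


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoints"
    ckpt.mkdir()
    (ckpt / "metadata.json").write_text("{}")  # placeholder content, name is assumed
    validate_increase(increase_iterations=500, increase_trials=10)
    backup = backup_checkpoint_dir(ckpt)
    backup_exists = backup.exists()
    backup_name = backup.name

print(backup_name)
```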

decent_bench.benchmark.compute_metrics(benchmark_result: BenchmarkResult | None = None, checkpoint_manager: CheckpointManager | None = None, *, table_metrics: list[Metric] | None = None, plot_metrics: list[Metric] | None = None, statistics_across_agents: list[str] | None = None, log_level: int = logging.INFO) → MetricResult

Compute metrics from a benchmark result.

Parameters:

benchmark_result – result of a benchmark execution. If not provided, the result will be loaded from the checkpoint manager
checkpoint_manager – if provided, will be used to save results of metrics computation and/or load benchmark result.
table_metrics – metrics to be displayed in a table of results. Table metrics are computed only at the most recent iteration reached during benchmarking. If None, all table metrics available for the benchmark problem will be used. For example, federated-only metrics are removed when a non-federated network is passed.
plot_metrics – metrics to be plotted over algorithm iterations. Plot metrics are computed at all the iterations reached during benchmarking. If None, all plot metrics available for the benchmark problem will be used.
statistics_across_agents – statistics to compute across agents for metrics that return one value per agent (like ConsensusError or Accuracy). Available statistics are “mean” (aliases “average”, “avg”), “std”, “max” (alias “maximum”), “min” (alias “minimum”), and “median” (alias “mdn”). If None, “mean” and “std” are used.
log_level – minimum level to log, e.g. logging.INFO

Returns:

MetricResult containing the computed metrics.

Raises:

ValueError – If neither benchmark_result nor checkpoint_manager is provided, or if the checkpoint manager does not contain a valid benchmark result to load.
ValueError – If duplicate metrics (i.e. with same description) are provided in table_metrics or plot_metrics.

Note

If benchmark_result is not provided, it will be loaded from the checkpoint manager. If both are provided, then the results from the provided benchmark_result will be used and the checkpoint manager will only be used to save the computed metrics result. If neither is provided, an error will be raised.

All used table- and plot-metrics will be saved to the checkpoints’ metadata if a checkpoint manager is provided, in order to know which metrics were computed and can be displayed later.

Metrics that return False from is_available() for the given problem are filtered out from the returned metric lists. Warnings are emitted with the omitted metric names.

Plot metrics can still be available even when their final table value is inf/nan: plot computation keeps the finite part of a trajectory, while table metrics are evaluated at the final iteration.
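
To make the statistics_across_agents aliases concrete, here is a minimal sketch that maps each accepted name to a canonical statistic and applies it to one value per agent. The alias table follows the parameter description above. The function across_agents, the use of the population standard deviation, and the ValueError for unknown names are assumptions, not documented behavior.

```python
import statistics

# Alias table taken from the statistics_across_agents description.
ALIASES = {
    "mean": "mean", "average": "mean", "avg": "mean",
    "std": "std",
    "max": "max", "maximum": "max",
    "min": "min", "minimum": "min",
    "median": "median", "mdn": "median",
}

REDUCERS = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,  # assumption: population standard deviation
    "max": max,
    "min": min,
    "median": statistics.median,
}


def across_agents(values, statistics_across_agents=None):
    """Reduce one value per agent to the requested statistics (hypothetical helper)."""
    names = ["mean", "std"] if statistics_across_agents is None else statistics_across_agents
    result = {}
    for name in names:
        if name not in ALIASES:
            raise ValueError(f"unknown statistic: {name!r}")  # assumed error handling
        canonical = ALIASES[name]
        result[canonical] = REDUCERS[canonical](values)
    return result


per_agent = [0.1, 0.4, 0.2, 0.3]  # e.g. one metric value per agent
default_stats = across_agents(per_agent)
aliased = across_agents(per_agent, ["avg", "maximum", "mdn"])
print(default_stats, aliased)
```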

decent_bench.benchmark.display_metrics(metrics_result: MetricResult | None = None, checkpoint_manager: CheckpointManager | None = None, *, table_metrics: list[Metric | str] | None = None, plot_metrics: list[Metric | str] | None = None, algorithms: list[Algorithm[Network] | str] | None = None, table_fmt: Literal['text', 'latex'] = 'text', plot_grid: bool = True, individual_plots: bool = False, computational_cost: ComputationalCost | None = None, scale_x_axis: float = 1e-4, scale_compute: float = 1.0, compare_iterations_and_computational_cost: bool = False, save_path: str | Path | None = None, plot_format: Literal['png', 'pdf', 'svg'] = 'png', show_plots: bool = True, log_level: int = logging.INFO) → None

Display the results of metrics computation.

Parameters:

metrics_result – result of metrics computation containing the metrics to display. If not provided, the result will be loaded from the checkpoint manager
checkpoint_manager – if provided, will be used to load metrics result.
table_metrics – metrics to tabulate. Entries can be Metric objects or strings (matching description). If None all table metrics in metrics_result are displayed.
plot_metrics – metrics to plot. Entries can be Metric objects or strings (matching description). If None all plot metrics in metrics_result are displayed. If individual_plots is True, each metric is plotted in its own figure; otherwise a maximum of 3 metrics are plotted as subplots in the same figure.
algorithms – algorithms to display. Entries can be Algorithm objects or strings (matching name). If None all algorithms in metrics_result are displayed.
table_fmt – table format; text is suitable for the terminal, while latex can be copy-pasted into a LaTeX document.
plot_grid – whether to show grid lines on the plots
individual_plots – whether to plot each metric in a separate figure
computational_cost – computational cost settings for plot metrics, if None x-axis will be iterations instead of computational cost
scale_x_axis – scaling factor for computational cost x-axis, used to convert the cost units into more manageable units for plotting. Only used if computational_cost is provided.
scale_compute – scaling factor for the compute-related metrics (i.e. FunctionCalls, GradientCalls, HessianCalls and ProximalCalls) shown in the table, used to convert the raw count into more manageable units for display.
compare_iterations_and_computational_cost – whether to plot both metric vs computational cost and metric vs iterations. Only used if computational_cost is provided.
save_path – optional directory path to save the tables and plots to. Tables will be saved as table.txt and table.tex, while plots will be saved as plot_{#}.{format} in the specified directory. If checkpoint_manager is provided, the default save path is the results path in the checkpoint manager, which is determined by get_results_path(). If both are provided, the provided save_path is used. If neither a checkpoint manager nor a save path is provided, the tables and plots are not saved to disk.
plot_format – format to save plots in, defaults to png. Can be png, pdf, or svg.
show_plots – whether to show the plots after creating them, defaults to True. Can be useful to set to False when running in a non-interactive environment or when only saving the plots without displaying.
log_level – minimum level to log, e.g. logging.INFO
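
The plot_metrics description says that at most 3 metrics share a figure unless individual_plots is True. The sketch below shows one way such a grouping could work. It assumes that metrics beyond the third go into additional figures, which the description does not state explicitly. The function group_into_figures and the metric names are hypothetical.

```python
from typing import List, Sequence


def group_into_figures(
    metrics: Sequence[str],
    individual_plots: bool = False,
    max_subplots: int = 3,
) -> List[List[str]]:
    """Split metric names into per-figure groups (hypothetical helper)."""
    per_figure = 1 if individual_plots else max_subplots
    return [list(metrics[i:i + per_figure]) for i in range(0, len(metrics), per_figure)]


selected = ["metric_a", "metric_b", "metric_c", "metric_d"]  # placeholder names
shared = group_into_figures(selected)
separate = group_into_figures(selected, individual_plots=True)
print(shared)
print(separate)
```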

Raises:

ValueError – If neither metrics_result nor checkpoint_manager is provided, or if the checkpoint manager does not contain a valid metrics result to load.
FileNotFoundError – If metrics_result is not provided and the checkpoint manager does not contain a metrics result file to load.

Note

checkpoint_manager is ignored if metrics_result is provided.

Computational cost can be interpreted as the cost of running the algorithm on a specific hardware setup. It can therefore be seen as the number of operations performed (similar to a FLOP count), weighted by the time or energy it takes to perform them on that hardware.

Computational cost is calculated as:

\[\text{Total Cost} = c_f N_f + c_g N_g + c_h N_h + c_p N_p + c_c N_c\]

where \(c_f, c_g, c_h, c_p, c_c\) are the costs per function, gradient, Hessian, proximal, and communication call respectively, and \(N_f, N_g, N_h, N_p, N_c\) are the mean number of function, gradient, Hessian, proximal, and communication calls across all agents and trials.

If computational_cost is provided and compare_iterations_and_computational_cost is True, each metric will be plotted twice: once against computational cost and once against iterations. Computational cost plots will be shown on the left and iteration plots on the right.
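
The cost formula above can be evaluated directly once the per-call costs and mean call counts are known. The sketch below uses a stand-in CostWeights class and made-up numbers. It is not the library's ComputationalCost class, whose fields are not documented in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    """Per-call costs c_f, c_g, c_h, c_p, c_c (hypothetical stand-in class)."""
    function: float = 1.0
    gradient: float = 1.0
    hessian: float = 1.0
    proximal: float = 1.0
    communication: float = 1.0


def total_cost(w: CostWeights, n_f: float, n_g: float, n_h: float, n_p: float, n_c: float) -> float:
    """Total Cost = c_f N_f + c_g N_g + c_h N_h + c_p N_p + c_c N_c."""
    return (
        w.function * n_f
        + w.gradient * n_g
        + w.hessian * n_h
        + w.proximal * n_p
        + w.communication * n_c
    )


# Illustrative weights and mean call counts, not measured values.
weights = CostWeights(function=1.0, gradient=2.0, hessian=10.0, proximal=3.0, communication=0.5)
cost = total_cost(weights, n_f=100, n_g=100, n_h=0, n_p=50, n_c=200)
print(cost)  # 100 + 200 + 0 + 150 + 100 = 550.0
```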

class decent_bench.benchmark.BenchmarkProblem(network: Network, x_optimal: Array | None = None, test_data: Dataset | None = None)

Bases: object

Dataclass containing all benchmark data.

Subclass it to add more benchmark data (e.g. validation data).

Parameters:

network – network of agents, each with a local cost function. This network represents the initial state over which algorithms are executed. Specifically, algorithms are executed over copies of this network, and those copies are stored in BenchmarkResult. BenchmarkProblem.network is never modified, so that the information on the initial state is preserved.
x_optimal – optional Array representing the optimal solution
test_data – optional Dataset containing test data

Example

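>>> from dataclasses import dataclass
>>> from decent_bench.benchmark import BenchmarkProblem
>>> from decent_bench.utils.types import Dataset
>>>
>>> @dataclass(eq=False)
... class MyBenchmarkProblem(BenchmarkProblem):
...     # A default is needed here: a dataclass field without a default cannot
...     # follow the inherited fields that have one (x_optimal, test_data).
...     validation_data: Dataset | None = None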

network: Network

x_optimal: Array | None = None

test_data: Dataset | None = None

class decent_bench.benchmark.BenchmarkResult(problem: BenchmarkProblem, states: Mapping[Algorithm[Network], Sequence[Network]])

Bases: object

Result of a benchmark execution, containing the results and metadata.

This class stores the results and metadata of a benchmark execution. It is returned by the benchmark() function and contains all the information about the benchmark run, including the problem definition and the final algorithm states.

problem: contains the definition of the benchmark problem that was executed.
states: contains the final states of the algorithms after execution, organized by algorithm where each algorithm maps to a sequence of network states (one per trial).

These results can be used to compute metrics after the benchmark run using compute_metrics().

problem: BenchmarkProblem

states: Mapping[Algorithm[Network], Sequence[Network]]

class decent_bench.benchmark.MetricResult(network_views: Mapping[Algorithm[Network], Sequence[NetworkMetricsView]] | None, raw_table_results: Mapping[Metric, DataFrame] | None, raw_plot_results: Mapping[Metric, DataFrame] | None, table_results: DataFrame | None, plot_results: DataFrame | None)

Bases: object

Result of metric computation, containing raw data and statistics across agents and trials.

This class is used to store the computed metrics from a benchmark execution. It is returned by the compute_metrics() function and contains all the information about the computed metrics, including agent-level metrics, table statistics, and plot data for visualization.

network_views: contains the raw network-level metrics for each algorithm, organized by algorithm where each algorithm maps to a sequence of trials, with each trial containing a NetworkMetricsView.
raw_table_results: contains raw metric evaluations in a dictionary mapping Metric to pandas.DataFrame. Each DataFrame has columns (algorithm, trial, agent, value). Table metrics are evaluated only at the most recent iteration reached during benchmarking.
raw_plot_results: contains raw metric evaluations in a dictionary mapping Metric to pandas.DataFrame. Each DataFrame has columns (algorithm, trial, agent, iteration, value). Plot metrics are evaluated at all iterations reached during benchmarking.
table_results: contains the aggregated results in a pandas.DataFrame with columns (metric, algorithm, statistic, mean, std).
plot_results: contains the aggregated results in a pandas.DataFrame with columns (metric, algorithm, iteration, mean, min, max).

table_results can be recomputed with a new set of statistics across agents by using update_table_results().

Use the properties algorithms, table_metrics and plot_metrics to check which algorithms and metrics the object stores data for. Note that these properties assume that all attributes have the same set of metrics and algorithms, since the object is generated by the backend. No sanity check is performed, so altering any of the attributes might lead to unexpected results.
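
One plausible reading of the column layouts above is a two-stage aggregation: a statistic is first taken across agents within each trial, and the mean and std columns then summarize that statistic across trials. The sketch below follows this reading with plain Python tuples in place of DataFrames. It is an interpretation, not the library's code; the algorithm name and the use of the population standard deviation are assumptions.

```python
import statistics
from collections import defaultdict

# Rows shaped like raw_table_results: (algorithm, trial, agent, value).
raw_rows = [
    ("algo_a", 0, 0, 1.0), ("algo_a", 0, 1, 3.0),
    ("algo_a", 1, 0, 2.0), ("algo_a", 1, 1, 6.0),
]


def aggregate(rows, statistic=statistics.fmean):
    """Reduce across agents per trial, then summarize across trials (assumed scheme)."""
    per_trial = defaultdict(list)
    for algorithm, trial, _agent, value in rows:
        per_trial[(algorithm, trial)].append(value)

    per_algorithm = defaultdict(list)
    for (algorithm, _trial), values in sorted(per_trial.items()):
        per_algorithm[algorithm].append(statistic(values))

    # (mean, std) across trials; population std is an assumption.
    return {
        algorithm: (statistics.fmean(values), statistics.pstdev(values))
        for algorithm, values in per_algorithm.items()
    }


mean_across_agents = aggregate(raw_rows)
max_across_agents = aggregate(raw_rows, statistic=max)
print(mean_across_agents, max_across_agents)
```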

network_views: Mapping[Algorithm[Network], Sequence[NetworkMetricsView]] | None

raw_table_results: Mapping[Metric, DataFrame] | None

raw_plot_results: Mapping[Metric, DataFrame] | None

table_results: DataFrame | None

plot_results: DataFrame | None

update_table_results(statistics_across_agents: list[str] | None) → DataFrame | None

Recompute aggregated table statistics from stored raw table results.

property algorithms: list[str]

Return the names of the available algorithms, which can be used for filtering in display_metrics().

property table_metrics: list[str]

Return the descriptions of the available table metrics, which can be used for filtering in display_metrics().

property plot_metrics: list[str]

Return the descriptions of the available plot metrics, which can be used for filtering in display_metrics().

property iterations: list[int]

Return all the iterations that were reached in at least one trial by at least one algorithm.

decent_bench.benchmark.create_classification_problem(cost_cls: type[LogisticRegressionCost | PyTorchCost] = LogisticRegressionCost, *, device: SupportedDevices = SupportedDevices.CPU, n_agents: int = 100, batch_size: EmpiricalRiskBatchSize = 'all', compute_x_optimal: bool = True, show_progress: bool = True) → tuple[Sequence[Cost], Array | None, Dataset]

Create out-of-the-box classification problems.

Parameters:

cost_cls – type of cost function
device – device to create the problem on (only relevant for PyTorchCost)
n_agents – number of agents
batch_size – size of mini-batches for stochastic methods, or “all” for full-batch
compute_x_optimal – whether the optimal solution should be computed (using solve()). It is ignored when PyTorchCost is selected.
show_progress – whether to display a progress bar while computing x_optimal. Defaults to True.

Note

If cost_cls is PyTorchCost, x_optimal is not computed and is set to None. Be aware that metrics that rely on x_optimal (e.g. Regret) will not be available when using PyTorchCost.

Raises:

ValueError – if an unsupported cost class is provided
ImportError – if PyTorchCost is selected but PyTorch is not installed

decent_bench.benchmark.create_regression_problem(cost_cls: type[LinearRegressionCost | PyTorchCost] = LinearRegressionCost, *, device: SupportedDevices = SupportedDevices.CPU, n_agents: int = 100, batch_size: EmpiricalRiskBatchSize = 'all', compute_x_optimal: bool = True) → tuple[Sequence[Cost], Array | None, Dataset]

Create out-of-the-box regression problems.

Parameters:

cost_cls – type of cost function
device – device to create the problem on (only relevant for PyTorchCost)
n_agents – number of agents
batch_size – size of mini-batches for stochastic methods, or “all” for full-batch
compute_x_optimal – whether the optimal solution should be computed (by solving the linear system of equations). It is ignored when PyTorchCost is selected.

Note

If cost_cls is PyTorchCost, x_optimal is not computed and is set to None. Be aware that metrics that rely on x_optimal (e.g. Regret) will not be available when using PyTorchCost.

Raises:

ValueError – if an unsupported cost class is provided
ImportError – if PyTorchCost is selected but PyTorch is not installed

decent_bench.benchmark.create_quadratic_problem(size: int = 10, n_agents: int = 100) → tuple[Sequence[Cost], Array]

Create out-of-the-box quadratic problems.

Parameters:

size – number of dimensions
n_agents – number of agents