decent_bench.benchmark#
- decent_bench.benchmark.benchmark(algorithms: list[Algorithm[Network]], benchmark_problem: BenchmarkProblem, *, n_trials: int = 30, max_processes: int | None = 1, progress_step: int | None = 100, show_speed: bool = False, show_trial: bool = False, checkpoint_manager: CheckpointManager | None = None, runtime_metrics: list[RuntimeMetric] | None = None, log_level: int = logging.INFO) BenchmarkResult[source]#
Benchmark decentralized algorithms.
- Parameters:
algorithms – algorithms to benchmark
benchmark_problem – problem to benchmark on, defines the network topology, cost functions, and communication constraints.
n_trials – number of times to run each algorithm on the benchmark problem. Running more trials improves the statistical results; at least 30 trials are recommended for the central limit theorem to apply.
max_processes – maximum number of processes to use when running trials. Multiprocessing improves performance but can be inhibiting when debugging or using a profiler. Set to 1 to disable multiprocessing or None to use ProcessPoolExecutor’s default. If your algorithm is very lightweight you may want to set this to 1 to avoid the multiprocessing overhead.
progress_step – if provided, the progress bar will step every progress_step iterations. When provided, each algorithm’s task total becomes n_trials * ceil(algorithm.iterations / progress_step). If None, the progress bar uses 1 unit per trial.
show_speed – whether to show speed (iterations/second) in the progress bar.
show_trial – whether to show which trials are currently running in the progress bar.
checkpoint_manager – if provided, will be used to save checkpoints during execution.
runtime_metrics – optional list of RuntimeMetric to compute and plot during algorithm execution. Each metric will open a plot window for each trial showing live updates. Useful for early stopping if convergence is not happening. Disabled by default. When using multiprocessing (max_processes > 1), each trial will open its own plot windows in separate processes.
log_level – minimum level to log, e.g. logging.INFO.
- Returns:
BenchmarkResult containing the results of the benchmark.
Important
Multiprocessing with certain frameworks (e.g., PyTorch) can lead to unexpected behavior due to how they handle multiprocessing. It is recommended not to use multiprocessing when benchmarking algorithms that utilize such frameworks. If you choose to use multiprocessing with such frameworks, please ensure that you understand the potential issues and have taken appropriate measures. Decent-Bench will attempt to detect if any cost function is a PyTorchCost and warn the user accordingly. Multiprocessing is mostly intended to be used with NumPy-based implementations. Feel free to try using multiprocessing by setting max_processes to a value other than 1 to see if it works with your specific algorithm and setup. See the documentation for max_processes for available options.
Note
If progress_step is too small, performance may degrade due to the overhead of updating the progress bar too often.
- Raises:
ValueError – If the checkpoint directory is not empty when initializing the CheckpointManager.
ValueError – If any two algorithms share the same name.
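The progress bar total described for progress_step can be reproduced by hand, which may help when choosing a step size. The following is a minimal sketch of the documented formula only; the helper name progress_total is hypothetical and is not part of decent_bench.

```python
from math import ceil
from typing import Optional


def progress_total(n_trials: int, iterations: int, progress_step: Optional[int]) -> int:
    """Progress-bar units for one algorithm, following the documented formula."""
    if progress_step is None:
        # Without a step size the bar advances once per trial.
        return n_trials
    # Each trial contributes one unit per started block of progress_step iterations.
    return n_trials * ceil(iterations / progress_step)


print(progress_total(30, 1000, 100))   # 300
print(progress_total(30, 1050, 100))   # 330
print(progress_total(30, 1000, None))  # 30
```

A step of 1 would give 30 000 updates for the first case, which illustrates why a very small progress_step can slow a run down.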
- decent_bench.benchmark.resume_benchmark(checkpoint_manager: CheckpointManager, increase_iterations: int = 0, increase_trials: int = 0, create_backup: bool = True, *, max_processes: int | None = 1, progress_step: int | None = 100, show_speed: bool = False, show_trial: bool = False, runtime_metrics: list[RuntimeMetric] | None = None, log_level: int = logging.INFO) BenchmarkResult[source]#
Resume a benchmark from an existing checkpoint directory.
- Parameters:
checkpoint_manager – CheckpointManager instance to load checkpoints from. Must contain valid checkpoints and metadata from a previous benchmark run. Progress will be loaded from the latest checkpoints and the benchmark will resume from there.
increase_iterations – number of iterations to add to each algorithm’s existing iteration count. This allows you to extend the benchmark run and collect more data points for the metrics. The additional iterations will be added on top of the existing iterations defined in the checkpoint metadata for each algorithm. If set to 0 (default), the benchmark will resume with the same number of iterations as defined in the checkpoint.
increase_trials – number of additional trials to run for each algorithm. This allows you to increase the statistical significance of the benchmark results by collecting more trials. If set to 0 (default), the benchmark will resume with the same number of trials as defined in the checkpoint.
create_backup – whether to create a backup of the existing checkpoint directory before resuming. It is recommended to set this to True to avoid accidental data loss, as resuming will modify the checkpoint directory by adding new checkpoints and metadata. If True, a backup will be created with the name {checkpoint_manager.checkpoint_dir}_backup_{timestamp}.zip before resuming.
max_processes – maximum number of processes to use when running trials. Multiprocessing improves performance but can be inhibiting when debugging or using a profiler. Set to 1 to disable multiprocessing or None to use ProcessPoolExecutor’s default. If your algorithm is very lightweight you may want to set this to 1 to avoid the multiprocessing overhead.
progress_step – if provided, the progress bar will step every progress_step iterations. When provided, each algorithm’s task total becomes n_trials * ceil(algorithm.iterations / progress_step). If None, the progress bar uses 1 unit per trial.
show_speed – whether to show speed (iterations/second) in the progress bar.
show_trial – whether to show which trials are currently running in the progress bar.
runtime_metrics – optional list of RuntimeMetric to compute and plot during algorithm execution. Each metric will open a plot window for each trial showing live updates. Useful for early stopping if convergence is not happening. Disabled by default. When using multiprocessing (max_processes > 1), each trial will open its own plot windows in separate processes.
log_level – minimum level to log, e.g. logging.INFO.
- Returns:
BenchmarkResult containing the results of the benchmark.
Important
Multiprocessing with certain frameworks (e.g., PyTorch) can lead to unexpected behavior due to how they handle multiprocessing. It is recommended not to use multiprocessing when benchmarking algorithms that utilize such frameworks. If you choose to use multiprocessing with such frameworks, please ensure that you understand the potential issues and have taken appropriate measures. Decent-Bench will attempt to detect if any cost function is a PyTorchCost and warn the user accordingly. Multiprocessing is mostly intended to be used with NumPy-based implementations. Feel free to try using multiprocessing by setting max_processes to a value other than 1 to see if it works with your specific algorithm and setup. See the documentation for max_processes for available options.
Note
If progress_step is too small, performance may degrade due to the overhead of updating the progress bar too often.
- Raises:
ValueError – If the checkpoint directory does not exist, is empty, or contains invalid metadata.
ValueError – If increase_iterations or increase_trials is negative.
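The effect of increase_iterations and increase_trials on a resumed run is simple arithmetic on the values stored in the checkpoint. The sketch below mirrors the documented behaviour (additive increases, ValueError on negative values); plan_resume is a hypothetical helper, and how decent_bench stores these counts internally is not shown here.

```python
from typing import Tuple


def plan_resume(
    iterations: int,
    trials: int,
    increase_iterations: int = 0,
    increase_trials: int = 0,
) -> Tuple[int, int]:
    """Return the (iterations, trials) targets of a resumed benchmark."""
    if increase_iterations < 0 or increase_trials < 0:
        # The documentation states that negative increases raise ValueError.
        raise ValueError("increase_iterations and increase_trials must be non-negative")
    # Increases are added on top of the counts recorded in the checkpoint.
    return iterations + increase_iterations, trials + increase_trials


print(plan_resume(1000, 30))                                              # (1000, 30)
print(plan_resume(1000, 30, increase_iterations=500, increase_trials=10))  # (1500, 40)
```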
- decent_bench.benchmark.compute_metrics(benchmark_result: BenchmarkResult | None = None, checkpoint_manager: CheckpointManager | None = None, *, table_metrics: list[Metric] | None = None, plot_metrics: list[Metric] | None = None, statistics_across_agents: list[str] | None = None, log_level: int = logging.INFO) MetricResult[source]#
Compute metrics from a benchmark result.
- Parameters:
benchmark_result – result of a benchmark execution. If not provided, the result will be loaded from the checkpoint manager
checkpoint_manager – if provided, will be used to save results of metrics computation and/or load benchmark result.
table_metrics – metrics to be displayed in a table of results. Table metrics are computed only at the most recent iteration reached during benchmarking. If None, all table metrics available for the benchmark problem will be used. For example, federated-only metrics are removed when a non-federated network is passed.
plot_metrics – metrics to be plotted over algorithm iterations. Plot metrics are computed at all the iterations reached during benchmarking. If None, all plot metrics available for the benchmark problem will be used.
statistics_across_agents – statistics to compute across agents for metrics that return one value per agent (like ConsensusError or Accuracy). Available statistics are “mean” (aliases “average”, “avg”), “std”, “max” (alias “maximum”), “min” (alias “minimum”), and “median” (alias “mdn”). If None, “mean” and “std” are used.
log_level – minimum level to log, e.g. logging.INFO.
- Returns:
MetricResult containing the computed metrics.
- Raises:
ValueError – If neither benchmark_result nor checkpoint_manager is provided, or if the checkpoint manager does not contain a valid benchmark result to load.
ValueError – If duplicate metrics (i.e. with the same description) are provided in table_metrics or plot_metrics.
Note
If benchmark_result is not provided, it will be loaded from the checkpoint manager. If both are provided, the results from the provided benchmark_result will be used and the checkpoint manager will only be used to save the computed metrics result. If neither is provided, an error will be raised.
If a checkpoint manager is provided, all table and plot metrics that were used are saved to the checkpoints’ metadata, so that it is known which metrics were computed and can be displayed later.
Metrics that return False from is_available() for the given problem are filtered out from the returned metric lists. Warnings are emitted with the omitted metric names.
Plot metrics can still be available even when their final table value is inf/nan: plot computation keeps the finite part of a trajectory, while table metrics are evaluated at the final iteration.
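The aliases accepted by statistics_across_agents can be summarised as a lookup table. The sketch below only illustrates the documented alias set and the documented default; the helper name, the case-insensitive matching, the removal of duplicates, and the error raised for unknown names are all assumptions rather than confirmed library behaviour.

```python
from typing import List, Optional

# Canonical statistic for every documented name or alias.
_ALIASES = {
    "mean": "mean", "average": "mean", "avg": "mean",
    "std": "std",
    "max": "max", "maximum": "max",
    "min": "min", "minimum": "min",
    "median": "median", "mdn": "median",
}


def normalize_statistics(statistics_across_agents: Optional[List[str]] = None) -> List[str]:
    """Map user-supplied names to canonical statistics, keeping first-seen order."""
    if statistics_across_agents is None:
        # Documented default when nothing is passed.
        return ["mean", "std"]
    canonical: List[str] = []
    for name in statistics_across_agents:
        key = name.strip().lower()  # assumption: matching ignores case and padding
        if key not in _ALIASES:
            raise ValueError(f"unknown statistic: {name!r}")  # assumption
        if _ALIASES[key] not in canonical:
            canonical.append(_ALIASES[key])
    return canonical


print(normalize_statistics(["avg", "maximum", "mdn"]))  # ['mean', 'max', 'median']
```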
- decent_bench.benchmark.display_metrics(metrics_result: MetricResult | None = None, checkpoint_manager: CheckpointManager | None = None, *, table_metrics: list[Metric | str] | None = None, plot_metrics: list[Metric | str] | None = None, algorithms: list[Algorithm[Network] | str] | None = None, table_fmt: Literal['text', 'latex'] = 'text', plot_grid: bool = True, individual_plots: bool = False, computational_cost: ComputationalCost | None = None, scale_x_axis: float = 1e-4, scale_compute: float = 1.0, compare_iterations_and_computational_cost: bool = False, save_path: str | Path | None = None, plot_format: Literal['png', 'pdf', 'svg'] = 'png', show_plots: bool = True, log_level: int = logging.INFO) None[source]#
Display the results of metrics computation.
- Parameters:
metrics_result – result of metrics computation containing the metrics to display. If not provided, the result will be loaded from the checkpoint manager
checkpoint_manager – if provided, will be used to load metrics result.
table_metrics – metrics to tabulate. Entries can be Metric objects or strings (matching description). If None, all table metrics in metrics_result are displayed.
plot_metrics – metrics to plot. Entries can be Metric objects or strings (matching description). If None, all plot metrics in metrics_result are displayed. If individual_plots is True, each metric is plotted in its own figure; otherwise a maximum of 3 metrics are plotted as subplots in the same figure.
algorithms – algorithms to display. Entries can be Algorithm objects or strings (matching name). If None, all algorithms in metrics_result are displayed.
table_fmt – table format; text is suitable for the terminal, while latex can be copy-pasted into a LaTeX document.
plot_grid – whether to show grid lines on the plots.
individual_plots – whether to plot each metric in a separate figure.
computational_cost – computational cost settings for plot metrics. If None, the x-axis will be iterations instead of computational cost.
scale_x_axis – scaling factor for the computational cost x-axis, used to convert the cost units into more manageable units for plotting. Only used if computational_cost is provided.
scale_compute – scaling factor for the compute-related metrics (i.e. FunctionCalls, GradientCalls, HessianCalls and ProximalCalls) shown in the table, used to convert the raw count into more manageable units for display.
compare_iterations_and_computational_cost – whether to plot both metric vs computational cost and metric vs iterations. Only used if computational_cost is provided.
save_path – optional directory path to save the tables and plots to. Tables will be saved as table.txt and table.tex, while plots will be saved as plot_{#}.{format} in the specified directory. If checkpoint_manager is provided, the default save path will be the results path in the checkpoint manager, which is determined by get_results_path(). If both are provided, the provided save_path will be used. If neither a checkpoint manager nor a save path is provided, the tables and plots are not saved to disk.
plot_format – format to save plots in, defaults to png. Can be png, pdf, or svg.
show_plots – whether to show the plots after creating them, defaults to True. Can be useful to set to False when running in a non-interactive environment or when only saving the plots without displaying them.
log_level – minimum level to log, e.g. logging.INFO.
- Raises:
ValueError – If neither metrics_result nor checkpoint_manager is provided, or if the checkpoint manager does not contain a valid metrics result to load.
FileNotFoundError – If metrics_result is not provided and the checkpoint manager does not contain a metrics result file to load.
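The plot_metrics parameter states that, unless individual_plots is True, at most 3 metrics share a figure. One way to picture this is as a chunking of the metric list. The sketch below assumes that metrics beyond the third spill into additional figures, which the documentation does not state explicitly; the function name and the metric labels are hypothetical.

```python
from typing import List, Sequence


def group_metrics_into_figures(
    plot_metrics: Sequence[str],
    individual_plots: bool = False,
    max_per_figure: int = 3,
) -> List[List[str]]:
    """Split metric descriptions into per-figure groups."""
    # One metric per figure when individual_plots is set, otherwise up to three.
    size = 1 if individual_plots else max_per_figure
    return [list(plot_metrics[i:i + size]) for i in range(0, len(plot_metrics), size)]


metrics = ["metric A", "metric B", "metric C", "metric D"]  # hypothetical descriptions
print(group_metrics_into_figures(metrics))
# [['metric A', 'metric B', 'metric C'], ['metric D']]
print(len(group_metrics_into_figures(metrics, individual_plots=True)))  # 4
```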
Note
The checkpoint_manager is ignored if metrics_result is provided.
Computational cost can be interpreted as the cost of running the algorithm on a specific hardware setup. Therefore the computational cost could be seen as the number of operations performed (similar to FLOPS) but weighted by the time or energy it takes to perform them on the specific hardware.
Computational cost is calculated as:
\[\text{Total Cost} = c_f N_f + c_g N_g + c_h N_h + c_p N_p + c_c N_c\]
where \(c_f, c_g, c_h, c_p, c_c\) are the costs per function, gradient, Hessian, proximal, and communication call respectively, and \(N_f, N_g, N_h, N_p, N_c\) are the mean number of function, gradient, Hessian, proximal, and communication calls across all agents and trials.
If computational_cost is provided and compare_iterations_and_computational_cost is True, each metric will be plotted twice: once against computational cost and once against iterations. Computational cost plots will be shown on the left and iteration plots on the right.
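The total cost formula in this note is a weighted sum of call counts, which can be checked by hand. The sketch below evaluates that sum with made-up numbers; UnitCosts is a hypothetical stand-in and does not reflect the fields of the library's ComputationalCost class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCosts:
    """Hypothetical per-call costs (c_f, c_g, c_h, c_p, c_c in the formula)."""
    function: float = 1.0
    gradient: float = 1.0
    hessian: float = 1.0
    proximal: float = 1.0
    communication: float = 1.0


def total_cost(costs: UnitCosts, n_f: float, n_g: float, n_h: float, n_p: float, n_c: float) -> float:
    """Weighted sum c_f*N_f + c_g*N_g + c_h*N_h + c_p*N_p + c_c*N_c."""
    return (
        costs.function * n_f
        + costs.gradient * n_g
        + costs.hessian * n_h
        + costs.proximal * n_p
        + costs.communication * n_c
    )


# Illustrative values: mean call counts across agents and trials.
costs = UnitCosts(function=1.0, gradient=2.0, hessian=10.0, proximal=3.0, communication=0.5)
print(total_cost(costs, n_f=100, n_g=50, n_h=2, n_p=10, n_c=40))  # 270.0
```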
- class decent_bench.benchmark.BenchmarkProblem(network: Network, x_optimal: Array | None = None, test_data: Dataset | None = None)[source]#
Bases: object
Dataclass containing all benchmark data.
Subclass it to add more benchmark data (e.g. validation data).
- Parameters:
network – network of agents, each with a local cost function. This network represents the initial state of the network over which algorithms are executed. Specifically, algorithms are executed over copies of this network, and those copies are stored in BenchmarkResult. BenchmarkProblem.network will never be modified, in order to preserve information on the initial state.
x_optimal – optional Array representing the optimal solution
test_data – optional Dataset containing test data
Example
>>> from dataclasses import dataclass
>>> from decent_bench.benchmark import BenchmarkProblem
>>> from decent_bench.utils.types import Dataset
>>>
>>> @dataclass(eq=False)
... class MyBenchmarkProblem(BenchmarkProblem):
...     validation_data: Dataset
- class decent_bench.benchmark.BenchmarkResult(problem: BenchmarkProblem, states: Mapping[Algorithm[Network], Sequence[Network]])[source]#
Bases: object
Result of a benchmark execution, containing the results and metadata.
This class is used to store the results and metadata of a benchmark execution. It is returned by the benchmark() function and contains all the information about the benchmark run, including the problem definition and the final algorithm states.
- problem: contains the definition of the benchmark problem that was executed.
- states: contains the final states of the algorithms after execution, organized by algorithm where each algorithm maps to a sequence of network states (one per trial).
These results can be used to compute metrics after the benchmark run using compute_metrics().
- problem: BenchmarkProblem#
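The three documented functions chain naturally: benchmark() returns a BenchmarkResult, compute_metrics() turns it into a MetricResult, and display_metrics() renders that result. The sketch below wires them together using only parameters listed on this page. It assumes decent_bench is installed and that the caller has already built the algorithms and the BenchmarkProblem; run_and_report is a hypothetical wrapper name.

```python
def run_and_report(algorithms, problem, n_trials=30):
    """Run a benchmark, compute metrics, and print the result table."""
    # Imported lazily so that defining this helper does not require the package.
    from decent_bench import benchmark as db

    # Single process: the documented setting that disables multiprocessing.
    result = db.benchmark(algorithms, problem, n_trials=n_trials, max_processes=1)

    # None for the metric lists means "all metrics available for the problem".
    metrics = db.compute_metrics(result, statistics_across_agents=["mean", "std"])

    # Text table in the terminal; plots are created but not shown.
    db.display_metrics(metrics, table_fmt="text", show_plots=False)
    return metrics
```

Passing a checkpoint_manager to benchmark() and compute_metrics() would additionally persist intermediate results, as described in the parameter lists.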
- class decent_bench.benchmark.MetricResult(network_views: Mapping[Algorithm[Network], Sequence[NetworkMetricsView]] | None, raw_table_results: Mapping[Metric, DataFrame] | None, raw_plot_results: Mapping[Metric, DataFrame] | None, table_results: DataFrame | None, plot_results: DataFrame | None)[source]#
Bases: object
Result of metric computation, containing raw data and statistics across agents and trials.
This class is used to store the computed metrics from a benchmark execution. It is returned by the compute_metrics() function and contains all the information about the computed metrics, including agent-level metrics, table statistics, and plot data for visualization.
- network_views: contains the raw network-level metrics for each algorithm, organized by algorithm where
each algorithm maps to a sequence of trials, with each trial containing a NetworkMetricsView.
- raw_table_results: contains raw metric evaluations in a dictionary mapping Metric to pandas.DataFrame. Each
DataFrame has columns (algorithm, trial, agent, value). Table metrics are evaluated only at the most recent iteration reached during benchmarking.
- raw_plot_results: contains raw metric evaluations in a dictionary mapping Metric to pandas.DataFrame. Each
DataFrame has columns (algorithm, trial, agent, iteration, value). Plot metrics are evaluated at all iterations reached during benchmarking.
- table_results: contains the aggregated results in a pandas.DataFrame with columns
(metric, algorithm, statistic, mean, std).
- plot_results: contains the aggregated results in a pandas.DataFrame with columns
(metric, algorithm, iteration, mean, min, max).
table_results can be recomputed with a new set of statistics across agents by using update_table_results().
Use the properties algorithms, table_metrics and plot_metrics to check for which algorithms and metrics the object stores data. Note that these properties assume that all attributes have the same set of metrics and algorithms, since the object is generated by the backend; no sanity check is performed, so altering any of the attributes might lead to unexpected results.
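The relation between the raw table rows (algorithm, trial, agent, value) and the aggregated columns (statistic, mean, std) can be pictured as a two-stage reduction: a statistic across agents within each trial, then mean and standard deviation across trials. The sketch below uses only the standard library. The two-stage order and the use of the sample standard deviation are assumptions, not confirmed details of decent_bench, and the algorithm name is a made-up label.

```python
from collections import defaultdict
from statistics import mean, stdev


def aggregate_table(rows, agent_statistic=mean):
    """Reduce (algorithm, trial, agent, value) rows to {algorithm: (mean, std)}."""
    # Stage 1: collect the per-agent values of every (algorithm, trial) pair.
    per_trial = defaultdict(list)
    for algorithm, trial, _agent, value in rows:
        per_trial[(algorithm, trial)].append(value)

    # Stage 2: apply the across-agents statistic, giving one number per trial.
    per_algorithm = defaultdict(list)
    for (algorithm, _trial), values in per_trial.items():
        per_algorithm[algorithm].append(agent_statistic(values))

    # Stage 3: mean and sample standard deviation across trials.
    return {
        algorithm: (mean(stats), stdev(stats) if len(stats) > 1 else 0.0)
        for algorithm, stats in per_algorithm.items()
    }


rows = [
    ("algo", 0, 0, 1.0), ("algo", 0, 1, 3.0),  # trial 0, two agents
    ("algo", 1, 0, 2.0), ("algo", 1, 1, 6.0),  # trial 1, two agents
]
print(aggregate_table(rows)["algo"][0])                       # 3.0
print(aggregate_table(rows, agent_statistic=max)["algo"][0])  # 4.5
```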
- update_table_results(statistics_across_agents: list[str] | None) DataFrame | None[source]#
Recompute aggregated table statistics from stored raw table results.
- property algorithms: list[str]#
Return name of available algorithms, which can be used for filtering in display_metrics().
- property table_metrics: list[str]#
Return description of available table metrics, which can be used for filtering in display_metrics().
- property plot_metrics: list[str]#
Return description of available plot metrics, which can be used for filtering in display_metrics().
- decent_bench.benchmark.create_classification_problem(cost_cls: type[LogisticRegressionCost | PyTorchCost] = LogisticRegressionCost, *, device: SupportedDevices = SupportedDevices.CPU, n_agents: int = 100, batch_size: EmpiricalRiskBatchSize = 'all', compute_x_optimal: bool = True, show_progress: bool = True) tuple[Sequence[Cost], Array | None, Dataset][source]#
Create out-of-the-box classification problems.
- Parameters:
cost_cls – type of cost function
device – device to create the problem on (only relevant for PyTorchCost)
n_agents – number of agents
batch_size – size of mini-batches for stochastic methods, or “all” for full-batch
compute_x_optimal – if the optimal solution should be computed (using solve()). It is ignored when PyTorchCost is selected.
show_progress – whether to display a progress bar while computing x_optimal. Defaults to True.
Note
If cost_cls is PyTorchCost, x_optimal is not computed and is set to None. Be aware that metrics that rely on x_optimal (e.g. Regret) will not be available when using PyTorchCost.
- Raises:
ValueError – if an unsupported cost class is provided
ImportError – if PyTorchCost is selected but PyTorch is not installed
- decent_bench.benchmark.create_regression_problem(cost_cls: type[LinearRegressionCost | PyTorchCost] = LinearRegressionCost, *, device: SupportedDevices = SupportedDevices.CPU, n_agents: int = 100, batch_size: EmpiricalRiskBatchSize = 'all', compute_x_optimal: bool = True) tuple[Sequence[Cost], Array | None, Dataset][source]#
Create out-of-the-box regression problems.
- Parameters:
cost_cls – type of cost function
device – device to create the problem on (only relevant for PyTorchCost)
n_agents – number of agents
batch_size – size of mini-batches for stochastic methods, or “all” for full-batch
compute_x_optimal – if the optimal solution should be computed (by solving the linear system of equations). It is ignored when PyTorchCost is selected.
Note
If cost_cls is PyTorchCost, x_optimal is not computed and is set to None. Be aware that metrics that rely on x_optimal (e.g. Regret) will not be available when using PyTorchCost.
- Raises:
ValueError – if an unsupported cost class is provided
ImportError – if PyTorchCost is selected but PyTorch is not installed
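Both factory functions return a three-element tuple (costs, x_optimal, test data), and x_optimal may be None. The sketch below shows one way to unpack that tuple and guard against a missing optimum. It assumes decent_bench is installed and uses only parameters from the signature above; make_regression_costs is a hypothetical wrapper, and how the costs are attached to a Network is outside the scope of this page.

```python
def make_regression_costs(n_agents=10):
    """Create a regression problem and report whether x_optimal is available."""
    # Imported lazily so that defining this helper does not require the package.
    from decent_bench.benchmark import create_regression_problem

    costs, x_optimal, test_data = create_regression_problem(
        n_agents=n_agents,
        batch_size="all",        # full-batch cost evaluation
        compute_x_optimal=True,  # ignored when PyTorchCost is selected
    )
    if x_optimal is None:
        # Metrics that rely on x_optimal (e.g. Regret) will then be unavailable.
        print("x_optimal was not computed")
    return costs, x_optimal, test_data
```

The same unpacking applies to create_classification_problem, which additionally accepts show_progress.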