decent_bench.costs#
Cost composition rules.
- Developer note:
Generic cost arithmetic falls back to SumCost and ScaledCost. Regularizers preserve their abstraction under +, -, unary -, *, and / by returning BaseRegularizerCost-aware composites. Empirical-risk costs preserve their abstraction under scalar scaling through a private empirical scaling wrapper, and under empirical + regularizer through EmpiricalRegularizedCost. Unsupported mixed compositions still fall back to the generic wrappers.
EmpiricalRegularizedCost.gradient uses broadcast semantics when reduction=None: it returns one composite gradient per sample by adding the regularizer gradient to each per-sample empirical gradient. Averaging over the leading sample dimension recovers the composite mean gradient.
Composition wrappers keep references to their underlying cost objects; they do not make implicit shallow or deep copies at construction time. Mutating a wrapped cost after composition therefore affects the composite view as well. Agent-installed call-counting hooks on reused cost objects are therefore shared too. Use copy.deepcopy() explicitly when an independent copy of a composed objective or independent counting behavior is needed.
Proximal support is intentionally conservative for the specialized wrappers: concrete costs may implement specialized proximals, positive scalar scaling preserves proximal support, and a single positively scaled regularizer term preserves regularizer proximal support. SumCost computes the proximal of the full summed objective through decent_bench.centralized_algorithms.proximal_solver() when that accelerated-gradient backend is applicable. Multi-term regularizer composites and EmpiricalRegularizedCost do not provide a generic proximal. Use a specialized proximal if one exists, or use decent_bench.centralized_algorithms.proximal_solver() when applicable.
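The reference semantics described in the developer note can be illustrated with a small stand-in. The sketch below does not use decent_bench: ToyCost and ToySum are hypothetical classes that only mimic a wrapper holding references to its terms, so it shows the general Python behavior rather than the library's implementation.

```python
import copy


class ToyCost:
    """Hypothetical stand-in for a cost with one mutable parameter."""

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def function(self, x: float) -> float:
        return self.weight * x * x


class ToySum:
    """Hypothetical wrapper that stores references to its terms, not copies."""

    def __init__(self, terms: list) -> None:
        self.terms = terms

    def function(self, x: float) -> float:
        return sum(term.function(x) for term in self.terms)


a, b = ToyCost(1.0), ToyCost(2.0)
shared = ToySum([a, b])              # holds references to a and b
independent = copy.deepcopy(shared)  # owns private copies of both terms

a.weight = 10.0                      # mutate a wrapped term after composition

shared_value = shared.function(1.0)            # reflects the mutation
independent_value = independent.function(1.0)  # unaffected by the mutation
```

The deep copy has to be taken before the mutation; copying afterwards would simply duplicate the already mutated state.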
- class decent_bench.costs.BaseRegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#
Bases: Cost
Base class for regularizers with regularizer-preserving arithmetic.
Adding, subtracting, negating, scaling, or dividing regularizers returns another regularizer subclass instead of falling back immediately to generic SumCost or ScaledCost. This preserves regularizer-specific structure and can improve performance. Mixing a regularizer with an arbitrary non-regularizer still falls back to generic cost composition.
- __init__(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- __add__(other: BaseRegularizerCost) BaseRegularizerCost[source]#
- __add__(other: Cost) Cost
Add another cost, preserving the regularizer abstraction when possible.
- class decent_bench.costs.Cost[source]#
Bases: ABC
Used by agents to evaluate the cost and its derivatives at a certain x.
- abstract property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- abstract property device: SupportedDevices#
The device used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- abstract property m_smooth: float#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- abstract property m_cvx: float#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- abstractmethod proximal(x: Array, penalty: float, **kwargs: Any) Array[source]#
Proximal at x.
The proximal operator is defined as:
\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]where \(\rho > 0\) is the penalty and \(f\) the cost function.
If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using
proximal_solver().
- __add__(other: Cost) Cost[source]#
Add another cost function to create a new one.
The generic fallback returns SumCost([self, other]). Subclasses can override this to preserve specialized structure when the result remains in the same abstraction. For example, adding two QuadraticCost objects benefits from returning a new QuadraticCost instead of a SumCost, as this preserves the closed-form proximal solution and requires only one evaluation instead of two when calling function(), gradient(), and hessian().
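The idea of staying in the same abstraction under addition can be sketched with a toy class. ToyQuadratic below is a hypothetical stand-in written for this illustration, not the library's QuadraticCost; it assumes the form f(x) = 0.5 x^T A x + b^T x + c.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyQuadratic:
    """Hypothetical stand-in for f(x) = 0.5 * x^T A x + b^T x + c."""

    A: np.ndarray
    b: np.ndarray
    c: float

    def function(self, x: np.ndarray) -> float:
        return float(0.5 * x @ self.A @ x + self.b @ x + self.c)

    def __add__(self, other: "ToyQuadratic") -> "ToyQuadratic":
        # The sum of two quadratics is again a quadratic, so the result stays
        # in the same class instead of becoming a generic sum wrapper.
        return ToyQuadratic(self.A + other.A, self.b + other.b, self.c + other.c)


f = ToyQuadratic(np.eye(2), np.array([1.0, 0.0]), 0.0)
g = ToyQuadratic(2.0 * np.eye(2), np.array([0.0, -1.0]), 3.0)
h = f + g

x = np.array([1.0, 2.0])
combined = h.function(x)                  # one evaluation of the merged quadratic
separate = f.function(x) + g.function(x)  # two evaluations, same value
```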
- class decent_bench.costs.EmpiricalRegularizedCost(empirical_cost: EmpiricalRiskCost, regularizer: BaseRegularizerCost)[source]#
Bases: EmpiricalRiskCost
Composite objective of an empirical risk term plus a regularizer.
This wrapper preserves empirical-risk-specific behavior from the empirical term, including predict(), dataset access, and batch metadata, while combining function, gradient, and Hessian values with the regularizer. When gradient() is called with reduction=None, the regularizer gradient is broadcast across the leading sample dimension so that averaging over samples recovers the composite mean gradient. A generic proximal is intentionally not implemented.
Instances keep references to the wrapped cost objects. No implicit copying is performed; use copy.deepcopy() explicitly if independent objects are required.
- __init__(empirical_cost: EmpiricalRiskCost, regularizer: BaseRegularizerCost)[source]#
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property batch_used: list[int]#
Indices of samples used in the most recent batch.
- Raises:
ValueError – If no batch has been used yet.
- property m_smooth: float[source]#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- property m_cvx: float[source]#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- predict(x: Array, data: list[Array]) Array[source]#
Predictions are determined by the empirical term.
- function(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) float[source]#
Evaluate function at x using datapoints at the given indices.
The returned value is the mean loss over the selected samples.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
- gradient(x: Array, indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean', **kwargs: Any) Array[source]#
Gradient of the empirical objective plus regularizer.
When reduction="mean", this returns the mean empirical gradient over the selected samples plus the regularizer gradient.
When reduction=None, this returns one gradient per selected sample with the regularizer gradient broadcast along the leading sample dimension. Averaging the result over that leading dimension recovers the composite gradient returned by reduction="mean".
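The broadcast behavior can be reproduced with plain NumPy arrays. The sketch below uses random arrays as hypothetical per-sample empirical gradients and a hypothetical regularizer gradient; it illustrates the arithmetic only and does not call the library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, dim = 4, 3

# Hypothetical per-sample empirical gradients, shape (n_samples, dim).
per_sample_empirical = rng.normal(size=(n_samples, dim))
# Hypothetical regularizer gradient at the same point, shape (dim,).
regularizer_grad = rng.normal(size=dim)

# Analogue of reduction=None: broadcasting adds the regularizer gradient to every row.
per_sample_composite = per_sample_empirical + regularizer_grad

# Analogue of reduction="mean": mean empirical gradient plus the regularizer gradient.
mean_composite = per_sample_empirical.mean(axis=0) + regularizer_grad

# Averaging the per-sample result over the leading dimension gives the same vector.
recovered = per_sample_composite.mean(axis=0)
```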
- hessian(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) Array[source]#
Hessian at x using datapoints at the given indices.
The returned Hessian is the mean of per-sample Hessians over the selected samples.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
- proximal(x: Array, penalty: float, **kwargs: Any) Array[source]#
Raise NotImplementedError for the generic proximal of an empirical cost plus regularizer.
This wrapper preserves evaluation, gradient, and Hessian structure, but does not imply a closed-form proximal. Use a specialized composite cost if one exists, or use decent_bench.centralized_algorithms.proximal_solver() when its assumptions are satisfied.
- Raises:
NotImplementedError – Always, because no generic closed-form proximal is provided.
- _sample_batch_indices(indices: EmpiricalRiskIndices = 'batch') list[int][source]#
Sample a batch of indices if indices is “batch”, otherwise use the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
This method uses batch_size to determine the size of the batch. For indices="batch" with batch_size < n_samples, batches are sampled without replacement across successive calls until the full dataset is covered (epoch-style sampling). When there are fewer unseen indices left than batch_size, the remaining unseen indices are used first, and the rest of the batch is drawn from a newly shuffled epoch.
Once a batch is sampled, it is also stored in batch_used for later reference.
Override this method for custom sampling strategies. Do not forget to update _last_batch_used accordingly if you override this method.
- Returns:
List of sampled indices.
- Raises:
ValueError – If an invalid string is provided for indices.
- _get_batch_data(indices: EmpiricalRiskIndices = 'batch') Any[source]#
Get training data corresponding to the given batch indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
Make sure to call _sample_batch_indices(indices) to handle batch sampling and tracking.
- class decent_bench.costs.EmpiricalRiskCost[source]#
Base class for empirical risk cost functions.
This class provides an interface for implementing various empirical risk minimization problems, supporting both full-batch and mini-batch computations. This cost function class is designed to work with Dataset, where each datapoint is a tuple of (features, target), or (features, None) for unsupervised learning.
All empirical risk values, gradients, and Hessians are defined as means over the selected samples (full dataset or batch), not sums.
Mathematical Definition#
Given a dataset with \(m\) samples \(\{d_i\}_{i=1}^{m}\), the empirical risk is defined as:
\[f(x) = \frac{1}{m} \sum_{i=1}^{m} \ell(x, d_i)\]
- where:
\(x\) are the model parameters
\(\ell\) is the loss function measuring the discrepancy between predictions and true targets
Stochastic Variant#
For large datasets, computing the full empirical risk can be expensive. Instead, a stochastic approximation using a mini-batch of size \(b < m\) is often used:
\[f(x) = \frac{1}{b} \sum_{i \in \mathcal{B}} \ell(x, d_i)\]
where \(\mathcal{B}\) is a randomly sampled batch of \(b\) indices from \(\{1, \ldots, m\}\).
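The two definitions above differ only in which indices enter the mean. A minimal NumPy sketch follows, assuming a squared-error loss as the example of \(\ell\); the data, the loss choice, and the helper names are illustrative and not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, b = 8, 3, 4
features = rng.normal(size=(m, n))
targets = rng.normal(size=m)
x = rng.normal(size=n)


def sample_loss(x, a_i, b_i):
    """Squared-error loss, chosen here only as an example of ell(x, d_i)."""
    return 0.5 * (a_i @ x - b_i) ** 2


def empirical_risk(x, indices):
    """Mean (not sum) of the per-sample losses over the selected indices."""
    return float(np.mean([sample_loss(x, features[i], targets[i]) for i in indices]))


full_risk = empirical_risk(x, range(m))       # uses all m samples
batch = rng.choice(m, size=b, replace=False)  # random batch of b distinct indices
batch_risk = empirical_risk(x, batch)         # stochastic approximation
```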
- property batch_used: list[int]#
Indices of samples used in the most recent batch.
- Raises:
ValueError – If no batch has been used yet.
- abstractmethod predict(x: Array, data: list[Array]) Array[source]#
Make predictions using the model parameters x on the given data.
- Returns:
Predicted targets as an array.
- abstractmethod function(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) float[source]#
Evaluate function at x using datapoints at the given indices.
The returned value is the mean loss over the selected samples.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
- __add__(other: Cost) Cost[source]#
Add another cost, preserving the empirical-risk abstraction for regularization.
- evaluate(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) float[source]#
Alias for
function().
- loss(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) float[source]#
Alias for
function().
- f(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) float[source]#
Alias for
function().
- abstractmethod gradient(x: Array, indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean', **kwargs: Any) Array[source]#
Gradient at x using datapoints at the given indices.
The returned gradient is the mean of per-sample gradients over the selected samples.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
- Supported values for reduction are:
“mean”: average the gradients over the samples.
None: return the gradients for each sample, index as the first dimension.
Note
When reduction is None, the returned array will have an additional leading dimension corresponding to the number of samples used. Indexing into this dimension will give the gradient for the respective sample in
batch_used.
- abstractmethod hessian(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) Array[source]#
Hessian at x using datapoints at the given indices.
The returned Hessian is the mean of per-sample Hessians over the selected samples.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
- proximal(x: Array, penalty: float, **kwargs: Any) Array[source]#
Proximal at x using the full dataset.
The proximal operator is defined as:
\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]where \(\rho > 0\) is the penalty and \(f\) the cost function.
If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using
proximal_solver().
- _sample_batch_indices(indices: EmpiricalRiskIndices = 'batch') list[int][source]#
Sample a batch of indices if indices is “batch”, otherwise use the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
This method uses batch_size to determine the size of the batch. For indices="batch" with batch_size < n_samples, batches are sampled without replacement across successive calls until the full dataset is covered (epoch-style sampling). When there are fewer unseen indices left than batch_size, the remaining unseen indices are used first, and the rest of the batch is drawn from a newly shuffled epoch.
Once a batch is sampled, it is also stored in batch_used for later reference.
Override this method for custom sampling strategies. Do not forget to update _last_batch_used accordingly if you override this method.
- Returns:
List of sampled indices.
- Raises:
ValueError – If an invalid string is provided for indices.
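The epoch-style sampling described for _sample_batch_indices can be sketched in plain Python. EpochSampler is a hypothetical class written for this illustration and is not the library's implementation; in particular, it makes no attempt to avoid the same index appearing twice in a batch that straddles two epochs.

```python
import random


class EpochSampler:
    """Hypothetical sampler mimicking the epoch-style behavior described above."""

    def __init__(self, n_samples: int, batch_size: int, seed: int = 0) -> None:
        self.n_samples = n_samples
        self.batch_size = batch_size
        self._rng = random.Random(seed)
        self._unseen: list = []
        self.last_batch: list = []

    def _new_epoch(self) -> list:
        order = list(range(self.n_samples))
        self._rng.shuffle(order)
        return order

    def sample(self) -> list:
        batch: list = []
        while len(batch) < self.batch_size:
            if not self._unseen:
                # Previous epoch exhausted: start a freshly shuffled one.
                self._unseen = self._new_epoch()
            take = self.batch_size - len(batch)
            batch.extend(self._unseen[:take])  # remaining unseen indices go first
            self._unseen = self._unseen[take:]
        self.last_batch = batch                # keep the batch for later reference
        return batch


sampler = EpochSampler(n_samples=10, batch_size=4)
batches = [sampler.sample() for _ in range(3)]
# The first two batches plus the first two indices of the third cover one epoch.
first_epoch = batches[0] + batches[1] + batches[2][:2]
```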
- abstractmethod _get_batch_data(indices: EmpiricalRiskIndices = 'batch') Any[source]#
Get training data corresponding to the given batch indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
Make sure to call _sample_batch_indices(indices) to handle batch sampling and tracking.
- class decent_bench.costs.FractionalQuadraticRegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU, prox_max_iter: int = 100, prox_tol: float | None = 1e-10)[source]#
Bases: BaseRegularizerCost
Nonconvex fractional quadratic regularizer.
\[f(\mathbf{x}) = \sum_i \frac{x_i^2}{1 + x_i^2}\]
- __init__(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU, prox_max_iter: int = 100, prox_tol: float | None = 1e-10)[source]#
- property m_smooth: float[source]#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- property m_cvx: float[source]#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- proximal(x: Array, penalty: float, **kwargs: Any) Array[source]#
Proximal at x.
The proximal operator is defined as:
\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]where \(\rho > 0\) is the penalty and \(f\) the cost function.
If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using
proximal_solver().
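For the fractional quadratic regularizer \(f(\mathbf{x}) = \sum_i x_i^2 / (1 + x_i^2)\) of this class, the quotient rule gives the elementwise gradient \(2 x_i / (1 + x_i^2)^2\). The sketch below checks that derivation against central finite differences; the function names are illustrative and the code is not taken from the library.

```python
import numpy as np


def fractional_quadratic(x: np.ndarray) -> float:
    """f(x) = sum_i x_i^2 / (1 + x_i^2)."""
    return float(np.sum(x**2 / (1.0 + x**2)))


def fractional_quadratic_grad(x: np.ndarray) -> np.ndarray:
    """Elementwise derivative 2 x_i / (1 + x_i^2)^2 from the quotient rule."""
    return 2.0 * x / (1.0 + x**2) ** 2


x = np.array([-2.0, 0.0, 0.5])
eps = 1e-6
# Central finite differences along each coordinate direction.
numeric = np.array([
    (fractional_quadratic(x + eps * e) - fractional_quadratic(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])
analytic = fractional_quadratic_grad(x)
```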
- class decent_bench.costs.L1RegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#
Bases: BaseRegularizerCost
L1 regularizer cost.
\[f(\mathbf{x}) = \|\mathbf{x}\|_1 = \sum_i |x_i|\]
- property m_smooth: float[source]#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- property m_cvx: float[source]#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- proximal(x: Array, penalty: float, **kwargs: Any) Array[source]#
Proximal at x.
The proximal operator is defined as:
\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]where \(\rho > 0\) is the penalty and \(f\) the cost function.
If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using
proximal_solver().
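For \(f(\mathbf{x}) = \|\mathbf{x}\|_1\), the proximal operator defined above has the well-known closed form of elementwise soft-thresholding with threshold \(\rho\). A minimal NumPy sketch follows; whether L1RegularizerCost.proximal is implemented exactly this way is an assumption, and soft_threshold is an illustrative helper name.

```python
import numpy as np


def soft_threshold(x: np.ndarray, penalty: float) -> np.ndarray:
    """Elementwise sign(x_i) * max(|x_i| - penalty, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - penalty, 0.0)


x = np.array([3.0, -0.2, 0.5, -2.0])
result = soft_threshold(x, penalty=0.5)
# Entries with |x_i| <= penalty are set to zero; the others shrink toward zero by penalty.
```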
- class decent_bench.costs.L2RegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#
Bases: BaseRegularizerCost
L2 regularizer cost.
\[f(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2\]
- property m_smooth: float[source]#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- property m_cvx: float[source]#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- proximal(x: Array, penalty: float, **kwargs: Any) Array[source]#
Proximal at x.
The proximal operator is defined as:
\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]where \(\rho > 0\) is the penalty and \(f\) the cost function.
If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using
proximal_solver().
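For \(f(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2\), setting the gradient of the proximal objective to zero gives \(\mathbf{y} + (\mathbf{y} - \mathbf{x})/\rho = 0\), that is \(\mathbf{y} = \mathbf{x} / (1 + \rho)\). The sketch below checks this optimality condition numerically; l2_prox is an illustrative helper, not the library method.

```python
import numpy as np


def l2_prox(x: np.ndarray, penalty: float) -> np.ndarray:
    """Closed-form proximal of 0.5 * ||x||^2 under the definition above."""
    return x / (1.0 + penalty)


x = np.array([2.0, -4.0])
penalty = 1.0
y = l2_prox(x, penalty)

# First-order optimality: grad f(y) + (y - x) / penalty must vanish, with grad f(y) = y.
residual = y + (y - x) / penalty
```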
- class decent_bench.costs.LinearRegressionCost(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#
Bases: EmpiricalRiskCost
Linear regression cost function.
Given a data matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and target vector \(\mathbf{b} \in \mathbb{R}^{m}\), the linear regression cost function is defined as:
\[ \begin{align}\begin{aligned}f(\mathbf{x}) = \frac{1}{2m} \| \mathbf{Ax} - \mathbf{b} \|^2\\= \frac{1}{2m} \sum_{i = 1}^m (A_i x - b_i)^2\end{aligned}\end{align} \]where \(A_i\) and \(b_i\) are the i-th row of \(\mathbf{A}\) and the i-th element of \(\mathbf{b}\) respectively.
In the stochastic setting, a mini-batch of size \(b < m\) is used to compute the cost and its derivatives. The cost function then becomes:
\[ \begin{align}\begin{aligned}f(\mathbf{x}) = \frac{1}{2b} \| \mathbf{A}_{\mathcal{B}}\mathbf{x} - \mathbf{b}_{\mathcal{B}} \|^2\\= \frac{1}{2b} \sum_{i \in \mathcal{B}} (A_i x - b_i)^2\end{aligned}\end{align} \]
where \(\mathcal{B}\) is a sampled batch of \(b\) indices from \(\{1, \ldots, m\}\), and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows of \(\mathbf{A}\) and entries of \(\mathbf{b}\) corresponding to the batch \(\mathcal{B}\).
- __init__(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#
Initialize a LinearRegressionCost instance.
- Parameters:
dataset (Dataset) – Dataset containing features and targets. Expected shapes: features (n_features,); targets single dimensional values.
batch_size (EmpiricalRiskBatchSize) – Size of mini-batches for stochastic methods, or “all” for full-batch.
- Raises:
ValueError – If input dimensions are inconsistent or batch_size is invalid.
TypeError – If dataset targets are not single dimensional values.
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property m_smooth: float[source]#
The cost function’s smoothness constant.
\[\max_{i} \left| \frac{1}{m} \lambda_i \right|\]
where \(\lambda_i\) are the eigenvalues of \(\mathbf{A}^T \mathbf{A}\).
For the general definition, see
Cost.m_smooth.
- property m_cvx: float[source]#
The cost function’s convexity constant.
\[\begin{split}\begin{array}{ll} \frac{1}{m} \min_i \lambda_i, & \text{if } \min_i \lambda_i > 0, \\ 0, & \text{if } \min_i \lambda_i = 0, \\ \text{NaN}, & \text{if } \min_i \lambda_i < 0 \end{array}\end{split}\]
where \(\lambda_i\) are the eigenvalues of \(\mathbf{A}^T \mathbf{A}\).
For the general definition, see
Cost.m_cvx.
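Because the Hessian of the linear regression cost is the constant matrix \(\frac{1}{m}\mathbf{A}^T\mathbf{A}\), both constants can be read off its extreme eigenvalues. The sketch below computes them with NumPy on random data; it is an illustration of the formulas, not the library's code path.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 3
A = rng.normal(size=(m, n))

hessian = A.T @ A / m                      # Hessian of 1/(2m) * ||Ax - b||^2
eigenvalues = np.linalg.eigvalsh(hessian)  # real eigenvalues in ascending order

m_smooth = float(eigenvalues[-1])          # largest eigenvalue: smoothness constant
m_cvx = float(eigenvalues[0])              # smallest eigenvalue: convexity constant

# The gradient difference between two points is hessian @ (x1 - x2).
d = rng.normal(size=n)
gradient_gap = float(np.linalg.norm(hessian @ d))
```

With more samples than features and generic data, the smallest eigenvalue is strictly positive, so the cost is strongly convex; with fewer samples than features it is zero.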
- predict(x: ndarray[tuple[Any, ...], dtype[float64]], data: list[ndarray[tuple[Any, ...], dtype[float64]]]) ndarray[tuple[Any, ...], dtype[float64]][source]#
Make predictions at x on the given data.
The predicted targets are computed as \(\mathbf{Ax}\).
- Parameters:
x – Point to make predictions at.
data – List of NDArray containing data to make predictions on.
- Returns:
Predicted targets as an array.
- function(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') float[source]#
Evaluate function at x using datapoints at the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
If no batching is used, this is:
\[\frac{1}{2m} \| \mathbf{Ax} - \mathbf{b} \|^2\]
If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples:
\[\frac{1}{2b} \| \mathbf{A}_{\mathcal{B}}\mathbf{x} - \mathbf{b}_{\mathcal{B}} \|^2\]
where \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows corresponding to the batch \(\mathcal{B}\).
- gradient(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean') ndarray[tuple[Any, ...], dtype[float64]][source]#
Gradient at x using datapoints at the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
- Supported values for reduction are:
“mean”: average the gradients over the samples.
None: return the gradients for each sample, index as the first dimension.
If no batching is used, this is:
\[\frac{1}{m}(\mathbf{A}^T\mathbf{Ax} - \mathbf{A}^T \mathbf{b})\]
If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples:
\[\frac{1}{b}(\mathbf{A}_{\mathcal{B}}^T\mathbf{A}_{\mathcal{B}}\mathbf{x} - \mathbf{A}_{\mathcal{B}}^T \mathbf{b}_{\mathcal{B}})\]
where \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows corresponding to the batch \(\mathcal{B}\).
Note
When reduction is None, the returned array will have an additional leading dimension corresponding to the number of samples used. Indexing into this dimension will give the gradient for the respective sample in
batch_used.
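The relation between the two reduction modes for linear regression can be checked with NumPy. The sketch below assumes the per-sample loss \(\frac{1}{2}(A_i x - b_i)^2\), whose gradient is \(A_i^T (A_i x - b_i)\); the arrays are random placeholders and the code does not call the library.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 2
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
x = rng.normal(size=n)

# Analogue of reduction="mean": (1/m) * (A^T A x - A^T b).
mean_gradient = (A.T @ A @ x - A.T @ b) / m

# Analogue of reduction=None: one gradient per sample, shape (m, n).
# Row i is A_i scaled by the residual (A_i x - b_i).
per_sample_gradients = A * (A @ x - b)[:, None]

# Averaging over the leading sample dimension recovers the mean gradient.
recovered = per_sample_gradients.mean(axis=0)
```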
- hessian(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') ndarray[tuple[Any, ...], dtype[float64]][source]#
Hessian at x using datapoints at the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.
If no batching is used, this is:
\[\frac{1}{m}\mathbf{A}^T\mathbf{A}\]
If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples:
\[\frac{1}{b}\mathbf{A}_{\mathcal{B}}^T \mathbf{A}_{\mathcal{B}}\]
where \(\mathbf{A}_{\mathcal{B}}\) contains the rows of \(\mathbf{A}\) corresponding to the batch \(\mathcal{B}\).
- proximal(x: ndarray[tuple[Any, ...], dtype[float64]], penalty: float) ndarray[tuple[Any, ...], dtype[float64]][source]#
Proximal at x using the full dataset.
The proximal operator for the linear regression cost function is given by:
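\[\left(\frac{\rho}{m} \mathbf{A}^T \mathbf{A} + \mathbf{I}\right)^{-1} \left(\mathbf{x} + \frac{\rho}{m} \mathbf{A}^T\mathbf{b}\right)\]
where \(\rho > 0\) is the penalty. This is a closed form solution, obtained by setting the gradient of \(f(\mathbf{y}) + \frac{1}{2\rho}\|\mathbf{y} - \mathbf{x}\|^2\) to zero with \(f(\mathbf{y}) = \frac{1}{2m}\|\mathbf{Ay} - \mathbf{b}\|^2\).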
- class decent_bench.costs.LogisticRegressionCost(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#
Bases: EmpiricalRiskCost
Logistic regression cost function.
Given a data matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and target vector \(\mathbf{b} \in \mathbb{R}^{m}\), the logistic regression cost function is defined as:
\[ \begin{align}\begin{aligned}f(\mathbf{x}) = -\frac{1}{m}\left[ \mathbf{b}^T \log( \sigma(\mathbf{Ax}) ) + ( \mathbf{1} - \mathbf{b} )^T \log( 1 - \sigma(\mathbf{Ax}) ) \right]\\= -\frac{1}{m}\sum_{i = 1}^m \left[ b_i \log( \sigma(A_i x) ) + (1 - b_i) \log( 1 - \sigma(A_i x) ) \right]\end{aligned}\end{align} \]where \(\sigma(z) = \frac{1}{1 + e^{-z}}\) is the sigmoid function, \(A_i\) and \(b_i\) are the i-th row of \(\mathbf{A}\) and the i-th element of \(\mathbf{b}\) respectively.
In the stochastic setting, a mini-batch of size \(b < m\) is used to compute the cost and its derivatives. The cost function then becomes:
\[ \begin{align}\begin{aligned}f(\mathbf{x}) = -\frac{1}{b} \left[ \mathbf{b}_{\mathcal{B}}^T \log( \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) + ( \mathbf{1} - \mathbf{b}_{\mathcal{B}} )^T \log( 1 - \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) \right]\\= -\frac{1}{b} \sum_{i \in \mathcal{B}} \left[ b_i \log( \sigma(A_i x) ) + (1 - b_i) \log( 1 - \sigma(A_i x) ) \right]\end{aligned}\end{align} \]
where \(\mathcal{B}\) is a sampled batch of \(b\) indices from \(\{1, \ldots, m\}\), and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows of \(\mathbf{A}\) and entries of \(\mathbf{b}\) corresponding to the batch \(\mathcal{B}\).
- __init__(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#
Initialize logistic regression cost function.
- Parameters:
dataset (Dataset) – Dataset containing features and targets. Expected shapes: features (n_features,); targets single dimensional values.
batch_size (EmpiricalRiskBatchSize) – Size of mini-batch to use for stochastic methods. If “all”, full-batch methods are used.
- Raises:
ValueError – If input dimensions are incorrect or batch_size is invalid.
TypeError – If dataset targets are not single dimensional values.
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property m_smooth: float[source]#
The cost function’s smoothness constant.
\[\frac{1}{m} \frac{m}{4} \max_i \|\mathbf{A}_i\|^2 = \frac{1}{4} \max_i \|\mathbf{A}_i\|^2\]where m is the number of rows in \(\mathbf{A}\).
For the general definition, see
Cost.m_smooth.
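The smoothness constant \(\frac{1}{4} \max_i \|\mathbf{A}_i\|^2\) can be checked numerically as an upper bound on the curvature. The sketch below assumes the standard Hessian of the logistic loss, \(\frac{1}{m}\mathbf{A}^T \operatorname{diag}(\sigma(1 - \sigma)) \mathbf{A}\), which is not stated in this section, and compares its largest eigenvalue at a random point with the bound.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 4
A = rng.normal(size=(m, n))
x = rng.normal(size=n)

sigma = 1.0 / (1.0 + np.exp(-(A @ x)))
weights = sigma * (1.0 - sigma)    # each entry is at most 1/4
hessian = (A.T * weights) @ A / m  # (1/m) * A^T diag(weights) A

largest_curvature = float(np.linalg.eigvalsh(hessian)[-1])
bound = 0.25 * float(np.max(np.sum(A**2, axis=1)))  # (1/4) * max_i ||A_i||^2
```

The bound holds at every point but is generally not tight, so the curvature actually observed is usually smaller.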
- property m_cvx: float#
The cost function’s convexity constant, 0.
For the general definition, see
Cost.m_cvx.
- predict(x: ndarray[tuple[Any, ...], dtype[float64]], data: list[ndarray[tuple[Any, ...], dtype[float64]]]) ndarray[tuple[Any, ...], dtype[float64]][source]#
Make predictions at x on the given data.
The predicted targets are computed as \(\sigma(\mathbf{Ax}) > 0.5\), where \(\sigma\) is the sigmoid function.
- Parameters:
x – Point to make predictions at.
data – List of NDArray containing data to make predictions on.
- Returns:
Predicted targets as an array.
- function(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') float[source]#
Evaluate function at x using datapoints at the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with
batch_size samples.
If no batching is used, this is:
\[-\frac{1}{m}\left[ \mathbf{b}^T \log( \sigma(\mathbf{Ax}) ) + ( \mathbf{1} - \mathbf{b} )^T \log( 1 - \sigma(\mathbf{Ax}) ) \right]\]
If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples:
\[-\frac{1}{b} \left[ \mathbf{b}_{\mathcal{B}}^T \log( \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) + ( \mathbf{1} - \mathbf{b}_{\mathcal{B}} )^T \log( 1 - \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) \right]\]where \(\sigma\) is the sigmoid function, and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows corresponding to the batch \(\mathcal{B}\).
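The two formulas above can be sketched in plain NumPy as follows. This is a minimal illustration and not the library implementation: the helper `logistic_loss`, the random data, and the choice to sample the batch without replacement are all assumptions.

```python
import numpy as np


def logistic_loss(x, A, b, indices=None):
    """Mean binary cross-entropy over the selected rows (all rows if indices is None)."""
    if indices is not None:
        A, b = A[indices], b[indices]
    z = A @ x
    # Numerically stable forms:
    # log(sigma(z)) = -logaddexp(0, -z) and log(1 - sigma(z)) = -logaddexp(0, z)
    log_sig = -np.logaddexp(0.0, -z)
    log_one_minus_sig = -np.logaddexp(0.0, z)
    return float(-np.mean(b * log_sig + (1.0 - b) * log_one_minus_sig))


rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))                    # hypothetical features
b = rng.integers(0, 2, size=8).astype(float)   # hypothetical 0/1 targets
x = np.zeros(3)

full = logistic_loss(x, A, b)                  # corresponds to indices="all"
batch = rng.choice(8, size=4, replace=False)   # assumed sampling scheme
mini = logistic_loss(x, A, b, batch)           # corresponds to indices="batch"
```

At \(\mathbf{x} = \mathbf{0}\) every prediction is 0.5, so both the full and the mini-batch value equal \(\log 2\) regardless of the targets.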
- gradient(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean') ndarray[tuple[Any, ...], dtype[float64]][source]#
Gradient at x using datapoints at the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with
batch_size samples.
- Supported values for reduction are:
“mean”: average the gradients over the samples.
None: return one gradient per sample, with the sample index as the first dimension.
If no batching is used, this is:
\[\frac{1}{m}\mathbf{A}^T (\sigma(\mathbf{Ax}) - \mathbf{b})\]
If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples:
\[\frac{1}{b} \mathbf{A}_{\mathcal{B}}^T (\sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) - \mathbf{b}_{\mathcal{B}})\]where \(\sigma\) is the sigmoid function, and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows corresponding to the batch \(\mathcal{B}\).
Note
When reduction is None, the returned array will have an additional leading dimension corresponding to the number of samples used. Indexing into this dimension will give the gradient for the respective sample in
batch_used.
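To make the two reduction modes concrete, here is a small NumPy sketch. The helper `logistic_gradient` and the random data are hypothetical, and the library may organize the computation differently.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_gradient(x, A, b, reduction="mean"):
    """Mean gradient of the logistic loss, or one gradient per sample if reduction is None."""
    residual = sigmoid(A @ x) - b        # shape (n_samples,)
    per_sample = residual[:, None] * A   # shape (n_samples, n_features)
    if reduction is None:
        return per_sample
    return per_sample.mean(axis=0)


rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                    # hypothetical features
b = rng.integers(0, 2, size=5).astype(float)   # hypothetical 0/1 targets
x = rng.normal(size=3)

g_mean = logistic_gradient(x, A, b)                  # shape (3,)
g_each = logistic_gradient(x, A, b, reduction=None)  # shape (5, 3)
```

Averaging `g_each` over its leading dimension recovers `g_mean`, which in turn matches \(\frac{1}{m}\mathbf{A}^T (\sigma(\mathbf{Ax}) - \mathbf{b})\).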
- hessian(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') ndarray[tuple[Any, ...], dtype[float64]][source]#
Hessian at x using datapoints at the given indices.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with
batch_size samples.
If no batching is used, this is:
\[\frac{1}{m}\mathbf{A}^T \mathbf{DA}\]where \(\sigma\) is the sigmoid function and \(\mathbf{D}\) is a diagonal matrix with entries \(\mathbf{D}_{ii} = \sigma(\mathbf{A}_i\mathbf{x}) (1-\sigma(\mathbf{A}_i\mathbf{x}))\).
If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples:
\[\frac{1}{b} \mathbf{A}_{\mathcal{B}}^T \mathbf{D}_{\mathcal{B}} \mathbf{A}_{\mathcal{B}}\]where \(\mathbf{A}_{\mathcal{B}}\) contains the rows of \(\mathbf{A}\) in the batch \(\mathcal{B}\) and \(\mathbf{D}_{\mathcal{B}}\) contains the corresponding diagonal entries of \(\mathbf{D}\).
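A minimal NumPy sketch of the full-data Hessian formula is given below. The helper name `logistic_hessian` and the random inputs are assumptions for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_hessian(x, A):
    """Return (1/m) * A^T D A with D_ii = sigma(A_i x) * (1 - sigma(A_i x))."""
    s = sigmoid(A @ x)
    d = s * (1.0 - s)                  # diagonal of D, shape (m,)
    return (A.T * d) @ A / A.shape[0]  # scales column i of A^T by d_i


rng = np.random.default_rng(2)
A = rng.normal(size=(6, 3))  # hypothetical features
x = rng.normal(size=3)
H = logistic_hessian(x, A)
```

Because each diagonal entry of \(\mathbf{D}\) lies in \((0, \frac{1}{4}]\), the result is symmetric positive semidefinite and its largest eigenvalue does not exceed \(\frac{1}{4} \max_i \|\mathbf{A}_i\|^2\).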
- proximal(x: Array, penalty: float) Array[source]#
Proximal at x solved using an iterative method.
The proximal of the logistic regression cost has no closed-form solution, so it is approximated with a gradient-based method over the entire dataset, using at most 100 iterations.
See
Cost.proximal() for the general proximal definition.
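One way such an iterative approximation could look is plain gradient descent on the proximal subproblem \(f(\mathbf{y}) + \frac{1}{2\rho}\|\mathbf{y} - \mathbf{x}\|^2\). The sketch below is an assumption about the general approach, not the method the library actually uses; the helper `logistic_prox`, the step size, and the stopping tolerance are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_prox(x, A, b, penalty, max_iter=100, tol=1e-10):
    """Approximate argmin_y f(y) + ||y - x||^2 / (2 * penalty) by gradient descent."""
    m = A.shape[0]
    # Smoothness of the subproblem: logistic bound plus the quadratic term.
    smooth = 0.25 * np.max(np.sum(A * A, axis=1)) + 1.0 / penalty
    y = x.copy()
    for _ in range(max_iter):
        grad = A.T @ (sigmoid(A @ y) - b) / m + (y - x) / penalty
        if np.linalg.norm(grad) < tol:
            break
        y = y - grad / smooth
    return y


rng = np.random.default_rng(3)
A = rng.normal(size=(10, 3))                    # hypothetical features
b = rng.integers(0, 2, size=10).astype(float)   # hypothetical 0/1 targets
x = rng.normal(size=3)
y = logistic_prox(x, A, b, penalty=0.5)
```

The quality of the approximation can be checked through the stationarity residual \(\nabla f(\mathbf{y}) + (\mathbf{y} - \mathbf{x})/\rho\), which should be close to zero at the returned point.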
- class decent_bench.costs.PyTorchCost(dataset: Dataset, model: torch.nn.Module, loss_fn: torch.nn.Module, final_activation: torch.nn.Module | None = None, *, batch_size: EmpiricalRiskBatchSize = 'all', max_batch_size: int | None = None, device: SupportedDevices = SupportedDevices.CPU, use_dataloader: bool = False, dataloader_kwargs: dict[str, Any] | None = None, load_dataset: bool = True, compile_model: bool = False, compile_kwargs: dict[str, Any] | None = None)[source]#
Bases:
EmpiricalRiskCost
Cost function wrapper for PyTorch neural networks that integrates with the decentralized optimization framework.
Supports batch-based training and gradient computation for decentralized learning scenarios.
Note
It is generally recommended to set agent_state_snapshot_period to a value greater than 1 when using PyTorchCost, as recording the full model parameters at every iteration can be expensive.
- __init__(dataset: Dataset, model: torch.nn.Module, loss_fn: torch.nn.Module, final_activation: torch.nn.Module | None = None, *, batch_size: EmpiricalRiskBatchSize = 'all', max_batch_size: int | None = None, device: SupportedDevices = SupportedDevices.CPU, use_dataloader: bool = False, dataloader_kwargs: dict[str, Any] | None = None, load_dataset: bool = True, compile_model: bool = False, compile_kwargs: dict[str, Any] | None = None)[source]#
Initialize the PyTorch cost function.
- Parameters:
dataset (Dataset) – Dataset partition containing features and targets. Transformations, such as converting to tensors, should be applied beforehand. See torch.utils.data.Dataset for details.
model (torch.nn.Module) – PyTorch neural network model.
loss_fn (torch.nn.Module) – PyTorch loss function.
final_activation (torch.nn.Module | None) – Optional final activation layer to apply after the model output when predicting targets using predict(), e.g. argmax for classification when the model outputs logits.
batch_size (EmpiricalRiskBatchSize) – Size of mini-batches for stochastic methods, or “all” for full-batch.
max_batch_size (int | None) – Optional maximum batch size to perform computations in, which can be used to avoid out-of-memory errors for large models or datasets. If specified, computations are performed in chunks of at most max_batch_size samples. This limit applies to all computations regardless of the batch_size or indices parameters; the result is unchanged. This is especially useful when indices is set to “all” but the dataset is too large to fit in memory at once. If not specified, it defaults to batch_size (if batch_size is an int) or to the total number of samples (if batch_size is “all”).
device (SupportedDevices) – Device to run computations on. Make sure to test CPU vs GPU performance for your specific model and dataset, as it can vary.
use_dataloader (bool) – Whether to use DataLoader for batching. Can be beneficial for large datasets that cannot fit into memory or when using an accelerator. DataLoaders cannot be pickled, so resuming an interrupted run starts with a new random batch order.
dataloader_kwargs (dict | None) – Additional arguments for the DataLoader.
load_dataset (bool) – If True, loads the entire dataset into memory to optimize data access. This may lead to major speedups if the dataset is lazily loaded (e.g., loading data from disk), but it might increase memory usage so set to False if memory is an issue. Setting this to False might break checkpointing if the underlying dataset is not pickleable.
compile_model (bool) – Whether to compile the model using torch.compile for performance. May improve speed after warm-up. Different modes may work better depending on the model and OS; pass them through compile_kwargs.
compile_kwargs (dict | None) – Additional arguments for torch.compile. A commonly used mode is “reduce-overhead” for performance optimization. See https://pytorch.org/docs/stable/generated/torch.compile.html for details.
- Raises:
ImportError – If PyTorch is not available
ValueError – If batch_size is larger than the number of samples in the dataset
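The claim that max_batch_size changes memory use but not the result can be illustrated with a framework-free sketch: accumulate per-sample values chunk by chunk and divide by the total count at the end. The helper `chunked_mean_loss` and the toy per-sample loss are hypothetical; the internals of PyTorchCost are not shown in this documentation.

```python
import numpy as np


def chunked_mean_loss(per_sample_loss, data, max_batch_size):
    """Average per_sample_loss over data, evaluating at most max_batch_size rows at a time."""
    total = 0.0
    for start in range(0, len(data), max_batch_size):
        chunk = data[start:start + max_batch_size]
        total += float(np.sum(per_sample_loss(chunk)))
    return total / len(data)


def square(rows):
    """Toy per-sample loss used only for this illustration."""
    return rows ** 2


data = np.arange(10, dtype=float)
chunked = chunked_mean_loss(square, data, max_batch_size=3)  # chunks of 3, 3, 3, 1
direct = float(np.mean(square(data)))                        # single pass
```

Both values equal 28.5 here; only the peak number of samples processed at once differs.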
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property m_smooth: float#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- property m_cvx: float#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- predict(x: torch.Tensor, data: list[torch.Tensor]) list[torch.Tensor][source]#
Make predictions at x on the given data.
- Parameters:
x – Point to make predictions at.
data – List of torch.Tensor containing features to make predictions on.
- Returns:
Predicted targets as an array
- Raises:
TypeError – If data is not a list of torch.Tensor or a single torch.Tensor.
- function(x: torch.Tensor, indices: EmpiricalRiskIndices = 'batch') float[source]#
Evaluate function at x using datapoints at the given indices.
The returned value is the mean loss over the selected samples.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with
batch_size samples.
- gradient(x: torch.Tensor, indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean') torch.Tensor[source]#
Gradient at x using datapoints at the given indices.
With reduction set to “mean” (the default), the returned gradient is the mean of the per-sample gradients over the selected samples.
- Supported values for indices are:
int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with
batch_size samples.
- Supported values for reduction are:
“mean”: average the gradients over the samples.
None: return one gradient per sample, with the sample index as the first dimension.
Note
When reduction is None, the returned array will have an additional leading dimension corresponding to the number of samples used. Indexing into this dimension will give the gradient for the respective sample in
batch_used.
- hessian(x: torch.Tensor, indices: EmpiricalRiskIndices = 'batch') torch.Tensor[source]#
Compute the Hessian matrix.
Note
This is computationally expensive for neural networks and typically not used.
- Raises:
NotImplementedError – Always raised to indicate Hessian computation is not implemented.
- proximal(x: torch.Tensor, penalty: float) torch.Tensor[source]#
Compute the proximal operator.
Note
This is computationally expensive for neural networks and typically not used.
- Raises:
NotImplementedError – Always raised to indicate proximal computation is not implemented.
- init_local_training(opt_cls: type[torch.optim.Optimizer], opt_kwargs: dict[str, Any] | None = None, sched_cls: type[torch.optim.lr_scheduler.LRScheduler] | None = None, sched_kwargs: dict[str, Any] | None = None) None[source]#
Initialize the optimizer and scheduler for local training.
This method must be called before local_training() in order to set up the optimizer and scheduler.
- Parameters:
opt_cls (type[torch.optim.Optimizer]) – PyTorch optimizer class to use for local training.
opt_kwargs (dict[str, Any] | None) – Keyword arguments for initializing the optimizer. The model parameters will be passed as the first argument, so do not include them in opt_kwargs.
sched_cls (type[torch.optim.lr_scheduler.LRScheduler] | None) – Optional PyTorch learning rate scheduler class to use for local training. The scheduler will be stepped once at the end of each call to
local_training().
sched_kwargs (dict[str, Any] | None) – Keyword arguments for initializing the scheduler. The optimizer will be passed as the first argument, so do not include it in sched_kwargs.
- Raises:
RuntimeError – If the optimizer is already initialized. This method is intended to be called only once to set the optimizer for local training.
- local_training(x: torch.Tensor, iterations: int, agent: Agent, regularization: torch.Tensor | Callable[[torch.Tensor], torch.Tensor] | None, indices: EmpiricalRiskIndices = 'batch') torch.Tensor[source]#
Perform local training steps using the provided optimizer.
Note
This method is intended to be used in decentralized algorithms that support local training.
- Parameters:
x (torch.Tensor) – Initial parameters to start local training from.
iterations (int) – Number of local training iterations to perform.
agent (Agent) – The agent performing the local training.
regularization (torch.Tensor | Callable[[torch.Tensor], torch.Tensor] | None) –
Optional regularization. Two forms are supported:
Scalar tensor (or callable returning a scalar): interpreted as an additive loss penalty.
Flat tensor with the same number of elements as the flattened parameter vector: interpreted as a parameter-space correction step applied after each optimizer step (i.e., \(x \leftarrow x - r\)).
indices (EmpiricalRiskIndices) – Indices of the samples to use for local training.
- Returns:
Updated parameters after local training.
- Return type:
torch.Tensor
- Raises:
RuntimeError – If no optimizer was provided during initialization.
ValueError – If regularization is a non-scalar tensor but does not have the same number of elements as the flattened parameter vector.
TypeError – If regularization is not a torch.Tensor or a callable returning a torch.Tensor.
- _sample_batch_indices(indices: EmpiricalRiskIndices = 'batch') list[int][source]#
Not used in PyTorchCost; batch selection is implemented in _get_batch_data.
- _get_batch_data(indices: EmpiricalRiskIndices = 'batch') list[tuple[torch.Tensor, torch.Tensor]][source]#
Get a list of batches for the specified indices. Each list item is a tuple (batch_x, batch_y).
The max size of each batch is determined by self._max_batch_size.
- Raises:
RuntimeError – If batch data could not be retrieved, which should not happen under normal circumstances
- class decent_bench.costs.QuadraticCost(A: Array, b: Array, c: float = 0)[source]#
Bases:
Cost
Quadratic cost function.
\[f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T \mathbf{Ax} + \mathbf{b}^T \mathbf{x} + c\]
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property m_smooth: float[source]#
The cost function’s smoothness constant.
\[\max_{i} \left| \lambda_i \right|\]where \(\lambda_i\) are the eigenvalues of \(\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\).
For the general definition, see
Cost.m_smooth.
- property m_cvx: float[source]#
The cost function’s convexity constant.
\[\begin{split}\begin{array}{ll} \min_i \lambda_i, & \text{if } \min_i \lambda_i > 0, \\ 0, & \text{if } \min_i \lambda_i = 0, \\ \text{NaN}, & \text{if } \min_i \lambda_i < 0 \end{array}\end{split}\]where \(\lambda_i\) are the eigenvalues of \(\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\).
For the general definition, see
Cost.m_cvx.
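The two constants above can be sketched with NumPy by taking the eigenvalues of the symmetric part of \(\mathbf{A}\). The helper `quadratic_constants` is hypothetical, and the exact comparison with zero mirrors the case distinction in the formula; a practical implementation might prefer a tolerance.

```python
import numpy as np


def quadratic_constants(A):
    """Return (m_smooth, m_cvx) from the eigenvalues of 0.5 * (A + A^T)."""
    eigs = np.linalg.eigvalsh(0.5 * (A + A.T))
    m_smooth = float(np.max(np.abs(eigs)))
    lam_min = float(np.min(eigs))
    if lam_min > 0:
        m_cvx = lam_min
    elif lam_min == 0:
        m_cvx = 0.0
    else:
        m_cvx = float("nan")  # not guaranteed to be convex
    return m_smooth, m_cvx


# The symmetric part is diag(2, 3): strongly convex.
convex = quadratic_constants(np.array([[2.0, 1.0], [-1.0, 3.0]]))
# Eigenvalues 1 and -4: indefinite.
indefinite = quadratic_constants(np.array([[1.0, 0.0], [0.0, -4.0]]))
```

The first matrix gives m_smooth 3 and m_cvx 2; the second gives m_smooth 4 and a NaN convexity constant.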
- function(x: ndarray[tuple[Any, ...], dtype[float64]]) float[source]#
Evaluate function at x.
\[\frac{1}{2} \mathbf{x}^T \mathbf{Ax} + \mathbf{b}^T \mathbf{x} + c\]
- gradient(x: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]][source]#
Gradient at x.
\[\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\mathbf{x} + \mathbf{b}\]
- hessian(x: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]][source]#
Hessian at x.
\[\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\]
- proximal(x: ndarray[tuple[Any, ...], dtype[float64]], penalty: float) ndarray[tuple[Any, ...], dtype[float64]][source]#
Proximal at x.
\[(\frac{\rho}{2} (\mathbf{A} + \mathbf{A}^T) + \mathbf{I})^{-1} (\mathbf{x} - \rho \mathbf{b})\]where \(\rho > 0\) is the penalty.
This is a closed form solution, see
Cost.proximal() for the general proximal definition.
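The closed form above amounts to a single linear solve. The following NumPy sketch uses a hypothetical helper `quadratic_prox` and made-up inputs.

```python
import numpy as np


def quadratic_prox(x, A, b, penalty):
    """Solve (penalty / 2 * (A + A^T) + I) y = x - penalty * b for y."""
    n = x.shape[0]
    lhs = 0.5 * penalty * (A + A.T) + np.eye(n)
    return np.linalg.solve(lhs, x - penalty * b)


A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([1.0, -1.0])
x = np.array([3.0, 1.0])
y = quadratic_prox(x, A, b, penalty=0.5)  # the system is diag(2, 3) y = [2.5, 1.5]
```

The result is \((1.25, 0.5)\), and it satisfies the optimality condition \(\frac{1}{2}(\mathbf{A} + \mathbf{A}^T)\mathbf{y} + \mathbf{b} + (\mathbf{y} - \mathbf{x})/\rho = \mathbf{0}\). Using a linear solve instead of forming the inverse explicitly is a common numerical choice.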
- class decent_bench.costs.ScaledCost(cost: Cost, scalar: float)[source]#
Bases:
Cost
Generic scalar wrapper for arbitrary costs.
ScaledCost is the fallback result of scalar arithmetic when no more specialized wrapper is available. It delegates evaluation, gradient, Hessian, and metadata to the wrapped cost, and preserves proximal support only for nonnegative scalars.
Instances keep references to the wrapped cost objects. No implicit copying is performed; use copy.deepcopy() explicitly if independent objects are required.
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property m_smooth: float[source]#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- property m_cvx: float[source]#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- proximal(x: Array, penalty: float, *args: Any, **kwargs: Any) Array[source]#
Proximal at x.
The proximal operator is defined as:
\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]where \(\rho > 0\) is the penalty and \(f\) the cost function.
If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using
proximal_solver().
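For a strictly positive scalar \(c\), the definition above gives \(\operatorname{prox}_{\rho (c f)}(\mathbf{x}) = \operatorname{prox}_{(c\rho) f}(\mathbf{x})\), so a scaled cost can reuse the proximal of the wrapped cost with a rescaled penalty. The sketch below illustrates this identity only; whether ScaledCost is implemented this way is an assumption, and both helper functions are hypothetical.

```python
import numpy as np


def prox_half_sq_norm(x, penalty):
    """Proximal of f(y) = 0.5 * ||y||^2, which is x / (1 + penalty)."""
    return x / (1.0 + penalty)


def prox_scaled(prox_f, scalar, x, penalty):
    """Proximal of scalar * f for scalar > 0, obtained by rescaling the penalty."""
    if scalar <= 0:
        raise ValueError("this sketch only covers strictly positive scalars")
    return prox_f(x, scalar * penalty)


x = np.array([6.0, -3.0])
# Minimizes 2 * 0.5 * ||y||^2 + 0.5 * ||y - x||^2, whose solution is x / 3.
y = prox_scaled(prox_half_sq_norm, 2.0, x, penalty=1.0)
```

The identity does not carry over to negative scalars, since \(c f\) is then generally no longer convex even when \(f\) is.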
- __add__(other: Cost) Cost[source]#
Add another cost function to create a new one.
The generic fallback returns
SumCost([self, other]). Subclasses can override this to preserve specialized structure when the result remains in the same abstraction. For example, the addition of two QuadraticCost objects benefits from returning a new QuadraticCost instead of a SumCost as this preserves the closed form proximal solution and only requires one evaluation instead of two when calling function(), gradient(), and hessian().
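The QuadraticCost example rests on the fact that the sum of two quadratics is again a quadratic with summed coefficients. The sketch below shows this with plain \((\mathbf{A}, \mathbf{b}, c)\) triples; the helpers `add_quadratics` and `evaluate` are hypothetical stand-ins and do not use the decent_bench classes.

```python
import numpy as np


def add_quadratics(q1, q2):
    """Combine two (A, b, c) triples into the triple of their sum."""
    (A1, b1, c1), (A2, b2, c2) = q1, q2
    return A1 + A2, b1 + b2, c1 + c2


def evaluate(q, x):
    """Evaluate 0.5 * x^T A x + b^T x + c."""
    A, b, c = q
    return float(0.5 * x @ A @ x + b @ x + c)


q1 = (np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([1.0, -1.0]), 0.5)
q2 = (np.array([[1.0, 1.0], [1.0, 3.0]]), np.array([0.0, 2.0]), -1.0)
q_sum = add_quadratics(q1, q2)

x = np.array([1.0, 2.0])
separate = evaluate(q1, x) + evaluate(q2, x)  # two evaluations
combined = evaluate(q_sum, x)                 # one evaluation
```

Both routes give 14.0 at this point, but the combined form needs a single evaluation and keeps the closed-form proximal available.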
- class decent_bench.costs.SumCost(costs: list[Cost])[source]#
Bases:
Cost
Generic additive fallback for cost composition.
SumCost is returned when two costs can be added but no more specialized composite is available. It preserves the core Cost interface, but does not preserve regularizer-specific or empirical-risk-specific behavior.
Instances keep references to the wrapped cost objects. No implicit copying is performed; use copy.deepcopy() explicitly if independent objects are required.
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property m_smooth: float[source]#
The cost function’s smoothness constant.
\[\sum m_{\text{smooth}, k}\]where \(m_{\text{smooth}, k}\) is the smoothness constant of each individual cost function \(f_k\). If any \(m_{\text{smooth}, k} = \text{NaN}\), the result is \(\text{NaN}\).
For the general definition, see
Cost.m_smooth.
- property m_cvx: float[source]#
The cost function’s convexity constant.
\[\sum m_{\text{cvx}, k}\]where \(m_{\text{cvx}, k}\) is the convexity constant of each individual cost function \(f_k\). If any \(m_{\text{cvx}, k} = \text{NaN}\), the result is \(\text{NaN}\).
For the general definition, see
Cost.m_cvx.
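The summation rule with NaN propagation, used for both constants above, can be sketched in a few lines of standard-library Python. The helper `sum_constants` is hypothetical; plain float addition already propagates NaN, and the explicit check only makes the rule visible.

```python
import math


def sum_constants(values):
    """Sum per-term constants; any NaN makes the result NaN."""
    if any(math.isnan(v) for v in values):
        return math.nan
    return sum(values)


total = sum_constants([1.0, 0.5, 2.0])      # all terms well defined
undefined = sum_constants([1.0, math.nan])  # one term has no valid constant
```

The first call gives 3.5, while the second gives NaN because one term has no valid constant.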
- function(x: Array, *args: Any, **kwargs: Any) float[source]#
Sum the
Cost.function of each cost function.
- gradient(x: Array, *args: Any, **kwargs: Any) Array[source]#
Sum the
Cost.gradient of each cost function.
- hessian(x: Array, *args: Any, **kwargs: Any) Array[source]#
Sum the
Cost.hessian of each cost function.
- proximal(x: Array, penalty: float, *args: Any, **kwargs: Any) Array[source]#
Approximate the proximal of the full summed objective.
SumCost computes its proximal through decent_bench.centralized_algorithms.proximal_solver(), which solves the proximal subproblem for the full summed objective using accelerated gradient descent. Extra args and kwargs are ignored.
- Raises:
NotImplementedError – If the accelerated-gradient backend assumptions are not satisfied.
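To give a rough idea of what solving the proximal subproblem with accelerated gradient descent can look like, here is a NumPy sketch using Nesterov momentum with a constant coefficient. It is not the code of proximal_solver(): the function name, the iteration count, and the assumption that the summed objective is convex and smooth are all choices made for this illustration.

```python
import numpy as np


def prox_by_accelerated_gradient(grad_f, m_smooth, x, penalty, iterations=200):
    """Approximate argmin_y f(y) + ||y - x||^2 / (2 * penalty) with Nesterov momentum."""
    smooth = m_smooth + 1.0 / penalty  # smoothness of the subproblem
    strong = 1.0 / penalty             # strong convexity from the quadratic term (f convex)
    beta = (np.sqrt(smooth) - np.sqrt(strong)) / (np.sqrt(smooth) + np.sqrt(strong))
    y = x.copy()
    z = x.copy()
    for _ in range(iterations):
        grad = grad_f(z) + (z - x) / penalty
        y_next = z - grad / smooth
        z = y_next + beta * (y_next - y)  # momentum extrapolation
        y = y_next
    return y


# Hypothetical summed objective: two convex quadratics 0.5 * y^T A_k y + b_k^T y.
A1, A2 = np.diag([1.0, 2.0]), np.diag([3.0, 0.5])
b1, b2 = np.array([1.0, 0.0]), np.array([0.0, -2.0])


def grad_sum(y):
    return (A1 + A2) @ y + b1 + b2


m_smooth_sum = 2.0 + 3.0  # sum of the per-term smoothness constants
x = np.array([1.0, 1.0])
penalty = 1.0
y = prox_by_accelerated_gradient(grad_sum, m_smooth_sum, x, penalty)
```

For this quadratic example the exact proximal is available in closed form, \((\rho(\mathbf{A}_1 + \mathbf{A}_2) + \mathbf{I})^{-1}(\mathbf{x} - \rho(\mathbf{b}_1 + \mathbf{b}_2))\), which makes it easy to compare against the iterative result.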
- class decent_bench.costs.ZeroCost(shape: tuple[int, ...], framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#
Bases:
Cost
A cost function that is identically zero.
This function is used as the default for the server in FedNetwork.
- __init__(shape: tuple[int, ...], framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#
- property framework: SupportedFrameworks#
The framework used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.
- property device: SupportedDevices#
The device used by this cost function.
Make sure that all
decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.
- property m_smooth: float[source]#
Lipschitz constant of the cost function’s gradient.
The gradient’s Lipschitz constant m_smooth is the smallest value such that
\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere
- property m_cvx: float[source]#
Convexity constant of the cost function.
The convexity constant m_cvx is the largest value such that
\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).
- Returns:
positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
- proximal(x: Array, penalty: float, **kwargs: Any) Array[source]#
Return x unchanged.
Since ZeroCost is identically zero, its proximal operator is the identity map. The method still validates that penalty is positive and that x has the expected shape.
- Raises:
ValueError – If penalty is not positive or x has the wrong shape.
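The behavior described above can be sketched as an identity map with two checks. The function `zero_cost_prox` is a hypothetical stand-in working on NumPy arrays; whether the library returns the same object or a copy is not stated here.

```python
import numpy as np


def zero_cost_prox(x, penalty, shape):
    """Identity proximal that validates the penalty and the shape of x."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    if x.shape != shape:
        raise ValueError(f"expected shape {shape}, got {x.shape}")
    return x


x = np.ones(3)
y = zero_cost_prox(x, penalty=1.0, shape=(3,))  # returned unchanged
```

Calling the sketch with a non-positive penalty or a mismatched shape raises ValueError, matching the documented contract.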
- __add__(other: Cost) Cost[source]#
Add another cost function to create a new one.
The generic fallback returns
SumCost([self, other]). Subclasses can override this to preserve specialized structure when the result remains in the same abstraction. For example, the addition of two QuadraticCost objects benefits from returning a new QuadraticCost instead of a SumCost as this preserves the closed form proximal solution and only requires one evaluation instead of two when calling function(), gradient(), and hessian().