decent_bench.costs#

Cost composition rules.

Developer note:

Generic cost arithmetic falls back to SumCost and ScaledCost.

Regularizers preserve their abstraction under +, -, unary -, *, and / by returning BaseRegularizerCost-aware composites. Empirical-risk costs preserve their abstraction under scalar scaling through a private empirical scaling wrapper, and under empirical + regularizer through EmpiricalRegularizedCost. Unsupported mixed compositions still fall back to the generic wrappers.

EmpiricalRegularizedCost.gradient uses broadcast semantics when reduction=None: it returns one composite gradient per sample by adding the regularizer gradient to each per-sample empirical gradient. Averaging over the leading sample dimension recovers the composite mean gradient.

Composition wrappers keep references to their underlying cost objects; they do not make implicit shallow or deep copies at construction time. Mutating a wrapped cost after composition therefore also changes the composite, and agent-installed call-counting hooks on reused cost objects are shared in the same way. Use copy.deepcopy() explicitly when an independent copy of a composed objective, or independent counting behavior, is needed.
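The reference-sharing behavior can be illustrated with a minimal sketch. ToyCost and ToySum are hypothetical stand-ins written for this example; they are not decent_bench classes and only mimic the "keep references, never copy" rule described above.

```python
import copy


class ToyCost:
    """Hypothetical stand-in for a cost object (not part of decent_bench)."""

    def __init__(self, weight):
        self.weight = weight

    def function(self, x):
        return self.weight * x * x


class ToySum:
    """Hypothetical composite that stores references to its terms."""

    def __init__(self, terms):
        # A new list, but the same term objects: nothing is copied.
        self.terms = list(terms)

    def function(self, x):
        return sum(term.function(x) for term in self.terms)


a, b = ToyCost(1.0), ToyCost(2.0)
shared = ToySum([a, b])
independent = copy.deepcopy(shared)  # explicit, fully independent copy

a.weight = 10.0  # mutate a wrapped cost after composition

shared_value = shared.function(1.0)            # sees the mutation
independent_value = independent.function(1.0)  # unaffected by it
```

Here the shared composite reflects the mutated weight while the deep copy still evaluates with the original one.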

Proximal support is intentionally conservative for the specialized wrappers: concrete costs may implement specialized proximals, positive scalar scaling preserves proximal support, and a single positively scaled regularizer term preserves regularizer proximal support. SumCost computes the proximal of the full summed objective through decent_bench.centralized_algorithms.proximal_solver() when that accelerated-gradient backend is applicable. Multi-term regularizer composites and EmpiricalRegularizedCost do not provide a generic proximal. Use a specialized proximal if one exists, or use decent_bench.centralized_algorithms.proximal_solver() when applicable.

class decent_bench.costs.BaseRegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#

Bases: Cost

Base class for regularizers with regularizer-preserving arithmetic.

Adding, subtracting, negating, scaling, or dividing regularizers returns another regularizer subclass instead of falling back immediately to generic SumCost or ScaledCost. This preserves regularizer-specific structure and can improve performance. Mixing a regularizer with an arbitrary non-regularizer still falls back to generic cost composition.

__init__(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

__add__(other: BaseRegularizerCost) → BaseRegularizerCost[source]#
__add__(other: Cost) → Cost: Add another cost, preserving the regularizer abstraction when possible.

class decent_bench.costs.Cost[source]#

Bases: ABC

Used by agents to evaluate the cost and its derivatives at a certain x.

abstract property shape: tuple[int, ...]#: Required shape of x.

property domain_shape: tuple[int, ...]#: Alias for shape.

property size: int[source]#: Number of elements in x.

abstract property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

abstract property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

abstract property m_smooth: float#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

abstract property m_cvx: float#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex
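As a worked illustration of both constants, consider the quadratic f(x) = 0.5 xᵀQx with symmetric Q, whose Hessian is Q everywhere. The sketch below uses plain NumPy and does not call decent_bench; the matrix Q and the helper logic are assumptions chosen for the example.

```python
import numpy as np

# f(x) = 0.5 * x^T Q x, so the gradient is Q x and the Hessian is Q.
Q = np.array([[2.0, 0.0], [0.0, 0.5]])
eigenvalues = np.linalg.eigvalsh(Q)

# The gradient Q x is Lipschitz with constant equal to the spectral norm of Q.
m_smooth = float(np.max(np.abs(eigenvalues)))

# Follow the return convention listed above for the convexity constant.
smallest = float(eigenvalues.min())
if smallest > 0:
    m_cvx = smallest
elif smallest == 0:
    m_cvx = 0.0
else:
    m_cvx = float("nan")
```

For this Q the function is both L-smooth and strongly convex, so both constants are finite and positive.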

abstractmethod function(x: Array, **kwargs: Any) → float[source]#: Evaluate function at x.

evaluate(x: Array, **kwargs: Any) → float[source]#: Alias for function().

loss(x: Array, **kwargs: Any) → float[source]#: Alias for function().

f(x: Array, **kwargs: Any) → float[source]#: Alias for function().

abstractmethod gradient(x: Array, **kwargs: Any) → Array[source]#: Gradient at x.

abstractmethod hessian(x: Array, **kwargs: Any) → Array[source]#: Hessian at x.

abstractmethod proximal(x: Array, penalty: float, **kwargs: Any) → Array[source]#

Proximal at x.

The proximal operator is defined as:

\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]

where \(\rho > 0\) is the penalty and \(f\) the cost function.

If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using proximal_solver().

__add__(other: Cost) → Cost[source]#

Add another cost function to create a new one.

The generic fallback returns SumCost([self, other]). Subclasses can override this to preserve specialized structure when the result remains in the same abstraction. For example, adding two QuadraticCost objects can return a new QuadraticCost instead of a SumCost, which preserves the closed-form proximal solution and requires one evaluation instead of two when calling function(), gradient(), and hessian().
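The override pattern can be sketched with hypothetical toy classes. None of the names below belong to decent_bench; they only illustrate a generic fallback alongside a structure-preserving override of __add__.

```python
class ToyCost:
    """Hypothetical base class with a generic additive fallback."""

    def function(self, x):
        raise NotImplementedError

    def __add__(self, other):
        return ToySum([self, other])


class ToySum(ToyCost):
    """Generic wrapper: evaluates every term separately."""

    def __init__(self, terms):
        self.terms = terms

    def function(self, x):
        return sum(term.function(x) for term in self.terms)


class ToyQuadratic(ToyCost):
    """Scalar quadratic f(x) = 0.5 * a * x^2 + b * x + c."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def function(self, x):
        return 0.5 * self.a * x * x + self.b * x + self.c

    def __add__(self, other):
        if isinstance(other, ToyQuadratic):
            # The sum of two quadratics is a quadratic: stay in the abstraction.
            return ToyQuadratic(self.a + other.a, self.b + other.b, self.c + other.c)
        return super().__add__(other)


class ToyAbs(ToyCost):
    def function(self, x):
        return abs(x)


merged = ToyQuadratic(1.0, 2.0, 3.0) + ToyQuadratic(3.0, 0.0, 1.0)
mixed = ToyQuadratic(1.0, 2.0, 3.0) + ToyAbs()
```

The first sum stays a single quadratic, while the mixed sum falls back to the generic wrapper.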

class decent_bench.costs.EmpiricalRegularizedCost(empirical_cost: EmpiricalRiskCost, regularizer: BaseRegularizerCost)[source]#

Bases: EmpiricalRiskCost

Composite objective of an empirical risk term plus a regularizer.

This wrapper preserves empirical-risk-specific behavior from the empirical term, including predict(), dataset access, and batch metadata, while combining function, gradient, and Hessian values with the regularizer. When gradient() is called with reduction=None, the regularizer gradient is broadcast across the leading sample dimension so that averaging over samples recovers the composite mean gradient. A generic proximal is intentionally not implemented.

Instances keep references to the wrapped cost objects. No implicit copying is performed; use copy.deepcopy() explicitly if independent objects are required.

__init__(empirical_cost: EmpiricalRiskCost, regularizer: BaseRegularizerCost)[source]#

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property n_samples: int#: Total number of samples in dataset.

property batch_size: int#: Batch size used for stochastic methods.

property batch_used: list[int]#

Indices of samples used in the most recent batch.

Raises: ValueError – If no batch has been used yet.

property dataset: Dataset#: Dataset used in the empirical risk cost.

property m_smooth: float[source]#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

property m_cvx: float[source]#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex

predict(x: Array, data: list[Array]) → Array[source]#: Predictions are determined by the empirical term.

function(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) → float[source]#

Evaluate function at x using datapoints at the given indices.

The returned value is the mean loss over the selected samples.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

gradient(x: Array, indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean', **kwargs: Any) → Array[source]#

Gradient of the empirical objective plus regularizer.

When reduction="mean", this returns the mean empirical gradient over the selected samples plus the regularizer gradient.

When reduction=None, this returns one gradient per selected sample with the regularizer gradient broadcast along the leading sample dimension. Averaging the result over that leading dimension recovers the composite gradient returned by reduction="mean".
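The broadcast rule can be checked numerically. The sketch below uses plain NumPy with a least-squares loss and a scaled L2 regularizer; the data, the weight lam, and all variable names are assumptions for illustration and do not call decent_bench.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # 5 samples, 3 parameters
b = rng.normal(size=5)
x = rng.normal(size=3)
lam = 0.1                     # hypothetical regularization weight

# Per-sample gradients of 0.5 * (A_i x - b_i)^2, shape (5, 3).
residuals = A @ x - b
per_sample_empirical = residuals[:, None] * A

# Gradient of lam * 0.5 * ||x||^2, shape (3,).
regularizer_gradient = lam * x

# reduction=None analogue: broadcast the regularizer gradient to every sample.
per_sample_composite = per_sample_empirical + regularizer_gradient

# reduction="mean" analogue: mean empirical gradient plus regularizer gradient.
mean_composite = per_sample_empirical.mean(axis=0) + regularizer_gradient
```

Averaging per_sample_composite over its leading axis gives mean_composite, because the regularizer gradient is added once to every row.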

hessian(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) → Array[source]#

Hessian at x using datapoints at the given indices.

The returned Hessian is the mean of per-sample Hessians over the selected samples.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

proximal(x: Array, penalty: float, **kwargs: Any) → Array[source]#

Raise NotImplementedError for the generic proximal of an empirical cost plus regularizer.

This wrapper preserves evaluation, gradient, and Hessian structure, but does not imply a closed-form proximal. Use a specialized composite cost if one exists, or use decent_bench.centralized_algorithms.proximal_solver() when its assumptions are satisfied.

Raises: NotImplementedError – Always, because no generic closed-form proximal is provided.

_sample_batch_indices(indices: EmpiricalRiskIndices = 'batch') → list[int][source]#

Sample a batch of indices if indices is “batch”, otherwise use the given indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

This method uses batch_size to determine the size of the batch. For indices="batch" with batch_size < n_samples, batches are sampled without replacement across successive calls until the full dataset is covered (epoch-style sampling). When there are fewer unseen indices left than batch_size, the remaining unseen indices are used first, and the rest of the batch is drawn from a newly shuffled epoch.

Once a batch is sampled, it is also stored in batch_used for later reference.

Override this method for custom sampling strategies. Do not forget to update _last_batch_used accordingly if you override this method.

Returns: List of sampled indices.
Raises: ValueError – If an invalid string is provided for indices.
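The epoch-style sampling described for indices="batch" can be sketched as follows. EpochSampler is a hypothetical helper written for this example using only the standard library; it is not the decent_bench implementation and only mirrors the documented behavior.

```python
import random


class EpochSampler:
    """Hypothetical epoch-style batch sampler (not decent_bench API)."""

    def __init__(self, n_samples, batch_size, seed=0):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self._rng = random.Random(seed)
        self._unseen = []  # indices not yet used in the current epoch

    def _new_epoch(self):
        order = list(range(self.n_samples))
        self._rng.shuffle(order)
        return order

    def sample(self):
        batch = []
        while len(batch) < self.batch_size:
            if not self._unseen:
                # Current epoch exhausted: start a newly shuffled one.
                self._unseen = self._new_epoch()
            take = self.batch_size - len(batch)
            # Use remaining unseen indices first, then continue in the new epoch.
            batch.extend(self._unseen[:take])
            self._unseen = self._unseen[take:]
        return batch


sampler = EpochSampler(n_samples=5, batch_size=2)
batches = [sampler.sample() for _ in range(3)]
first_epoch = batches[0] + batches[1] + batches[2][:1]
```

With five samples and a batch size of two, the third batch takes the one remaining unseen index and fills the rest from the next epoch, so a batch that straddles two epochs may repeat an index.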

_get_batch_data(indices: EmpiricalRiskIndices = 'batch') → Any[source]#

Get training data corresponding to the given batch indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

Make sure to call _sample_batch_indices(indices) to handle batch sampling and tracking.

__add__(other: Cost) → Cost[source]#: Add another cost, preserving the empirical-risk abstraction for regularization.

class decent_bench.costs.EmpiricalRiskCost[source]#

Bases: Cost, ABC

Base class for empirical risk cost functions.

This class provides an interface for implementing various empirical risk minimization problems, supporting both full-batch and mini-batch computations. It is designed to work with a Dataset in which each datapoint is a tuple of (features, target), or (features, None) for unsupervised learning.

All empirical risk values, gradients, and Hessians are defined as means over the selected samples (full dataset or batch), not sums.

Mathematical Definition#

Given a dataset with \(m\) samples \(\{d_i\}_{i=1}^{m}\), the empirical risk is defined as:

\[f(x) = \frac{1}{m} \sum_{i=1}^{m} \ell(x, d_i)\]

where:

\(x\) are the model parameters
\(\ell\) is the loss function measuring the discrepancy between predictions and true targets

Stochastic Variant#

For large datasets, computing the full empirical risk can be expensive. Instead, a stochastic approximation using a mini-batch of size \(b < m\) is often used:

\[f(x) = \frac{1}{b} \sum_{i \in \mathcal{B}} \ell(x, d_i)\]

where \(\mathcal{B}\) is a randomly sampled batch of \(b\) indices from \(\{1, \ldots, m\}\).

abstract property n_samples: int#: Total number of samples in dataset.

abstract property batch_size: int#: Batch size used for stochastic methods.

property batch_used: list[int]#

Indices of samples used in the most recent batch.

Raises: ValueError – If no batch has been used yet.

abstract property dataset: Dataset#: Dataset used in the empirical risk cost.

abstractmethod predict(x: Array, data: list[Array]) → Array[source]#

Make predictions using the model parameters x on the given data.

Returns: Predicted targets as an array.

abstractmethod function(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) → float[source]#

Evaluate function at x using datapoints at the given indices.

The returned value is the mean loss over the selected samples.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

__add__(other: Cost) → Cost[source]#: Add another cost, preserving the empirical-risk abstraction for regularization.

evaluate(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) → float[source]#: Alias for function().

loss(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) → float[source]#: Alias for function().

f(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) → float[source]#: Alias for function().

abstractmethod gradient(x: Array, indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean', **kwargs: Any) → Array[source]#

Gradient at x using datapoints at the given indices.

The returned gradient is the mean of per-sample gradients over the selected samples.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

Supported values for reduction are:

“mean”: average the gradients over the samples.
None: return one gradient per sample, indexed along the first dimension.

Note

When reduction is None, the returned array will have an additional leading dimension corresponding to the number of samples used. Indexing into this dimension will give the gradient for the respective sample in batch_used.

abstractmethod hessian(x: Array, indices: EmpiricalRiskIndices = 'batch', **kwargs: Any) → Array[source]#

Hessian at x using datapoints at the given indices.

The returned Hessian is the mean of per-sample Hessians over the selected samples.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

proximal(x: Array, penalty: float, **kwargs: Any) → Array[source]#

Proximal at x using the full dataset.

The proximal operator is defined as:

\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]

where \(\rho > 0\) is the penalty and \(f\) the cost function.

If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using proximal_solver().

_sample_batch_indices(indices: EmpiricalRiskIndices = 'batch') → list[int][source]#

Sample a batch of indices if indices is “batch”, otherwise use the given indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

Once a batch is sampled, it is also stored in batch_used for later reference.

Override this method for custom sampling strategies. Do not forget to update _last_batch_used accordingly if you override this method.

Returns: List of sampled indices.
Raises: ValueError – If an invalid string is provided for indices.

abstractmethod _get_batch_data(indices: EmpiricalRiskIndices = 'batch') → Any[source]#

Get training data corresponding to the given batch indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

Make sure to call _sample_batch_indices(indices) to handle batch sampling and tracking.

class decent_bench.costs.FractionalQuadraticRegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU, prox_max_iter: int = 100, prox_tol: float | None = 1e-10)[source]#

Bases: BaseRegularizerCost

Nonconvex fractional quadratic regularizer.

\[f(\mathbf{x}) = \sum_i \frac{x_i^2}{1 + x_i^2}\]
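Differentiating each term of this definition gives the elementwise gradient 2x_i / (1 + x_i^2)^2 and the second derivative (2 - 6x_i^2) / (1 + x_i^2)^3, which is negative for |x_i| > 1/sqrt(3) and explains the nonconvexity. The NumPy sketch below checks the gradient against central finite differences; the function names are assumptions for this example and are not decent_bench API.

```python
import numpy as np


def fractional_quadratic(x):
    # f(x) = sum_i x_i^2 / (1 + x_i^2)
    return float(np.sum(x**2 / (1.0 + x**2)))


def fractional_quadratic_gradient(x):
    # d/dx_i of x_i^2 / (1 + x_i^2) = 2 x_i / (1 + x_i^2)^2
    return 2.0 * x / (1.0 + x**2) ** 2


def second_derivative(t):
    # Elementwise curvature; negative once |t| exceeds 1 / sqrt(3).
    return (2.0 - 6.0 * t**2) / (1.0 + t**2) ** 3


x = np.array([0.0, 0.5, -1.0, 2.0])
eps = 1e-6
numeric = np.array([
    (fractional_quadratic(x + eps * e) - fractional_quadratic(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])
analytic = fractional_quadratic_gradient(x)
curvature_at_one = second_derivative(1.0)
```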

__init__(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU, prox_max_iter: int = 100, prox_tol: float | None = 1e-10)[source]#

property m_smooth: float[source]#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

property m_cvx: float[source]#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex

function(x: Array, **kwargs: Any) → float[source]#: Evaluate function at x.

gradient(x: Array, **kwargs: Any) → Array[source]#: Gradient at x.

hessian(x: Array, **kwargs: Any) → Array[source]#: Hessian at x.

proximal(x: Array, penalty: float, **kwargs: Any) → Array[source]#

Proximal at x.

The proximal operator is defined as:

\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]

where \(\rho > 0\) is the penalty and \(f\) the cost function.

If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using proximal_solver().

class decent_bench.costs.L1RegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#

Bases: BaseRegularizerCost

L1 regularizer cost.

\[f(\mathbf{x}) = \|\mathbf{x}\|_1 = \sum_i |x_i|\]

property m_smooth: float[source]#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

property m_cvx: float[source]#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex

function(x: Array, **kwargs: Any) → float[source]#: Evaluate function at x.

gradient(x: Array, **kwargs: Any) → Array[source]#: Gradient at x.

hessian(x: Array, **kwargs: Any) → Array[source]#: Hessian at x.

proximal(x: Array, penalty: float, **kwargs: Any) → Array[source]#

Proximal at x.

The proximal operator is defined as:

\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]

where \(\rho > 0\) is the penalty and \(f\) the cost function.

If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using proximal_solver().
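For the L1 norm, the proximal operator defined above has the well-known soft-thresholding closed form, sign(x_i) * max(|x_i| - rho, 0). The sketch below implements that formula in NumPy; whether L1RegularizerCost.proximal() is implemented in exactly this way is not stated on this page, so soft_threshold is a hypothetical helper.

```python
import numpy as np


def soft_threshold(x, penalty):
    # Shrink every entry toward zero by `penalty`; entries within
    # [-penalty, penalty] are set to zero.
    return np.sign(x) * np.maximum(np.abs(x) - penalty, 0.0)


x = np.array([3.0, -0.5, 1.0, -2.0])
y = soft_threshold(x, penalty=1.0)
```

Entries with magnitude at most the penalty become zero, which is why L1 regularization promotes sparsity.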

class decent_bench.costs.L2RegularizerCost(shape: tuple[int, ...], *, framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#

Bases: BaseRegularizerCost

L2 regularizer cost.

\[f(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2\]

property m_smooth: float[source]#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

property m_cvx: float[source]#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex

function(x: Array, **kwargs: Any) → float[source]#: Evaluate function at x.

gradient(x: Array, **kwargs: Any) → Array[source]#: Gradient at x.

hessian(x: Array, **kwargs: Any) → Array[source]#: Hessian at x.

proximal(x: Array, penalty: float, **kwargs: Any) → Array[source]#

Proximal at x.

The proximal operator is defined as:

\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]

where \(\rho > 0\) is the penalty and \(f\) the cost function.

If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using proximal_solver().
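For f(x) = 0.5 * ||x||^2, setting the gradient of the proximal objective to zero gives y + (y - x) / rho = 0, that is y = x / (1 + rho). The NumPy sketch below checks this stationarity condition; it is an illustration of the math and not necessarily how L2RegularizerCost.proximal() is implemented.

```python
import numpy as np

x = np.array([2.0, -4.0, 0.0])
rho = 3.0

# Closed-form proximal point of 0.5 * ||.||^2 with penalty rho.
y = x / (1.0 + rho)

# Gradient of 0.5 * ||y||^2 + ||y - x||^2 / (2 * rho) evaluated at y.
stationarity_residual = y + (y - x) / rho
```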

class decent_bench.costs.LinearRegressionCost(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#

Bases: EmpiricalRiskCost

Linear regression cost function.

Given a data matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and target vector \(\mathbf{b} \in \mathbb{R}^{m}\), the linear regression cost function is defined as:

\[ \begin{align}\begin{aligned}f(\mathbf{x}) = \frac{1}{2m} \| \mathbf{Ax} - \mathbf{b} \|^2\\= \frac{1}{2m} \sum_{i = 1}^m (A_i x - b_i)^2\end{aligned}\end{align} \]

where \(A_i\) and \(b_i\) are the i-th row of \(\mathbf{A}\) and the i-th element of \(\mathbf{b}\) respectively.

In the stochastic setting, a mini-batch of size \(b < m\) is used to compute the cost and its derivatives. The cost function then becomes:

\[ \begin{align}\begin{aligned}f(\mathbf{x}) = \frac{1}{2b} \| \mathbf{A}_{\mathcal{B}}\mathbf{x} - \mathbf{b}_{\mathcal{B}} \|^2\\= \frac{1}{2b} \sum_{i \in \mathcal{B}} (A_i x - b_i)^2\end{aligned}\end{align} \]

where \(\mathcal{B}\) is a sampled batch of \(b\) indices from \(\{1, \ldots, m\}\), and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows corresponding to the batch \(\mathcal{B}\).

__init__(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#

Initialize a LinearRegressionCost instance.

Parameters:

dataset (Dataset) – Dataset containing features and targets. The expected shapes are:
Features: (n_features,)
Targets: single dimensional values
batch_size (EmpiricalRiskBatchSize) – Size of mini-batches for stochastic methods, or “all” for full-batch.

Raises:

ValueError – If input dimensions are inconsistent or batch_size is invalid.
TypeError – If dataset targets are not single dimensional values.

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property n_samples: int#: Total number of samples in dataset.

property batch_size: int#: Batch size used for stochastic methods.

property dataset: Dataset#: Dataset used in the empirical risk cost.

property m_smooth: float[source]#

The cost function’s smoothness constant.

\[\max_{i} \left| \frac{1}{m} \lambda_i \right|\]

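where \(\lambda_i\) are the eigenvalues of \(\mathbf{A}^T \mathbf{A}\), so that the expression is the largest eigenvalue magnitude of the Hessian \(\frac{1}{m}\mathbf{A}^T \mathbf{A}\).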

For the general definition, see Cost.m_smooth.

property m_cvx: float[source]#

The cost function’s convexity constant.

\[\begin{split}\begin{array}{ll} \frac{1}{m} \min_i \lambda_i, & \text{if } \min_i \lambda_i > 0, \\ 0, & \text{if } \min_i \lambda_i = 0, \\ \text{NaN}, & \text{if } \min_i \lambda_i < 0 \end{array}\end{split}\]

where \(\lambda_i\) are the eigenvalues of \(\mathbf{A}^T \mathbf{A}\).

For the general definition, see Cost.m_cvx.
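Both constants follow from the spectrum of the Hessian (1/m) AᵀA. The NumPy sketch below computes them for a small hypothetical data matrix and checks the Lipschitz inequality on random point pairs; it does not call decent_bench.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # hypothetical data matrix
b = np.array([1.0, -1.0, 0.5])
m = A.shape[0]

hessian = A.T @ A / m
eigenvalues = np.linalg.eigvalsh(hessian)
m_smooth = float(eigenvalues.max())  # largest Hessian eigenvalue
m_cvx = float(eigenvalues.min())     # positive here because A has full column rank


def gradient(x):
    return (A.T @ A @ x - A.T @ b) / m


# Largest observed ratio ||grad(x1) - grad(x2)|| / ||x1 - x2|| over random pairs.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(50)]
worst_ratio = max(
    np.linalg.norm(gradient(x1) - gradient(x2)) / np.linalg.norm(x1 - x2)
    for x1, x2 in pairs
)
```

The observed ratio never exceeds m_smooth, consistent with the general definition of the smoothness constant.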

predict(x: ndarray[tuple[Any, ...], dtype[float64]], data: list[ndarray[tuple[Any, ...], dtype[float64]]]) → ndarray[tuple[Any, ...], dtype[float64]][source]#

Make predictions at x on the given data.

The predicted targets are computed as \(\mathbf{Ax}\).

Parameters:

x – Point to make predictions at.
data – List of NDArray containing data to make predictions on.

Returns:

Predicted targets as an array.

function(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') → float[source]#

Evaluate function at x using datapoints at the given indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

If no batching is used, this is:

\[\frac{1}{2m} \| \mathbf{Ax} - \mathbf{b} \|^2\]

If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples.

\[\frac{1}{2b} \| \mathbf{A}_{\mathcal{B}}\mathbf{x} - \mathbf{b}_{\mathcal{B}} \|^2\]

where \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows corresponding to the batch \(\mathcal{B}\).

gradient(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean') → ndarray[tuple[Any, ...], dtype[float64]][source]#

Gradient at x using datapoints at the given indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

Supported values for reduction are:

“mean”: average the gradients over the samples.
None: return one gradient per sample, indexed along the first dimension.

If no batching is used, this is:

\[\frac{1}{m}(\mathbf{A}^T\mathbf{Ax} - \mathbf{A}^T \mathbf{b})\]

If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples.

\[\frac{1}{b}(\mathbf{A}_{\mathcal{B}}^T\mathbf{A}_{\mathcal{B}}\mathbf{x} - \mathbf{A}_{\mathcal{B}}^T \mathbf{b}_{\mathcal{B}})\]

where \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows corresponding to the batch \(\mathcal{B}\).

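Note

When reduction is None, the returned array has an additional leading dimension corresponding to the number of samples used, as described for EmpiricalRiskCost.gradient().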

hessian(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') → ndarray[tuple[Any, ...], dtype[float64]][source]#

Hessian at x using datapoints at the given indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

If no batching is used, this is:

\[\frac{1}{m}\mathbf{A}^T\mathbf{A}\]

If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples.

\[\frac{1}{b}\mathbf{A}_{\mathcal{B}}^T \mathbf{A}_{\mathcal{B}}\]

where \(\mathbf{A}_{\mathcal{B}}\) contains the rows of \(\mathbf{A}\) corresponding to the batch \(\mathcal{B}\).
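The full-batch gradient and Hessian expressions for linear regression can be sanity-checked numerically. The NumPy sketch below compares the gradient formula with central finite differences of the cost and uses the fact that, for a quadratic, gradient differences equal the Hessian applied to the step. The data and names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(7, 3))
b = rng.normal(size=7)
x = rng.normal(size=3)
m = A.shape[0]


def cost(v):
    # f(v) = ||A v - b||^2 / (2 m)
    r = A @ v - b
    return float(r @ r) / (2 * m)


gradient = (A.T @ A @ x - A.T @ b) / m
hessian = A.T @ A / m

# Central finite differences of the cost along each coordinate direction.
eps = 1e-6
numeric_gradient = np.array([
    (cost(x + eps * e) - cost(x - eps * e)) / (2 * eps) for e in np.eye(3)
])

# For a quadratic cost, grad(x + step) - grad(x) equals hessian @ step exactly.
step = rng.normal(size=3)
gradient_shifted = (A.T @ A @ (x + step) - A.T @ b) / m
```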

proximal(x: ndarray[tuple[Any, ...], dtype[float64]], penalty: float) → ndarray[tuple[Any, ...], dtype[float64]][source]#

Proximal at x using the full dataset.

The proximal operator for the linear regression cost function is given by:

\[\left(\frac{\rho}{m} \mathbf{A}^T \mathbf{A} + \mathbf{I}\right)^{-1} \left(\mathbf{x} + \frac{\rho}{m} \mathbf{A}^T\mathbf{b}\right)\]

where \(\rho > 0\) is the penalty. This is a closed form solution.
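Combining the cost f(y) = ||Ay - b||^2 / (2m) with the proximal definition and setting the gradient to zero yields the linear system ((rho/m) AᵀA + I) y = x + (rho/m) Aᵀb. The NumPy sketch below solves that system and checks the stationarity condition; it follows from the definitions on this page rather than from the library source, and the data is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
x = rng.normal(size=3)
m = A.shape[0]
rho = 0.7

# Solve the linear system instead of forming the inverse explicitly.
lhs = (rho / m) * (A.T @ A) + np.eye(3)
rhs = x + (rho / m) * (A.T @ b)
y = np.linalg.solve(lhs, rhs)

# Gradient of f(y) + ||y - x||^2 / (2 * rho) at the computed point.
stationarity_residual = A.T @ (A @ y - b) / m + (y - x) / rho
```

Using np.linalg.solve avoids an explicit matrix inverse, which is usually the more stable choice.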

_get_batch_data(indices: EmpiricalRiskIndices = 'batch') → tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]][source]#: Get data for a batch. Returns A, A.T@A and b for the batch.

class decent_bench.costs.LogisticRegressionCost(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#

Bases: EmpiricalRiskCost

Logistic regression cost function.

Given a data matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and target vector \(\mathbf{b} \in \mathbb{R}^{m}\), the logistic regression cost function is defined as:

\[ \begin{align}\begin{aligned}f(\mathbf{x}) = -\frac{1}{m}\left[ \mathbf{b}^T \log( \sigma(\mathbf{Ax}) ) + ( \mathbf{1} - \mathbf{b} )^T \log( 1 - \sigma(\mathbf{Ax}) ) \right]\\= -\frac{1}{m}\sum_{i = 1}^m \left[ b_i \log( \sigma(A_i x) ) + (1 - b_i) \log( 1 - \sigma(A_i x) ) \right]\end{aligned}\end{align} \]

where \(\sigma(z) = \frac{1}{1 + e^{-z}}\) is the sigmoid function, \(A_i\) and \(b_i\) are the i-th row of \(\mathbf{A}\) and the i-th element of \(\mathbf{b}\) respectively.

In the stochastic setting, a mini-batch of size \(b < m\) is used to compute the cost and its derivatives. The cost function then becomes:

\[ \begin{align}\begin{aligned}f(\mathbf{x}) = -\frac{1}{b} \left[ \mathbf{b}_{\mathcal{B}}^T \log( \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) + ( \mathbf{1} - \mathbf{b}_{\mathcal{B}} )^T \log( 1 - \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) \right]\\= -\frac{1}{b} \sum_{i \in \mathcal{B}} \left[ b_i \log( \sigma(A_i x) ) + (1 - b_i) \log( 1 - \sigma(A_i x) ) \right]\end{aligned}\end{align} \]

where \(\mathcal{B}\) is a sampled batch of \(b\) indices from \(\{1, \ldots, m\}\), and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows of \(\mathbf{A}\) and the entries of \(\mathbf{b}\) corresponding to the batch \(\mathcal{B}\).
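The two formulas above can be sketched in plain NumPy. This is an illustration only, not the LogisticRegressionCost implementation; the helper names sigmoid and logistic_cost and the random data are assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def logistic_cost(A, b, x, batch=None):
    """Mean cross-entropy over all rows, or over the rows listed in `batch`."""
    if batch is not None:
        A, b = A[batch], b[batch]
    p = sigmoid(A @ x)
    return -np.mean(b * np.log(p) + (1.0 - b) * np.log(1.0 - p))


rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))                    # m = 8 samples, n = 3 features
b = rng.integers(0, 2, size=8).astype(float)   # binary targets
x = np.zeros(3)

full_value = logistic_cost(A, b, x)                     # all m rows
batch_value = logistic_cost(A, b, x, batch=[0, 3, 5])   # mini-batch of size 3
```

At x = 0 every predicted probability is 0.5, so both values equal log 2 whatever the targets are.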

__init__(dataset: Dataset, batch_size: EmpiricalRiskBatchSize = 'all')[source]#

Initialize logistic regression cost function.

Parameters:

dataset (Dataset) – Dataset containing features and targets. Expected shapes: features are (n_features,); targets are single-dimensional values.
batch_size (EmpiricalRiskBatchSize) – Size of mini-batch to use for stochastic methods. If “all”, full-batch methods are used.

Raises:

ValueError – If input dimensions are incorrect or batch_size is invalid.
TypeError – If dataset targets are not single dimensional values.

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property n_samples: int#: Total number of samples in dataset.

property batch_size: int#: Batch size used for stochastic methods.

property dataset: Dataset#: Dataset used in the empirical risk cost.

property m_smooth: float[source]#

The cost function’s smoothness constant.

\[\frac{1}{m} \frac{m}{4} \max_i \|\mathbf{A}_i\|^2 = \frac{1}{4} \max_i \|\mathbf{A}_i\|^2\]

where m is the number of rows in \(\mathbf{A}\).

For the general definition, see Cost.m_smooth.
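As a rough numerical check of this constant, the sketch below compares \(\frac{1}{4} \max_i \|\mathbf{A}_i\|^2\) with the largest eigenvalue of the Hessian \(\frac{1}{m}\mathbf{A}^T \mathbf{DA}\) at a random point. The data and variable names are assumptions for illustration; the library is not called.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 4))
x = rng.normal(size=4)

# Bound from the formula above: (1/4) * max_i ||A_i||^2.
m_smooth = 0.25 * np.max(np.sum(A**2, axis=1))

# Largest Hessian eigenvalue at x, with H = (1/m) A^T D A.
p = 1.0 / (1.0 + np.exp(-(A @ x)))
H = (A.T * (p * (1.0 - p))) @ A / A.shape[0]
largest_eig = np.linalg.eigvalsh(H).max()
```

The bound holds because every diagonal entry of \(\mathbf{D}\) is at most 1/4, so largest_eig never exceeds m_smooth.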

property m_cvx: float#

The cost function’s convexity constant, 0.

For the general definition, see Cost.m_cvx.

predict(x: ndarray[tuple[Any, ...], dtype[float64]], data: list[ndarray[tuple[Any, ...], dtype[float64]]]) → ndarray[tuple[Any, ...], dtype[float64]][source]#

Make predictions at x on the given data.

The predicted targets are computed as \(\sigma(\mathbf{Ax}) > 0.5\), where \(\sigma\) is the sigmoid function.

Parameters:

x – Point to make predictions at.
data – List of NDArray containing data to make predictions on.

Returns:

Predicted targets as an array.
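A minimal sketch of the thresholding rule, assuming a single feature matrix rather than the list of arrays that predict() accepts. The helper predict_labels and the float output type are assumptions, not the library's behaviour.

```python
import numpy as np


def predict_labels(x, A):
    """Return 1.0 where sigmoid(A @ x) > 0.5, else 0.0."""
    probs = 1.0 / (1.0 + np.exp(-(A @ x)))
    return (probs > 0.5).astype(float)


A = np.array([[2.0, 0.0], [-1.0, 0.0], [0.0, 3.0]])
x = np.array([1.0, -1.0])
labels = predict_labels(x, A)   # A @ x = [2, -1, -3]
```

Since \(\sigma(z) > 0.5\) exactly when \(z > 0\), only the first row is assigned label 1 here.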

function(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') → float[source]#

Evaluate function at x using datapoints at the given indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

If no batching is used, this is:

\[-\frac{1}{m}\left[ \mathbf{b}^T \log( \sigma(\mathbf{Ax}) ) + ( \mathbf{1} - \mathbf{b} )^T \log( 1 - \sigma(\mathbf{Ax}) ) \right]\]

If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples.

\[-\frac{1}{b} \left[ \mathbf{b}_{\mathcal{B}}^T \log( \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) + ( \mathbf{1} - \mathbf{b}_{\mathcal{B}} )^T \log( 1 - \sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) ) \right]\]

where \(\sigma\) is the sigmoid function, and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows of \(\mathbf{A}\) and the entries of \(\mathbf{b}\) corresponding to the batch \(\mathcal{B}\).

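gradient(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean') → ndarray[tuple[Any, ...], dtype[float64]][source]#

Gradient at x using datapoints at the given indices.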

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

Supported values for reduction are:

“mean”: average the gradients over the samples.
None: return the gradients for each sample, index as the first dimension.

If no batching is used, this is:

\[\frac{1}{m}\mathbf{A}^T (\sigma(\mathbf{Ax}) - \mathbf{b})\]

If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples.

\[\frac{1}{b} \mathbf{A}_{\mathcal{B}}^T (\sigma(\mathbf{A}_{\mathcal{B}}\mathbf{x}) - \mathbf{b}_{\mathcal{B}})\]

where \(\sigma\) is the sigmoid function, and \(\mathbf{A}_{\mathcal{B}}\) and \(\mathbf{b}_{\mathcal{B}}\) are the rows of \(\mathbf{A}\) and the entries of \(\mathbf{b}\) corresponding to the batch \(\mathcal{B}\).
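The two reduction modes can be illustrated with a small NumPy sketch. The function logistic_gradient is a hypothetical stand-in written for this example; it only mirrors the formulas above and the described meaning of reduction.

```python
import numpy as np


def logistic_gradient(A, b, x, reduction="mean"):
    """Mean gradient, or one gradient per sample when reduction is None."""
    residual = 1.0 / (1.0 + np.exp(-(A @ x))) - b   # sigma(Ax) - b, shape (m,)
    per_sample = residual[:, None] * A              # row i is (sigma(A_i x) - b_i) A_i
    if reduction is None:
        return per_sample
    return per_sample.mean(axis=0)                  # equals A^T (sigma(Ax) - b) / m


rng = np.random.default_rng(2)
A = rng.normal(size=(6, 3))
b = rng.integers(0, 2, size=6).astype(float)
x = rng.normal(size=3)

g_mean = logistic_gradient(A, b, x)                   # shape (3,)
g_each = logistic_gradient(A, b, x, reduction=None)   # shape (6, 3)
```

Averaging g_each over its first dimension recovers g_mean.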

hessian(x: ndarray[tuple[Any, ...], dtype[float64]], indices: EmpiricalRiskIndices = 'batch') → ndarray[tuple[Any, ...], dtype[float64]][source]#

Hessian at x using datapoints at the given indices.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

If no batching is used, this is:

\[\frac{1}{m}\mathbf{A}^T \mathbf{DA}\]

where \(\sigma\) is the sigmoid function and \(\mathbf{D}\) is a diagonal matrix with entries \(\mathbf{D}_{ii} = \sigma(A_i \mathbf{x}) (1-\sigma(A_i \mathbf{x}))\), with \(A_i\) the i-th row of \(\mathbf{A}\).

If indices is “batch”, a random batch \(\mathcal{B}\) is drawn with batch_size samples.

\[\frac{1}{b} \mathbf{A}_{\mathcal{B}}^T \mathbf{D}_{\mathcal{B}} \mathbf{A}_{\mathcal{B}}\]

where \(\mathbf{A}_{\mathcal{B}}\) contains the rows of \(\mathbf{A}\) in the batch \(\mathcal{B}\) and \(\mathbf{D}_{\mathcal{B}}\) is the diagonal matrix built from those rows.
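One way to sanity-check the Hessian formula is to compare it with central finite differences of the gradient. The sketch below does that in NumPy; the helpers gradient and hessian are illustrative names defined here, not library calls.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient(A, b, x):
    return A.T @ (sigmoid(A @ x) - b) / A.shape[0]


def hessian(A, x):
    p = sigmoid(A @ x)
    d = p * (1.0 - p)                  # diagonal of D
    return (A.T * d) @ A / A.shape[0]  # (1/m) A^T D A


rng = np.random.default_rng(3)
A = rng.normal(size=(10, 3))
b = rng.integers(0, 2, size=10).astype(float)
x = rng.normal(size=3)

H = hessian(A, x)

# Column j approximates the derivative of the gradient along coordinate j.
eps = 1e-6
H_fd = np.column_stack([
    (gradient(A, b, x + eps * e) - gradient(A, b, x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```

The two matrices should agree up to finite-difference error, and H is symmetric by construction.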

proximal(x: Array, penalty: float) → Array[source]#

Proximal at x solved using an iterative method.

The proximal operator of the logistic regression cost has no closed-form solution, so it is approximated with a gradient-based method over the entire dataset, using at most 100 iterations.

See Cost.proximal() for the general proximal definition.
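A gradient-based approximation of this kind might look like the sketch below. It assumes plain gradient descent with a fixed step size on the proximal subproblem; the method, step size, and stopping rule actually used by the library are not stated here, and logistic_prox is a hypothetical helper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_prox(A, b, x, penalty, max_iter=100):
    """Approximate argmin_y f(y) + ||y - x||^2 / (2 * penalty) by gradient descent."""
    m = A.shape[0]
    # Step size 1 / L, where L bounds the smoothness of the subproblem.
    L = 0.25 * np.max(np.sum(A**2, axis=1)) + 1.0 / penalty
    y = x.copy()
    for _ in range(max_iter):
        grad = A.T @ (sigmoid(A @ y) - b) / m + (y - x) / penalty
        y = y - grad / L
    return y


rng = np.random.default_rng(4)
A = rng.normal(size=(12, 3))
b = rng.integers(0, 2, size=12).astype(float)
x = rng.normal(size=3)

y = logistic_prox(A, b, x, penalty=0.5)

# Gradient of the subproblem at y; close to zero when y is near the proximal point.
residual = A.T @ (sigmoid(A @ y) - b) / 12 + (y - x) / 0.5
```

The quadratic term makes the subproblem strongly convex, which is why a fixed iteration budget is usually enough on small problems like this one.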

_get_batch_data(indices: EmpiricalRiskIndices = 'batch') → tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]][source]#: Get data for a batch. Returns A and b for the batch.

class decent_bench.costs.PyTorchCost(dataset: Dataset, model: torch.nn.Module, loss_fn: torch.nn.Module, final_activation: torch.nn.Module | None = None, *, batch_size: EmpiricalRiskBatchSize = 'all', max_batch_size: int | None = None, device: SupportedDevices = SupportedDevices.CPU, use_dataloader: bool = False, dataloader_kwargs: dict[str, Any] | None = None, load_dataset: bool = True, compile_model: bool = False, compile_kwargs: dict[str, Any] | None = None)[source]#

Bases: EmpiricalRiskCost

Cost function wrapper for PyTorch neural networks that integrates with the decentralized optimization framework.

Supports batch-based training and gradient computation for decentralized learning scenarios.

Note

It is generally recommended to set agent_state_snapshot_period to a value greater than 1 when using PyTorchCost, as recording the full model parameters at every iteration can be expensive.

__init__(dataset: Dataset, model: torch.nn.Module, loss_fn: torch.nn.Module, final_activation: torch.nn.Module | None = None, *, batch_size: EmpiricalRiskBatchSize = 'all', max_batch_size: int | None = None, device: SupportedDevices = SupportedDevices.CPU, use_dataloader: bool = False, dataloader_kwargs: dict[str, Any] | None = None, load_dataset: bool = True, compile_model: bool = False, compile_kwargs: dict[str, Any] | None = None)[source]#

Initialize the PyTorch cost function.

Parameters:

dataset (Dataset) – Dataset partition containing features and targets. Transformations should be applied beforehand such as converting to tensors. See torch.utils.data.Dataset for details.
model (torch.nn.Module) – PyTorch neural network model.
loss_fn (torch.nn.Module) – PyTorch loss function.
final_activation (torch.nn.Module | None) – Optional final activation layer to apply after model output when predicting targets using predict(). E.g., argmax if classification and model outputs logits.
batch_size (EmpiricalRiskBatchSize) – Size of mini-batches for stochastic methods, or “all” for full-batch.
max_batch_size (int | None) – Optional maximum number of samples to process at once, which can be used to avoid out-of-memory errors for large models or datasets. If specified, computations are performed in chunks of at most max_batch_size samples. This limit applies to all computations regardless of the batch_size or indices parameters, and the result is unchanged. This is especially useful when indices is set to “all” but the dataset is too large to fit in memory at once. If not specified, it defaults to batch_size (if batch_size is an int) or to the total number of samples (if batch_size is “all”).
device (SupportedDevices) – Device to run computations on. Make sure to test CPU vs GPU performance for your specific model and dataset, as it can vary.
use_dataloader (bool) – Whether to use a DataLoader for batching. Can be beneficial for large datasets that cannot fit into memory or when using an accelerator. DataLoaders cannot be pickled, so resuming an interrupted run will start with a new random batch order.
dataloader_kwargs (dict | None) – Additional arguments for the DataLoader.
load_dataset (bool) – If True, loads the entire dataset into memory to optimize data access. This may lead to major speedups if the dataset is lazily loaded (e.g., loading data from disk), but it might increase memory usage so set to False if memory is an issue. Setting this to False might break checkpointing if the underlying dataset is not pickleable.
compile_model (bool) – Whether to compile the model using torch.compile for performance. May improve speed after warm-up. The best compilation mode can depend on the model and operating system; pass alternatives through compile_kwargs.
compile_kwargs (dict | None) – Additional arguments for torch.compile. A commonly used mode is “reduce-overhead” for performance optimization. See https://pytorch.org/docs/stable/generated/torch.compile.html for details.

Raises:

ImportError – If PyTorch is not available
ValueError – If batch_size is larger than the number of samples in the dataset
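The claim that max_batch_size does not change results can be illustrated without PyTorch. The sketch below assumes chunk sums are accumulated and divided by the total sample count; how PyTorchCost combines chunks internally is not stated here, and chunked_mean and the stand-in losses are hypothetical.

```python
import numpy as np


def chunked_mean(values_fn, indices, max_batch_size):
    """Average values_fn over `indices`, touching at most max_batch_size samples at a time."""
    total = 0.0
    for start in range(0, len(indices), max_batch_size):
        chunk = indices[start:start + max_batch_size]
        total += float(np.sum(values_fn(chunk)))   # accumulate sums, not means
    return total / len(indices)


rng = np.random.default_rng(5)
per_sample_loss = rng.random(10)                   # stand-in for per-sample losses


def loss_fn(idx):
    return per_sample_loss[idx]


indices = list(range(10))
chunked = chunked_mean(loss_fn, indices, max_batch_size=4)   # chunks of 4, 4, 2
direct = float(per_sample_loss.mean())
```

Accumulating sums rather than averaging chunk means keeps the result correct when the last chunk is smaller than the others.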

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property n_samples: int#: Total number of samples in dataset.

property batch_size: int#: Batch size used for stochastic methods.

property dataset: Dataset#: Dataset used in the empirical risk cost.

property m_smooth: float#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

property m_cvx: float#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex

predict(x: torch.Tensor, data: list[torch.Tensor]) → list[torch.Tensor][source]#

Make predictions at x on the given data.

Parameters:

x – Point to make predictions at.
data – List of torch.Tensor containing features to make predictions on.

Returns:

Predicted targets as an array

Raises:

TypeError – If data is not a list of torch.Tensor or a single torch.Tensor.

function(x: torch.Tensor, indices: EmpiricalRiskIndices = 'batch') → float[source]#

Evaluate function at x using datapoints at the given indices.

The returned value is the mean loss over the selected samples.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

gradient(x: torch.Tensor, indices: EmpiricalRiskIndices = 'batch', reduction: EmpiricalRiskReduction = 'mean') → torch.Tensor[source]#

Gradient at x using datapoints at the given indices.

With reduction set to “mean” (the default), the returned gradient is the mean of the per-sample gradients over the selected samples.

Supported values for indices are:

int: datapoint to use.
list[int]: datapoints to use.
“all”: use the full dataset.
“batch”: draw a batch with batch_size samples.

Supported values for reduction are:

“mean”: average the gradients over the samples.
None: return the gradients for each sample, index as the first dimension.

hessian(x: torch.Tensor, indices: EmpiricalRiskIndices = 'batch') → torch.Tensor[source]#

Compute the Hessian matrix.

Note

This is computationally expensive for neural networks and typically not used.

Raises: NotImplementedError – Always raised to indicate Hessian computation is not implemented.

proximal(x: torch.Tensor, penalty: float) → torch.Tensor[source]#

Compute the proximal operator.

Note

This is computationally expensive for neural networks and typically not used.

Raises: NotImplementedError – Always raised to indicate proximal computation is not implemented.

init_local_training(opt_cls: type[torch.optim.Optimizer], opt_kwargs: dict[str, Any] | None = None, sched_cls: type[torch.optim.lr_scheduler.LRScheduler] | None = None, sched_kwargs: dict[str, Any] | None = None) → None[source]#

Initialize the optimizer and scheduler for local training.

This method must be called before local_training() so that the optimizer and scheduler are set up.

Parameters:

opt_cls (type[torch.optim.Optimizer]) – PyTorch optimizer class to use for local training.
opt_kwargs (dict[str, Any] | None) – Keyword arguments for initializing the optimizer. The model parameters will be passed as the first argument, so do not include them in opt_kwargs.
sched_cls (type[torch.optim.lr_scheduler.LRScheduler] | None) – Optional PyTorch learning rate scheduler class to use for local training. The scheduler will be stepped once at the end of each call to local_training().
sched_kwargs (dict[str, Any] | None) – Keyword arguments for initializing the scheduler. The optimizer will be passed as the first argument, so do not include it in sched_kwargs.

Raises:

RuntimeError – If the optimizer is already initialized. This method is intended to be called only once to set the optimizer for local training.

local_training(x: torch.Tensor, iterations: int, agent: Agent, regularization: torch.Tensor | Callable[[torch.Tensor], torch.Tensor] | None, indices: EmpiricalRiskIndices = 'batch') → torch.Tensor[source]#

Perform local training steps using the provided optimizer.

Note

This method is intended to be used in decentralized algorithms that support local training.

Parameters:

x (torch.Tensor) – Initial parameters to start local training from.
iterations (int) – Number of local training iterations to perform.
agent (Agent) – The agent performing the local training.
regularization (torch.Tensor | Callable[[torch.Tensor], torch.Tensor] | None) –
Optional regularization. Two forms are supported:
- Scalar tensor (or callable returning a scalar): interpreted as an additive loss penalty.
- Flat tensor with the same number of elements as the flattened parameter vector: interpreted as a parameter-space correction step applied after each optimizer step (i.e., \(x \leftarrow x - r\)).
indices (EmpiricalRiskIndices) – Indices of the samples to use for local training.

Returns:

Updated parameters after local training.

Return type:

torch.Tensor

Raises:

RuntimeError – If no optimizer was provided during initialization.
ValueError – If regularization is a non-scalar tensor but does not have the same number of elements as the flattened parameter vector.
TypeError – If regularization is not a torch.Tensor or a callable returning a torch.Tensor.
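The parameter-space correction form of regularization can be sketched conceptually in NumPy. Here a plain gradient step stands in for the PyTorch optimizer step, and local_steps, grad_fn, lr, and correction are hypothetical names; the sketch does not cover the additive loss-penalty form and is not the local_training() implementation.

```python
import numpy as np


def local_steps(x, grad_fn, iterations, lr, correction=None):
    """Run `iterations` gradient steps; subtract `correction` after each one if given."""
    x = x.copy()
    for _ in range(iterations):
        x = x - lr * grad_fn(x)      # stand-in for optimizer.step()
        if correction is not None:
            x = x - correction       # x <- x - r
    return x


def grad_fn(x):
    return x                         # gradient of 0.5 * ||x||^2


x0 = np.array([1.0, -2.0])
plain = local_steps(x0, grad_fn, iterations=3, lr=0.5)
shifted = local_steps(x0, grad_fn, iterations=3, lr=0.5,
                      correction=np.array([0.1, 0.0]))
```

Only the first coordinate differs between plain and shifted, because the correction vector is zero in the second coordinate.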

_sample_batch_indices(indices: EmpiricalRiskIndices = 'batch') → list[int][source]#: Not used in PyTorchCost, implemented in _get_batch_data.

_get_batch_data(indices: EmpiricalRiskIndices = 'batch') → list[tuple[torch.Tensor, torch.Tensor]][source]#

Get a list of batch data for the specified indices. Each list item is a tuple (batch_x, batch_y).

The max size of each batch is determined by self._max_batch_size.

Raises: RuntimeError – If batch data could not be retrieved, which should not happen under normal circumstances.

__add__(other: Cost) → Cost[source]#: Add another cost, preserving the empirical-risk abstraction for regularization.

class decent_bench.costs.QuadraticCost(A: Array, b: Array, c: float = 0)[source]#

Bases: Cost

Quadratic cost function.

\[f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T \mathbf{Ax} + \mathbf{b}^T \mathbf{x} + c\]

__init__(A: Array, b: Array, c: float = 0)[source]#

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property m_smooth: float[source]#

The cost function’s smoothness constant.

\[\max_{i} \left| \lambda_i \right|\]

where \(\lambda_i\) are the eigenvalues of \(\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\).

For the general definition, see Cost.m_smooth.

property m_cvx: float[source]#

The cost function’s convexity constant.

\[\begin{split}\begin{array}{ll} \min_i \lambda_i, & \text{if } \min_i \lambda_i > 0, \\ 0, & \text{if } \min_i \lambda_i = 0, \\ \text{NaN}, & \text{if } \min_i \lambda_i < 0 \end{array}\end{split}\]

where \(\lambda_i\) are the eigenvalues of \(\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\).

For the general definition, see Cost.m_cvx.
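Both constants follow from the eigenvalues of the symmetric part of \(\mathbf{A}\). The sketch below computes them with NumPy; quadratic_constants is a hypothetical helper, and the exact comparison with zero is a simplification that a real implementation may handle with a tolerance.

```python
import numpy as np


def quadratic_constants(A):
    """Return (m_smooth, m_cvx) from the eigenvalues of the symmetric part of A."""
    eigvals = np.linalg.eigvalsh(0.5 * (A + A.T))
    m_smooth = float(np.max(np.abs(eigvals)))
    lam_min = float(eigvals.min())
    m_cvx = lam_min if lam_min >= 0 else float("nan")   # 0 when lam_min == 0
    return m_smooth, m_cvx


# Non-symmetric A whose symmetric part is diag(2, 3).
convex = quadratic_constants(np.array([[2.0, 1.0], [-1.0, 3.0]]))
# Indefinite A with eigenvalues 1 and -4.
indefinite = quadratic_constants(np.array([[1.0, 0.0], [0.0, -4.0]]))
```

The skew-symmetric part of \(\mathbf{A}\) contributes nothing to \(\mathbf{x}^T \mathbf{Ax}\), which is why only \(\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)\) matters.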

function(x: ndarray[tuple[Any, ...], dtype[float64]]) → float[source]#: Evaluate function at x.

\[\frac{1}{2} \mathbf{x}^T \mathbf{Ax} + \mathbf{b}^T \mathbf{x} + c\]

gradient(x: ndarray[tuple[Any, ...], dtype[float64]]) → ndarray[tuple[Any, ...], dtype[float64]][source]#: Gradient at x.

\[\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\mathbf{x} + \mathbf{b}\]

hessian(x: ndarray[tuple[Any, ...], dtype[float64]]) → ndarray[tuple[Any, ...], dtype[float64]][source]#: Hessian at x.

\[\frac{1}{2} (\mathbf{A}+\mathbf{A}^T)\]
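The symmetrized form of the gradient matters when \(\mathbf{A}\) is not symmetric. The following NumPy sketch checks it against central finite differences of the function value; the random data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(3, 3))           # deliberately not symmetric
b = rng.normal(size=3)
c = 0.5
x = rng.normal(size=3)


def f(v):
    return 0.5 * v @ A @ v + b @ v + c


grad = 0.5 * (A + A.T) @ x + b        # closed-form gradient
hess = 0.5 * (A + A.T)                # closed-form Hessian, independent of x

eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])
```

Using \(\mathbf{Ax} + \mathbf{b}\) instead would generally disagree with grad_fd for a non-symmetric \(\mathbf{A}\).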

proximal(x: ndarray[tuple[Any, ...], dtype[float64]], penalty: float) → ndarray[tuple[Any, ...], dtype[float64]][source]#

Proximal at x.

\[(\frac{\rho}{2} (\mathbf{A} + \mathbf{A}^T) + \mathbf{I})^{-1} (\mathbf{x} - \rho \mathbf{b})\]

where \(\rho > 0\) is the penalty.

This is a closed-form solution; see Cost.proximal() for the general proximal definition.
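The closed form can be evaluated by solving a linear system rather than forming the inverse. The sketch below does so and checks first-order optimality of the proximal subproblem; quadratic_prox is a hypothetical helper, not the library method.

```python
import numpy as np


def quadratic_prox(A, b, x, penalty):
    """Solve (rho/2 (A + A^T) + I) y = x - rho b for y."""
    n = x.shape[0]
    lhs = 0.5 * penalty * (A + A.T) + np.eye(n)
    return np.linalg.solve(lhs, x - penalty * b)


A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x = np.array([0.3, 0.7])
rho = 0.8

y = quadratic_prox(A, b, x, rho)

# Gradient of f(y) + ||y - x||^2 / (2 rho); it vanishes at the proximal point.
stationarity = 0.5 * (A + A.T) @ y + b + (y - x) / rho
```

Setting that gradient to zero and multiplying by \(\rho\) gives exactly the linear system solved in quadratic_prox.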

__add__(other: Cost) → Cost[source]#: Add another cost function.

class decent_bench.costs.ScaledCost(cost: Cost, scalar: float)[source]#

Bases: Cost

Generic scalar wrapper for arbitrary costs.

ScaledCost is the fallback result of scalar arithmetic when no more specialized wrapper is available. It delegates evaluation, gradient, Hessian, and metadata to the wrapped cost, and preserves proximal support only for nonnegative scalars.

Instances keep references to the wrapped cost objects. No implicit copying is performed; use copy.deepcopy() explicitly if independent objects are required.
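The reference semantics can be demonstrated with toy classes. ToyCost and ToyScaled below are hypothetical stand-ins written for this example; they are not decent_bench classes and only mimic the behaviour described above.

```python
import copy


class ToyCost:
    """Hypothetical stand-in for a cost: f(x) = weight * x**2."""

    def __init__(self, weight):
        self.weight = weight

    def function(self, x):
        return self.weight * x * x


class ToyScaled:
    """Hypothetical wrapper that stores a reference, not a copy."""

    def __init__(self, cost, scalar):
        self.cost = cost
        self.scalar = scalar

    def function(self, x):
        return self.scalar * self.cost.function(x)


base = ToyCost(weight=1.0)
shared = ToyScaled(base, 3.0)                       # holds a reference to base
independent = copy.deepcopy(ToyScaled(base, 3.0))   # owns its own copy of base

base.weight = 2.0                                   # mutate after composition
shared_value = shared.function(1.0)                 # sees the mutation
independent_value = independent.function(1.0)       # does not
```

The deep copy is taken before the mutation, so it keeps the original weight while the shared wrapper follows the change.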

__init__(cost: Cost, scalar: float)[source]#

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property m_smooth: float[source]#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

property m_cvx: float[source]#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex

function(x: Array, *args: Any, **kwargs: Any) → float[source]#: Evaluate function at x.

gradient(x: Array, *args: Any, **kwargs: Any) → Array[source]#: Gradient at x.

hessian(x: Array, *args: Any, **kwargs: Any) → Array[source]#: Hessian at x.

proximal(x: Array, penalty: float, *args: Any, **kwargs: Any) → Array[source]#

Proximal at x.

The proximal operator is defined as:

\[\operatorname{prox}_{\rho f}(\mathbf{x}) = \arg\min_{\mathbf{y}} \left\{ f(\mathbf{y}) + \frac{1}{2\rho} \| \mathbf{y} - \mathbf{x} \|^2 \right\}\]

where \(\rho > 0\) is the penalty and \(f\) the cost function.

If the cost function’s proximal does not have a closed form solution, it can be solved iteratively using proximal_solver().
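For a positive scalar \(c\), the definition above implies \(\operatorname{prox}_{\rho (c f)}(\mathbf{x}) = \operatorname{prox}_{(c\rho) f}(\mathbf{x})\), because dividing the objective by \(c\) does not change its minimizer. The sketch below checks this identity on a one-dimensional quadratic. Whether ScaledCost uses this identity internally is an assumption, and prox_quadratic is a hypothetical helper.

```python
def prox_quadratic(a, b, x, penalty):
    """Proximal of f(y) = 0.5 * a * y**2 + b * y in one dimension."""
    return (x - penalty * b) / (1.0 + penalty * a)


a, b, c = 2.0, -1.0, 3.0      # c is the positive scalar applied to f
x, rho = 0.4, 0.25

# Proximal of the scaled function c * f, computed from its own coefficients.
direct = prox_quadratic(c * a, c * b, x, rho)
# Same point obtained from the unscaled proximal with penalty c * rho.
rescaled = prox_quadratic(a, b, x, c * rho)
```

The identity breaks down for negative scalars, since the rescaled penalty would no longer be positive.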

__add__(other: Cost) → Cost[source]#

Add another cost function to create a new one.

class decent_bench.costs.SumCost(costs: list[Cost])[source]#

Bases: Cost

Generic additive fallback for cost composition.

SumCost is returned when two costs can be added but no more specialized composite is available. It preserves the core Cost interface, but does not preserve regularizer-specific or empirical-risk-specific behavior.

Instances keep references to the wrapped cost objects. No implicit copying is performed; use copy.deepcopy() explicitly if independent objects are required.

__init__(costs: list[Cost])[source]#

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property m_smooth: float[source]#

The cost function’s smoothness constant.

\[\sum m_{\text{smooth}, k}\]

where \(m_{\text{smooth}, k}\) is the smoothness constant of each individual cost function \(f_k\). If any \(m_{\text{smooth}, k} = \text{NaN}\), the result is \(\text{NaN}\).

For the general definition, see Cost.m_smooth.

property m_cvx: float[source]#

The cost function’s convexity constant.

\[\sum m_{\text{cvx}, k}\]

where \(m_{\text{cvx}, k}\) is the convexity constant of each individual cost function \(f_k\). If any \(m_{\text{cvx}, k} = \text{NaN}\), the result is \(\text{NaN}\).

For the general definition, see Cost.m_cvx.
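The summation rule and its NaN behaviour follow directly from floating-point arithmetic, as this small sketch shows. The helper summed_constant is hypothetical and the per-term values are made up for illustration.

```python
import math


def summed_constant(constants):
    """Sum per-term constants; a NaN in any term makes the result NaN."""
    return float(sum(constants))


smooth_total = summed_constant([2.0, 0.5, 1.5])
cvx_total = summed_constant([1.0, 0.0, float("nan")])   # one term not known to be convex
cvx_is_nan = math.isnan(cvx_total)
```

A single term without a convexity guarantee is therefore enough to remove the guarantee from the whole sum.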

function(x: Array, *args: Any, **kwargs: Any) → float[source]#: Sum the Cost.function of each cost function.

gradient(x: Array, *args: Any, **kwargs: Any) → Array[source]#: Sum the Cost.gradient of each cost function.

hessian(x: Array, *args: Any, **kwargs: Any) → Array[source]#: Sum the Cost.hessian of each cost function.

proximal(x: Array, penalty: float, *args: Any, **kwargs: Any) → Array[source]#

Approximate the proximal of the full summed objective.

SumCost computes its proximal through decent_bench.centralized_algorithms.proximal_solver(), which solves the proximal subproblem for the full summed objective using accelerated gradient descent. Extra args and kwargs are ignored.

Raises: NotImplementedError – If the accelerated-gradient backend assumptions are not satisfied.
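To give a sense of what solving the proximal subproblem with accelerated gradient descent involves, here is a minimal sketch using Nesterov-style momentum on a sum of two quadratics. It is not the proximal_solver() implementation; the function name, iteration count, step size, and momentum schedule are all assumptions.

```python
import numpy as np


def prox_by_accelerated_gradient(grad_f, m_smooth, x, penalty, iterations=200):
    """Approximate argmin_y f(y) + ||y - x||^2 / (2 * penalty) with Nesterov momentum."""
    L = m_smooth + 1.0 / penalty          # smoothness bound of the subproblem
    y = x.copy()
    z = x.copy()                          # extrapolated point
    t = 1.0
    for _ in range(iterations):
        grad = grad_f(z) + (z - x) / penalty
        y_next = z - grad / L
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = y_next + ((t - 1.0) / t_next) * (y_next - y)
        y, t = y_next, t_next
    return y


# Sum of two quadratics f_k(y) = 0.5 * y^T Q_k y + q_k^T y.
Q1, q1 = np.diag([1.0, 3.0]), np.array([1.0, 0.0])
Q2, q2 = np.diag([2.0, 0.5]), np.array([0.0, -1.0])


def grad_sum(y):
    return (Q1 @ y + q1) + (Q2 @ y + q2)


m_smooth_sum = 3.0 + 2.0                  # sum of the per-term smoothness constants

x, rho = np.array([0.5, -0.5]), 1.0
approx = prox_by_accelerated_gradient(grad_sum, m_smooth_sum, x, rho)
exact = np.linalg.solve(rho * (Q1 + Q2) + np.eye(2), x - rho * (q1 + q2))
```

The exact value is available here only because the summed objective is quadratic; for general sums an iterative solve is the only generic option.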

__add__(other: Cost) → SumCost[source]#: Add another cost function.

class decent_bench.costs.ZeroCost(shape: tuple[int, ...], framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#

Bases: Cost

A cost function that is identically zero.

This cost is used as the default for the server in FedNetwork.

__init__(shape: tuple[int, ...], framework: SupportedFrameworks = SupportedFrameworks.NUMPY, device: SupportedDevices = SupportedDevices.CPU)[source]#

property shape: tuple[int, ...]#: Required shape of x.

property framework: SupportedFrameworks#

The framework used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this framework.

property device: SupportedDevices#

The device used by this cost function.

Make sure that all decent_bench.utils.array.Array objects returned by this cost function’s methods use this device.

property m_smooth: float[source]#

Lipschitz constant of the cost function’s gradient.

The gradient’s Lipschitz constant m_smooth is the smallest value such that

\[\| \nabla f(\mathbf{x_1}) - \nabla f(\mathbf{x_2}) \| \leq m_{\text{smooth}} \cdot \|\mathbf{x_1} - \mathbf{x_2}\|\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

non-negative finite number if function is L-smooth
np.inf if function is differentiable everywhere but not L-smooth
np.nan if function is not differentiable everywhere

property m_cvx: float[source]#

Convexity constant of the cost function.

The convexity constant m_cvx is the largest value such that

\[f(\mathbf{x_1}) \geq f(\mathbf{x_2}) + \nabla f(\mathbf{x_2})^T (\mathbf{x_1} - \mathbf{x_2}) + \frac{m_{\text{cvx}}}{2} \|\mathbf{x_1} - \mathbf{x_2}\|^2\]

for all \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

Returns:

positive finite number if function is strongly convex
0 if function is convex but not strongly convex
np.nan if function is not guaranteed to be convex

function(x: Array, **kwargs: Any) → float[source]#: Evaluate function at x.

gradient(x: Array, **kwargs: Any) → Array[source]#: Gradient at x.

hessian(x: Array, **kwargs: Any) → Array[source]#: Hessian at x.

proximal(x: Array, penalty: float, **kwargs: Any) → Array[source]#

Return x unchanged.

Since ZeroCost is identically zero, its proximal operator is the identity map. The method still validates that penalty is positive and that x has the expected shape.

Raises: ValueError – If penalty is not positive or x has the wrong shape.
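The identity proximal with its validation can be sketched as follows. With \(f = 0\), the proximal subproblem reduces to minimizing \(\frac{1}{2\rho}\|\mathbf{y} - \mathbf{x}\|^2\), whose minimizer is \(\mathbf{x}\). The helper zero_prox and its error messages are hypothetical, not the library's code.

```python
import numpy as np


def zero_prox(x, penalty, shape):
    """Identity proximal with the validation described above."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    if x.shape != shape:
        raise ValueError(f"expected shape {shape}, got {x.shape}")
    return x


x = np.array([1.0, -2.0, 3.0])
result = zero_prox(x, penalty=0.5, shape=(3,))

try:
    zero_prox(x, penalty=0.0, shape=(3,))
    rejected = False
except ValueError:
    rejected = True
```

Validating the arguments even though the output ignores penalty keeps the behaviour consistent with costs whose proximal does depend on it.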

__add__(other: Cost) → Cost[source]#

Add another cost function to create a new one.