decent_bench.algorithms.federated#

class decent_bench.algorithms.federated.FedAlgorithm[source]#

Bases: Algorithm[FedNetwork]

Federated algorithm - clients collaborate via a central server.

selection_scheme: ClientSelectionScheme | None = None#

num_local_steps: LocalSteps = 1#

cleanup_agents(network: FedNetwork) → Iterable[Agent][source]#

Return the agents whose auxiliary variables should be cleared.

Parameters:: network – provides the agents and topology for this algorithm.

select_clients(network: FedNetwork, iteration: int) → list[Agent][source]#

Select clients for the current federated round.

The method selects the subset of active clients that will receive the server broadcast and perform local training. The clients are selected using self.selection_scheme.

If self.selection_scheme is None, all active clients are selected.

server_broadcast(network: FedNetwork, selected_clients: Sequence[Agent], channel: str = 'default') → None[source]#: Send the current server model to the selected clients under channel.

aggregate(network: FedNetwork, participating_clients: Sequence[Agent]) → None[source]#

Aggregate client model uploads at the server using uniform averaging.

This default federated aggregation assumes clients upload final local model states.

cleanup(network: NetworkT) → None#

Clean up the algorithm state by clearing auxiliary variables from agents.

This method is used to free up memory used by auxiliary variables that are not needed after training. Can be overridden to control what gets cleaned up.

Note

Override cleanup_agents() to control which agents are cleaned up.

Parameters:: network – provides the agents and topology for this algorithm.

abstractmethod initialize(network: NetworkT) → None#

Initialize the algorithm.

Parameters:: network – provides the agents and topology for this algorithm.

abstract property name: str#: Name of the algorithm.

run(network: NetworkT, start_iteration: int = 0, progress_callback: Callable[[int], None] | None = None) → None#

Run the algorithm.

This method first calls initialize(), then step() for the specified number of iterations. Optionally call cleanup() after run() to clear auxiliary variables and free up memory.

Parameters:

network – provides the agents and topology for this algorithm.
start_iteration – iteration number to start from, used when resuming from a checkpoint. If greater than 0, initialize() will be skipped.
progress_callback – optional callback to report progress after each iteration.

Raises:

ValueError – if start_iteration is not in [0, iterations]

Warning

Do not override this method. Instead, override initialize() and step() as needed.

Note

The algorithm saves the agents’ states every state_snapshot_period.

abstractmethod step(network: NetworkT, iteration: int) → None#

Perform one iteration of the algorithm.

Parameters:

network – provides the agents and topology for this algorithm.
iteration – current iteration number.

iterations: int#: Number of iterations to run the algorithm for.

class decent_bench.algorithms.federated.FedAdagrad(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, server_step_size: float = 0.001, beta_1: float = 0.9, epsilon: float = 1e-06, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedAdagrad')[source]#

Bases: FedOpt

FedAdagrad uses local SGD on clients and an Adagrad-style adaptive server update [1].

Each selected client starts from the broadcast global model \(\mathbf{x}_t\) and performs num_local_steps local SGD steps with client step size step_size.

\[\mathbf{x}_{i, t}^{(k+1)} = \mathbf{x}_{i, t}^{(k)} - \eta_l \nabla f_i(\mathbf{x}_{i, t}^{(k)}).\]

The final client model defines the uploaded delta

\[\delta_i^t = \mathbf{x}_{i, t}^{(K)} - \mathbf{x}_t.\]

The server aggregates client model deltas uniformly over the participating clients:

\[\Delta_t = \frac{1}{|S_t|} \sum_{i \in S_t} \delta_i^t.\]

FedAdagrad then updates its moment buffers and global model as

\[\mathbf{m}_t = \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \Delta_t\]

\[\mathbf{v}_t = \mathbf{v}_{t-1} + \Delta_t^2\]

\[\mathbf{x}_{t+1} = \mathbf{x}_t + \eta \frac{\mathbf{m}_t}{\sqrt{\mathbf{v}_t} + \epsilon}.\]

Here \(\eta_l\) is the client learning rate (the corresponding argument is step_size), \(K\) is the number of local SGD steps (the corresponding argument is num_local_steps), \(\eta\) is the server learning rate (the corresponding argument is server_step_size), \(\beta_1\) is the first-moment coefficient (the corresponding argument is beta_1), \(\epsilon\) is the numerical stability term (the corresponding argument is epsilon), and \(S_t\) is the set of clients whose uploads are actually received in round \(t\). Aggregation is always uniform across the received clients. Costs that preserve the EmpiricalRiskCost abstraction use mini-batch local updates; generic costs use their usual full-gradient updates.

name: str = 'FedAdagrad'#

class decent_bench.algorithms.federated.FedAdam(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, server_step_size: float = 0.001, beta_1: float = 0.9, epsilon: float = 1e-06, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedAdam', beta_2: float = 0.99)[source]#

Bases: FedOpt

FedAdam uses local SGD on clients and an Adam-style adaptive server update [1].

Each selected client starts from the broadcast global model \(\mathbf{x}_t\) and performs num_local_steps local SGD steps with client step size step_size.

\[\mathbf{x}_{i, t}^{(k+1)} = \mathbf{x}_{i, t}^{(k)} - \eta_l \nabla f_i(\mathbf{x}_{i, t}^{(k)}).\]

The final client model defines the uploaded delta

\[\delta_i^t = \mathbf{x}_{i, t}^{(K)} - \mathbf{x}_t.\]

The server aggregates client model deltas uniformly over the participating clients:

\[\Delta_t = \frac{1}{|S_t|} \sum_{i \in S_t} \delta_i^t.\]

FedAdam then updates its moment buffers and global model as

\[\mathbf{m}_t = \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \Delta_t\]

\[\mathbf{v}_t = \beta_2 \mathbf{v}_{t-1} + (1 - \beta_2) \Delta_t^2\]

\[\mathbf{x}_{t+1} = \mathbf{x}_t + \eta \frac{\mathbf{m}_t}{\sqrt{\mathbf{v}_t} + \epsilon}.\]

Here \(\eta_l\) is the client learning rate (the corresponding argument is step_size), \(K\) is the number of local SGD steps (the corresponding argument is num_local_steps), \(\eta\) is the server learning rate (the corresponding argument is server_step_size), \(\beta_1\) and \(\beta_2\) are the first- and second-moment coefficients (the corresponding arguments are beta_1 and beta_2), \(\epsilon\) is the numerical stability term (the corresponding argument is epsilon), and \(S_t\) is the set of clients whose uploads are actually received in round \(t\). Aggregation is always uniform across the received clients. Costs that preserve the EmpiricalRiskCost abstraction use mini-batch local updates; generic costs use their usual full-gradient updates.

beta_2: float = 0.99#

name: str = 'FedAdam'#

class decent_bench.algorithms.federated.FedAvg(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedAvg')[source]#

Bases: FedAlgorithm

Federated Averaging (FedAvg) with local SGD epochs [2].

\[\mathbf{x}_{i, k}^{(t+1)} = \mathbf{x}_{i, k}^{(t)} - \eta \nabla f_i(\mathbf{x}_{i, k}^{(t)})\]

\[\mathbf{x}_{k+1} = \frac{1}{|S_k|} \sum_{i \in S_k} \mathbf{x}_{i, k}^{(E)}\]

where \(t\) indexes the local training epochs on each client and \(E\) is the number of local epochs per round (the corresponding argument is num_local_steps), \(\eta\) is the step size (the corresponding argument is step_size), and \(S_k\) is the set of participating clients at round \(k\). In FedAvg, each selected client performs num_local_steps local SGD epochs, then the server aggregates the final local models to form \(\mathbf{x}_{k+1}\) using uniform averaging over the participating clients. Client selection (subsampling) defaults to uniform sampling with fraction 1.0 (all active clients) and can be customized via selection_scheme. Costs that preserve the EmpiricalRiskCost abstraction use client-side mini-batches of size EmpiricalRiskCost.batch_size; generic cost wrappers fall back to full-gradient local updates.

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

num_local_steps: int = 1#

x0: InitialStates = None#

name: str = 'FedAvg'#

class decent_bench.algorithms.federated.FedDyn(iterations: int = 100, step_size: float = 0.001, penalty: float = 0.01, num_local_steps: int = 1, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedDyn')[source]#

Bases: FedAlgorithm

Federated Dynamic Regularization (FedDyn) with local gradient steps [3].

FedDyn keeps one dynamic state \(g_i\) per client and one server auxiliary vector \(h\). In each communication round, the paper writes the selected-client update as the exact minimizer of the dynamic-regularized local objective

\[f_i(\theta) - \langle g_i, \theta \rangle + \frac{\alpha}{2}\|\theta - \theta^t\|^2\]

In practice, and following the local SGD device update used in the paper’s experiments, this implementation approximates that minimization by running num_local_steps gradient steps from the received server model, with local gradient

\[\nabla f_i(\theta) - g_i + \alpha(\theta - \theta^t).\]

After local training, each participating client updates its dynamic state as

\[g_i^+ = g_i - \alpha(\theta_i^+ - \theta^t).\]

The server aggregates only the selected client models it actually receives. If \(R_t\) is the received subset, \(m\) is the total number of clients, and \(\theta^t\) is the server model before aggregation, then

\[h^+ = h - \frac{\alpha}{m}\sum_{i \in R_t}(\theta_i^+ - \theta^t), \qquad \theta^+ = \frac{1}{|R_t|}\sum_{i \in R_t}\theta_i^+ - \frac{1}{\alpha}h^+.\]

Here \(\alpha\) is the dynamic regularization coefficient (the corresponding argument is penalty), and the local step size is the scalar used in local SGD (the corresponding argument is step_size).

If no selected client model is received, the server model and h remain unchanged. Unselected clients and selected clients that miss the server broadcast keep their previous local model and dynamic state. Costs that preserve the EmpiricalRiskCost abstraction use mini-batch local updates; generic costs keep their usual full-gradient behavior. Client selection defaults to uniform sampling with fraction 1.0.

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

penalty: float = 0.01#

num_local_steps: int = 1#

x0: InitialStates = None#

name: str = 'FedDyn'#

aggregate(network: FedNetwork, participating_clients: Sequence[Agent]) → None[source]#

Aggregate received FedDyn client models and apply the server dynamic correction.

Only client models received in the current round are aggregated.

class decent_bench.algorithms.federated.FedLT(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, penalty: float = 1.0, local_solver: str = 'gd', solver_args: dict = <factory>, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, z0: InitialStates = None, name: str = 'FedLT')[source]#

Bases: FedAlgorithm

Federated Local Training (Fed-LT) with cost-driven local gradients [4][5].

Fed-LT maintains one auxiliary variable \(z_i\) per client. This \(z_i\) state is maintained during execution and is not a constructor argument. At the start of round \(k\), the server computes the broadcast variable

\[y_{k+1} = \operatorname{prox}_{\rho h / N}\left(\frac{1}{N}\sum_{i=1}^N z_{i,k}\right)\]

where \(N\) is the number of clients, and \(\rho\) is the regularization strength (the corresponding argument is penalty). The server sends \(y_{k+1}\) to the selected clients. The initial auxiliary states z0 follow the same InitialStates convention as x0. If z0 is None, Fed-LT initializes \(z_{i,0}=x_{i,0}\) for every client.

In this implementation, each client cost \(f_i\) is treated as the full local objective already available on that client. The global regularizer \(h\) is represented by the server cost’s proximal operator; with the default ZeroCost server, this is the documented \(h=0\) case and the server step is plain averaging.

Local solvers. A selected client \(i\) sets \(w^0_{i,k}=x_{i,k}\) and \(v_{i,k}=2y_{k+1}-z_{i,k}\), then uses local_solver to approximately minimize the regularized local objective

\[f_i(w) + \frac{1}{2\rho}\|w - v_{i,k}\|^2.\]

The local gradient of this subproblem is

\[\nabla f_i(w^\ell_{i,k}) + \frac{1}{\rho}(w^\ell_{i,k} - v_{i,k}).\]

Costs preserving the EmpiricalRiskCost abstraction use its default mini-batch sampling, so gradient-based local solvers use mini-batches. Generic Cost objects use their normal full-gradient behavior. Solver-specific hyperparameters are passed through solver_args.

Gradient descent. The default local_solver="gd" uses step_size as the local step size and expects empty solver_args:

\[w^{\ell+1}_{i,k} = w^\ell_{i,k} - \gamma\left(\nabla f_i(w^\ell_{i,k}) + \frac{1}{\rho}(w^\ell_{i,k} - v_{i,k})\right).\]

Here \(\gamma\) is the local step size (the corresponding argument is step_size).

Nesterov. The local_solver="nesterov" option applies a Nesterov-style update to the same local gradient. It initializes \(u_i^0=w^0_{i,k}\) and uses step_size as the local step size. Its solver_args may contain "momentum"; the default is 0.9:

\[u_i^{\ell+1} = w^\ell_{i,k} - \gamma\left(\nabla f_i(w^\ell_{i,k}) + \frac{1}{\rho}(w^\ell_{i,k} - v_{i,k})\right)\]

\[w^{\ell+1}_{i,k} = u_i^{\ell+1} + \beta\left(u_i^{\ell+1} - u_i^\ell\right).\]

Here \(\beta\) is the Nesterov momentum coefficient (the corresponding argument is solver_args["momentum"]).

One possible centralized strongly-convex choice is \(\beta=(\sqrt{L_i + 1/\rho} - \sqrt{\mu_i + 1/\rho}) / (\sqrt{L_i + 1/\rho} + \sqrt{\mu_i + 1/\rho})\), with local step size \(1/(L_i + 1/\rho)\), where m_smooth supplies \(L_i\) and m_cvx supplies \(\mu_i\).

Adam. The local_solver="adam" option applies Adam to the same local gradient. Adam moments are reset at the start of every local solve because Fed-LT locally trains the current round’s subproblem rather than maintaining a persistent optimizer state across rounds. Its solver_args may contain "beta1", "beta2", and "epsilon"; the defaults are 0.9, 0.999, and 1e-8:

\[g^\ell_{i,k} = \nabla f_i(w^{\ell-1}_{i,k}) + \frac{1}{\rho}(w^{\ell-1}_{i,k} - v_{i,k})\]

\[m^\ell_{i,k} = \beta_1 m^{\ell-1}_{i,k} + (1-\beta_1)g^\ell_{i,k}, \qquad s^\ell_{i,k} = \beta_2 s^{\ell-1}_{i,k} + (1-\beta_2)(g^\ell_{i,k})^2\]

\[w^\ell_{i,k} = w^{\ell-1}_{i,k} - \gamma\frac{\hat m^\ell_{i,k}}{\sqrt{\hat s^\ell_{i,k}} + \epsilon},\]

for \(\ell=1,\ldots,N_e\), with \(m^0_{i,k}=s^0_{i,k}=0\). Here \(N_e\) is the local step count (the corresponding argument is num_local_steps), \(\beta_1\) and \(\beta_2\) are the Adam moment coefficients (the corresponding arguments are solver_args["beta1"] and solver_args["beta2"]), and \(\epsilon\) is the Adam numerical stability term (the corresponding argument is solver_args["epsilon"]). The terms \(\hat m^\ell_{i,k}\) and \(\hat s^\ell_{i,k}\) are the Adam bias-corrected moments.

After local training, the client sets

\[x_{i,k+1}=w^{N_e}_{i,k}, \qquad z_{i,k+1}=z_{i,k}+2(x_{i,k+1}-y_{k+1}),\]

and uploads \(z_{i,k+1}\) to the server. Inactive clients keep \(x_{i,k+1}=x_{i,k}\) and \(z_{i,k+1}=z_{i,k}\). For later server averages, the server stores a received fresh \(z_{i,k+1}\) when the upload arrives and otherwise keeps its previous stored \(z_i\).

Fed-PLT is the privacy-noise version of Fed-LT [4].

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

num_local_steps: int = 1#

penalty: float = 1.0#

local_solver: str = 'gd'#

solver_args: dict#

x0: InitialStates = None#

z0: InitialStates = None#

name: str = 'FedLT'#

aggregate(network: FedNetwork, participating_clients: Sequence[Agent]) → None[source]#

Store received Fed-LT z uploads for future server averages.

Clients whose uploads are not received keep their previous server-side z value, matching the stale-value aggregation in partial participation and lossy communication settings.

class decent_bench.algorithms.federated.FedNova(iterations: int = 100, step_size: float = 0.001, num_local_steps: LocalSteps = 1, use_momentum: bool = False, momentum: float = 0.9, use_prox: bool = False, penalty: float = 0.01, use_server_momentum: bool = False, server_momentum: float = 0.9, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedNova')[source]#

Bases: FedAlgorithm

FedNova with optional local momentum, proximal correction, and server momentum [6].

Each selected client starts from the broadcast global model \(\mathbf{x}_t = \mathbf{x}^{(t,0)}\) and performs \(\tau_i\) local steps with client step size step_size. At local step \(k\), client \(i\) computes the gradient

\[\mathbf{g}_{i, t}^{(k)} = \nabla F_i(\mathbf{x}_{i, t}^{(k)}) + \mu \left(\mathbf{x}_{i, t}^{(k)} - \mathbf{x}_t\right),\]

where the proximal term is present only when use_prox=True. If local momentum is enabled, the momentum buffer and local direction update as

\[\mathbf{v}_{i, t}^{(k+1)} = \beta \mathbf{v}_{i, t}^{(k)} + \mathbf{g}_{i, t}^{(k)}, \qquad \mathbf{d}_{i, t}^{(k)} = \mathbf{v}_{i, t}^{(k+1)},\]

otherwise \(\mathbf{d}_{i, t}^{(k)} = \mathbf{g}_{i, t}^{(k)}\). The local model update is

\[\mathbf{x}_{i, t}^{(k+1)} = \mathbf{x}_{i, t}^{(k)} - \eta_l \mathbf{d}_{i, t}^{(k)}.\]

Client \(i\) accumulates the local update

\[\mathbf{c}_i^t = \sum_{k=0}^{\tau_i - 1} \eta_l \mathbf{d}_{i, t}^{(k)},\]

and maintains the FedNova scalar recurrences

\[s_i^{(k+1)} = \beta s_i^{(k)} + 1\]

when use_momentum=True, and \(s_i^{(k+1)} = 1\) otherwise, together with

\[a_i^{(k+1)} = (1 - \eta_l \mu) a_i^{(k)} + s_i^{(k+1)}\]

when use_prox=True and \(a_i^{(k+1)} = a_i^{(k)} + s_i^{(k+1)}\) otherwise. During initialize, the server resolves and stores each client’s sample count \(n_i\). After \(\tau_i\) local steps, client \(i\) first uploads the FedNova coefficient \(a_i\) and then uploads the cumulative local update \(\mathbf{c}_i^t\) in a second transmission. For the clients in the current round whose two uploads are both actually received, the server forms the data-proportional client weight

\[p_i = \frac{n_i}{\sum_{j \in S_t} n_j}.\]

The server forms the weighted effective local-step coefficient

\[\tau_{\mathrm{eff}, t} = \bar{a}_t = \sum_{i \in S_t} p_i a_i,\]

and the normalized FedNova aggregate

\[\mathbf{G}_t = \sum_{i \in S_t} p_i \frac{\tau_{\mathrm{eff}, t}}{a_i} \mathbf{c}_i^t.\]

Without server momentum, the server update is

\[\mathbf{x}_{t+1} = \mathbf{x}_t - \mathbf{G}_t.\]

With use_server_momentum=True, the server momentum buffer and model update become

\[\mathbf{m}_{t+1} = \gamma \mathbf{m}_t + \mathbf{G}_t, \qquad \mathbf{x}_{t+1} = \mathbf{x}_t - \mathbf{m}_{t+1}.\]

When use_momentum=False, use_prox=False, and use_server_momentum=False, this reduces to the plain local-SGD FedNova variant. In that plain setting, FedNova reduces to FedAvg if and only if all participating clients use the same number of local steps (\(\tau_i = \tau_j\) for all \(i, j \in S_t\)) and FedNova and FedAvg both use data-proportional aggregation weights.

Here \(\tau_i\) is the number of local SGD steps used by client \(i\) (the corresponding argument is num_local_steps), \(\eta_l\) is the local step size (the corresponding argument is step_size), \(\mu\) is the proximal coefficient (the corresponding argument is penalty), \(\beta\) is the local momentum coefficient (the corresponding argument is momentum), and \(\gamma\) is the server momentum coefficient (the corresponding argument is server_momentum).

In this implementation, \(n_i\) is inferred once during initialize from each client’s local cost via infer_client_data_size(), then stored on the server for later rounds. If no first-phase a_i uploads are received in a round under network impairments, the server skips that round without error.

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

num_local_steps: LocalSteps = 1#

use_momentum: bool = False#

momentum: float = 0.9#

use_prox: bool = False#

penalty: float = 0.01#

use_server_momentum: bool = False#

server_momentum: float = 0.9#

x0: InitialStates = None#

name: str = 'FedNova'#

aggregate(network: FedNetwork, participating_clients: Sequence[Agent]) → None[source]#

Aggregate FedNova client uploads following the Local-SGD FedNova pseudocode.

This method assumes the current round has already cached the received FedNova coefficients a_i in server.aux_vars["received_a_i"] and that cumulative local updates \(\mathbf{c}_i^t\) are read from the server inbox under the cumulative-gradient channel. Client sample counts are looked up from the mapping stored on the server during initialize. Only clients with both uploads available in the current round are aggregated; if none are available, this method returns without updating the server model.

Raises:: ValueError – if any received FedNova coefficient a_i is non-positive.

class decent_bench.algorithms.federated.FedOpt(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, server_step_size: float = 0.001, beta_1: float = 0.9, epsilon: float = 1e-06, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedOpt')[source]#

Bases: FedAlgorithm, ABC

Shared FedOpt template with client local SGD and server adaptive optimization [1].

Each selected client starts from the broadcast global model \(\mathbf{x}_t\) and performs num_local_steps local SGD steps with client step size step_size:

\[\mathbf{x}_{i, t}^{(k+1)} = \mathbf{x}_{i, t}^{(k)} - \eta_l \nabla f_i(\mathbf{x}_{i, t}^{(k)}).\]

After \(K\) local steps, client \(i\) forms the model delta

\[\delta_i^t = \mathbf{x}_{i, t}^{(K)} - \mathbf{x}_t\]

and uploads \(\delta_i^t\) to the server. The server averages these client deltas uniformly:

\[\Delta_t = \frac{1}{|S_t|} \sum_{i \in S_t} \delta_i^t.\]

The server then applies the shared FedOpt first-moment and model updates

\[\mathbf{m}_t = \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \Delta_t\]

\[\mathbf{v}_t = \Phi(\mathbf{v}_{t-1}, \Delta_t)\]

\[\mathbf{x}_{t+1} = \mathbf{x}_t + \eta \frac{\mathbf{m}_t}{\sqrt{\mathbf{v}_t} + \epsilon}.\]

Here \(\eta_l\) is the client learning rate (the corresponding argument is step_size), \(K\) is the number of local SGD steps (the corresponding argument is num_local_steps), \(\eta\) is the server learning rate (the corresponding argument is server_step_size), \(\beta_1\) is the first-moment coefficient (the corresponding argument is beta_1), \(\epsilon\) is the numerical stability term (the corresponding argument is epsilon), and \(S_t\) is the set of clients whose uploads are actually received in round \(t\). The second-moment update \(\Phi\) is variant-specific and is defined by subclasses. Aggregation is always uniform across the received clients. Costs that preserve the EmpiricalRiskCost abstraction use mini-batch local updates; generic costs use their usual full-gradient updates.

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

num_local_steps: int = 1#

server_step_size: float = 0.001#

beta_1: float = 0.9#

epsilon: float = 1e-06#

x0: InitialStates = None#

name: str = 'FedOpt'#

aggregate(network: FedNetwork, participating_clients: Sequence[Agent]) → None[source]#

Aggregate client model deltas uniformly, then apply the server adaptive optimizer.

This method assumes clients upload final local model deltas.

class decent_bench.algorithms.federated.FedPD(iterations: int = 100, step_size: float = 0.001, penalty: float = 1.0, skip_probability: float = 0.0, num_local_steps: LocalSteps = 1, x0: InitialStates = None, name: str = 'FedPD')[source]#

Bases: FedAlgorithm

Federated Primal-Dual (FedPD) with local gradient steps [7].

FedPD rewrites federated learning as a global consensus problem with client primal variables \(\mathbf{x}_i\), local centre variables \(\mathbf{x}_{0,i}\), and dual variables \(\lambda_i\). In each round, all active clients run num_local_steps gradient steps on the local augmented Lagrangian

\[f_i(\mathbf{x}_i) + \langle \lambda_i, \mathbf{x}_i - \mathbf{x}_{0,i} \rangle + \frac{1}{2 \eta}\|\mathbf{x}_i - \mathbf{x}_{0,i}\|^2.\]

The local gradient is

\[\nabla f_i(\mathbf{x}_i) + \lambda_i + \frac{1}{\eta}(\mathbf{x}_i - \mathbf{x}_{0,i}).\]

After the local gradient steps, clients update their dual variables and local centre candidates:

\[\lambda_i^+ = \lambda_i + \frac{1}{\eta}(\mathbf{x}_i^+ - \mathbf{x}_{0,i}), \qquad \mathbf{x}_{0,i}^+ = \mathbf{x}_i^+ + \eta \lambda_i^+.\]

Here \(\eta\) is the quadratic-penalty and dual-update scale (the corresponding argument is penalty).

With probability 1 - skip_probability, clients upload their local centre candidates and the server uniformly averages the candidates it actually receives. If at least one candidate is received, the server centre is then sent back to all active clients; clients that do not receive the server’s synchronized centre keep their local centre candidate. With probability skip_probability, aggregation is skipped and every participating client keeps its local centre candidate.

Costs that preserve the EmpiricalRiskCost abstraction use mini-batch local updates; generic costs keep their usual full-gradient behavior. num_local_steps can be homogeneous (single integer) or heterogeneous via a mapping keyed by client agent. Partial client participation is not supported.

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

penalty: float = 1.0#

skip_probability: float = 0.0#

num_local_steps: LocalSteps = 1#

x0: InitialStates = None#

name: str = 'FedPD'#

aggregate(network: FedNetwork, participating_clients: Sequence[Agent]) → None[source]#

Aggregate received FedPD centre candidates and broadcast the synchronized centre.

Unlike most federated algorithms, a FedPD communication round couples aggregation with centre synchronization: after the server averages the received centre candidates, it immediately broadcasts the updated centre back to all participating clients.

If no centre candidates are received, the server state is left unchanged and no synchronization is sent.

class decent_bench.algorithms.federated.FedProx(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, penalty: float = 0.01, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedProx')[source]#

Bases: FedAlgorithm

Federated Proximal (FedProx) with local SGD epochs [8].

Each client solves a proximalized local subproblem around the round’s server model:

\[h_k(\mathbf{w}; \mathbf{w}^t) = F_k(\mathbf{w}) + \frac{\mu}{2} \|\mathbf{w} - \mathbf{w}^t\|^2\]

\[\mathbf{x}_{i, k}^{(t+1)} = \mathbf{x}_{i, k}^{(t)} - \eta \nabla h_k(\mathbf{x}_{i, k}^{(t)}; \mathbf{w}^t)\]

where \(\nabla h_k(\mathbf{w}; \mathbf{w}^t) = \nabla F_k(\mathbf{w}) + \mu (\mathbf{w} - \mathbf{w}^t)\).

\[\mathbf{x}_{k+1} = \frac{1}{|S_k|} \sum_{i \in S_k} \mathbf{x}_{i, k}^{(E)}\]

where \(\mathbf{w}^t\) is the server model broadcast at the start of round \(k\), held fixed throughout each selected client’s local epochs, \(\mu \geq 0\) is the proximal coefficient (the corresponding argument is penalty), \(\eta\) is the step size (the corresponding argument is step_size), and \(S_k\) is the set of participating clients. Setting penalty=0.0 recovers FedAvg exactly. Aggregation uses uniform averaging over the participating clients. Client selection defaults to uniform sampling with fraction 1.0. For EmpiricalRiskCost, local updates use mini-batches of size EmpiricalRiskCost.batch_size; for generic costs, local updates use full-batch gradients.

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

num_local_steps: int = 1#

penalty: float = 0.01#

x0: InitialStates = None#

name: str = 'FedProx'#

class decent_bench.algorithms.federated.FedYogi(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, server_step_size: float = 0.001, beta_1: float = 0.9, epsilon: float = 1e-06, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, name: str = 'FedYogi', beta_2: float = 0.99)[source]#

Bases: FedOpt

FedYogi uses local SGD on clients and a Yogi-style adaptive server update [1].

Each selected client starts from the broadcast global model \(\mathbf{x}_t\) and performs num_local_steps local SGD steps with client step size step_size.

\[\mathbf{x}_{i, t}^{(k+1)} = \mathbf{x}_{i, t}^{(k)} - \eta_l \nabla f_i(\mathbf{x}_{i, t}^{(k)}).\]

The final client model defines the uploaded delta

\[\delta_i^t = \mathbf{x}_{i, t}^{(K)} - \mathbf{x}_t.\]

The server aggregates client model deltas uniformly over the participating clients:

\[\Delta_t = \frac{1}{|S_t|} \sum_{i \in S_t} \delta_i^t.\]

FedYogi then updates its moment buffers and global model as

\[\mathbf{m}_t = \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \Delta_t\]

\[\mathbf{v}_t = \mathbf{v}_{t-1} - (1 - \beta_2) \Delta_t^2 \operatorname{sign}(\mathbf{v}_{t-1} - \Delta_t^2)\]

\[\mathbf{x}_{t+1} = \mathbf{x}_t + \eta \frac{\mathbf{m}_t}{\sqrt{\mathbf{v}_t} + \epsilon}.\]

Here \(\eta_l\) is the client learning rate (the corresponding argument is step_size), \(K\) is the number of local SGD steps (the corresponding argument is num_local_steps), \(\eta\) is the server learning rate (the corresponding argument is server_step_size), \(\beta_1\) and \(\beta_2\) are the first- and second-moment coefficients (the corresponding arguments are beta_1 and beta_2), \(\epsilon\) is the numerical stability term (the corresponding argument is epsilon), and \(S_t\) is the set of clients whose uploads are actually received in round \(t\). Aggregation is always uniform across the received clients. Costs that preserve the EmpiricalRiskCost abstraction use mini-batch local updates; generic costs use their usual full-gradient updates.

beta_2: float = 0.99#

name: str = 'FedYogi'#

class decent_bench.algorithms.federated.Scaffold(iterations: int = 100, step_size: float = 0.001, num_local_steps: int = 1, server_step_size: float = 1.0, selection_scheme: ClientSelectionScheme | None = <factory>, x0: InitialStates = None, c0: InitialStates = None, name: str = 'Scaffold')[source]#

Bases: FedAlgorithm

SCAFFOLD with client/server control variates for variance-reduced local training [9].

When the server control variate \(\mathbf{c}\) and all client control variates \(\mathbf{c}_i\) are zero, the local updates reduce to FedAvg.

Here, \(\eta_l\) is the local client step size (the corresponding argument is step_size), \(\eta_g\) is the global server step size (the corresponding argument is server_step_size), \(S\) is the set of selected clients, and \(|S|\) its size.

Selected clients perform local steps with the correction

\[\mathbf{y}_{i}^{(t+1)} = \mathbf{y}_{i}^{(t)} - \eta_l \left(\nabla f_i(\mathbf{y}_{i}^{(t)}) - \mathbf{c}_i + \mathbf{c}\right)\]

and update their control variates using the practical SCAFFOLD rule, where \(K\) is the number of local steps (the corresponding argument is num_local_steps) and \(\mathbf{y}_i\) is the final local model after those \(K\) steps:

\[\mathbf{c}_i^+ = \mathbf{c}_i - \mathbf{c} + \frac{1}{K \eta_l} (\mathbf{x} - \mathbf{y}_i).\]

The server aggregates the model and control-variate deltas as

\[\Delta \mathbf{x} = \frac{1}{|S|} \sum_{i \in S} (\mathbf{y}_i - \mathbf{x})\]

\[\Delta \mathbf{c} = \frac{1}{|S|} \sum_{i \in S} (\mathbf{c}_i^+ - \mathbf{c}_i)\]

and then applies

\[\mathbf{x} \leftarrow \mathbf{x} + \eta_g \Delta \mathbf{x}, \qquad \mathbf{c} \leftarrow \mathbf{c} + \frac{|S|}{N} \Delta \mathbf{c},\]

where both aggregated deltas are averaged uniformly over the selected clients.

Note

x0 and c0 follow the InitialStates convention and are resolved per agent during initialize via initial_states(). For federated networks, c0 dictionaries can include both client control variates and the server control variate. If the server entry is missing, it is inferred as the average of the client control variates.

iterations: int = 100#: Number of iterations to run the algorithm for.

step_size: float = 0.001#

num_local_steps: int = 1#

server_step_size: float = 1.0#

x0: InitialStates = None#

c0: InitialStates = None#

name: str = 'Scaffold'#

server_broadcast(network: FedNetwork, selected_clients: Sequence[Agent], channel: str = 'default') → None[source]#: Send the current server model and control variate under channel.

aggregate(network: FedNetwork, participating_clients: Sequence[Agent]) → None[source]#

Aggregate received SCAFFOLD client deltas using uniform averaging.

Only current-round client deltas received by the server are aggregated.