Clean up the algorithm state by clearing auxiliary variables from agents.
This method is used to free up memory used by auxiliary variables that are not needed after training.
Can be overridden to control what gets cleaned up.
Note
Override cleanup_agents() to control which
agents are cleaned up.
Parameters:
network – provides the agents and topology for this algorithm.
This method first calls initialize(), then step() for the specified number of iterations.
Optionally call cleanup() after run() to clear auxiliary variables
and free up memory.
Parameters:
network – provides the agents and topology for this algorithm.
start_iteration – iteration number to start from, used when resuming from a checkpoint. If greater than 0,
initialize() will be skipped.
progress_callback – optional callback to report progress after each iteration.
Raises:
ValueError – if start_iteration is not in [0, iterations]
Warning
Do not override this method. Instead, override initialize() and step() as needed.
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\operatorname{prox}\) is the proximal operator described in Cost.proximal(),
\(\rho > 0\) is the Lagrangian penalty parameter (the corresponding argument is penalty),
\(N_i\) is the number of neighbors of i,
\(f_i\) is i’s local cost function,
j is a neighbor of i,
and \(\alpha \in (0, 1)\) is the relaxation parameter (the corresponding argument is relaxation).
Note
x0 and z0 follow the InitialStates convention and are resolved
per agent during initialize via
initial_states().
If x0 is None and z0 is provided, each agent initializes x0 from z0 with one proximal
update:
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
j is a neighbor of i or i itself,
\(\mathbf{W}_{ij}\) is the metropolis weight between agent i and j,
\(\rho\) is the step size (the corresponding argument is step_size),
and \(f_i\) is agent i’s local cost function.
where
\(\mathbf{x}_{i, k} \in \mathbb{R}^n\) is agent i’s local optimization variable at iteration k,
\(\mathbf{y}_{i,k}, \ \mathbf{s}_{i,k} \in \mathbb{R}^n\)
and \(\mathbf{z}_{ij,k} \in \mathbb{R}^{2n}\) are auxiliary variables,
\(N_i\) is the number of neighbors of i,
\(f_i\) is i’s local cost function,
j is a neighbor of i. The parameters are: the penalty \(\rho > 0\) (the corresponding argument is
penalty), the relaxation \(\alpha \in (0, 1)\) (the corresponding argument is relaxation), the
step-size \(\delta > 0\) (the corresponding argument is delta), and the mixing parameter
\(\gamma > 0\) (the corresponding argument is gamma). Notice that the convergence of the algorithm is
guaranteed provided that \(\delta, \ \gamma\) are below certain thresholds.
The idea of the algorithm is to apply distributed ADMM to perform gradient tracking,
instead of the usual average consensus.
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
j is a neighbor of i or i itself,
\(\mathbf{W}_{ij}\) is the metropolis weight between agent i and j,
\(\rho\) is the step size (the corresponding argument is step_size),
and \(f_i\) is agent i’s local cost function.
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\mathbf{z}_{i,k}\) is the local dual variable,
\(f_i\) is i’s local cost function,
j is a neighbor of i. The parameters are: the penalty \(\rho > 0\) and the step-size \(\mu > 0\).
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\rho\) is the step size (the corresponding argument is step_size),
\(f_i\) is agent i’s local cost function,
j is a neighbor of i or i itself,
and \(\mathbf{W}_{ij}\) is the metropolis weight between agent i and j.
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\rho\) is the step size (the corresponding argument is step_size),
\(f_i\) is agent i’s local cost function,
j is a neighbor of i or i itself,
\(\mathbf{W}_{ij}\) is the metropolis weight between agent i and j,
and \(\tilde{\mathbf{W}} = (\mathbf{I} + \mathbf{W}) / 2\).
Local Training ADMM with Variance Reduction (LT-ADMM-VR) [12].
Extends LT-ADMM with variance reduction techniques for improved convergence.
This variant implements additional gradient variance reduction mechanisms
during the local training phase.
Parameters:
iterations – Total number of communication rounds (K)
num_local_steps – Number of local training steps (tau)
step_size – Local step size (gamma)
aux_step_size – Local step size (beta)
penalty – Penalty parameter (rho)
alpha – Relaxation parameter (alpha)
x0 – Initial parameters (optional)
v2 – Whether to use the LT-ADMM-VR-2 variant with improved
variance reduction techniques which is less computational heavy (default True).
name – Algorithm name (default “LT-ADMM-VR”)
Raises:
TypeError – If any agent’s cost function is not an instance of EmpiricalRiskCost.
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\rho\) is the step size,
\(f_i\) is agent i’s local cost function,
j is a neighbor of i or i itself,
and \(\tilde{\mathbf{W}} = (\mathbf{I} + \mathbf{W}) / 2\)
with \(\mathbf{W}\) are the Metropolis weights.
This is a simplified version of the algorithm proposed in [13] (see eq. (9) therein).
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\rho\) is the step size,
\(f_i\) is agent i’s local cost function,
j is a neighbor of i or i itself,
and \(\mathbf{W}_{ij}\) is the metropolis weight between agent i and j.
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\rho\) is the step size,
\(f_i\) is agent i’s local cost function,
j is a neighbor of i or i itself,
and \(\mathbf{W}_{ij}\) is the metropolis weight between agent i and j.
Each iteration, each agent approximately optimizes an augmented Lagrangian function which
is then communicated to its neighbors in order to update the dual variables. This is
then repeated for a number of iterations.
Parameters:
iterations – Total number of outer iterations (K)
step_size – Step size for primal updates
num_local_steps – Number of inner iterations (B) for approximate primal update
penalty – Penalty parameter (rho) for augmented Lagrangian
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\rho\) is the step size,
\(f_i\) is agent i’s local cost function,
j is a neighbor of i or i itself,
and \(\mathbf{W}_{ij}\) is the metropolis weight between agent i and j.
where
\(\mathbf{x}_{i, k}\) is agent i’s local optimization variable at iteration k,
\(\rho\) is the step size,
\(f_i\) is agent i’s local cost function,
j is a neighbor of i or i itself,
and \(\mathbf{K}_{ij}\) is the weight between agent i and j.
The matrix \(\mathbf{K}\) is chosen as \(0.5 (\mathbf{I} - \mathbf{W})\),
where \(\mathbf{W}\) is the Metropolis weight matrix.