Decent-DP documentation

This repository is the official implementation of the paper "From Promise to Practice: Realizing High-performance Decentralized Training" (arXiv version, OpenReview), accepted at ICLR 2025.

The package is a PyTorch extension that facilitates efficient multi-worker decentralized data-parallel training for algorithms that fit certain schemas.

Quick Start

Installation

  • Install PyTorch (See PyTorch Installation Guide for platform-specific instructions)
    pip3 install torch torchvision torchaudio
    
  • Install Decent-DP
    pip3 install decent-dp
    
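After installation, an optional sanity check is to confirm that both packages import cleanly. The snippet below only does that; it assumes the package is importable as decent_dp, matching the import used in the example below.

    import torch
    import decent_dp  # top-level module installed by the decent-dp package
    print(torch.__version__)
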

Basic Usage

Here is a pseudocode example of how to use Decent-DP to train a model:

import torch
import torch.distributed as dist
from decent_dp.ddp import DecentralizedDataParallel as DecentDP

# Initialize the process group
dist.init_process_group(backend='nccl' if torch.cuda.is_available() else 'gloo', init_method='env://')

# Initialize model (move to device before wrapping with DecentDP)
model = ...
model = model.to(device)

# Wrap model with DecentDP
model = DecentDP(model,
                 # optimizer constructor: takes a List[Tuple[str, Tensor]] of named parameters
                 # and returns an optimizer; examples can be found in the `decent_dp.optim` module
                 optim_fn=<optimizer constructor function>,
                 # LR scheduler constructor: takes an optimizer and returns an LR scheduler,
                 # or None if no LR scheduler is used; examples can be found in the `decent_dp.optim` module
                 lr_scheduler_fn=<lr scheduler constructor function>,
                 # communication topology, given as a string
                 # supported topologies are 'ring', 'exp', 'complete', 'alternating-exp-ring'
                 # see Section `Communication topology` for more details
                 topology=<topology>)

# Training loop
for epoch in range(num_epochs):
    model.train()
    for batch in data_loader:
        loss = model(batch)
        model.zero_grad()
        loss.backward()
        # no need to call optimizer.step(); the optimizer update is handled by DecentDP

    model.eval()
    for batch in val_data_loader:
        with torch.no_grad():
            loss = model(batch)
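
The optim_fn and lr_scheduler_fn arguments are plain constructor callables. Below is a minimal sketch of what they might look like, built from standard torch.optim classes; the optimizer choice and hyperparameters (learning rate, weight decay, T_max) are illustrative assumptions, and ready-made constructors are provided in the `decent_dp.optim` module as noted in the comments above.

from typing import List, Tuple
from torch import Tensor
from torch.optim import AdamW, Optimizer
from torch.optim.lr_scheduler import CosineAnnealingLR

def optim_fn(params: List[Tuple[str, Tensor]]) -> Optimizer:
    # `params` is the list of (name, parameter) pairs passed in by DecentDP;
    # this simple constructor only uses the parameter tensors
    return AdamW([p for _, p in params], lr=1e-3, weight_decay=1e-2)

def lr_scheduler_fn(optimizer: Optimizer):
    # cosine decay over an assumed horizon of 10,000 steps
    return CosineAnnealingLR(optimizer, T_max=10_000)

model = DecentDP(model, optim_fn=optim_fn, lr_scheduler_fn=lr_scheduler_fn, topology='ring')

Since the example uses init_method='env://', the script is typically launched with a distributed launcher such as torchrun, which sets the rank and world-size environment variables for each worker.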

Citation

If you find this repository helpful, please consider citing the following paper:

@inproceedings{wang2024promise,
    title={From Promise to Practice: Realizing High-performance Decentralized Training},
    author={Zesen Wang and Jiaojiao Zhang and Xuyang Wu and Mikael Johansson},
    booktitle={International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=lo3nlFHOft},
}