DCP: Addressing Input Dynamism in Long-Context Training via Dynamic Context Parallelism
Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication
dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training