What Is Data Parallelism?
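In data parallelism, each device holds a full copy of the model but processes a different slice of each training batch. After the backward pass, the gradients from all devices are combined (typically averaged with an all-reduce operation), and every copy applies the same update, so the replicas stay synchronized throughout training.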
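The idea can be sketched in a few lines of plain Python. This is a toy simulation under stated assumptions, not a real multi-device implementation: the "devices" are just list entries, the model is a single weight `w` fitting `y = 3x`, and the helper names (`local_gradient`, `all_reduce_mean`, `data_parallel_step`) are hypothetical and chosen only for illustration.

```python
# Toy simulation of data parallelism; each "device" is a list entry, not real hardware.

def local_gradient(w, shard):
    """Mean-squared-error gradient of the model y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for a collective all-reduce: every device receives the average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(replicas, shards, lr):
    """One synchronized step: local gradients, averaging, identical updates."""
    grads = [local_gradient(w, shard) for w, shard in zip(replicas, shards)]
    synced = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(replicas, synced)]

# Assumed toy dataset: eight samples of y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

num_devices = 4
# Each device gets a different, equal-sized slice of the data.
shards = [data[i::num_devices] for i in range(num_devices)]
# Every device starts with the same copy of the model (a single weight).
replicas = [0.0] * num_devices

for _ in range(200):
    replicas = data_parallel_step(replicas, shards, lr=0.01)

print(replicas)
```

Because the shards are the same size, averaging the per-device gradients gives the same result as computing the gradient over the whole batch on one device, and because every replica applies that same averaged gradient, the copies never drift apart.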