What Is Data Parallelism?

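In data parallelism, each device holds a full copy of the model but processes a different slice of each training batch. After the backward pass, the gradients from all devices are combined, typically by averaging them with an all-reduce, and every device applies the same update, so all copies stay synchronized throughout training.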
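
To make the idea concrete, here is a minimal single-process sketch in plain Python. It is an illustration under stated assumptions, not how any particular framework implements it: the "devices" are simulated as entries in a list, the model is a single weight `w` fitted to `y = w * x`, and the names `local_gradient`, `all_reduce_mean`, and `data_parallel_step` are hypothetical helpers invented for this example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training pair


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean squared error of y ~ w * x on one device's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(grads: List[float]) -> float:
    """Stand-in for an all-reduce: every device receives the mean gradient."""
    return sum(grads) / len(grads)


def data_parallel_step(replicas: List[float], batch: List[Sample], lr: float) -> List[float]:
    """One synchronous step: shard the batch, average gradients, update all replicas."""
    n = len(replicas)
    shard_size = len(batch) // n  # assumption: any remainder samples are dropped
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(n)]
    # Each simulated device computes a gradient on its own shard only.
    grads = [local_gradient(w, shard) for w, shard in zip(replicas, shards)]
    # All devices agree on one averaged gradient ...
    g = all_reduce_mean(grads)
    # ... and apply the identical update, so the replicas never diverge.
    return [w - lr * g for w in replicas]


# Toy data generated from y = 2x; two simulated devices start from the same weight.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
replicas = [0.0, 0.0]
for _ in range(50):
    replicas = data_parallel_step(replicas, batch, lr=0.05)
```

Because the shards here are the same size, the averaged gradient equals the gradient computed over the whole batch, which is why the replicas behave like a single model trained on the full batch. Real systems perform the averaging step with collective communication across devices rather than a Python list.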
