Meta today announced that it has selected Microsoft Azure as a strategic cloud provider for AI research and development. As part of this partnership, the Meta AI group will expand its use of Azure's supercomputing capabilities, including a dedicated Azure cluster of 5,400 GPUs built on Azure's latest VM series (NDm A100 v4, featuring NVIDIA A100 80GB Tensor Core GPUs), for some of its large-scale AI research workloads.
Azure enables faster distributed AI training, supporting four times the GPU-to-GPU bandwidth between VMs compared with other public cloud offerings. Meta used Azure to train its recent OPT-175B language model.
The Meta AI team is expanding its usage and bringing more cutting-edge machine learning training workloads to Azure to help further advance its leading AI research.
Meta and Microsoft will also work together to increase PyTorch adoption on Azure. In the coming months, Microsoft will build new PyTorch development accelerators to enable rapid implementation of PyTorch-based solutions on Azure. Microsoft already offers enterprise-grade support for deploying PyTorch models in production, on both cloud and edge.
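As a concrete illustration of the deployment path mentioned above, PyTorch models are commonly exported to TorchScript, a portable format that serving runtimes can load without a Python training environment. The sketch below is illustrative only: `TinyClassifier` and the file name are hypothetical stand-ins, not part of any Meta or Microsoft workload.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for a production network.
class TinyClassifier(nn.Module):
    def __init__(self, in_features: int = 16, num_classes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyClassifier().eval()

# TorchScript compiles the model into a self-contained artifact that
# cloud or edge runtimes can load independently of the training code.
scripted = torch.jit.script(model)
scripted.save("tiny_classifier.pt")

# Reload and run inference the way a serving runtime would.
loaded = torch.jit.load("tiny_classifier.pt")
with torch.no_grad():
    logits = loaded(torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 4])
```

The same exported artifact can be served from a managed endpoint or an edge device, which is what makes this export step the usual boundary between research code and production deployment.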
“We are excited to deepen our collaboration with Azure to advance Meta’s AI research, innovation, and open-source efforts in a way that benefits more developers around the world,” said Jerome Pesenti, VP of AI at Meta.