Microsoft DeepSpeed-Chat

Yesterday, Microsoft announced the release of DeepSpeed-Chat, a low-cost, open-source solution for RLHF training that allows anyone to create high-quality ChatGPT-style models, even with a single GPU. Microsoft claims that you can train a model of up to 13B parameters on a single GPU, or for as little as $300 on Azure Cloud, using DeepSpeed-Chat.

You can train a 13B ChatGPT-like model in 1.25 hours, and a massive OPT-175B model in a day on 64 GPUs.

DeepSpeed-Chat imposes no hard limit on the number of parameters: it supports models ranging from a few billion to hundreds of billions of parameters.

The DeepSpeed-Chat RLHF training experience is made possible by combining DeepSpeed-Inference and DeepSpeed-Training, offering 15x higher throughput than the state of the art (SoTA) while also supporting model sizes up to 7.5x larger on the same hardware. DeepSpeed-Chat makes complex RLHF training fast, affordable, and easily accessible to the AI community, democratizing ChatGPT-like models.

The initial release of DeepSpeed-Chat includes the following three capabilities:

(i) Easy-to-use Training and Inference Experience for ChatGPT-Like Models: A single script takes a pre-trained Hugging Face model, runs it through all three steps of InstructGPT training using the DeepSpeed-RLHF system, and produces your very own ChatGPT-like model. In addition, it provides an inference API for testing conversation-style interactions after the model is trained.
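Conceptually, that single script chains the three training stages end to end. The sketch below illustrates the flow only; the function and model names are hypothetical placeholders, not the actual DeepSpeed-Chat API.

```python
# Hypothetical sketch of the three-step InstructGPT-style pipeline that the
# one-script experience drives. All names here are illustrative.

def supervised_finetune(base_model: str) -> str:
    # Step 1: fine-tune the pre-trained actor model on instruction data (SFT).
    return f"{base_model}-sft"

def train_reward_model(base_model: str) -> str:
    # Step 2: fine-tune a (usually smaller) model to score responses.
    return f"{base_model}-reward"

def rlhf_finetune(actor: str, reward: str) -> str:
    # Step 3: optimize the SFT actor with RL against the reward model.
    return f"{actor}-rlhf"

def train_chat_model(actor_base: str, reward_base: str) -> str:
    """Run all three steps end to end, as the single launch script does."""
    actor = supervised_finetune(actor_base)
    reward = train_reward_model(reward_base)
    return rlhf_finetune(actor, reward)

print(train_chat_model("opt-13b", "opt-350m"))
# -> opt-13b-sft-rlhf
```

The point is the orchestration: one entry point, three dependent stages, with the reward model trained separately from the actor before the RL step consumes both.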

(ii) DeepSpeed-RLHF Pipeline: The DeepSpeed-RLHF pipeline primarily replicates the training pipeline from the InstructGPT paper, with careful attention to completeness and one-to-one correspondence with its three steps: a) Supervised Fine-Tuning (SFT), b) Reward Model Fine-Tuning, and c) Reinforcement Learning from Human Feedback (RLHF). Additionally, it offers data abstraction and blending capabilities to enable training with multiple data sources.
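The blending idea can be sketched as weighted sampling across datasets. This is a toy illustration of the concept under assumed behavior, not DeepSpeed-Chat's actual data abstraction layer.

```python
import random

def blend_datasets(sources, weights, num_samples, seed=0):
    """Toy data blending: draw samples from several data sources in
    proportion to the given weights. Illustrative only."""
    rng = random.Random(seed)
    total = sum(weights)
    probs = [w / total for w in weights]
    blended = []
    for _ in range(num_samples):
        # Pick a source according to the blend weights, then a sample from it.
        src = rng.choices(sources, weights=probs, k=1)[0]
        blended.append(src[rng.randrange(len(src))])
    return blended

# Hypothetical sources: prompts for RL and human-preference dialogs.
rlhf_prompts = ["prompt-a", "prompt-b"]
human_dialogs = ["dialog-a", "dialog-b", "dialog-c"]
mix = blend_datasets([rlhf_prompts, human_dialogs], weights=[1, 2], num_samples=6)
print(mix)
```

A real pipeline would blend tokenized examples and keep per-step splits consistent, but the weighted-mixture mechanism is the core of the abstraction.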

(iii) DeepSpeed-RLHF System: A robust and sophisticated RLHF system that combines the training and inference prowess of DeepSpeed into a single unified Hybrid Engine (DeepSpeed-HE) for RLHF. The Hybrid Engine seamlessly transitions between inference and training modes within RLHF, leveraging optimizations from DeepSpeed-Inference such as tensor parallelism and high-performance transformer kernels for generation, while also benefiting from ZeRO- and LoRA-based memory-optimization strategies for RL training. DeepSpeed-HE is also aware of the full RLHF pipeline, allowing it to make optimal decisions about memory management and data movement across the different phases of RLHF.
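The mode-switching idea can be shown with a minimal sketch: one object that flips a model between an inference-optimized mode (for generating responses) and a training mode (for the RL update) each RLHF iteration. Names and structure are invented for illustration; DeepSpeed-HE's real engine applies tensor parallelism, fused kernels, ZeRO, and LoRA under the hood.

```python
class ToyHybridEngine:
    """Illustrative stand-in for a hybrid inference/training engine."""

    def __init__(self):
        self.mode = "train"
        self.transitions = []

    def _switch(self, mode):
        # Record each mode change; a real engine would swap kernels and
        # re-shard parameters here.
        if mode != self.mode:
            self.mode = mode
            self.transitions.append(mode)

    def generate(self, prompt):
        self._switch("inference")   # inference kernels for fast generation
        return prompt + " -> response"

    def train_step(self, experience):
        self._switch("train")       # back to memory-optimized training
        return f"updated on {experience}"

engine = ToyHybridEngine()
for prompt in ["p1", "p2"]:
    exp = engine.generate(prompt)   # experience-generation phase
    engine.train_step(exp)          # RL training phase

print(engine.transitions)
# -> ['inference', 'train', 'inference', 'train']
```

Because the same engine sees both phases, it can plan memory and data movement across the whole loop rather than optimizing generation and training in isolation, which is the point of pipeline awareness.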

To get started, visit the GitHub page for DeepSpeed-Chat.