Using DeepSpeed and Megatron to Train Megatron-Turing NLG ...
Originally published in Microsoft Research Blog, Oct 11, 2021. We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model …