All the articles with the tag "moe".
A 671B Mixture of Experts model can be faster and cheaper to run than a dense 70B. The headline parameter count stopped meaning what it used to mean.