Tag: neural-networks
All the articles with the tag "neural-networks".
-
llm-concepts · 8 min read · Mixture of Experts: Why 671B Does Not Equal 671B
A 671B Mixture of Experts model can be faster and cheaper to run than a dense 70B. The headline parameter count stopped meaning what it used to mean.
-
llm-concepts · 6 min read · Parameter Counts and Scaling Laws: What 70B Actually Means
What does 70B actually mean? It tells you about memory requirements, inference speed, and training costs, but almost nothing about model quality on its own.
-
ai · 6 min read · The Transformer: How Attention Solved the Problem Everything Else Could Not
In 2017, eight researchers replaced the entire approach to language modeling with a single idea: let every word attend to every other word directly.
-
ai · 7 min read · Before the Transformer: A Short History of Machines That Read
Why did the transformer matter so much that we measure AI in 'before' and 'after' it? A short history of every approach that tried and hit a wall first.
-
neural-networks · Updated · 11 min read · Neural Networks: How AI Mimics the Brain
How artificial neurons combine into networks that recognize images, understand language, and generate text. From perceptrons to transformers.