Tag: mechanistic-interpretability

All the articles with the tag "mechanistic-interpretability".

llm-concepts
4 May 2026, 7 min read

Interpretability: What's Actually Inside

We can train a 70B-parameter model and watch it work. We mostly cannot explain why it works. Interpretability is the science trying to fix that.