All the articles with the tag "mechanistic-interpretability".
We can train a 70B model and watch it work. We mostly cannot explain why it works. Interpretability is the science trying to fix that.