All the articles with the tag "interpretability".
We can train a 70B-parameter model and watch it work. We mostly cannot explain why it works. Interpretability is the science trying to fix that.