All the articles with the tag "kv-cache".
Context windows are not long-term memory. They are working memory. Here is what the model can see right now, why extending that window is hard, and what it costs to try.