Tag: inference

All the articles with the tag "inference".

llm-concepts
21 May, 2026 8 min read

Running Local Models: What It Actually Takes

Quantization shrank the model down to 40 GB. Now what hardware, what software, and what setup actually run a 70B model at home in 2026?

llm-concepts
15 May, 2026 7 min read

Quantization: How a 70B Model Fits on Your Laptop

Quantization shrinks a 70B model from 140 GB to about 40 GB with almost no quality loss. What it actually does, and why the trick works.

ai
5 Mar, 2026 8 min read

AI Inference and Scaling: From Training to Serving Billions

How trained AI models serve billions of requests through inference optimization, scaling infrastructure, and cost engineering.