Why Raw Webpages Are Bad Context for AI Agents

July 3, 2026

Raw webpages are usually the wrong unit of context. They include navigation, repeated boilerplate, unrelated sections, tracking fragments, and far more text than a model needs for a focused answer.

An agent that pastes whole pages into the prompt spends context on noise. It also makes citations harder: when the source is an entire page, the model has to infer which passage actually supported the answer.

TinySearch treats a page as material to process. Crawl it, convert it, split it into chunks, rank those chunks against the query, and return the smallest useful set with source URLs.
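To make those steps concrete, here is a minimal sketch of the convert, split, rank, and return stages using only the Python standard library. It is not TinySearch's actual implementation: the names `Chunk`, `html_to_blocks`, `split_into_chunks`, and `rank_chunks` are hypothetical, the crawl step is left out (the functions take HTML that has already been fetched), and the ranking is a deliberately naive term-overlap score standing in for whatever a real system would use.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Chunk:
    url: str            # source URL, kept so every chunk stays citable
    text: str
    score: float = 0.0


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping tags that usually hold boilerplate."""

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def html_to_blocks(html):
    """Convert: reduce raw HTML to a list of visible text blocks."""
    parser = _TextExtractor()
    parser.feed(html)
    return parser.blocks


def split_into_chunks(url, blocks, max_words=40):
    """Split: group consecutive blocks into chunks of roughly max_words."""
    chunks, current, count = [], [], 0
    for block in blocks:
        n = len(block.split())
        if current and count + n > max_words:
            chunks.append(Chunk(url, " ".join(current)))
            current, count = [], 0
        current.append(block)
        count += n
    if current:
        chunks.append(Chunk(url, " ".join(current)))
    return chunks


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_chunks(chunks, query, top_k=2):
    """Rank: score by the fraction of query terms present, keep the best few."""
    terms = _tokens(query)
    if not terms:
        return []
    for chunk in chunks:
        chunk.score = len(terms & _tokens(chunk.text)) / len(terms)
    relevant = [c for c in chunks if c.score > 0]
    return sorted(relevant, key=lambda c: c.score, reverse=True)[:top_k]


if __name__ == "__main__":
    page = (
        "<nav>Home Pricing Docs</nav>"
        "<p>Rate limits are 100 requests per minute per API key.</p>"
        "<p>Our team loves hiking and coffee.</p>"
    )
    url = "https://example.com/docs"  # placeholder URL for illustration
    chunks = split_into_chunks(url, html_to_blocks(page), max_words=10)
    for chunk in rank_chunks(chunks, "API rate limits"):
        print(chunk.score, chunk.url, chunk.text)
```

Because each chunk carries its source URL from the moment it is created, whatever reaches the model can be cited directly, without asking the model to guess which part of a page it relied on.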

The result is not magic. It is a boring, useful compression step between the web and the model.