YESH
BLOG BOOKMARKS

August 10, 2025

3 links
  • Agentic Coding Ecosystem 2025 - YouTube
    📝 Notes
    ### How They Work & Why Performance Varies

    * While the basic concept is an LLM in a loop with tools (read/write files, execute commands), the details are crucial.
    * The best-performing agents use tools that the **foundation model was specifically trained on**. A mismatch between the agent's tools and the model's training (e.g., using a generic agent with a model like GPT-5) leads to worse performance.
    * Agent quality differs significantly even with the same underlying model. Key differentiators include:
      * **Safety checks**: Some agents use a second, faster LLM as a "judge" to prevent harmful commands.
      * **Error recovery**: Many agents get stuck in loops or fail to recover from errors, while more battle-tested ones are more robust.
      * **Tool implementation**: The way an agent executes code and handles processes can vary greatly in quality.

    ### The Challenge of Evaluation

    * **Cost is deceptive.** A model with a cheaper per-token price (like GPT-5) might be more expensive overall if the agent requires more tokens and interactions to solve a problem than a more efficient one (like Claude Code).
    * Similarly, faster inference speed doesn't matter if the model's output quality is lower.

    ### Models & Pricing

    * Open-weight models are improving but are not yet as reliable. **Self-hosting them is currently more expensive** and technically challenging than using commercial APIs.
    * **Costs will go up** as VC subsidies wane and users tackle more complex problems that require more tokens.
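    The "LLM in a loop with tools" idea from the notes above can be sketched in a few lines. This is a minimal illustration, not any specific agent's implementation: `call_llm`, the JSON reply protocol, the tool set, and the step cap are all assumptions made for the example.

    ```python
    # Minimal sketch of an agentic coding loop: the model picks a tool,
    # the runtime executes it, and the observation is fed back as context.
    # `call_llm` is a hypothetical stand-in for any chat-completion API.
    import json
    import subprocess
    from pathlib import Path

    def read_file(path: str) -> str:
        return Path(path).read_text()

    def write_file(path: str, content: str) -> str:
        Path(path).write_text(content)
        return f"wrote {len(content)} bytes to {path}"

    def run_command(cmd: str) -> str:
        # A production agent would gate this behind a safety check, e.g. a
        # second, faster LLM acting as a "judge" before execution.
        result = subprocess.run(cmd, shell=True, capture_output=True,
                                text=True, timeout=30)
        return result.stdout + result.stderr

    TOOLS = {"read_file": read_file, "write_file": write_file,
             "run_command": run_command}

    def agent_loop(task: str, call_llm, max_steps: int = 10) -> str:
        """Loop until the model declares it is done or the step cap hits."""
        history = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            # Assumed reply format: JSON with either a tool call or an answer.
            reply = json.loads(call_llm(history))
            if reply.get("done"):
                return reply["answer"]
            tool = TOOLS[reply["tool"]]
            try:
                observation = tool(**reply["args"])
            except Exception as exc:
                # Error recovery: feed the failure back instead of crashing,
                # so the model can try a different approach.
                observation = f"tool error: {exc}"
            history.append({"role": "assistant", "content": json.dumps(reply)})
            history.append({"role": "user", "content": observation})
        return "step limit reached without a final answer"
    ```

    The quality differentiators the notes list all live inside this loop: where the safety judge sits (before `run_command`), how errors re-enter the history, and how each tool is implemented.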
  • Towards full self-hosting and IRL agentic capability evaluations of LLMs - YouTube
  • AGI is not coming! - YouTube