GAIA: The LLM Agent Benchmark Everyone’s Talking About

were making headlines last week. At Microsoft’s Build 2025, CEO Satya Nadella introduced the vision of an “open agentic web” ...
From Data to Stories: Code Agents for KPI Narratives

, we often need to investigate what’s going on with KPIs: whether we’re reacting to anomalies on our dashboards or ...
Code Agents: The Future of Agentic AI

of AI agents. LLMs are no longer just tools. They’ve become active participants in our lives, boosting productivity and transforming ...
How to Evaluate LLMs and Algorithms — The Right Way

Never miss a new edition of The Variable, our weekly newsletter featuring a top-notch selection of editors’ picks, deep dives, ...
Agentic AI 102: Guardrails and Agent Evaluation

In the first post of this series (Agentic AI 101: Starting Your Journey Building AI Agents), we talked about the ...
Google’s AlphaEvolve Is Evolving New Algorithms — And It Could Be a Game Changer

AlphaEvolve imagined as a genetic algorithm coupled to a large language model. Picture created by the author using various tools ...
Effortless Spreadsheet Normalisation With LLM

This article is part of a series of articles on automating Data Cleaning for any tabular dataset. You can test ...
Mastering Prompt Engineering with Functional Testing: A Systematic Guide to Reliable LLM Outputs

Creating efficient prompts for large language models often starts as a simple task… but it doesn’t always stay that way. ...
Tutorial: Semantic Clustering of User Messages with LLM Prompts

As a Developer Advocate, it’s challenging to keep up with user forum messages and understand the big picture of what ...
I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and ...