GAIA: The LLM Agent Benchmark Everyone’s Talking About

were making headlines last week. At Microsoft’s Build 2025, CEO Satya Nadella introduced the vision of an “open agentic web” ...

From Data to Stories: Code Agents for KPI Narratives

, we often need to investigate what’s going on with KPIs: whether we’re reacting to anomalies on our dashboards or ...

Code Agents: The Future of Agentic AI

of AI agents. LLMs are no longer just tools. They’ve become active participants in our lives, boosting productivity and transforming ...

How to Evaluate LLMs and Algorithms — The Right Way

Agentic AI 102: Guardrails and Agent Evaluation

In the first post of this series (Agentic AI 101: Starting Your Journey Building AI Agents), we talked about the ...

Google’s AlphaEvolve Is Evolving New Algorithms — And It Could Be a Game Changer

AlphaEvolve imagined as a genetic algorithm coupled to a large language model. Picture created by the author using various tools ...

Effortless Spreadsheet Normalisation With LLM

This article is part of a series of articles on automating Data Cleaning for any tabular dataset. You can test ...

Mastering Prompt Engineering with Functional Testing: A Systematic Guide to Reliable LLM Outputs 

Creating efficient prompts for large language models often starts as a simple task… but it doesn’t always stay that way. ...

Tutorial: Semantic Clustering of User Messages with LLM Prompts

As a Developer Advocate, I find it challenging to keep up with user forum messages and understand the big picture of what ...

I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and ...