As data analysts, we often need to investigate what’s going on with KPIs: whether we’re reacting to anomalies on our dashboards or just doing a routine numbers update. Based on my years of experience as a KPI analyst, I would estimate that more than 80% of these tasks are fairly standard and can be solved just by following a simple checklist.
Here’s a high-level plan for investigating a KPI change (you can find more details in the article “Anomaly Root Cause Analysis 101”):
- Estimate the top-line change in the metric to understand the magnitude of the shift.
- Check data quality to ensure that the numbers are accurate and reliable.
- Gather context about internal and external events that might have influenced the change.
- Slice and dice the metric to identify which segments are contributing to the metric’s shift.
- Consolidate your findings in an executive summary that includes hypotheses and estimates of their impacts on the main KPI.
Since we have a clear plan to execute, such tasks can potentially be automated with AI agents. The code agents we discussed recently could be a good fit here, as their ability to write and execute code helps them analyse data efficiently, with minimal back-and-forth. So, let’s try building such an agent using the HuggingFace smolagents framework.
While working on our task, we will discuss more advanced features of the smolagents framework:
- Techniques for tweaking all kinds of prompts to ensure the desired behaviour.
- Building a multi-agent system that can explain KPI changes and link them to root causes.
- Adding reflection to the flow with supplementary planning steps.
MVP for explaining KPI changes
As usual, we will take an iterative approach and start with a simple MVP, focusing on the slicing and dicing step of the analysis. We will analyse the changes of a simple metric (revenue) split by one dimension (country). We will use the dataset from my previous article, “Making sense of KPI changes”.
Let’s load the data first.
import pandas as pd

raw_df = pd.read_csv('absolute_metrics_example.csv', sep='\t')

df = (
    raw_df.groupby('country')[['revenue_before', 'revenue_after_scenario_2']]
    .sum()
    .sort_values('revenue_before', ascending=False)
    .rename(columns={
        'revenue_after_scenario_2': 'after',
        'revenue_before': 'before',
    })
)
Next, let’s initialise the model. I’ve chosen the OpenAI GPT-4o-mini as my preferred option for simple tasks. However, the smolagents framework supports all kinds of models, so you can use the model you prefer. Then, we just need to create an agent and give it the task and the dataset.
from smolagents import CodeAgent, LiteLLMModel

model = LiteLLMModel(
    model_id="openai/gpt-4o-mini",
    api_key=config['OPENAI_API_KEY'],
)

agent = CodeAgent(
    model=model,
    tools=[],
    max_steps=10,
    additional_authorized_imports=["pandas", "numpy", "matplotlib.*", "plotly.*"],
    verbosity_level=1,
)
task = """
Here is a dataframe showing revenue by segment, comparing values
before and after.
Could you please help me understand the changes? Specifically:
1. Estimate how the total revenue and the revenue for each segment
have changed, both in absolute terms and as a percentage.
2. Calculate the contribution of each segment to the total
change in revenue.
Please round all floating-point numbers in the output
to two decimal places.
"""
agent.run(
task,
additional_args={"data": df},
)
The agent returned quite a plausible result. We got detailed statistics on the metric changes in each segment and their impact on the top-line KPI.
{'total_before': 1731985.21,
 'total_after': 1599065.55,
 'total_change': -132919.66,
 'segment_changes': {
   'absolute_change': {'other': 4233.09, 'UK': -4376.25,
     'France': -132847.57, 'Germany': -690.99, 'Italy': 979.15,
     'Spain': -217.09},
   'percentage_change': {'other': 0.67, 'UK': -0.91, 'France': -55.19,
     'Germany': -0.43, 'Italy': 0.81, 'Spain': -0.23},
   'contribution_to_change': {'other': -3.18, 'UK': 3.29,
     'France': 99.95, 'Germany': 0.52, 'Italy': -0.74, 'Spain': 0.16}}}
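As a quick sanity check (not part of the agent’s output), each `contribution_to_change` value is simply the segment’s absolute change divided by the total change; the contributions should sum to roughly 100%:

```python
# Recompute contribution_to_change from the absolute changes above:
# contribution = segment change / total change * 100.
changes = {
    'other': 4233.09, 'UK': -4376.25, 'France': -132847.57,
    'Germany': -690.99, 'Italy': 979.15, 'Spain': -217.09,
}
total_change = sum(changes.values())  # -132919.66

contributions = {
    seg: round(change / total_change * 100, 2)
    for seg, change in changes.items()
}
print(contributions['France'])             # 99.95 - France drives almost the whole drop
print(round(sum(contributions.values())))  # 100
```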
Let’s take a look at the code generated by the agent. It is fine, but there’s one potential issue: the LLM recreated the dataframe from the input data instead of referencing it directly. This approach is not ideal (especially when working with massive datasets), as it can lead to errors and higher token usage.
import pandas as pd

# Creating the DataFrame from the provided data
data = {
    'before': [632767.39, 481409.27, 240704.63, 160469.75,
               120352.31, 96281.86],
    'after': [637000.48, 477033.02, 107857.06, 159778.76,
              121331.46, 96064.77]
}
index = ['other', 'UK', 'France', 'Germany', 'Italy', 'Spain']
df = pd.DataFrame(data, index=index)

# Calculating total revenue before and after
total_before = df['before'].sum()
total_after = df['after'].sum()

# Calculating absolute and percentage change for each segment
df['absolute_change'] = df['after'] - df['before']
df['percentage_change'] = (df['absolute_change'] / df['before']) * 100

# Calculating total revenue change
total_change = total_after - total_before

# Calculating contribution of each segment to the total change
df['contribution_to_change'] = (df['absolute_change'] / total_change) * 100

# Rounding results
df = df.round(2)

# Printing the calculated results
print("Total revenue before:", total_before)
print("Total revenue after:", total_after)
print("Total change in revenue:", total_change)
print(df)
It’s worth fixing this problem before moving on to building a more complex system.
Tweaking prompts
Since the LLM is just following the instructions given to it, we will address this issue by tweaking the prompt.
Initially, I attempted to make the task prompt more explicit, clearly instructing the LLM to use the provided variable.
task = """Here is a dataframe showing revenue by segment, comparing
values before and after. The data is stored in df variable.
Please, use it and don't try to parse the data yourself.
Could you please help me understand the changes?
Specifically:
1. Estimate how the total revenue and the revenue for each segment
have changed, both in absolute terms and as a percentage.
2. Calculate the contribution of each segment to the total change in revenue.
Please round all floating-point numbers in the output to two decimal places.
"""
It didn’t work. So, the next step is to examine the system prompt and see why it works this way.
print(agent.prompt_templates['system_prompt'])
#...
# Here are the rules you should always follow to solve your task:
# 1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```' sequence, else you will fail.
# 2. Use only variables that you have defined.
# 3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'.
# 4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.
# 5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
# 6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
# 7. Never create any notional variables in our code, as having these in your logs will derail you from the true variables.
# 8. You can use imports in your code, but only from the following list of modules: ['collections', 'datetime', 'itertools', 'math', 'numpy', 'pandas', 'queue', 'random', 're', 'stat', 'statistics', 'time', 'unicodedata']
# 9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist.
# 10. Don't give up! You're in charge of solving the task, not providing directions to solve it.
# Now Begin!
At the end of the prompt, we have the rule "2. Use only variables that you have defined!". This might be interpreted as a strict instruction not to use any other variables. So, I changed it to "2. Use only variables that you have defined or ones provided in additional arguments! Never try to copy and parse additional arguments."
modified_system_prompt = agent.prompt_templates['system_prompt']\
.replace(
'2. Use only variables that you have defined!',
'2. Use only variables that you have defined or ones provided in additional arguments! Never try to copy and parse additional arguments.'
)
agent.prompt_templates['system_prompt'] = modified_system_prompt
This change alone didn’t help either. Then, I examined the task message.
╭─────────────────────────── New run ────────────────────────────╮
│ │
│ Here is a pandas dataframe showing revenue by segment, │
│ comparing values before and after. │
│ Could you please help me understand the changes? │
│ Specifically: │
│ 1. Estimate how the total revenue and the revenue for each │
│ segment have changed, both in absolute terms and as a │
│ percentage. │
│ 2. Calculate the contribution of each segment to the total │
│ change in revenue. │
│ │
│ Please round all floating-point numbers in the output to two │
│ decimal places. │
│ │
│ You have been provided with these additional arguments, that │
│ you can access using the keys as variables in your python │
│ code: │
│ {'df': before after │
│ country │
│ other 632767.39 637000.48 │
│ UK 481409.27 477033.02 │
│ France 240704.63 107857.06 │
│ Germany 160469.75 159778.76 │
│ Italy 120352.31 121331.46 │
│ Spain 96281.86 96064.77}. │
│ │
╰─ LiteLLMModel - openai/gpt-4o-mini ────────────────────────────╯
It includes an instruction related to the usage of additional arguments: "You have been provided with these additional arguments, that you can access using the keys as variables in your python code". We can try to make it more specific and clear. Unfortunately, this text is not exposed as a parameter, so I had to locate it in the source code. To find the installation path of a Python package, we can use the following code.
import smolagents
print(smolagents.__path__)
Then, I found the agents.py file and modified this line to include a more specific instruction.
self.task += f"""
You have been provided with these additional arguments available as variables
with names {",".join(additional_args.keys())}. You can access them directly.
Here is what they contain (just for informational purposes):
{str(additional_args)}."""
It was a bit of a hack, but that is sometimes what happens with LLM frameworks. Don’t forget to reload the package afterwards, and we’re good to go. Let’s test whether it works now.
task = """
Here is a pandas dataframe showing revenue by segment, comparing values
before and after.
Your task will be to understand the changes to the revenue (after vs before)
in different segments and provide an executive summary.
Please, follow the following steps:
1. Estimate how the total revenue and the revenue for each segment
have changed, both in absolute terms and as a percentage.
2. Calculate the contribution of each segment to the total change
in revenue.
Round all floating-point numbers in the output to two decimal places.
"""
agent.logger.level = 1 # Lower verbosity level
agent.run(
task,
additional_args={"df": df},
)
Hooray! The problem has been fixed. The agent no longer copies the input variables and instead references the df variable directly. Here’s the newly generated code.
import pandas as pd
# Calculate total revenue before and after
total_before = df['before'].sum()
total_after = df['after'].sum()
total_change = total_after - total_before
percentage_change_total = (total_change / total_before * 100) if total_before != 0 else 0
# Round values
total_before = round(total_before, 2)
total_after = round(total_after, 2)
total_change = round(total_change, 2)
percentage_change_total = round(percentage_change_total, 2)
# Display results
print(f"Total Revenue Before: {total_before}")
print(f"Total Revenue After: {total_after}")
print(f"Total Change: {total_change}")
print(f"Percentage Change: {percentage_change_total}%")
Now, we’re ready to move on to building the actual agent that will solve our task.
AI agent for KPI narratives
Finally, it’s time to work on the AI agent that will help us explain KPI changes and create an executive summary.
Our agent will follow this plan for the root cause analysis:
- Estimate the top-line KPI change.
- Slice and dice the metric to understand which segments are driving the shift.
- Look up events in the change log to see whether they can explain the metric changes.
- Consolidate all the findings in a comprehensive executive summary.
After a lot of experimentation and several tweaks, I’ve arrived at a promising result. Here are the key adjustments I made (we will discuss them in detail later):
- I leveraged a multi-agent setup by adding another team member: the change log agent, which can access the change log and assist in explaining KPI changes.
- I experimented with more powerful models like gpt-4o and gpt-4.1-mini, since gpt-4o-mini wasn’t sufficient. Using stronger models not only improved the results but also significantly reduced the number of steps: with gpt-4.1-mini, I got the final result after just six steps, compared to 14–16 steps with gpt-4o-mini. This suggests that investing in more expensive models might be worthwhile for agentic workflows.
- I provided the agent with a complex tool to analyse KPI changes for simple metrics. The tool performs all the calculations, while the LLM just interprets the results. I discussed this approach to analysing KPI changes in detail in my previous article.
- I reformulated the prompt into a very clear step-by-step guide to help the agent stay on track.
- I added planning steps that encourage the LLM agent to think through its approach first and revisit the plan every three iterations.
After all the adjustments, I got the following summary from the agent, which is pretty good.
Executive Summary:
Between April 2025 and May 2025, total revenue declined sharply by
approximately 36.03%, falling from 1,731,985.21 to 1,107,924.43, a
drop of -624,060.78 in absolute terms.
This decline was primarily driven by significant revenue
reductions in the 'new' customer segments across multiple
countries, with declines of approximately 70% in these segments.
The most impacted segments include:
- other_new: before=233,958.42, after=72,666.89,
abs_change=-161,291.53, rel_change=-68.94%, share_before=13.51%,
impact=25.85, impact_norm=1.91
- UK_new: before=128,324.22, after=34,838.87,
abs_change=-93,485.35, rel_change=-72.85%, share_before=7.41%,
impact=14.98, impact_norm=2.02
- France_new: before=57,901.91, after=17,443.06,
abs_change=-40,458.85, rel_change=-69.87%, share_before=3.34%,
impact=6.48, impact_norm=1.94
- Germany_new: before=48,105.83, after=13,678.94,
abs_change=-34,426.89, rel_change=-71.56%, share_before=2.78%,
impact=5.52, impact_norm=1.99
- Italy_new: before=36,941.57, after=11,615.29,
abs_change=-25,326.28, rel_change=-68.56%, share_before=2.13%,
impact=4.06, impact_norm=1.91
- Spain_new: before=32,394.10, after=7,758.90,
abs_change=-24,635.20, rel_change=-76.05%, share_before=1.87%,
impact=3.95, impact_norm=2.11
Based on analysis from the change log, the main causes for this
trend are:
1. The introduction of new onboarding controls implemented on May
8, 2025, which reduced new customer acquisition by about 70% to
prevent fraud.
2. A postal service strike in the UK starting April 5, 2025,
causing order delivery delays and increased cancellations
impacting the UK new segment.
3. An increase in VAT by 2% in Spain as of April 22, 2025,
affecting new customer pricing and causing higher cart
abandonment.
These factors combined explain the outsized negative impacts
observed in new customer segments and the overall revenue decline.
The LLM agent also generated a bunch of illustrative charts (chart generation is part of our growth analysis tool). For example, this one shows the impacts across combinations of country and maturity.
The results look really exciting. Now let’s dive deeper into the actual implementation to understand how it works under the hood.
Multi-AI agent setup
We will start with our change log agent. This agent will query the change log and try to identify potential root causes for the metric changes we observe. Since this agent doesn’t need to perform complex operations, we can implement it as a ToolCallingAgent. Because this agent will be called by another agent, we need to define its name and description attributes.
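The get_change_log tool below reads from an events_df dataframe that isn’t shown in this article. A minimal hypothetical stand-in, with column names and sample events that are purely illustrative (loosely based on the causes surfaced later in the summary), might look like:

```python
import pandas as pd

# Hypothetical change log; the real events_df is defined elsewhere.
# Assumed schema: one row per event, tagged with the month it belongs to.
events_df = pd.DataFrame([
    {'month': '2025-05-01', 'event': 'New onboarding controls rolled out to prevent fraud'},
    {'month': '2025-04-01', 'event': 'Postal service strike in the UK delaying deliveries'},
    {'month': '2025-04-01', 'event': 'VAT increased by 2% in Spain'},
])

# The same lookup get_change_log performs for a given month:
april_events = events_df[events_df.month == '2025-04-01'] \
    .drop('month', axis=1).to_dict('records')
print(april_events)
```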
from smolagents import LiteLLMModel, ToolCallingAgent, tool

@tool
def get_change_log(month: str) -> str:
    """
    Returns the change log (the list of internal and external events
    that might have affected our KPIs) for the given month.

    Args:
        month: month in the format %Y-%m-01, for example, 2025-04-01
    """
    return events_df[events_df.month == month]\
        .drop('month', axis=1).to_dict('records')

model = LiteLLMModel(model_id="openai/gpt-4.1-mini", api_key=config['OPENAI_API_KEY'])

change_log_agent = ToolCallingAgent(
    tools=[get_change_log],
    model=model,
    max_steps=10,
    name="change_log_agent",
    description="Helps you find the relevant information in the change log that can explain changes on metrics. Provide the agent with all the context to receive info",
)
Since the manager agent will be calling this agent, we won’t have any control over the query it receives. Therefore, I decided to modify the system prompt to include additional context.
change_log_system_prompt = '''
You're a master of the change log, and you help others explain
changes to metrics. When you receive a request, look up the list of events
that happened in the relevant month, then filter the relevant information
based on the provided context and return it. Prioritise the most probable
factors affecting the KPI and limit your answer to them.
'''
modified_system_prompt = change_log_agent.prompt_templates['system_prompt'] \
+ '\n\n\n' + change_log_system_prompt
change_log_agent.prompt_templates['system_prompt'] = modified_system_prompt
To enable the primary agent to delegate tasks to the change log agent, we simply need to list it in the managed_agents field.
agent = CodeAgent(
model=model,
tools=[calculate_simple_growth_metrics],
max_steps=20,
additional_authorized_imports=["pandas", "numpy", "matplotlib.*", "plotly.*"],
verbosity_level = 2,
planning_interval = 3,
managed_agents = [change_log_agent]
)
Let’s see how it works. First, we can look at the new system prompt for the primary agent. It now includes information about team members and instructions on how to ask them for help.
You can also give tasks to team members.
Calling a team member works the same as for calling a tool: simply,
the only argument you can give in the call is 'task'.
Given that this team member is a real human, you should be very verbose
in your task, it should be a long string providing informations
as detailed as necessary.
Here is a list of the team members that you can call:
```python
def change_log_agent("Your query goes here.") -> str:
"""Helps you find the relevant information in the change log that
can explain changes on metrics. Provide the agent with all the context
to receive info"""
```
The execution log shows that the primary agent successfully delegated the task to the second agent and received the following response.
<-- Primary agent calling the change log agent -->
─ Executing parsed code: ───────────────────────────────────────
# Query change_log_agent with the detailed task description
prepared
context_for_change_log = (
"We analyzed changes in revenue from April 2025 to May
2025. We found large decreases "
"mainly in the 'new' maturity segments across countries:
Spain_new, UK_new, Germany_new, France_new, Italy_new, and
other_new. "
"The revenue fell by around 70% in these segments, which
have outsized negative impact on total revenue change. "
"We want to know the 1-3 most probable causes for this
significant drop in revenue in the 'new' customer segments
during this period."
)
explanation = change_log_agent(task=context_for_change_log)
print("Change log agent explanation:")
print(explanation)
────────────────────────────────────────────────────────────────
<-- Change log agent execution start -->
╭──────────────────── New run - change_log_agent ─────────────────────╮
│ │
│ You're a helpful agent named 'change_log_agent'. │
│ You have been submitted this task by your manager. │
│ --- │
│ Task: │
│ We analyzed changes in revenue from April 2025 to May 2025. │
│ We found large decreases mainly in the 'new' maturity segments │
│ across countries: Spain_new, UK_new, Germany_new, France_new, │
│ Italy_new, and other_new. The revenue fell by around 70% in these │
│ segments, which have outsized negative impact on total revenue │
│ change. We want to know the 1-3 most probable causes for this │
│ significant drop in revenue in the 'new' customer segments during │
│ this period. │
│ --- │
│ You're helping your manager solve a wider task: so make sure to │
│ not provide a one-line answer, but give as much information as │
│ possible to give them a clear understanding of the answer. │
│ │
│ Your final_answer WILL HAVE to contain these parts: │
│ ### 1. Task outcome (short version): │
│ ### 2. Task outcome (extremely detailed version): │
│ ### 3. Additional context (if relevant): │
│ │
│ Put all these in your final_answer tool, everything that you do │
│ not pass as an argument to final_answer will be lost. │
│ And even if your task resolution is not successful, please return │
│ as much context as possible, so that your manager can act upon │
│ this feedback. │
│ │
╰─ LiteLLMModel - openai/gpt-4.1-mini ────────────────────────────────╯
Using the smolagents framework, we can easily set up a simple multi-agent system, where a manager agent coordinates and delegates tasks to team members with specific skills.
Iterating on the prompt
I started with a very high-level prompt outlining the goal and a vague direction, but unfortunately, it didn’t work consistently. LLMs are not yet smart enough to figure out the approach on their own. So, I created a detailed step-by-step prompt describing the whole plan and including the detailed specifications of the growth narrative tool we’re using.
task = """
Here is a pandas dataframe showing the revenue by segment, comparing values
before (April 2025) and after (May 2025).
You're a senior and experienced data analyst. Your task will be to understand
the changes to the revenue (after vs before) in different segments
and provide an executive summary.
## Follow the plan:
1. Start by identifying the list of dimensions (columns in the dataframe that
are not "before" and "after")
2. There might be multiple dimensions in the dataframe. Start high-level
by looking at each dimension in isolation, combine all results
together into the list of segments analysed (don't forget to save
the dimension used for each segment).
Use the provided tools to analyse the changes of metrics: {tools_description}.
3. Analyse the results from previous step and keep only segments
that have outsized impact on the KPI change (absolute of impact_norm
is above 1.25).
4. Check which dimensions are present in the list of significant segments;
if there are multiple, execute the tool on their combinations
and add the results to the analysed segments. If, after adding an additional
dimension, all subsegments show close difference_rate and impact_norm values,
then we can exclude this split (even though impact_norm is above 1.25),
since it doesn't explain anything.
5. Summarise the significant changes you identified.
6. Try to explain what is going on with metrics by getting info
from the change_log_agent. Please, provide the agent the full context
(what segments have outsized impact, what is the relative change and
what is the period we're looking at).
Summarise the information from the changelog and mention
only 1-3 the most probable causes of the KPI change
(starting from the most impactful one).
7. Put together 3-5 sentences commentary what happened high-level
and why (based on the info received from the change log).
Then follow it up with more detailed summary:
- Top-line total value of metric before and after in human-readable format,
absolute and relative change
- List of segments that meaningfully influenced the metric positively
or negatively with the following numbers: values before and after,
absolute and relative change, share of segment before, impact
and normed impact. Order the segments by absolute value
of absolute change since it represents the power of impact.
## Instruction on the calculate_simple_growth_metrics tool:
By default, you should use the tool for the whole dataset not the segment,
since it will give you the full information about the changes.
Here is the guidance how to interpret the output of the tool
- difference - the absolute difference between after and before values
- difference_rate - the relative difference (if it's close for
all segments then the dimension is not informative)
- impact - the share of the KPI difference explained by this segment
- segment_share_before - share of segment before
- impact_norm - impact normed on the share of segments, we're interested
in very high or very low numbers since they show outsized impact,
rule of thumb - impact_norm between -1.25 and 1.25 is not-informative
If you're using the tool on a subset of the dataframe, keep in mind
that the results won't be applicable to the full dataset, so avoid doing so
unless you want to explicitly look at a subset (i.e. changes in France).
If you decided to use the tool on a particular segment
and share these results in the executive summary, explicitly outline
that we're diving deeper into a particular segment.
""".format(tools_description = tools_description)
agent.run(
task,
additional_args={"df": df},
)
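Note that the tools_description string interpolated into the task isn’t defined in this snippet. One hypothetical way to build it, assuming each @tool-decorated function exposes name and description attributes (as smolagents tools do), is:

```python
# Hypothetical helper: summarise the available tools for the prompt.
# Assumes each tool object exposes .name and .description attributes.
def describe_tools(tools):
    return "\n".join(f"- {t.name}: {t.description}" for t in tools)

# e.g. tools_description = describe_tools([calculate_simple_growth_metrics])
```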
Explaining everything in such detail was quite a daunting task, but it’s necessary if we want consistent results.
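The calculate_simple_growth_metrics tool itself comes from my previous article and isn’t reproduced here. A minimal sketch that is consistent with the output fields described in the prompt above (the real implementation may differ) could look like this:

```python
import pandas as pd

def calculate_simple_growth_metrics(df: pd.DataFrame) -> pd.DataFrame:
    # Compare 'after' vs 'before' for every segment (row) and estimate
    # each segment's contribution to the total KPI change.
    total_before = df['before'].sum()
    total_change = df['after'].sum() - total_before

    out = pd.DataFrame(index=df.index)
    out['difference'] = df['after'] - df['before']
    out['difference_rate'] = out['difference'] / df['before'] * 100
    out['segment_share_before'] = df['before'] / total_before * 100
    # share of the total KPI change explained by this segment
    out['impact'] = out['difference'] / total_change * 100
    # impact normed on segment size; |impact_norm| > 1.25 flags outsized segments
    out['impact_norm'] = out['impact'] / out['segment_share_before']
    return out.round(2)

# The country-level data from the MVP section above
df = pd.DataFrame(
    {'before': [632767.39, 481409.27, 240704.63, 160469.75, 120352.31, 96281.86],
     'after': [637000.48, 477033.02, 107857.06, 159778.76, 121331.46, 96064.77]},
    index=['other', 'UK', 'France', 'Germany', 'Italy', 'Spain'],
)
print(calculate_simple_growth_metrics(df).loc['France'])
```

On this data, France stands out immediately: its impact is close to 100% of the total change, and its impact_norm is far above the 1.25 threshold.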
Planning steps
The smolagents framework lets you add planning steps to your agentic flow. This encourages the agent to start with a plan and update it after the specified number of steps. From my experience, this reflection is very helpful for maintaining focus on the problem and adjusting actions to stay aligned with the initial plan and goal. I definitely recommend using it in cases when complex reasoning is required.
Setting it up is as easy as specifying planning_interval = 3 for the code agent.
agent = CodeAgent(
model=model,
tools=[calculate_simple_growth_metrics],
max_steps=20,
additional_authorized_imports=["pandas", "numpy", "matplotlib.*", "plotly.*"],
verbosity_level = 2,
planning_interval = 3,
managed_agents = [change_log_agent]
)
That’s it. Then, the agent provides reflections starting with thinking about the initial plan.
────────────────────────── Initial plan ──────────────────────────
Here are the facts I know and the plan of action that I will
follow to solve the task:
```
## 1. Facts survey
### 1.1. Facts given in the task
- We have a pandas dataframe `df` showing revenue by segment, for
two time points: before (April 2025) and after (May 2025).
- The dataframe columns include:
- Dimensions: `country`, `maturity`, `country_maturity`,
`country_maturity_combined`
- Metrics: `before` (revenue in April 2025), `after` (revenue in
May 2025)
- The task is to understand the changes in revenue (after vs
before) across different segments.
- Key instructions and tools provided:
- Identify all dimensions except before/after for segmentation.
- Analyze each dimension independently using
`calculate_simple_growth_metrics`.
- Filter segments with outsized impact on KPI change (absolute
normed impact > 1.25).
- Examine combinations of dimensions if multiple dimensions have
significant segments.
- Summarize significant changes and engage `change_log_agent`
for contextual causes.
- Provide a final executive summary including top-line changes
and segment-level detailed impacts.
- Dataset snippet shows segments combining countries (`France`,
`UK`, `Germany`, `Italy`, `Spain`, `other`) and maturity status
(`new`, `existing`).
- The combined segments are uniquely identified in columns
`country_maturity` and `country_maturity_combined`.
### 1.2. Facts to look up
- Definitions or descriptions of the segments if unclear (e.g.,
what defines `new` vs `existing` maturity).
- Likely not mandatory to proceed, but could be requested from
business documentation or change log.
- More details on the change log (accessible via
`change_log_agent`) that could provide probable causes for revenue
changes.
- Confirmation on handling combined dimension splits - how exactly
`country_maturity_combined` is formed and should be interpreted in
combined dimension analysis.
- Data dictionary or description of metrics if any additional KPI
besides revenue is relevant (unlikely given data).
- Dates confirm period of analysis: April 2025 (before) and May
2025 (after). No need to look these up since given.
### 1.3. Facts to derive
- Identify all dimension columns available for segmentation:
- By excluding 'before' and 'after', likely candidates are
`country`, `maturity`, `country_maturity`, and
`country_maturity_combined`.
- For each dimension, calculate change metrics using the given
tool:
- Absolute and relative difference in revenue per segment.
- Impact, segment share before, and normed impact for each
segment.
- Identify which segments have outsized impact on KPI change
(|impact_norm| > 1.25).
- If multiple dimensions have significant segments, combine
dimensions (e.g., country + maturity) and reanalyze.
- Determine if combined dimension splits provide meaningful
differentiation or not, based on delta rate and impact_norm
consistency.
- Summarize direction and magnitude of KPI changes at top-line
level (aggregate revenue before and after).
- Identify top segments driving positive and negative changes
based on ordered absolute absolute_change.
- Gather contextual insights from the change log agent regarding
probable causes tied to significant segments and the May 2025 vs
April 2025 period.
## 2. Plan
1. Identify all dimension columns present in the dataframe by
listing columns and excluding 'before' and 'after'.
2. For each dimension identified (`country`, `maturity`,
`country_maturity`, `country_maturity_combined`):
- Use `calculate_simple_growth_metrics` on the full dataframe
grouped by that dimension.
- Extract segments with calculated metrics including
impact_norm.
3. Aggregate results from all single-dimension analyses and filter
segments where |impact_norm| > 1.25.
4. Determine which dimensions these significant segments belong
to.
5. If more than one dimension is represented in these significant
segments, analyze the combined dimension formed by those
dimensions (for example, combination of `country` and `maturity`
or use existing combined dimension columns).
6. Repeat metric calculation using
`calculate_simple_growth_metrics` on the combined dimension.
7. Examine if the combined dimension splits create meaningful
differentiation - if all subsegments show close difference_rate
and impact_norm, exclude the split.
8. Prepare a summary of significant changes:
- Top-line KPIs before and after (absolute and relative
changes).
- List of impactful segments sorted by absolute absolute_change
that influenced overall revenue.
9. Provide the list of segments with details (values before,
after, absolute and relative change, share before, impact,
impact_norm).
10. Using this summarized information, query `change_log_agent`
with full context:
- Include significant segments, their relative changes, and
periods (April to May 2025).
11. Process the agent's response to identify 1-3 main probable
causes of the KPI changes.
12. Draft executive summary commentary:
- High-level overview of what happened and why, based on log
info.
- Detailed summary including top-line changes and
segment-level metrics impact.
13. Deliver the final answer using `final_answer` tool containing
the above executive summary and data-driven insights.
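As a side note, the filtering in step 3 of the plan (keep segments with |impact_norm| > 1.25) boils down to a one-line pandas expression. Here is a minimal sketch with invented values; the column names are taken from the logs, but the numbers are purely illustrative:

```python
import pandas as pd

# Illustrative output of calculate_simple_growth_metrics (values invented)
metrics = pd.DataFrame({
    "segment": ["France_new", "UK_new", "France_existing"],
    "impact_norm": [-2.1, -1.9, 0.4],
})

# Keep only segments whose normalised impact exceeds the 1.25 threshold
significant = metrics[metrics["impact_norm"].abs() > 1.25]
print(significant["segment"].tolist())  # → ['France_new', 'UK_new']
```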
Then, after every three steps, the agent revisits and updates the plan.
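In smolagents, this cadence is controlled by the `planning_interval` argument when constructing the agent. The scheduling itself can be sketched with a toy loop (this is an illustration of the idea, not the framework's actual internals):

```python
# Toy sketch: how a planning interval of 3 interleaves planning steps
# with action steps, similar to the behaviour seen in the logs above.
def run_with_planning(actions, planning_interval=3):
    transcript = []
    for step, action in enumerate(actions, start=1):
        transcript.append(f"act {step}: {action}")
        # Revisit the plan after every `planning_interval` completed steps
        if step % planning_interval == 0 and step < len(actions):
            transcript.append(f"update plan after step {step}")
    return transcript

steps = ["load data", "split by dimensions", "filter segments",
         "query change log", "summarise"]
print(run_with_planning(steps))
```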
────────────────────────── Updated plan ──────────────────────────
I still need to solve the task I was given:
```
Here is a pandas dataframe showing the revenue by segment,
comparing values before (April 2025) and after (May 2025).
You're a senior and experienced data analyst. Your task will be
understand the changes to the revenue (after vs before) in
different segments
and provide executive summary.
<... repeating the full initial task ...>
```
Here are the facts I know and my new/updated plan of action to
solve the task:
```
## 1. Updated facts survey
### 1.1. Facts given in the task
- We have a pandas dataframe with revenue by segment, showing
values "before" (April 2025) and "after" (May 2025).
- Columns in the dataframe include multiple dimensions and the
"before" and "after" revenue values.
- The goal is to understand revenue changes by segment and provide
an executive summary.
- Guidance and rules about how to analyze and interpret results
from the `calculate_simple_growth_metrics` tool are provided.
- The dataframe contains columns: country, maturity,
country_maturity, country_maturity_combined, before, after.
### 1.2. Facts that we have learned
- The dimensions to analyze are: country, maturity,
country_maturity, and country_maturity_combined.
- Analyzed revenue changes by dimension.
- Only the "new" maturity segment has significant impact
(impact_norm=1.96 > 1.25), with a large negative revenue change (~
-70.6%).
- In the combined segment "country_maturity," the "new" segments
across countries (Spain_new, UK_new, Germany_new, France_new,
Italy_new, other_new) all have outsized negative impacts with
impact_norm values all above 1.9.
- The mature/existing segments in those countries have smaller
normed impacts below 1.25.
- Country-level and maturity-level segment dimension alone are
less revealing than the combined country+maturity segment
dimension which highlights the new segments as strongly impactful.
- Total revenue dropped substantially from before to after, mostly
driven by new segments shrinking drastically.
### 1.3. Facts still to look up
- Whether splitting the data by additional dimensions beyond
country and maturity (e.g., country_maturity_combined) explains
further heterogeneous impacts or if the pattern is uniform.
- Explanation/context from change log about what caused the major
drop predominantly in new segments in all countries.
- Confirming whether any country within the new segment behaved
differently or mitigated losses.
### 1.4. Facts still to derive
- A concise executive summary describing the top-level revenue
change and identifying which segments explain the declines.
- Explanation involving the change log agent with summary of
probable reasons for these outsized reductions in revenue in the
new segments across countries for April-May 2025.
## 2. Plan
### 2.1. Verify if adding the additional dimension
'country_maturity_combined' splits the impactful "new" segments
into subsegments with significantly different impacts or if the
change rates and normed impacts are relatively homogeneous. If
homogeneous, we do not gain deeper insight and should disregard
further splitting.
### 2.2. Summarize all significant segments identified with
outsized impact_norm ≥ 1.25, including their before and after
values, absolute and relative changes, segment shares before,
impact, and normalized impact, ordered by absolute value of the
change.
### 2.3. Query the change_log_agent with the full context:
significant segments are the new country_maturity segments with
large negative changes (~ -70%), timeframe April 2025 to May 2025,
and request top 1-3 most probable causes for the KPI revenue drop
in these segments.
### 2.4. Based on the change log agent's response, synthesize a
3-5 sentence high-level commentary explaining what happened
broadly and why.
### 2.5. Draft a detailed executive summary including:
- Total revenue before and after in human-readable format with
absolute and relative change.
- A list of significant segments driving these changes, in order
by absolute impact, with detailed numbers (before, after, absolute
and relative change, segment share before, impact, normed impact).
### 2.6. Use the `final_answer` tool to produce the finalized
executive summary report.
I really like how the agent is encouraged to revisit the initial task and stay focused on the main problem. Regular reflection like this is helpful in real life as well: teams often get bogged down in process and lose sight of the why behind what they're doing. It's pretty cool to see managerial best practices making their way into agentic frameworks.
That’s it! We’ve built a code agent capable of analysing KPI changes for simple metrics and explored all the key nuances of the process.
You can find the complete code and execution logs on GitHub.
Summary
We’ve experimented a lot with code agents and are now ready to draw conclusions. For our experiments, we used the HuggingFace smolagents framework for code agents — a very handy toolset that offers:
- easy integration with different LLMs (from local models via Ollama to public providers like Anthropic or OpenAI),
- outstanding logging that makes it easy to understand the whole thought process of the agent and debug issues,
- ability to build complex systems leveraging multi-AI agent setups or planning features without much effort.
While smolagents is currently my favourite agentic framework, it has its limitations:
- It can lack flexibility at times. For example, I had to modify the prompt directly in the source code to get the behaviour I wanted.
- It only supports a hierarchical multi-agent setup (where one manager agent delegates tasks to other agents) and doesn't cover sequential workflows or consensus-based decision-making.
- There’s no support for long-term memory out of the box, meaning you’re starting from scratch with every task.
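The missing long-term memory can be worked around by hand. Below is a minimal sketch (my own workaround, not a smolagents feature) that persists each finished task's summary to a local JSON file — `agent_memory.json` is a hypothetical path — and prepends past findings to the next task prompt:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical location

def remember(task: str, summary: str) -> None:
    """Append a finished task's summary to a simple on-disk memory."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    memory.append({"task": task, "summary": summary})
    MEMORY_FILE.write_text(json.dumps(memory))

def with_memory(task: str) -> str:
    """Prepend past summaries to a new task prompt."""
    if not MEMORY_FILE.exists():
        return task
    memory = json.loads(MEMORY_FILE.read_text())
    notes = "\n".join(f"- {m['summary']}" for m in memory)
    return f"Relevant findings from earlier tasks:\n{notes}\n\nTask: {task}"
```

The enriched prompt returned by `with_memory` can then be passed to `agent.run()` instead of the raw task text.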
Thank you for reading this article. I hope you found it insightful.
Reference
This article is inspired by the “Building Code Agents with Hugging Face smolagents” short course by DeepLearning.AI.