Building LLM Apps That Can See, Think, and Integrate: Using o3 with Multimodal Input and Structured Output


The standard “text in, text out” paradigm will only take you so far.

Real applications that deliver actual value should be able to examine visuals, reason through complex problems, and produce results that systems can actually use.

In this post, we’ll design this stack by bringing together three powerful capabilities: multimodal input, reasoning, and structured output.

To illustrate this, we’ll walk through a hands-on example: building a time-series anomaly detection system for e-commerce order data using OpenAI’s o3 model. Specifically, we’ll show how to pair o3’s reasoning capability with image input and emit validated JSON, so that the downstream system can easily consume it.

By the end, our app will:

See: analyze charts of e-commerce order volume time series
Think: identify unusual patterns
Integrate: output a structured anomaly report

You’ll leave with functional code you can reuse for various use cases that go beyond just anomaly detection.

Let’s dive in.

Interested in learning the broader landscape of how LLMs are being applied for anomaly detection? Check out my previous post: Boosting Your Anomaly Detection With LLMs, where I summarized 7 emerging application patterns that you shouldn’t miss.

1. Case Study

In this post, we aim to build an anomaly detection solution for identifying abnormal patterns in e-commerce order time series data.

For this case study, we generated three sets of synthetic daily order data. Each dataset represents a different daily order profile over roughly one month. To make seasonality obvious, we have shaded the weekends. The x-axis shows the day of the week.

Figure 1. Dataset 1, with the shaded regions being the weekends. (Image by author)

Figure 2. Dataset 2, with the shaded regions being the weekends. (Image by author)

Figure 3. Dataset 3, with the shaded regions being the weekends. (Image by author)

Each figure contains one specific type of anomaly (can you find them?). We’ll later use those figures to test our anomaly detection solution and see if it can accurately recover those anomalies.
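The post does not show how the synthetic data was generated. As a rough illustration only, the sketch below builds one month of daily orders with a weekend dip and a single injected spike. The function name, the means, the noise level, and the spike date are all assumptions, not the settings behind Figures 1-3.

```python
import random
from datetime import date, timedelta

def make_daily_orders(start, n_days, weekday_mean=130, weekend_mean=115,
                      noise=4, spike_day=None, spike_size=40, seed=0):
    """Return a list of {"date", "orders"} rows with a weekly pattern.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        # weekday() returns 5 for Saturday and 6 for Sunday
        base = weekend_mean if day.weekday() >= 5 else weekday_mean
        orders = base + rng.gauss(0, noise)
        if spike_day is not None and day == spike_day:
            orders += spike_size  # inject a single-day spike
        rows.append({"date": day, "orders": round(orders)})
    return rows

rows = make_daily_orders(date(2025, 8, 4), 30, spike_day=date(2025, 8, 20))
```

Passing rows to pd.DataFrame(rows) would give a dataframe with the "date" and "orders" columns that the plotting code later in this post expects.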

2. Our Solution

2.1 Overview

Unlike the traditional machine learning approaches that require tedious feature engineering and model training, our current approach is much simpler. It works with the following steps:

We prepare the figure for visualizing the e-commerce order time series data.
We prompt the reasoning model o3, ask it to take a closer look at the time series image we fed to it, and determine if an unusual pattern exists.
The o3 model will then output its findings in a pre-defined JSON format.

And that’s it. Simple.

Of course, to deliver this solution, we need to enable the o3 model to take image input and emit structured output. We will see how to do that shortly.

2.2 Setting up the reasoning model

As mentioned before, we’ll use the o3 model, the flagship reasoning model from OpenAI that can tackle complex multi-step problems with state-of-the-art performance. Specifically, we’ll use the Azure OpenAI endpoint to call the model.

Make sure you have put the endpoint, API key, and deployment name in a .env file. We can then proceed to setting up the LLM client:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from openai import AzureOpenAI
from dotenv import load_dotenv
import os

load_dotenv()

# Setup LLM client
endpoint = os.getenv("api_base")
api_key = os.getenv("o3_API_KEY")
api_version = "2025-04-01-preview"
model_name = "o3"
deployment = os.getenv("deployment_name")

LLM_client = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,
    azure_endpoint=endpoint
)
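Since os.getenv silently returns None for a missing variable, it can be worth checking the .env values before creating the client. The following is a minimal sketch; the helper name missing_env_vars is hypothetical, and the variable names are the ones used in the snippet above.

```python
import os

REQUIRED_VARS = ("api_base", "o3_API_KEY", "deployment_name")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping instead of the real environment
missing = missing_env_vars(env={"api_base": "https://example.invalid", "o3_API_KEY": ""})
print(missing)  # ['o3_API_KEY', 'deployment_name']
```

Calling missing_env_vars() with no arguments right after load_dotenv() and raising an error if the list is non-empty would surface configuration problems before the first API call.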

We use the following instruction as the system message for the o3 model (tuned by GPT-5):

instruction = """

[Role]
You are a meticulous data analyst.

[Task]
You will be given a line chart image related to daily e-commerce orders. 
Your task is to identify prominent anomalies in the data.

[Rules]
The anomaly kinds can be spike, drop, level_shift, or seasonal_outlier.
A level_shift is a sustained baseline change (≥ 5 consecutive days), not a single point.
A seasonal_outlier happens if a weekend/weekday behaves unlike peers in its category. 
For example, weekend orders are usually lower than the weekdays'.
Read dates/values from axes; if you can’t read exactly, snap to the nearest tick and note uncertainty in explanation.
The weekends are shaded in the figure.
"""

In the above instruction, we clearly defined the role of the LLM, the task that the LLM should complete, and the rules the LLM should follow.

To limit the complexity of our case study, we intentionally specified only four anomaly types that the LLM needs to identify. We also provided clear definitions of those anomaly types to remove ambiguity.

Finally, we injected a bit of domain knowledge about e-commerce patterns, i.e., lower weekend orders are expected compared to weekdays. Incorporating domain know-how is generally considered good practice for guiding the model’s analytical process.

Now that we have our model set up, let’s discuss how to prepare the image for the o3 model to consume.

2.3 Image preparation

To enable o3’s multimodal capabilities, we need to provide figures in a specific format, i.e., either publicly accessible web URLs or as base64-encoded data URLs. Since our figures are generated locally, we’ll use the second approach.

What is Base64 Encoding anyway? Base64 is a way to represent binary data (like our image files) using only text characters that are safe to transmit over the internet. It converts binary image data into a string of letters, numbers, and a few symbols.

And what about data URL? A data URL is a type of URL that embeds the file content directly in the URL string, rather than pointing to a file location.

We can use the following function to handle this conversion automatically:

import io
import base64

def fig_to_data_url(fig, fmt="png"):
    """
    Converts a Matplotlib figure to a base64 data URL without saving to disk.

    Args:
    -----
    fig (matplotlib.figure.Figure): The figure to convert.
    fmt (str): The format of the image ("png", "jpeg", etc.)

    Returns:
    --------
    str: The data URL representing the figure.
    """

    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, bbox_inches="tight")
    buf.seek(0)
    
    base64_encoded_data = base64.b64encode(buf.read()).decode("utf-8")
    mime_type = f"image/{fmt.lower()}"
    
    return f"data:{mime_type};base64,{base64_encoded_data}"

Essentially, our function first saves the matplotlib figure to a memory buffer. It then encodes the binary PNG data as base64 text and wraps it in the desired data URL format.
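To make the data URL format concrete, here is a small sketch that goes the other way and splits a data URL back into its MIME type and raw bytes. The helper data_url_to_bytes is hypothetical and only handles the base64 form produced above.

```python
import base64

def data_url_to_bytes(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, encoded = data_url.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(encoded)

# Build a tiny data URL by hand, mirroring the format used by fig_to_data_url
payload = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature
url = "data:image/png;base64," + base64.b64encode(payload).decode("utf-8")
mime, raw = data_url_to_bytes(url)
print(mime, raw == payload)  # image/png True
```

A round trip like this is a quick way to confirm that the string sent to the model really contains the image bytes that were saved.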

Assuming we have access to the synthetic daily order data, we can use the following function to generate the plot and convert it into a proper data URL format in one go:

def create_fig(df):
    """
    Create a Matplotlib figure and convert it to a base64 data URL.
    Weekends (Sat–Sun) are shaded.

    Args:
    -----
    df: dataframe contains one profile of daily order time series. 
        dataframe has "date" and "orders" columns.

    Returns:
    --------
    image_url: The data URL representing the figure.
    """

    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])

    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(df["date"], df["orders"], linewidth=2)
    ax.set_xlabel('Date', fontsize=14)
    ax.set_ylabel('Daily Orders', fontsize=14)

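    # Weekend shading
    # NOTE: the original listing is cut off after "while cur". The rest of
    # this function is a reconstruction from the docstring (an assumption).
    start = df["date"].min().normalize()
    end   = df["date"].max().normalize()
    cur = start
    while cur <= end:
        if cur.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            ax.axvspan(cur, cur + pd.Timedelta(days=1), alpha=0.15, color="gray")
        cur += pd.Timedelta(days=1)

    fig.tight_layout()

    # Convert the finished figure to a base64 data URL
    image_url = fig_to_data_url(fig)

    return image_url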

Figures 1-3 are generated by the above plotting routine.

2.4 Structured output

In this section, let’s discuss how to ensure the o3 model outputs a consistent JSON format instead of free-form text. This is known as “structured output,” and it’s one of the key enablers for integrating LLMs into existing automated workflows.

To achieve that, we start by defining the schema that governs the expected output structure. We’ll be using a Pydantic model:

from pydantic import BaseModel, Field
from typing import Literal
from datetime import date

AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]

class DateWindow(BaseModel):
    start: date = Field(description="Earliest plausible date the anomaly begins (ISO YYYY-MM-DD)")
    end: date = Field(description="Latest plausible date the anomaly ends, inclusive (ISO YYYY-MM-DD)")

class AnomalyReport(BaseModel):
    when: DateWindow = Field(
        description=(
            "Minimal window that contains the anomaly. "
            "For single-point anomalies, use the interval that covers reading uncertainty, if the tick labels are unclear"
        )
    )
    y: int = Field(description="Approx value at the anomaly’s most representative day (peak/lowest), rounded")
    kind: AnomalyKind = Field(description="The type of the anomaly")
    why: str = Field(description="One-sentence reason for why this window is unusual")
    date_confidence: Literal["low","medium","high"] = Field(
        default="medium", description="Confidence that the window localization is correct"
    )

Our Pydantic schema tries to capture both the quantitative and qualitative aspects of the detected anomalies. For each field, we specify its data type (e.g., int for numerical values, Literal for a fixed set of choices, etc.).

Also, we use the Field function to provide detailed descriptions of each key. Those descriptions are especially important as they effectively serve as inline instructions for o3, so that it understands the semantic meaning of each component.

Now that we have covered multimodal input and structured output, it’s time to put them together in one LLM call.
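To see what the schema buys us, the sketch below validates two JSON strings against a trimmed copy of the models (field descriptions omitted for brevity). It assumes Pydantic v2, and the sample payloads are invented for illustration.

```python
from datetime import date
from typing import Literal
from pydantic import BaseModel, ValidationError

class DateWindow(BaseModel):
    start: date
    end: date

class AnomalyReport(BaseModel):
    when: DateWindow
    y: int
    kind: Literal["spike", "drop", "level_shift", "seasonal_outlier"]
    why: str
    date_confidence: Literal["low", "medium", "high"] = "medium"

good = ('{"when": {"start": "2025-08-19", "end": "2025-08-21"}, '
        '"y": 166, "kind": "spike", "why": "example"}')
report = AnomalyReport.model_validate_json(good)
print(report.when.start, report.date_confidence)  # 2025-08-19 medium

bad = good.replace('"spike"', '"dip"')  # "dip" is not an allowed kind
try:
    AnomalyReport.model_validate_json(bad)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```

ISO date strings are parsed into date objects, the default fills in a missing date_confidence, and an anomaly kind outside the Literal set is rejected.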

2.5 o3 model invocation

To interact with o3 using multimodal input and structured output, we use the LLM_client.beta.chat.completions.parse() API. Some of the key arguments include:

model: the deployment name;
messages: the message object sent to o3 model;
max_completion_tokens: the maximum number of tokens the model can generate for a completion. Note that reasoning models like o3 generate reasoning tokens internally to “think through” the problem, and these count toward max_completion_tokens together with the visible output tokens, so the budget should leave room for both;
response_format: the Pydantic model that defines the expected JSON schema structure;
reasoning_effort: a control knob that dictates how much computational effort o3 should use for reasoning. The available options include low, medium, and high.
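To see how many tokens went to internal reasoning versus visible output, one option is to inspect the usage information returned with each response. The sketch below assumes the usage object exposes completion_tokens and completion_tokens_details.reasoning_tokens, as in the OpenAI Chat Completions API. The helper name is hypothetical, and a stand-in object replaces a real response.

```python
from types import SimpleNamespace

def split_completion_tokens(usage):
    """Return (reasoning_tokens, visible_tokens) from a usage object."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", 0) or 0
    # completion_tokens counts reasoning and visible tokens together
    return reasoning, usage.completion_tokens - reasoning

# Stand-in for response.usage; the numbers are made up for illustration
fake_usage = SimpleNamespace(
    completion_tokens=1200,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=1024),
)
print(split_completion_tokens(fake_usage))  # (1024, 176)
```

Logging this split for a few calls gives a feel for how reasoning_effort affects token consumption.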

We can define a helper function to interact with the o3 model:

def anomaly_detection(instruction, fig_path, 
                      response_format, prompt=None, 
                      deployment="o3", reasoning_effort="high"):

    # Compose messages
    messages=[
            { "role": "system", "content": instruction},
            { "role": "user", "content": [  
                { 
                    "type": "image_url",
                    "image_url": {
                        "url": fig_path,
                        "detail": "high"
                    }
                },
            ]} 
    ]

    # Add prompt if it is given
    if prompt is not None:
        messages[1]["content"].append({"type": "text", "text": prompt})

    # Invoke LLM API
    response = LLM_client.beta.chat.completions.parse(
        model=deployment,
        messages=messages,
        max_completion_tokens=4000,
        reasoning_effort=reasoning_effort,
        response_format=response_format
    )

    return response.choices[0].message.parsed.model_dump()

Note that the messages object accepts both text and image content. Since we’ll solely use figures to prompt the model, the text prompt is optional.

We set "detail": "high" to enable high-resolution image processing. For our current case study, this is most likely necessary, as we need o3 to read fine details like axis tick labels, data point values, and subtle visual patterns. However, bear in mind that high-detail processing consumes more tokens and raises API costs.

Finally, by using .parsed.model_dump(), we turn the JSON output into a usual Python dictionary.
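One caveat: the parsed attribute can be empty, for example when the model refuses the request, in which case calling .model_dump() on it fails. A defensive variant might look like the sketch below. It assumes the message object exposes refusal and parsed attributes, as the OpenAI Python SDK's parsed messages do. The helper name extract_report is hypothetical, and stand-in objects replace a real response.

```python
from types import SimpleNamespace

def extract_report(message):
    """Return the parsed report as a dict, or raise if none is available."""
    refusal = getattr(message, "refusal", None)
    if refusal:
        raise RuntimeError(f"Model refused the request: {refusal}")
    if getattr(message, "parsed", None) is None:
        raise RuntimeError("No parsed output was returned")
    return message.parsed.model_dump()

# Stand-in for response.choices[0].message
ok = SimpleNamespace(
    refusal=None,
    parsed=SimpleNamespace(model_dump=lambda: {"kind": "spike"}),
)
print(extract_report(ok))  # {'kind': 'spike'}
```

In the helper above, the last line could then become return extract_report(response.choices[0].message).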

That’s it for the implementation. Let’s see some results next.

3. Results

In this section, we’ll input the previously generated figures into the o3 model and ask it to identify potential anomalies.

3.1 Spike anomaly

# df_spike_anomaly is the dataframe of the first set of synthetic data (Figure 1)
spike_anomaly_url = create_fig(df_spike_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                          spike_anomaly_url,
                          response_format=AnomalyReport,
                          reasoning_effort="medium")
print(result)

In the call above, the spike_anomaly_url is the data URL for Figure 1. The output of the result is shown below:

{
  'when': {'start': datetime.date(2025, 8, 19), 'end': datetime.date(2025, 8, 21)}, 
  'y': 166, 
  'kind': 'spike', 
  'why': 'Single day orders jump to ~166, far above adjacent days that sit near 120–130.', 
  'date_confidence': 'medium'
}

We see that the o3 model returned the output exactly in the format we designed. Now, we can grab this result and generate a visualization programmatically:

# Create image
fig, ax = plt.subplots(figsize=(8, 4.5))
df_spike_anomaly['date'] = pd.to_datetime(df_spike_anomaly['date'])
ax.plot(df_spike_anomaly["date"], df_spike_anomaly["orders"], linewidth=2)
ax.set_xlabel('Date', fontsize=14)
ax.set_ylabel('Daily Orders', fontsize=14)

# Format x-axis dates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))  
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1)) 

# Add anomaly overlay
start_date = pd.to_datetime(result['when']['start'])
end_date = pd.to_datetime(result['when']['end'])

# Add shaded region
ax.axvspan(start_date, end_date, alpha=0.3, color='red', label=f"Anomaly ({result['kind']})")

# Add text annotation
mid_date = start_date + (end_date - start_date) / 2  # Middle of anomaly window
ax.annotate(
    result['why'], 
    xy=(mid_date, result['y']), 
    xytext=(10, 20),  # Offset from the point
    textcoords='offset points',
    bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.7),
    arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.1'),
    fontsize=10,
    wrap=True
)

# Add legend
ax.legend()

plt.xticks(rotation=0)
plt.tight_layout()

The generated visualization looks like this:

Figure 4. The anomaly detection results for Figure 1. (Image by author)

We can see that the o3 model correctly identified the spike anomaly presented in this first set of synthetic data.

Not bad, especially considering that we didn’t do any conventional model training and only prompted an LLM.
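Beyond visual inspection, the structured output also makes it easy to check localization programmatically when the injected anomaly date is known. The sketch below is one way to do that. The helper window_contains is hypothetical, and the ground-truth date is a placeholder, since the post does not state the exact date of the injected spike.

```python
from datetime import date, timedelta

def window_contains(report, true_date, tolerance_days=0):
    """Check whether true_date lies inside the reported window,
    widened by tolerance_days on each side."""
    pad = timedelta(days=tolerance_days)
    window = report["when"]
    return window["start"] - pad <= true_date <= window["end"] + pad

# Reported window copied from the o3 output above
result = {"when": {"start": date(2025, 8, 19), "end": date(2025, 8, 21)}, "kind": "spike"}
injected_on = date(2025, 8, 20)  # hypothetical ground truth, not given in the post
print(window_contains(result, injected_on))  # True
```

Running a check like this over many synthetic series would turn the single example here into a simple hit-rate measurement.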

3.2 Level shift anomaly

# df_level_shift_anomaly is the dataframe of the 2nd set of synthetic data (Figure 2)
level_shift_anomaly_url = create_fig(df_level_shift_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                          level_shift_anomaly_url,
                          response_format=AnomalyReport,
                          reasoning_effort="medium")
print(result)

The output of the result is shown below:

{
  'when': {'start': datetime.date(2025, 8, 26), 'end': datetime.date(2025, 9, 2)}, 
  'y': 150, 
  'kind': 'level_shift', 
  'why': 'Orders suddenly jump from the 120-135 range to ~150 on Aug 26 and remain elevated for all subsequent days, indicating a sustained baseline change.', 
  'date_confidence': 'high'
}

Again, we see that the model accurately identified that a “level_shift” anomaly is present in the plot:

Figure 5. The anomaly detection results for Figure 2. (Image by author)

3.3 Seasonality anomaly

# df_seasonality_anomaly is the dataframe of the 3rd set of synthetic data (Figure 3)
seasonality_anomaly_url = create_fig(df_seasonality_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                          seasonality_anomaly_url,
                          response_format=AnomalyReport,
                          reasoning_effort="medium")
print(result)

The output of the result is shown below:

{
  'when': {'start': datetime.date(2025, 8, 23), 'end': datetime.date(2025, 8, 24)}, 
  'y': 132, 
  'kind': 'seasonal_outlier', 
  'why': 'Weekend of Aug 23-24 shows order volumes (~130+) on par with surrounding weekdays, whereas other weekends consistently drop to ~115, making it an out-of-season spike.', 
  'date_confidence': 'high'
}

This is a challenging case. Nevertheless, our o3 model managed to tackle it properly, with accurate localization and a clear explanation. Pretty impressive:

Figure 6. The anomaly detection results for Figure 3. (Image by author)

4. Summary

Congratulations! We’ve successfully built an anomaly detection solution for time-series data that works entirely through visualization and prompting.

By feeding daily order plots into the o3 reasoning model and constraining its output to a JSON schema, the LLM managed to identify three different anomaly types with accurate localization. All of this was achieved without training any ML model. Impressive!

If we take a step back, we can see that the solution we built illustrates the broader pattern of combining three capabilities:

See: multimodal input to let the model consume figures directly.
Think: step-by-step reasoning capability to tackle complex problems.
Integrate: structured output that downstream systems can easily consume (e.g., generating visualizations).

The combination of multimodal input + reasoning + structured output really creates a versatile foundation for useful LLM applications.
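As one illustration of the Integrate step, the report dictionary can be serialized to JSON for a downstream service. Because the window fields are datetime.date objects, json.dumps needs a little help. The sketch below uses only the standard library, and the helper name to_json is hypothetical.

```python
import json
from datetime import date

def to_json(report):
    """Serialize a report dict, writing date objects as ISO strings."""
    def encode(value):
        if isinstance(value, date):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")
    return json.dumps(report, default=encode)

result = {
    "when": {"start": date(2025, 8, 19), "end": date(2025, 8, 21)},
    "y": 166,
    "kind": "spike",
}
print(to_json(result))
# {"when": {"start": "2025-08-19", "end": "2025-08-21"}, "y": 166, "kind": "spike"}
```

With Pydantic v2, calling model_dump(mode="json") on the parsed object is an alternative that converts the dates before they reach json.dumps.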

You now have the building blocks ready. What do you want to build next?
