When I built my first n8n workflow, as a data scientist, it felt like I was cheating.
I could connect to APIs without reading 30-page docs, trigger workflows from Gmail or Sheets, and deploy something useful in minutes.
However, there is a significant drawback: n8n is not natively optimised to run a Python environment on the cloud instances used by our customers.
Like many data scientists, my daily toolbox for data analytics is built on NumPy and Pandas.
To stay in my comfort zone, I often outsourced calculations to external APIs instead of using n8n JavaScript code nodes.
For instance, this is the approach I took with a Production Planning Optimisation tool, which is orchestrated through a workflow that includes an Agent node calling a FastAPI microservice.
This approach worked, but some clients requested complete visibility of the data analytics tasks in their n8n user interface.
I realised that I needed to learn just enough JavaScript to perform data processing with n8n's native Code nodes.
In this article, we will experiment with small JavaScript snippets inside n8n Code nodes to perform everyday data analytics tasks.
For this exercise, I will use a dataset of sales transactions and walk it through an ABC and Pareto analysis, two methods widely used in Supply Chain Management.
I will provide side-by-side examples of Pandas vs. JavaScript in n8n Code nodes, allowing us to translate our familiar Python data analysis steps directly into automated n8n workflows.
The idea is to implement these solutions for small datasets or quick prototyping within the capabilities of a cloud enterprise n8n instance (i.e. without community nodes).
I will end the experiment with a quick comparative study of the performance versus a FastAPI call.
You can follow me and replicate the entire workflow using a Google Sheet and a workflow template shared in the article.
Let’s start!
Building a Data Analytics Workflow using JavaScript in n8n
Before starting to build nodes, I will introduce the context of this analysis.
ABC & Pareto Charts for Supply Chain Management
For this tutorial, I propose that you build a simple workflow that takes sales transactions from Google Sheets and transforms them into ABC and Pareto charts.
This will replicate the ABC and Pareto Analysis module of the LogiGreen Apps developed by my startup, LogiGreen.
The goal is to generate a set of visuals for the inventory teams of a supermarket chain to help them understand the distribution of sales across their stores.
We will focus on generating two visuals.
The first chart shows an ABC-XYZ analysis of sales items:
- X-axis (Percentage of Turnover %): the contribution of each item to total revenue.
- Y-axis (Coefficient of Variation): demand variability of each item.
- Vertical red lines split items into A, B, and C classes based on turnover share.
- The horizontal blue line marks the boundary between stable and variable demand (CV = 1).
Together, this chart highlights which items are high-value and stable (class A, low CV) versus low-value or highly variable, guiding prioritisation in inventory management.
The second visual is a Pareto analysis of sales turnover:
- X-axis: percentage of SKUs (ranked by sales).
- Y-axis: cumulative percentage of annual turnover.
- The curve illustrates how a small fraction of items contributes to the majority of revenue.
In short, it shows whether the classic Pareto rule holds, i.e. that roughly 80% of sales come from 20% of the SKUs.
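To make the idea concrete, here is a tiny, self-contained JavaScript sketch (with purely hypothetical turnover values) that checks how many items are needed to reach 80% of total turnover:

// Toy 80/20 check with hypothetical turnover values, already sorted in descending order
const turnover = [500, 300, 90, 50, 30, 20, 6, 2, 1, 1];
const total = turnover.reduce((s, x) => s + x, 0);

let cum = 0;
let itemsTo80 = 0;
for (const to of turnover) {
  cum += to;
  itemsTo80 += 1;
  if (cum / total >= 0.8) break;
}

console.log(`${itemsTo80} of ${turnover.length} items reach 80% of turnover`);
// -> "2 of 10 items reach 80% of turnover", i.e. 20% of the SKUs in this toy example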
How did I generate these two visuals? I simply used Python.
On my YouTube channel, I shared a complete tutorial on how to do it using Pandas and Matplotlib.
The objective of this tutorial is to prepare sales transactions and generate these visuals in a Google Sheet using only n8n’s native JavaScript nodes.
Building a Data Analytics Workflow in n8n
I propose to build a workflow that is manually triggered to facilitate debugging during development.
To follow this tutorial, you need to duplicate the Google Sheet shared in the article.
You can then connect your duplicated sheet using the second node, which extracts the dataset from the worksheet Input Data.
This dataset includes retail sales transactions at the daily granularity:
- ITEM: an item that can be sold in multiple stores
- SKU: represents an ITEM sold in a specific store
- FAMILY: a group of items
- CATEGORY: a product category that can include multiple families
- STORE: a code representing a sales location
- DAY: the day of the transaction
- QTY: sales quantity in units
- TO: sales turnover in euros
The output is the table’s content in JSON format, ready to be ingested by other nodes.
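For reference, here is what a single n8n item coming out of the Google Sheets node looks like; the field names follow the data dictionary above, while the values are purely illustrative:

// One n8n item = one row of the sheet, wrapped in a `json` object (illustrative values only)
const exampleItem = {
  json: {
    ITEM: "ITEM-001",
    SKU: "ITEM-001-STORE-12",
    FAMILY: "FAMILY-03",
    CATEGORY: "CATEGORY-01",
    STORE: "STORE-12",
    DAY: "2023-01-15",
    QTY: 4,
    TO: 23.6,
  },
};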
Python Code
import pandas as pd
df = pd.read_csv("sales.csv")
We can now begin processing the dataset to build our two visualisations.
Step 1: Filter out transactions without sales
Let us begin with the simple action of filtering out transactions with a sales QTY equal to zero.
We do not need JavaScript; a simple Filter node can do the job.
Python Code
df = df[df["QTY"] != 0]
Step 2: Prepare data for Pareto Analysis
We first need to aggregate the sales per ITEM and rank products by turnover.
Python Code
sku_agg = (df.groupby("ITEM", as_index=False)
           .agg(TO=("TO","sum"), QTY=("QTY","sum"))
           .sort_values("TO", ascending=False))
In our workflow, this step will be done in the JavaScript node TO, QTY GroupBY ITEM:
const agg = {};
for (const { json } of items) {
  const ITEM = json.ITEM;
  const TO = Number(json.TO);
  const QTY = Number(json.QTY);
  if (!agg[ITEM]) agg[ITEM] = { ITEM, TO: 0, QTY: 0 };
  agg[ITEM].TO += TO;
  agg[ITEM].QTY += QTY;
}
const rows = Object.values(agg).sort((a, b) => b.TO - a.TO);
return rows.map(r => ({ json: r }));
This node returns a ranked table of sales per ITEM in quantity (QTY) and turnover (TO):
- We initiate agg as a dictionary keyed by ITEM
- We loop over the n8n rows in items
- We convert TO and QTY to numbers
- We add the QTY and TO values to the running totals of each ITEM
- We finally transform the dictionary into an array sorted by TO in descending order and return the items

Ranked sales per ITEM – (Image by Samir Saci)

We now have the data ready to perform a Pareto Analysis on sales quantity (QTY) or turnover (TO).
For that, we need to calculate cumulative sales and rank SKUs from the highest to the lowest contributor.
Python Code
abc = sku_agg.copy() # from Step 2, already sorted by TO desc
total = abc["TO"].sum() or 1.0
abc["cum_turnover"] = abc["TO"].cumsum()
abc["cum_share"] = abc["cum_turnover"] / total
abc["sku_rank"] = range(1, len(abc) + 1)
abc["cum_skus"] = abc["sku_rank"] / len(abc)
abc["cum_skus_pct"] = abc["cum_skus"] * 100
This step will be done in the code node Pareto Analysis:
const rows = items
  .map(i => ({
    ...i.json,
    TO: Number(i.json.TO || 0),
    QTY: Number(i.json.QTY || 0),
  }))
  .sort((a, b) => b.TO - a.TO);

const n = rows.length; // number of ITEM
const totalTO = rows.reduce((s, r) => s + r.TO, 0) || 1;
We collect the dataset items from the previous node:
- For each row, we clean up the fields TO and QTY (in case we have missing values)
- We sort all SKUs by turnover in descending order
- We store in variables the number of items and the total turnover
let cumTO = 0;
rows.forEach((r, idx) => {
  cumTO += r.TO;
  r.cum_turnover = cumTO;
  r.cum_share = +(cumTO / totalTO).toFixed(6);
  r.sku_rank = idx + 1;
  r.cum_skus = +((idx + 1) / n).toFixed(6);
  r.cum_skus_pct = +(r.cum_skus * 100).toFixed(2);
});
return rows.map(r => ({ json: r }));
Then we loop over all items in sorted order:
- We use the variable cumTO to compute the cumulative contribution
- We add several Pareto metrics to each row:
  - cum_turnover: cumulative turnover up to this item
  - cum_share: cumulative share of turnover
  - sku_rank: ranking position of the item
  - cum_skus: cumulative number of SKUs as a fraction of total SKUs
  - cum_skus_pct: same as cum_skus, but in %
We are then done with the data preparation for the Pareto chart.
This dataset will be stored in the worksheet Pareto by the node Update Pareto Sheet.
And with a bit of magic, we can generate this graph in the first worksheet:
We can now continue with the ABC XYZ chart.
Step 3: Calculate the demand variability and sales contribution
We could reuse the output of the Pareto chart for the sales contribution, but we will treat each chart as independent.
I will split the code of the node ‘Demand Variability x Sales %’ into multiple segments for clarity.
Block 1: define functions for mean and standard deviation
function mean(a){ return a.reduce((s,x)=>s + x, 0) / (a.length || 1); }
function stdev_samp(a){
  if (a.length < 2) return 0;
  const m = mean(a);
  const v = a.reduce((s,x)=> s + (x - m) ** 2, 0) / (a.length - 1);
  return Math.sqrt(v);
}
These two functions will be used for the coefficient of variation (CV):
- mean(a): computes the average of an array
- stdev_samp(a): computes the sample standard deviation

They take as inputs the daily sales distributions of each ITEM, which we build in this second block.
Block 2: Create the daily sales distribution of each ITEM
const series = {}; // ITEM -> { day -> qty_sum }
let totalQty = 0;
for (const { json } of items) {
  const item = String(json.ITEM);
  const day = String(json.DAY);
  const qty = Number(json.QTY || 0);
  if (!series[item]) series[item] = {};
  series[item][day] = (series[item][day] || 0) + qty;
  totalQty += qty;
}
Python Code
import pandas as pd
import numpy as np
df['QTY'] = pd.to_numeric(df['QTY'], errors='coerce').fillna(0)
daily_series = df.groupby(['ITEM', 'DAY'])['QTY'].sum().reset_index()
Now we can compute the metrics applied to the daily sales distributions.
const out = [];
for (const [item, dayMap] of Object.entries(series)) {
  const daily = Object.values(dayMap); // daily sales quantities
  const qty_total = daily.reduce((s,x)=>s+x, 0);
  const m = mean(daily);               // average daily sales
  const sd = stdev_samp(daily);        // variability of sales
  const cv = m ? sd / m : null;        // coefficient of variation
  const share_qty_pct = totalQty ? (qty_total / totalQty) * 100 : 0;
  out.push({
    ITEM: item,
    qty_total,
    share_qty_pct: Number(share_qty_pct.toFixed(2)),
    mean_qty: Number(m.toFixed(3)),
    std_qty: Number(sd.toFixed(3)),
    cv_qty: cv == null ? null : Number(cv.toFixed(3)),
  });
}
For each ITEM, we calculate:
- qty_total: total sales
- mean_qty: average daily sales
- std_qty: standard deviation of daily sales
- cv_qty: coefficient of variation (variability measure for XYZ classification)
- share_qty_pct: % contribution to total sales (used for ABC classification)
Here is the Python version, in case you got lost:
summary = daily_series.groupby('ITEM').agg(
    qty_total=('QTY', 'sum'),
    mean_qty=('QTY', 'mean'),
    std_qty=('QTY', 'std')
).reset_index()
summary['std_qty'] = summary['std_qty'].fillna(0)
total_qty = summary['qty_total'].sum()
summary['cv_qty'] = summary['std_qty'] / summary['mean_qty'].replace(0, np.nan)
summary['share_qty_pct'] = 100 * summary['qty_total'] / total_qty
We are nearly done.
We just need to sort by descending contribution to prepare for the ABC class mapping:
out.sort((a,b) => b.share_qty_pct - a.share_qty_pct);
return out.map(r => ({ json: r }));
We now have, for each ITEM, the key metrics needed to create the scatter plot.

Demand Variability x Sales % – (Image by Samir Saci)

Only the ABC classes are missing at this step.
Step 4: Add ABC classes
We take the output of the previous node as input.
let rows = items.map(i => i.json);
rows.sort((a, b) => b.share_qty_pct - a.share_qty_pct);
Just in case, we sort the ITEMs by descending sales share (%), so the most important SKUs come first.
(This step can be omitted, as it is normally already done at the end of the previous code node.)
Then we can apply the class based on hardcoded conditions:
- A: SKUs that together represent the first 5% of sales
- B: SKUs that together represent the next 15% of sales
- C: Everything after 20%.
let cum = 0;
for (let r of rows) {
  cum += r.share_qty_pct;
  // Assign class based on cumulative % (A: first 5%, B: next 15%, C: rest)
  if (cum <= 5) r.ABC = "A";
  else if (cum <= 20) r.ABC = "B";
  else r.ABC = "C";
}
return rows.map(r => ({ json: r }));
The same can be done in Python:
df = df.sort_values('share_qty_pct', ascending=False).reset_index(drop=True)
df['cum_share'] = df['share_qty_pct'].cumsum()
def classify(cum):
    return "A" if cum <= 5 else ("B" if cum <= 20 else "C")

df['ABC'] = df['cum_share'].apply(classify)
The results can now be used to generate this chart, which can be found in the first sheet of the Google Sheet:
I struggled (probably due to my limited knowledge of Google Sheets) to find a “manual” solution to create this scatter plot with the correct colour mapping.
Therefore, I used a Google Apps Script available in the Google Sheet to create it.
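For reference, here is a minimal Apps Script sketch of that idea; the sheet name, data range, and chart options below are assumptions, and the actual script shipped in the shared Google Sheet also handles the per-class colour mapping:

// Build a scatter chart from the ABC-XYZ worksheet (assumed names and ranges)
function buildAbcXyzChart() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = ss.getSheetByName('ABC XYZ');    // assumed worksheet name
  const dataRange = sheet.getRange('A1:C200');   // assumed range: % turnover, CV, class
  const chart = sheet.newChart()
    .setChartType(Charts.ChartType.SCATTER)
    .addRange(dataRange)
    .setPosition(1, 9, 0, 0)                     // anchor the chart at row 1, column 9
    .setOption('title', 'Demand Variability x Sales %')
    .build();
  sheet.insertChart(chart);
}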
As a bonus, I added more nodes to the n8n template that perform the same type of GroupBy to calculate sales by STORE or by ITEM-STORE pair.
They can be used to create visuals like this one:
To conclude this tutorial, we can confidently declare that the job is done.
For a live demo of the workflow, you can have a look at this short tutorial.
Our customers, who run this workflow on their n8n cloud instance, can now gain visibility into each step of the data processing.
But at what cost? Are we losing performance?
This is what we will discover in the next section.
Comparative Study of Performance: n8n JavaScript Node vs. Python in FastAPI
To answer this question, I prepared a straightforward experiment.
The same dataset and transformations were processed using two different approaches inside n8n:
- All in JavaScript: the processing functions implemented directly in n8n Code nodes.
- Outsourcing to FastAPI microservices by replacing the JavaScript logic with HTTP requests to Python endpoints.
These two endpoints are connected to functions that will load the data directly from the VPS instance where I hosted the microservice.
@router.post("/launch_pareto")
async def launch_speedtest(request: Request):
try:
session_id = request.headers.get('session_id', 'session')
folder_in = f'data/session/speed_test/input'
if not path.exists(folder_in):
makedirs(folder_in)
file_path = folder_in + '/sales.csv'
logger.info(f"[SpeedTest]: Loading data from session file: {file_path}")
df = pd.read_csv(file_path, sep=";")
logger.info(f"[SpeedTest]: Data loaded successfully: {df.head()}")
speed_tester = SpeedAnalysis(df)
output = await speed_tester.process_pareto()
result = output.to_dict(orient="records")
result = speed_tester.convert_numpy(result)
logger.info(f"[SpeedTest]: /launch_pareto completed successfully for {session_id}")
return result
except Exception as e:
logger.error(f"[SpeedTest]: Error /launch_pareto: {str(e)}\n{traceback.format_exc()}")
raise HTTPException(status_code=500, detail=f"Failed to process Speed Test Analysis: {str(e)}")
@router.post("/launch_abc_xyz")
async def launch_abc_xyz(request: Request):
try:
session_id = request.headers.get('session_id', 'session')
folder_in = f'data/session/speed_test/input'
if not path.exists(folder_in):
makedirs(folder_in)
file_path = folder_in + '/sales.csv'
logger.info(f"[SpeedTest]: Loading data from session file: {file_path}")
df = pd.read_csv(file_path, sep=";")
logger.info(f"[SpeedTest]: Data loaded successfully: {df.head()}")
speed_tester = SpeedAnalysis(df)
output = await speed_tester.process_abcxyz()
result = output.to_dict(orient="records")
result = speed_tester.convert_numpy(result)
logger.info(f"[SpeedTest]: /launch_abc_xyz completed successfully for {session_id}")
return result
except Exception as e:
logger.error(f"[SpeedTest]: Error /launch_abc_xyz: {str(e)}\n{traceback.format_exc()}")
raise HTTPException(status_code=500, detail=f"Failed to process Speed Test Analysis: {str(e)}")
I want to focus this test only on the data processing performance.
The SpeedAnalysis class includes all the data processing steps listed in the previous section:
- Grouping sales by ITEM
- Sorting ITEMs in descending order and calculating cumulative sales
- Calculating standard deviations and means of the daily sales distribution of each ITEM
class SpeedAnalysis:
    def __init__(self, df: pd.DataFrame):
        config = load_config()
        self.df = df

    def processing(self):
        try:
            sales = self.df.copy()
            sales = sales[sales['QTY'] > 0].copy()
            self.sales = sales
        except Exception as e:
            logger.error(f'[SpeedTest] Error for processing : {e}\n{traceback.format_exc()}')

    def prepare_pareto(self):
        try:
            sku_agg = self.sales.copy()
            sku_agg = (sku_agg.groupby("ITEM", as_index=False)
                       .agg(TO=("TO", "sum"), QTY=("QTY", "sum"))
                       .sort_values("TO", ascending=False))
            pareto = sku_agg.copy()
            total = pareto["TO"].sum() or 1.0
            pareto["cum_turnover"] = pareto["TO"].cumsum()
            pareto["cum_share"] = pareto["cum_turnover"] / total
            pareto["sku_rank"] = range(1, len(pareto) + 1)
            pareto["cum_skus"] = pareto["sku_rank"] / len(pareto)
            pareto["cum_skus_pct"] = pareto["cum_skus"] * 100
            return pareto
        except Exception as e:
            logger.error(f'[SpeedTest] Error for prepare_pareto: {e}\n{traceback.format_exc()}')

    def abc_xyz(self):
        daily = self.sales.groupby(["ITEM", "DAY"], as_index=False)["QTY"].sum()
        stats = (
            daily.groupby("ITEM")["QTY"]
            .agg(
                qty_total="sum",
                mean_qty="mean",
                std_qty="std"
            )
            .reset_index()
        )
        stats["cv_qty"] = stats["std_qty"] / stats["mean_qty"].replace(0, np.nan)
        total_qty = stats["qty_total"].sum()
        stats["share_qty_pct"] = (stats["qty_total"] / total_qty * 100).round(2)
        stats = stats.sort_values("share_qty_pct", ascending=False).reset_index(drop=True)
        stats["cum_share"] = stats["share_qty_pct"].cumsum().round(2)

        # Assign ABC classes using the cumulative-share thresholds described above (5% / 20%)
        def classify(cum):
            return "A" if cum <= 5 else ("B" if cum <= 20 else "C")

        stats["ABC"] = stats["cum_share"].apply(classify)
        return stats
Now that we have these endpoints ready, we can begin testing.
The results are summarised below:
- JavaScript-only workflow: the whole process was completed in a bit more than 11.7 seconds. Most of the time was spent updating sheets and performing iterative calculations within n8n nodes.
- FastAPI-backed workflow: the equivalent “outsourced” process was completed in ~11.0 seconds. Heavy computations were offloaded to Python microservices, which handled them faster than native JavaScript nodes.
In other words, outsourcing complex computations to Python actually improves the performance.
The reason is that FastAPI endpoints execute optimised Python (Pandas) functions directly, while JavaScript nodes inside n8n must iterate over items with explicit loops.
For larger datasets, I would expect the gap to be far from negligible.
This demonstrates that you can do simple data processing within n8n using small JavaScript snippets.
However, our Supply Chain Analytics products can require more advanced processing involving optimisation and advanced statistical libraries.
For that, customers can accept dealing with a “black box” approach, as seen in the Production Planning workflow presented in this Towards Data Science article.
But for light processing tasks, we can integrate them into the workflow to provide visibility to non-code users.
For another project, I used n8n to connect Supply Chain IT systems and transfer Purchase Orders via Electronic Data Interchange (EDI).
This workflow, deployed for a small logistics company, parses the EDI messages entirely with JavaScript nodes, as you can discover in this tutorial.
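To give a flavour of what such a Code node looks like, here is a minimal, purely illustrative sketch (not the production parser, and with a hypothetical EDIFACT-style message) that splits a message into segments and extracts the ordered quantities:

// Hypothetical EDIFACT-style ORDERS message, segments separated by apostrophes
const message = "UNH+1+ORDERS:D:96A:UN'LIN+1++8718112:EN'QTY+21:48'LIN+2++8718113:EN'QTY+21:12'";
const segments = message.split("'").filter(Boolean);

// Extract the ordered quantity from each QTY segment, e.g. "QTY+21:48" -> 48
const quantities = segments
  .filter(s => s.startsWith("QTY"))
  .map(s => Number(s.split("+")[1].split(":")[1]));

return quantities.map(q => ({ json: { ordered_qty: q } }));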
This helped us to improve the robustness of the solution and reduce our workload by handing over the maintenance to the customer.
What is the best approach?
For me, n8n should be used as an orchestration and integration tool connected to our core analytics products.
These analytics products require specific input formats that may not align with our customers’ data.
Therefore, I would advise using JavaScript code nodes to perform this preprocessing.
For example, the workflow above connects a Google Sheet (containing input data) to a FastAPI microservice that runs an algorithm for Distribution Planning Optimisation.
The idea is to plug our optimisation algorithm into a Google Sheet used by Distribution Planners to organise store deliveries.
The JavaScript code node is used to transform the data collected from the Google Sheet into the input format required by our algorithm.
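As a sketch of what that preprocessing node might look like (the field names and payload structure below are assumptions, not the actual interface of our microservice):

// Reshape the rows collected from the Google Sheet into the payload
// expected by the optimisation endpoint (hypothetical field names)
const payload = items.map(({ json }) => ({
  store: String(json.STORE),
  item: String(json.ITEM),
  quantity: Number(json.QTY || 0),
  delivery_date: String(json.DAY),
}));

// A single item carrying the full payload for the downstream HTTP Request node
return [{ json: { records: payload } }];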
By doing the job inside the workflow, it remains under the control of the customer who runs the workflow in their own instance.
And we can keep the optimisation part in a microservice hosted on our instance.
To better understand the setup, feel free to have a look at this short presentation.
I hope this tutorial and the examples above have given you enough insight to understand what can be done with n8n in terms of data analytics.
Feel free to share your comments on the approach, and your thoughts on what could be improved to enhance the workflow's performance.
About Me
Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.
For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.
Find your complete guide for Supply Chain Analytics: Analytics Cheat Sheet.
If you are interested in Data Analytics and Supply Chain, look at my website.