Using Local LLMs to Discover High-Performance Algorithms

by AiNEWS2025 · 2026-01-20 · in Machine Learning


Ever since I was a child, I’ve been fascinated by drawing. What struck me was not only the act of drawing itself, but also the idea that every drawing could be improved further and further. I remember reaching a very high level with my drawing style. However, once I reached what felt like perfection, I would still try to improve the drawing even further; alas, with disastrous results.

Since then, I have always kept the same mantra in mind: “refine and iterate and you’ll reach perfection”. At university, my approach was to read books many times and to expand my knowledge with other sources, searching for hidden layers of meaning in each concept. Today, I apply this same philosophy to AI/ML and coding.

Matrix multiplication (matmul, for short) is the core operation of modern AI workloads. Some time ago I developed LLM.rust, a Rust mirror of Karpathy’s llm.c. The hardest part of the Rust implementation was the matrix multiplication. Since fine-tuning a GPT-based model requires thousands of iterations, we need an efficient matmul operation. For this purpose, I had to call the BLAS library through unsafe code to overcome the performance limits of a safe implementation. Using unsafe goes against Rust’s philosophy, which is why I am always looking for safer ways to improve matmul in this context.
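For context, a safe, dependency-free matmul baseline in plain Rust can look like the minimal sketch below. It is not the LLM.rust code: it assumes row-major f32 storage in flat slices, and the name matmul_naive is hypothetical. The i-k-j loop order keeps the innermost loop on contiguous rows of B and C, which is usually friendlier to the cache than the textbook i-j-k order.

```rust
/// Naive, fully safe matmul C = A × B for row-major matrices stored as flat slices.
/// A is m×k, B is k×n, and the returned C is m×n.
pub fn matmul_naive(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A must be m×k");
    assert_eq!(b.len(), k * n, "B must be k×n");
    let mut c = vec![0.0f32; m * n];
    // i-k-j order: the innermost loop walks one row of B and one row of C,
    // both contiguous in memory.
    for i in 0..m {
        let c_row = &mut c[i * n..(i + 1) * n];
        for p in 0..k {
            let a_ip = a[i * k + p];
            let b_row = &b[p * n..(p + 1) * n];
            for (c_ij, &b_pj) in c_row.iter_mut().zip(b_row) {
                *c_ij += a_ip * b_pj;
            }
        }
    }
    c
}
```

Safe slice indexing and iterators keep this free of unsafe; closing the gap to BLAS is exactly what the agents are asked to attempt.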

So, taking inspiration from Sam Altman’s statement, “ask GPT how to create value”, I decided to ask local LLMs to generate, benchmark, and iterate on their own algorithms to create a better, native Rust matmul implementation.

The challenge has some constraints:

  • We need to use our local environment; in my case, a MacBook Pro with an M3 chip and 36 GB of RAM;
  • We need to work around the models’ token limits;
  • We need to time and benchmark the code within the generation loop itself.
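The third constraint implies a small timing harness that runs inside the loop. A minimal sketch using only std::time, assuming the kernel under test is wrapped in a closure; best_of is a hypothetical helper, not part of the repo:

```rust
use std::time::{Duration, Instant};

/// Runs `f` once as a warm-up, then `reps` timed times, and returns the fastest run.
/// Taking the minimum filters out noise from other processes and cold caches.
pub fn best_of<F: FnMut()>(reps: usize, mut f: F) -> Duration {
    assert!(reps > 0, "need at least one timed repetition");
    f(); // warm-up run: faults in memory and warms the caches
    let mut best = Duration::MAX;
    for _ in 0..reps {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    best
}
```

In practice the closure would call a generated kernel on 1024×1024 inputs, passing inputs and results through std::hint::black_box so the compiler cannot optimise the work away.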

I know that achieving BLAS-level performance with this method is almost impossible, but I want to highlight how we can leverage AI for custom needs, even on our “tiny” laptops, so that we can unblock ideas and push boundaries in any field. This post is meant as an inspiration for practitioners and for people who want to become more familiar with Microsoft AutoGen and local LLM deployment.

The full code implementation can be found in this GitHub repo. This is an ongoing experiment, and many changes and improvements will be committed.

General idea

The overall idea is to have a roundtable of agents. The starting point is the MrAderMacher Mixtral 8x7B local model, in its Q4_K_M quantisation. From this model we create five entities:

  • the Proposer comes up with a new Strassen-like algorithm, to find a better and more efficient way to perform matmul;
  • the Verifier reviews the matmul formulation through symbolic math;
  • the Coder creates the underlying Rust code;
  • the Tester executes it and saves all the info to the vector database;
  • the Manager acts silently, controlling the overall workflow.
| Agent | Role/function |
|---|---|
| Proposer | Analyses benchmark times and proposes new tuning parameters and matmul formulations. |
| Verifier | (Currently disabled in the code.) Verifies the Proposer’s mathematical formulation through symbolic verification. |
| Coder | Takes the parameters and works out the Rust template code. |
| Tester | Runs the Rust code, saves it, and computes the benchmark timing. |
| Manager | Controls the overall workflow. |

Tab. 1: Roles of agents.
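As a reference point for what “Strassen-like” means, here is the textbook Strassen (1969) scheme for a 2×2 product: seven multiplications instead of eight. When the entries are themselves matrix blocks, the same seven products apply recursively, giving O(n^log2(7)) ≈ O(n^2.807) instead of O(n^3). This is the classic algorithm, not one generated by the Proposer; the M2 alias and function name are illustrative.

```rust
/// A 2×2 matrix of f64 values, indexed [row][col].
type M2 = [[f64; 2]; 2];

/// One level of Strassen's algorithm: 7 multiplications (m1..m7) instead of 8.
fn strassen_2x2(a: M2, b: M2) -> M2 {
    let (a11, a12, a21, a22) = (a[0][0], a[0][1], a[1][0], a[1][1]);
    let (b11, b12, b21, b22) = (b[0][0], b[0][1], b[1][0], b[1][1]);

    let m1 = (a11 + a22) * (b11 + b22);
    let m2 = (a21 + a22) * b11;
    let m3 = a11 * (b12 - b22);
    let m4 = a22 * (b21 - b11);
    let m5 = (a11 + a12) * b22;
    let m6 = (a21 - a11) * (b11 + b12);
    let m7 = (a12 - a22) * (b21 + b22);

    // Recombine the seven products into the four output entries.
    [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]
}
```

Comparing such a formula against the naive product on a few inputs is the kind of check the Verifier is meant to perform symbolically.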

The overall workflow is orchestrated through Microsoft AutoGen, as depicted in Fig. 1.

Fig. 1: Matmul optimisation. The user makes an initial request with a prompt. From there, the Manager orchestrates the overall workflow: 1) the Proposer acts as a theorist and generates a Strassen-like algorithm; 2) the Verifier checks the mathematical correctness of the code; 3) the Coder generates Rust NEON code; 4) the Tester runs the benchmark. [Image generated with Nano Banana Pro].

Prepare the input data and vector database

The input data consists of academic papers focused on matrix multiplication optimisation. Many of these papers are referenced in, and related to, DeepMind’s Strassen paper. I wanted to start simply, so I collected 50 papers, published between 2020 and 2025, that specifically address matrix multiplication.

Next, I used Chroma to create the vector database. The critical aspect in building a new vector database is how the PDFs are chunked. Here, I used a semantic chunker. Unlike plain text-splitting methods, the semantic chunker uses the actual meaning of the text to determine where to cut. The goal is to keep related sentences together in one chunk, making the final vector database more coherent and accurate. This is done with the local embedding model BAAI/bge-base-en-v1.5. The GitHub gist below shows the full implementation.
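The core idea of semantic chunking can be sketched in a few lines: embed each sentence, then start a new chunk wherever the similarity between neighbouring sentences drops. A simplified sketch, assuming the embeddings are already computed (in the real pipeline they would come from BAAI/bge-base-en-v1.5, which is outside this snippet) and a fixed, hypothetical threshold; libraries differ in how they pick the breakpoint.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Groups consecutive sentences into chunks, opening a new chunk whenever the
/// similarity between neighbouring sentence embeddings falls below `threshold`.
fn semantic_chunks(sentences: &[&str], embeddings: &[Vec<f32>], threshold: f32) -> Vec<String> {
    assert_eq!(sentences.len(), embeddings.len(), "one embedding per sentence");
    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for i in 0..sentences.len() {
        if i > 0 && cosine(&embeddings[i - 1], &embeddings[i]) < threshold {
            chunks.push(current.join(" ")); // topic shift: close the current chunk
            current.clear();
        }
        current.push(sentences[i]);
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}
```

A fixed-size splitter would cut at a character count regardless of topic; here the cut points follow the embedding geometry instead.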

The core code: autogen-core and GGML models

I used Microsoft AutoGen, specifically the autogen-core package (version 0.7.5). Unlike the higher-level chat API, autogen-core gives access to low-level, event-driven building blocks, which are necessary to create the state-machine-driven workflow we need. Indeed, the challenge is to maintain a strict workflow: all the agents must act in a specific order, Proposer → Verifier → Coder → Tester.
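The strict turn order is essentially a small state machine. A language-neutral illustration in Rust (the real workflow is Python on autogen-core; the Role enum and both functions are hypothetical), including a switch for skipping the currently disabled Verifier:

```rust
/// The roles that take turns in the roundtable; the Manager drives the turns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Proposer,
    Verifier,
    Coder,
    Tester,
}

/// Returns the next speaker in the fixed Proposer → Verifier → Coder → Tester order.
/// `current == None` means the round has not started; a `None` result means it is over.
fn next_role(current: Option<Role>, verifier_enabled: bool) -> Option<Role> {
    match current {
        None => Some(Role::Proposer),
        Some(Role::Proposer) if verifier_enabled => Some(Role::Verifier),
        Some(Role::Proposer) => Some(Role::Coder), // Verifier skipped
        Some(Role::Verifier) => Some(Role::Coder),
        Some(Role::Coder) => Some(Role::Tester),
        Some(Role::Tester) => None,
    }
}

/// Collects the full turn order for one round.
fn round_order(verifier_enabled: bool) -> Vec<Role> {
    let mut order = Vec::new();
    let mut current = next_role(None, verifier_enabled);
    while let Some(role) = current {
        order.push(role);
        current = next_role(Some(role), verifier_enabled);
    }
    order
}
```

Encoding the transitions explicitly, rather than letting agents pick who speaks next, is what keeps the loop deterministic.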

The core part is BaseMatMulAgent, which inherits from AutoGen’s RoutedAgent. This base class standardises how the LLM agents take part in the chat and how they behave.

From the code above, we can see that the class is designed to participate in an asynchronous group chat, handling the conversation history and calls to external tools, and generating responses through the local LLM.

The core component is @message_handler, a decorator that registers a method as a listener (or subscriber) based on the message type. The decorator automatically detects the type hint of the method’s message argument, in our case message: GroupChatMessage. It then subscribes the agent to receive any events of that type sent to the agent’s topic. The async handle_message method is then responsible for updating the agent’s internal memory, without generating a response.
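The type-based routing idea can be mirrored in Rust with a table of handlers keyed by TypeId. This is only an analogy for the mechanism, not autogen-core’s implementation; Router, AgentState and this GroupChatMessage struct are hypothetical stand-ins. As in handle_message, the handler only records the message in memory and produces no reply.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for the GroupChatMessage type used in the Python code.
pub struct GroupChatMessage {
    pub source: String,
    pub content: String,
}

/// Agent memory: a chat message is only recorded here, no reply is produced.
#[derive(Default)]
pub struct AgentState {
    pub history: Vec<String>,
}

type Handler<S> = Box<dyn Fn(&mut S, &dyn Any)>;

/// One handler per message type, looked up by the message's TypeId.
pub struct Router<S> {
    handlers: HashMap<TypeId, Handler<S>>,
}

impl<S: 'static> Router<S> {
    pub fn new() -> Self {
        Router { handlers: HashMap::new() }
    }

    /// Registers `f` for messages of type `M`; `M` is inferred from the closure's argument type.
    pub fn register<M: 'static, F: Fn(&mut S, &M) + 'static>(&mut self, f: F) {
        let handler = move |state: &mut S, msg: &dyn Any| {
            // The downcast cannot fail: this handler is stored under TypeId::of::<M>().
            if let Some(m) = msg.downcast_ref::<M>() {
                f(state, m);
            }
        };
        self.handlers.insert(TypeId::of::<M>(), Box::new(handler));
    }

    /// Delivers `msg` to the handler for its type; returns false if none is registered.
    pub fn dispatch<M: 'static>(&self, state: &mut S, msg: &M) -> bool {
        match self.handlers.get(&TypeId::of::<M>()) {
            Some(handler) => {
                handler(state, msg as &dyn Any);
                true
            }
            None => false,
        }
    }
}
```

Python reads the type from an annotation at decoration time; here the closure’s parameter type plays that role at compile time.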

With the listener-subscriber mechanism in place, we can focus on the Manager class. MatMulManager inherits from RoutedAgent and orchestrates the overall flow of the agents.

The code above handles all the agents. For the moment, we are skipping the Verifier. The Coder publishes the final code, and the Tester saves both the code and the whole context to the vector database. This way, we avoid consuming all the tokens of our local model. At each new run, the model catches up on the latest generated algorithms from the vector database and proposes a new solution.
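The point of persisting runs is that each new prompt only needs a compact summary of past attempts rather than the full history. A minimal sketch of that idea, assuming records are stored in chronological order and using simple recency selection in place of a Chroma similarity query; RunRecord, its fields, and build_context are hypothetical:

```rust
/// One stored attempt, roughly what the Tester persists (field names are hypothetical).
pub struct RunRecord {
    pub run: u32,
    pub time_ms: Option<f64>, // None when the run failed or was rejected
    pub summary: String,
}

/// Builds a compact prompt context from stored runs: newest first,
/// at most `max_runs` entries, and never more than `max_bytes` bytes of text.
pub fn build_context(records: &[RunRecord], max_runs: usize, max_bytes: usize) -> String {
    let mut out = String::new();
    for r in records.iter().rev().take(max_runs) {
        let timing = match r.time_ms {
            Some(t) => format!("{t:.0} ms"),
            None => "failed".to_string(),
        };
        let line = format!("Run {}: {} ({})\n", r.run, r.summary, timing);
        if out.len() + line.len() > max_bytes {
            break; // stop before exceeding the token/byte budget
        }
        out.push_str(&line);
    }
    out
}
```

Capping the context this way keeps each Proposer prompt bounded no matter how many runs accumulate in the database.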

A very important caveat: to make sure autogen-core can work with local llama.cpp-based models on macOS, reinstall llama-cpp-python with Metal support using the following snippet:

#!/bin/bash 

CMAKE_ARGS="-DGGML_METAL=on" FORCE_CMAKE=1 pip install --upgrade --verbose --force-reinstall llama-cpp-python --no-cache-dir

Fig.2 summarises the entire code. We can roughly subdivide the code into 3 main blocks:

  • The base agent (BaseMatMulAgent), which handles messages through the LLM agents, evaluating the mathematical formulation and generating code;
  • MatMulManager, which orchestrates the entire flow of the agents;
  • autogen_core.SingleThreadedAgentRuntime, which runs the entire workflow.
Fig. 2: Overall workflow in a nutshell. The base agent runs the LLM through the agents: it evaluates the mathematical formulation, creates the algorithm in Rust, and saves all the info in the vector database. MatMulManager is the real core of the overall workflow. Finally, autogen_core.SingleThreadedAgentRuntime makes all of this work on our MacBook Pro. [Image created with Nano Banana Pro.]

Results and benchmark

All the Rust code was reviewed and re-run manually. While the workflow is robust, working with LLMs requires a critical eye: several times the model confabulated*, generating code that looked optimised but did not perform the actual matmul work.

The very first iteration generates a sort of Strassen-like algorithm (the “Run 0” code in Fig. 3):

The model then moves toward better, more NEON-oriented Rust implementations, so that after four iterations it produces the following code (“Run 3” in Fig. 3):

We can see functions like vaddq_f32, NEON intrinsics for ARM processors exposed by std::arch::aarch64. The model manages to use Rayon to split the work across multiple CPU cores, and inside the parallel threads it uses NEON intrinsics. The code itself is not entirely correct; moreover, I noticed that it runs into an out-of-memory error with 1024×1024 matrices. I had to rework the code manually to make it run.
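To show what these intrinsics look like in isolation, here is a minimal element-wise addition that uses vld1q_f32, vaddq_f32 and vst1q_f32 from std::arch::aarch64 on aarch64 targets and falls back to scalar code elsewhere. It is an illustrative sketch, not the generated Run 3 code; it assumes NEON is enabled, which is the default on mainstream aarch64 targets such as aarch64-apple-darwin. The name add_f32 is hypothetical.

```rust
/// Element-wise c[i] = a[i] + b[i].
/// On aarch64 the bulk uses NEON (4 f32 lanes per instruction); on other targets,
/// and for the leftover tail, it uses plain scalar code.
pub fn add_f32(a: &[f32], b: &[f32], c: &mut [f32]) {
    assert!(a.len() == b.len() && b.len() == c.len(), "length mismatch");
    let n = a.len();
    #[allow(unused_mut)]
    let mut i = 0;

    #[cfg(target_arch = "aarch64")]
    {
        use std::arch::aarch64::{vaddq_f32, vld1q_f32, vst1q_f32};
        while i + 4 <= n {
            // SAFETY: i + 4 <= n, so each 4-lane load and store stays in bounds,
            // and NEON is assumed enabled for the target (default on Apple silicon).
            unsafe {
                let va = vld1q_f32(a.as_ptr().add(i));
                let vb = vld1q_f32(b.as_ptr().add(i));
                vst1q_f32(c.as_mut_ptr().add(i), vaddq_f32(va, vb));
            }
            i += 4;
        }
    }

    // Scalar path: the whole slice on other targets, only the tail on aarch64.
    for j in i..n {
        c[j] = a[j] + b[j];
    }
}
```

Note that even this small kernel needs an unsafe block for the pointer-based loads and stores, which is the trade-off the post is trying to minimise.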

This brings us back to my mantra, “iterating to perfection”, and we can ask ourselves: “can a local agent autonomously refine Rust code to the point of mastering complex NEON intrinsics?”. The findings show that yes, even on consumer hardware, this level of optimisation is achievable.

Fig. 3 shows the results I obtained after each iteration.

Fig. 3: Logarithmic plot of the Rust-NEON implementation at each iteration, measured on a 1024×1024 matrix multiplication benchmark. [Image generated by the author].

The Run 0 and Run 2 benchmarks are flawed, as it is physically impossible to achieve such results for a 1024×1024 matmul on a CPU:

  • the first (Run 0) suffers from a “diagonal fallacy”: it computes only the diagonal blocks of the matrix and ignores the rest;
  • the second (Run 2) has a broken buffer: it repeatedly overwrites a small, cache-hot buffer of 1028 floats instead of traversing all ~1 million elements (1024 × 1024 = 1,048,576).
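Both failure modes are caught by comparing a candidate kernel’s full output against a trusted reference before trusting its timing. A minimal sketch of such a gate, assuming square row-major matrices; reference, fill and verify are hypothetical helpers, and the LCG constants are the common Numerical Recipes ones, used only to get deterministic inputs without external crates:

```rust
/// Reference n×n row-major matmul (i-k-j order), used as ground truth.
pub fn reference(a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; n * n];
    for i in 0..n {
        for p in 0..n {
            let a_ip = a[i * n + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    c
}

/// Deterministic pseudo-random values in [-0.5, 0.5) from a simple LCG.
fn fill(len: usize, seed: u32) -> Vec<f32> {
    let mut s = seed;
    (0..len)
        .map(|_| {
            s = s.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            (s >> 8) as f32 / (1u32 << 24) as f32 - 0.5
        })
        .collect()
}

/// Accepts a candidate n×n kernel only if its output has the right length and every
/// element matches the reference within a mixed absolute/relative tolerance.
pub fn verify<K>(kernel: K, n: usize, tol: f32) -> bool
where
    K: Fn(&[f32], &[f32], usize) -> Vec<f32>,
{
    let a = fill(n * n, 1);
    let b = fill(n * n, 2);
    let want = reference(&a, &b, n);
    let got = kernel(&a, &b, n);
    got.len() == want.len()
        && got.iter().zip(&want).all(|(g, w)| (g - w).abs() <= tol * (1.0 + w.abs()))
}
```

A diagonal-blocks-only kernel fails on the off-diagonal entries, and a kernel that only ever touches a 1028-float buffer fails the length check, so neither would reach the benchmark stage.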

However, the workflow did produce two genuine implementations: Run 1 and Run 3. Run 1 takes 760 ms and constitutes a real baseline; it suffers from cache misses and lacks SIMD vectorisation. Run 3 records 359 ms, and the improvement comes from NEON SIMD and Rayon parallelism.
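The parallel half of that improvement can be illustrated without external crates. The sketch below swaps Rayon for std::thread::scope (a deliberate substitution to stay dependency-free) and splits C into disjoint bands of rows, one per thread, so no locks or unsafe code are needed. It is not the Run 3 code; matmul_par is a hypothetical name and the inner loop is plain scalar code.

```rust
use std::thread;

/// Row-parallel matmul C = A × B (row-major; A is m×k, B is k×n) using scoped threads.
/// Each thread owns a disjoint band of C's rows, so no locks or unsafe code are needed.
pub fn matmul_par(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, threads: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A must be m×k");
    assert_eq!(b.len(), k * n, "B must be k×n");
    let mut c = vec![0.0f32; m * n];
    if m == 0 || n == 0 {
        return c;
    }
    let t = threads.max(1);
    let rows_per_band = (m + t - 1) / t; // ceil(m / t), at least 1 here
    thread::scope(|s| {
        for (band, c_band) in c.chunks_mut(rows_per_band * n).enumerate() {
            s.spawn(move || {
                let first_row = band * rows_per_band;
                for (r, c_row) in c_band.chunks_mut(n).enumerate() {
                    let i = first_row + r;
                    // Same i-k-j inner loops as a serial kernel, restricted to this band.
                    for p in 0..k {
                        let a_ip = a[i * k + p];
                        for (c_ij, &b_pj) in c_row.iter_mut().zip(&b[p * n..(p + 1) * n]) {
                            *c_ij += a_ip * b_pj;
                        }
                    }
                }
            });
        }
    });
    c
}
```

Rayon’s par_chunks_mut expresses the same row-band split with work stealing; the ownership argument (each thread writes only its own rows) is identical.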

*: I wrote “the model confabulated” on purpose. From a medical point of view, LLMs do not hallucinate; they confabulate. Hallucination is a totally different phenomenon from what LLMs do when they babble and generate “wrong” answers.

Conclusions

This experiment started with a question that seemed an impossible challenge: “can we use consumer-grade local LLMs to discover high-performance Rust algorithms that can compete with BLAS implementations?”.

We can say yes, or at least we now have a solid foundation on which to build better code, working toward a full BLAS-like implementation in Rust.

This post showed how to work with Microsoft AutoGen’s autogen-core and how to create a roundtable of agents.

The base model is a GGUF model that can run on a MacBook Pro M3 with 36 GB of RAM.

Of course, we have not (yet) found anything better than BLAS in a single, simple piece of code. However, we showed that a local agentic workflow on a MacBook Pro can achieve what was previously thought to require a massive cluster and massive models. Eventually, the model found a reasonable Rust-NEON implementation, Run 3 above, which cuts the runtime by more than 50% compared with the Run 1 baseline (359 ms versus 760 ms). We must highlight that the backbone of this implementation was AI-generated.
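To make the timings concrete, here is a back-of-the-envelope calculation, assuming both Run 1 and Run 3 perform the classical amount of work, conventionally counted as 2n^3 floating-point operations for an n×n matmul (one multiply and one add per inner-loop step). These are estimates derived from the reported times, not separate measurements:

```latex
\begin{aligned}
W &= 2n^3 = 2 \cdot 1024^3 = 2^{31} \approx 2.15 \times 10^{9}\ \text{FLOP} \\
\text{Run 1:}\quad \frac{W}{0.760\ \text{s}} &\approx 2.83\ \text{GFLOP/s} \\
\text{Run 3:}\quad \frac{W}{0.359\ \text{s}} &\approx 5.98\ \text{GFLOP/s} \\
\text{speed-up} &= \frac{760}{359} \approx 2.12, \qquad 1 - \frac{359}{760} \approx 52.8\%
\end{aligned}
```

A runtime reduction of about 53% is the same thing as a roughly 2.1× speed-up.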

The frontier is open. I hope this blog post inspires you to explore which limits we can overcome with local LLM deployment.


I am writing this in a personal capacity; these views are my own.
