Energy-Efficient NPU Technology Cuts AI Power Use by 44%

Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed an energy-efficient neural processing unit (NPU) technology that delivered substantial performance improvements in laboratory testing.

In controlled experiments, their specialised AI chip ran AI models 60% faster while using 44% less electricity than the graphics processing units (GPUs) that currently power most AI systems.

The research, led by Professor Jongse Park from KAIST’s School of Computing in collaboration with HyperAccel Inc., addresses one of the most pressing challenges in modern AI infrastructure: the enormous energy and hardware requirements of large-scale generative AI models.

Current systems such as OpenAI’s GPT-4 and Google’s Gemini 2.5 demand not only high memory bandwidth but also substantial memory capacity, driving companies like Microsoft and Google to purchase hundreds of thousands of NVIDIA GPUs.

The memory bottleneck challenge

The core innovation lies in the team’s approach to the memory bottleneck that plagues existing AI infrastructure. Their energy-efficient NPU technology focuses on making the inference process “lightweight” while minimising accuracy loss, a balance that previous solutions have struggled to strike.
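To see why memory, rather than compute, tends to be the bottleneck, consider a back-of-envelope bound. The sketch below assumes that each decoding step streams the full model weights once (shared across the batch) plus every sequence's KV cache from memory, and it ignores compute time entirely. The function name and all numbers are hypothetical illustrations, not figures from the KAIST work.

```python
# Hypothetical back-of-envelope model of memory-bound LLM decoding.
# All inputs are illustrative placeholders, not results from the Oaken paper.


def max_tokens_per_second(bandwidth_gb_s: float, weight_bytes: float,
                          kv_cache_bytes: float, batch_size: int) -> float:
    """Upper bound on generated tokens per second if every decoding step must
    read all model weights once (shared by the batch) plus the whole batch's
    KV cache from memory. Compute time is ignored."""
    bytes_per_step = weight_bytes + kv_cache_bytes
    steps_per_second = bandwidth_gb_s * 1e9 / bytes_per_step
    # Each step produces one new token for every sequence in the batch.
    return steps_per_second * batch_size


# Assumed: 1,000 GB/s memory bandwidth, 14 GB of FP16 weights (7B parameters),
# and a batch of 16 sequences whose KV cache totals 16 GB in FP16.
fp16 = max_tokens_per_second(1000, 14e9, 16e9, 16)
# Same setup with the KV cache quantised to 4 bits (a quarter of the size).
int4 = max_tokens_per_second(1000, 14e9, 4e9, 16)
print(f"{fp16:.0f} vs {int4:.0f} tokens/s")  # → 533 vs 889 tokens/s
```

In this toy example, shrinking only the KV cache raises the throughput bound by roughly two thirds without touching the compute units, which is why memory traffic is an attractive optimisation target.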

PhD student Minsu Kim and Dr Seongmin Hong of HyperAccel Inc., the paper’s co-first authors, presented their findings at the 2025 International Symposium on Computer Architecture (ISCA 2025) in Tokyo. The paper, titled “Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization,” details their approach.

The technology centres on quantising the key-value (KV) cache, the stored attention keys and values of previously processed tokens, which the researchers identify as accounting for most of the memory usage in generative AI systems. By shrinking this component, the team can deliver the same level of AI infrastructure performance with fewer devices than traditional GPU-based systems require.
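As a rough illustration of why the KV cache dominates memory, the sketch below estimates its size for a hypothetical 7B-class decoder (32 layers, 32 attention heads, head dimension 128) serving 8 sequences of 4,096 tokens each. The configuration is an assumption chosen for easy arithmetic, and the estimate ignores the scale and offset metadata that real quantisers store.

```python
def kv_cache_bytes(num_layers: int, num_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bits_per_value: int) -> int:
    """Estimated KV cache size in bytes (quantiser metadata ignored)."""
    # Keys and values: two tensors per layer, each of shape
    # [batch_size, num_heads, seq_len, head_dim].
    num_values = 2 * num_layers * num_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value // 8


GIB = 2**30
# Hypothetical 7B-class configuration (assumption for illustration only).
config = dict(num_layers=32, num_heads=32, head_dim=128, seq_len=4096, batch_size=8)
print(kv_cache_bytes(**config, bits_per_value=16) / GIB)  # → 16.0
print(kv_cache_bytes(**config, bits_per_value=4) / GIB)   # → 4.0
```

Under these assumptions the FP16 cache needs 16 GiB, more than the roughly 13 GiB occupied by the FP16 weights of a 7B-parameter model, and 4-bit quantisation shrinks it to 4 GiB.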

Technical innovation and architecture

The KAIST team’s energy-efficient NPU technology employs a three-pronged quantisation algorithm: threshold-based online-offline hybrid quantisation, group-shift quantisation, and fused dense-and-sparse encoding. This approach allows the system to integrate with existing memory interfaces without requiring changes to operational logic in current NPU architectures.
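The article names the algorithm’s components without detailing them. The sketch below illustrates only the general idea behind threshold-based hybrid quantisation with a dense-and-sparse split, under simplifying assumptions: clipping thresholds are chosen offline from calibration data, each value is then classified online as an inlier or an outlier, inliers are stored as dense low-bit codes and outliers as a sparse list of (index, value) pairs. The function names, the percentile-based thresholds and the 4-bit default are hypothetical; Oaken’s group-shift step and its fused encoding are not reproduced here.

```python
from __future__ import annotations

import random


def profile_thresholds(calibration: list[float], outlier_fraction: float = 0.02):
    """Offline step (hypothetical): choose clipping thresholds so that roughly
    `outlier_fraction` of the calibration values fall outside [lo, hi]."""
    data = sorted(calibration)
    k = int(len(data) * outlier_fraction / 2)  # trim k values from each tail
    return data[k], data[len(data) - 1 - k]


def quantise(values: list[float], lo: float, hi: float, bits: int = 4):
    """Online step: inliers become dense low-bit integer codes, outliers are
    kept as sparse (index, value) pairs at full precision."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    dense, sparse = [], []
    for i, v in enumerate(values):
        if lo <= v <= hi:
            dense.append(round((v - lo) / scale))  # integer code in 0..levels
        else:
            dense.append(0)  # placeholder; the real value lives in `sparse`
            sparse.append((i, v))
    return dense, sparse, scale


def dequantise(dense: list[int], sparse, lo: float, scale: float) -> list[float]:
    """Rebuild approximate values: decode dense codes, then patch in outliers."""
    out = [lo + q * scale for q in dense]
    for i, v in sparse:
        out[i] = v
    return out


rng = random.Random(0)
calibration = [rng.gauss(0.0, 1.0) for _ in range(1000)]
lo, hi = profile_thresholds(calibration)
# Fresh "KV" values plus two injected outliers.
x = [rng.gauss(0.0, 1.0) for _ in range(256)] + [50.0, -40.0]
dense, sparse, scale = quantise(x, lo, hi)
y = dequantise(dense, sparse, lo, scale)
print(len(sparse), max(abs(a - b) for a, b in zip(x, y)))
```

Keeping the rare outliers out of the dense array lets the low-bit range cover only the typical values, so inlier error stays within half a quantisation step while outliers are reconstructed exactly.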

The hardware architecture incorporates page-level memory management techniques to make efficient use of limited memory bandwidth and capacity. The team also introduced new encoding techniques optimised specifically for the quantised KV cache.
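Page-level management of this kind is similar in spirit to the paged KV-cache scheme popularised by the vLLM serving system’s PagedAttention. The sketch below is a hypothetical software analogue, not a description of the KAIST hardware: each sequence’s cache grows one fixed-size page at a time from a shared pool, and a per-sequence page table maps its logical pages to physical ones. The class and method names are illustrative.

```python
from __future__ import annotations


class PagedKVAllocator:
    """Hypothetical page-level allocator for a KV cache. Memory is handed out
    in fixed-size pages as tokens arrive, instead of being reserved up front
    for the maximum sequence length."""

    def __init__(self, num_pages: int, tokens_per_page: int):
        self.tokens_per_page = tokens_per_page
        self.free_pages = list(range(num_pages))      # shared physical pool
        self.page_tables: dict[int, list[int]] = {}   # seq_id -> physical pages
        self.lengths: dict[int, int] = {}             # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve space for one more token; return (physical_page, slot)."""
        length = self.lengths.get(seq_id, 0)
        table = self.page_tables.setdefault(seq_id, [])
        if length % self.tokens_per_page == 0:  # no page yet, or last page full
            if not self.free_pages:
                raise MemoryError("KV cache pool exhausted")
            table.append(self.free_pages.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.tokens_per_page

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's pages to the shared pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


alloc = PagedKVAllocator(num_pages=4, tokens_per_page=16)
for _ in range(20):  # 20 tokens need two 16-token pages
    alloc.append_token(seq_id=0)
print(len(alloc.page_tables[0]), len(alloc.free_pages))  # → 2 2
alloc.release(0)
print(len(alloc.free_pages))  # → 4
```

Because pages are allocated only as tokens arrive, short sequences no longer tie up memory sized for the longest possible one, and freed pages can be reused immediately by other requests.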

“This research, through joint work with HyperAccel Inc., found a solution in generative AI inference light-weighting algorithms and succeeded in developing a core NPU technology that can solve the memory problem,” Professor Park explained.

“Through this technology, we implemented an NPU with over 60% improved performance compared to the latest GPUs by combining quantisation techniques that reduce memory requirements while maintaining inference accuracy.”

Sustainability implications

The environmental impact of AI infrastructure has become a growing concern as generative AI adoption accelerates. The energy-efficient NPU technology developed by KAIST offers a potential path toward more sustainable AI operations.

With 44% lower power consumption compared to current GPU solutions, widespread adoption could significantly reduce the carbon footprint of AI cloud services. However, the technology’s real-world impact will depend on several factors, including manufacturing scalability, cost-effectiveness, and industry adoption rates.
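Performance and power figures can be combined into energy per generated token, since energy per token equals average power divided by throughput. Purely as illustrative arithmetic, and assuming the 44% figure refers to average power draw and the 60% to throughput on the same workload (the article does not specify either), the sketch below estimates the relative energy per token. The function name is hypothetical.

```python
def relative_energy_per_token(power_ratio: float, throughput_ratio: float) -> float:
    """Energy per token of system B relative to system A.

    power_ratio = P_B / P_A (average power draw),
    throughput_ratio = T_B / T_A (tokens per second).
    Energy per token = power / throughput, so the ratio is their quotient.
    """
    return power_ratio / throughput_ratio


# Assumption: the NPU draws 44% less power and is 60% faster on the same workload.
ratio = relative_energy_per_token(1 - 0.44, 1 + 0.60)
print(f"{ratio:.2f}")  # → 0.35
```

On those assumptions the NPU would use about 35% of the GPU’s energy per token; if the 44% figure instead already describes total energy for the same work, the saving is simply 44%.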

The researchers present their solution as a significant step forward but acknowledge that widespread implementation will require continued development and industry collaboration.

Industry context and future outlook

The breakthrough arrives as AI companies face increasing pressure to balance performance with sustainability. The GPU-dominated market has created supply chain constraints and elevated costs, making alternative solutions increasingly attractive.

Professor Park noted that the technology “has demonstrated the possibility of implementing high-performance, low-power infrastructure specialised for generative AI, and is expected to play a key role not only in AI cloud data centres but also in the AI transformation (AX) environment represented by dynamic, executable AI such as agentic AI.”

Ultimately, the research’s impact will be determined by how effectively the technology can be scaled and deployed in commercial environments. As the AI industry continues to grapple with energy consumption, innovations like KAIST’s NPU technology point toward more sustainable AI computing.

(Photo by Korea Advanced Institute of Science and Technology)
