PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs



arXiv:2410.05265v1 Announce Type: cross
Abstract: Quantization is essential for deploying Large Language Models (LLMs) by enhancing memory efficiency and inference speed. Existing methods for activation quantization mainly address channel-wise outliers, often neglecting token-wise outliers, leading to reliance on costly per-token dynamic quantization. To address this, we introduce PrefixQuant, a novel technique that isolates outlier tokens offline without re-training. Specifically, PrefixQuant identifies high-frequency outlier tokens and prefixes them in the KV cache, preventing the generation of outlier tokens during inference and simplifying quantization. To our knowledge, PrefixQuant is the first to enable efficient per-tensor static quantization to outperform expensive per-token dynamic quantization. For instance, in W4A4KV4 (4-bit weight, 4-bit activation, and 4-bit KV cache) Llama-3-8B, PrefixQuant with per-tensor static quantization achieves a 7.43 WikiText2 perplexity and 71.08% average accuracy on 5 common-sense reasoning tasks, outperforming previous per-token dynamic quantization methods like QuaRot with 0.98 perplexity improvement and +5.98 points accuracy. Additionally, the inference speed of W4A4 quantized models using PrefixQuant is 1.60x to 2.81x faster than FP16 models and exceeds QuaRot models by 1.2x to 1.3x. Our code is available at https://github.com/ChenMnZ/PrefixQuant.
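
To make the contrast in the abstract concrete, here is a minimal pure-Python sketch of the two activation quantization schemes it compares: per-tensor static quantization (one scale fixed offline) and per-token dynamic quantization (one scale recomputed per token at inference time). It is an illustration only, not the PrefixQuant implementation. All function names, the toy activation values, and the symmetric max-based calibration are assumptions made for this example. It shows why a single outlier token hurts a static per-tensor scale, and why keeping such tokens out of the quantized activations helps.

```python
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def qmax(bits: int) -> int:
    """Largest positive level of a signed integer grid with `bits` bits."""
    return 2 ** (bits - 1) - 1


def fake_quant(x: float, scale: float, bits: int) -> float:
    """Round x onto the integer grid, clamp, and map back to float."""
    q = round(x / scale)
    q = max(-qmax(bits) - 1, min(qmax(bits), q))
    return q * scale


def calibrate_scale(calib: Matrix, bits: int = 4) -> float:
    """Static scale: computed once, offline, from calibration activations."""
    peak = max(abs(v) for row in calib for v in row)
    return peak / qmax(bits)


def static_per_tensor(acts: Matrix, scale: float, bits: int = 4) -> Matrix:
    """Per-tensor static: every token shares the precomputed scale."""
    return [[fake_quant(v, scale, bits) for v in row] for row in acts]


def dynamic_per_token(acts: Matrix, bits: int = 4) -> Matrix:
    """Per-token dynamic: a fresh scale per token, computed at run time."""
    out = []
    for row in acts:
        scale = max(abs(v) for v in row) / qmax(bits) or 1.0
        out.append([fake_quant(v, scale, bits) for v in row])
    return out


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error between two equally shaped matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy data (assumed values): two ordinary tokens and one outlier token.
normal_tokens = [[0.5, -0.3, 0.7], [0.2, 0.6, -0.4]]
outlier_token = [100.0, -80.0, 90.0]

# Static scale calibrated with the outlier token present.
scale_with_outlier = calibrate_scale(normal_tokens + [outlier_token])
# Static scale calibrated as if the outlier token had been isolated.
scale_without_outlier = calibrate_scale(normal_tokens)

static_dirty = static_per_tensor(normal_tokens, scale_with_outlier)
static_clean = static_per_tensor(normal_tokens, scale_without_outlier)
dynamic = dynamic_per_token(normal_tokens)

err_static_dirty = mse(normal_tokens, static_dirty)
err_static_clean = mse(normal_tokens, static_clean)
err_dynamic = mse(normal_tokens, dynamic)

print("static, outlier in calibration:", err_static_dirty)
print("static, outlier isolated:      ", err_static_clean)
print("dynamic per-token:             ", err_dynamic)
```

With the outlier token included in calibration, the shared 4-bit scale is so coarse that every ordinary activation rounds to zero. Once the outlier is excluded, the same static scheme represents the ordinary tokens with far lower error and without any run-time scale computation, which is the intuition behind isolating outlier tokens before quantizing.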
