Learning Triton One Kernel at a Time: Softmax

The previous article of this series covered an operation used in all fields of computer science: matrix multiplication. It is heavily used ...








