Learning Triton One Kernel at a Time: Matrix Multiplication

Matrix multiplication is undoubtedly the most common operation performed by GPUs. It is the fundamental building block of linear algebra ...

Learning Triton One Kernel At a Time: Vector Addition

... a little optimisation goes a long way. Models like GPT-4 cost more than $100 million to train, which ...