Author: Shitanshu Bhushan

Speeding Up Llama: A Hybrid Approach to Attention Mechanisms
12 min read

From attention to gradient descent: unraveling how transformers learn from examples
6 min read

Breaking the Quadratic Barrier: Modern Alternatives to Softmax Attention
8 min read