Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed
Paper review by Michael A. Alcorn
2. BigBird Architecture
3. Theoretical Results about Sparse Attention Mechanism
4. Experiments: Natural Language Processing
5. Experiments: Genomics
D. Implementation Details