22:56
Deep sequence models tend to memorize geometrically; it is unclear why
25:58
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
23:10
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
17:22
Understanding and Improving Length Generalization in Recurrent Models
20:29
Differential Mamba
18:59
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
15:25
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism