22:56
Deep sequence models tend to memorize geometrically; it is unclear why
25:58
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
23:10
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
17:22
Understanding and Improving Length Generalization in Recurrent Models
20:29
Differential Mamba
18:59
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
15:25
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism