Generating Extremely Long Sequences in JAX

Sasha Rush (@srush_nlp) with Sidd Karamcheti

https://github.com/srush/annotated-s4

Based on research by Albert Gu, Karan Goel, and Christopher Ré.

Intro

Talk Goals

Caveat: Not a research talk, there will be bugs 🧑‍🔬

    1. Learn about a new ML architecture.
    2. Understand how JAX supports it.

JAX: Pros and Cons

Cons

    • Debugging is still hard
    • No NN standard
    • Hard to reason about (for me)

Pros

    • Separate math from NN (facilitates testing)
    • JIT is really impressive
    • Lifted transformations are magic

Problem Context

Sequence Modeling

Bird's-eye view: learning over a list of elements (discrete or sampled signal)

  • Classification

    Is the dog a good boy?

    • Yes
  • Generation

    The dog is a good _____

The Transformer

Transformer Dominance

isattentionallyouneed.com

The Transformer Weakness

  • Scales O(L^2) with length L.

Recurrent Neural Networks (RNN)

  • Scales O(L) with length L.

Long Range Arena

  • A benchmark of extremely long sequence tasks (up to 16k tokens)

Linearized Images

Path-X

  • Classification problem on linearized (one pixel at a time) image sequence.

Method

Efficiently Modeling Long Sequences with Structured State Spaces

Albert Gu, Karan Goel, and Christopher Ré.

Punchline

Challenges

  • The model is quite mathematically complicated (we want to test it)
  • Core operations required external libraries in Torch
  • Follow-up work uses similar structure

Goal

  • A concise pedagogical JAX / Flax implementation.

The Annotated S4

Image Generation

Speech Generation

Part 1: SSM

State Space Models (SSM)

  • A state space model maps a 1-D input signal u(t) to an N-D latent state x(t)
    before projecting to a 1-D output signal y(t).

\begin{aligned} x'(t) &= \boldsymbol{A}x(t) + \boldsymbol{B}u(t) \\ y(t) &= \boldsymbol{C}x(t)\\ \end{aligned}

  • \boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C} are parameters; u input, y output, x state
def random_SSM(rng, N):
    a_r, b_r, c_r = jax.random.split(rng, 3)
    A = jax.random.uniform(a_r, (N, N))
    B = jax.random.uniform(b_r, (N, 1))
    C = jax.random.uniform(c_r, (1, N))
    return A, B, C

Discretization

  • To discretize an input sequence (u_0, u_1, \dots, u_{L-1}), we need a step size \Delta representing u_k = u(k\Delta).

  • One choice for discretization is a bilinear transform.

\begin{aligned} \boldsymbol{\overline{A}} &= (\boldsymbol{I} - \Delta/2 \cdot \boldsymbol{A})^{-1}(\boldsymbol{I} + \Delta/2 \cdot \boldsymbol{A}) \\ \boldsymbol{\overline{B}} &= (\boldsymbol{I} - \Delta/2 \cdot \boldsymbol{A})^{-1} \Delta \boldsymbol{B} \\ \boldsymbol{\overline{C}} &= \boldsymbol{C}\\ \end{aligned}

def discretize(A, B, C, step):
    I = np.eye(A.shape[0])
    BL = inv(I - (step / 2.0) * A)
    Ab = BL @ (I + (step / 2.0) * A)
    Bb = (BL * step) @ B
    return Ab, Bb, C

Discretized SSM as RNN

  • Once discretized with step \Delta, the SSM can be viewed as a linear RNN,

\begin{aligned} x_{k} &= \boldsymbol{\overline{A}} x_{k-1} + \boldsymbol{\overline{B}} u_k\\ y_k &= \boldsymbol{\overline{C}} x_k \\ \end{aligned}

def scan_SSM(Ab, Bb, Cb, u, x0):
    def step(x_k_1, u_k):
        x_k = Ab @ x_k_1 + Bb @ u_k
        y_k = Cb @ x_k
        return x_k, y_k

    return jax.lax.scan(step, x0, u)
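
A minimal usage sketch (assuming np is jax.numpy and the helpers above): sample an SSM, discretize it, and scan over an input signal.

def run_SSM(A, B, C, u):
    L = u.shape[0]
    N = A.shape[0]
    Ab, Bb, Cb = discretize(A, B, C, step=1.0 / L)
    # scan_SSM returns (final state, stacked outputs y_0 .. y_{L-1}).
    return scan_SSM(Ab, Bb, Cb, u[:, np.newaxis], np.zeros((N,)))[1]

# e.g. run_SSM(*random_SSM(jax.random.PRNGKey(0), N=4),
#              jax.random.uniform(jax.random.PRNGKey(1), (16,)))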

Tangent: A Mechanics Example

  • Example from mechanics, mass on a spring

    • forward position y(t)
    • force u(t) is applied to this mass
    • parameterized by mass (m), spring constant (k), friction constant (b)

\begin{aligned} my''(t) = u(t) - by'(t) - ky(t) \end{aligned}

Tangent: A Mechanics Example [Matrix form]

\begin{aligned} my''(t) = u(t) - by'(t) - ky(t) \end{aligned}

\begin{aligned} \boldsymbol{A} &= \begin{bmatrix} 0 & 1 \\ -k/m & -b/m \end{bmatrix} \\ \boldsymbol{B} & = \begin{bmatrix} 0 \\ 1/m \end{bmatrix} & \boldsymbol{C} = \begin{bmatrix} 1 & 0 \end{bmatrix} \\ \end{aligned}

def example_mass(k, b, m):
    A = np.array([[0, 1], [-k / m, -b / m]])
    B = np.array([[0], [1.0 / m]])
    C = np.array([[1.0, 0]])
    return A, B, C

Tangent: A Mechanics Example (with force)

@partial(np.vectorize, signature="()->()")
def example_force(t):
    x = np.sin(10 * t)
    return x * (x > 0.5)
def example_ssm(L=100):
    ssm = example_mass(k=40, b=5, m=1)

    # L samples of u(t).
    step = 1.0 / L
    ks = np.arange(L)
    u = example_force(ks * step)

    # Discretize, then run the recurrence (state is [position, velocity]).
    ssm = discretize(*ssm, step=step)
    _, y = scan_SSM(*ssm, u[:, np.newaxis], np.zeros((2,)))

Training SSMs

  • Our Goal: Train a neural network with SSMs
  • SSM RNNs: Fast for generation, but slow for training

Key Properties

  • SSM CNNs: Slow for generation, but fast for training
  • Initialization

SSMs as wide CNNs

  1. "Unroll" the RNN representation

\begin{aligned} x_{k} &= \boldsymbol{\overline{A}} x_{k-1} + \boldsymbol{\overline{B}} u_k\\ y_k &= \boldsymbol{\overline{C}} x_k \\ \end{aligned}

\begin{aligned} x_0 &= \boldsymbol{\overline{B}} u_0 & x_1 &= \boldsymbol{\overline{A}} \boldsymbol{\overline{B}} u_0 + \boldsymbol{\overline{B}} u_1 & x_2 &= \boldsymbol{\overline{A}}^2 \boldsymbol{\overline{B}} u_0 + \boldsymbol{\overline{A}} \boldsymbol{\overline{B}} u_1 + \boldsymbol{\overline{B}} u_2 & \dots \\ y_0 &= \boldsymbol{\overline{C}} \boldsymbol{\overline{B}} u_0 & y_1 &= \boldsymbol{\overline{C}} \boldsymbol{\overline{A}} \boldsymbol{\overline{B}} u_0 + \boldsymbol{\overline{C}} \boldsymbol{\overline{B}} u_1 & y_2 &= \boldsymbol{\overline{C}} \boldsymbol{\overline{A}}^2 \boldsymbol{\overline{B}} u_0 + \boldsymbol{\overline{C}} \boldsymbol{\overline{A}} \boldsymbol{\overline{B}} u_1 + \boldsymbol{\overline{C}} \boldsymbol{\overline{B}} u_2 & \dots \end{aligned}

SSMs as wide CNNs

  2. Form an L-length kernel

\begin{aligned} y_k &= \boldsymbol{\overline{C}} \boldsymbol{\overline{A}}^k \boldsymbol{\overline{B}} u_0 + \boldsymbol{\overline{C}} \boldsymbol{\overline{A}}^{k-1} \boldsymbol{\overline{B}} u_1 + \dots + \boldsymbol{\overline{C}} \boldsymbol{\overline{A}} \boldsymbol{\overline{B}} u_{k-1} + \boldsymbol{\overline{C}}\boldsymbol{\overline{B}} u_k \\ \end{aligned}

\begin{aligned} \boldsymbol{\overline{K}} \in \mathbb{R}^L = (\boldsymbol{\overline{C}}\boldsymbol{\overline{B}}, \boldsymbol{\overline{C}}\boldsymbol{\overline{A}}\boldsymbol{\overline{B}}, \dots, \boldsymbol{\overline{C}}\boldsymbol{\overline{A}}^{L-1}\boldsymbol{\overline{B}}) \end{aligned}

def K_conv(Ab, Bb, Cb, L):
    return np.array(
        [(Cb @ matrix_power(Ab, l) @ Bb).reshape() for l in range(L)]
    )

SSMs as wide CNNs

  3. Apply as a (non-circular) convolution

y = \boldsymbol{\overline{K}} \ast u

def non_circular_convolution(u, K, nofft=False):
    if nofft:
        return convolve(u, K, mode="full")[: u.shape[0]]
    else:
        ud = np.fft.rfft(np.pad(u, (0, K.shape[0])))
        Kd = np.fft.rfft(np.pad(K, (0, u.shape[0])))
        return np.fft.irfft(ud * Kd)[: u.shape[0]]
  • O(L \log L) training through FFT
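
A quick sanity-check sketch (assuming the helpers above): the CNN view and the RNN view should produce the same outputs.

def test_cnn_is_rnn(N=4, L=16, step=1.0 / 16):
    ssm = random_SSM(jax.random.PRNGKey(0), N)
    u = jax.random.uniform(jax.random.PRNGKey(1), (L,))

    # RNN (scan) view.
    ssmb = discretize(*ssm, step=step)
    rec = scan_SSM(*ssmb, u[:, np.newaxis], np.zeros((N,)))[1]

    # CNN (convolution) view.
    conv = non_circular_convolution(u, K_conv(*ssmb, L))
    assert np.allclose(rec.ravel(), conv.ravel(), atol=1e-4)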

Initialization with HiPPO

  • Fast training, but random init does terribly: ~50% on the MNIST classification benchmark.
  • HiPPO initialization of \mathbf{A} improves this to 98%.
def make_HiPPO(N):
    def v(n, k):
        if n > k:
            return np.sqrt(2 * n + 1) * np.sqrt(2 * k + 1)
        elif n == k:
            return n + 1
        else:
            return 0
    mat = [[v(n, k) for k in range(1, N + 1)] for n in range(1, N + 1)]
    return -np.array(mat)
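
As a sketch (assuming random_SSM from earlier), the HiPPO matrix simply replaces the random A; B and C stay randomly initialized.

def random_HiPPO_SSM(rng, N):
    _, b_r, c_r = jax.random.split(rng, 3)
    A = make_HiPPO(N)
    B = jax.random.uniform(b_r, (N, 1))
    C = jax.random.uniform(c_r, (1, N))
    return A, B, C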

HiPPO Intuition Sketch

  • Recall x_k is an N-dimensional hidden representation of an L-step signal
  • HiPPO approximates state as N Legendre coefficients representing u.

def example_legendre(N=8):
    u = (np.random.rand(N) - 0.5) * 2
    t = np.linspace(-1, 1, 100)
    x = numpy.polynomial.legendre.Legendre(u)(t)

Tangent: Neat JAX things.

  • Everything is a modular testable function
  • So far: no parameters, batches, or NN nonsense
  • In fact, mostly scalar modeling.

SSM Network Layer

  • SSM layer with Flax (still scalar!)
class SSMLayer(nn.Module):
    A: np.DeviceArray  # HiPPO
    N: int
    L: int

    def setup(self):
        self.B = self.param("B", lecun_normal(), (self.N, 1))
        self.C = self.param("C", lecun_normal(), (1, self.N))
        self.step = np.exp(self.param("log_step", log_step_initializer(), (1,)))

        # Conv created each time during training
        self.ssm = discretize(self.A, self.B, self.C, step=self.step)
        self.K = K_conv(*self.ssm, self.L)

    def __call__(self, u):
        return non_circular_convolution(u, self.K) 

Lifting SSM Layer

  • Lift to HH copies
nn.vmap(
    layer, in_axes=1, out_axes=1,
    variable_axes={"params": 1}, # New Params
    split_rngs={"params": True},
)
  • Over BB batches
nn.vmap(
    layer, in_axes=0, out_axes=0,
    variable_axes={"params": None}, # Shared Params
    split_rngs={"params": False},
)
  • Put into a stack of layers (similar to Transformers)
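
A rough sketch of such a stack, assuming Flax linen as nn (SeqBlock, layer, and d_model are illustrative names, not the talk's exact code):

from typing import Callable

class SeqBlock(nn.Module):
    layer: Callable  # e.g. partial(SSMLayer, A=hippo, N=64, L=1024), lifted over H and B
    d_model: int

    @nn.compact
    def __call__(self, x):
        # Pre-norm residual block, similar to a Transformer block.
        skip = x
        x = nn.LayerNorm()(x)
        x = self.layer()(x)
        x = nn.gelu(x)
        x = nn.Dense(self.d_model)(x)
        return skip + x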

SSM RNN Layer

  • Alternative SSM layer with Flax Caching
class SSMRNNLayer(nn.Module):
    A: np.DeviceArray  # HiPPO
    N: int
    L: int

    def setup(self):
        self.B = self.param("B", lecun_normal(), (self.N, 1))
        self.C = self.param("C", lecun_normal(), (1, self.N))
        self.D = self.param("D", nn.initializers.ones, (1,))
        self.step = np.exp(self.param("log_step", log_step_initializer(), (1,)))
        self.ssm = discretize(self.A, self.B, self.C, step=self.step)
        # Cache holding the previous state x_{k-1} across calls.
        self.x_k_1 = self.variable("cache", "cache_x_k", np.zeros, (self.N,))

    def __call__(self, u):
        x_k, y_s = scan_SSM(*self.ssm, u[:, np.newaxis], self.x_k_1.value)
        if self.is_mutable_collection("cache"):
            self.x_k_1.value = x_k
        return y_s.reshape(-1).real + self.D * u
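
A hedged sketch of threading the cache through at decode time (model, u_example, and u_step are illustrative names):

# Initialize both collections, then step the RNN one input at a time.
variables = model.init(jax.random.PRNGKey(0), u_example)
params, cache = variables["params"], variables["cache"]

y, mutated = model.apply(
    {"params": params, "cache": cache},
    u_step,
    mutable=["cache"],  # allow the layer to write x_{k-1} back
)
cache = mutated["cache"]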

Part 2: S4

Issue: Calculating K

  • Unfortunately, this step is a problem.
def K_conv(Ab, Bb, Cb, L):
    return np.array(
        [(Cb @ matrix_power(Ab, l) @ Bb).reshape() for l in range(L)]
    )
  • Main contribution of S4 is to fix this function.

  • Today: quick sketch of how it works

Two S4 Tricks

See blog post for full details. Here are two neat JAX tricks.

  • Instead of computing \boldsymbol{\overline{K}} directly, S4 evaluates its truncated generating function.

    • This becomes a functional vmap in JAX.
  • In order to evaluate the generating function it computes a Cauchy kernel \frac{1}{\omega_j - \zeta_k}.

    • This is intractable in Torch, but is jitted out in JAX.

Trick 1. SSM Generating Functions

The truncated SSM generating function at node z with truncation L is

\hat{\mathcal{K}}_L(z; \boldsymbol{\overline{A}}, \boldsymbol{\overline{B}}, \boldsymbol{\overline{C}}) \in \mathbb{C} := \sum_{i=0}^{L-1} \boldsymbol{\overline{C}} \boldsymbol{\overline{A}}^i \boldsymbol{\overline{B}} z^i

def K_gen_naive(Ab, Bb, Cb, L):
    K = K_conv(Ab, Bb, Cb, L)
    return lambda z: np.sum(K * (z ** np.arange(L)))

Trick 1. SSM Generating Functions

We can recover the kernel \mathcal{K} through a z-transform at the roots of unity
\Omega = \{ \exp(2\pi i \frac{k}{L}) : k \in [L] \} and an inverse Fourier transform.

def conv_from_gen(gen, L):
    Omega_L = np.exp((-2j * np.pi) * (np.arange(L) / L))
    atRoots = jax.vmap(gen)(Omega_L)
    return np.fft.ifft(atRoots, L).reshape(L).real
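
Another quick check (a sketch assuming the helpers above): reconstructing the kernel from the naive generating function should match K_conv.

def test_gen_matches_conv(N=4, L=16):
    ssm = random_SSM(jax.random.PRNGKey(0), N)
    ssmb = discretize(*ssm, step=1.0 / L)
    K_direct = K_conv(*ssmb, L)
    K_via_gen = conv_from_gen(K_gen_naive(*ssmb, L), L)
    assert np.allclose(K_direct, K_via_gen, atol=1e-4)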

Trick 1. SSM Generating Functions

Simplifying the generating function allows us to avoid calling K_conv

\hat{\mathcal{K}}_L(z) = \sum_{i=0}^{L-1} \boldsymbol{\overline{C}} \boldsymbol{\overline{A}}^i \boldsymbol{\overline{B}} z^i = \boldsymbol{\overline{C}} (\boldsymbol{I} - \boldsymbol{\overline{A}}^L z^L) (\boldsymbol{I} - \boldsymbol{\overline{A}} z)^{-1} \boldsymbol{\overline{B}}

def K_gen_inverse(Ab, Bb, Cb, L):
    I = np.eye(Ab.shape[0])
    Ab_L = matrix_power(Ab, L)
    Ct = Cb @ (I - Ab_L)
    return lambda z: (Ct.conj() @ inv(I - Ab * z) @ Bb).reshape()

Trick 2. Exploiting Structure

Under a diagonal assumption on \mathbf{A} = \boldsymbol{\Lambda}, you can further reduce the generating function to the following kernel form,

\begin{aligned} \boldsymbol{\hat{K}}_{\boldsymbol{\Lambda}}(z) & = c(z) \sum_i \frac{\tilde{C}_i B_i} {(g(z) - \Lambda_{i})} \\ \end{aligned}

where c is a constant, and g is a function of z.

  • However, evaluating this function at all roots is memory- and compute-intensive.

    • L = 16,000 different z values, N different i terms
    • Instantiating the full tensor is intractable
    • Libraries like KeOps avoid this issue

Trick 2. Exploiting Structure

In JAX we can rely on the JIT to take care of this for us.

  • JIT handles the fusion of the sum term
@partial(np.vectorize, signature="(c),(),(c)->()")
def cauchy_dot(v, omega, lambd):
    return (v / (omega - lambd)).sum()
  • JAX remat handles cases of very long sequences.
jax.remat(cauchy_dot)
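
A sketch of how cauchy_dot might slot into the diagonal generating function (the g(z), c(z) forms below follow the bilinear discretization; Ct and B are assumed to be the diagonal-case \tilde{C} and B vectors, flattened to shape (N,)):

def diag_kernel(Ct, B, Lambda, step, L):
    # Roots of unity at which the generating function is evaluated.
    Omega_L = np.exp((-2j * np.pi) * (np.arange(L) / L))
    g = (2.0 / step) * ((1.0 - Omega_L) / (1.0 + Omega_L))
    c = 2.0 / (1.0 + Omega_L)
    # One vectorized Cauchy dot per root; the JIT fuses the sum.
    atRoots = c * cauchy_dot(Ct * B, g, Lambda)
    return np.fft.ifft(atRoots, L).reshape(L).real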

Part 3: S4 in Practice

Training S4

  • So far: tested code for training S4 as a CNN and running it as an RNN.
  • MNIST classification and CIFAR classification (by pixel) are strong.

Goal

  • Generate extremely long sequences.

  • Experiments on MNIST, QuickDraw, SpeechCommands

S4 Model

Training to Generate by Pixel

Code to sample from the RNN

def sample(model, params, prime, cache, x, start, end, rng):
    def loop(i, cur):
        x, rng, cache = cur
        r, rng = jax.random.split(rng)
        out, vars = model.apply(
            {"params": params, "cache": cache},
            x[:, np.arange(1, 2) * i],
            mutable=["cache"],
        )

        def update(x, out):
            p = jax.random.categorical(r, out[0])
            return x.at[i + 1, 0].set(p)

        x = jax.vmap(update)(x, out)
        return x, rng, vars["cache"].unfreeze()

    return jax.lax.fori_loop(start, end, loop, (x, rng, cache))[0]

Generating by Pixel

Prefix Generation

Experiments: QuickDraw

Experiments: Sound

Conclusion & Future Work

Conclusion (on JAX)

  • JAX really shines at modular, mathematical code.

  • JAX JIT makes some hard code trivial.

  • Lifting in Flax

New Paper - Diagonal State Spaces.

# Replaces Part 2.
def complex_softmax(x, eps=1e-7):
    def reciprocal(x):
        return x.conj() / (x * x.conj() + eps)

    x2 = x - x[np.argmax(x.real)]
    e = np.exp(x2)
    return e * reciprocal(np.sum(e))

def dss_kernel(W, Lambda, L, step):
    P = (step * Lambda)[:, None] * np.arange(L)
    S = jax.vmap(complex_softmax)(P)
    return ((W / Lambda) @ S).ravel().real

def dss_ssm(W, Lambda, L, step):
    N = Lambda.shape[0]
    Abar = np.diag(np.exp(Lambda * step))
    b = jax.vmap(lambda l:
                 1 / (l * (np.exp(l * np.arange(L) * step)).sum()))
    Bbar = b(Lambda).reshape(N, 1)
    Cbar = W.reshape(1, N)
    return (Abar, Bbar, Cbar)
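
A hedged usage sketch (shapes are assumptions: W complex of shape (1, N), Lambda a stable complex diagonal of shape (N,)): the DSS kernel drops into the same convolution, and the DSS state space form into the same scan.

N, L, step = 4, 16, 1.0 / 16
w_r, l_r, u_r = jax.random.split(jax.random.PRNGKey(0), 3)
W = jax.random.normal(w_r, (1, N)) + 0j
Lambda = -np.exp(jax.random.normal(l_r, (N,))) + 0j
u = jax.random.uniform(u_r, (L,))

# CNN / training path.
y_train = non_circular_convolution(u, dss_kernel(W, Lambda, L, step))

# RNN / decoding path.
_, y_decode = scan_SSM(*dss_ssm(W, Lambda, L, step),
                       u[:, np.newaxis] + 0j,
                       np.zeros((N,), dtype=np.complex64))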

Thank You

  • Huge thanks to Albert Gu and Karan Goel, who were super helpful in putting this together. See their paper and codebase.

  • Thanks to Ankit Gupta for help with his DSS model

  • Thanks to Conner Vercellino, Laurel Orr, Ankit Gupta, Ekin Akyürek, Saurav Maheshkar