Neural SDEs as a Unified Approach to Continuous-Domain Sequence Modeling
Under Double-blind Review
Overview figure: Our approach introduces a new paradigm for continuous-domain sequence modeling by representing dynamics with SDEs, instead of directly modeling conditional densities. The Fokker-Planck equation provides the theoretical link between these two paradigms, describing the time evolution of the probability density. This framework unifies embodied and generative AI under the same continuous sequence modeling paradigm.
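
For reference, the link described in the caption can be written in its standard textbook form. The symbols below are chosen here for illustration and are not taken from the paper: an Itô SDE with drift f_θ and diffusion g_θ driven by a Wiener process W_t, and the Fokker-Planck equation for the density p(x, t) of its solution.

```latex
\begin{aligned}
dx_t &= f_\theta(x_t, t)\,dt + g_\theta(x_t, t)\,dW_t, \\
\frac{\partial p(x, t)}{\partial t}
  &= -\nabla \cdot \big( f_\theta(x, t)\, p(x, t) \big)
   + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial x_i\, \partial x_j}
     \Big( \big[ g_\theta g_\theta^\top \big]_{ij}(x, t)\, p(x, t) \Big).
\end{aligned}
```

The first line specifies the dynamics (the SDE view); the second describes how the density those dynamics induce evolves in time (the conditional-density view).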

Abstract

Inspired by the ubiquitous use of differential equations to model continuous dynamics across diverse scientific and engineering domains, we propose a novel and intuitive approach to continuous sequence modeling. Our method interprets time-series data as discrete samples from an underlying continuous dynamical system and models its time evolution using a Neural Stochastic Differential Equation (Neural SDE), in which both the flow (drift) and diffusion terms are parameterized by neural networks. We derive a principled maximum likelihood objective and a simulation-free scheme for efficient training of our Neural SDE model. We demonstrate the versatility of our approach through experiments on sequence modeling tasks across both embodied and generative AI. Notably, to the best of our knowledge, this is the first work to show that SDE-based continuous-time modeling also excels in such complex scenarios, and we hope that our work opens up new avenues for research on SDE models in high-dimensional and temporally intricate domains.
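
To make the idea of likelihood training without simulation concrete, here is a minimal sketch of one common construction. It is not necessarily the objective derived in the paper. It assumes a one-step Euler-Maruyama approximation, under which the transition from one observation to the next is Gaussian, and a diagonal diffusion. The names `em_transition_nll`, `drift`, and `diffusion` are hypothetical; the two callables stand in for the neural networks.

```python
import numpy as np

def em_transition_nll(drift, diffusion, x_t, x_next, dt):
    """Mean negative log-likelihood of x_next given x_t.

    Assumption (not from the paper): the transition is approximated by a
    single Euler-Maruyama step with diagonal diffusion, i.e.
        x_next ~ N(x_t + drift(x_t) * dt, diffusion(x_t)**2 * dt).
    Only observed pairs are needed; no SDE solver is rolled out.
    """
    mean = x_t + drift(x_t) * dt
    var = diffusion(x_t) ** 2 * dt
    resid = x_next - mean
    nll = 0.5 * (np.log(2.0 * np.pi * var) + resid ** 2 / var)
    return nll.sum(axis=-1).mean()

# Toy check on synthetic pairs from a mean-reverting process.
rng = np.random.default_rng(0)
dt = 0.1
x_t = rng.standard_normal((5000, 1))
x_next = x_t - x_t * dt + 0.5 * np.sqrt(dt) * rng.standard_normal((5000, 1))

def const_diffusion(x):
    # Constant diffusion coefficient of 0.5 (illustrative value).
    return 0.5 * np.ones_like(x)

nll_true = em_transition_nll(lambda x: -x, const_diffusion, x_t, x_next, dt)
nll_wrong = em_transition_nll(lambda x: x, const_diffusion, x_t, x_next, dt)
print(nll_true < nll_wrong)
```

In practice the callables would be differentiable networks and this quantity would be minimized with a gradient-based optimizer. The toy check only illustrates that the drift used to generate the data scores a lower loss than a drift with the wrong sign.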

High Temporal Resolution Video
The Neural SDE can generate intermediate frames, demonstrating its capacity to produce videos at high temporal resolution even when trained on relatively sparse data. Green borders indicate ground-truth context frames, while red borders indicate predicted frames. All videos are sampled with 10 fixed steps per frame.
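
A minimal sketch of how fixed-step sampling can yield frames at different rates, assuming a plain Euler-Maruyama integrator; the page does not state which solver is used. The function `sample_frames` and its parameters are hypothetical names, and `drift` and `diffusion` stand in for the trained networks. Because the model is defined in continuous time, a higher frame rate corresponds to a smaller `frame_dt` with the same number of solver steps per frame.

```python
import numpy as np

def sample_frames(drift, diffusion, x0, n_frames, frame_dt,
                  steps_per_frame=10, rng=None):
    """Integrate dx = drift(x) dt + diffusion(x) dW with Euler-Maruyama.

    Returns an array of shape (n_frames + 1, *x0.shape): the initial
    state followed by one state per frame interval of length frame_dt.
    """
    rng = np.random.default_rng() if rng is None else rng
    h = frame_dt / steps_per_frame          # solver step size
    x = np.array(x0, dtype=float)
    frames = [x.copy()]
    for _ in range(n_frames):
        for _ in range(steps_per_frame):
            noise = rng.standard_normal(x.shape)
            x = x + drift(x) * h + diffusion(x) * np.sqrt(h) * noise
        frames.append(x.copy())
    return np.stack(frames)

def decay(x):
    # Toy drift pulling the state toward zero (illustrative only).
    return -x

def small_noise(x):
    # Toy constant diffusion (illustrative only).
    return 0.05 * np.ones_like(x)

x0 = np.array([1.0, -2.0])
# One second of "video" at 3 FPS and at 12 FPS from the same dynamics.
low_rate = sample_frames(decay, small_noise, x0, n_frames=3, frame_dt=1 / 3,
                         rng=np.random.default_rng(0))
high_rate = sample_frames(decay, small_noise, x0, n_frames=12, frame_dt=1 / 12,
                          rng=np.random.default_rng(0))
print(low_rate.shape, high_rate.shape)
```

Both calls cover the same time span; the second simply records the state four times as often, which is the sense in which a continuous-time model can fill in frames between the ones it was trained on.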

Ground Truth (3 FPS): [video]

Neural SDE (3 FPS): [video]

Neural SDE (6 FPS): [video]

Neural SDE (12 FPS): [video]

Neural SDE (24 FPS): [video]

Stochastic Interpolant (3 FPS): [video, labeled "PFI"]

Flow Matching (3 FPS): [video]
