FASTER: Rethinking Real-Time Flow VLAs
| FASTER: Rethinking Real-Time Flow VLAs | |
|---|---|
| Type | Research Overview |
| Field | Robotics and Machine Learning |
| First described | 2024 |
| Key researchers | Google DeepMind, UC Berkeley |
FASTER (Flow-matching Action-conditioned Spatial-temporal Efficient Robotics) is an approach to Vision-Language-Action (VLA) models that replaces traditional autoregressive action decoding with flow matching in order to achieve real-time inference in robotic control.
The Challenge of Autoregression
Traditional VLA models often discretize actions into tokens and decode them autoregressively, one token at a time. Because each token depends on the ones before it, decoding cannot be parallelized and latency grows with the number of action tokens. This bottleneck makes closed-loop robotic control difficult to achieve at the high frequencies required for fluid movement.
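The latency argument can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes one token per action dimension per timestep and a fixed cost per forward pass; the function name and every number in it are illustrative assumptions, not measurements from FASTER or any specific model.

```python
def autoregressive_chunk_latency_ms(horizon, action_dim, ms_per_pass):
    """Latency of decoding one action chunk token by token.

    Assumes (hypothetically) one token per action dimension per timestep,
    and that every token needs its own sequential forward pass.
    """
    sequential_passes = horizon * action_dim
    return sequential_passes * ms_per_pass


# Illustrative numbers only: a 7-DoF arm, an 8-step chunk, 10 ms per pass.
latency_ms = autoregressive_chunk_latency_ms(horizon=8, action_dim=7, ms_per_pass=10.0)
replans_per_second = 1000.0 / latency_ms

print(f"{latency_ms:.0f} ms per chunk, about {replans_per_second:.2f} replans per second")
```

Under these assumptions, doubling the horizon doubles the latency, which is the scaling behavior that limits closed-loop control rates.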
Flow-Matching Architecture
FASTER replaces discrete action tokenization with continuous flow matching. The model is trained to predict the vector field that transports a base distribution (typically Gaussian noise) to the target action distribution. At inference time, numerically integrating this field from a noise sample yields continuous control signals in significantly fewer sampling steps.
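A minimal sketch of flow-matching sampling may help make this concrete. It is not FASTER's implementation: `toy_velocity_field` is a hypothetical stand-in for a learned network, using the closed-form velocity of straight-line paths toward a single known target chunk, and the Euler integrator, shapes, and values are all assumptions chosen for illustration.

```python
import numpy as np


def toy_velocity_field(x, t, target):
    """Stand-in for a learned network v_theta(x, t).

    For straight-line paths x_t = (1 - t) * x_0 + t * target, the exact
    velocity at (x, t) is (target - x) / (1 - t). A real model would
    predict this from observations instead of being given the target.
    """
    return (target - x) / (1.0 - t)


def sample_actions(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last step uses t = 1 - dt, so (1 - t) is never zero
        x = x + dt * velocity_fn(x, t)
    return x


rng = np.random.default_rng(0)
# Made-up target action chunk with shape (horizon, action_dim).
target_chunk = np.array([[0.1, -0.2, 0.3], [0.2, -0.1, 0.4]])
noise = rng.standard_normal(target_chunk.shape)  # sample from the base distribution

actions = sample_actions(
    lambda x, t: toy_velocity_field(x, t, target_chunk), noise, num_steps=4
)
print(np.allclose(actions, target_chunk))
```

Because the toy paths are straight lines, a handful of Euler steps already lands on the target. A learned field is only approximately straight, so the number of steps becomes a trade-off between speed and accuracy.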
Performance and Efficiency
The architecture targets high-frequency control tasks. Because the number of sampling steps does not depend on the length of the action sequence, FASTER is reported to achieve lower inference latency while matching or exceeding the task performance of models such as RT-2 and Octo in complex manipulation environments.
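The decoupling of generation cost from sequence length can be illustrated by counting model evaluations. In the sketch below, `CountingField` is a hypothetical placeholder for a trained network and `sample_chunk` is an assumed Euler sampler; neither is taken from the FASTER work, and the horizons and step count are arbitrary.

```python
import numpy as np


def sample_chunk(velocity_fn, horizon, action_dim, num_steps, seed=0):
    """Euler-integrate a whole (horizon, action_dim) chunk at once."""
    x = np.random.default_rng(seed).standard_normal((horizon, action_dim))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


class CountingField:
    """Placeholder velocity field that counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        return -x  # arbitrary dynamics; a trained network would go here


calls_by_horizon = {}
for horizon in (4, 16, 64):
    field = CountingField()
    sample_chunk(field, horizon=horizon, action_dim=7, num_steps=5)
    calls_by_horizon[horizon] = field.calls

print(calls_by_horizon)  # {4: 5, 16: 5, 64: 5}
```

The number of sequential evaluations stays at `num_steps` for every horizon. Each evaluation still processes a larger array as the chunk grows, but that work can run in parallel rather than one token after another.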
By leveraging flow-matching, we can transform the slow, sequential generation of robotic actions into a fast, parallelizable process.
-- Research Lead, FASTER Project
Generation
| Provider | gemini |
|---|---|
| Model | gemini-3.1-flash-lite-preview |
| Generated | 2026-03-20 20:36:34 UTC |
| Seed source | arXiv |
| Seed | FASTER: Rethinking Real-Time Flow VLAs |
| Prompt | Write a page about this: FASTER: Rethinking Real-Time Flow VLAs |