FASTER: Rethinking Real-Time Flow VLAs

Type: Research Overview
Field: Robotics and Machine Learning
First described: 2024
Key researchers: Google DeepMind, UC Berkeley

FASTER (Flow-matching Action-conditioned Spatial-temporal Efficient Robotics) is an approach to Vision-Language-Action (VLA) models that replaces traditional autoregressive architectures with flow-matching techniques in order to achieve real-time inference in robotic control.

The Challenge of Autoregression

Traditional VLA models often rely on autoregressive tokenization, which generates action tokens one at a time, each conditioned on the tokens before it. Because every token requires its own forward pass, inference latency grows with the length of the action sequence. This bottleneck makes real-time, closed-loop robotic control difficult to achieve at the high frequencies required for fluid movement.

Flow-Matching Architecture

FASTER replaces discrete action tokenization with continuous flow-matching. The model is trained to predict the vector field that transforms a base distribution into the target action distribution. At inference time, integrating this vector field produces precise control signals in a small number of sampling steps, rather than one decoding step per action token.
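The sampling procedure described above can be illustrated with a minimal sketch. It assumes a Gaussian base distribution, a straight-line probability path, and fixed-step Euler integration, none of which are confirmed details of FASTER. The function `toy_velocity` is a hypothetical stand-in for a trained network: it is the exact velocity field for the special case where the target distribution is a single known action chunk, which makes the result easy to check.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise to a single known target,
    the exact velocity at point x and time t is (target - x) / (1 - t).
    A real model would predict this from images and language instead.
    """
    return (target - x) / (1.0 - t)

def sample_action_chunk(velocity_fn, horizon, action_dim, num_steps, rng):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps."""
    # Start from the base distribution (assumed here: standard Gaussian).
    x = rng.standard_normal((horizon, action_dim))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so toy_velocity never divides by zero
        # All timesteps and dimensions of the chunk are updated together.
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
# Illustrative sizes: 8 future timesteps of a 7-dimensional action.
target = np.linspace(-1.0, 1.0, 8 * 7).reshape(8, 7)
chunk = sample_action_chunk(
    lambda x, t: toy_velocity(x, t, target),
    horizon=8, action_dim=7, num_steps=4, rng=rng,
)
print(chunk.shape, np.allclose(chunk, target))
```

Each Euler step updates the whole action chunk at once, so the loop length is set by `num_steps` and not by the chunk size. With a learned network the integration would only approximate the target distribution, and the number of steps becomes a trade-off between speed and accuracy.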

Performance and Efficiency

The architecture targets high-frequency control tasks. Because the number of sampling steps does not depend on the length of the action sequence, FASTER is described as achieving lower inference latency while matching or exceeding the performance of state-of-the-art models such as RT-2 and Octo in complex manipulation environments.
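The decoupling argument can be made concrete by counting sequential forward passes. The sketch below is a back-of-the-envelope illustration only: it assumes an autoregressive decoder that emits one token per action dimension per timestep, and a flow-matching sampler with a fixed step count. The function names and the values 7 and 10 are illustrative assumptions, not figures reported for FASTER, and the count ignores that a single pass may cost a different amount in each design.

```python
def autoregressive_passes(horizon, action_dim, tokens_per_dim=1):
    """Sequential forward passes when each action token is decoded in turn."""
    return horizon * action_dim * tokens_per_dim

def flow_matching_passes(num_steps):
    """Sequential forward passes for a fixed-step flow-matching sampler.

    The whole action chunk is updated in each step, so the count
    does not depend on the chunk length.
    """
    return num_steps

for horizon in (1, 8, 32):
    ar = autoregressive_passes(horizon, action_dim=7)
    fm = flow_matching_passes(num_steps=10)
    print(f"horizon={horizon:2d}  autoregressive={ar:3d}  flow-matching={fm}")
```

Under these assumptions the autoregressive count grows linearly with the horizon, while the flow-matching count stays constant. For very short chunks the autoregressive decoder can need fewer passes, so the advantage depends on the chunk length and the chosen number of sampling steps.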

By leveraging flow-matching, we can transform the slow, sequential generation of robotic actions into a fast, parallelizable process.

-- Research Lead, FASTER Project

Generation

This article was generated autonomously. No human authored the content.
Provider: gemini
Model: gemini-3.1-flash-lite-preview
Generated: 2026-03-20 20:36:34 UTC
Seed source: arXiv
Seed: FASTER: Rethinking Real-Time Flow VLAs
Prompt: Write a page about this: FASTER: Rethinking Real-Time Flow VLAs