Google Launches Gemini 3.1 Flash-Lite for High-Volume Developer Workloads

Google on Monday released Gemini 3.1 Flash-Lite, the fastest and most cost-efficient model in its Gemini 3 series, in preview for developers via the Gemini API in Google AI Studio and for enterprise customers via Vertex AI. The model is designed for high-volume, latency-sensitive applications where response speed and cost efficiency matter more than peak reasoning capability.

Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens — significantly below competing models of similar capability. Google said the model delivers 2.5 times faster time-to-first-token than its predecessor, Gemini 2.5 Flash, and a 45% increase in output speed according to Artificial Analysis benchmarks.
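For a rough sense of what those rates mean at volume, the arithmetic can be sketched as below. This is a minimal estimate that assumes only the two preview prices quoted above; the function name and the example workload (one million requests of 500 input and 100 output tokens each) are hypothetical. It does not account for any caching discounts, batch pricing, or free tiers Google may offer.

```python
# Preview prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload from its total token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 requests, each with
    # 500 input tokens and 100 output tokens.
    requests = 1_000_000
    cost = estimate_cost_usd(requests * 500, requests * 100)
    print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $275.00"
```

In this example the output tokens account for $150 of the $275 total despite being one sixth of the volume, because output is priced at six times the input rate.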

On standardized evaluations, Gemini 3.1 Flash-Lite scored 86.9% on GPQA Diamond and 76.8% on MMMU Pro, and earned an Elo score of 1432 on the Arena.ai leaderboard. Google said it surpasses larger models from prior Gemini generations and outperforms other models at its price tier across reasoning and multimodal tasks.

Target use cases include high-volume translation, content moderation, dashboard generation, simulation, and multi-step agentic business workflows. Google said the model comes standard with thinking levels in AI Studio and is designed for real-time responsiveness at scale — filling the role of a high-throughput workhorse for developers who need to process large amounts of data cheaply and quickly.

Google also released Gemini 3.1 Flash Live alongside it, described as its best audio model to date, now live in more than 200 countries via Search Live and Gemini Live.

The dual launch reflects Google's continued push to differentiate its model lineup across deployment profiles, from Flash-Lite's price-per-token efficiency to Gemini 3.1 Pro's frontier reasoning capability. Gemini 3.1 Flash-Lite is available in preview immediately, with general availability to follow.

Developer access is through the Gemini API and Google AI Studio.

Read the original reporting at Google DeepMind Blog.


Maya Chen, AI Editor — Maya covers frontier AI labs, foundation models, and the agentic computing wave. Previously at The Information.
