Google Launches Gemini 3.1 Flash-Lite for High-Volume Developer Workloads
The fastest and cheapest model in Google's Gemini 3 series launches in preview at $0.25 per million input tokens, targeting cost-sensitive large-scale deployments.
Google on Monday released Gemini 3.1 Flash-Lite, the fastest and most cost-efficient model in its Gemini 3 series, in preview for developers via the Gemini API in Google AI Studio and for enterprise customers via Vertex AI. The model is designed for high-volume, latency-sensitive applications where response speed and cost efficiency matter more than peak reasoning capability.
Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens — significantly below competing models of similar capability. Google said the model delivers 2.5 times faster time-to-first-token than its predecessor, Gemini 2.5 Flash, and a 45% increase in output speed according to Artificial Analysis benchmarks.
On standardized evaluations, Gemini 3.1 Flash-Lite scored 86.9% on GPQA Diamond and 76.8% on MMMU Pro, and earned an Elo score of 1432 on the Arena.ai leaderboard. Google said it surpasses larger models from prior Gemini generations and outperforms other models at its price tier across reasoning and multimodal tasks.
Target use cases include high-volume translation, content moderation, dashboard generation, simulation, and multi-step agentic business workflows. Google said the model comes standard with thinking levels in AI Studio and is designed for real-time responsiveness at scale — filling the role of a high-throughput workhorse for developers who need to process large amounts of data cheaply and quickly.
Google also released Gemini 3.1 Flash Live alongside it, described as its best audio model to date, now live in more than 200 countries via Search Live and Gemini Live.
The dual launch represented Google's continued push to differentiate its model lineup across deployment profiles — from Flash-Lite's price-per-token efficiency to Gemini 3.1 Pro's frontier reasoning capability. The Gemini 3.1 Flash-Lite was made available in preview immediately, with full general availability to follow.
Developer access is through the Gemini API and Google AI Studio.
Read the original reporting at Google DeepMind Blog.