WatersWorks

By John K. Waters

Google's Gemini 3 Brings Agentic Coding to the Developer Stack

Google this week rolled out Gemini 3, the latest version of its AI model family, with features aimed squarely at developers. The update focuses on more accurate reasoning, deeper tool use, and a new framework for building software with agentic AI.

"It's the best model in the world for multimodal understanding and our most powerful agentic and vibe coding model yet," wrote DeepMind CEO Demis Hassabis and CTO Koray Kavukcuoglu in a blog post.

The standout feature of this release is Google Antigravity, a new development environment where AI agents can write, test, and manage code across a full stack—editor, terminal, and browser included. Gemini 3 operates here not as a suggestion engine, but as a task executor.

Antigravity joins Google's existing developer tools, such as AI Studio and Vertex AI, and Gemini 3 is also available through third-party developer platforms such as GitHub, JetBrains, and Replit.

In a demo, Google showed an AI agent generating an entire flight-tracking application: planning the structure, writing the code, validating the output, and generating documentation.
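For readers who want a feel for what that plan, write, validate, document sequence looks like as control flow, here is a minimal sketch in Python. It is not Antigravity's implementation or API. The function `run_task` and its four callables are hypothetical names invented for illustration, and in a real agent each callable would presumably wrap a model call or a tool invocation rather than the toy lambdas used here.

```python
from typing import Callable


def run_task(
    goal: str,
    plan: Callable[[str], list[str]],
    write: Callable[[str], str],
    validate: Callable[[str], bool],
    document: Callable[[dict[str, str]], str],
    max_attempts: int = 3,
) -> dict:
    """Hypothetical agent loop: plan the steps, write each one,
    retry until it validates, then document the accepted results."""
    artifacts: dict[str, str] = {}
    for step in plan(goal):
        for _ in range(max_attempts):
            code = write(step)
            if validate(code):
                artifacts[step] = code  # keep only output that passed validation
                break
        else:
            # The inner loop never hit `break`, so every attempt failed.
            raise RuntimeError(
                f"step {step!r} failed validation after {max_attempts} attempts"
            )
    return {"artifacts": artifacts, "docs": document(artifacts)}


# Toy stand-ins for the model-backed stages (assumptions, not real tools).
result = run_task(
    "flight tracker",
    plan=lambda goal: [f"{goal}: data model", f"{goal}: UI"],
    write=lambda step: f"# code for {step}",
    validate=lambda code: code.startswith("#"),
    document=lambda arts: "\n".join(f"- {name}" for name in arts),
)
print(result["docs"])
```

The point of the sketch is the shape of the loop: validation gates each step, so a failed check triggers another attempt instead of passing bad output down the line.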

Gemini 3 improves on its predecessor, Gemini 2.5 Pro, in several standard developer benchmarks:

  • SWE-bench Verified: 76.2%
  • Terminal-Bench 2.0: 54.2%
  • WebDev Arena: 1487 Elo

These tests assess how well the model can resolve real GitHub issues, execute tasks via command-line interfaces, and build front-end components, respectively.
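The first two figures are pass rates, but an Elo number is relative: it only means something when compared with another rating. As a rough guide, the classic Elo formula converts a rating gap into an expected head-to-head win rate. The sketch below assumes that textbook formula with its usual 400-point scale. WebDev Arena's actual rating method may differ, and the rival rating of 1400 is a made-up value for illustration, not a published score.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under the classic Elo formula
    (a win counts as 1, a tie as 0.5, a loss as 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


gemini_3 = 1487            # WebDev Arena rating reported for Gemini 3
hypothetical_rival = 1400  # assumed rating, for illustration only

p = elo_expected_score(gemini_3, hypothetical_rival)
print(f"Expected head-to-head score: {p:.1%}")
```

Under those assumptions, an 87-point lead works out to the higher-rated model being preferred in roughly 62% of pairwise comparisons.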

In a prelaunch briefing, Kavukcuoglu said that "LLMs have fundamentally changed how we build software," noting Gemini's ability to generalize from sparse prompts and hold context across more complex tasks.

Beyond conventional code generation, Gemini 3 can produce UI components, 3D voxel art, and interactive visualizations from text or hybrid prompts. The model supports "dynamic views," allowing developers to generate data-driven interfaces without specifying layout code.

Josh Woodward, who leads the Gemini app, said the system can now build a complete generative interface from a single query, something that may interest front-end engineers working with low-code pipelines or internal tools.

Gemini 3 is available across several Google tools:

  • AI Studio for fast prototyping
  • Vertex AI for production use
  • Gemini CLI for local workflows
  • Google Antigravity for agent-managed tasks

The model is also integrated into tools like Manus, Cursor, and Replit, making it accessible to independent devs and teams outside Google's ecosystem.

Gemini 3 also expands Google's foray into agentic automation. On the Vending-Bench 2 benchmark—a long-horizon simulation—the model maintained consistent tool usage over a simulated year, optimizing business outcomes without falling off track.

In consumer-facing products, these capabilities appear in Gemini Agent, which can help manage Gmail, plan travel, and perform multi-step workflows across Google services. For now, access is limited to subscribers of the company's premium AI tiers.

Gemini 3 has undergone what Google says is its most extensive safety testing to date, with improved resistance to prompt injection and better behavior when executing tool-based tasks. Third-party reviews were conducted by Vaultis, Dreadnode, and others. A public model card is available with details on usage boundaries and testing methodology.

Google plans to release additional versions of Gemini 3, including Gemini 3 Deep Think, which targets more complex reasoning benchmarks like ARC-AGI-2 and Humanity's Last Exam. That version is still in testing and not yet available to the public.

"We're introducing Gemini 3, our most intelligent model," said Google CEO Sundar Pichai in a launch statement, "that combines all of Gemini's capabilities together so you can bring any idea to life."

Posted by John K. Waters on November 18, 2025