
OpenAI Launches GPT-5.4 Mini and Nano Models Optimized for Coding and Subagents

OpenAI has released GPT-5.4 mini and nano, new small models designed for efficiency and high-volume workloads. The update brings flagship capabilities to faster deployments, targeting coding assistants and subagents that require low latency. Industry analysts view this as a strategic shift toward compositional AI systems.



OpenAI announced the release of GPT-5.4 mini and GPT-5.4 nano today. These models aim to bring the capabilities of the flagship GPT-5.4 architecture to faster, more efficient deployments. The company states the update targets high-volume workloads where speed and cost efficiency are critical factors. Industry observers note this move aligns with a broader push for optimized model sizes across the sector.

GPT-5.4 mini represents a significant upgrade over the previous GPT-5 mini. It runs more than twice as fast while improving in coding, reasoning, and tool use. The model approaches the performance of the larger GPT-5.4 on evaluations such as SWE-Bench Pro. That balance gives developers a capable option for complex tasks without the overhead of the full model.

GPT-5.4 nano serves as the smallest and most cost-effective option in the lineup. OpenAI recommends this version for tasks such as classification, data extraction, and ranking where latency is paramount. It is designed to handle simpler supporting tasks within broader workflows. Pricing starts at $0.20 per one million input tokens for this tier.
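As a rough illustration of that kind of call, the sketch below routes a classification task through the Responses API in the current openai Python SDK; the identifier "gpt-5.4-nano" follows this article's naming and should be checked against OpenAI's published model list before use.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_ticket(text: str) -> str:
    """Assign a support ticket one of three labels using the nano tier."""
    response = client.responses.create(
        model="gpt-5.4-nano",  # assumed identifier, per this article's naming
        input=(
            "Classify this support ticket as 'billing', 'technical', or "
            f"'other'. Reply with the label only.\n\n{text}"
        ),
    )
    return response.output_text.strip()

print(classify_ticket("I was charged twice for my subscription last month."))
```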

These models address scenarios where latency directly shapes the product experience for end users. Coding assistants require responsiveness to feel natural during active development sessions. Subagents must complete supporting tasks quickly to maintain system flow without bottlenecks. Real-world latency varies based on tool call duration and input size.

Customer testing indicates strong performance in coding workflows that benefit from fast iteration. The models handle targeted edits, codebase navigation, and debugging loops with low latency. Benchmarks show GPT-5.4 mini consistently outperforms its predecessor at similar latency levels. Cost estimates are based on current API pricing at the time of writing.

The architecture supports systems that combine models of different sizes for optimal efficiency. In Codex, a larger model like GPT-5.4 can handle planning while delegating narrower subtasks to mini subagents. This pattern lets developers compose systems where larger models decide what to do and smaller models execute.
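A minimal sketch of that split, assuming the model identifiers "gpt-5.4" and "gpt-5.4-mini" used in this article (they are not confirmed API names), might look like the following; the prompts and helper functions are purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def plan_tasks(goal: str) -> list[str]:
    # The larger model breaks the goal into narrow, self-contained subtasks.
    response = client.responses.create(
        model="gpt-5.4",  # assumed identifier, per this article's naming
        input=f"Break this goal into short, independent subtasks, one per line:\n{goal}",
    )
    lines = response.output_text.splitlines()
    return [line.lstrip("-0123456789. ").strip() for line in lines if line.strip()]

def run_subtask(task: str) -> str:
    # Each narrow subtask is delegated to the faster, cheaper mini model.
    response = client.responses.create(
        model="gpt-5.4-mini",  # assumed identifier
        input=f"Complete this subtask and return only the result:\n{task}",
    )
    return response.output_text

goal = "Add input validation to the signup form and update its unit tests."
for task in plan_tasks(goal):
    print(task, "->", run_subtask(task)[:80])
```

The appeal of this shape is economic: the expensive model is invoked once for planning, while the cheap, low-latency model absorbs the per-task volume.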

GPT-5.4 mini also demonstrates strength on multimodal tasks related to computer use. It can quickly interpret screenshots of dense user interfaces to complete computer-use tasks. On OSWorld-Verified, the model substantially outperforms GPT-5 mini while approaching GPT-5.4 results. This capability enables real-time reasoning over images in production environments.
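For illustration, a screenshot-interpretation request could be sent through the Responses API's image-input format, as sketched below; the model identifier and image URL are placeholders, not confirmed values.

```python
from openai import OpenAI

client = OpenAI()

# Ask the mini model to read a dense UI screenshot (hypothetical URL).
response = client.responses.create(
    model="gpt-5.4-mini",  # assumed identifier, per this article's naming
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text",
                 "text": "List the clickable buttons visible in this screenshot."},
                {"type": "input_image",
                 "image_url": "https://example.com/dense-ui-screenshot.png"},
            ],
        }
    ],
)
print(response.output_text)
```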

Availability spans the API, Codex, and ChatGPT starting today. The API version supports text and image inputs with a 400,000-token context window. GPT-5.4 mini costs $0.75 per one million input tokens and $4.50 per one million output tokens. Free and Go ChatGPT users can access the model via the Thinking feature in the menu.
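Those rates make workload budgeting simple arithmetic; the request volumes in the sketch below are invented for illustration.

```python
# Per-million-token prices quoted above for GPT-5.4 mini.
MINI_INPUT_PER_M = 0.75   # dollars per 1M input tokens
MINI_OUTPUT_PER_M = 4.50  # dollars per 1M output tokens

def mini_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a GPT-5.4 mini workload."""
    return (input_tokens / 1_000_000) * MINI_INPUT_PER_M \
         + (output_tokens / 1_000_000) * MINI_OUTPUT_PER_M

# Hypothetical: 50,000 requests averaging 2,000 input / 300 output tokens.
total = mini_cost(50_000 * 2_000, 50_000 * 300)
print(f"Estimated cost: ${total:,.2f}")  # -> Estimated cost: $142.50
```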

GPT-5.4 nano is available only in the API and costs significantly less at high volume. In Codex, GPT-5.4 mini consumes only 30% of the GPT-5.4 quota on simpler coding tasks. For all other users, GPT-5.4 mini is available as a rate-limit fallback for GPT-5.4 Thinking. The company also released a system card addendum covering model safeguards for deployment.

Smaller models enable a shift toward compositional systems rather than relying on a single large model. Developers can scale execution across cheaper models for less reasoning-intensive work. The industry continues to prioritize efficiency as AI applications expand into real-time environments. Future updates may refine these latency and cost metrics further.
