Plutonic Rainbows

Four AI Heavyweights Shaping the Future

Artificial intelligence has become more than a buzzword; it is fast becoming a daily partner in how we work, create, and even play. From writing code to generating ideas and powering conversations, today’s leading models each bring their own personality and strengths. Let’s take a quick look at four of the most exciting names in the space right now: GPT-Codex, Claude, Grok, and Qwen.

GPT-Codex is the coder’s dream assistant. Developed by OpenAI, it bridges natural language and programming, making it possible to describe your goals in plain English and have them turned into functional code. Whether you’re debugging, migrating projects, or building prototypes, Codex feels like an extra teammate who never gets tired of problem-solving.

Claude, from Anthropic, stands out for its thoughtful and safe design. Instead of just pushing raw power, it focuses on clarity, alignment, and long-form reasoning. This makes it an excellent choice for complex projects, structured workflows, and conversations where nuance matters. With Claude Code, developers in particular are finding new ways to work faster while staying organized.

Grok and Qwen represent the new wave of AI challengers. Grok, from xAI, has built its identity around speed, wit, and humor, making interactions more engaging without sacrificing intelligence. Qwen, from Alibaba Cloud, is all about versatility, offering a wide range of model sizes that excel at multilingual tasks, coding, and even image editing. Both are proof that the AI landscape is getting broader and more dynamic every day.

As these models continue to evolve, the takeaway is clear: there’s no single best AI, only the best fit for your goals. Codex shines in coding, Claude thrives in thoughtful reasoning, Grok brings personality to problem-solving, and Qwen pushes the boundaries of scale and adaptability. Together, they highlight an exciting future where we can choose from a diverse toolkit of digital partners — each designed to help us think, create, and build in new ways.

Qwen3-Max

I had a proper look at this today, and I came away really impressed. The whole suite feels fast and intuitive — snappy edits, clean results, and a layout that doesn’t slow you down. What really grabbed my attention, though, was the colorisation ability. It’s not just a gimmick — it handles subtle tones with surprising accuracy, breathing life into black-and-white images without that washed-out, artificial look you sometimes get elsewhere.

Put side by side with Gemini 2.5 Flash Image (Nano Banana), it’s easily on the same level, and in some respects — especially speed, ease of use, and the natural quality of its colorisation — it might even be ahead. It feels less like an alternative and more like a genuine leap forward in what image editing can offer.

GPT-5-Codex

This model was released exclusively to OpenAI Plus and Pro accounts for a few days. It is now also available through the API.

A Flux.1 [Dev] image of Raquel Gibson, 2005.

OpenAI Codex

I have switched over to Codex — it’s much cheaper, and for now it seems far more reliable. I’m not running into the problems that have plagued Claude Code over the past month.

I have managed to get GitHub integration working, with Codex loading the appropriate model and permissions. I will probably use Gemini CLI for planning and stick with Codex for a few weeks.

Claude Code Fixed

This is what developers are essentially being told right now. After nearly a month of frankly appalling performance, Anthropic claims to have identified and resolved the issues. Yet the wording of their statement is so vague and non-specific that it offers little reassurance. It doesn’t explain what went wrong, what was actually fixed, or how developers can expect things to improve going forward. Instead, it leaves us with a cloud of ambiguity — an opaque message that feels more like damage control than genuine clarity.

Gail Elliott

Flux.1 [Dev]

Guardrails

I implemented a balanced guardrail system for the Claude Prompt Builder's adaptive complexity engine to address the verbosity concerns while keeping essential safety checks. The changes modify adaptive_prompt_builder.py so that guardrails scale appropriately: simple tasks (≤800 characters) now receive concise core integrity principles and minimal quality assurance focused on testing; medium tasks (800-2,000 characters) get balanced guidance with a condensed runtime priority wrapper and standard QA, including lint and typecheck requirements; and complex tasks (2,000+ characters) retain comprehensive orchestration with full guardrails. The key improvements were three tiers of core integrity (minimal, balanced, full), scaled quality assurance sections (minimal, standard, comprehensive), a concise runtime wrapper for medium complexity, and verbosity targets adjusted to realistic levels.
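
To make the tiering concrete, here is a minimal sketch of how length-based guardrail selection could work. The names (GUARDRAIL_TIERS, select_guardrails) and the tier text are illustrative assumptions, not the actual contents of adaptive_prompt_builder.py.

```python
# Hypothetical sketch of length-based guardrail tiering; the real
# internals of adaptive_prompt_builder.py may differ.

GUARDRAIL_TIERS = {
    "simple": {   # <= 800 characters
        "integrity": "Be accurate and honest; test what you change.",
        "qa": "Run the relevant tests before finishing.",
    },
    "medium": {   # 800-2,000 characters
        "integrity": "Balanced integrity principles with runtime priorities.",
        "qa": "Run tests plus lint and typecheck before finishing.",
    },
    "complex": {  # > 2,000 characters
        "integrity": "Full integrity principles and orchestration guidance.",
        "qa": "Comprehensive QA: tests, lint, typecheck, integration review.",
    },
}

def select_guardrails(task: str) -> dict:
    """Pick a guardrail tier from the request's character count."""
    n = len(task)
    if n <= 800:
        return GUARDRAIL_TIERS["simple"]
    if n <= 2000:
        return GUARDRAIL_TIERS["medium"]
    return GUARDRAIL_TIERS["complex"]
```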

The system now ensures that even a simple "fix this typo" request includes essential testing reminders without overwhelming users with unnecessary orchestration detail, while complex multi-domain tasks still receive the comprehensive guidance they require. Testing confirmed that prompts for simple tasks shrank from 2,000 to 700 characters while preserving critical safety checks, achieving appropriate scaling without compromising quality control standards.

Adaptive System

I implemented an adaptive complexity system for the Claude Prompt Builder that addresses a critical issue: specialist agents weren't being called for tasks that needed them. The system automatically analyzes user input to classify tasks as simple, medium, or complex, then generates appropriately scaled prompts, from concise 400-character versions for basic requests to comprehensive 2,500+ character structures for complex system design tasks. The core fix was loosening the restrictive agent delegation logic that had been preventing domain experts like security-engineer, python-engineer, and qa-engineer from being recommended when needed.
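
As a rough sketch of the classification and delegation idea, assuming simple keyword triggers and length thresholds (the real analysis in adaptive_prompt_builder.py is presumably richer):

```python
# Illustrative sketch only: the real thresholds, keywords, and trigger
# logic may differ from what is shown here.

AGENT_TRIGGERS = {
    "security-engineer": ("auth", "token", "vulnerability", "encrypt"),
    "python-engineer": ("python", "flask", "refactor", "api"),
    "qa-engineer": ("test", "coverage", "regression", "lint"),
}

def classify(task: str) -> str:
    """Crude length-based proxy for task complexity."""
    if len(task) <= 800:
        return "simple"
    if len(task) <= 2000:
        return "medium"
    return "complex"

def recommend_agents(task: str) -> list[str]:
    """Suggest specialists whose trigger keywords appear in the request,
    rather than gating them behind overly restrictive rules."""
    lowered = task.lower()
    return [agent for agent, words in AGENT_TRIGGERS.items()
            if any(w in lowered for w in words)]
```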

The implementation required building several new components including the file adaptive_prompt_builder.py (700+ lines), comprehensive configuration management, new API endpoints, and extensive testing frameworks. I maintained full backward compatibility while adding intelligent features like contextual agent triggers, fallback mechanisms, and configurable complexity thresholds. The system now successfully recommends 2+ relevant agents for medium complexity tasks and 5+ specialists with full orchestration for complex projects. Testing showed 100% accuracy in complexity detection and proper agent coordination across all scenarios, restoring the application's effectiveness in guiding users toward appropriate specialist assistance.

Prompt Builder

I'll be honest: I didn't set out to build a prompt engineering tool. Like many developers, I was spending way too much time crafting the perfect prompt for Claude, only to get responses that missed the mark. I'd write something vague like "fix this bug" and wonder why the AI couldn't read my mind. After watching myself and countless other developers struggle with this same frustration, I realized we needed a bridge between human intent and AI understanding. That's how the Prompt Builder was born: not from grand ambition, but from a simple desire to stop wasting time on prompt trial-and-error. I wanted to transform casual requests into structured, effective prompts that actually got the results we needed.

The architecture I settled on feels almost embarrassingly simple now, but it took several iterations to get right. At its core, the system implements Anthropic's six official prompt engineering techniques, wrapped in a Flask application that processes natural language through multiple enhancement layers. I built an Enhancement Intelligence system that prevents over-engineering simple requests — because nobody needs a 500-word prompt to change a font size. The breakthrough came when I introduced XML-style tag structure in v3.8.0, which creates clear instruction boundaries that dramatically improve how Claude parses complex prompts. I also integrated optional GPT-4o-mini enhancement as a pre-processing layer, essentially using one AI to help communicate better with another AI. The whole thing is held together with dependency injection, regex caching for performance, and a subagent orchestration system that automatically delegates specialized tasks to appropriate AI agents.
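
Here is a minimal sketch of the XML-style tag idea; the tag names below are my own illustration rather than the builder's actual schema.

```python
# Illustrative sketch of XML-style instruction boundaries; these tag
# names are assumptions, not the Prompt Builder's actual schema.

def wrap_prompt(task: str, context: str, constraints: list[str]) -> str:
    """Wrap each section in explicit tags so the boundaries between
    task, context, and constraints stay unambiguous to the model."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"<task>\n{task}\n</task>\n"
        f"<context>\n{context}\n</context>\n"
        f"<constraints>\n{constraint_lines}\n</constraints>"
    )

if __name__ == "__main__":
    print(wrap_prompt(
        "Fix the off-by-one error in the pagination helper.",
        "Flask app, Python 3.12, pytest suite in tests/.",
        ["Keep the public API unchanged.", "Add a regression test."],
    ))
```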

Building this tool taught me something unexpected about human-AI interaction: the gap isn't technical, it's communicative. I discovered that most bad AI responses aren't failures of the model, but failures in how we frame our requests. The biggest revelation was realizing that prompt engineering isn't just about getting better outputs — it's about forcing ourselves to think more clearly about what we actually want. When I watch developers use the prompt builder now, they often say the transformed prompt helped them understand their own requirements better. I'm particularly proud that the system has evolved from a simple text transformer into something that embeds Core Integrity Principles — accuracy, professional honesty, and thorough testing — into every generated prompt. It's a small way to make AI interactions more reliable and trustworthy. Honestly, I never expected a side project about prompts to teach me so much about clear communication and systematic thinking.

Reading

This week I am reading Superintelligence: Paths, Dangers, Strategies by Nick Bostrom.