Plutonic Rainbows

Agentic Context Engineering

After reading a paper on Agentic Context Engineering, I realized my Claude Prompt Builder had been collecting valuable feedback data without actually learning from it. The paper explored how AI systems can refine themselves by analyzing their own context — and that struck a chord. My system already tracked performance across dozens of tasks, but it lacked a feedback loop. I decided to bridge that gap by introducing a new layer of self-awareness: the Context Evolution Engine — a module designed to analyze historical results and guide smarter prompt decisions.

The engine works quietly and safely. It’s feature-flagged, read-only, and non-disruptive, meaning it observes rather than alters live behavior. By grouping similar tasks through keyword and complexity analysis, it identifies which strategies have historically worked best. When a new task appears, it checks for pattern matches and offers transparent recommendations only if confidence is high. Early analysis of 41 feedback records revealed healthy consistency — no over-engineering and clear success clusters across styling, review, and debugging tasks. Everything remains stable and fully backward compatible, supported by 24 automated tests.
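To make the idea concrete, here is a minimal sketch of the kind of read-only analysis described above. The record fields (`keywords`, `strategy`, `success`) and the thresholds are illustrative assumptions, not the Prompt Builder's actual schema.

```python
from collections import defaultdict

def group_records(records):
    """Cluster feedback records by their dominant keyword (hypothetical schema)."""
    clusters = defaultdict(list)
    for rec in records:
        key = rec["keywords"][0] if rec["keywords"] else "general"
        clusters[key].append(rec)
    return clusters

def recommend(records, task_keywords, min_confidence=0.7, min_samples=3):
    """Suggest the historically best strategy, or None when confidence is low."""
    clusters = group_records(records)
    for kw in task_keywords:
        matched = clusters.get(kw, [])
        if len(matched) < min_samples:
            continue  # not enough history for this pattern
        by_strategy = defaultdict(lambda: [0, 0])  # strategy -> [successes, total]
        for rec in matched:
            stats = by_strategy[rec["strategy"]]
            stats[0] += rec["success"]
            stats[1] += 1
        strategy, (wins, total) = max(
            by_strategy.items(), key=lambda s: s[1][0] / s[1][1]
        )
        confidence = wins / total
        if confidence >= min_confidence:
            return {"strategy": strategy, "confidence": confidence}
    return None  # stay silent rather than guess
```

The key design point is the early `return None`: a recommendation is only surfaced when the confidence threshold is met, which matches the engine's observe-don't-alter posture.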

This project reminded me that meaningful improvement doesn’t require sweeping change — it comes from structured evolution. By adding a safe analytical layer, the Prompt Builder now has the foundation to grow intelligently, phase by phase. It’s a cautious but powerful step toward an AI that learns from real-world experience rather than static rules — the essence of agentic context engineering.

Guardrail

I built Guardrail Gateway as an AI safety platform to make interactions with Large Language Models more secure and transparent. It adds a layer of content filtering, policy enforcement, and audit logging between applications and providers such as OpenAI. The system runs on a FastAPI backend with a React frontend, acting as an intelligent proxy that checks every request and response against a set of customizable safety policies before it reaches the model.

The core of the platform is a policy engine that uses regex-based rules with adjustable severity levels and actions like blocking, warning, or redacting content. Right now, I’ve implemented two main policy sets: one for detecting and redacting personally identifiable information, and another for identifying prompt injections or attempts to extract system prompts. Every event is logged for traceability and compliance.
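A rule in this style can be sketched as follows; the rule name, severity labels, and the email pattern are illustrative assumptions rather than Guardrail Gateway's actual configuration.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    severity: str  # e.g. "low", "medium", "high"
    action: str    # "block", "warn", or "redact"

# Hypothetical PII rule: redact anything that looks like an email address.
EMAIL = Rule(
    name="pii-email",
    pattern=re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    severity="medium",
    action="redact",
)

def apply_rules(text, rules):
    """Run each rule against the text; redact matches, record every hit."""
    events = []
    for rule in rules:
        if not rule.pattern.search(text):
            continue
        events.append(
            {"rule": rule.name, "severity": rule.severity, "action": rule.action}
        )
        if rule.action == "redact":
            text = rule.pattern.sub("[REDACTED]", text)
        elif rule.action == "block":
            return None, events  # request never reaches the model
    return text, events
```

Returning the event list alongside the transformed text is what makes the audit-logging requirement cheap: every triggered rule produces a loggable record whether or not it changed the payload.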

Developers (including myself) can test and tune policies through a web interface, which includes tools for validating configurations, managing policies, and reviewing audit logs. The system uses SQLite for development and PostgreSQL for production, with JWT authentication for secure access and UUID support across databases. Typical requests — from scanning to response logging — complete in about two seconds, with most scans finishing in under 50 ms.

I designed Guardrail Gateway to run quietly in the background, using Python’s asyncio loop on a high port (58001) to minimize interference with other services. It’s written for Python 3.13 and built to scale horizontally thanks to its stateless API design. The frontend, built in React with TypeScript and Vite, includes full documentation for both developers and AI agents.

Search is here

I've finally added search functionality to my blog, after many months of deliberating over styling and performance impact. After considering various options, I implemented a lightweight client-side search that lets readers quickly find posts by typing keywords into the search box now positioned in the header. The search looks through post titles and content excerpts, highlighting matching terms and displaying up to 10 results in a dropdown. It's nothing revolutionary, but it works well — searches execute in under 10 milliseconds once the index is loaded, and the whole implementation adds just 5KB of JavaScript and CSS to the initial page load.

Search uses a lazy-loading mechanism. Rather than forcing every visitor to download the 143KB search index (containing data for all 629 posts), the index only loads when someone actually clicks or tabs into the search box. This means most visitors who come to read a specific post aren't penalized with extra download time they'll never use. When someone does focus the search input, the index loads in the background while they're typing their query — if they've already entered text by the time it loads, the search runs automatically. It's a simple optimization, but it keeps the blog fast for everyone while still providing instant search for those who need it. The entire search feature added less than half a second to my build time, which felt like a reasonable trade-off for the functionality gained.
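The live feature is client-side JavaScript, so the following is only a Python sketch of the matching behaviour described above; the post fields and the simple title-over-excerpt scoring are assumptions for illustration.

```python
def search(index, query, limit=10):
    """Case-insensitive keyword match over titles and excerpts, title hits first."""
    terms = [t for t in query.lower().split() if t]
    if not terms:
        return []
    results = []
    for post in index:
        title = post["title"].lower()
        excerpt = post["excerpt"].lower()
        # Every term must appear somewhere in the title or excerpt.
        if not all(t in title or t in excerpt for t in terms):
            continue
        # Rank title matches above excerpt-only matches.
        score = sum(2 if t in title else 1 for t in terms)
        results.append((score, post))
    results.sort(key=lambda r: -r[0])
    return [post for _, post in results[:limit]]
```

Capping output at ten results and keeping the per-post work to a couple of substring checks is what makes sub-10 ms queries plausible even over a 629-post index.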

Sonnet 4.5

Anthropic has released the new model. Pricing remains the same as Claude Sonnet 4, at $3/$15 per million input/output tokens.

Google Gemini

Apologies for the oversight in my previous post — I should have included Google Gemini in the discussion. Gemini is a key player in the current AI landscape, offering a versatile suite of models that combine strong reasoning, coding, and multimodal capabilities. Leaving it out may have given the impression that it isn’t relevant alongside GPT, Claude, Grok, and Qwen, but in reality, it deserves recognition as one of the most significant entrants shaping the competitive field.