Updates on LLM-assisted coding
31/03/2025
I thought I'd give a quick update on my LLM coding journey. I've spent a lot more time doing this over the last few weeks, and I've also spent some time integrating LLM-generated code with low-code tools.
Claude Sonnet 3.7
Most of my development in the last month has been with Claude's assistance, the majority using the 3.7 model and getting to grips with how it differs from the previous version.
In short, it's much more verbose and tends to do more than I actually want! I'm not sure what tuning they did between the two versions, but in my opinion it's actually less usable than its predecessor. The issue I've noticed is that it tends to carry on generating, extrapolating what I might want to do next, which isn't necessarily what I'd prefer. I've sometimes even found myself reverting to 3.5 (October).
To get around this issue I've been varying my project instructions to ask it to:
- Come up with a plan and run it past me before starting to implement anything
- Implement solutions in small independent steps
- Ask for feedback after completing each meaningful component to check it against my requirements
- Implement only what is explicitly requested.
This prompting does seem to help, but it still has a tendency to run away with itself - I hope they can pull this back. The thinking mode does seem helpful, and it's good that it doesn't require you to switch to a separate reasoning model. That said, the feature doesn't seem to produce a major improvement in quality when it's invoked (possibly because the model is already good on its own).
I haven't tried Claude Code because I subscribe to Pro and don't want a separate charge at the moment.
Reasoning models
I've also been spending a bit of time recently with the o1 and o3 models in ChatGPT. My feedback on these is similar to that on the Claude thinking approach above: I'm not sure the results are good enough to justify the extra time. I suspect this is something that may need more usage first!
GitHub Copilot
I've been playing around with GitHub Copilot since it was made free for light users in VS Code before Christmas. I do find it useful as a better autocomplete at times, but I've definitely not found the chat functionality as useful as something like Claude. At some point in the near future I'll try competitors like Cursor and Cline.
On 'Vibe Coding'
The phrase 'Vibe Coding' has been all over my Bluesky and LinkedIn feeds, and I'm not sure it's been a great coinage to describe the AI development phenomenon. It seems to be used for things that don't fall under Andrej Karpathy's original definition. I'm happy with the thought of letting the LLM take over for things that are basically development spikes or proofs of concept, but that approach avoids the hard parts of creating well-factored, modular, tested code that can go seamlessly into production. "Vibe coding" seems to skip over this aspect - but, to me, the AI makes this part easier as well. If planning, specifying and testing are easier to do, it's possible to do more of these activities while still getting fast feedback. Perhaps someone with a better marketing brain than me needs to come up with a snappy alternative to "AI-assisted software engineering" or "AI-assisted architecture".
Simon Willison's article Not all AI-assisted programming is vibe coding (but vibe coding rocks) covers similar ground from a slightly different perspective. He recommends vibing away when the stakes are low (experiments, getting the feel of AI coding, or personal tools), but considering different modes for "serious programming". I agree with this, and I'd add architectural discussion to the things you should do first. There are some interesting posts on Reddit about getting the AI to write the README first before letting it start programming.
I'm looking forward to carrying this experiment on over the next few weeks, delving more into the tooling and the wild world of MCP servers. It's an exciting time to be a nerd!
An experiment with coding with Claude
27/02/2025
Most of my working week has been spent talking to customers about generative AI - and I've got lots I'd like to say on that in the future.
However, a lot of the work we're doing in that space in the day job relies on low-code/no-code tooling because, in the main, our clients don't have in-house developers. There, AI is proving a godsend, because the development is largely writing prose and then creating tools, either using low-code workflows (such as Power Automate) or bridging into custom APIs. I've found that although the AI part is good, clients still struggle with the logic required for data transformation and integration, and with the practical side of dealing with other systems - handling failure, retries and that sort of thing, i.e. the parts that most benefit from technical experience.
I'm interested in how far AI can take you in improving productivity, so as part of my personal spare-time coding I thought I'd run an experiment to see how far I could get with a small project. Claude was my assistant throughout.
Note: this experiment was done before the release of Sonnet 3.7 - I'll update once I've really had a chance to kick the tyres!
The idea
The idea was to generate a good, readable text version of an online video - with a twist of my own. This was something I'd wanted to do for a while to help me catch up with conferences, but also in general, as so much good content is now available in video-only format. I'm sure there are services that already do something similar, but I had a few specific requirements of my own. These were:
- I wanted to make sure I could run it locally and consume the content offline (sadly, although 5G is coming to the Tube, it's not yet everywhere)
- I wanted the level of detail in the transcript retained, but the writing legible and grammatically correct - so I wasn't interested in just getting a raw transcript
- I wanted any significant visuals captured, so I could see diagrams, slides etc. in context
- I wanted to be able to jump to the right point in the video if I needed to confirm what was said
This felt small enough that I could knock it up relatively easily, but specific enough that I wouldn't find something identical already out there (although I'd be interested to hear of any!)
The experiment
I set myself the following boundaries:
- As much as possible had to be done by the AI, not me. Using me for feedback and error checking was fine, but I didn't want to write much code myself.
- I wanted the AI to guide me as much as possible, so I tried not to suggest solutions but asked it for possible approaches before getting it to write some code.
- Claude would be the main AI used, so I wanted to use as many of its facilities as possible (e.g. Projects)
- I wanted it in Python for a couple of reasons: as a personal project I was happy to forge my own path, and I knew I'd probably be using AI libraries, so I went where the ecosystem was strongest.
What I did
I set up a project in Claude (my current coding AI of choice) and instructed it to create a Python application using local libraries rather than remote APIs. I then, over the course of a couple of commutes to work, instructed it to create a few different scripts to:
- Download the video
- Look for parts of the video where the screen didn't change significantly for a defined length of time (it used OpenCV for this - see the sketch after this list)
- Transcribe the video (it chose Faster Whisper - there's a sketch of that step below too)
- Use an LLM to clean up the transcript (in the end we got to llama.cpp)
- Generate a webpage (I had to prompt it here to use Jinja templates rather than other approaches)
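To give a flavour of the detection step, here's a minimal sketch of the kind of OpenCV frame-differencing involved - the file name, threshold and minimum duration here are illustrative assumptions on my part, not necessarily what Claude produced:

```python
# Sketch: find spans where consecutive frames barely change.
# Assumes the video has already been downloaded to video.mp4.
import cv2

def find_static_segments(path, diff_threshold=2.0, min_seconds=5.0):
    """Yield (start, end) times, in seconds, of visually static spans."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    prev_gray, run_start, frame_idx = None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            # Mean absolute pixel difference as a cheap change score
            score = cv2.absdiff(gray, prev_gray).mean()
            if score < diff_threshold:
                if run_start is None:
                    run_start = frame_idx / fps
            elif run_start is not None:
                if frame_idx / fps - run_start >= min_seconds:
                    yield (run_start, frame_idx / fps)
                run_start = None
        prev_gray = gray
        frame_idx += 1
    if run_start is not None and frame_idx / fps - run_start >= min_seconds:
        yield (run_start, frame_idx / fps)  # run lasting to the end
    cap.release()

if __name__ == "__main__":
    for start, end in find_static_segments("video.mp4"):
        print(f"Static from {start:.1f}s to {end:.1f}s")
```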
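The transcription step with Faster Whisper boils down to something like the following - the model size, device and audio file name are my own illustrative choices:

```python
# Sketch: transcribe audio with faster-whisper, keeping timestamps
# so the generated page can link back into the video.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.mp3")

for seg in segments:
    print(f"[{seg.start:.1f}s -> {seg.end:.1f}s] {seg.text.strip()}")
```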
Once I'd got something working, I asked it to write a driver script that imported the other scripts and called them in order. At each stage of the process I was adding the files to the project knowledge and, after a while, regularly pruning old ones.
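Conceptually the driver looked something like this - note that the module and function names below are hypothetical stand-ins, not the actual interfaces Claude produced:

```python
# Sketch of the driver. The imported modules and their run()
# functions are hypothetical names for the scripts described above.
import sys

import download_video
import detect_static_frames
import transcribe
import clean_transcript
import generate_page

def main(url: str) -> None:
    video_path = download_video.run(url)           # fetch the video
    stills = detect_static_frames.run(video_path)  # OpenCV step
    raw = transcribe.run(video_path)               # Faster Whisper step
    cleaned = clean_transcript.run(raw)            # llama.cpp cleanup
    generate_page.run(cleaned, stills, video_path) # Jinja output

if __name__ == "__main__":
    main(sys.argv[1])
```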
Outcome
It worked (eventually)! See the screenshot below:
Issues
Unfortunately, it wasn't all plain sailing. I encountered several issues, including:
- The LLM really had to be prompted hard to get off its default path - for example, asking it to change the method of transcription to another library (but keeping the same interface) failed more often than not. Sometimes a chat outside the project got better results.
- When it was making significant changes, fixes that were already part of the files in the project knowledge would get lost. For example, changing the data to be rendered would revert the page to the original styling it had decided on, rather than using the styling already present in the project file. This was quite frustrating - and though Claude is very pleasant, I did find myself getting a bit shirty at times! "You've lost the fixes we did to the styling, can you restore them please??"
- When it was revising files, it would occasionally start duplicating content and get very confused. Starting a new chat generally fixed this.
- Imports were often a problem - it would try to use modules that it hadn't imported. As Claude doesn't yet have a Python interpreter built in, it wouldn't catch this; it probably wouldn't have been a problem with ChatGPT, for example.
- Style was all over the place - sometimes things were done as classes, sometimes as dataclasses, sometimes as standalone functions. Adding style guidance to the project instructions at the start would most probably have helped - something to try on the next one.
- It was very bad at async - trying to get it to parallelise work often led to poorer performance (which I normally tracked down to bad use of the library or not chunking text correctly). I ended up reverting most of the scripts to run sequentially. I wasn't altogether surprised by this, though - human programmers find concurrency difficult too! I'd also seen the same issues in other work I'd been doing.
I'd say the above would likely apply to most AI assistants. There were also some things that were Claude-specific:
- The context length was an issue - starting new chats very frequently was necessary, and files had to be kept small to avoid running into output-length issues. No bad thing, perhaps: it led to more modular code.
- If I wasn't disciplined I'd run into Claude's rate limits, which meant either using the poorer Haiku model or stopping work.
- The times I used Haiku, code-generation quality went down, and errors and code that wouldn't run increased. Sonnet is so much better than Haiku that I was surprised, given the results I've had with smaller Llama models.
Conclusion
All in all, I'd class this experiment as a success. At the end of it I had a working piece of software, and I suspect I would have run out of time and capacity to finish the project without the AI's help. Being able to noodle away at it on my phone was surprisingly helpful for getting little bits drafted and then integrating them when back on a more powerful machine. My next steps would be to get Claude to refactor the code a bit and make it more consistent, and then likely extend it into a full web application. As Sonnet 3.7 and Claude Code were released recently, I'm going to see how they perform, and possibly compare the experience with using GitHub Copilot as well.