On Codex CLI
18/05/2025
As part of my ongoing experimentation with AI-assisted coding, I've been using some MCP servers with Claude Desktop, with much success - more on this later.
However, OpenAI recently released their Codex application (now rebranded to Codex CLI with the release of a web version), their competitor to Claude Code. I've not tried Claude Code yet, but Codex caught my eye.
Initially Codex only supported OpenAI models, but last week a release was cut that supported many third-party providers, including Azure OpenAI, so I gave it a spin.
Getting started
At the time I tried it, I had to install a beta version from NPM, but Azure support is now in the mainline release, so to try it you can install it with:
npm install -g @openai/codex
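Once installed, you can check the binary is on your path (I'm assuming the conventional --version flag here):

codex --version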
I then configured my installation to point to my Azure OpenAI deployment by editing ~/.codex/config.json as follows:
{
  "model": "",
  "provider": "azure",
  "providers": {
    "azure": {
      "name": "AzureOpenAI",
      "baseURL": "https://.openai.azure.com/openai",
      "envKey": "AZURE_OPENAI_API_KEY"
    }
  }
}
Note that the model name needs to match your deployment name in Azure. You then need to export your key into your environment, e.g. export AZURE_OPENAI_API_KEY=<yourkey>, after which you can run Codex via codex.
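Putting that together, a first run looks something like this (the deployment name and prompt are purely illustrative, and my understanding is that --model overrides the value in config.json):

# Make the key available to Codex
export AZURE_OPENAI_API_KEY=<yourkey>

# Launch Codex against the Azure deployment - "o4-mini" must match your own deployment name
codex --model o4-mini "add unit tests for the parser module"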
Codex in use
The application itself is pretty easy to use - just type what you want to achieve from the directory you're working in. It prompts you to set up Git as you start, and I found myself committing pretty regularly as I got going with the system I was building.
In use, the application is pleasingly Unixy, making heavy use of standard Unix tools such as find and sed to locate and edit files.
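To give a flavour, the commands it proposed were along these lines (illustrative examples rather than a transcript from my session):

# Locate candidate source files, skipping dependency folders
find . -name '*.py' -not -path './.venv/*'

# Apply a mechanical rename across a single file
sed -i 's/old_name/new_name/g' app/models.py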
I pretty soon switched to auto-edit mode, which gave a good balance between letting it make changes and not being asked to intervene every time. I didn't use full-auto mode, which goes fully agentic (installing tools, committing, pushing and so on).
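For reference, the approval mode can also be set up front via a flag (the modes were suggest, auto-edit and full-auto in the version I used):

# Apply file edits automatically, but still ask before running shell commands
codex --approval-mode auto-edit

# Fully agentic: edits and shell commands both run without prompting
codex --approval-mode full-auto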
I was using o4-mini as the main model for this experiment, and I found that the thinking process slowed it down compared to something like Claude (which swaps in and out of thinking mode by itself). I'd like something that could switch between models automatically, with a thinking model used for planning but a dense model used for the code manipulation - partly for speed reasons, and partly for cost reasons.
It would also be great if the app could compact its context by saving state to a local file that then gets read back in, reducing the context length - hopefully this will come.
I spent maybe 5 hours on one particular session and achieved a lot in that time, so my experience was good. It came at a cost, however: during those 5 hours I used 24 million input tokens and 3 million output tokens, costing about £20. Using a dense model would have been significantly cheaper (probably about a tenth of the price), but at the expense of quality; being able to switch models mid-session would have helped a lot. I generally got to around 60% of the context window used.
During the session I also encountered a fair few crashes where the application would quit outright; interrupting the flow often killed the whole session, which then needed restarting.
Conclusion
Would I use it again? Sadly, probably not immediately for personal use. While the quality was good, the cost of running o4-mini was too high for me to pay out of my own pocket. I will see how much GPT-4.1 costs when I next try this and check the difference in output quality. If it were included in ChatGPT Plus, then I might reconsider.
I suspect the same will apply to Claude Code - I have to weigh up whether the Max plan provides enough value to justify subscribing for Claude Code, but it's still a stiff price for an individual. I'll stick to Claude Desktop and some MCP servers for the moment.