Updates on LLM-assisted coding
31/03/2025
I thought I'd give a quick update on my LLM coding journey - I've spent a lot more time doing this over the last few weeks, and I've also spent some time integrating LLM-generated code with low-code tools.
Claude Sonnet 3.7
Most of my development in the last month has been with Claude assistance, the majority of it using the 3.7 model and getting to grips with how it differs from the previous version.
In short, it's been much more verbose and tends to do more than I actually want! I'm not sure what tuning they did between the two versions, but in my opinion it's actually less usable than the previous one. The issue I've noticed is that it tends to carry on generating things, extrapolating what I might want to do next, which isn't necessarily what I'd prefer. I've sometimes even found myself reverting to 3.5 (October).
To get around this issue, I've been varying my project instructions to ask it to do the following (a rough sketch of how these might look as a system prompt follows the list):
- Come up with a plan and run it past me before starting to implement anything
- Implement solutions in small independent steps
- Ask for feedback after completing each meaningful component, so I can check it against my requirements
- Implement only what is explicitly requested.
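To make that concrete, here's roughly how those instructions could be expressed if you were driving Claude through the API rather than the Projects UI - a minimal sketch, assuming the anthropic Python SDK; the model alias and the example user request are placeholders of mine rather than anything from my actual projects.

```python
# Minimal sketch: the same guidance expressed as a system prompt via the
# Anthropic Python SDK (assumes `pip install anthropic` and ANTHROPIC_API_KEY
# set in the environment; the model alias below is an assumption).
import anthropic

PROJECT_INSTRUCTIONS = """\
- Come up with a plan and run it past me before starting to implement anything.
- Implement solutions in small, independent steps.
- Ask for feedback after completing each meaningful component so I can check it
  against my requirements.
- Implement only what is explicitly requested.
"""

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # assumed alias for the 3.7 Sonnet model
    max_tokens=1024,
    system=PROJECT_INSTRUCTIONS,       # project instructions become the system prompt
    messages=[
        {"role": "user", "content": "Add a CSV export option to the reports page."}
    ],
)
print(response.content[0].text)
```

In the Claude app itself, the equivalent is simply pasting that prose into the project's custom instructions field.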
This prompting does seem to help, but the model still has a tendency to run away with itself; I hope they can pull this back. The thinking mode does seem to be helpful, and it's good that it doesn't require you to switch to a separate reasoning model. That said, the feature doesn't seem to produce a major improvement in quality when it's invoked (possibly because 3.5 is already good on its own).
I haven't tried Claude Code because I'm on the Pro subscription and don't want a separate charge at the moment.
Reasoning models
I've been spending a bit of time recently with the o1 and o3 models in ChatGPT too. My feedback on these is similar to my take on the Claude thinking mode above - I'm not sure the results are good enough to justify the extra time. I suspect my view may change with more usage!
GitHub Copilot
I've been playing around with GitHub Copilot since it was made free for light users in VS Code before Christmas. I do find it useful as a better autocomplete at times, but I've definitely not found the chat functionality as useful as something like Claude. At some point in the near future I'll try competitors like Cursor and Cline.
On 'Vibe Coding'
The phrase 'Vibe Coding' has been all over my Bluesky and LinkedIn feeds, and I'm not sure it's been a great coinage to describe the AI development phenomenon. It seems to be used for things that don't fall under Andrej Karpathy's original definition. I'm happy to let the LLM take over for things that are basically development spikes or proofs of concept, but that approach sidesteps the hard parts of creating well-factored, modular, tested code that can go seamlessly into production. "Vibe coding" seems to skip over this aspect - but, to me, the AI makes this part easier as well. If planning, specifying and testing are easier to do, it's possible to do more of these activities while still getting fast feedback. Perhaps someone with a better marketing brain than me needs to come up with a snappy alternative to "AI-assisted software engineering" or "AI-assisted architecture".
Simon Willison's article Not all AI-assisted programming is vibe coding (but vibe coding rocks) covers similar ground from a slightly different perspective. He recommends vibing away when the stakes are low (experiments, getting a feel for AI coding, or personal tools), but considering different modes for "serious programming". I agree with this, and I'd add architectural discussion to the list of things you should do first. There are some interesting posts on Reddit about getting the AI to write the README first before letting it start programming.
I'm looking forward to carrying on with this experiment over the next few weeks and delving further into the tooling, as well as the wild world of MCP servers and the like. It's an exciting time to be a nerd!