• antihumanitarian@lemmy.world · 1 day ago

    So this article is basically a puff piece for CodeRabbit, a company that sells AI code review tooling/services. They studied 470 merge/pull requests: 320 AI-generated and 150 human-written controls. They don’t specify which projects, which models, or when, at least not without signing up to get their full “white paper”. For all that’s said, this could be GPT-4 from 2024.

    I’m a professional developer, and currently, by volume, I’m confident the latest models (Claude 4.5 Opus, GPT 5.2, Gemini 3 Pro) are able to write better, cleaner code than me. They still need high-level and architectural guidance, and sometimes overt intervention, but on average they can do it better, faster, and cheaper than me.

    A lot of articles and forum posts like this feel like cope. I’m not happy about it, but pretending it’s not happening isn’t gonna keep me employed.

    Source of the article: https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report

    • hark@lemmy.world · 6 hours ago

      I’m a professional developer, and currently, by volume, I’m confident the latest models (Claude 4.5 Opus, GPT 5.2, Gemini 3 Pro) are able to write better, cleaner code than me.

      I have also used the latest models and found that I’ve had to make extensive changes to clean up the mess they produce; even when the code functions correctly, it’s often inefficient, poorly laid out, and inconsistent and sloppy in style. Am I just bad at prompting, or is your code just that terrible?

      • antihumanitarian@lemmy.world · 5 hours ago

        The vast majority of my experience has been Claude Code with Sonnet 4.5, now Opus 4.5. I usually go in with detailed design documents, have it follow TDD, and use very brownfield designs and/or off-the-shelf components. Some of them I call glue apps, since they mostly just connect very well-covered patterns. Giving the models access to search engines, webpage-to-markdown conversion, and in general the ability to do everything within their Docker sandbox is also critical, especially with newer libraries.
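
        For example, one of these glue apps might look roughly like this. A minimal sketch only: the URL and output path are made up, and it assumes the requests and html2text packages rather than anything from my actual projects.

        ```python
        # Sketch of a "glue app": fetch a page, convert it to Markdown, save it.
        # Everything here is well-covered, off-the-shelf territory.
        import pathlib

        import html2text
        import requests


        def page_to_markdown(url: str, out_path: str) -> pathlib.Path:
            """Download `url`, convert the HTML to Markdown, and write it to `out_path`."""
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()

            converter = html2text.HTML2Text()
            converter.ignore_images = True  # keep the output focused on the text
            markdown = converter.handle(resp.text)

            out = pathlib.Path(out_path)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(markdown, encoding="utf-8")
            return out


        if __name__ == "__main__":
            # Hypothetical usage: give the agent a local Markdown copy of a docs page
            # so newer library APIs are actually in its context.
            page_to_markdown("https://example.com/docs/quickstart", "docs/quickstart.md")
        ```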

        So on further reflection, I’ve tuned the process to avoid what they’re bad at and lean into what they’re good at.

    • iglou@programming.dev · 23 hours ago

      I am a professional software engineer, and my experience is the complete opposite. It does it faster and cheaper, yes, but also noticeably worse, and having to proofread the output, fix it, and refactor ends up taking more time than writing it myself would have.

      • antihumanitarian@lemmy.world · 5 hours ago

        A later commenter mentioned an AI version of TDD, and I lean heavily into that. I structure the process so that it’s explicit which observable outcomes need to work before it returns, and it has to actually run tests to validate that they work. Because otherwise, yeah, I’ve had them fail so hard they report total success when the program can’t even compile.
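
        Concretely, “explicit observable outcomes” just means the tests exist before the implementation does. A rough sketch of what that looks like; the module and function names here are hypothetical, not from a real project:

        ```python
        # tests/test_slugify.py -- written before the implementation exists.
        # These cases are the definition of done: the agent only claims success
        # once `pytest` exits 0, which also catches "can't even compile" failures
        # because this import fails first.
        from myproject.text import slugify  # hypothetical module under test


        def test_lowercases_and_hyphenates():
            assert slugify("Hello World") == "hello-world"


        def test_strips_punctuation():
            assert slugify("Rust: 2024 edition!") == "rust-2024-edition"


        def test_collapses_whitespace():
            assert slugify("  too   many  spaces ") == "too-many-spaces"
        ```

        The gate itself is nothing fancy: run pytest inside the sandbox and treat anything other than a zero exit code as “not done”, no matter what the model claims.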

        The setup that’s helped with a lot of the shortcomings: thorough design, development, and technical docs; Claude Code with Claude 4.5 Sonnet, then Opus; and search and other web tools. Brownfield designs and off-the-shelf components help a lot, keeping in mind that quality depends on tasks being in distribution.

      • GenosseFlosse@feddit.org · 22 hours ago

        In web development it’s impossible to remember all the functions, parameters, syntax, and quirks for PHP, HTML, JavaScript, jQuery, Vue.js, CSS, and whatever other code exists in this legacy project. AI really helps when you can divide your tasks into smaller steps and functions, describe exactly what you need, and have a rough idea of how the resulting code should work. If something looks funky, I can ask it to explain or to do the same thing another way.
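
        For instance, a small, fully specified step might be a single helper with its inputs, outputs, and edge cases spelled out. Python here purely for illustration (the actual project is PHP/JS), and the function and its rules are made up:

        ```python
        # Illustrative only: a narrowly scoped, fully specified helper.
        # The docstring is essentially the prompt; the body is the kind of small,
        # checkable function the model is asked to produce.
        from datetime import date, timedelta


        def business_days_between(start: date, end: date) -> int:
            """Count weekdays (Mon-Fri) strictly between `start` and `end`.

            Returns 0 if `end` is not after `start`. Weekends are excluded;
            public holidays are deliberately out of scope.
            """
            if end <= start:
                return 0
            total = (end - start).days
            return sum(
                1
                for offset in range(1, total)  # strictly between: skip both endpoints
                if (start + timedelta(days=offset)).weekday() < 5  # 0-4 = Mon-Fri
            )
        ```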

        • iglou@programming.dev · 5 hours ago

          And now, instead of understanding the functions, parameters, syntax, and quirks yourself so you can produce quality code, which is the job of a software engineer, you ask an LLM to spit out code that seems to work, do that again, and again, and again, and call it a day.

          And then I’ll be hired to fix it.