203 points by logicprog about 9 hours ago | 205 comments
dvt 1 minute ago |
aesthesia about 3 hours ago |
- The release with the highest number of attributed bugs is the release _right before_ the first release with Claude-coauthored commits, released in January; is there a chance that unattributed LLM-authored commits made it into this release?
- The release attribution methodology is not great, since it will tend to attribute bugs introduced in a minor version update to the longest-lived patch release of that minor version. I doubt that 3.4.1 actually introduced a lot of bugs, but since it was released a day after 3.4.0, bugs that were introduced in that release get attributed to 3.4.1.
- Relatedly, more recent releases have had less time to have bugs filed against them, so there may be a bit of a bias toward evaluating recent releases as less buggy.
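To make the second point concrete, here is a minimal sketch. It assumes the analysis attributes each bug to the latest release published on or before the bug's filing date; that is a guess at the methodology, not something confirmed here. The release dates and filing dates are hypothetical (only the one-day gap between 3.4.0 and 3.4.1 comes from the point above), and `attribute` is an illustrative helper, not code from the article.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Hypothetical release dates, sorted ascending. Only the one-day gap
# between 3.4.0 and 3.4.1 is taken from the discussion.
RELEASES = [
    ("3.3.0", date(2024, 4, 1)),
    ("3.4.0", date(2025, 1, 14)),
    ("3.4.1", date(2025, 1, 15)),
]


def attribute(filed_on, releases=RELEASES):
    """Return the latest release published on or before the filing date.

    This is the ASSUMED attribution rule, used only for illustration.
    Returns None if the bug was filed before the first known release.
    """
    release_dates = [released for _, released in releases]
    idx = bisect_right(release_dates, filed_on) - 1
    return releases[idx][0] if idx >= 0 else None


# Hypothetical filing dates for four bugs that were all introduced in 3.4.0.
filed = [
    date(2025, 1, 14),
    date(2025, 1, 20),
    date(2025, 2, 3),
    date(2025, 3, 9),
]

counts = Counter(attribute(d) for d in filed)
print(counts)
```

Under that assumed rule, only the bug filed on release day lands on 3.4.0; everything filed later is counted against 3.4.1, even though 3.4.1 introduced none of them.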
thorum about 3 hours ago |
scsh about 9 hours ago |
If by fairest you mean to say that this analysis and response are sufficient, then I'm sorry but I have to disagree. We really need to understand whether the nature of the bugs is worse from a user's perspective. Even if the rate stayed unchanged, if the result is that the perceived quality of the software declined, then I would personally consider that worse, especially if I were a project maintainer.
That's not meant to be wholly dismissive either. But in general, I don't think quantitative analysis alone is enough to fully answer this type of question.
mikaeluman about 3 hours ago |
I think it will be up to some group in academia to make a real, full-blown study across several repositories.
There must be tons to learn about how LLMs have changed software development, and perhaps the cleanest separation will simply be to go by what repositories declare, e.g. "No LLM involved" vs. those that proudly do the opposite or are neutral.
Bugs are not the only variable of interest here. I am guessing someone is already doing this as we discuss it here...
lbrito about 1 hour ago |
AEVL 34 minutes ago |
tiahura 4 minutes ago |
faitswulff about 9 hours ago |
Bugs per commit as a metric papers over severity, in terms of both security impact and the effect on the user. A mislabeled button has the same weight as the entire app crashing in this framework.
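A rough sketch of the difference, with made-up numbers: the severity labels, the weights, and both example releases are hypothetical, and picking the weights is itself a judgment call. The point is only that two releases can tie on bugs per commit while differing a lot once severity is counted.

```python
# Hypothetical severity weights; any real scheme would need justification.
WEIGHTS = {"cosmetic": 1, "functional": 5, "crash": 20}


def raw_rate(bugs, commits):
    """Bugs per commit, treating every bug as equal."""
    return len(bugs) / commits


def weighted_rate(bugs, commits, weights=WEIGHTS):
    """Severity-weighted bugs per commit."""
    return sum(weights[severity] for severity in bugs) / commits


# Two hypothetical releases with the same bug count and commit count.
release_a = ["cosmetic"] * 5  # five mislabeled buttons
release_b = ["crash"] * 5     # five whole-app crashes
commits = 100

print("raw:     ", raw_rate(release_a, commits), raw_rate(release_b, commits))
print("weighted:", weighted_rate(release_a, commits), weighted_rate(release_b, commits))
```

The raw rates come out identical, while the weighted rate for the crash-heavy release is higher by exactly the ratio of the two weights.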
logicprog about 9 hours ago |
geraneum about 9 hours ago |
So the criticism was bad, and that somehow makes it ok to use a bad metric?
parliament32 30 minutes ago |
Your verbosity and sentence structure are not a problem. I hope that publishing this gives you a bit more confidence in your writing, because it's legitimately good.
tptacek about 2 hours ago |
Hey, 'logicprog, your writing is fine!
Use LLMs to critique your writing, check its structure, vet your choice of topic sentences, check flow from graf to graf and section to section, look for passive voice and overused words. LLMs are fantastic for that. But don't use a single word an LLM suggests in your actual writing. If it suggests something really fucking good, too bad, those words are disqualified. It's an easy red line to adhere to, easier than it sounds, and it'll keep your writing human.
(You ended up somewhere around here anyways, but that was after you posted something with LLM-written language because you weren't confident enough in your own writing. The things you do "worse" than an LLM are what make you you; be protective of them!)
mmonaghan about 1 hour ago |
rovr138 about 9 hours ago |
Is this a configuration that's not common and thus not tested?
If people think they can do better, I want to see their forks and them keeping up with it.
https://github.com/RsyncProject/rsync/graphs/contributors?fr...
PunchyHamster about 2 hours ago |
Also, if you wrote a paper that drew statistical conclusions from a whole 2 datapoints, you'd be laughed out of the room.
logicprog about 3 hours ago |
Polarity about 9 hours ago |
WesolyKubeczek about 1 hour ago |
Instead we have a shitstorm over a presumably legit issue, for which the only source is some Mastodon post.
One command that used to work in 3.4.1 and stopped working in 3.4.3. Just one! We could have already bisected the living shit out of this and gone home, but no.
steno132 about 1 hour ago |
So what? You've saved a significant amount of time for a decent number of humans, and if those humans are working on other projects, the overall output for the world is net positive compared to a world without LLMs.
You have to broaden your perspective. It's not just about how rsync was affected.
undefined about 9 hours ago |
KronisLV about 2 hours ago |
> v3.4.3 has been out long enough that its rate (5.00) is already comparable to historical releases. The "wait and see" argument is an appeal to an unknowable future that shifts the burden of proof away from the critics. If more bugs surface, they will enter the distribution like every other release. There is no reason to expect a regime break.
I mean, as someone who uses LLMs, it might be a good idea to consider how one might limit the number of bugs that will appear in the future at least a little bit: parallel iterative code review loops would probably be the easiest and most applicable to LLMs, though I guess test coverage and other code analysis tools help too.
overgard about 3 hours ago |
gadrev about 9 hours ago |
$ apt-cache policy rsync | grep Installed
Installed: 3.4.1+ds1-7ubuntu0.2
$ sudo apt-mark hold rsync
rsync set on hold.
themafia about 1 hour ago |
You can write for an audience or you can write for yourself. Either is fine, but you shouldn't pass the blame for bad results on to your audience.
> and receiving almost no substantive input, discussion, or response on the actual content of the article
Well did you write it for that purpose?
> "Just wait, more bugs will surface" -- v3.4.3 has been out long enough
Wait for _more releases_. As your own data shows, the bug rate is not consistent between releases, so this is probably not a worthwhile metric. Perhaps systems touched, new features included, or attempted fixes would be a better way to contextualize releases and the goals of the author.
yobid20 about 2 hours ago |
pushcx about 7 hours ago |
What followed was extraordinary: 329 comments and counting, ranging from thoughtful concern to outright harassment.
The thread did not stop at words. One user posted My Little Pony drawings of themselves strangling the "project janitor that pushed vibecoded commits":
It spread to Hacker News and Lobsters, generating hundreds more comments.
This is false, it did not appear on Lobsters. Here is the function in the codebase that prohibits this kind of brigading: https://github.com/lobsters/lobsters/blob/main/app/models/st...
Please correct your article.
nairboon about 9 hours ago |
dang about 3 hours ago |
[see https://news.ycombinator.com/item?id=48416020 for how all this happened in the first place]
jrflowers about 2 hours ago |
Yes, it did. Here is some math showing that you shouldn’t care about that.
the_real_cher about 9 hours ago |
MagicMoonlight about 3 hours ago |
If I’m hiring and I see this kind of slop, I ain’t hiring you.
wookmaster about 9 hours ago |
mwkaufma about 2 hours ago |
I run a smallish project with ~1k stars and I stopped maintaining it last year because people feel like they're absolutely owed features or bug-fixes or whatever. It's tiring, and a complete shame that the author has to make such an insane deep dive into a random accusation that just caught on social media.