Dev-owned testing: Why it fails in practice and succeeds in theory (https://dl.acm.org)

152 points by rbanffy 1 day ago | 172 comments

OptionOfT 1 day ago |

The conversation is usually: devs can write their own tests. We don't need QA.

And the first part is true. We can. But that's not why we have (had) QA.

First: it's not the best use of our time. I believe dev and QA are separate skillsets. Of course there is overlap.

Second, and most important: it's a separate person, an additional person who can question the ticket, and who can question my translation of the ticket into software.

And lastly: they don't suffer from the curse of knowledge on how I implemented the ticket.

I miss my QA colleagues. When I joined my current employer there were 8 or so. Initially I was afraid to give them my work, afraid of bad feedback.

Never have I met such graceful people, who took the time to understand something and talked with me to figure out where there was a mismatch.

And then they were deemed not needed.

dasil003 1 day ago |

Oh man do I have opinions.

First of all, I've seen all types of teams be successful, ranging from zero QA at all to massive QA teams with incredible power (e.g. Format QA at Sony in Europe). I have absolutely seen teams with no QA deliver high quality, full stop; the title is nonsense.

My firm belief is that QA can raise the ceiling of quality significantly if you know what you're doing, but there is also a huge moral hazard of engineers dropping the ball on quality at implementation time, creating a situation where adding more QA resources doesn't actually improve quality, just communication churn and ticket activity. By the way, the same phenomenon can happen with product people as well (and I've also seen teams without product managers do better than teams with them in certain circumstances).

The most important anchor point for me is that engineering must fundamentally own quality. This is because we are closer to the implementation and can anticipate more failure modes than anyone else. That doesn't mean other roles don't contribute significantly to quality (product, design, QA, ops absolutely do), but it means we can't abdicate our responsibility to deliver high quality code and systems by leaning on some other function and getting lazy about how we ensure we are building right.

What level of testing is appropriate for engineers to do is quite project and product specific, but it is definitely greater than zero. This goes double in the age of AI.

terribleidea 1 day ago |

Most orgs I've worked for are so growth- and product-focused that if you try adjusting your estimates to include proper testing, you get pushback, and you have to ARGUE your case as to why a feature will take two weeks instead of one.

This is the thing I hate the most about work: having to ARGUE with PMs because they can't accept an estimate, and there's often some back-and-forth. "What if you do X instead?" "Team Y (always uses hacks and adds technical debt with every single feature they touch) did something similar in two days." But we're just communicating and adding transparency, so that's good, and it certainly doesn't matter that it starts taking up 4+ hours of your time in Slack conversations and meetings of people 'level setting', 'getting on the same page', trying to help you 'figure out' how to 'reduce scope', etc. etc.

Also, I think testing via unit or integration tests should be standard regardless, and that isn't what I am thinking about here. I'm thinking about QA, the way QA does it. You hammer your feature with a bunch of weird bullshit like false and unexpected inputs, what happens if I refresh the page in strange ways, what happens if I make an update and force the cache to NOT clear, what happens if I drop my laptop in a dumpster while making the request from Firefox and Safari at the same time logged in as the same user, what happens if I turn off my internet in the middle of a file upload, and so on. When devs say that devs should be responsible for testing, they usually mean the former (unit and integration tests), and not this separate skillset of coming up with a bunch of weird edge cases for your code. And yes, unit tests SHOULD hit the edge cases, but QA is just better at it. You usually don't have engineers testing what happens when you try sending in Mandarin characters as input (unless they live in China, I guess). All of that effort should bring up your estimates because it is non-trivial. This is what getting rid of QA means, not happy-path end-to-end testing plus some unit and integration tests.
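
To make that concrete, here's a minimal pytest sketch of the kind of input hammering I mean (the function under test and the specific cases are hypothetical):

    # Sketch of QA-style input hammering as a parameterized test.
    # normalize_username is a hypothetical function under test.
    import pytest

    def normalize_username(raw: str) -> str:
        # stand-in implementation: trim whitespace and lowercase
        return raw.strip().lower()

    @pytest.mark.parametrize("weird_input", [
        "",                                # empty
        " " * 10_000,                      # pathological whitespace
        "钱包余额",                          # CJK characters
        "Ω≈ç√∫˜µ≤≥÷",                      # assorted Unicode
        "a" * 1_000_000,                   # very long input
        "Robert'); DROP TABLE users;--",   # injection-shaped input
    ])
    def test_normalize_username_does_not_blow_up(weird_input):
        # The bar here is "no crash, returns a string", not a specific value.
        assert isinstance(normalize_username(weird_input), str)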

sethammons 1 day ago |

While good points are made, I worry this gives the wrong impression. The paper doesn't say it is impossible, just hard. I have, very successfully, worked with dev-owned testing.

Why it worked: the team set the timelines for delivery of software, the team built their acceptance and integration tests around the inputs and outputs at the edges of their systems, the team owned being on-call, and the team automated as much as possible (no repeatable manual testing aside from sanity checks on first release).

There was no QA person or team, but there was a quality focused dev on the team whose role was to ensure others kept the testing bar high. They ensured logs, metrics, and tests met the team bar. This role rotated.

There was a ci/cd team. They made sure the test system worked, but teams maintained their own ci configuration. We used buildkite, so each project had its own buildkite.yml.

The team was expected by eng leaders to set up basic testing before development. In one case, our team had to spend several sprints setting up generators to make the expected inputs and sinks to capture output. This was a flagship project and lots of future development was expected. It very much paid off.
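
As a rough sketch of that generator/sink idea (everything here is hypothetical and heavily simplified, not our actual setup):

    # Hypothetical sketch: drive the system through its edges.
    # generate_orders produces synthetic inputs, CapturingSink records outputs.
    import json
    import queue

    def generate_orders(n):
        for i in range(n):
            yield {"order_id": i, "amount_cents": 100 + i}

    class CapturingSink:
        def __init__(self):
            self.received = queue.Queue()
        def publish(self, message):
            self.received.put(json.loads(message))

    def process(order, sink):
        # stand-in for the real system under test
        sink.publish(json.dumps({"order_id": order["order_id"], "status": "accepted"}))

    def test_every_generated_order_reaches_the_sink():
        sink = CapturingSink()
        for order in generate_orders(100):
            process(order, sink)
        assert sink.received.qsize() == 100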

Our test approach was very much "slow is smooth and smooth is fast." We would deploy multiple times a day. Tests were 10 or so minutes and very comprehensive. If a bug got out, tests were updated. The tests were very reliable because the team prioritized them. Eventually people stopped even manually verifying their code because if the tests were green, you _knew_ it worked.

Beyond our team, into the wider system, there was a lightweight acceptance test setup, and the team registered tests there, usually one per feature. This was the most brittle part, because a failed test could be caused by another team or a system failure. But guess what? That is the same as production, if not noisier. So we had the same level of logging, metrics, and alerts (limited to business hours). Good logs would tell you immediately what was wrong. Automated alerts generally alerted the right team, and that team was responsible for a quick response.

If a team was dropping the ball on system stability, that reflected badly on the team and they were expected to prioritize stability. It worked.

Hands down the best dev org I have been part of.

dgunay 1 day ago |

I have limited experience working in orgs with a QA apparatus. Just my anecdotes:

The one time I got to work with a QA person, he was worse than useless. He was not technical enough to even use cURL, much less do anything like automated e2e testing, so he'd have to manually test every single thing we wanted to deploy. I had to write up extremely detailed test plans to help him understand exactly what buttons he had to press in the app to test a feature. Sometimes he'd modify the code to try and make testing it easier, break the feature in doing so, and then report that it didn't work. In nearly all cases it would have been faster for me to just test the code myself.

The majority of the time I've worked in orgs where there is no QA team, the devs are expected to own the quality of their output. This works okay when you're in a group of conscientious and talented engineers, but you very quickly find out who really cares about quality and who either doesn't know any better or doesn't care. You will constantly battle management to have enough time to adequately test anything. Every bit of test automation you want to build has to be smuggled in with a new feature or a bugfix.

So really, they both suck, pick your poison. I prefer the latter, but I'm open to experiencing what good looks like in terms of dedicated QA.

solatic about 19 hours ago |

Devs need to write the 1% of automated tests needed just to prove that what they wrote works in the ideal case. QA is valuable for writing the 99% of automated tests that prove that the software works in the edge cases, with DevOps occasionally dropping in to make sure that the test suite runs quickly.

The way you solve Product and QA being at odds is very simple: QA loses, until they don't. When trying to find product-market fit, it doesn't make sense to delay delivery to prove that an experiment works in exceptional circumstances. Eventually you do have product-market fit and want to harden the features you already shipped, which is where QA comes in: better that internal QA finds the bugs than your (future) customers. Eventually you start launching features to a massive audience on day 1, and you need QA to reduce reputational risk before you ship. The right time for QA to intercede and get a veto on delivery changes over the lifetime of the product, and part of whether or not QA is a net add is whether your organization (leadership) is flexible enough to accept and implement that flexibility.

kayo_20211030 1 day ago |

A nice piece that outlines all the challenges, the opportunities, and the cultural and social adjustments that need to be made within organizations to maximize the chance of left-shifted testing being successful.

IMPO, as a developer, I see QA's role as being "auditors" with a mandate to set the guidelines, understand the process, and assess the outcomes. I'm wary of the foxes being completely responsible for guarding the hen-house unless the processes are structured and audited in a fundamentally different way. That takes fundamental organizational change.

KingOfCoders 1 day ago |

Developers want things to work.

QA wants things to break.

What worked for me: devs write ALL the tests, and QA does selective code reviews of those tests, making devs write better tests.

I also wrote about the failure of dev-owned testing: "Tests are bad for developers" https://www.amazingcto.com/tests-are-bad-for-developers/

thmpp 1 day ago |

If, as a developer, you want to be seen as someone advancing and taking ownership and responsibility, testing must be part of the process. Shipping an untested product, or a product that you as a software engineer do not monitor, essentially means you can never be sure you created a correct product. That is not engineering. If the org's guidelines prevent it, some cultural piece is preventing it.

Adding QA outside, which tests the software regularly using different approaches, finds the intersections, etc., is a different topic. Both are necessary.

flambojones 1 day ago |

Having been at Microsoft when we had SDETs for everything (and I miss it greatly, though the way we could write a feature and then just toss it to test was ridiculous), I think things have swung too far away.

On one hand, engineers needed to take more ownership of writing things other than low-level unit tests.

On the other, the SDETs added immense value in a ton of ways, like writing thorough test specs based off of the feature spec (rather than the design spec), testing without blind spots caused by knowledge of the implementation, implementing proper test libraries and frameworks to make tests better and easier to write, and taking an adversarial approach to trying to break things, which makes products more robust.

I've also worked with manual QA for product facing flows, and while they added value with some of their approaches to ensuring quality - poking at our scenarios and tests, and looking more closely at subjective things - they often seemed to work as a crutch for the parts of code paths that engineers had made too difficult to test.

I've never seen anywhere attempt to replace the value that SDETs delivered with what engineers were tasked with. I'd argue it's not necessarily possible to fully replicate that when you're testing your own things. But with services now, it also seems like product/management are more willing to accept slightly fewer assurances around quality and just count on catching some issues in production, in favor of velocity.

I've never seen places that got rid of QA

monster_truck 1 day ago |

Something I've always believed, and my experience with shipping multinational software on a schedule that has severe drop-dead dates confirmed: If you are contractually obligated to deliver a product that does x, y, and z correctly? QA is the only way to do that seriously. If you don't have QA, you don't care about testing full stop.

This only compounds when you have to comply with safety regulations in every country, completely setting aside the strong moral obligation you should feel to ensure you go far above & beyond mere compliance given the potential for harm. This compounds again when you are reliant upon deliverables from multiple tiers of hardware and software suppliers, each contract with its own drop-dead dates you must enforce. When one of them misfires, and that is a "when, not if", they are going to lie through their teeth and you will need hard proof.

These are not small fines, they are company-killing amounts of money. Nobody profits in this situation. I've been through it twice, both times it was a herculean effort to break even. Hell, even a single near-miss handled poorly is enough to lose out on millions in potential future work. The upsides are quite nice, though. I didn't know it was possible to get more than 100% of your salary as a bonus until then.

Don't take my word for it, though. Ask your insurance agent about the premiums for contractual liability insurance with and without a QA team. If you can provide metrics on their performance, -10-15% is not uncommon, and the discount increases over time. Without one? +15-50%, depending.

gwbas1c 1 day ago |

As a developer, I frequently tell higher ups that "I have a conflict of interest" when it comes to testing. Even though I fully attempt to make perfect software, often I have blind spots or assumptions that an independent tester finds.

That being said: depending on what you're making and what platform(s) you target, developer-owned testing is either feasible or not. For example, if you're making a cross-platform product, it's not feasible for a developer to regression test on Windows 10, Windows 11, macOS, and 10 distros of Linux. In contrast, if you're targeting a web API, it's feasible for a developer to write tests at the HTTP layer against a real database.
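
As a minimal sketch of that HTTP-layer style of test (the URL, endpoint, and response schema are hypothetical; it assumes a local instance of the service and a real test database are running):

    # Hypothetical HTTP-layer test against a locally running service + real DB.
    import requests

    BASE_URL = "http://localhost:8080"  # assumed local test instance

    def test_create_and_fetch_widget():
        created = requests.post(f"{BASE_URL}/widgets", json={"name": "example"})
        assert created.status_code == 201
        widget_id = created.json()["id"]

        fetched = requests.get(f"{BASE_URL}/widgets/{widget_id}")
        assert fetched.status_code == 200
        assert fetched.json()["name"] == "example"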

fizlebit about 6 hours ago |

I think that vibe coding now, with Anthropic tools and the latest model, means that the cost of writing integration tests is significantly reduced. When the company ships a large product that has components from many teams, there is still a role for QA engineers who run nightly tests and chase teams to help diagnose issues when they are found. If you don't have such a central team publishing golden versions, then everybody is chasing the same bug. Ideally the integration tests are part of the change acceptance flow, but low-frequency bugs (occurring in maybe 1 in 100 test runs) can still sneak through.

DiskoHexyl about 5 hours ago |

All the great QAs with whom I've worked would have made good developers (and they actually WERE good developers, only with a QA name and salary).

The problem is that a great QA earns less than a mediocre developer within the same company. And has a much lower status. And also fewer career opportunities elsewhere.

No wonder most of those guys switched at one point or another

MoreQARespect 1 day ago |

This paper has 7 references and 4 of them are to a single google blog post that treats test flakiness as an unavoidable fact of life rather than a class of bug which can and should be fixed.

Aside from the red flag of one blog post being >50% of all citations it is also the saddest blog post google ever put their name to.

There is very little of interest in this paper.

gwbas1c 1 day ago |

> At Google, for example, 16% of tests exhibited flakiness

This really surprised me. In my experience, usually a flaky test indicates some kind of race condition, and often a difficult-to-reproduce bug.

In the past year, we had a flaky unit test that caused about 1-2% of builds to fail. Upon fixing it, we learned it was what caused a deadlock in a production service every 5-6 months. As a result of fixing this one "flaky" test, we eliminated our biggest cause of manual intervention in our production environments.
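
For illustration, here's a contrived sketch of the kind of race that shows up as a flaky test (not our actual bug):

    # Contrived example: an unsynchronized read-modify-write makes this test flaky.
    import threading

    class Counter:
        def __init__(self):
            self.value = 0
        def increment(self):
            current = self.value        # read
            self.value = current + 1    # write; a thread switch in between loses updates

    def test_concurrent_increments():
        counter = Counter()
        def work():
            for _ in range(100_000):
                counter.increment()
        threads = [threading.Thread(target=work) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Usually passes, occasionally fails: the flakiness *is* the bug.
        assert counter.value == 400_000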

inetknght 1 day ago |

Dev-owned testing is great when it succeeds. It's definitely not the only way. But you really do need the whole team to be on-board with the concept, or willing to train the team to do it. If that doesn't line up then you're not going to have a good time.

I've been on teams that own the whole pipeline, and I have led teams where I made it mandatory that the engineer writing a feature must also write tests to exercise that feature, and must also write sad-path tests. That gets enforced during review of the pull request.

It works. But it takes a lot of effort to teach the whole team how to write testable code, how to write good tests, how to write sad-path tests, and how to even identify what sad paths might exist that they didn't think about.
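
A trivial sketch of what I mean by sad-path tests, with hypothetical names:

    # Hypothetical sad-path tests: the failure modes are asserted, not just the happy path.
    import pytest

    def withdraw(balance_cents: int, amount_cents: int) -> int:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if amount_cents > balance_cents:
            raise ValueError("insufficient funds")
        return balance_cents - amount_cents

    def test_withdraw_happy_path():
        assert withdraw(1000, 300) == 700

    def test_withdraw_rejects_overdraft():
        with pytest.raises(ValueError, match="insufficient funds"):
            withdraw(100, 300)

    def test_withdraw_rejects_nonpositive_amount():
        with pytest.raises(ValueError, match="amount must be positive"):
            withdraw(100, 0)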

I can tell you from experience that when it does succeed and the whole team has high collaboration, individual developers' work output is high, and new features can be introduced very rapidly with far fewer bugs making it to production than when relying on a whole QA team to find all the problems.

It fails in practice because most (not all!) devs don't want to "waste time" doing that, and instead rely on QA cycles to tell them that something is wrong. Alas, QA cycles are a hell of a lot slower than the developer writing the tests. QA teams often don't have access to (or perhaps don't understand) the source code, so they're left trying to find bugs through a user interface. That's valuable, but it takes a completely different skillset and is a poor point at which to find a lot of the basic bugs that can show up.

On the other hand, the teams I've been on that failed (especially hard) often had huge (!) QA teams and budgets. Despite the size of team and budget, multiple projects would fall over from inertia and bickering between teams about who owns which bug, or which bug needs priority fixing.

weinzierl 1 day ago |

The article argues that dev-owned testing isn't wrong, but all the arguments it presents support the conclusion that it is.

I always understood shift-left as doing more tests earlier. That is pretty uncontroversial and where the article is still on the right track. It derails at the moment it equates shift-left with dev-owned testing - a common mistake.

You can have quality owned by QA specialists in every development cycle and it is something that consistently works.

tracerbulletx 1 day ago |

My experience with this was great. It went really well. We also did our own ops, within a small boundary of systems organized by domain. I felt total ownership for it, could fix anything in it, deploy anything with any release strategy, monitor anything, and because of that had very little anxiety about being on call for it. Best environment I ever worked in.

polotics about 15 hours ago |

Having gone through the whole cycle from 1 dev... to 2 dev-lead... to 3 large-ish team QA-lead for offshore devs... to 4 QA-lead for offshore QA... to 5 actual QA ("a" is for assurance, as opposed to the quality control that passes as QA) in an industry that needs it... to 6 kind-of principal engineer... I would advise you that generalities about QA are useless; environments differ. Still always true: some management will want either more for less or even something for nothing, and in the long run we are all dead, and some folks' horizon is surprisingly short!

donatj 1 day ago |

> The problem is not that dev-owned testing is a flawed idea, but that it is usually poorly planned

In our case there was zero plan. One day they just let our entire QA team go. Literally no direction at all on how to deal with not having QA.

It's been close to a year and we're still trying to figure out how to keep things from going on fire.

For a while we were all testing each other's work. They're mad that this is slowing down our "velocity", and now they're pushing us to test our own work instead...

Testing your own work is the kind of thing an imbecile recommends. I tested it while I wrote it. I thought it was good. Done. I have all the blind spots I had when I wrote it "testing it" after the fact.

theptip 1 day ago |

> Why does dev-owned testing look so compelling in theory, yet fall short in almost every real-world implementation

Seems like a weird assertion. Plenty of startups do "dev-owned testing", i.e. not hiring dedicated QA. Lots of big tech does too. Indeed I'd venture it's by far the most common approach on a market valuation-weighted basis.

time4tea 1 day ago |

The abstract says it really:

"It was clearly a top-down decision"

Many many things that are imposed like this will fail.

It's not even willful non-compliance; it's just that it's hard for people to do things differently while still being the same people in the same teams, making the same products, with the same timelines...

Context is key here. Lots of people see a thing that works well and think they can copy the activities of the successful team, without realising they need to align the mindset, and the activities will follow. The activities might be different, and that's OK! In a different context, you'd expect that.

I'd argue that in most contexts you don't need a QA team at all, and if you do have one, then it will look a lot different from what you might think. For example, it would sit after a release, not before it. QA teams are too slow to deal with 2000+ releases a year - not their fault, they are human - so you need to reframe the value statement.

pftg 1 day ago |

I cannot believe the excuse for why shift-left QA is “not working” is that Amazon hires developers who can’t learn basic testing skills that QA engineers picked up in three months. If developers can’t write valid code for tests, that’s on the organization, not on the practice.

The author forgot to mention the costs of handoffs, the elimination of which paid off all those tiny learning investments.

Shift-left has over 30 years of proof as one of the most effective ways to build reliable software.

P.S. This isn’t an ACM article; it’s a strongly opinionated post based on personal experience.

P.P.S. I'm not against QA, but make them bug/quality hunters instead of handing them toil.

brap 1 day ago |

Do people actually send PRs with no tests? That is so bizarre to me

shuntress 1 day ago |

I'm surprised that with this many comments about the relationship between testing, development, and QA there is so little mention of environment and deploy process.

The usability of your test environment (and associated tooling) has a massive impact on quality assurance.

Every small difference between Production and Production-Plus-Feature creates friction and, even in systems of only moderate complexity, that friction adds up fast.

wesselbindt 1 day ago |

Conflict of interest that the author fails to mention: he's a QA manager at Amazon, and has a vested interest in QA being seen as a necessary role. It may well be, but this is definitely a conflict of interest.

Aside from that, this article is incredibly heavy on theory and very light on empirical fact. Its bibliography consists of a very narrow selection of blogs (4 of the articles he cites are one and the same blog, somehow), which cover a very narrow subset of the industry. That this article does not reference the serious and well-established research that has been done on the effectiveness of dev-owned tests, by for example the DORA folks, almost seems dishonest.

The clickbait title, when compared to the content of the article, is outright dishonest. The author theorizes about some warts dev-owned testing may have at some specific companies, but this is a very far cry from it failing in practice, especially when you compare them to the warts of offloading quality to a different team.

It's probably a bit harsh, but I feel like, as an industry, we should have a higher standard of empiricism when it comes to evaluating our ways of working.

tom_m about 22 hours ago |

Writing test cases is part of development. Programmers simply need to do it. It helps them immensely.

Product acceptance and user acceptance testing are entirely different and often conflated. That's where people go wrong and are too simple-minded (or cheap) to understand or invest in both.

__MatrixMan__ 1 day ago |

It pretty much comes down to whether QA is just doing what dev tells them (in which case they're not applying any scrutiny to dev's decisions) or deciding for themselves what constitutes an appropriate validation for the dev work at hand.

darkwater 1 day ago |

First they came with the NoOps movement, and you were happy cause those damned ops people were always complaining and slowing you down. I can manage my infra!

Then, they came with the dev-owned testing and fired all the QAs, and you were happy because they were always breaking your app and slowing you down. I can write my tests!

Now, they are coming with LLM agents and you don't own the product...

threethirtytwo 1 day ago |

Simple. Because all theories are approximations. And our highest-resolution theory of reality assumes reality is intrinsically random, making it so that even the most accurate theory of reality can't predict anything.

__alexs 1 day ago |

This just seems to be basically a blog post that somehow got published in ACM?

physicsguy 1 day ago |

I think devs owning testing only works where they’re consumers of the product.

So a developer productivity tool - perfect.

A fully fledged engineering application targeting monitoring of assets? Not so much

tom_m about 22 hours ago |

This paper is really good and accurate. Many people would benefit from reading it.

tomtoday 1 day ago |

Note: the references in the article seem to be incorrect: [4] through [7] are all the same article. I do not think that was intentional.

mrits 1 day ago |

Author-Owned proof reading is next

wpollock 1 day ago |

One purpose of QA testing is compliance assurance, including with applicable policies, industry regulations, and laws. While devs are (usually) good at functional testing, QA (usually) does non-functional testing better. I have not known any devs who test for GDPR compliance, for example. (I am certain many devs do test for that, just stating my personal experience.)

cratermoon 1 day ago |

The paper highlights the problem in two words of the first sentence of the abstract: "shrink QA".

Corporations do it to save money, and accept the loss of quality as the cost of doing business. Therein lies part of the reason for the sad state of software today.

zbentley 1 day ago |

Another vital quality of good QA teams is that they often serve as one of the last/main repositories of tribal knowledge about how an org's entire software system actually behaves/works together. As businesses grow and products get more complex and teams get more siloed, this is really important.

squirrellous about 20 hours ago |

I work with an amazing QA team right now. They are good programmers who don't mind doing lots of grunt work, understand the goal of the software deeply, collaborate extremely well with other teams, and are just all-around great people. I expect they would kill it as a non-QA dev team as well, so I'm always grateful to have such partners.

somewhereoutth 1 day ago |

If your product interface is with humans, you test it firstly with devs, then QA, then your customers.

Devs are a bit leaky for bugs/non-conformance, so if you skip the QA, then your customer is exposed. For some industries this is fine - for others, not so much.

johnea 1 day ago |

I feel the need to point out a phrase that was very popular among my dev peers:

The difference between theory and reality, is that in theory they're the same, but in reality they're not...

While any new feature or bug fix introduced by a dev should certainly be tested at that dev's desk to confirm to themselves that it's correct, it should also (of course) be tested by a product test group (call it QA if you must) to ensure that all functional features of the product are still fully and correctly implemented.

I would aim a big fat finger at "agile", "scrum", "standup" culture for encouraging the violation of this, very obvious, testing requirement.

"What have you accomplished in the last 4 hours", type of management interface to development, fully and completely misses the primacy of confirming the functionality of updates before release.

This is really due to management, especially C-suite management of startups, living in a make believe world of deadlines and feature requirements pulled arbitrarily out of their ass, while refusing (or not having the capacity) to understand the technical issues involved.

boxed 1 day ago |

I think they've got that the other way around.

gwbas1c 1 day ago |

(Joke)

Can't AI just replace QA?
