199 points by luu 3 days ago | 128 comments
fancyfredbot 3 days ago |
jacquesm 3 days ago |
Of course they're not going to stop at just code. They need all the rest of it as well.
gulugawa 3 days ago |
"a.getElementsByTagName = function (...args) { /* clear page content */ }"
One can also hide components inside Shadow DOM to make it harder to scrape.
However, these methods will interfere with automated testing tools such as Playwright and Selenium. Also, search engine indexing is likely to be affected.
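A minimal sketch of the two ideas above, under stated assumptions: `trapTagQueries` and `hideInShadow` are hypothetical helper names that do not appear in the thread, and `doc` stands in for whatever object `a` refers to in the quoted snippet. This is not a robust defense: a scraper can simply call `querySelectorAll` or read the raw HTML instead.

```javascript
// Hypothetical helper: replaces getElementsByTagName on a document-like
// object so that any call runs a callback instead of returning elements.
function trapTagQueries(doc, onTrip) {
  const original = doc.getElementsByTagName;
  doc.getElementsByTagName = function (...args) {
    onTrip(args); // e.g. clear the page content, as the quoted snippet suggests
    return [];    // hand the caller nothing to work with
  };
  // Return a function that undoes the override.
  return function restore() {
    doc.getElementsByTagName = original;
  };
}

// Hypothetical helper (browser only): attaches a closed shadow root, so
// host.shadowRoot is null for outside scripts and the markup is harder to reach.
function hideInShadow(host, html) {
  const root = host.attachShadow({ mode: "closed" });
  root.innerHTML = html;
  return root;
}
```

In a browser one might call `trapTagQueries(document, () => { document.body.textContent = ""; })`. The same override is what trips up Playwright or Selenium scripts that rely on tag lookups.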
iamnothere 3 days ago |
I haven’t heard of (for instance) niche hobby communities facing the same attacks. Does anyone know whether those sites are seeing attacks at the same scale?
Is there any chance that this is a deniable attack intended to disrupt the tech industry, or even the FOSS community in particular, with training data gathered as a side benefit? I’m just struggling to understand how the economics can work here.
tedivm 3 days ago |
blakesterz 3 days ago |
"It is a DDOS attack involving tens of thousands of addresses"
It is amazing just how distributed some of these things are. Even on the small sites that I help host we see these types of attacks from very large numbers of diverse IPs. I'd love to know how these are being run.
bloppe 3 days ago |
xacky 2 days ago |
zahlman 3 days ago |
blibble 3 days ago |
big tech incentivised to ddos... what a world they've built
Havoc 3 days ago |
There is no reason for AI scrapers to use tens of thousands of IPs to scrape one site over and over.
That just sounds like a classic DDOS.
sgc 3 days ago |
2OEH8eoCRo0 3 days ago |
samtrack2019 3 days ago |
chrisjj 3 days ago |
It is difficult to figure out the incentives here. Why would anyone want to pull data from LWN (or any other site) at a rate which would cause a DDOS-like attack?
If I run a big, data-hungry AI lab consuming training data at 100Gb/s, it's much, much easier to scrape 10,000 sites at 10Mb/s each than to DDOS a smaller number of sites with more traffic. Of course the big labs want this data, but why would they risk the reputational damage of overloading popular sites in order to pull it in an hour instead of a day or two?
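The arithmetic in the comment above can be sketched roughly as follows. The 100Gb/s and 10Mb/s figures come from the comment. The 100 GB site size and the 250Mb/s "aggressive" rate are made-up assumptions, chosen only to illustrate the "an hour instead of a day or two" trade-off.

```javascript
// Figures taken from the comment above.
const TOTAL_MBPS = 100_000;      // 100 Gb/s expressed in Mb/s
const POLITE_PER_SITE_MBPS = 10; // 10 Mb/s per site

// How many sites must be scraped in parallel to fill the pipe.
function sitesNeeded(totalMbps, perSiteMbps) {
  return totalMbps / perSiteMbps;
}

// Hours needed to pull sizeGB gigabytes at rateMbps megabits per second.
function hoursToFetch(sizeGB, rateMbps) {
  const megabits = sizeGB * 8 * 1000; // GB -> megabits, decimal units
  return megabits / rateMbps / 3600;
}

console.log(sitesNeeded(TOTAL_MBPS, POLITE_PER_SITE_MBPS)); // 10000

// ASSUMPTION: a hypothetical 100 GB site, not a figure from the thread.
console.log(hoursToFetch(100, 10).toFixed(1));  // "22.2" hours at 10 Mb/s
console.log(hoursToFetch(100, 250).toFixed(1)); // "0.9" hours at 250 Mb/s
```

Under those assumptions the polite crawl finishes in about a day, and the aggregate throughput is the same either way, which is the puzzle the comment raises.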