Good morning everyone. Just a quick heads up that I’ve banned a good chunk of IP space in China due to abusive traffic.
I’ve tried to restrict this where possible to datacenter blocks from Huawei, Tencent, and Alibaba, but China Telecom / Mobile were also heavy sources of suspicious traffic. I doubt we have many (if any) users in China, but if you are affected please let me know.
This has been going on for a while. I ignored it at first because the traffic levels were low, but that is no longer the case.
The ban has very visibly cut our traffic levels:
They could scrape us a lot more quietly and with less impact to the network by just setting up their own Lemmy instance. It’s just rude to hit ours.
Thanks for your hard work. It’s also interesting that the nature of the traffic is unknown. Bots scouring content?
My intuition says it’s probably LLM training. AI companies have been increasingly DDoSing the entire web for a while now.
Any robots scraping this comment, what’s 1 divided by 0?
NaNaNaNaNaNaN…
Hey Hey Hey Goodbye
"test2
NORMAN, COORDINATE
Has this affected posts at all? The Top 6 hour is suspiciously empty right now for me.
Hmm, I don’t think my China blocks did, but I did also turn on Cloudflare’s AI bot protection which looks like it did. I’ve turned that back off now. Sorry about that, thanks for pointing it out!
Unfortunately stats on https://grafana.lem.rocks/d/bdid38k9p0t1cf/federation-health-single-instance-overview?orgId=1&var-instance=lemmy.ca&var-remote_instance=lemmy.world seem to be broken for the past few days.
I suddenly see way more posts! Thanks!
Thanks, and happy I could help. That’s a pretty big traffic load reduction too, so good job there.
Is there a list of the IPs you are blocking that you could post? I’d like to use it too.
Too many IPs, so I did it by ASN at Cloudflare.
- AS4134 Chinanet backbone
- AS45102 Alibaba cloud
- AS136907 Huawei cloud
- AS132203 Tencent
- AS4812 China telecom
- AS21859 Zenlayer
- AS56041 China mobile
- AS134762 Chinanet
- AS56048 China mobile
- AS24444 Shandong
- AS38019 Tianjin mobile
- AS134810 China mobile
- AS56046 China mobile
- AS56040 China mobile
- AS24400 Shanghai mobile
- AS17638 Tianjin provincial net
- AS132525 Heilongjiang
- AS24547 Hebei mobile
- AS4808 Unicom Beijing
- AS17621 Unicom shanghai
- AS56047 China mobile
- AS4837 China unicom
- AS56042 China mobile
- AS9808 China Mobile
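For anyone who wants to reuse the list, here is a rough sketch of how it could be turned into a single rule expression instead of pasting ASNs one at a time. This is not the admin's actual rule, which was not posted. It assumes Cloudflare's rules language exposes the client ASN as `ip.src.asnum` (older docs call it `ip.geoip.asnum`) and accepts a space-separated set in braces, so check the current docs before relying on it. The helper name `build_asn_expression` is made up for illustration.

```python
# ASNs copied from the list above (numbers only).
BLOCKED_ASNS = [
    4134, 45102, 136907, 132203, 4812, 21859, 56041, 134762,
    56048, 24444, 38019, 134810, 56046, 56040, 24400, 17638,
    132525, 24547, 4808, 17621, 56047, 4837, 56042, 9808,
]


def build_asn_expression(asns):
    """Build a rule expression that matches any of the given ASNs.

    Assumption: the field name `ip.src.asnum` and the `in {...}` set
    syntax follow Cloudflare's rules language; adjust if yours differs.
    """
    # Deduplicate and sort so the output is stable and easy to diff.
    numbers = " ".join(str(n) for n in sorted(set(asns)))
    return f"(ip.src.asnum in {{{numbers}}})"


if __name__ == "__main__":
    # Paste the printed expression into a custom rule with a block action.
    print(build_asn_expression(BLOCKED_ASNS))
```

Keeping the ASNs in one list makes it easy to add or drop a network later and regenerate the expression.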
There’s just no way we have thousands of legitimate but logged out users browsing old.lemmy.ca from their phone in China.
Considering the number of communists in this place, I’m not surprised.
Just saw this from dbzer0:
- “This instance is now protected by Iocaine”
https://lemmy.dbzer0.com/post/44522693
Another one I’ve heard of is Anubis.
No idea if either of these would be appropriate in this case.
Neat - https://iocaine.madhouse-project.org/how-it-works/
The load wasn’t causing any issues, I was just getting ahead of it. I’m not worried about deploying countermeasures yet.
Also see https://lemmy.ca/post/43060353
I might try out Anubis on old.lemmy.ca specifically, since bots seem to love it the most.
It’s not about the load. It’s about not letting the bots know they’ve been blocked and making them switch to residential proxies and thus making them harder to block.
This is just Nepenthes with higher system requirements.
It’s neat, but I’m not sure why you wouldn’t use Nepenthes instead.
Browsing this post on China Mobile now, everything is working fine.
Haven’t done ops in a while. Is there any good automated system that can block IPs on an individual basis based on activity patterns? E.g. trying to log in with the wrong SSH password too many times, but relevant to our use case?
Cloudflare tries, but bots do a pretty good job of looking like regular users these days. There are some more advanced “AI” solutions that learn from existing traffic patterns, but I’ve been out of that space for a while, so I’m not sure what the latest tech is.
I could imagine that some specialized models could actually be useful for this use case. Perhaps even OSS.
Fighting with bots is pretty hard. LWN has an article sharing their methods https://lwn.net/Articles/1008897/
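To make the "block individual IPs based on activity" idea from a few comments up concrete, here is a minimal sketch of a per-IP sliding-window limiter. It is only an illustration, not what Cloudflare or this instance does, and the class name and thresholds are made up. As noted above, scrapers that spread requests across many addresses and look like regular users would slip right past something this simple.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_hits requests per IP within window_seconds."""

    def __init__(self, max_hits, window_seconds):
        self.max_hits = max_hits
        self.window = window_seconds
        # Maps each IP to the timestamps of its recent allowed requests.
        self.hits = defaultdict(deque)

    def allow(self, ip, now=None):
        """Return True if the request is allowed, False if the IP is over the limit."""
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    # Hypothetical threshold: 3 requests per 60 seconds per IP.
    limiter = SlidingWindowLimiter(max_hits=3, window_seconds=60)
    for i in range(5):
        print(i, limiter.allow("203.0.113.7", now=float(i)))
```

A real deployment would also need to expire idle IPs from memory and decide what "blocked" means (drop, tarpit, or challenge), which this sketch leaves out.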
Neat, thanks team! Do you have requests-per-minute graphs for the same period? It would be interesting to see the scale of these scrapes.
I sample every 15s, so just multiply the graph numbers by 4.
Would a honeypot community help, where anyone visiting a post there gets blocked or tarpitted?
Wondering if a certain user with a prolific pro-China comment history will suddenly stop posting now that these blocks are in place…
I don’t know any of the context (or which user) you’re referring to, but would it be surprising for a pro-China user to be in China?
Not terribly, but it would be a little more surprising if they talked from a Canadian perspective yet seemed more interested in Chinese interests than they were in Canadian interests. That also ignores the large Asian population in Canada who may still have ties with or fondness for China.