Jeff Starr of Perishable Press offers what appears to be an excellent, free Blackhole for Bad Bots plugin. Unfortunately it does not work with all cache setups, and I use some pretty aggressive caching to boost site speed; I have not been able to get Jeff’s plugin to work for me. So I decided to make a much-simplified, less automated version that will require an ongoing bit of my time but will hopefully thwart naughty bots.
The idea is to create a honeypot page, place a hidden link to it in the sidebar of every page and post, and disallow bots from visiting the honeypot page. Humans will not click the link because it is hidden. Good bots will not visit the page because it is disallowed. The only visitors should be bad bots that I can detect and ban.
Step one is to create the honeypot page. I title it something like “Bad Bot Trap” (not the real title – I don’t want to post the real title for fear of curious readers visiting it and thus getting banned). It looks something like this …
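The page body needs no real content – just a warning for any human who somehow lands on it. A minimal sketch (the title and wording here are placeholders, not my real ones):

```html
<!-- Honeypot page: no useful content, just a warning for stray humans -->
<h1>Bad Bot Trap</h1>
<p>This page exists only to catch misbehaving crawlers. If you are a human
and somehow ended up here, leave now – visiting this page gets you banned.</p>
```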
Step next is to disallow bots from visiting the new page. I log into my cPanel and add a line to my robots.txt file, something like …
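Assuming the trap lives at /bad-bot-trap/ (the same placeholder slug used in the hidden link), the robots.txt entry would look like:

```
User-agent: *
Disallow: /bad-bot-trap/
```

Well-behaved bots read this and skip the page; bad bots ignore it and walk straight into the trap.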
Step next is to place a hidden link in the sidebar of my pages. I add a Custom HTML widget to my default sidebar. The widget has a single line, something like …
<a rel="nofollow" style="display:none;" href="https://wppov.com/bad-bot-trap/">Do NOT follow this link or you will be banned from the site!</a>
Step next is on CloudFlare. I set up a Firewall Rule called Bad Bot Honeypot …
When incoming requests match | URI Path | contains | /bad-bot-trap | Then | Allow
The action is ‘Allow’, so the rule on its own does nothing – except log the event.
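In CloudFlare’s expression editor, the same rule can be written as a filter expression (a sketch using the placeholder slug):

```
http.request.uri.path contains "/bad-bot-trap"
```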
Step ongoing – this is where the ongoing bit of my time comes in, and I don’t know yet if it is sustainable – I need to examine my CloudFlare firewall log on a regular basis and check each violation of the Bad Bot Honeypot rule. If a visitor looks like an evil bot (I kinda expect they all will), I can then set a new firewall rule to block its IP or user agent.
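To make the regular check less tedious, trap hits can be triaged with a short script. A minimal sketch in Python, assuming the firewall events have been exported as a list of records with ruleName, clientIP, and userAgent fields (those field names are my assumption for illustration, not CloudFlare’s documented export format):

```python
def list_trap_hits(events, rule_name="Bad Bot Honeypot"):
    """Group honeypot hits by (IP, user agent) and count them, busiest first."""
    hits = {}
    for event in events:
        # Only count events that triggered the honeypot rule
        if event.get("ruleName") != rule_name:
            continue
        key = (event.get("clientIP"), event.get("userAgent"))
        hits[key] = hits.get(key, 0) + 1
    return sorted(hits.items(), key=lambda kv: -kv[1])

# Made-up sample log entries for illustration
sample = [
    {"ruleName": "Bad Bot Honeypot", "clientIP": "203.0.113.7",
     "userAgent": "Scrapy/1.5.0"},
    {"ruleName": "Bad Bot Honeypot", "clientIP": "203.0.113.7",
     "userAgent": "Scrapy/1.5.0"},
    {"ruleName": "Other Rule", "clientIP": "198.51.100.2",
     "userAgent": "Mozilla/5.0"},
]

for (ip, agent), count in list_trap_hits(sample):
    print(ip, agent, count)
```

Anything that shows up in the output is a candidate for a block rule.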
Update: 2019-03-17: Kinda interesting – and a bit unexpected – after 48 hours my bad bot honeypot has caught exactly … 0 bad bots. The trap is working – I can trigger it by accessing the page, but no bots are falling into it. I’ll keep an eye on it for about a week then decide if it is even worth the trouble of ongoing monitoring. In either case I’ll leave this post up in case anyone wants to try it out. Also, I like the gal in the top image.
Update: (2 hours later): Caught one! Grapeshot Crawler. A bit of googling indicates it is a benign bot that respects robots.txt. I probably didn’t allow enough time for bots to check the new robots.txt.
Update: 2019-03-24: Grapeshot Crawler is still falling into the trap – banned! Other banned bots to date … Baiduspider, [email protected], DnyzBot, Go-http-client, Nimbostratus, Scrapy, SemrushBot, SeznamBot, Sogou web spider, spbot, and WebDAV-MiniRedir.
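The block rules themselves can be simple CloudFlare firewall expressions with the action set to ‘Block’, e.g. matching a couple of the user agents above (the IP here is a documentation placeholder, not a real offender):

```
(http.user_agent contains "Scrapy") or (http.user_agent contains "SemrushBot") or (ip.src eq 203.0.113.7)
```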
Weird, though … a few seemingly legitimate users are falling into the trap. There is no reason that should happen, and I haven’t figured it out. Either these are bad bots cleverly disguised as legitimate users, or there is something happening that I don’t yet understand. The latter seems more likely.