Battling the Bots

Which bots should I welcome onto my site, and which should I block? Ideally, I’d like to allow only good bots and block all others. This turns out to be impractical to achieve, but I’ll do the best I can.

The first step is figuring out which bots are “good”. Cloudflare makes it easy for me, provided I am willing to accept their judgement. Cloudflare provides a firewall field called “Known Bots”. It includes all the bots that are “good” as far as Cloudflare is concerned – all the major search engines plus some others.

I can set my first Cloudflare firewall rule to …
When incoming requests match…
Known Bots equals On
Then ...
Allow

This will white-list “Known Bots”, so they will not be blocked or otherwise impacted by subsequent firewall rules.
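To make the ordering concrete, here is a minimal Python sketch of first-match rule evaluation. It is only a model of the behaviour described above, not Cloudflare’s actual implementation: the Request class, the known_bot flag (a stand-in for the “Known Bots” field), the evaluate function and the sample user agents are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Request:
    user_agent: str
    known_bot: bool  # hypothetical stand-in for the "Known Bots" field


# A rule is a (matcher, action) pair; rules are checked in order.
Rule = Tuple[Callable[[Request], bool], str]


def evaluate(rules: List[Rule], request: Request) -> str:
    """Return the action of the first rule that matches, or "pass"."""
    for matches, action in rules:
        if matches(request):
            return action  # an earlier Allow shields the request from later rules
    return "pass"


rules: List[Rule] = [
    (lambda r: r.known_bot, "allow"),            # rule 1: white-list known bots
    (lambda r: "bot" in r.user_agent, "block"),  # a later, stricter rule
]

print(evaluate(rules, Request("examplebot/1.0", known_bot=True)))
print(evaluate(rules, Request("examplebot/1.0", known_bot=False)))
```

The same user agent gets a different outcome depending only on whether the first rule matched, which is why the Allow rule has to come first.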

With good bots white-listed, I’ll block as many of the rest as I can in my next firewall rule …
When incoming requests match…
User Agent contains crawl
or
User Agent contains Crawl
or
User Agent contains bot
or
User Agent contains Bot
or
User Agent contains spider
or
User Agent contains Spider
or
User Agent contains spyder
or
User Agent contains Spyder
Then ...
Block

This will block a big chunk – by no means all – of the remaining bots. It’s a start.
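The rule above can be approximated in a few lines of Python to see what it catches and what slips through. This is a sketch under one assumption: that “contains” is a case-sensitive substring match, which is why the rule lists both capitalizations of each word. The function name is_blocked and the sample user agents are made up for illustration.

```python
# The eight substrings from the firewall rule, matched case-sensitively.
BLOCKED_SUBSTRINGS = (
    "crawl", "Crawl",
    "bot", "Bot",
    "spider", "Spider",
    "spyder", "Spyder",
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent contains any blocked substring."""
    return any(term in user_agent for term in BLOCKED_SUBSTRINGS)


# Hypothetical user agents, for illustration only.
samples = [
    "ExampleCrawler/1.0",  # contains "Crawl"
    "examplebot/2.0",      # contains "bot"
    "EXAMPLEBOT/1.0",      # all caps: no listed substring matches
    "curl/8.0",            # a script that does not announce itself as a bot
]

for ua in samples:
    print(ua, "->", "block" if is_blocked(ua) else "pass")
```

The last two samples show why this rule catches only part of the remaining bots: anything that uses unusual capitalization, or simply does not call itself a bot, crawler or spider, passes straight through.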
