Friday, December 27, 2013

WordPress Spammer Mitigation

If you run servers that host WordPress sites, no doubt you're eventually going to run into issues with resource consumption due to spammers.  There are a lot of great mitigation options, including:

  • CAPTCHA plugins
  • Anti-spam plugins (like Akismet)
  • Security plugins or configurations (black/whitelists, .htaccess rules, etc.)
These are all great, and I encourage you to look into them, but for the scope of this post I'm going to go hardball and just straight up block whole subnets using some Apache log analysis and iptables rules.  You're probably not going to want to do anything this extreme, but bits of this may be useful to you.

One of the sites that I run servers for at $WORK has been getting pounded by comment spammers over the last few weeks.  This hasn't been much of a problem for the webservers, but the database server has been under some really high load to keep up with all the transactions.  Knowing the site pretty well, I can make a few assumptions.

The Assumptions: 
  • The largest portion of our traffic is from internal IPs
  • Most IPs won't be commenting more than a handful of times in 24 hours
Using these assumptions, I can make a very rough educated guess about each IP that shows up in the logs.

In order to post a comment, a client submits an HTTP POST to "wp-comments-post.php".  That script returns nothing useful if you browse to it, so there shouldn't be any legit HTTP GET requests for it.  Any time "wp-comments-post.php" shows up in the logs, it's almost certainly a comment.  Using this, we can get a list of everyone who's posted a comment.

# Search for wp-comments-post.php; print the IP of the user

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)>

That's kind of useful, but let's also sort the IPs and count how many times they've posted.

# Same as before, but sorted by number of occurrences of an IP

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)> | sort | uniq -c


So already we have a likely suspect, given that it's only noon and they've commented 22 times from one IP.  But it's borderline.  It will have to get a lot worse for me to count it.  We can handle 22 comments.
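If you'd rather not rely on every log hit for the script being a comment, one possible refinement is to count only actual POST requests.  This is a minimal sketch, assuming Apache's common/combined log format (where the sixth whitespace-separated field is the quoted method and the seventh is the request path).  The helper name count_comment_posts and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: count comment POSTs per IP.
# Assumes common/combined log format: $1 = client IP, $6 = "POST, $7 = path.
count_comment_posts() {
    awk '$6 == "\"POST" && $7 ~ /wp-comments-post\.php/ {print $1}' "$@" |
        sort | uniq -c
}

# Made-up sample log using documentation-reserved addresses.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.5 - - [27/Dec/2013:12:00:01 -0500] "POST /wp-comments-post.php HTTP/1.1" 302 0
203.0.113.5 - - [27/Dec/2013:12:00:02 -0500] "POST /wp-comments-post.php HTTP/1.1" 302 0
198.51.100.7 - - [27/Dec/2013:12:00:03 -0500] "GET /wp-comments-post.php HTTP/1.1" 405 0
198.51.100.9 - - [27/Dec/2013:12:00:04 -0500] "POST /wp-comments-post.php HTTP/1.1" 302 0
EOF

# The GET from 198.51.100.7 is skipped; only the POSTs are counted.
post_counts=$(count_comment_posts "$sample_log")
echo "$post_counts"
rm -f "$sample_log"
```

If your LogFormat differs from the default, the field numbers will need adjusting.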

However, look at how close some of those subnets are.  Sure, there are thousands of IPs in each, but given that most of our traffic will be coming from internal IPs, that's sort of suspicious.  And looking at the full list, there are thousands of IPs that are really close together and they're all commenting.  Suspicious.  Spammers are sneaky.  We can, though, update our assumption list.

The (New) Assumptions: 
  • The largest portion of our traffic is from internal IPs
  • Most IPs won't be commenting more than a handful of times in 24 hours
  • ...which means most external /24 subnets won't comment more than a few dozen times in 24 hours
So, let's set the bar high and say if an external /24 subnet writes more than 200 comments in 24 hours, they're on the list of suspected spammers.  We can find these guys.

# Search for wp-comments-post.php 
# and break the IPs into /24 subnets instead

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)> | awk -F. '{print $1"."$2"."$3".0/24"}'

The second awk command there breaks each IP into fields by using a period (.) as the separator (the "-F.") and then prints the first three octets followed by ".0/24" to represent its subnet.
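To see the field splitting in isolation, here is a small sketch that runs a single made-up address through the same awk expression.  The function name to_subnet24 is hypothetical, and this only makes sense for IPv4 addresses.

```shell
# Hypothetical wrapper around the same awk expression used above.
# Reads IPv4 addresses on stdin, prints the enclosing /24 for each.
to_subnet24() {
    awk -F. '{print $1"."$2"."$3".0/24"}'
}

# 203.0.113.57 is a documentation-reserved example address.
subnet=$(echo "203.0.113.57" | to_subnet24)
echo "$subnet"   # prints 203.0.113.0/24
```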

From there, let's sort the subnets, count the number of occurrences, and sort again by the volume of occurrences:

# Break up comments into subnets 
# and sort by the number of comments per subnet

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)> | awk -F. '{print $1"."$2"."$3".0/24"}' | sort | uniq -c | sort -nb -k1

The second sort does the sort on the first column (-k1) and interprets them numerically (the "n" in -nb) and ignores leading blanks (the "b" in -nb).  This is my new output:


Looks like we have some likely suspects.  The subnet of the 22-comment IP from earlier didn't make the list of the top 8, so it must have been borderline enough.  I'll go ahead and block those top three (seriously - 549 comments?) and consider the others borderline.  Our other spam mitigation techniques can handle them if they're spamming and not legit.
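The 200-comment bar from earlier could also be applied mechanically rather than by eye.  The following is a rough sketch, assuming input in the "count subnet" shape that uniq -c produces.  The names THRESHOLD and over_threshold are hypothetical, and the counts fed in below are made up for illustration.

```shell
# Hypothetical threshold filter for "uniq -c" style input (count, then subnet).
THRESHOLD=200

over_threshold() {
    # Print only the subnet column for rows whose count exceeds the limit.
    awk -v limit="$THRESHOLD" '$1 > limit {print $2}'
}

# Made-up counts; exactly 200 is not "more than 200", so it is not flagged.
flagged=$(printf '%s\n' \
    '     12 198.51.100.0/24' \
    '    200 192.0.2.0/24' \
    '    549 203.0.113.0/24' | over_threshold)
echo "$flagged"
```

In practice you would pipe the output of the sort | uniq -c pipeline into over_threshold, and still eyeball the result before blocking anything.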

We're running Puppet on all our servers, and I've included a module that takes an array of IPs and adds an iptables rule for each one on every server, so these guys get banned from this server and all the others we're running as well:

REJECT     tcp  --           tcp /* block-spammers */ reject-with icmp-port-unreachable 
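If you aren't using Puppet, a rule of roughly that shape could be generated by hand.  This is only a sketch and not the module described above: the helper name print_block_rules is hypothetical, the INPUT chain is an assumption, and the function just prints the commands so you can review them before running anything as root.

```shell
# Hypothetical helper: read subnets on stdin and print (not run) one
# iptables command per subnet. The INPUT chain is an assumption.
print_block_rules() {
    while read -r subnet; do
        [ -n "$subnet" ] || continue   # skip blank lines
        printf 'iptables -A INPUT -s %s -p tcp -m comment --comment "block-spammers" -j REJECT --reject-with icmp-port-unreachable\n' "$subnet"
    done
}

# 203.0.113.0/24 is a documentation-reserved example subnet.
rules=$(printf '%s\n' '203.0.113.0/24' | print_block_rules)
echo "$rules"
```

Printing instead of executing keeps a mistyped subnet from locking out legitimate traffic (or yourself).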

This is just an example of parsing logs and acting on the data we can obtain from them, and it's admittedly an extremely heavy-handed approach.  You should, of course, tailor this to your own environment.  If you're running a popular site with tons of legit comments, using this as-is would be a very bad move.
