Friday, December 27, 2013

WordPress Spammer Mitigation

If you run servers that host WordPress sites, no doubt you're eventually going to run into issues with resource consumption due to spammers.  There are a lot of great mitigation options, including:

  • CAPTCHA plugins
  • Anti-spam plugins (like Akismet)
  • Security plugins or configurations (black/whitelists, .htaccess rules, etc)
These are all great, and I encourage you to look into them, but for the scope of this post I'm going to go hardball and just straight up block whole subnets using some Apache log analysis and iptables rules.  You're probably not going to want to do anything this extreme, but bits of this may be useful to you.

One of the sites that I run servers for at $WORK has been getting pounded by comment spammers over the last few weeks.  This hasn't been much of a problem for the webservers, but the database server has been under some really high load to keep up with all the transactions.  Knowing the site pretty well, I can make a few assumptions.

The Assumptions: 
  • The largest portion of our traffic is from internal IPs
  • Most IPs won't be commenting more than a handful of times in 24 hours
Using these assumptions, I can make a very rough educated guess about each IP that shows up in the logs.

In order to post a comment, an IP will submit an HTTP POST to "wp-comments-post.php".  That script has nothing there if you browse to it, so there aren't any legit HTTP GET requests.  Any time "wp-comments-post.php" shows up in the logs, it's going to be a comment.  Using this, we can get a list of everyone who's posted a comment.

# Search for wp-comments-post.php; print the IP of the user

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)>

That's kind of useful, but let's also sort the IPs and count how many times they've posted.

# Same as before, but sorted by number of occurrences of an IP

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)> | sort | uniq -c
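
The pattern above matches any log line that mentions the script, whatever the request method. If crawlers or probes requesting the URL with GET turn out to inflate the counts, a stricter variant can require a POST. This is only a minimal sketch: it assumes the common/combined Apache log format (quoted method in field 6, path in field 7), and the function name is hypothetical.

```shell
# Hypothetical helper: reads access-log lines on stdin and prints the client IP
# of every POST to wp-comments-post.php.
# Assumes the common/combined log format: $6 is the quoted method, $7 the path.
comment_post_ips() {
  awk '$6 == "\"POST" && $7 ~ /wp-comments-post\.php/ {print $1}'
}
```

Feed it a log on stdin and pipe the result through sort | uniq -c exactly as before.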


So already we have a likely suspect, given that it's only noon and they've commented 22 times from one IP.  But it's borderline; it will have to get a lot worse for me to count it.  We can handle 22 comments.

However, look at how close some of those subnets are.  Sure, there are thousands of IPs in each, but given that most of our traffic will be coming from internal IPs, that's sort of suspicious.  And looking at the full list, there are thousands of IPs that are really close together and they're all commenting.  Suspicious.  Spammers are sneaky.  We can, though, update our assumption list.

The (New) Assumptions: 
  • The largest portion of our traffic is from internal IPs
  • Most IPs won't be commenting more than a handful of times in 24 hours
  • ...which means most external /24 subnets won't comment more than a few dozen times in 24 hours
So, let's set the bar high and say if an external /24 subnet writes more than 200 comments in 24 hours, they're on the list of suspected spammers.  We can find these guys.

# Search for wp-comments-post.php 
# and break the IPs into /24 subnets instead

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)> |awk -F. '{print $1"."$2"."$3".0/24"}'

The second awk command there splits each IP into fields using a period (.) as the separator (the "-F.") and then prints the first three octets followed by ".0/24" to represent its /24 subnet.
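
Since the bar only applies to external subnets, it may help to drop internal addresses before grouping. The sketch below assumes "internal" means the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), which may not be true of your network; adjust the test to whatever ranges you actually use. The function name is hypothetical.

```shell
# Hypothetical helper: reads one IPv4 address per line on stdin and prints
# only those outside the RFC 1918 private ranges.
# Assumption: "internal" means RFC 1918 space; swap in your own ranges if not.
drop_private_ips() {
  awk -F. '!($1 == 10 ||
             ($1 == 172 && $2 >= 16 && $2 <= 31) ||
             ($1 == 192 && $2 == 168))'
}
```

It slots in between the two awk commands, so only external IPs get rolled up into subnets.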

From there, let's sort the subnets, count the number of occurrences, and sort again by the volume of occurrences:

# Break up comments into subnets 
# and sort by the number of comments per subnet

$ awk '/wp-comments-post.php/ {print $1}' <your_apache_access_log(s)> | awk -F. '{print $1"."$2"."$3".0/24"}' | sort | uniq -c | sort -nb -k1

The second sort does the sort on the first column (-k1) and interprets them numerically (the "n" in -nb) and ignores leading blanks (the "b" in -nb).  This is my new output:


Looks like we have some likely suspects.  The subnet of the earlier 22-comment IP didn't make the list of the top 8, so it must have been borderline enough.  I'll go ahead and block those top three (seriously - 549 comments?) and consider the others borderline.  Our other spam mitigation techniques can handle them if they're spamming and not legit.
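
Rather than eyeballing the sorted list, the 200-comment bar can be applied mechanically. A minimal sketch, assuming the input is the "count subnet" output of sort | uniq -c; the function name and its optional argument are hypothetical.

```shell
# Hypothetical helper: reads "count subnet" lines (as produced by uniq -c)
# on stdin and prints the subnets whose count exceeds the limit.
# $1: the limit, defaulting to the 200 comments chosen above.
over_threshold() {
  awk -v limit="${1:-200}" '$1 > limit {print $2}'
}
```

Appending it to the pipeline above leaves just the subnets to consider blocking, one per line.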

We're running puppet on all our servers, and I've included a module that will take IPs in an array and add a rule for each in the iptables for all the servers, so these guys get banned from this server, and all the others we're running as well:

REJECT     tcp  --           tcp /* block-spammers */ reject-with icmp-port-unreachable 
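
If you don't have Puppet in the mix, a rule of that shape can be added by hand. The sketch below is not the Puppet module mentioned above; it is a hypothetical helper that only prints one iptables command per subnet so you can review them before running anything as root. It assumes the INPUT chain is where you want the rules and reuses the "block-spammers" comment from the listing.

```shell
# Hypothetical helper: reads one subnet per line on stdin and prints (does not
# run) an iptables command that rejects TCP traffic from it.
# Assumption: rules belong in the INPUT chain; blank lines are skipped.
print_block_rules() {
  while read -r subnet; do
    [ -n "$subnet" ] || continue
    printf 'iptables -A INPUT -s %s -p tcp -m comment --comment "block-spammers" -j REJECT --reject-with icmp-port-unreachable\n' "$subnet"
  done
}
```

Printing instead of executing is deliberate: a typo in a subnet is a lot cheaper to catch before it becomes a firewall rule.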

This is just an example of parsing logs and acting on the data we can obtain from them, and it's admittedly an extremely heavy-handed approach.  You should, of course, tailor this to your own environment.  If you're running a popular site with tons of legit comments, using this as-is would be a very bad move.

SSL Key and Certificate Matching


The Symptoms:
  • Apache fails to start or restart
  • No errors on STDERR
  • No errors in logs
  • You've recently changed an SSL certificate or the SSL config for a Virtual Host


When this happens to me, 99% of the time my SSL key and certificate do not match for some reason (old key with new cert, copy error, vhost typo, etc).  Apache is really not helpful when this occurs. 

"Hey, $JUNIORADMIN!  I'm just not going to start.  Oh, you want error logs?  No, I think I'll skip that.  An error on STDERR?  Nope.  None of that either.  In fact, I think I'll just sit here doing nothing and silently mocking you."

Thanks Apache.

Check that your key and certificate match.  You can do this by printing the modulus of each and comparing them; the pair matches only if the two values are identical.  To do this with openssl:

# Check the Key
$ openssl rsa -noout -modulus -in <your_ssl_key_file>


# Check the Cert
$ openssl x509 -noout -modulus -in <your_ssl_cert_file>
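
The moduli are long hex strings, so comparing them by eye is error-prone. One option is to hash each modulus and compare the much shorter digests instead. A minimal sketch with a hypothetical function name; like the commands above, it only applies to RSA keys.

```shell
# Hypothetical helper: prints a SHA-256 digest of the modulus of a key or cert.
# $1: "rsa" for a private key, "x509" for a certificate
# $2: path to the file
modulus_digest() {
  modulus=$(openssl "$1" -noout -modulus -in "$2") || return 1
  printf '%s\n' "$modulus" | openssl sha256
}
```

Run it once with rsa and your key file and once with x509 and your certificate file; if the two digests are identical, the key and certificate belong together.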


That's kind of tedious, and I've gotten into the habit of checking them every time I make a change to the key and certificate, so I found it easier to make a quick bash script to do this for me.

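#!/bin/bash
# Compare the modulus of an SSL key ($1) with that of a certificate ($2)
KEY="$1"
CRT="$2"

KEY_MOD=$(openssl rsa -noout -modulus -in "$KEY")
CRT_MOD=$(openssl x509 -noout -modulus -in "$CRT")

if [ "$KEY_MOD" != "$CRT_MOD" ] ; then
  echo "No Match"
  exit 1
fi
echo "Key and Certificate match"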

Pass the script two arguments, first the key, then the certificate, and it'll compare the two strings for you.