Post: github, spam, tech | May 18, 2016

Using reCAPTCHA to handle spam misclassification

Today’s leading spam filter technologies offer a very high degree of accuracy. In this post I’ll describe the current state of spam classification and propose an innovative method that lets senders report false positives once they pass a CAPTCHA test. This can significantly improve both senders’ and recipients’ satisfaction, while also reducing the burden on administrators and support staff. Let’s start with a brief history of anti-spam.



The terminology we normally use is:

  • False positive, a blocked desired (legitimate) email (“ham”)
  • False negative, a missed spam that slipped through the filters into a user’s mailbox

Historically, spam filters had poor accuracy and low performance, and email was scanned only after it had been accepted (probably as a consequence of those limitations). Unable to reject email, the filters instead offered actions such as moving suspected spam to a junk folder or quarantine, or tagging its subject line.

This, I believe, significantly damaged people’s trust in email as a reliable transport, simply because it makes legitimate (and potentially important) email disappear.

Today’s leading spam classification technologies, however, offer both high accuracy and high performance. Many of them, including Cyren (which we use), use fuzzy checksums (or “patterns”) to measure and classify email in a distributed, collaborative fashion. By constantly updating the hashing logic, anti-spam vendors can adapt as spammers evolve their tactics. Because these systems primarily look at individual spam “outbreaks”, their false positive rate is generally low. This is key, since people tend to be much less bothered by a few false negatives (missed spam) than by having desired email blocked.
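
Cyren’s actual technology isn’t described in this post, so the following is only a toy illustration of the general idea behind a fuzzy checksum. It normalizes away details that spammers tend to vary between copies (case, numbers, URLs, whitespace) and then hashes what remains, so that variants of the same outbreak produce the same checksum. The function names, the in-memory report counter (standing in for a shared, collaborative database) and the threshold of 3 are all hypothetical.

```python
import hashlib
import re
from collections import Counter


def fuzzy_checksum(body: str) -> str:
    """Toy fuzzy checksum: normalize varying details, then hash the rest."""
    text = body.lower()
    text = re.sub(r"https?://\S+", "<url>", text)  # collapse URLs
    text = re.sub(r"\d+", "<num>", text)           # collapse numbers
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for a shared store: how often each checksum was reported.
outbreaks = Counter()


def report(body: str) -> int:
    """Record one report of a message and return the checksum's report count."""
    digest = fuzzy_checksum(body)
    outbreaks[digest] += 1
    return outbreaks[digest]


def is_outbreak(body: str, threshold: int = 3) -> bool:
    """Treat a message as part of an outbreak once enough reports share its checksum."""
    return outbreaks[fuzzy_checksum(body)] >= threshold
```

With this normalization, “Only 99 dollars at http://a.example/1” and “ONLY 49 dollars at https://b.example/zz” map to the same checksum. Real engines use far more robust techniques that spammers cannot sidestep this easily.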

The high accuracy and performance also make rejecting spam (rather than accepting it) a viable option. Rejecting spam is arguably superior to accepting and quarantining it, because the sender is informed that the email was not delivered to the recipient’s inbox. It reestablishes email as a reliable (transactionally safe) transport, while a copy of the rejected spam can still be retained in a quarantine or junk folder. Halon has advocated this approach for a long time, and it’s a prerequisite for efficient feedback and reporting mechanisms like the one I’m going to describe now.
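
Rejecting during the SMTP transaction means the receiving server answers with a permanent 5xx reply instead of accepting the message. The sending server then reports the failure back to the sender. The sketch below only formats such a reply following the RFC 5321 multi-line convention (`550-` on every line but the last, `550 ` on the last) and repeats an RFC 3463 enhanced status code on each line. The 550 code, the `5.7.1` enhanced code and the wording are illustrative choices, not necessarily what Halon sends.

```python
def smtp_reject_reply(text_lines, code=550, enhanced="5.7.1"):
    """Build a (possibly multi-line) permanent SMTP rejection reply.

    Every line but the last uses '<code>-', the last uses '<code> ',
    and the enhanced status code is repeated on each line.
    """
    if not 500 <= code <= 599:
        raise ValueError("a permanent rejection needs a 5xx code")
    if not text_lines:
        raise ValueError("at least one line of reply text is required")
    out = []
    last = len(text_lines) - 1
    for i, line in enumerate(text_lines):
        if "\r" in line or "\n" in line:
            raise ValueError("reply text must not contain line breaks")
        sep = " " if i == last else "-"
        out.append(f"{code}{sep}{enhanced} {line}\r\n")
    return "".join(out)


# Example (illustrative wording): an informative two-line rejection.
reply = smtp_reject_reply([
    "Message rejected as spam.",
    "Contact the recipient if this is an error.",
])
```

Keeping the explanation in the reply text matters because it ends up in the bounce the sender receives. That bounce is the sender’s only hint about what happened and what to do next.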

Using CAPTCHA to handle false positives

I believe our default approach, rejecting spam with a permanent SMTP error (5xx) and an informative error message while storing a copy in a quarantine or junk folder, is superior to a traditional quarantine. Still, there is certainly room for improvement. For example, the sender needs to contact the recipient by some other means (an alternative email address or phone number, which they might not have), the quarantine might consume a significant amount of disk space, and the recipient might need to bother the support staff.
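
One way to remove the need for a separate contact channel is to put a link in the rejection text that points at the quarantined copy. Below is a minimal sketch of such a link, assuming a hypothetical base URL, a hypothetical server-side secret and a quarantine ID for the stored message. The HMAC signature prevents outsiders from guessing or tampering with IDs. This illustrates the idea only and is not necessarily how sender-fp-release builds its links.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"change-me"                            # hypothetical server-side secret
BASE_URL = "https://release.example.com/report"  # hypothetical release page


def _sign(msg_id: str, recipient: str) -> str:
    """HMAC-SHA256 over the quarantine ID and recipient."""
    data = f"{msg_id}\n{recipient}".encode("utf-8")
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()


def release_link(msg_id: str, recipient: str) -> str:
    """Link for the rejection text, tied to one quarantined copy."""
    query = urlencode({"id": msg_id, "rcpt": recipient, "sig": _sign(msg_id, recipient)})
    return f"{BASE_URL}?{query}"


def verify_link(url: str):
    """Return (msg_id, recipient) if the link is authentic, else None."""
    params = parse_qs(urlparse(url).query)
    try:
        msg_id = params["id"][0]
        recipient = params["rcpt"][0]
        sig = params["sig"][0]
    except (KeyError, IndexError):
        return None
    # Constant-time comparison avoids leaking signature prefixes via timing.
    if not hmac.compare_digest(sig, _sign(msg_id, recipient)):
        return None
    return msg_id, recipient
```

The resulting URL would be included as one line of the rejection reply. When the sender follows it, the release page calls `verify_link` before showing the CAPTCHA.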

We’ve developed a self-service false-positive report and release project, simply called sender-fp-release, to address those shortcomings. As its GitHub page explains, it allows senders to report false positives directly to the recipient after completing a reCAPTCHA.
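
The server-side half of a reCAPTCHA check is a POST to Google’s `siteverify` endpoint with the site’s secret key and the token that the sender’s browser submitted. An optional `remoteip` can also be included. The JSON answer contains a `success` flag and, on failure, optional `error-codes`. The sketch below uses only the Python standard library. The function name and the injectable `opener` parameter (useful for testing without network access) are our own, and sender-fp-release itself may be implemented differently.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_recaptcha(secret, token, remote_ip=None, opener=urllib.request.urlopen):
    """Ask Google whether a reCAPTCHA token submitted by the sender is valid.

    Returns (success, error_codes).
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    # Passing data makes urlopen send a form-encoded POST.
    data = urllib.parse.urlencode(fields).encode("ascii")
    with opener(SITEVERIFY_URL, data=data, timeout=10) as resp:
        result = json.load(resp)
    return bool(result.get("success")), result.get("error-codes", [])
```

A report would be accepted, and the recipient notified, only when `success` is true. Fields such as `hostname` in the same response can be checked as an extra safeguard.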


In our experience, this system is a win for everybody:

  • The sender doesn’t need to manually contact the recipient, only complete a CAPTCHA
  • The recipient gets notified instantly, instead of having to browse through a junk folder
  • The helpdesk doesn’t need to do anything

Additionally, it saves disk space by only retaining spam for a short time (for example 1 day), unless the sender reports it. The retention time for reported email is extended (typically a week or two), giving the recipient plenty of time to release the email.
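
The retention rule above fits in a few lines. Below is a minimal sketch, assuming each quarantined copy records when it was rejected and, if applicable, when the sender reported it. The 1-day and 14-day values come from the examples in the text. Measuring the extended period from the time of the report is our own assumption, as are the function names and the dictionary layout.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_RETENTION = timedelta(days=1)    # unreported spam (example value from the text)
REPORTED_RETENTION = timedelta(days=14)  # reported by the sender ("a week or two")


def expires_at(rejected_at: datetime, reported_at: Optional[datetime] = None) -> datetime:
    """When a quarantined copy may be purged."""
    if reported_at is None:
        return rejected_at + DEFAULT_RETENTION
    # Assumption: the extended period counts from the time of the report.
    return max(rejected_at + DEFAULT_RETENTION, reported_at + REPORTED_RETENTION)


def purge(quarantine: dict, now: datetime) -> list:
    """Delete expired entries; quarantine maps id -> (rejected_at, reported_at)."""
    expired = [mid for mid, (rej, rep) in quarantine.items() if expires_at(rej, rep) <= now]
    for mid in expired:
        del quarantine[mid]
    return expired
```

Running `purge` periodically keeps disk usage bounded by roughly one day of rejected spam, plus the comparatively few messages that senders actually report.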


If you believe that your spam handling could be improved, please take a look at the project, or maybe give it a spin.