Author: Erik Lax

The birth of Halon’s scripting language

April 28th marks the date for Halon’s 10th anniversary and I would like to share with you the story about Halon’s scripting language, HSL. In order to understand why we created our own scripting language you have to look back at what it was intended to do, and the landscape of embeddable languages in 2007.

HSL started out as an idea of having a dynamic configuration. We wanted people to easily be able to weigh the results of different anti-spam engines (Cyren’s RPD and SpamAssassin) against each other. Hence, we came up with the idea of a simple language with functions: ScanRPD, returning the spam score from the Cyren engine, and ScanSA, returning the result of SpamAssassin. The configuration could look like:

if (ScanSA() > 5 and ScanRPD() > 0) Reject();
if (ScanSA() > 3 and ScanRPD() >= 50) Reject();

In order to facilitate this, we needed a simple scripting language. At the time, the intent was not to allow any general purpose programming features. We didn’t even want loops, in order to prevent runaway programs.

Creating a domain-specific scripting language

If you’re not into programming languages, I should explain that creating a simple domain-specific scripting language is easy. There are tons of guides and it doesn’t take more than a few lines until you get simple arithmetic to work (5 + 6). The hardest and most important part of creating a language is the design, also called the syntax. You want to make it as easy as possible to read and write.
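To give a feeling for how little is needed before simple arithmetic works, here is a minimal sketch of a recursive-descent evaluator. It is written in Python for illustration only; it is not how HSL is implemented, and all names in it are made up for this example.

```python
import re

# One token is either a number or any single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(.))")


def tokenize(source):
    """Split the source text into ("num", value) and ("op", char) tokens."""
    tokens = []
    for number, operator in TOKEN.findall(source):
        if number:
            tokens.append(("num", float(number)))
        elif operator.strip():
            tokens.append(("op", operator))
    return tokens


class Parser:
    """Hypothetical toy parser: expression -> term -> factor."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # Lowest precedence: addition and subtraction.
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.take()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # Higher precedence: multiplication and division.
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.take()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # A number or a parenthesised expression.
        kind, value = self.take()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            inner = self.expression()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token: {value!r}")


def evaluate(source):
    parser = Parser(tokenize(source))
    result = parser.expression()
    if parser.peek() != (None, None):
        raise SyntaxError("trailing input")
    return result


print(evaluate("5 + 6"))            # 11.0
print(evaluate("2 * (3 + 4) - 1"))  # 13.0
```

Getting this far is the easy part; deciding what the language should look like is where the real work is.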

Domain-specific languages are not a new phenomenon, as they have existed in a lot of different applications. I believe that custom application scripting DSLs are getting less common today, as a few selected embeddable scripting language engines are gaining more traction. A few years ago you would probably pick Lua as the embedded language of choice, while nowadays JavaScript (V8) is the language everyone knows.

Why not choose an established scripting language?

Over the years, people have asked me why we developed our own language instead of using e.g. Sieve, Lua or JavaScript. Here’s why:

  • Sieve (RFC 3028) could technically have been an alternative, but in 2007 we hadn’t heard of it. It crossed our path a few years later. Speaking against it: Sieve was created by Mirapoint, an email gateway competitor at the time. Looking back, it was probably good that we didn’t end up using Sieve. Having our own language let our platform evolve way beyond Sieve, and beyond what you would expect of a traditional email gateway.
  • Lua just didn’t happen, and I suspect that if we had considered it, it would have been too large and unfamiliar as a language for our initial goal. Not to mention the fact that arrays start at one 😃.
  • JavaScript just wasn’t that common as an embeddable language, and V8 hadn’t been released at the time. And to be honest, in 2007 no one expected JavaScript to be where it is today.

Easy to learn and easy to build upon

Today we try to make HSL as familiar and easy to learn as possible, which is really important when you have a custom language. Everything we add or change follows the principle of least surprise. The language has copied a lot of syntax and good ideas from different languages. It may look a lot like PHP, and may even be mistaken for PHP, while other major concepts come from JavaScript and Python. Development of new language features is in many cases intentionally slow, as they need to be well thought through. From a language designer’s perspective I would say that there isn’t much syntax in HSL that I don’t like. However, we continuously add modern features. In the last year or two, a lot of time has been put into the language and it has gained features such as closures, classes and modules. They make the language easily extensible, so that you can build reusable modules on top of it. Our entire examples collection on GitHub can be imported as modules, and a lot of them are written as classes.

One of the most innovative features of HSL is the cache statement, as it allows you to cache the result of any function call based on the input arguments. Sure, the same functionality can be built in other ways, but having such a powerful tool so close at hand in HSL makes it stand out. It gets really neat when you do network lookup queries, such as API lookups using http() or ldap_search().

cache [] http("http://api.example.com/v1/?param=$1", [], ["foo"]);
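For comparison, this is roughly what “built in other ways” looks like when you have to do it by hand. The sketch below is Python, not HSL, and it does not describe how the cache statement works internally. The decorator name, the time-to-live parameter and the lookup function are all assumptions made for this example.

```python
import time


def cached(ttl_seconds):
    """Cache a function's results per argument tuple for ttl_seconds."""
    def decorator(func):
        store = {}  # maps args -> (expiry timestamp, result)

        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]          # still fresh, skip the real call
            result = func(*args)
            store[args] = (now + ttl_seconds, result)
            return result

        return wrapper
    return decorator


calls = []  # records every time the "slow" function body really runs


@cached(ttl_seconds=60)
def lookup(param):
    # Hypothetical stand-in for a slow network call such as an API lookup.
    calls.append(param)
    return f"result for {param}"


lookup("foo")
lookup("foo")   # same argument: served from the cache
lookup("bar")   # different argument: a new call is made
print(calls)    # ['foo', 'bar']
```

All of that bookkeeping is what a single keyword in front of the function call saves you from writing.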

I personally really like the concept of custom languages. I think it’s important to try to evolve and challenge the concept of established languages, and by doing so we progress and learn from each other. I think every new language brings something new to the table; it can be a specific feature or the entire concept of why it was created in the first place.

Haven’t tried scripting in HSL yet? Download Halon and give it a go!

Halon 4.5 – gettin’ certy with it

The main focus of the Halon 4.5 release is TLS, hence the name “certy”. Check out the new features and functions and try them out. Also, the knowledge base is growing with a lot of good how-to’s to help you along.

TLS information has been made accessible in the Halon Platform scripting language, both on the receiving and sending side. Support for X.509 client certificates has been added, allowing you both to verify the sender identity in the SMTP server and to identify yourself when sending email through an SMTP client.

Experiment: we configured a busy email system to ask for a client certificate on all inbound connections, and found that approximately 5% of all traffic provides a client identity. Most of that traffic is from Gmail and Office 365. We did not collect the percentage of domains, which we leave as an exercise for you.

$peercert = GetTLS();
$haspeercert = isset($peercert["peer_cert"]);
stat("peer-cert", ["yes" => $haspeercert, "no" => !$haspeercert]);

How to enable this feature and start authenticating clients is documented in a KB article.

Implementation and facilitation of TLS reporting (tlsrpt) has begun. It is a new standard for reporting TLS failures, mainly focused on MTA-STS and DANE.

The TLSSocket() class now has a getpeercert() function and the ability to specify a client certificate. Now you see why we called it “certy”?

Support for custom SASL authentication mechanisms has been added. This allows you to build authentication schemes such as OTP, OAUTHBEARER or CRAM-MD5, but also EXTERNAL to facilitate the client certificate features. The procedure is documented in our knowledge base along with two sample implementations.
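To show what building such a mechanism involves, here is a minimal sketch of the CRAM-MD5 calculation from RFC 2195. It is Python rather than HSL and is not one of the sample implementations from the knowledge base. The function names are made up for this example, and the user, shared secret and challenge are the example values from RFC 2195.

```python
import base64
import hashlib
import hmac


def cram_md5_response(username, password, challenge_b64):
    """Build the client's base64 response to a base64 server challenge."""
    challenge = base64.b64decode(challenge_b64)
    # The digest is HMAC-MD5 over the challenge, keyed with the shared secret.
    digest = hmac.new(password.encode(), challenge, hashlib.md5).hexdigest()
    return base64.b64encode(f"{username} {digest}".encode()).decode()


def verify(username, password, challenge_b64, response_b64):
    """Server side: recompute the response and compare in constant time."""
    expected = cram_md5_response(username, password, challenge_b64)
    return hmac.compare_digest(expected, response_b64)


# Example values from RFC 2195.
challenge_b64 = base64.b64encode(
    b"<1896.697170952@postoffice.reston.mci.net>"
).decode()
response_b64 = cram_md5_response("tim", "tanstaaftanstaaf", challenge_b64)

print(base64.b64decode(response_b64).decode())
# tim b913a602c7eda7a495b4e6e7334d3890
print(verify("tim", "tanstaaftanstaaf", challenge_b64, response_b64))  # True
```

The server only needs to generate a fresh challenge per attempt and look up the shared secret for the claimed user; the rest is the comparison above.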

If you haven’t found our knowledge base before, the KB is a place to find how-to’s. The dev team is expanding it as fast as we can, adding topics that customers have asked about.

Finally, I want to highlight the big effort we’ve made to simplify, modernize and overall improve the web administration. This is an ongoing project, and something that we’re paying a lot of attention to. We want to thank, and congratulate, the Bootstrap team for providing such an awesome framework. We managed to get the Bootstrap 4.0 release in with just a few days of work.

You can read about all the other features, big and small, in the full changelog on our GitHub.

Halon 4.4 “lofty” packed with small improvements

The 4.4 release “lofty” is all about fixing bugs, boosting existing features, and improving performance and memory management in the Halon script engine. And like macOS “High Sierra”, it’s fully baked.

The unusually long changelog contains many small improvements. We’ve given the pre/post-delivery script a slight overhaul. It’s now possible to tailor the bounce behaviour via the SetDSN() function. Additionally, we’ve added $action and $context, as well as functions to set MAIL/RCPT parameters. Finally, SetSourceIP() enables you to choose an IPv4 and IPv6 address pair, which is great when you want to provide customers with a private IPv4 and IPv6 address, or if you want to use diverse address pools.

The improved “Listen on” directive on the Server > SMTP listener page enables more fine-grained control over listen ports and IPs, such as listening on different ports for different IPs.

Quirks and fun trivia
  • We recently revised our LDAP implementation, and realised that our own syntax and mechanism for failover between hosts is rather superfluous, since OpenLDAP supports that natively. Consequently, we adopted standard LDAP URIs in our configuration, and existing configurations will be automatically migrated.
  • While we support the PROXY protocol (v1) that passes client source IP information from load balancers, we thought it was mostly a HAProxy thing. Apparently, it’s used by many other load balancers such as Amazon ELB, Citrix Netscaler, and F5 BIG-IP. Most of them implement version 1 (which is human readable), but there is a second version of the protocol that’s binary-packed and has quite a smart feature: its magic string (protocol identification) is \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A, which translates into the literal "\r\n\r\n\0\r\nQUIT\n", a string chosen specifically to cause an error and a disconnect against servers not supporting this protocol. Clever!
  • If you have an IPv6-only datacenter, but still want to process IPv4 clients, you can do so with a SIIT-DC gateway, which uses IPv4-mapped IPv6 addresses. In Halon, you can use SIIT-DC while still performing IPv4 reputation checks (such as DNSBL), by extracting and setting the IPv4 address in the CONNECT script. If that doesn’t make the point that we’re very scriptable, then what does?
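The PROXY protocol trivia above is easy to check for yourself. The following Python sketch (for illustration only, not Halon code, and the helper names are made up) decodes the twelve signature bytes and shows how a receiver might tell the two protocol versions apart from the first bytes of a connection.

```python
# The 12-byte PROXY protocol v2 signature, written as hex.
PROXY_V2_SIGNATURE = bytes.fromhex("0d0a0d0a000d0a515549540a")


def is_proxy_v2(data):
    """True if the connection data starts with the binary v2 signature."""
    return data.startswith(PROXY_V2_SIGNATURE)


def is_proxy_v1(data):
    """True if the connection data starts like a human-readable v1 header."""
    return data.startswith(b"PROXY ")


print(PROXY_V2_SIGNATURE)        # b'\r\n\r\n\x00\r\nQUIT\n'
print(len(PROXY_V2_SIGNATURE))   # 12
print(is_proxy_v1(b"PROXY TCP4 192.0.2.1 192.0.2.2 56324 25\r\n"))  # True
```

An SMTP server that does not know about the protocol sees empty lines followed by a QUIT command, which is what makes the signature fail safely.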

Image from Tore Anderson’s SIIT-DC presentation
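The address extraction itself is a small bit of arithmetic. Below is a sketch in Python (not the HSL you would put in the CONNECT script) that assumes the gateway uses a /96 translation prefix, so that the IPv4 address sits in the last 32 bits. The prefix 2001:db8:46::/96 is a documentation prefix picked for this example, not a recommendation.

```python
import ipaddress


def embedded_ipv4(address, prefix):
    """Return the IPv4 address in the low 32 bits, or None if outside prefix."""
    ip = ipaddress.IPv6Address(address)
    if ip not in ipaddress.IPv6Network(prefix):
        return None  # a native IPv6 client, nothing to extract
    return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)


# Hypothetical translation prefix configured on the SIIT-DC gateway.
SIIT_PREFIX = "2001:db8:46::/96"

print(embedded_ipv4("2001:db8:46::c000:201", SIIT_PREFIX))  # 192.0.2.1
print(embedded_ipv4("2001:db8:1::1", SIIT_PREFIX))          # None
print(embedded_ipv4("::ffff:192.0.2.1", "::ffff:0:0/96"))   # 192.0.2.1
```

The extracted address is then what you would hand to the IPv4 reputation lookups.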

If you ever had problems signing in to a Halon host using Firefox, it may be because of a recent change in how “secure cookies” are handled. When signing in over HTTPS, we set the secure cookie flag, which forbids the cookie from being sent over an unencrypted HTTP connection to the same host. That is all great, but if you then try to sign in over HTTP (for whatever reason), Firefox will not be able to log in, because there is already a cookie for that domain with the secure flag, and it can be neither replaced nor accessed. We addressed this by using different cookie names for HTTP and HTTPS. Regardless of this fix, you should not use HTTP when administering your Halon hosts.
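The idea behind the fix can be sketched in a few lines. This is Python using the standard http.cookies module, not the code in our web administration, and the cookie names are invented for the example.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id, is_https):
    """Build a Set-Cookie value, using a separate cookie name per scheme."""
    # Hypothetical names; the point is only that they differ per scheme.
    name = "session_https" if is_https else "session_http"
    cookie = SimpleCookie()
    cookie[name] = session_id
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True
    if is_https:
        # Only the HTTPS cookie carries the Secure flag.
        cookie[name]["secure"] = True
    return cookie[name].OutputString()


print(session_cookie_header("abc123", is_https=True))
print(session_cookie_header("abc123", is_https=False))
```

Since the HTTP session never tries to overwrite the cookie that carries the Secure flag, the browser has nothing to refuse.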

Meet “classy” and “cody”, Halon 4.2 and 4.3

We have made two new releases of Halon since the last time we updated the blog with release news. In Halon 4.1 “teamy”, released just before this summer, we introduced modules. A month later we followed up with 4.2 “classy”, which added proper object orientation to the language (which works great in combination with modules). It spawned a few rewrites of our script examples (modules) to reflect this awesomeness. We initially added instance and class methods and variables (static), and in 4.3 “cody” we added the private keyword to functions and variables as well.

class HelloWorld
{
	private $name = "Dr Who?";
	constructor($name)
	{
		$this->name = $name;
	}
	function sayHello()
	{
		return "Hello ".$this->name;
	}
	static function ...()
	{
		...
	}
}

We’ve created a lot of modules and script examples. Some of those, such as the PostgreSQL and MongoDB modules, rely heavily on byte packed data structures. In order to better support those, we’ve added built-in functions such as pack() and unpack(). Upcoming modules and rewrites will also benefit from the new TLSSocket() class.

Here are some new additions to our module collection:

Other notable features from the changelog include:

  • FreeBSD 11.1 and new quarterly packages
  • sha2 hash functions
  • Added status and NDR codes to Reject, Defer and Deliver functions
  • SetTLS supports CA name verification
  • DLP engine now supports file hashes of SHA2-256 and SHA2-512
  • Added $sourceip variable to post-delivery script to easily determine which IP address was used to send the mail

Geek out corner

One major change that only we can see and fully appreciate is the (both automated and manual) code migration to C++11 (and forward), using the truly awesome clang-tidy tool.

On another note: while we researched pack and unpack implementations by looking at other languages’ documentation (such as PHP, Perl and Python), we found a bug in PHP, which was fixed in 7.2 and backported to 7.1.9. The overall consensus on syntax and conventions amongst languages regarding how pack and unpack should work seems to reflect and mimic Perl.

Some scripting languages, like JavaScript and HSL, have the notion of class constructors but no destructors. The HSL memory model uses reference-counted automatic garbage collection to determine when objects should be removed.

“In a language with an automatic garbage collection mechanism, it would be difficult to deterministically ensure the invocation of a destructor, and hence these languages are generally considered unsuitable for RAII [Resource Acquisition Is Initialization]” – Wikipedia on destructors

Unlike many other databases, MongoDB uses little endian rather than big endian (network byte order) in its wire protocol. This lets you send and receive data structures in native machine endianness (for most people), since both x86 and amd64 use this convention. I highly recommend reading up on the fun historic trivia about endianness.
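To make the difference concrete, here is a short sketch using Python’s struct module, which follows the same Perl-like pack/unpack conventions. It is not taken from our MongoDB module. It packs the four 32-bit integers of a MongoDB standard message header (messageLength, requestID, responseTo, opCode) in little endian; the values are arbitrary, apart from 2013, which is the OP_MSG opcode.

```python
import struct

# Four signed 32-bit integers, little endian ("<").
HEADER = struct.Struct("<iiii")


def pack_header(message_length, request_id, response_to, op_code):
    """Pack a MongoDB-style message header in little endian byte order."""
    return HEADER.pack(message_length, request_id, response_to, op_code)


header = pack_header(16, 1, 0, 2013)
print(header.hex())                    # 100000000100000000000000dd070000
print(HEADER.unpack(header))           # (16, 1, 0, 2013)

# The same opcode in network byte order (big endian) for comparison.
print(struct.pack(">i", 2013).hex())   # 000007dd
```

On a little endian machine the packed bytes are identical to the in-memory representation of the integers, which is the convenience mentioned above.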

Want more in-depth info on the new releases? Get in touch with the support team.

How I fooled Microsoft’s Safe Link technology in 5 minutes

The Safe Link technology was recently launched by Microsoft through Office 365. The goal of this technology is to rewrite all URLs in an email to point to a URL classification service, so that a URL can be reclassified at the time the user clicks it. This approach is preferred because spammers increasingly replace the content of a phishing site after the message has been scanned, hence the need for reclassification later. Safe Link is Microsoft’s “best-effort” to do so.

“For messages in HTML, Safe links identifies any link that uses the HREF attribute. For messages in plain text, Safe Link uses custom logic to identify any text resembling a URL.”

Microsoft.com

This method should work correctly in all MUAs (email clients), from web mail to your iPhone’s Mail app. However, replacing URLs in HTML by treating it as text is difficult. Let me demonstrate how easy it is to fool Microsoft’s Safe Link:

<a x=">" href="http://badurl.com">click me</a>
      ^--- the regex(?) engine stops detecting the <a> tag here, and leaves the href unchanged.

Another obvious way to fool the Safe Link re-writer is to use a <form>-tag (it may not work in all email clients). You may be safe until spammers figure this out.

<form action="http://badurl.com"><input value="click me"></form>

If it’s this easy to fool, should it be done in another way, or perhaps be complemented with additional safeguards, preferably in the MUA (web mail, Outlook.app, etc.)? I think so, and I would have expected Microsoft to try harder.

First suggestion: when rendering the email, replace all links by asking the rendering engine what it has rendered.

$("a").each(function () { /* all links are detected foolproof */ });

Second suggestion: Microsoft could surely use one of their own HTML parsers (like the one in the Edge engine) to detect where URLs are located in the message in order to properly replace them. That is probably better than a regex.
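The difference between the two approaches is easy to reproduce. The sketch below is Python and purely illustrative: the regular expression is my guess at what a naive link matcher might look like, not Microsoft’s actual logic, and the class and function names are made up. The parser is the one from Python’s standard library.

```python
import re
from html.parser import HTMLParser

# Hypothetical naive matcher: an <a ...> tag with an href inside it.
NAIVE_LINK = re.compile(r'<a[^>]*href="([^"]*)"')


class LinkCollector(HTMLParser):
    """Collect link targets from <a href> and <form action> attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.urls.append(attributes["href"])
        elif tag == "form" and attributes.get("action"):
            self.urls.append(attributes["action"])


def parsed_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


tricky = '<a x=">" href="http://badurl.com">click me</a>'
form = '<form action="http://badurl.com"><input value="click me"></form>'

print(NAIVE_LINK.findall(tricky))  # []  (the quoted ">" ends the match early)
print(parsed_links(tricky))        # ['http://badurl.com']
print(parsed_links(form))          # ['http://badurl.com']
```

A real parser understands that the ">" sits inside a quoted attribute value, which is exactly the knowledge the text-based approach lacks.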

If customers are activating and paying for Safe Link they should be able to expect more value for their money and some more security.

In Halon you can do the same simple URL rewriting using this HSL code.

HSL instead of Safe Link

When HSL got objects

It’s time for yet another deep dive into our platform. This time we’re going to tell you about when and why HSL got classes with closures. It started when we revisited our MIME implementation, which was kind of basic but served most people well. However, at the time it could only work on the top-level MIME object’s header, and all MIME parts were simply addressed by IDs. The structure of MIME, though, is highly nested, with children and siblings, and all MIME parts share the same features: a header and a body part (which could contain even more MIME parts). This data structure is ideal to represent as objects in a tree structure, which got us started sketching how we really wanted you to work with MIME objects.

In order to implement MIME objects properly we felt we had to implement objects deep into the language itself.

Anonymous functions had already been added in an earlier release. Now we needed “classes” in order to bundle multiple functions around a local scope (implement closures by reference). The basic concept of closures is that a function object inherits a local scope, which follows the function object. We chose to add a new keyword (closure) which forces you to explicitly specify which variables should be included in the closure. This doesn’t limit the implementation; it merely serves as documentation of intent, which should keep the code less bug-prone.

$variable = 0;
$function = function () closure ($variable) { $variable += 1; };
$function();
echo $variable; // 1

This concept allows you to build a simple MIME object.

function MIME() {
  $children = [];
  return [
   "getParts" => function () closure ($children) { return $children; },
   "appendPart" => function ($part) closure ($children) { $children[] = $part; },
  ];
}
$part1 = MIME();
$part2 = MIME();
$part1["appendPart"]($part2);
echo $part1["getParts"]();
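If you know closures from another language, the pattern will look familiar. Here is the same idea sketched in Python, for comparison only; the function names are made up. Python’s nonlocal statement plays a role loosely comparable to the closure keyword, in that you have to opt in explicitly before rebinding an outer variable.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # explicit opt-in to modify the enclosing variable
        count += 1
        return count

    return increment


def make_mime():
    children = []  # local state shared by both closures below

    def get_parts():
        return children

    def append_part(part):
        children.append(part)

    # An "object" is just a dictionary of functions bound to the same scope.
    return {"getParts": get_parts, "appendPart": append_part}


counter = make_counter()
counter()
print(counter())                       # 2

part1 = make_mime()
part2 = make_mime()
part1["appendPart"](part2)
print(len(part1["getParts"]()))        # 1
print(len(part2["getParts"]()))        # 0, each call gets its own children
```

Each call to the factory creates a fresh scope, so the two parts do not share their children.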

The next thing was the syntax. Calling anonymous functions stored in arrays was already supported. However, the syntax was somewhat confusing, and we wanted it to be perceived more like an object than an array with functions. Hence, we added the property (array) operator, just like C has the obj->fun syntax, which is a shortcut for (*obj).fun.

$part1->appendPart($part2);
echo $part1->getParts();

This got us a long way, but we felt we were missing an important part: method chaining. In order for chaining to work, the object must return an instance of itself (let’s do that in our appendPart method). Our first thought was, “that should be easy, can’t we just reference ourselves in the closure”, like this:

function MIME() {
  $children = [];
  $self = [
   "getParts" => function () closure ($children) { return $children; },
   "appendPart" => function ($part) closure ($children, $self) { $children[] = $part; return $self; },
  ];
  return $self;
}
$part1 = MIME();
$part2 = MIME();
echo $part1->appendPart($part2)->getParts();

Unfortunately, that turned out to cause issues with our reference counting. The object cannot reference itself, because if it does, we leak memory for each object. To solve that, a weak reference had to be added to the object, which required us to add some more functionality to the language: one feature to set the reference (define what self is) and one to get the reference (in order to return it). We chose the object keyword to declare an object which has the concept of a self, and the this keyword to get and return the reference to self.

function MIME() {
  $children = [];
  return object [
   "getParts" => function () closure ($children) { return $children; },
   "appendPart" => function ($part) closure ($children) { $children[] = $part; return this; },
  ];
}
$part1 = MIME();
$part2 = MIME();
echo $part1->appendPart($part2)->getParts();
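The leak caused by a self-reference under pure reference counting, and how a weak reference avoids it, can be observed in CPython, which is also reference counted. The sketch below is Python, not HSL, and says nothing about how the object keyword is implemented; the class and function names are made up. CPython’s additional cycle collector is switched off during the experiment so that only reference counting is at work.

```python
import gc
import weakref


class Node:
    """Plain object, used only to observe when it gets destroyed."""


def make_cycle():
    node = Node()
    node.myself = node               # strong self-reference: count never hits zero
    return weakref.ref(node)         # lets us observe the object from outside


def make_weak_self():
    node = Node()
    node.myself = weakref.ref(node)  # weak self-reference: does not keep it alive
    return weakref.ref(node)


gc.disable()                          # leave only reference counting active
leaked = make_cycle()
freed = make_weak_self()
leaked_alive = leaked() is not None   # True on CPython: the cycle keeps it alive
freed_alive = freed() is not None     # False on CPython: freed on function return
gc.enable()
gc.collect()                          # the cycle collector reclaims the leaked object

print(leaked_alive, freed_alive)      # True False
print(leaked() is None)               # True
```

A runtime that relies on reference counting alone has no such collector to fall back on, which is why the self-reference has to be weak.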

There you have it. Objects with closures. Since we implemented our built-in objects (such as MIME) using the same concept, you can actually extend them yourself.

function MIME() {
  $mime = builtin MIME();
  $mime["setSubject"] = function ($subject) { this->addHeader("Subject", $subject); return this; };
  return object $mime;
}
$part1 = MIME();
echo $part1->setSubject("Hello")->toString();

For more information see our documentation.