What is Remails?
About a year ago, the founder of Remails contacted us to help build Remails, a Mail Transfer Agent (MTA) hosted fully in Europe. An MTA is a service that helps deliver emails reliably by forwarding emails via IP addresses that have earned a good reputation, ensuring the right email headers are set, and retrying delivery automatically when things go wrong. Remails’ source code is available on GitHub, which allows anyone to self-host it, but they also provide a ready-to-use working instance of Remails at remails.net.
Remails is currently meant primarily for transactional emails rather than broadcast emails. This makes it a good fit for sending email verification codes, password reset links, and personal notifications and reminders, but it is not yet meant for sending advertising emails to hundreds or thousands of recipients at once. We might lift this limitation as development continues.
In the following sections, let’s take a look at how the development has gone so far and the technical challenges we faced along the way.
From MVP to high availability
At the start of the project, our main focus was to quickly build a minimum viable product, keeping the feedback loop short and showing results early on. Therefore, we started off with a cheap, single Virtual Private Server (VPS) from a European cloud provider. The whole application was running as a single binary in a container “orchestrated” by Docker Compose next to a simple PostgreSQL database container.
Simple VPS deployment at development start
After the fundamental work of implementing SMTP communication, a basic web interface, and the database integration was done, we went on to improve the deployment setup. For Remails, we had a hard requirement to use a European cloud provider. Ideally, we want to administer as little infrastructure as possible ourselves while also ensuring high availability of the service.
High availability
We set up a managed Kubernetes cluster with a managed Postgres database from the cloud provider. We split our application into two logical parts: the web interface API used to manage credentials and the actual MTA part, which sends and receives messages via SMTP. Using Kubernetes, we can run multiple replicas of each component and distribute them over multiple machines (called nodes in Kubernetes speak). This means that services will still be available on other nodes when one of the nodes goes down, thus achieving high availability.
To make sure we are on the same page about the term high availability, let’s take a brief look at the availability aspects of our threat model. Our main concern is data availability. This is partially taken care of by the cloud provider, which runs two database nodes and takes care of Point In Time Recovery (PITR) backups. Additionally, we run a daily job that stores a full backup at a technically and organizationally independent location. If that backup ever fails, our observability solution will alert us. The second most important availability property is that we are always ready to receive new emails from our clients. That is taken care of by the load balancers and multiple MTA pods (see below). The least critical part is to ensure that we are always ready to send out the emails we received from our clients, as we consider a small delay in sending out a message non-critical. Don’t get us wrong: we strive for disruption-free service in all parts, but it is nevertheless essential to prioritize the most vital ones.
Our initial Kubernetes-based setup
The above image depicts this first Kubernetes setup, slightly simplified. Nevertheless, it's already significantly more elaborate than the single binary VPS setup. Let's go over the different parts:
- First, note that the PostgreSQL database is external to the cluster, as it is managed by the cloud provider and not part of the Kubernetes setup. All of the pods in all of the nodes are connected to this same database in order to share data.
- As mentioned, the application is split into two main parts: the Web API and SMTP Mail Transfer Agent. Both are so-called deployments, which means Kubernetes will distribute them (randomly) over the available nodes. Connecting users will be forwarded to any healthy instance at random by the load balancers, which handle incoming connections (not shown in the image).
- Periodic Tasks is a single cron job that regularly checks for emails that could not be sent and should be retried. At this development stage, it would send out those emails from its own pod, just like the MTA pods do (spoiler: this will soon change!).
In short, this Kubernetes setup allows us to reliably relay incoming email while ensuring high availability. However, there is one big challenge we still have to tackle that we haven't discussed yet: IP addresses.
Juggling IP addresses
So far, the setup is pretty standard. However, there is one more requirement that we haven't mentioned yet. A big problem with email is that unsolicited messages (also referred to as spam) get sent out to lots of people at once by spammers. These messages, often containing either plain old advertising or full-on fraudulent scams, have prompted email service providers to implement spam filters. These spam filters aim to reduce the exposure to spam by either putting suspected spam messages into a separate folder or rejecting them outright.
As the chance of an email making it to an inbox and not just the spam folder depends highly on the IP address the email is sent from, we have to control our outbound IP addresses. This is a twofold problem. Firstly, we want to use Remails’ own block of IP addresses with the cloud provider (which is usually called “bring your own IP”, or BYOIP), as it is a lot more difficult to build up a good reputation with public IPs from cloud providers. Secondly, we want to be in control of which IP address from our block is used for every email we send.
The Finnish cloud provider UpCloud was able to fulfill the BYOIP requirement for us. For the second requirement, we will need to improve our Kubernetes architecture to take control of the network interfaces of the nodes. We want to be able to pick which IP address an email is sent from based on the email's sender, which allows us to offer high volume customers one or multiple IPs for their exclusive usage, so that their reputation is independent of other Remails customers.
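To make the idea of per-sender outbound IPs more concrete, here is a minimal Python sketch. It is not taken from the Remails code base: the mapping `DEDICATED_IPS`, the pool `SHARED_POOL`, the example addresses, and both function names are hypothetical, and rotating through the pool per attempt is only an illustrative choice. The one real building block is the `source_address` parameter of Python's `socket.create_connection`, which binds the local end of a connection to a specific IP that must be configured on the machine.

```python
import socket

# Hypothetical configuration: sender domains with IPs reserved for them.
# All addresses are documentation-range placeholders.
DEDICATED_IPS = {
    "bigcustomer.example": ["192.0.2.10", "192.0.2.11"],
}

# Hypothetical pool shared by all senders without dedicated IPs.
SHARED_POOL = ["192.0.2.100", "192.0.2.101"]


def pick_outbound_ip(sender: str, attempt: int = 0) -> str:
    """Choose the source IP for an email based on the sender's domain."""
    domain = sender.rsplit("@", 1)[-1].lower()
    pool = DEDICATED_IPS.get(domain, SHARED_POOL)
    # Assumption for illustration: rotate through the pool on each attempt.
    return pool[attempt % len(pool)]


def connect_from(source_ip: str, host: str, port: int = 25) -> socket.socket:
    """Open a TCP connection that originates from source_ip.

    Port 0 in source_address lets the operating system pick the local port.
    This only works if source_ip is assigned to a local network interface.
    """
    return socket.create_connection(
        (host, port), timeout=30, source_address=(source_ip, 0)
    )
```

With this sketch, `pick_outbound_ip("alice@bigcustomer.example")` selects one of the dedicated addresses, while any other sender falls back to the shared pool, so the reputation of a dedicated IP depends only on the customer it belongs to.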
Refactoring the Kubernetes architecture
In the setup as shown in the previous image, this is not possible, as the outbound IP is based on the IP of the node from which the email is sent. To gain control over the outbound IPs, we could simply run a single binary on a single machine with many IP addresses, but that would go completely against the requirement of high availability. So instead, we refactored our Kubernetes architecture to support both high availability and managed outbound IP addresses simultaneously:
The improved Kubernetes setup (simplified)
The above image provides a simplified overview of the current architecture. Let's take a look at it step by step:
- We have already seen the Web API, which runs as a Kubernetes deployment randomly distributed over the nodes. Besides its original task of hosting the web interface, it also provides the public REST API and the documentation of that API.
- We separated the SMTP Mail Transfer Agent into two parts: SMTP inbound and SMTP outbound. The inbound service is a deployment just as the Web API and is thus (randomly) distributed over the available nodes. A load balancer forwards inbound traffic to one of the healthy instances, just as with the Web API.
- The SMTP outbound is one of the most significant changes compared to the previous architecture. Instead of being a deployment like most other pods, it runs as a Kubernetes DaemonSet. This ensures there is always exactly one instance running on each node. Additionally, we granted host-network access to those pods, which allows them to interact directly with the network interface installed on each node.
- Another addition is the Cloud IP manager. This is responsible for making sure each node is assigned the required IP addresses by the cloud provider by interacting with the cloud provider’s API. Note that a node can have multiple IPs assigned from which the SMTP outbound pod can choose using its direct access to the network devices.
- The other major change is the introduction of a central Message Bus. We designed this as a very lightweight and simple broadcast message bus without any deliverability guarantees. We'll explain the reasoning for this in the next paragraph. The message bus is used for communication between the different components. For example, if an email should be sent out, the Web API, SMTP inbound, or Periodic Tasks send a simple notification to the message bus with the message ID of the email and which outbound IP it should use. The outbound DaemonSets are listening for these notifications, filtering for messages that should be sent from an IP they have access to. After a sending attempt, the outbound pod responds with a status update.
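The notification flow over the message bus can be sketched in a few lines of Python. This is a simplified in-process model under our own assumptions, not the actual Remails implementation: the names `SendRequest`, `BroadcastBus`, and `OutboundPod` are hypothetical, the real bus runs as a separate service, and the status update sent back after a sending attempt is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class SendRequest:
    """Notification payload: which message to send and from which IP."""
    message_id: int
    outbound_ip: str


class BroadcastBus:
    """Best-effort broadcast: every subscriber sees every notification."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[SendRequest], None]] = []

    def subscribe(self, handler: Callable[[SendRequest], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, note: SendRequest) -> None:
        # No persistence and no retries: a notification is simply lost
        # if no subscriber is listening at this moment.
        for handler in list(self._subscribers):
            handler(note)


class OutboundPod:
    """Stand-in for one SMTP outbound pod and the IPs assigned to its node."""

    def __init__(self, name: str, local_ips: Set[str]) -> None:
        self.name = name
        self.local_ips = set(local_ips)
        self.handled: List[int] = []

    def on_notification(self, note: SendRequest) -> None:
        # Ignore requests for IPs this node does not own.
        if note.outbound_ip not in self.local_ips:
            return
        # A real pod would load the message from the database and send it.
        self.handled.append(note.message_id)


bus = BroadcastBus()
pod_a = OutboundPod("node-a", {"192.0.2.10"})
pod_b = OutboundPod("node-b", {"192.0.2.11"})
bus.subscribe(pod_a.on_notification)
bus.subscribe(pod_b.on_notification)

bus.publish(SendRequest(message_id=1, outbound_ip="192.0.2.11"))
bus.publish(SendRequest(message_id=2, outbound_ip="192.0.2.10"))
```

Both pods receive both notifications, but each one only acts on the request whose outbound IP it has access to. The publisher does not need to know which node currently holds which address.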
The design choice of using a best-effort message bus might come as a surprise. It’s only a single instance, without a retry or failover mechanism. Nevertheless, our setup is highly available, as every action we perform is stored in the database. Imagine a new email reaches an SMTP inbound pod. First, the pod stores the email in our database, and only then does it trigger a notification to send the message. If that notification does not reach the (correct) outbound pod for any reason, the periodic tasks will automatically retry it until delivery eventually succeeds. Thus, a failing message bus can cause a small delay, but it will not prevent us from accepting new emails and sending them out slightly later.
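The "store first, notify second" pattern can be illustrated with a small Python sketch. It uses an in-memory stand-in for the database and hypothetical names (`Database`, `accept_email`, `retry_pending`, `try_notify`), so it only models the reasoning above and makes no claim about how Remails implements it.

```python
from typing import Callable, Dict, List


class Database:
    """In-memory stand-in for the shared PostgreSQL database."""

    def __init__(self) -> None:
        self.status: Dict[int, str] = {}  # message_id -> "pending" | "sent"

    def store(self, message_id: int) -> None:
        self.status[message_id] = "pending"

    def mark_sent(self, message_id: int) -> None:
        self.status[message_id] = "sent"

    def pending(self) -> List[int]:
        return [m for m, s in self.status.items() if s == "pending"]


def try_notify(notify: Callable[[int], None], message_id: int) -> None:
    """Send a best-effort notification; failures are deliberately ignored."""
    try:
        notify(message_id)
    except Exception:
        # The message stays "pending" in the database and is retried later.
        pass


def accept_email(db: Database, notify: Callable[[int], None], message_id: int) -> None:
    """Inbound path: persist the email, then notify on a best-effort basis."""
    db.store(message_id)
    try_notify(notify, message_id)


def retry_pending(db: Database, notify: Callable[[int], None]) -> None:
    """Periodic task: re-notify for every message that is still pending."""
    for message_id in db.pending():
        try_notify(notify, message_id)
```

Because the database is the source of truth, a lost notification only delays a message until the next run of `retry_pending`. The email was already accepted and stored before the notification was attempted.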
Conclusion
With the architecture described in this blog post, we managed to achieve high availability while at the same time making sure our application can choose which outbound IP to use when sending emails. This allows us to reliably send the emails from users using Remails' own IP block.
Try it out now!
Remails is currently in public beta, so if you're using a US provider and are interested in a European alternative, feel free to give it a try. There is a free plan available, allowing you to send up to 3,000 emails per month. As soon as you need more, you can simply upgrade to a paid subscription.
If you’re interested in more technical details or even self-hosting, check out the code on GitHub!
Roadmap
Soon, we will add email notifications for invalid DNS records and quota warnings. We are also working on more moderation and privacy features, such as audit logs for organization admins and configurable shorter email retention periods. Furthermore, the ability to receive emails through Remails (useful for receiving bounced messages and DMARC reports) is also on our roadmap.