February 22, 2017

Validating Optimail: Testing an AI algorithm from concept to production

Optimail uses AI to optimize drip email marketing campaigns automatically and continuously while learning directly from customers’ behaviors. Under the hood, Optimail uses a suite of algorithms concentrating on message sequence optimization, message delivery optimization, and customer profiling. These algorithms have changed a lot since we first thought of automating email marketing optimization using AI… and, in fact, we weren’t always sure that they would work.

In this post, I wanted to share the story of how the idea of Optimail was born, how we built and tested our algorithms through several iterations, and what gave us the confidence to bring Optimail to market.

Seeing the opportunity with early skepticism

The idea that became Optimail first came to light about a year ago. At the time, I was the CTO of a successful SaaS startup (since acquired) and my now co-founder, Jacob, was completing his PhD dissertation.

Like most SaaS businesses, my startup had two key drip email campaigns intended to increase conversion and retention at various points in the customer lifecycle; namely, an onboarding campaign for new trial users encouraging them to subscribe, and a retention campaign for users whose subscription had lapsed.

The problem I found was that optimizing these email campaigns felt like a waste of time. I was constantly tweaking when an email was sent, testing to see whether a message was more or less effective for a given segment, changing the order in which discounts were offered to maximize the price people paid for the product, etc. Not only did this take time, but it was never really clear whether something worked… sure, we might have seen a small bump in conversions one week after a change, but was it because of that change? What if the change I try next week “cancels” that previous change out? It felt like I was missing the forest for the trees and wasting my time.

I was discussing my problem with Jacob when he saw a parallel between these email sequences and the kinds of dynamic sequences he was studying in his dissertation. (Much of Jacob’s work involved training machine learning algorithms to detect and predict events based on noisy, dynamic sequences of neural activity.) He suggested that we might be able to use an implementation of reinforcement learning to learn which sequence leads to the best outcomes, including both ‘immediate’ goals (like opening or clicking an email) and ‘distant’ goals (like subscribing to our service).

Our team at Strong Analytics discussed the idea and, at first, we were skeptical. Part of our skepticism was whether slight changes in the timing or content of messages could yield large enough effects on customer engagement. But after doing some reading on the successes of email optimization and seeing how big some of the effects in this space were, we felt more optimistic that an AI-optimized marketing campaign could at least plausibly produce real results.

That left our second source of skepticism: could an algorithm learn these “optimal” email sequences quickly enough to be useful?

Simulating ‘toy’ environments in R and Python

Our first step toward testing this idea was to create a toy simulation environment in R. For those who don’t know, R is a statistical programming language and one of the tools we use for statistical and machine learning work at Strong.

This toy environment was straightforward: We simulated a simple 10-day onboarding campaign in which there were a bunch of customers in up to 5 distinct “segments”. Customers in each segment had distinct preferences about which emails should be received on each day in order to drive them towards signing up at the end of the trial. For example, sending emails 2 and 3 before email 5 might increase the chances that customers in segment A would sign up. On the other hand, that same sequence might make customers in segment B less likely to convert. Other preferences were about specific emails and not necessarily their sequence; for example, segment C might get a boost from receiving email 5 while segment D might be unaffected by that email entirely. Finally, not all preferences were segment-specific; there were global preferences as well.

Preferred sequences and emails were randomly initialized and not exposed to the algorithm in any way; instead, the algorithm had to learn, simply by sending emails and monitoring customer behavior, which sequence was most effective for customers in general and in each segment. Because this was a simulation, we knew the maximum possible conversion rate that could be achieved and we ranked our algorithms by how quickly they could achieve this rate.
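To make the setup concrete, here is a minimal sketch of this kind of toy environment in Python. It is not the code we ran: it is heavily simplified to a single email choice per segment rather than a full 10-day sequence, and the class name `ToyCampaign`, the boost range, and the epsilon-greedy learner are all assumptions chosen for illustration. The key idea it preserves is that the preferences are hidden from the learner, while the simulation itself knows the best achievable conversion rate.

```python
import random


class ToyCampaign:
    """Toy simulation with hidden per-segment email preferences (illustrative only)."""

    def __init__(self, n_emails=5, segments=("A", "B", "C"), base_rate=0.05, seed=0):
        self.rng = random.Random(seed)
        self.n_emails = n_emails
        self.segments = list(segments)
        self.base_rate = base_rate
        # Randomly initialized preferences; the learner never reads this dict.
        self.boost = {
            (s, e): self.rng.uniform(-0.02, 0.10)
            for s in self.segments
            for e in range(n_emails)
        }

    def conversion_prob(self, segment, email):
        return self.base_rate + self.boost[(segment, email)]

    def best_rate(self, segment):
        """Known only to the simulation: the best achievable rate for a segment."""
        return max(self.conversion_prob(segment, e) for e in range(self.n_emails))

    def send(self, segment, email):
        """Send one email and report whether the simulated customer converted."""
        return self.rng.random() < self.conversion_prob(segment, email)


def run_epsilon_greedy(env, n_customers=5000, epsilon=0.1, seed=1):
    """Learn by sending emails: explore at random with probability epsilon."""
    rng = random.Random(seed)
    arms = [(s, e) for s in env.segments for e in range(env.n_emails)]
    sent = dict.fromkeys(arms, 0)
    converted = dict.fromkeys(arms, 0)

    def estimate(segment, email):
        n = sent[(segment, email)]
        # Untried emails get priority so every option is sampled at least once.
        return converted[(segment, email)] / n if n else float("inf")

    for _ in range(n_customers):
        segment = rng.choice(env.segments)
        if rng.random() < epsilon:
            email = rng.randrange(env.n_emails)
        else:
            email = max(range(env.n_emails), key=lambda e: estimate(segment, e))
        sent[(segment, email)] += 1
        converted[(segment, email)] += env.send(segment, email)  # True counts as 1
    return sent, converted


env = ToyCampaign()
sent, converted = run_epsilon_greedy(env)
achieved = sum(converted.values()) / sum(sent.values())
ceiling = sum(env.best_rate(s) for s in env.segments) / len(env.segments)
print(f"achieved {achieved:.3f} vs. best possible {ceiling:.3f}")
```

Comparing the achieved rate with the known ceiling over time is one simple way to rank learners by how quickly they close the gap.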

After a couple of weeks of tweaking the algorithms and the simulation environments (to stop us from simply tuning our approach to one class of problems), we started to see some consistent and encouraging success in detecting the effects in the simulations.

Now it was time to test the approach on real data.

The algorithm’s first real world test

We knew from our tests that our algorithm could discover the optimal email campaign at least in simulated environments, but could it work in the real world?

Thankfully, we had the opportunity to address this question by working with a friend and partner, Marvin Russell, who founded and operates a successful SaaS app called Checkli. Marvin is a marketing expert (prior to Checkli, he ran his own digital marketing agency) and was excited by our new technology. He agreed to let us test it with Checkli’s onboarding campaign for a couple of months in an experiment.

For the real-world experiment, we assigned half of his new users to receive the same onboarding campaign that was already running. The other half of users were designated to receive an Optimail version of the campaign — the very same set of emails, optimized on-the-fly by our algorithm using real customer data.
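A 50/50 split like this can be sketched in a few lines. The version below is an assumption about how one might do it, not a record of how we did: it hashes a hypothetical user ID with a salt so that each user always lands in the same group, with no assignment table to store.

```python
import hashlib


def assign_arm(user_id: str, salt: str = "onboarding-test") -> str:
    """Deterministically place a user in the 'control' or 'optimail' group."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # The first byte of the hash is effectively uniform, so its parity splits users ~50/50.
    return "optimail" if digest[0] % 2 == 0 else "control"


print(assign_arm("user-42"))
```

Changing the salt reshuffles everyone, which is handy when starting a fresh experiment on the same user base.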

To implement the Optimail campaign, we put together a scrappy implementation of our algorithm that connected directly to Checkli’s backend. Every day, our algorithm server would spin up and retrieve a list of customers that needed to be emailed, select which email to send them, and then send those emails. In addition, because it needed to learn which email sequence was most effective, it also processed the customers’ actions from the previous day. Specifically, it looked at which customers logged in, created checklists, shared the app with their friends, or unsubscribed from the emails. It learned that sequences which led to logging in, creating checklists, and sharing were positive and should be tried again, while sequences that led to unsubscribes were negative and should be avoided.
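The learning signal described above can be illustrated with a small sketch. The reward weights, the `SequenceValues` class, and the running-average update are all assumptions for illustration; the only thing taken from the experiment is which actions counted as positive and which as negative.

```python
from collections import defaultdict

# Hypothetical weights: only the signs reflect the experiment described above.
REWARDS = {
    "login": 1.0,
    "create_checklist": 2.0,
    "share": 3.0,
    "unsubscribe": -5.0,
}


class SequenceValues:
    """Keeps a running value estimate for each email sequence tried so far."""

    def __init__(self, step_size=0.1):
        self.step_size = step_size
        self.value = defaultdict(float)

    def update(self, sequence, events):
        """Nudge a sequence's value toward the reward earned by yesterday's events."""
        reward = sum(REWARDS.get(event, 0.0) for event in events)
        key = tuple(sequence)
        self.value[key] += self.step_size * (reward - self.value[key])
        return self.value[key]


values = SequenceValues()
values.update(["welcome", "tips"], ["login", "share"])  # positive: try again
values.update(["welcome", "promo"], ["unsubscribe"])    # negative: avoid
```

Sequences with higher values would then be favored the next time the daily job selects which email to send.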

At the end of the two-month experiment, we saw some incredible results. Users who received the Optimail campaign engaged 25% more with the app than users in the control campaign. What’s more, the Optimail campaign sent 20% fewer emails.

During our initial real-world test, users who received the Optimail campaign engaged more with the app despite receiving fewer emails.

This experiment demonstrated two things for us. First, we saw the power of Optimail in optimizing real email campaigns so that they were more effective for businesses. Second, for the first time, we saw how Optimail could work for customers too: it learned not to annoy them with lots of emails, but to send the ones they wanted to engage with and that helped them learn and use the app.

App-embedded simulations with more complex data

The success with Checkli motivated us to begin building a production-quality app that could allow users to build their own Optimail campaigns and plug Optimail into their existing infrastructures (the app customers can sign up for and use today). This meant that, rather than plugging directly into our users’ backend systems, the algorithm would need to communicate with this app which, in turn, provides an API for users.

Yet we didn’t want to stop testing the algorithm while we built the app, so we once again turned to simulations. This time, we created a simulation component in the new app which, when activated, created a new fake user with 5 distinct campaigns. Like our earlier simulations in R, the simulated customers in each of these campaigns had unique preferences regarding which sequence of emails would most effectively drive them towards the goals. However, in these simulations, we could tweak some additional configurations that could (a) dial up the noise in the customers’ behavioral data and (b) make the preferences more or less complicated. This meant we could make the simulations more and more difficult for the algorithm to solve while we built.

Once the simulation campaigns were created, the algorithm would spin up automatically and begin sending emails and learning from customer behavior. For every batch of emails it sent, new customers would be added and current customers would advance a day in their campaigns, allowing the cycle to repeat infinitely until we reset the simulations.
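The cycle itself is simple enough to sketch. In the version below, the names `SimCustomer`, `SimCampaign`, and `tick` are hypothetical, and retiring customers who reach the end of the campaign is an assumption we add so the population stays bounded.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SimCustomer:
    customer_id: int
    day: int = 0  # how many days into the campaign this customer is


@dataclass
class SimCampaign:
    length_days: int = 10
    new_per_batch: int = 3
    customers: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def tick(self):
        """Run one batch: advance current customers a day, then add new ones."""
        for customer in self.customers:
            customer.day += 1
        # Assumption: customers who have finished the campaign leave the simulation.
        self.customers = [c for c in self.customers if c.day < self.length_days]
        self.customers.extend(
            SimCustomer(next(self._ids)) for _ in range(self.new_per_batch)
        )
        return len(self.customers)


campaign = SimCampaign()
for _ in range(30):
    campaign.tick()  # can repeat indefinitely until the simulation is reset
```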

We built a dashboard using Metabase to monitor the algo-app API communication and monitor the algorithm’s learning. Over and over, we saw encouraging results. As expected by this point, the algorithm was able to learn any optimal sequence, but what impressed us most was the speed with which it could learn even under noisy, complex conditions. Even when it was initially led astray by misleading data, it was able to continue to explore different strategies until it latched onto the most effective one.

These app-embedded simulations have become a critical part of our deployment process because they provide such a comprehensive test of the app, algorithm, and their ability to communicate.

A need for higher-speed simulations… turning to Node.js

Despite the success of our simulations within the app, we still saw opportunities for making the algorithm better. While the app-embedded simulations were great for a comprehensive test of app-algo communication, they were proving to be a bit slow for our liking when it came to iterating on the algorithm. Waiting for simulation resets and the back-and-forth chatter between the algorithm and app was slowing down algorithm development.

This led us to build a ‘mock’ version of our app in Node.js that could run locally for any simulation. This Node.js app exposed the very same API as our real app, but it was designed for the sole purpose of simulations and, therefore, was super fast, easy to reset, and easy to configure for testing.

We also built a new way to scale up our simulations by creating simulation configuration files that dropped into the Node.js app. For example, if I had an idea about a kind of campaign that might be harder for Optimail to optimize, I would program a new configuration file and then share it with the team. Our language for configuration files was very expressive (rather than tweaking simple parameters, we could entirely replace sequence value logic with new, configuration-specific functions). We wanted to be sure Optimail could learn in each configuration context without sacrificing performance on others. Eventually, we had a library of configurations that represented as many types of campaigns as we could think of.
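To show what "replacing sequence value logic with configuration-specific functions" can look like, here is a small sketch. Our real configurations lived in the Node.js app; this Python version, including the names `SimConfig`, `value_fn`, and `CONFIG_LIBRARY` and all of the numbers, is purely illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

ValueFn = Callable[[Sequence[int]], float]


def default_value(sequence: Sequence[int]) -> float:
    """Baseline logic: every distinct email sent adds a small fixed boost."""
    return 0.02 * len(set(sequence))


def order_sensitive_value(sequence: Sequence[int]) -> float:
    """Email 5 only helps when emails 2 and 3 were both sent before it."""
    if 5 not in sequence:
        return 0.0
    before = sequence[: sequence.index(5)]
    return 0.10 if 2 in before and 3 in before else -0.05


@dataclass(frozen=True)
class SimConfig:
    name: str
    noise: float = 0.0                 # how noisy simulated behavior should be
    value_fn: ValueFn = default_value  # swappable sequence value logic


# A shared library of configurations to run every algorithm change against.
CONFIG_LIBRARY = [
    SimConfig("baseline"),
    SimConfig("order-sensitive", noise=0.2, value_fn=order_sensitive_value),
]

for config in CONFIG_LIBRARY:
    print(config.name, config.value_fn([2, 3, 5]))
```

Because each configuration carries its own function rather than a handful of parameters, a new kind of campaign is just a new entry in the library.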

This rapid iteration gave us the time and motivation to begin adding other, complementary algorithms to the main sequence-optimizing algorithm: for example, algorithms that were specifically focused on finding the best time of day to deliver an email or modeling customer behaviour independently of the email campaigns.

Replicating our first real-world success in private beta

While continuing to work on the algorithm offline in our simulations, we also entered the final phase of pre-launch testing: a private beta in which we worked with a couple more companies that we were already doing business with via Strong. These were smaller, lean companies that were easily sold on the idea of continuous, automatic optimization of their email campaigns because it meant that they didn’t need to spend the time or money on doing it themselves.

Altogether, during the private beta, we added 5000 new customers per month to drip email campaigns powered by Optimail, and the results were once again impressive. Without any modification to the campaigns, we saw open rates increase by 5% relative to previous levels, click rates increase by 14%, and spam reports drop by 98%. And, to top it off, we saw another 20% bump in app engagement in the Checkli onboarding campaign (consistent with the results of our first real-world test).

Results from our beta test

During our expanded private beta, we once again found that Optimail could make email campaigns more effective by learning and reacting to customers' behaviors.

Continued improvements after launch

Moving forward, we are continuing to work on Optimail’s algorithms to make them more effective at optimizing drip email campaigns.

Importantly, because we knew we would never stop iterating on the algorithms, we built the app such that we can upgrade or replace an algorithm at any time even for campaigns that are always running. Once replaced, the new algorithm rapidly re-trains on historical data until it is caught up, and then picks up from where the old algorithm left off. This means that our customers are always using our latest and best technology.
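The swap-and-catch-up idea can be sketched as replaying a stored event log into the new algorithm before it takes over. Everything named here (`CountingAlgorithm`, `Campaign`, `swap_algorithm`) is hypothetical, and the stand-in learner is far simpler than anything we run in production.

```python
class CountingAlgorithm:
    """Stand-in learner: tracks the mean reward observed for each email."""

    def __init__(self):
        self.total = {}
        self.n = {}

    def observe(self, email, reward):
        self.total[email] = self.total.get(email, 0.0) + reward
        self.n[email] = self.n.get(email, 0) + 1

    def best_email(self):
        return max(self.n, key=lambda e: self.total[e] / self.n[e])


class Campaign:
    """A running campaign that keeps an append-only history of outcomes."""

    def __init__(self, algorithm):
        self.algorithm = algorithm
        self.history = []

    def record(self, email, reward):
        self.history.append((email, reward))
        self.algorithm.observe(email, reward)

    def swap_algorithm(self, new_algorithm):
        """Re-train the replacement on historical data, then hand over to it."""
        for email, reward in self.history:
            new_algorithm.observe(email, reward)
        self.algorithm = new_algorithm


campaign = Campaign(CountingAlgorithm())
for email, reward in [("welcome", 1.0), ("promo", 0.0), ("welcome", 1.0)]:
    campaign.record(email, reward)

campaign.swap_algorithm(CountingAlgorithm())  # the campaign keeps running
print(campaign.algorithm.best_email())
```

Keeping the raw history separate from any one algorithm's internal state is what makes the replacement possible without restarting the campaign.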

Want to talk to us about how Optimail can work for your email campaigns? Send me an email at brock@optimail.io, use the chat icon in the bottom right corner to speak to a member of the team, or just sign up free today and take a look for yourself!
