Thursday, August 16, 2018

Tracking email performance with AWS

Background

For our two main websites (QSR & FNF), we send out a monthly e-letter to about 20k+ each and we have it setup such that it goes through Amazon SES and all email notifications (deliveries, bounces, unsubs, complaints, etc.) get posted back to the site. Our custom Drupal module receives each notification and updates several places so that we can track bounce rates, delivery rates, and opt out people who complain or unsubscribe. So our site bogs down (at least for authenticated traffic) when we send the 40k e-letters because these notifications bypass all of the layers of caching in order to make those database updates.

Inspiration

Decoupled Drupal is a major mind-shift for me. QSR was our first Drupal (6) site back in 2010 and over the last 8 years, we have written over 40 custom modules to do things big (lead generation, circulation, etc.) and small (user surveys, etc.).

The advantage is that it's one place for your users to go for all the tools they need. The disadvantage, though, is that your server resources are shared and is probably taking away from the higher priority of serving your users.

There's also something to be said about splitting a feature off into an environment where you're free to choose the best tech for the job, which might not necessarily be Drupal.

Setup

First, this article was a big help in getting things setup. I ended up using a different table schema, just having 4 fields, event_id (the SNS event messageid, which is also my primary key), the source (so I can gather items based on the site), a processed boolean flag, and the message itself, stringified JSON.

One thing to keep in mind is that SNS posts its event differently to HTTP(S) than it does for Lambda, so you cannot rely on your HTTP(S) examples as test cases. I have a (sanitized) captured example here.

Finally, the easy/cool bit is changing the SNS subscription from your HTTP(S) endpoint to your Lambda function. You don't even have to program a subscription confirmation for that - it just works.

Next Steps

So I went live with this without testing an actual load scenario. Big mistake! Once the SNS messages came flying in, Lambda reported a ton of errors and DynamoDB's default write capacity caused a lot of throttling. So while Lambda can scale from input dynamically, what it does with the output can wreak havoc. I would highly recommend you do some load testing based on your situation. You can set up a test run to send 40k emails to a random assortment of AWS SES' test addresses. I ended up having to scramble my Lambda and DynamoDB configurations to bump up the timeout max and enable auto-scaling for writes. I ended up losing a lot of tracking data because my Lambda function didn't fail properly and SNS thought everything was OK and didn't try again. :(

After I get that fixed and more bulletproof, my next step is to write a cron job to gather any unprocessed messages that belong to the site and process them. I'll write a follow-up post when I'm done with that.

And once I'm proud of my Lambda function, I'll post that, too. Update: Here it is.

Conclusion

So the tradeoff is that my reporting is not real-time and there are some AWS costs, but this frees up our web server to do what it should be doing best: serving content to our readers.

No comments:

Post a Comment