Thursday, September 6, 2018

Moving from Lambda+DynamoDB to SQS

This is a follow-up to my previous post, where I was trying to offload the email tracking capabilities from our web server. Even with the more bulletproof Lambda function, I was losing some events because evidently, SNS tries 4 times and then gives up. Lambda was automatically scaling, but my writes to a DynamoDB table were not. So back to the drawing board!

I next moved from Lambda + DynamoDB to SQS, which seems to handle anything thrown at it, so that's awesome! Then I wrote a cronjob script to pull SQS messages, process them, and then delete them from the queue.

So now I can send a batch of emails through SES (using SendBulkTemplatedEmail), 50 at a time, which are set to use a configuration set to send email events (delivery, bounce, complaint, click, open, etc) to a SNS topic that is set to add it to a SQS queue. Once that's done, all events go to the queue and then every hour, I poll the queue to gather all messages and process/delete the relevant ones. I was able to send 7k emails in a few minutes and the load on my server was unnoticeable!

Thursday, August 16, 2018

Tracking email performance with AWS

Background

For our two main websites (QSR & FNF), we send out a monthly e-letter to about 20k+ each and we have it setup such that it goes through Amazon SES and all email notifications (deliveries, bounces, unsubs, complaints, etc.) get posted back to the site. Our custom Drupal module receives each notification and updates several places so that we can track bounce rates, delivery rates, and opt out people who complain or unsubscribe. So our site bogs down (at least for authenticated traffic) when we send the 40k e-letters because these notifications bypass all of the layers of caching in order to make those database updates.

Inspiration

Decoupled Drupal is a major mind-shift for me. QSR was our first Drupal (6) site back in 2010 and over the last 8 years, we have written over 40 custom modules to do things big (lead generation, circulation, etc.) and small (user surveys, etc.).

The advantage is that it's one place for your users to go for all the tools they need. The disadvantage, though, is that your server resources are shared and is probably taking away from the higher priority of serving your users.

There's also something to be said about splitting a feature off into an environment where you're free to choose the best tech for the job, which might not necessarily be Drupal.

Setup

First, this article was a big help in getting things setup. I ended up using a different table schema, just having 4 fields, event_id (the SNS event messageid, which is also my primary key), the source (so I can gather items based on the site), a processed boolean flag, and the message itself, stringified JSON.

One thing to keep in mind is that SNS posts its event differently to HTTP(S) than it does for Lambda, so you cannot rely on your HTTP(S) examples as test cases. I have a (sanitized) captured example here.

Finally, the easy/cool bit is changing the SNS subscription from your HTTP(S) endpoint to your Lambda function. You don't even have to program a subscription confirmation for that - it just works.

Next Steps

So I went live with this without testing an actual load scenario. Big mistake! Once the SNS messages came flying in, Lambda reported a ton of errors and DynamoDB's default write capacity caused a lot of throttling. So while Lambda can scale from input dynamically, what it does with the output can wreak havoc. I would highly recommend you do some load testing based on your situation. You can set up a test run to send 40k emails to a random assortment of AWS SES' test addresses. I ended up having to scramble my Lambda and DynamoDB configurations to bump up the timeout max and enable auto-scaling for writes. I ended up losing a lot of tracking data because my Lambda function didn't fail properly and SNS thought everything was OK and didn't try again. :(

After I get that fixed and more bulletproof, my next step is to write a cron job to gather any unprocessed messages that belong to the site and process them. I'll write a follow-up post when I'm done with that.

And once I'm proud of my Lambda function, I'll post that, too. Update: Here it is.

Conclusion

So the tradeoff is that my reporting is not real-time and there are some AWS costs, but this frees up our web server to do what it should be doing best: serving content to our readers.

Thursday, August 2, 2018

DrupalCon changes = less value

Have you seen the latest changes for DrupalCon? They have replaced a day of sessions with an additional workshop/summit day (for an additional expense) and have increased the early-bird basic ticket price from $450 to $800. I went to DrupalCon Nashville and it was a good conference, but it felt more commercial and like a trade show with sponsors everywhere you look. I've also been to DrupalCamp Asheville (didn't get to go this year because of a scheduling conflict) and for me, I believe camps are a better bang for the buck and where I'll be focusing for continuing education.

Tuesday, July 24, 2018

Setting #default_value for date field in a D8 form

I had a hard time getting the #default_value to work for the date form field because the examples module and the documentation uses this approach:

    $form['send_date'] = [
      '#type' => 'date',
      '#title' => $this->t("Send Date"),
      '#default_value' => [

        'month' => 2,
        'day' => 15,
        'year' => 2020,
      ],
    ];


After playing around with it for a while, I was able to get it to work by passing in a YYYY-MM-DD value, using date.formatter (the new way to format_date()).

    $date_formatter = \Drupal::service('date.formatter');
    $form['send_date'] = [
      '#type' => 'date',
      '#title' => $this->t("Send Date"),
      '#default_value' => $date_formatter->format(REQUEST_TIME, 'html_date'),
    ];


Looks like this is a known issue that has also caused others lost time. :( Hopefully this will get rolled out soon.

Friday, June 8, 2018

Upgrading Drupal from 8.3 to 8.5

Background

I am the sysadmin and developer for Art & Object, a Drupal 8 website built with the Drupal Composer project. The version pin in composer for Drupal was 8, which in hindsight was too broad for our usage. Meaning Drupal point releases (8.3 to 8.4) require study to ensure you understand all the implications, which wasn't something I did. I just blindly did a composer update, thinking everything would be handled automatically.

This really bit me when 8.4 came out because my server was running Debian Jessie, which runs PHP 5.6 and my composer didn't have a platform PHP configuration, so a lot of the underlying Symfony code updated to PHP 7. So I ended up doing a backgrade until I figured it out.

Then there were the critical security Drupal updates (SA-CORE-2018-002 and SA-CORE-2018-004) earlier this year that would not be released for 8.3, so I had to upgrade (or at least, at the time, I felt I had to, though I see now they have a patch for older 8.x releases). By that time, 8.5 was released, so I updated the composer to 8.5 and ran update and after some basic testing, moved on.

Then a few months later, I noticed the status error messages about running the contrib media module alongside the core and I knew I missed something and there was a problem.

I then started down a wicked rabbit hole of getting a local copy running and following the upgrade instructions, running into problem and going back to getting a local copy running fresh again and trying again. Lots of trail and error (mostly errors) and head-banging-on-the-desk. I looked for help on the #media IRC channel, but the best advice came from posting on Stack Overflow, where @sonfd pointed out that the media module needs to be uninstalled first. I thought I had tried that and ran into an error message that mentioned you can't uninstall the media module with media items already created.

The Fix

So after lots and lots (and lots) of local refreshes and trials and errors, here's the list I finally followed when it came time to upgrade production:
  1. First, put the site in maintenance mode. Then take a database backup and make a tarball of your project directory. Don't skip over this.
  2. drush pmu media crop_media_entity: pmu = pm-uninstall. Remove the media module (and crop_media_entity, if you have that, too). This was the tip from @sonfd that opened the rest of this process for me.
  3. composer remove drupal/media: Remove the contrib media module from the filesystem. I should add that I prefixed all my composer commands with /usr/bin/php -d memory_limit=-1 /usr/local/bin/ because I often ran into memory limits when running composer.
  4. composer require drupal/inline_entity_form drupal/crop:1.x-dev drupal/media_entity_instagram:2.x-dev drupal/media_entity:2.x-dev drupal/media_entity_slideshow:2.x-dev drupal/media_entity_twitter:2.x-dev drupal/slick_media:2.x-dev drupal/media_entity_actions: These modules are temporary to help upgrade the database records.
  5. composer remove drupal/video_embed_field: For some reason, I couldn't require video_embed_field:2.x-dev, so I removed it and then...
  6. composer update: When I ran this, it updated video_embed_field to 2.x-dev.
  7. composer require drupal/media_entity_image drupal/media_entity_document drupal/image_widget_crop: More temporary modules to help the upgrade process.
  8. drush cr: Clear cache to make sure Drupal picks up new modules and paths.
  9. drush updb: Run the database updates.
  10. drush pmu entity media_entity: Uninstall these modules (these were the old contrib modules)
  11. composer remove drupal/media_entity drupal/media_entity_image drupal/media_entity_document drupal/crop drupal/image_widget_crop
    /usr/bin/php -d memory_limit=-1 /usr/local/bin/composer require drupal/crop:2.x-dev drupal/image_widget_crop drupal/empty_page:2
    : Clean out the temporary modules from the filesystem.
  12. drush cr: Clear caches
  13. drush updb: Run database updates
  14. drush cex: Export the configuration (so you can commit it later).
  15. The blazy module had an error with the core media and hasn't been updated (as of this writing), but there is a patch to fix that. So I learned how to add patches to a composer file - turned out pretty simple. Add this to composer.json in the extra section:
            "patches": {
                "drupal/blazy": {
                    "Gets Blazy to work with Drupal Core Media": "https://www.drupal.org/files/issues/2881849-8.patch"
                }
            }
  16. composer update: This was odd, but I had to do an update, which picked up the patch, but didn't really install it. I can't remember exactly now, but I believe this actually deleted the blazy folder.
  17. composer remove drupal/blazy: So removing this actually installed it. Who knew? Whatever ... it's still in my composer.json and now the filesystem has the module and the patch.
  18. drush cr: Clear caches!
  19. For some reason, this upgrade created a new field called field_media_image_1 and assigned that as the source for the image media type, which broke some of the images on the site. So I edited media.type.image.yml file to revert source_field back to my original field_image.
  20. drush cim: Import my hack to get my media image type to work.
  21. I had a custom field formatter that I had to edit to change the namespace from media_entity to media.
  22. drush cr: Final cache clear!
  23. Test and make sure all is well. If so, take the site out of maintenance mode and commit your repo changes.

Advice / Conclusion

A lot of this pain could be negated by studying the release notes better. I own that and this counts as one of my many scars of lessons learned. I hope others can learn from my lesson, too. Someone may end up writing a meta post about this post to point out the high cost of maintaining a Drupal site and I don't think they'd be wrong about that, but that's the price you pay for running servers that are publicly accessible.