Published: 2nd August 2019
This is a high-level guide to the PHP/MySQL automated process that interacts with the MailChimp API to extract information for RE and to synchronise mailing list information between RE and MC.
The process uses PHP to move information between MC and a MySQL database hosted on your.manchester.ac.uk. In turn, there is a further process to transfer information between RE and the MySQL database using MS Access but that is out of scope.
This guide covers the following:
- File and database locations
- High-level script functionality
- Action data
- List synchronisation
- Auxiliary scripts and tools
File and database locations

The process runs on the your.manchester.ac.uk domain, from a directory titled ‘mcapitest’.
There is also a git repository on Bitbucket.
The database is on the your.manchester domain and is also named mcapitest.
High-level script functionality

There are 2 main scripts running automatically via crontab in order to pass data between MC and the MySQL database. Each script has its own associated table that is used to tell the script where it has got to, and therefore which action is to be performed next. The scripts have extensive error checking and monitoring built in due to inconsistent processing by MC.
NOTE: MC has fairly severe restrictions on direct data calls (1,000 rows max) and so the script uses batch requests to request higher volumes of data. These batch requests are processed on MC servers and once completed can be collected and processed by us.
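As a sketch of how those batch requests are assembled: MailChimp's v3 API accepts a `POST /batches` call whose body is an `operations` array, each entry being a normal API call. The function and parameter names below are illustrative, not taken from the live script, and the HTTP call itself is omitted.

```php
<?php
// Hypothetical sketch: building a MailChimp v3 batch request body.
// Direct calls are capped at 1,000 rows, so larger pulls are split
// into paged GET operations submitted as a single batch.

function buildBatchOperations(string $campaignId, int $totalRows, int $pageSize = 1000): array
{
    $operations = [];
    for ($offset = 0; $offset < $totalRows; $offset += $pageSize) {
        $operations[] = [
            'method' => 'GET',
            'path'   => "/reports/{$campaignId}/sent-to",
            'params' => ['count' => (string)$pageSize, 'offset' => (string)$offset],
        ];
    }
    // This array would be JSON-encoded and sent via POST /batches.
    return ['operations' => $operations];
}

// e.g. 2,500 recipients -> three paged GET operations in one batch
$body = buildBatchOperations('abc123', 2500);
```

MC processes the batch on its own servers; the script later collects the finished result file and unpacks it, which is why the progress checks described below matter.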
Action data

The primary script is called control.php, and it runs every 15 seconds.
The first check the script makes is to look for an existing batch request. These batch requests can take a long time to complete and are not processed in a linear fashion. This means that portions of every request can be processed at any time; if more requests are added without any being completed, you can create a massive backlog of half-processed requests. This check ensures a manageable number of requests are made and completed before further requests are submitted.
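That gate can be sketched as a simple filter over the decoded response of MC's `GET /batches` listing. The function name is illustrative; the batch statuses are MailChimp's documented ones.

```php
<?php
// Hypothetical sketch of the "any batch still running?" check.
// The live script presumably lists batches via GET /3.0/batches;
// here we just inspect an already-decoded response.

function hasUnfinishedBatches(array $batchListResponse): bool
{
    foreach ($batchListResponse['batches'] as $batch) {
        // MC batch statuses: pending, preprocessing, started,
        // finalizing, finished. Only 'finished' is terminal.
        if ($batch['status'] !== 'finished') {
            return true;
        }
    }
    return false;
}
```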
Next it checks the control table to see whether it has any processing left to do for the day (processing usually takes 4-5 hours) and if it does, what step it is on. The daily steps are:
- Initial day processing:
- Create a new row in the control table
- Update the all_campaigns table with any emails sent during the previous day
- If there were emails, we check the email types (A/B split emails require further processing to get the information for each variant and the winning combination) and request the following information:
- who it was sent to
- which addresses bounced
- who opened it (and when)
- which links they clicked on (and date of first click)
- Download a full copy of the alumni mailing list including mailing preferences
- Find out which email addresses have unsubscribed (from any email) since the previous day
- Collect click activity for older emails. Due to the size of our list this is done by looking at the activity for all emails sent on a particular day X days ago (1, 2, 5, 10, 30, 90 and 365 days). This doesn’t keep the information 100% accurate on RE but it is good enough for most of our activity
List synchronisation

The list synchronisation script runs every 10 minutes and amends data in the mailing list based on rows placed in three tables, one each for additions, updates and removals.
As the names suggest, they either add people to the list, update their information (name, preferences, email address etc.) or remove them from the list via an admin unsubscribe – we do not delete email addresses from the list.
The script does some email address validation (makes sure it is syntactically valid) and tries to perform the requested action. If it fails to complete then it looks into why it failed and will try further processing if possible e.g. if the data is put in the sync_add table but the email address is already on the list, it will fail to add but then run the update process to make sure the pre-existing record is correct.
Any failures (invalid email addresses, bounced email addresses, unsubscribes etc) are recorded in a table so that the data can be corrected by us.
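Two small pieces of that flow can be sketched concretely: the syntactic email check (PHP's built-in validator is the obvious choice, though the live script may differ) and the subscriber hash MailChimp uses to address an existing member when falling back from add to update. The HTTP calls themselves (POST to add, PUT on a "Member Exists" failure) are omitted; function names are illustrative.

```php
<?php
// Hypothetical sketch of the validate-then-act helpers.

// Syntactic validation only - deliverability is a separate question
// that surfaces later as bounces.
function isSyntacticallyValid(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// MC addresses list members by the MD5 hash of the lowercased
// email address, e.g. PUT /lists/{list_id}/members/{hash}.
function subscriberHash(string $email): string
{
    return md5(strtolower($email));
}
```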
Auxiliary scripts and tools
There are 2 extra scripts that run automatically. The first runs every day at 9am. It checks whether the first (action data) script has finished and, if it hasn't, emails our team to show them where it has got to. The second script runs weekly on a Sunday at 1pm and trims down some of the database tables to prevent database bloat.
There are several manual-run scripts that have been created to run various small sections of the wider process. Sometimes these take an input e.g. campaign or batch id and sometimes they require no input. These scripts are not designed to be used by anyone without a thorough understanding of the whole process. Below is a list of scripts currently in existence.
- Get 1 campaign (sent to and activity information)
- Get 1 campaign’s extra information (top level information stored in the all_campaigns table)
- Fill all_campaigns table completely
- Get unsubscribes
- Get all members (fill sync_members)
- View outstanding batches
- Delete an outstanding batch
- Tidy tables (cron)
- Manual process (action data)
- Manual list sync process
As mentioned above, automation is provided to the scripts by crontab, which is accessed in cPanel for the domain. Cron jobs have a maximum frequency of once per minute, so in order to achieve 15-second runs of the action data script it is called 4 times per minute with 15/30/45 second delays. The current list of cron jobs is shown below:
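For illustration, the 15-second cadence and the other schedules described above would typically look like this in crontab; the paths and filenames here are placeholders, not the live entries.

```
# Illustrative crontab entries (paths are placeholders)
* * * * *    php /home/mcapitest/control.php             # on the minute
* * * * *    sleep 15; php /home/mcapitest/control.php   # +15s
* * * * *    sleep 30; php /home/mcapitest/control.php   # +30s
* * * * *    sleep 45; php /home/mcapitest/control.php   # +45s
*/10 * * * * php /home/mcapitest/list_sync.php           # list synchronisation
0 9 * * *    php /home/mcapitest/check_progress.php      # daily 9am progress email
0 13 * * 0   php /home/mcapitest/tidy_tables.php         # Sunday 1pm table trim
```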