Migrate from AWS PostgreSQL to AWS MySQL
Well, I had to change a PostgreSQL-type Aurora database to a MySQL-type Aurora database recently, and do it with no downtime. Boy! It cost me several days of trial and error. Here’s my experience.
The initial conditions
There is an AWS-managed Aurora PostgreSQL database that is used by some service:
- We want to change it to an AWS-managed Aurora MySQL database.
- We don’t want to stop the service in the meantime
AWS DMS to the rescue
It is a job for the Amazon “Database Migration Service” (DMS). The “database changing” thing should follow these steps:
- with DMS we set up a “migration task” that copies all the data in the PostgreSQL database to the MySQL database; after that, every time a record is created/updated/deleted on the source PostgreSQL database, that operation is propagated to the target MySQL database
- once the two databases are “in sync” this way, we apply the code changes that make the service use the new MySQL database
- and voilà, zero downtime
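By the way, to judge when the two databases are “in sync” before switching the service over, you can watch the task from the AWS CLI. A minimal sketch, assuming the migration task described later in this post is already running (the task id and ARN are placeholders):

# Overall task status; FullLoadProgressPercent should reach 100
# while the task keeps running for the change-propagation (CDC) phase
aws dms describe-replication-tasks \
  --filters Name=replication-task-id,Values=pg-to-mysql-task

# Per-table counters: full-load rows plus ongoing inserts/updates/deletes
aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE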
But, unfortunately, it’s not straightforward to set up DMS between a PostgreSQL Aurora database and a MySQL Aurora database; there are some quirks. I’ll just show you what I did and emphasize the “quirks” along the way 😃
The three pieces of an AWS DMS migration
You’ll need to have three things set up for the migration:
- The replication instance. Think of this as the machine that is going to do the heavy lifting: just follow the examples and you’re set. Nothing special here (there’s a CLI sketch right after this list)
- The source and target endpoints. This “replication instance” needs to connect with the source of the data and with the target of the data. In our case, the source would be the PostgreSQL Aurora database and the target would be the MySQL Aurora database. This piece has some quirks, as we’ll see later.
- The migration task. Here you define how the migration is done: which tables to migrate, whether it’s just a one-off data load or you want to “propagate” further changes from the source to the target, etc. This piece also has quite a few quirks
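For reference, this is roughly what creating the replication instance looks like from the AWS CLI (a minimal sketch; the identifier, instance class and subnet group name are placeholders):

# A small instance is fine for modest databases; size it up for big ones
aws dms create-replication-instance \
  --replication-instance-identifier pg-to-mysql-repl \
  --replication-instance-class dms.t3.medium \
  --allocated-storage 50 \
  --replication-subnet-group-identifier my-dms-subnet-group \
  --no-publicly-accessible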
The source endpoint: a PostgreSQL Aurora database
Say you have some PostgreSQL instance in your Aurora PostgreSQL cluster called “postgresql_instance”.
When creating the source endpoint:
- specify “source”, of course
- check the checkbox “Select RDS DB instance” and select “postgresql_instance” in the “RDS instance” select
- on “Endpoint configuration” check the radio button “Provide access information manually” and fill in the username, password, and “database name” fields (you can leave the rest of the fields as they are)
That’s it! You’re done with the source endpoint.
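The rough CLI equivalent, for reference (a sketch; the host, credentials and database name are placeholders, and aurora-postgresql is the DMS engine name for Aurora PostgreSQL sources):

aws dms create-endpoint \
  --endpoint-identifier pg-source \
  --endpoint-type source \
  --engine-name aurora-postgresql \
  --server-name postgresql-instance.cluster-xxxx.eu-west-1.rds.amazonaws.com \
  --port 5432 \
  --username dms_user \
  --password 'xxxx' \
  --database-name my_source_database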
The target endpoint: a MySQL Aurora database
Here it gets a little trickier …
Before you do anything, make sure the tables present in the source database in “postgresql_instance” also exist in the target database in “mysql_instance”: with the task settings we’ll use later (“Target table preparation mode” set to “do nothing”), DMS won’t create them for you. For example, if you have this table in your source database:
my_source_database=> \d my_table;
                              Table "public.my_table"
   Column    |              Type              | Collation | Nullable |         Default
-------------+--------------------------------+-----------+----------+--------------------------
 username    | character varying(30)          |           | not null |
 email       | character varying(254)         |           |          |
 active      | boolean                        |           | not null | true
 timezone    | character varying(63)          |           |          | 'UTC'::character varying
 inserted_at | timestamp(0) without time zone |           | not null |
 updated_at  | timestamp(0) without time zone |           | not null |
Indexes:
    "my_table_pkey" PRIMARY KEY, btree (username)
    "my_table_email_index" UNIQUE, btree (email)
    "my_table_username_index" UNIQUE, btree (username)
Then you should do this on your target database:
CREATE TABLE my_table
(
  username    varchar(30) primary key,
  email       varchar(254) unique,
  active      boolean not null default true,
  timezone    varchar(63) default 'UTC',
  inserted_at datetime not null default current_timestamp,
  updated_at  datetime not null default current_timestamp on update current_timestamp
);
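If you want to sanity-check what you created (a sketch; host and credentials are placeholders):

# Show the target table definition; the boolean column will appear as tinyint(1),
# which matters for the transformation-rule quirk later on
mysql -h mysql-instance.cluster-xxxx.eu-west-1.rds.amazonaws.com \
  -u dms_user -p my_target_database \
  -e "SHOW CREATE TABLE my_table\G"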
Say you have some MySQL instance in your Aurora MySQL cluster called “mysql_instance”. When creating the target endpoint:
- specify “target”, of course
- check the checkbox “Select RDS DB instance” and select “mysql_instance” in the “RDS instance” select
- on “Endpoint configuration” check the radio button “Provide access information manually” and fill in the username and password fields; there is no “database” field here (you can leave the rest of the fields as they are)
- (Warning: this is a quirk) On “Endpoint settings” set “DatabaseName” to your database name, “TargetDbType” to “specific-database” and “ParallelLoadThreads” to “1”. This is how we specify the database name to MySQL
- Check the “use endpoint connection attributes” checkbox and specify “initstmt=SET FOREIGN_KEY_CHECKS=0;autocommit=1”
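Again, the rough CLI equivalent (a sketch; placeholders throughout, aurora is the DMS engine name for Aurora MySQL targets, and the quirky settings go in the MySQL endpoint settings and extra connection attributes):

aws dms create-endpoint \
  --endpoint-identifier mysql-target \
  --endpoint-type target \
  --engine-name aurora \
  --server-name mysql-instance.cluster-xxxx.eu-west-1.rds.amazonaws.com \
  --port 3306 \
  --username dms_user \
  --password 'xxxx' \
  --my-sql-settings '{"DatabaseName":"my_target_database","TargetDbType":"specific-database","ParallelLoadThreads":1}' \
  --extra-connection-attributes "initstmt=SET FOREIGN_KEY_CHECKS=0;autocommit=1"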
NOTE: when testing the connection to the endpoints, make sure the security group your replication instance is in can access the security groups of the source and target endpoints (if they’re different). THIS IS ANOTHER QUIRK
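A sketch of that from the CLI (all group IDs and ARNs are placeholders; sg-DMS stands for the replication instance’s security group):

# Let the replication instance reach PostgreSQL (5432) and MySQL (3306)
aws ec2 authorize-security-group-ingress \
  --group-id sg-POSTGRESQL --protocol tcp --port 5432 --source-group sg-DMS
aws ec2 authorize-security-group-ingress \
  --group-id sg-MYSQL --protocol tcp --port 3306 --source-group sg-DMS

# Then test each endpoint from the replication instance
aws dms test-connection \
  --replication-instance-arn arn:aws:dms:eu-west-1:123456789012:rep:EXAMPLE \
  --endpoint-arn arn:aws:dms:eu-west-1:123456789012:endpoint:EXAMPLE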
The Migration Task
On “migration type”, select “Migrate existing data and replicate ongoing changes”
- Keep the checkbox “start task on creation” unchecked
- on “Target table preparation mode” specify “do nothing”
- on “Stop task after full load completes” specify “don’t stop”
- on “LOB column settings” leave it as is (if you have large binary columns, well, you’ll have to study how it’s done)
- on “table mappings”, the “selection rules” part is pretty straightforward (there’s a minimal example in the CLI sketch at the end of this section)
- on “table mappings” at “transformation rules” there is a quirk regarding PostgreSQL boolean fields and how they’re translated to MySQL tinyint(1) fields. You’ll have to use the JSON editor (not the wizard) and do this for every boolean field in PostgreSQL that you want to map to a tinyint(1) field in MySQL:
{
  "rule-type": "transformation",
  "rule-id": "726893401",
  "rule-name": "726893401",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "<the source table name your field is in>"
  },
  "rule-action": "add-column",
  "value": "<name_of_source_boolean_field>",
  "old-value": null,
  "data-type": {
    "type": "uint1"
  },
  "expression": "CASE WHEN $<name_of_source_boolean_field>=='true' THEN 1 ELSE 0 END"
},
{
  "rule-type": "transformation",
  "rule-id": "726863845",
  "rule-name": "726863845",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "<the source table your field is in>",
    "column-name": "<name_of_source_boolean_field>"
  },
  "rule-action": "remove-column",
  "value": null,
  "old-value": null
}
As you can see, the “trick” is to tell AWS to remove the source boolean field and add a target tinyint field with the same name.
NOTE: the full JSON is much bigger, and includes any “selection rules” you made before using the wizard …
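Putting the table mappings together, here’s a minimal sketch of the full JSON and of creating the task from the CLI (the ARNs are placeholders; the selection rule includes every table in the public schema, and you’d append the add-column/remove-column pair from above to “rules” for each boolean field):

# Complete table-mappings document: one selection rule plus, in a real
# migration, the transformation rules shown above for each boolean field
cat > table-mappings.json <<'EOF'
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "public",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
EOF

aws dms create-replication-task \
  --replication-task-identifier pg-to-mysql-task \
  --source-endpoint-arn arn:aws:dms:eu-west-1:123456789012:endpoint:SRC \
  --target-endpoint-arn arn:aws:dms:eu-west-1:123456789012:endpoint:TGT \
  --replication-instance-arn arn:aws:dms:eu-west-1:123456789012:rep:EXAMPLE \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json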
Bonus track: preparing the source PostgreSQL for “propagating” changes to the target database
When defining the migration task, for AWS DMS to be able to “propagate” any live change on the source PostgreSQL database to the target MySQL database, the source PostgreSQL database needs some preparation:
- the user you specified in the “source endpoint” configuration MUST have the rds_superuser and rds_replication roles:
-- as superuser
GRANT rds_superuser TO dfauth;
GRANT rds_replication TO dfauth;
- some event and table mangling:
-- as the user you defined at the endpoint
CREATE TABLE public.awsdms_ddl_audit
(
  c_key    bigserial primary key,
  c_time   timestamp,    -- Informational
  c_user   varchar(64),  -- Informational: current_user
  c_txn    varchar(16),  -- Informational: current transaction
  c_tag    varchar(24),  -- Either 'CREATE TABLE' or 'ALTER TABLE' or 'DROP TABLE'
  c_oid    integer,      -- For future use - TG_OBJECTID
  c_name   varchar(64),  -- For future use - TG_OBJECTNAME
  c_schema varchar(64),  -- For future use - TG_SCHEMANAME. For now - holds current_schema
  c_ddlqry text          -- The DDL query associated with the current DDL event
);

CREATE OR REPLACE FUNCTION public.awsdms_intercept_ddl()
RETURNS event_trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
declare
  _qry text;
BEGIN
  if (tg_tag = 'CREATE TABLE' or tg_tag = 'ALTER TABLE' or tg_tag = 'DROP TABLE') then
    SELECT current_query() into _qry;
    insert into public.awsdms_ddl_audit
    values
    (
      default, current_timestamp, current_user, cast(TXID_CURRENT() as varchar(16)), tg_tag, 0, '', current_schema, _qry
    );
    delete from public.awsdms_ddl_audit;
  end if;
END;
$$;
-- as superuser
CREATE EVENT TRIGGER awsdms_intercept_ddl ON ddl_command_end
EXECUTE PROCEDURE public.awsdms_intercept_ddl();
grant all on public.awsdms_ddl_audit to public;
grant all on public.awsdms_ddl_audit_c_key_seq to public;
- and also, the parameter rds.logical_replication must be set to 1. That parameter lives in the “DB cluster parameter group” associated with “postgresql_instance”, and changing it involves rebooting the instance (good luck with that 😃)
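If you’d rather do that from the CLI, a sketch (the parameter group name is a placeholder; the psql commands at the end verify the parameter and the roles granted above):

# rds.logical_replication is a static parameter: it only applies after a reboot
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-aurora-pg-params \
  --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

aws rds reboot-db-instance --db-instance-identifier postgresql_instance

# Verify once the instance is back up
psql -h postgresql-instance.cluster-xxxx.eu-west-1.rds.amazonaws.com \
  -U dfauth -d my_source_database \
  -c "SHOW rds.logical_replication;" \
  -c "SELECT pg_has_role('dfauth', 'rds_replication', 'member');"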
Bonus track II: My cool company
Well, they say in my company they’re cool, and that’s true. My company is called doofinder and it’s quite something: we get to do a lot of programming in Elixir and a lot on AWS, and no previous knowledge (at least, in AWS) is required. You just have to study the hell out of it!!
http://www.doofinder.com