Mittwoch, 18. Februar 2015

Replacing a mySQL master with minimal downtime

Due to the transition of the CaseTrain project to the RZ I have to move the servers to a different subnet and because the currently used OS Windows Server 2008 R2 has entered extended support there was and is now some urge to set up completely new servers (with Windows Server 2012 R2).
The first step was to move the database slave to a new server. This was easy as the master db is/was 5.5 which can be easily replicated from a MySQL 5.6 installation, so I just used this guide. The next step was to replace the db master with a new one with minimal downtime. As CaseTrain is under heavy load during the lecture period (especially at the end of it when the tests are done) I waited until after Fasching when user access is traditionally very low. After much googling this is how I did it ...



Replacing a mySQL master with minimal downtime (on Windows)



I have a web server W, a database master M1 and a database slave S (W -- M1 ← S). I want to replace M1 with a new database master M2, so that I have W -- M2 ← S.


This will be performed in three steps:

Step 1: Duplicating replication slave (W -- M1 ← S and M1 ← M2)
Step 2: Daisy chaining (W -- M1 ← M2 ← S)
Step 3: Dropping old master (W -- M2 ← S)

The setup of M1 and S is as follows:

M1/my.ini
log-bin=M1-bin
binlog-do-db=myDB1
binlog-do-db=myDB2
server-id=M1

S/my.ini
datadir=D:/MySQL Datafiles/
# log-bin
report-host=S
relay-log=mysqld-relay-bin
replicate-do-db=myDB1
replicate-do-db=myDB2
server-id=S

For cloning the slave user in step 1 and replicating M1’s users on M2 I recomment HeidiSQL. Also I am aware of the problems of binlog-do-db and replicate-do-db, but as there are separate connection pools for the different databases it should be safe.

Step 1: Duplicate replication slave


This step can be performed with no downtime of M1/W.

1. Install MySQL on M2 (same version as on S) and stop it


M2 my.ini
datadir=D:/MySQL Datafiles/
log-bin = M2-bin
log-slave-updates
report-host=M2
relay-log=mysqld-relay-bin
replicate-do-db=myDB1
replicate-do-db=myDB2
binlog-do-db=myDB1
binlog-do-db=myDB2
server-id=M2

2. Set up replication for M2


Clone the replication user on M1 for S and edit the new user’s IP to the IP of M2 (don’t forget to flush privileges)

3. Prepare S for duplication


On S, do the following

> stop slave;
> flush tables;

and stop the service.

4. Copy S


Copy all contents of D:\MySQL Datafiles from S to M2. Typically there are now two sets of .err and -slow.log files - the ones from S should be deleted. Also delete auto.cnf on M2.

5. Start the service on S and M2 and check success


When issuing

> show slave status;

on M2 and S it should produce the same results and when issuing

> show master status;

on M2, it should show increase the position when called repeatedly.

So the replication on S is (still) working, the replication on M2 is working and M2 is also acting like a master, so S can be set as M2’s slave in the next step.

Step 2: Daisy chaining


Now we have to make sure that both slaves are on the same status, we can only achieve this if they are both synched with the master. The only way to do this is to lock M1 for a short time so no new data enters M1. As this may pause user actions, it may be more user friendly to put a maintenance note on W and briefly stop W now, but as it only takes seconds it could be done without stopping W.

M1 downtime starts here

On M1 issue:

> flush tables with read lock;

and keep this session open (otherwise the lock is released).
You have to wait until both slaves are in synch, so check with

> show slave status;

on S and M2 until both are showing Slave_IO_State = Waiting for master to send event. and the positions are not increasing anymore. Then issue

> stop slave;

on S and M2.
Now call

> show master status;

on M2 which is something like File = M2-bin.000003, Position = 1234567. This is the state on M2 that corresponds to the current state on S. These two dates will be used in a moment (so keep this display open) but now you already may resume M1 in the opened session with

> unlock tables;

so your application can run normally (if you stopped W then start it now).

M1 downtime ends here

The replication on M2 can now be started with

> start slave;

To make S replicating M2 first a slave user for S on M2 must be set up with

> grant replication slave on *.* to '[slave user]'@'[IP of S]' identified by '[slave password]';
> flush privileges;

Now S can be switched from it’s former master M1 to the new master M2:

> change master to master_host='[IP of M2]',master_user='[slave user]', master_password='[slave password]', master_log_file='M2-bin.000003', master_log_pos=1234567;

and the replication can be started

> start slave;

M1’s show master status should now correspond to M2’s show slave status and M2’s show master status with S’s show slave status. Now that S replicates M2 (and M2 still replicates M1) we can move to M2.

Step 3: Dropping old master


First, replicate M1’s users on M2. As there seems to be no easy way to replicate them without damaging the existing users on M2 you have to set them up one by one with all their permissions as they are on M1. Good time to check all those permissions ;-)

To keep the downtime as short as possible you may remove the lines

log-slave-updates
report-host=M2
relay-log=mysqld-relay-bin
replicate-do-db=myDB1
replicate-do-db=myDB2

from M2’s my.ini now without restarting M2.

Also you may prepare all your configuration for W so that all connections are made to M2 instead of M1 without deploying the configuration yet.

Before we switch again the the slave (M2) needs to be in sync with the master (M1).

W downtime starts here

In step 1 the W downtime was optional, typically you have to restart W because of the configuration changes anyway (e.g. to clean any connection pools), so you should stop W now.

On M1 issue:

> flush tables with read lock;

On M2 wait until in synch (with show slave status), then stop M1 (for the last time). On M2 enter

> stop slave;
mySQL < 5.5: > change master to master_host=''; reset slave;
mySQL 5.5+ : > reset slave all;

and restart the service on M2.

Then reconfigure W so that it uses M2 or deploy the prepared configuration and start W.

W downtime ends here

Et voila - M1 may now be deleted.



Next step: Moving the web server from Windows Server 2008 R2, httpd 2.2, Tomcat 7, Java 7 to Windows Server 2012 R2, httpd 2.4, Tomcat 8, Java 8.

Keine Kommentare:

Kommentar veröffentlichen