The Art of Trickery

A Simple Script to Restart MySQL Repliction

| Comments

As a MySQL DBA, I can say that one of the most common types of issues that I have encountered is problems with replication. Depending on how/why replication broke, the fix falls into one of two categories:

1 - A simple restart
2 - Manual intervention

A simple restart means that you can simply issue a “start slave” command and you’re back in busniess. However, manual interation, means that you need to fix the problem before you can restart the slave.

Luckly for me, most of the problems I encounter in my environment can be dealt with at a later time. We have seven slaves and if one of them encounters a problem with a duplicate primary key for example, it’s not a big deal for me to log the error somewhere and come back and resolve the issue later. It’s more important to ensure that replication is running and all seven slaves are properly serving request. Besides, if a requst for that primary key comes in again, there is a 1-in-7 chance that the same slave will receive the request.

If you have a similar environment, then the script below may be of help to you:

checkRepliction.sh
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#!/bin/bash

MYSQL=/path/to/mysql>
USERNAME=<your mysql login>
PASSWORD=<your mysql password>
EMAIL=<dba email address>
ERRORFILE=</path/to/err

$MYSQL -u$USERNAME -p<password> -Bse "show slave status\G"\
|grep '_Running'\
|grep 'No'>/dev/null 2>&1

if [ "$?" != "1" ]
  then
    HOST=`/usr/bin/hostname`
    ERROR=`$MYSQL -u$USERNAME -p$PASSWORD -Bse "show slave status\G"\
    |grep "Last_SQL_Error"`
    DATE=`date`
    echo "$DATE: $ERROR" >> $ERRORFILE
    /usr/bin/tail -1 $ERRORFILE|/usr/bin/mailx -s $HOST $EMAIL
    $MYSQL -u$USERNAME -p$PASSWORD -Bse "set global sql_slave_skip_counter=1;"
    $MYSQL -u$USERNAME -p$PASSWORD -Bse "slave start;"
fi

This script will look at the results from the “show slave status.” If it finds that either Slave_IO_Running or Slave_SQL_Running values are set to “No,” then it will:

1 - Log the error message to $ERRORFILE
2 - Email $EMAIL with a copy of the error message
3 - Set sql_slave_skip_counter=1 – makes the slave skip over the statement producing the error in the binlog file
4 - Attempt to restart the slave.

I simply cron the script to run once a minute and now I don’t have to wake up in the middle of the night to restart my slave anymore.