Circular asynchronous MySQL replication between multiple geographically-distributed Percona XtraDB Clusters with Juju

Intro


I have recently shown you how to replicate databases between two Percona XtraDB Clusters using asynchronous MySQL replication with Juju [1]. Today I am going to take you one step further. I will show you how to configure circular asynchronous MySQL replication between geographically-distributed Percona XtraDB Clusters. I will use Juju for this purpose again, as it simplifies not only the deployment, but the entire life cycle management. So ... grab a cup of coffee and see the world around you changing!

Design


Let's assume that you have three geographically-distributed sites: dc1, dc2 and dc3, and you want to replicate the example database across Percona XtraDB Clusters located in each site. We will use Juju for modelling purposes and MaaS [2] as a provider for Juju. Each site has MaaS installed and configured, with nodes enlisted and commissioned. The whole environment is managed from a Juju client which is external to the sites. The setup is presented in the following diagram:

Percona XtraDB Cluster circular asynchronous MySQL replication

P.S.: If you have more than three sites, don't worry. Circular replication scales out, so you can replicate the database across multiple Percona XtraDB Clusters.

Initial deployment


Let's assume that you already have the Juju client installed, all three MaaS clouds added to the client and a Juju controller bootstrapped in each cloud. If you don't know how to do it, you can refer to the MaaS documentation [3]. You can list Juju controllers by executing the following command:

$ juju list-controllers

Controller  Model    User   Access     Cloud/Region  Models  Machines  HA    Version
juju-dc1*   default  admin  superuser  maas-dc1      2       1         none  2.3.7
juju-dc2    default  admin  superuser  maas-dc2      2       1         none  2.3.7
juju-dc3    default  admin  superuser  maas-dc3      2       1         none  2.3.7

The asterisk character indicates the current controller in use. You can switch between them by executing the following command:

$ juju switch <controller_name>

In each cloud, on each controller, we create a model and deploy a Percona XtraDB Cluster within it. I'm going to use bundles [4] today to make the deployment easier:

$ juju switch juju-dc1

$ juju add-model pxc-rep1

$ cat <<EOF > pxc1.yaml

series: xenial
services:
  pxc1:
    charm: "/tmp/charm-percona-cluster"
    num_units: 3
    options:
      cluster-id: 1
      databases-to-replicate: "example"
      root-password: "root"
      vip: 10.0.1.100
  hacluster-pxc1:
    charm: "/tmp/charm-hacluster"
    options:
      cluster_count: 3
relations:
  - [ pxc1, hacluster-pxc1 ]
EOF

$ juju deploy pxc1.yaml

$ juju switch juju-dc2

$ juju add-model pxc-rep2

$ cat <<EOF > pxc2.yaml

series: xenial
services:
  pxc2:
    charm: "/tmp/charm-percona-cluster"
    num_units: 3
    options:
      cluster-id: 2
      databases-to-replicate: "example"
      root-password: "root"
      vip: 10.0.2.100
  hacluster-pxc2:
    charm: "/tmp/charm-hacluster"
    options:
      cluster_count: 3
relations:
  - [ pxc2, hacluster-pxc2 ]
EOF

$ juju deploy pxc2.yaml

$ juju switch juju-dc3

$ juju add-model pxc-rep3

$ cat <<EOF > pxc3.yaml

series: xenial
services:
  pxc3:
    charm: "/tmp/charm-percona-cluster"
    num_units: 3
    options:
      cluster-id: 3
      databases-to-replicate: "example"
      root-password: "root"
      vip: 10.0.3.100
  hacluster-pxc3:
    charm: "/tmp/charm-hacluster"
    options:
      cluster_count: 3
relations:
  - [ pxc3, hacluster-pxc3 ]
EOF

$ juju deploy pxc3.yaml

Refill your cup of coffee and, after some time, check the Juju status:

$ juju switch juju-dc1

$ juju status
Model     Controller  Cloud/Region  Version  SLA
pxc-rep1  juju-dc1    maas-dc1      2.3.7    unsupported

App             Version       Status  Scale  Charm            Store  Rev  OS      Notes
hacluster-pxc1                active      3  hacluster        local    0  ubuntu  
pxc1            5.6.37-26.21  active      3  percona-cluster  local   45  ubuntu  

Unit                 Workload  Agent  Machine  Public address  Ports     Message
pxc1/0*              active    idle   0        10.0.1.1        3306/tcp  Unit is ready
  hacluster-pxc1/0*  active    idle            10.0.1.1                  Unit is ready and clustered
pxc1/1               active    idle   1        10.0.1.2        3306/tcp  Unit is ready
  hacluster-pxc1/1   active    idle            10.0.1.2                  Unit is ready and clustered
pxc1/2               active    idle   2        10.0.1.3        3306/tcp  Unit is ready
  hacluster-pxc1/2   active    idle            10.0.1.3                  Unit is ready and clustered

Machine  State    DNS       Inst id        Series  AZ  Message
0        started  10.0.1.1  juju-83da9e-0  xenial      Running
1        started  10.0.1.2  juju-83da9e-1  xenial      Running
2        started  10.0.1.3  juju-83da9e-2  xenial      Running

Relation provider      Requirer               Interface        Type         Message
hacluster-pxc1:ha      pxc1:ha                hacluster        subordinate  
hacluster-pxc1:hanode  hacluster-pxc1:hanode  hacluster        peer         
pxc1:cluster           pxc1:cluster           percona-cluster  peer

If all units have turned to the active state, you're ready to go. Remember to check the status in all models.
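
If you'd rather not switch between controllers by hand, all three models can also be checked in one go (a minimal sketch, assuming the controller and model names used above):

$ for i in 1 2 3; do juju status -m juju-dc${i}:pxc-rep${i}; done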

Setting up circular asynchronous MySQL replication


In order to set up circular asynchronous MySQL replication between all three Percona XtraDB Clusters we have to relate them. However, as they don't belong to the same model / controller / cloud, we have to create offers [5] first (offers allow cross-model / cross-controller / cross-cloud relations):

$ juju switch juju-dc1

$ juju offer pxc1:slave

$ juju switch juju-dc2

$ juju offer pxc2:slave


$ juju switch juju-dc3

$ juju offer pxc3:slave

Then we have to consume the offers:

$ juju switch juju-dc1

$ juju consume juju-dc2:admin/pxc-rep2.pxc2 pxc2


$ juju switch juju-dc2

$ juju consume juju-dc3:admin/pxc-rep3.pxc3 pxc3


$ juju switch juju-dc3

$ juju consume juju-dc1:admin/pxc-rep1.pxc1 pxc1

Finally, we can add the cross-cloud relations:

$ juju switch juju-dc1

$ juju relate pxc1:master pxc2

$ juju switch juju-dc2

$ juju relate pxc2:master pxc3

$ juju switch juju-dc3

$ juju relate pxc3:master pxc1

Wait a couple of minutes and check whether all units have turned to the active state:

$ juju switch juju-dc1

$ juju status
Model     Controller  Cloud/Region  Version  SLA
pxc-rep1  juju-dc1    maas-dc1      2.3.7    unsupported

SAAS  Status  Store      URL
pxc2  active  maas-dc2  admin/pxc-rep2.pxc2

App             Version       Status  Scale  Charm            Store  Rev  OS      Notes
hacluster-pxc1                active      3  hacluster        local    0  ubuntu  
pxc1            5.6.37-26.21  active      3  percona-cluster  local   45  ubuntu  

Unit                 Workload  Agent  Machine  Public address  Ports     Message
pxc1/0*              active    idle   0        10.0.1.1        3306/tcp  Unit is ready
  hacluster-pxc1/0*  active    idle            10.0.1.1                  Unit is ready and clustered
pxc1/1               active    idle   1        10.0.1.2        3306/tcp  Unit is ready
  hacluster-pxc1/1   active    idle            10.0.1.2                  Unit is ready and clustered
pxc1/2               active    idle   2        10.0.1.3        3306/tcp  Unit is ready
  hacluster-pxc1/2   active    idle            10.0.1.3                  Unit is ready and clustered

Machine  State    DNS       Inst id        Series  AZ  Message
0        started  10.0.1.1  juju-83da9e-0  xenial      Running
1        started  10.0.1.2  juju-83da9e-1  xenial      Running
2        started  10.0.1.3  juju-83da9e-2  xenial      Running

Offer  Application  Charm            Rev  Connected  Endpoint  Interface                Role
pxc1   pxc1         percona-cluster  48   1/1        slave     mysql-async-replication  requirer

Relation provider      Requirer               Interface        Type         Message
hacluster-pxc1:ha      pxc1:ha                hacluster        subordinate  
hacluster-pxc1:hanode  hacluster-pxc1:hanode  hacluster        peer         
pxc1:cluster           pxc1:cluster           percona-cluster  peer
pxc1:master            pxc2:slave             mysql-async-replication  regular

At this point you should have circular asynchronous MySQL replication working between all three Percona XtraDB Clusters. The asterisk character in the output above indicates the leader unit which owns the VIP. Let's check whether it's actually working by connecting to the MySQL console on the leader unit:

$ juju ssh pxc1/0

$ mysql -u root -p

First, check whether, as a master, it has granted access to the pxc2 application units:

mysql> SELECT Host FROM mysql.user WHERE User='replication';
+----------+
| Host     |
+----------+
| 10.0.2.1 |
| 10.0.2.2 |
| 10.0.2.3 |
+----------+
3 rows in set (0.00 sec)

Then check its slave status as a slave of pxc3:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.3.100
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 338
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 338
              Relay_Log_Space: 457
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: e803b085-739f-11e8-8f7e-00163e391eab
             Master_Info_File: /var/lib/percona-xtradb-cluster/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

Finally, create a database:

mysql> CREATE DATABASE example;
Query OK, 1 row affected (0.01 sec)

and check whether it has been created on pxc2:

$ juju switch juju-dc2

$ juju ssh pxc2/0

$ mysql -u root -p

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| example            |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.00 sec)

It is there! It should have been created on pxc3 as well.
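
For completeness, the same check on the third site looks like this (assuming the same unit numbering as above):

$ juju switch juju-dc3

$ juju ssh pxc3/0

$ mysql -u root -p

mysql> SHOW DATABASES;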

At this point you can write to the example database from the leader units of the Percona XtraDB Clusters in each site. This is how circular asynchronous MySQL replication works. Isn't that easy? Of course it is - thanks to Juju!
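
If you want to see the circle in action end to end, write a row on one site and read it back on the others. Here is a minimal sketch (the table and values are made up for illustration; run USE example first so that the statements match the binlog_do_db filter). On the pxc2 leader unit:

mysql> USE example;

mysql> CREATE TABLE city (id INT PRIMARY KEY, name VARCHAR(32));

mysql> INSERT INTO city VALUES (1, 'Krakow');

Then, on the pxc1 and pxc3 leader units, the row should appear:

mysql> SELECT * FROM example.city;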

References






Pacemaker - resource with configured location is not transferred back to the preferred location after recovery from a failure

Intro


As in the title - I configured a Pacemaker resource and used the location directive to set its preferred location. The resource was initially allocated to the preferred node, but after a reboot it was not transferred back there. It took me a few hours to find the root cause and fix it.

Symptoms


Configure a preferred location of Pacemaker resource or resource group:

# crm configure location <location_name> <resource_name> <priority>: <node_name>

For example:

# crm configure location VIP_location VIP 50: node01

should ensure that the VIP resource is always placed on node01, unless other locations with higher priority for this resource are created.

This works fine until node01 fails. When that happens, the resource is transferred to another node in the cluster, which is expected. However, once node01 recovers from the failure, the resource should be transferred back to its preferred location. The problem is that it is not!
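
You can see where the resource actually lands with a check along these lines (crm_resource ships with Pacemaker; the exact output format may vary between versions):

# crm status | grep VIP
# crm_resource --locate --resource VIP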

Solution


The solution is trivial, but not very well documented. Check the configured value of the general cluster resource stickiness. If it's higher than or equal to the priority of the location constraint, the resource won't be transferred back to its preferred location after recovery from the failure. So in our example it is enough to execute:

# crm configure rsc_defaults resource-stickiness=49
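
If you want to double-check what the default currently is, something like this should show it (a sketch; grep is used because the layout of the crm configure show output differs between crmsh versions):

# crm configure show | grep -i stickiness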

That's it! A few hours, I said ... Still, I managed to find the time to write this post :).

How to replicate databases between two Percona XtraDB Clusters - asynchronous MySQL replication with Juju

Intro


I have recently been looking for a solution for replicating Percona XtraDB Cluster databases across multiple geographically-distributed sites. My first approach was to use Galera synchronous replication, but I have found that the official recommendation is to use MySQL asynchronous replication [1]. I followed one of many available tutorials for setting up replication between two MySQL nodes, but I quickly faced some issues. Well, setting up replication between two Percona clusters is not the same as setting up replication between two standalone MySQL nodes. Therefore, I have decided to document all of those traps, so that you can avoid them during your deployment. Moreover, I have created a patch for the percona-cluster charm [2], so that you can fully automate your multi-site Percona deployment!


Deployment - manual


Assuming you have two Percona clusters (pxc1 and pxc2) already deployed with VIPs, the following is a list of steps to follow in order to set up master-slave replication from pxc1 to pxc2:

1) Update my.cnf file on each unit in each cluster to contain the following settings:

  • pxc1:

[mysqld]
bind-address = 0.0.0.0
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
server_id = 1
binlog_do_db = example1
binlog_do_db = example2

  • pxc2:

[mysqld]
bind-address = 0.0.0.0
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
server_id = 2
binlog_do_db = example1
binlog_do_db = example2

Note that the value of the server_id setting must be different for each cluster, but it must be the same within the cluster [3]. Also note that you have to create a separate binlog_do_db entry for each database to replicate.

Once done, restart the mysql service:

# systemctl restart mysql

2) Create a replication user account on the pxc1 cluster side by executing the following commands from the unit which owns the VIP:

$ mysql -u root -p

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'10.130.194.21' IDENTIFIED BY 'password';

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'10.130.194.22' IDENTIFIED BY 'password';

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'10.130.194.23' IDENTIFIED BY 'password';

Note that you have to execute the GRANT command for each of the pxc2 cluster members.

Finally, check the master status and note down the output, as it will be used in point 3:

mysql> SHOW MASTER STATUS;
+------------------+----------+-------------------+------------------+
| File             | Position | Binlog_Do_DB      | Binlog_Ignore_DB |
+------------------+----------+-------------------+------------------+
| mysql-bin.000001 |      618 | example1,example2 |                  |
+------------------+----------+-------------------+------------------+
1 row in set (0.00 sec)

3) Set up replication on the pxc2 cluster side by executing the following commands from the unit which owns the VIP:

$ mysql -u root -p

mysql> STOP SLAVE;

mysql> CHANGE MASTER TO master_host='10.130.194.254', master_port=3306, master_user='replication', master_password='password', master_log_file='mysql-bin.000001', master_log_pos=618;

mysql> START SLAVE;

Note that you have to use the VIP of the pxc1 cluster. Also note that you have to use the file and position from point 2.
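
Before moving on, it is worth confirming on the pxc2 side that both replication threads have started - Slave_IO_Running and Slave_SQL_Running should both say Yes:

mysql> SHOW SLAVE STATUS\G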

4) At this point you have master-slave replication working from pxc1 to pxc2. You can check it by logging in to the pxc1 unit which owns the VIP and executing the following command:

mysql> CREATE DATABASE example1;

The database is now visible on all units of both pxc1 and pxc2 clusters:

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| example1           |
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.01 sec)

If you want to set up master-master replication, follow steps 2 and 3 again, but execute the commands on the other side of the replication.
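
Sketched out, the reverse direction looks roughly as follows (the pxc1 member addresses 10.130.194.11-10.130.194.13 below are hypothetical - substitute your real ones, and take the log file and position from SHOW MASTER STATUS run on pxc2). On the pxc2 unit which owns the VIP:

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'10.130.194.11' IDENTIFIED BY 'password';

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'10.130.194.12' IDENTIFIED BY 'password';

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'10.130.194.13' IDENTIFIED BY 'password';

mysql> SHOW MASTER STATUS;

Then on the pxc1 unit which owns the VIP, pointing at the VIP of pxc2:

mysql> STOP SLAVE;

mysql> CHANGE MASTER TO master_host='<pxc2_VIP>', master_port=3306, master_user='replication', master_password='password', master_log_file='<file_from_pxc2>', master_log_pos=<position_from_pxc2>;

mysql> START SLAVE;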

Deployment - with Juju


If you haven't heard about Juju yet, now is a perfect time. Juju is an application modelling tool which works on the basis of so-called charms, which encapsulate the entire logic needed to install, configure and maintain applications. You can find more information about it at [4].

Let's start with bootstrapping a Juju controller on the local LXD [5] provider for testing purposes (if you can't install LXD on your computer, you can set up a VM with Ubuntu LTS and test from there):

$ juju bootstrap localhost lxd-controller

This will take a while. In the meantime you can read about other providers (clouds) supported by Juju [6]. Once the controller is bootstrapped you can start modelling. First, download the branches I created (they haven't been merged with the upstream code yet):

$ cd /tmp

$ git clone git@github.com:tytus-kurek/charm-percona-cluster.git

$ git clone git@github.com:tytus-kurek/charm-hacluster.git

$ cd /tmp/charm-percona-cluster

$ git checkout 1776171

$ cd /tmp/charm-hacluster

$ git checkout 1776171

Then deploy two different clusters:

$ juju deploy --num-units 3 /tmp/charm-percona-cluster pxc1

$ juju deploy --num-units 3 /tmp/charm-percona-cluster pxc2

configure them (the VIPs should belong to the same subnet as the lxdbr0 interface):

$ juju config pxc1 root-password="changeme"

$ juju config pxc2 root-password="changeme"

$ juju config pxc1 vip="10.130.194.254"

$ juju config pxc2 vip="10.130.194.253"

and deploy hacluster subordinate application for Pacemaker / Corosync management:

$ juju deploy hacluster hacluster-pxc1

$ juju deploy hacluster hacluster-pxc2

$ juju relate pxc1 hacluster-pxc1

$ juju relate pxc2 hacluster-pxc2

That's almost it! At this point you should have two different Percona clusters deployed. You can check the status by executing the following command:

$ juju status

Assuming that all units have turned to the active status, you can enable asynchronous MySQL replication between pxc1 and pxc2. First, configure clusters with mandatory settings:

$ juju config pxc1 databases-to-replicate="example1,example2"

$ juju config pxc2 databases-to-replicate="example1,example2"

$ juju config pxc1 cluster-id=1

$ juju config pxc2 cluster-id=2

Then enable replication from pxc1 to pxc2 by executing the following command:

$ juju relate pxc1:master pxc2:slave

Voila! Wanna enable master-master replication? Here you go:

$ juju relate pxc2:master pxc1:slave

Don't you believe it's working? Just check it! It works like a charm. This is why I like Juju so much.
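
If you want to see it with your own eyes, repeat the manual check from above (a sketch, assuming the configuration used in this post): log in to the pxc2 unit which owns the VIP (pxc2/0 below is just an example), create one of the replicated databases and watch it appear on pxc1:

$ juju ssh pxc2/0

$ mysql -u root -p

mysql> CREATE DATABASE example1;

Then log in to a pxc1 unit the same way and run SHOW DATABASES; - example1 should be listed.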

P.S.: Don't hesitate to visit my next post about setting up circular asynchronous MySQL replication between multiple geographically-distributed Percona XtraDB Clusters.


References


[1] https://www.percona.com/resources/mysql-white-papers/high-level-multi-datacenter-mysql-high-availability

[2] https://github.com/tytus-kurek/charm-percona-cluster/tree/1776171

[3] https://www.percona.com/forums/questions-discussions/percona-xtradb-cluster/9433-master-master-between-two-clusters?p=19622#post19622

[4] https://jujucharms.com/

[5] https://www.ubuntu.com/containers/lxd

[6] https://docs.jujucharms.com/devel/en/clouds

How to provide access from an OpenStack instance to an LXD container running on a Neutron Gateway

Intro


I have recently been tasked with an interesting problem. Let's assume we have an OpenStack cloud based on Open vSwitch which uses VXLAN tunneling. How would you provide access from a tenant instance to an LXD container running on the Neutron Gateway node? The answer seems to be obvious - you assign a floating IP to the instance and route the traffic through your networking infrastructure using the OpenStack external network. But seriously? What is the point of the whole network virtualization concept then? Wouldn't it be possible to just connect the tenant network with the LXD bridge on the Neutron Gateway node? The answer is "yes" and the following guide will walk you through the necessary steps to configure it.

Design


The tenant network terminates on an OpenStack router, which is implemented as a Linux namespace on the Neutron Gateway node. You can list all OpenStack routers by executing the following command:

# ip netns | grep qrouter
qrouter-3960f517-e2e9-4bca-900e-3681db5fe5ec

On the other side, LXD containers are attached to the lxdbr0 bridge by default. You can display it by running the following command:

# brctl show lxdbr0
lxdbr0 8000.000000000000 no eth0

Let's assume that the bridge resides on the 10.0.0.0/24 subnet and the container has the IP address 10.0.0.1. In order to create a connection between the tenant subnet and the container, we create a veth pair and assign its ends to the LXD bridge and the OpenStack router.

Configuration


Let's start with logging in to the Neutron Gateway node as the root user.

1) We start by creating the veth pair which will connect the LXD bridge with the OpenStack router:

# ip link add veth-qrouter type veth peer name veth-lxdbr

You can check whether the veth pair has actually been created by executing the following command:

# ip link | grep veth
2831: veth-lxdbr@veth-qrouter: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
2832: veth-qrouter@veth-lxdbr: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

2) Next we attach the veth-qrouter end of the pair to the LXD bridge and bring it up:

# brctl addif lxdbr0 veth-qrouter
# ip link set dev veth-qrouter up

You can check whether it has been attached by executing the following command:

# brctl show lxdbr0

lxdbr0 8000.000000000000 no eth0
veth-qrouter

3) Then we move the veth-lxdbr end of the pair into the OpenStack router's Linux namespace:

# ip link set veth-lxdbr netns qrouter-3960f517-e2e9-4bca-900e-3681db5fe5ec

Let's check whether it has successfully been moved:

# ip netns exec qrouter-3960f517-e2e9-4bca-900e-3681db5fe5ec ip link 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: qr-c4ff6972-03@if2829: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:37:89:65 brd ff:ff:ff:ff:ff:ff
3: qg-55cb180e-67@if2830: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:be:7d:2d brd ff:ff:ff:ff:ff:ff
2831: veth-lxdbr@if2832: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a2:71:8b:21:42:d8 brd ff:ff:ff:ff:ff:ff

It is there! But wait ... it has now disappeared from the default namespace:

# ip link | grep veth
2832: veth-qrouter@veth-lxdbr: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

Of course it did - you have just moved it to a different namespace :).

4) The interface within the namespace is down. We have to bring it up and configure it:

# ip netns exec qrouter-3960f517-e2e9-4bca-900e-3681db5fe5ec ip addr add 10.0.0.2/24 dev veth-lxdbr
# ip netns exec qrouter-3960f517-e2e9-4bca-900e-3681db5fe5ec ip link set dev veth-lxdbr up

Finally, we have to configure SNAT for the traffic coming from the tenant network:

# ip netns exec qrouter-3960f517-e2e9-4bca-900e-3681db5fe5ec iptables -t nat -I POSTROUTING -o veth-lxdbr -j MASQUERADE 

At this point tenant instances should be able to access the LXD container.
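
A simple way to verify it (assuming the container address 10.0.0.1 from the design section) is to ping the container, first from inside the router namespace and then from a tenant instance attached to that router:

# ip netns exec qrouter-3960f517-e2e9-4bca-900e-3681db5fe5ec ping -c 3 10.0.0.1

$ ping -c 3 10.0.0.1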