Enable Engine HA
Overview
Master Mirror
When high availability is enabled for the master node, there are primary and standby master instances. Clients can connect only to the primary master and execute commands there, while the standby master keeps its data consistent with the primary through Write-Ahead Log (WAL) streaming replication. When the primary master fails, the standby master does not automatically take over; an administrator must promote it by running the gpactivatestandby tool. For detailed information, please refer to Overview of Master Mirroring.
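Once master HA is enabled, the replication link can be inspected directly. A minimal sketch, assuming psql access to the primary master on port 15432 (the master port used later in this guide) and the standard pg_stat_replication view:

```bash
# Show WAL streaming status from the primary master to the standby (sketch).
psql -p 15432 -d postgres -c \
  "SELECT application_name, state, sync_state FROM pg_stat_replication;"
```

A healthy standby shows state = streaming.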
Segment Mirror
The engine database stores data in multiple segment instances, each of which is a PostgreSQL instance. Data is distributed across the segments according to the distribution strategy defined in the table creation statement. If high availability is not enabled, a failed segment node must be recovered manually before the database can be started again.
When high availability is enabled, each segment has a secondary instance, referred to as its mirror. Each segment thus consists of a primary/mirror pair, and the mirror segment keeps its data consistent with the primary segment through streaming replication based on write-ahead logs (WAL).
For more detailed information, please refer to Overview of Segment Mirroring.
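The primary/mirror pairing can be observed in the cluster catalog. A minimal sketch, assuming psql access to the master on port 15432; gp_segment_configuration is the standard Greenplum catalog table for segment layout:

```bash
# List segments and their mirrors, paired by content ID (sketch).
psql -p 15432 -d postgres -c \
  "SELECT content, role, preferred_role, hostname, port, status
     FROM gp_segment_configuration ORDER BY content, role;"
```

Once mirroring is enabled, each content ID appears twice: once with role p (primary) and once with role m (mirror).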
Enable HA
The following describes how to enable HA. Assume HENGSHI SENSE is installed in the /opt/hengshi directory on a host named host1 and runs as the user hengshi, the engine has 2 segment instances, and HA is not enabled. Another host, named host2, serves as the mirror host.
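Before starting, you can optionally confirm that mirroring is not yet configured. A minimal sketch, run on host1 as the hengshi user, assuming the engine environment file used throughout this guide:

```bash
# Check the current mirror configuration (sketch).
source /opt/hengshi/engine-cluster/export-cluster.sh
gpstate -m   # should report that mirrors are not configured yet
```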
Install and Initialize the Mirror Host
If HENGSHI SENSE is already installed on the mirror host, you can proceed directly to the next step: Enable HA for Segment.
Using root privileges, create the user that will run the product, named hengshi in this example.
```bash
grep hengshi /etc/passwd > /dev/null || sudo useradd -m hengshi
```
Configure passwordless login for host1 and host2.
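One common way to do this with OpenSSH is sketched below; run it as the hengshi user (the key type and file names are assumptions, adjust as needed):

```bash
# Generate a key pair (skip if one already exists) and distribute it (sketch).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id hengshi@host1
ssh-copy-id hengshi@host2
ssh host2 hostname   # verify that no password prompt appears
```

Alternatively, Greenplum's gpssh-exkeys tool can exchange keys across all cluster hosts once initial SSH access works.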
Create the installation path /opt/hengshi.
```bash
sudo mkdir -p /opt/hengshi && sudo chown hengshi:hengshi /opt/hengshi
```
Install HENGSHI SENSE.
```bash
sudo su - hengshi                    # Switch to the product running user
cd ~/pkgs/hengshi-sense-[version]    # Switch to the extraction target directory
./hs_install -p /opt/hengshi         # Execute the installation
```
Initialize the OS with sudo privileges.
```bash
sudo su - hengshi                    # Switch to the product running user
cd /opt/hengshi                      # Enter the installation target directory
bin/hengshi-sense-bin init-os all    # Initialize the OS
```
Note
The HENGSHI SENSE service does not need to be started here.
Enable HA for Segment
Mirror segment instances can be deployed across the cluster hosts in different ways, depending on the configuration.
- Group mode, the default deployment method. All mirrors of one host's primary segments are placed together on another host. When a host fails, the number of active segments doubles on the host that holds the failed host's mirrors.
- Spread mode ensures that at most one mirror on each host is promoted to primary, which prevents a sudden load increase on the remaining hosts when a single host fails. Spreading mirrors requires the number of cluster hosts to be greater than the number of segments on each host (a spread-mode sketch follows this list).
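For reference, when mirror locations are not specified through a configuration file, gpaddmirrors can also lay mirrors out in spread mode. A minimal sketch, assuming the -s (spread) and -p (port offset) options of gpaddmirrors and a cluster with enough hosts:

```bash
# Let gpaddmirrors place mirrors in spread mode instead of group mode (sketch).
# Without -i, the tool prompts for mirror data directories; -p sets the
# offset added to each primary port to derive the mirror port.
source /opt/hengshi/engine-cluster/export-cluster.sh
gpaddmirrors -s -p 1000
```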
Detailed instructions for the two deployment modes can be found in Segment Mirroring Overview. This article focuses on deploying segment mirrors in group mode and explains how to enable HA using the gpaddmirrors tool. The specific steps are as follows.
- Create the configuration file. The format of the configuration file is:
```
contentID|address|port|data_dir
```
Field Description:
- contentID: The content ID of the mirror segment, identical to the content ID of its primary segment. For more details, refer to the content column in the gp_segment_configuration reference.
- address: The hostname or IP of the node.
- port: The listening port of the mirror segment, incremented based on the port base of the existing node.
- data_dir: The data directory of the mirror segment.
Below is an example of the configuration file, assuming the configuration file is named mirrors.txt, with the following content:
```
0|host2|26432|/opt/hengshi/engine-cluster/mirror/SegDataDir0
1|host2|26433|/opt/hengshi/engine-cluster/mirror/SegDataDir1
```
- Run gpaddmirrors to enable segment mirroring.
```bash
source /opt/hengshi/engine-cluster/export-cluster.sh
gpaddmirrors -a -i mirrors.txt
```
- After the mirrors are added successfully, you will see a prompt message like the following.
```
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Process results...
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-******************************************************************
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Mirror segments have been added; data synchronization is in progress.
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Data synchronization will continue in the background.
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Use gpstate -s to check the resynchronization progress.
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-******************************************************************
```
- Verify the synchronization status of the mirrors. Execute the command gpstate -m; output like the following indicates that the mirrors are synchronized.
```
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-Starting gpstate with args: -m
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.2.1 build dev'
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.2.1 build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit compiled on Dec 23 2019 17:10:46'
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-Obtaining Segment details from master...
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--Current GPDB mirror list and status
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--Type = Group
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-   Mirror   Datadir                                          Port    Status    Data Status
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-   host2    /opt/hengshi/engine-cluster/mirror/SegDataDir0   26432   Passive   Synchronized
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-   host2    /opt/hengshi/engine-cluster/mirror/SegDataDir1   26433   Passive   Synchronized
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
```
Tip
- The status during verification may be Failed. Possible reasons include data still being synchronized or the mirror having just started; please try again later (a polling sketch follows this tip).
- If the segments hold a large amount of data, enabling mirroring puts significant pressure on Greenplum, so it is recommended to enable high availability during periods of low business pressure.
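For convenience, the check can be repeated automatically. A minimal sketch that keeps re-running gpstate -m until no mirror reports Failed (the 30-second interval is an arbitrary choice):

```bash
# Re-check mirror status until no mirror reports Failed (sketch).
source /opt/hengshi/engine-cluster/export-cluster.sh
while gpstate -m | grep -q Failed; do
  echo "mirrors not synchronized yet; retrying in 30s..."
  sleep 30
done
gpstate -m   # all mirrors should now show Synchronized
```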
Enable HA for the Master
Enabling HA for the master is relatively simple: running the gpinitstandby command completes the HA enablement.
- Enable the mirror of the master. Execute the gpinitstandby command to enable the master mirror, referring to the example below.
```bash
gpinitstandby -s host2
```
You can see the following prompt message indicating a successful operation.
```
20200313:16:09:55:008076 gpinitstandby:host1:hengshi-[INFO]:-Validating environment and parameters for standby initialization...
20200313:16:09:55:008076 gpinitstandby:host1:hengshi-[INFO]:-Checking for data directory /opt/hengshi/engine-cluster/data/SegDataDir-1 on host2
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:------------------------------------------------------
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master initialization parameters
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:------------------------------------------------------
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master hostname               = bdp2
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master data directory         = /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master port                   = 15432
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master hostname       = host2
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master port           = 15432
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master data directory = /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum update system catalog         = On
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Syncing Greenplum Database extensions to standby
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-The packages on host2 are consistent.
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Adding standby master to catalog...
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Database catalog updated successfully.
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Updating pg_hba.conf file...
20200313:16:09:58:008076 gpinitstandby:host1:hengshi-[INFO]:-pg_hba.conf files updated successfully.
20200313:16:09:59:008076 gpinitstandby:host1:hengshi-[INFO]:-Starting standby master
20200313:16:09:59:008076 gpinitstandby:host1:hengshi-[INFO]:-Checking if standby master is running on host: host2 in directory: /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:10:00:008076 gpinitstandby:host1:hengshi-[INFO]:-Cleaning up pg_hba.conf backup files...
20200313:16:10:01:008076 gpinitstandby:host1:hengshi-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20200313:16:10:01:008076 gpinitstandby:host1:hengshi-[INFO]:-Successfully created standby master on host2
```
- Use gpstate to check the status of the mirror master.
```bash
gpstate -f
```
The command prints a message like the following.
```
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Starting gpstate with args: -f
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.2.1 build dev'
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.2.1 build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit compiled on Dec 23 2019 17:10:46'
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Obtaining Segment details from master...
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Standby master details
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-----------------------
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby address          = host2
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby data directory   = /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby port             = 15432
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby PID              = 3050
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby status           = Standby host passive
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--pg_stat_replication
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--WAL Sender State: streaming
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Sync state: sync
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Sent Location: 0/C000000
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Flush Location: 0/C000000
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Replay Location: 0/C000000
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
```
Engine HA FAQ
Master HA Common Issues
How to switch to standby master when the primary master fails?
When the primary master fails, the system does not automatically switch to the standby master; manual switching is required. On the standby master node host2, execute the following commands.
```bash
source /opt/hengshi/engine-cluster/export-cluster.sh  # If this file is not present, copy it from the original primary master
gpactivatestandby -d /opt/hengshi/engine-cluster/data/SegDataDir-1
```
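After the promotion, it is worth confirming that host2 now serves as the primary. A minimal check, assuming the environment sourced above and the master port 15432 used throughout this guide:

```bash
# Verify the promoted master accepts connections (run on host2).
psql -p 15432 -d postgres -c "SELECT 1;"
gpstate   # summary status; the master should now be reported on host2
```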
How to restore the original primary master?
When the original primary master fails, first switch the standby master to become the new primary master (referred to as the backup master below), then manually reattach the original primary master as the new standby. Proceed as follows.
a. Back up the data directory of the original primary master.
```bash
mv /opt/hengshi/engine-cluster/data/SegDataDir-1 /opt/hengshi/engine-cluster/data/SegDataDir-1-backup
```
b. On the backup master node, execute the following command to configure the original primary master as a mirror master.
```bash
gpinitstandby -s host1
```
c. On the backup master node, stop the master instance.
```bash
gpstop -m
```
d. Execute the switch command on the original primary master node.
```bash
gpactivatestandby -d /opt/hengshi/engine-cluster/data/SegDataDir-1
```
e. On the backup master node, back up its data directory.
```bash
mv /opt/hengshi/engine-cluster/data/SegDataDir-1 /opt/hengshi/engine-cluster/data/SegDataDir-1-backup
```
f. On the original primary master node, configure the backup master as the mirror master.
```bash
gpinitstandby -s host2
```
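Once step f completes, you can confirm that the cluster is back to its normal topology; a minimal check:

```bash
gpstate -f   # standby master details; host2 should be listed with WAL Sender State "streaming"
```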
- How to start the mirror master if it has failed? Execute the following command to start the mirror master.
```bash
gpinitstandby -n
```
- How to remove the mirror master? Execute the following command to remove the mirror master.
```bash
gpinitstandby -r
```
Segment HA Common Issues
- How to restore the original roles after a switchover between primary and mirror segments? Please follow the steps below.
- Check the current synchronization status.
```bash
gpstate -m
```
If the status is not Synchronized, wait for synchronization to complete.
- Run the gprecoverseg tool with the -r option to return the segments to their preferred roles.
```bash
gprecoverseg -r
```
- Confirm the segment status.
```bash
gpstate -e
```