1. Enabling HA for the Engine
1.1. Overview
1.1.1. Master Mirroring
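When high availability is enabled for the master, there are two master instances: a primary and a standby. Clients can connect only to the primary master and run commands on it. The standby master stays consistent with the primary master through write-ahead log (WAL) streaming replication.
If the primary master fails, the standby master does not take over automatically. An administrator must run the gpactivatestandby utility to promote the standby master to the new primary master.
For details, see Overview of Master Mirroring.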
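1.1.2. Segment Mirroring
The engine database stores data across multiple segment instances, each of which is a PostgreSQL instance. Data is distributed across the segment hosts according to the distribution policy defined in the CREATE TABLE statement. Without high availability, a failed segment host must be recovered manually before the database can be started.
With high availability enabled, each segment gets a replica called a mirror. Each segment instance therefore consists of a primary and a mirror, and the mirror segment stays consistent with the primary segment through WAL streaming replication.
For details, see Overview of Segment Mirroring.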
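1.2. Enabling HA
This section describes how to enable HA. Assume that HENGSHI SENSE is installed in /opt/hengshi on host host1 and runs as user hengshi. Its engine has 2 segment instances and HA is not yet enabled. A second host, host2, will serve as the mirror host.
1.2.1. Installing and Initializing the Mirror Host
If HENGSHI SENSE is already installed on the mirror host, skip ahead to the next step, Enabling HA for Segments.
As root, create the user that runs the product (hengshi in this example).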
# grep hengshi /etc/passwd > /dev/null || sudo useradd -m hengshi
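Configure passwordless SSH login between host1 and host2.
Create the installation directory /opt/hengshi.
$ sudo mkdir -p /opt/hengshi && sudo chown hengshi:hengshi /opt/hengshi
Install HENGSHI SENSE.
$ sudo su - hengshi                  # switch to the product's runtime user
$ cd ~/pkgs/hengshi-sense-[version]  # go to the directory the package was extracted to
$ ./hs_install -p /opt/hengshi       # run the installer
Initialize the OS (requires sudo privileges).
$ sudo su - hengshi                  # switch to the product's runtime user
$ cd /opt/hengshi                    # go to the installation directory
$ bin/hengshi-sense-bin init-os all  # initialize the OS
Note:
You do not need to start the HENGSHI SENSE service at this point.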
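The passwordless login step above does not include any commands. The sketch below shows one way to do it with the standard OpenSSH tools ssh-keygen and ssh-copy-id. It assumes you run it as hengshi on both host1 and host2. The names setup_ssh_trust and run, and the DRY_RUN and SSH_USER variables, are hypothetical and exist only for this illustration. With DRY_RUN=1, the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set up passwordless SSH from the current user to each
# host given as an argument, using standard OpenSSH tools.
# DRY_RUN=1 prints the commands instead of running them.
# SSH_USER overrides the remote user (defaults to the current user).

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_ssh_trust() {
  local user="${SSH_USER:-$(id -un)}"
  local key="$HOME/.ssh/id_rsa"
  local host

  # Create ~/.ssh and a key pair with an empty passphrase if none exists yet.
  if [ ! -f "$key" ]; then
    run mkdir -p -m 700 "$HOME/.ssh"
    run ssh-keygen -t rsa -N "" -f "$key"
  fi

  # Add the public key to authorized_keys on every listed host,
  # including the local one, so each host is reachable without a password.
  for host in "$@"; do
    run ssh-copy-id -i "$key.pub" "$user@$host"
  done
}

# Usage, as hengshi on host1 and then again on host2:
#   setup_ssh_trust host1 host2
```

ssh-copy-id asks for the target user's password once per host. Afterwards, ssh host2 hostname should complete without a password prompt.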
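1.2.2. Enabling HA for Segments
Mirror segment instances can be placed on the cluster hosts in different ways, depending on the configuration.
- group (the default): all mirrors of the primary segments on one host are placed together on a single other host. If a host fails, the host holding its mirrors takes over its work, and the number of active segments on that host doubles.
- spread: each host's mirrors are spread across different hosts, so that when a single host fails, at most one mirror on each remaining host is promoted to primary segment. This avoids a sudden load spike on any one surviving host. Spread mirroring requires the cluster to have more hosts than there are segments per host.
For details on both layouts, see Overview of Segment Mirroring. This document uses the group layout and shows how to enable HA with the gpaddmirrors utility. The steps are as follows.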
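The function below makes the difference between the two layouts concrete. It computes where each primary's mirror would land under a simplified model of each mode. In group mode, all of a host's mirrors go to the next host. In spread mode, the j-th mirror goes to the (j+1)-th following host. This is only an illustration with hypothetical names (mirror_layout, host3). It is not necessarily the placement algorithm that gpaddmirrors uses.

```shell
#!/usr/bin/env bash
# Illustration only: a simplified model of where each primary segment's mirror
# lands under "group" and "spread" placement (not gpaddmirrors' own algorithm).
# Usage:  mirror_layout <group|spread> <segments-per-host> <host>...
# Output: "<primary-host> seg<j> -> <mirror-host>", where seg<j> is the j-th
#         primary on that host (a local index, not a content ID).

mirror_layout() {
  local mode="$1" per_host="$2"
  shift 2
  local hosts=("$@")
  local n=${#hosts[@]} i j target

  case "$mode" in
    group | spread) ;;
    *) echo "mode must be group or spread" >&2; return 1 ;;
  esac
  if [ "$n" -lt 2 ]; then
    echo "need at least 2 hosts" >&2; return 1
  fi
  if [ "$mode" = spread ] && [ "$n" -le "$per_host" ]; then
    echo "spread needs more hosts than segments per host" >&2; return 1
  fi

  for ((i = 0; i < n; i++)); do
    for ((j = 0; j < per_host; j++)); do
      if [ "$mode" = group ]; then
        # group: every mirror of host i goes to the next host
        target=$(( (i + 1) % n ))
      else
        # spread: the j-th mirror of host i goes to the (j+1)-th following
        # host, so no surviving host receives two mirrors from the same host
        target=$(( (i + 1 + j) % n ))
      fi
      echo "${hosts[i]} seg$j -> ${hosts[target]}"
    done
  done
}
```

This document's cluster has 2 hosts with 2 segments per host. For it, mirror_layout group 2 host1 host2 puts both of host1's mirrors on host2, and the spread mode is rejected because it needs more hosts than segments per host. With three hosts, mirror_layout spread 2 host1 host2 host3 places host1's mirrors on host2 and host3. Losing host1 then promotes only one mirror on each surviving host.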
Create a configuration file. Each line has the format:
contentID|address|port|data_dir
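Fields:
- contentID: the content ID of the mirror segment, which is the same as the content ID of its primary. For details, see content in the gp_segment_configuration reference.
- address: the hostname or IP address of the mirror host.
- port: the port the mirror segment listens on, incremented from the existing port base.
- data_dir: the data directory of the mirror segment.
The following sample configuration file is named mirrors.txt and contains:
0|host2|26432|/opt/hengshi/engine-cluster/mirror/SegDataDir0
1|host2|26433|/opt/hengshi/engine-cluster/mirror/SegDataDir1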
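For clusters with more segments, you can generate the file instead of typing it. The sketch below uses the hypothetical function name gen_mirror_config. It assumes the existing primaries have content IDs 0 to N-1 and that each mirror port is the base port plus the content ID, which matches the example above. With its defaults, the function reproduces that example exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print gpaddmirrors config lines
#   contentID|address|port|data_dir
# for content IDs 0..count-1, with every mirror on one host (group layout).
# Assumption: mirror port = base_port + contentID, as in the example above.
# One way to list the existing primaries' content IDs:
#   SELECT content, hostname, port, datadir FROM gp_segment_configuration
#   WHERE role = 'p' AND content >= 0 ORDER BY content;

gen_mirror_config() {
  local mirror_host="$1" count="$2"
  local base_port="${3:-26432}"
  local base_dir="${4:-/opt/hengshi/engine-cluster/mirror}"
  local c
  for ((c = 0; c < count; c++)); do
    printf '%d|%s|%d|%s/SegDataDir%d\n' \
      "$c" "$mirror_host" "$((base_port + c))" "$base_dir" "$c"
  done
}

# Usage: generate, review, then pass the file to gpaddmirrors.
#   gen_mirror_config host2 2 > mirrors.txt
#   cat mirrors.txt
```

Check that the generated ports are free on the mirror host before running gpaddmirrors.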
Run gpaddmirrors to enable segment mirroring.
source /opt/hengshi/engine-cluster/export-cluster.sh
gpaddmirrors -a -i mirrors.txt
After the mirrors are added successfully, you should see output similar to the following.
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Process results...
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-******************************************************************
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Mirror segments have been added; data synchronization is in progress.
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Data synchronization will continue in the background.
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Use gpstate -s to check the resynchronization progress.
20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-******************************************************************
Verify mirror synchronization. Run gpstate -m. Output like the following indicates that the mirrors are synchronized.
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-Starting gpstate with args: -m
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.2.1 build dev'
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.2.1 build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit compiled on Dec 23 2019 17:10:46'
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-Obtaining Segment details from master...
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--Current GPDB mirror list and status
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--Type = Group
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:- Mirror Datadir Port Status Data Status
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:- host2 /opt/hengshi/engine-cluster/mirror/SegDataDir0 26432 Passive Synchronized
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:- host2 /opt/hengshi/engine-cluster/mirror/SegDataDir1 26433 Passive Synchronized
20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
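A script can decide whether every mirror is synchronized, so you do not have to read the output by eye. The sketch below assumes the column layout of the sample output above: host, data directory, port, status, and data status. It treats a row as synchronized when its last column is Synchronized. all_mirrors_synced is a hypothetical name, and the parsing may need adjusting for other gpstate versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gpstate -m` output on stdin. Succeeds (exit 0)
# only if at least one mirror row is found and every mirror row's last
# column (Data Status) is "Synchronized".
# A mirror row is recognized by the layout of the sample output:
#   <timestamp> <prefix> <host> <datadir> <port> <status> <data status>
# so field 4 is an absolute path and field 5 is a numeric port.

all_mirrors_synced() {
  awk '
    $4 ~ /^\// && $5 ~ /^[0-9]+$/ {
      rows++
      if ($NF != "Synchronized") bad++
    }
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Usage: check once, or wait until synchronization finishes.
#   gpstate -m | all_mirrors_synced && echo "all mirrors synchronized"
#   until gpstate -m | all_mirrors_synced; do sleep 30; done
```

The status can read Failed for a while right after the mirrors are added (see the notes below). For that reason, the polling form in the usage comment is usually more convenient than a single check.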
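Notes:
1. During verification, the status may show Failed. This usually means that data is still being synchronized or that the mirror has just started. Wait a while and check again.
2. If the segments hold a large amount of data, enabling mirroring puts a heavy load on Greenplum. Enable HA when there is no business load.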
1.2.3. Enabling HA for the Master
Enabling HA for the master is simpler: a single gpinitstandby command is enough.
Enable the master mirror by running gpinitstandby, for example:
gpinitstandby -s host2
Output like the following indicates that the operation succeeded.
20200313:16:09:55:008076 gpinitstandby:host1:hengshi-[INFO]:-Validating environment and parameters for standby initialization...
20200313:16:09:55:008076 gpinitstandby:host1:hengshi-[INFO]:-Checking for data directory /opt/hengshi/engine-cluster/data/SegDataDir-1 on host2
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:------------------------------------------------------
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master initialization parameters
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:------------------------------------------------------
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master hostname = bdp2
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master data directory = /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master port = 15432
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master hostname = host2
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master port = 15432
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master data directory = /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum update system catalog = On
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Syncing Greenplum Database extensions to standby
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-The packages on host2 are consistent.
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Adding standby master to catalog...
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Database catalog updated successfully.
20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Updating pg_hba.conf file...
20200313:16:09:58:008076 gpinitstandby:host1:hengshi-[INFO]:-pg_hba.conf files updated successfully.
20200313:16:09:59:008076 gpinitstandby:host1:hengshi-[INFO]:-Starting standby master
20200313:16:09:59:008076 gpinitstandby:host1:hengshi-[INFO]:-Checking if standby master is running on host: host2 in directory: /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:10:00:008076 gpinitstandby:host1:hengshi-[INFO]:-Cleaning up pg_hba.conf backup files...
20200313:16:10:01:008076 gpinitstandby:host1:hengshi-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20200313:16:10:01:008076 gpinitstandby:host1:hengshi-[INFO]:-Successfully created standby master on host2
Use gpstate to check the status of the standby master.
gpstate -f
The command prints output like the following.
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Starting gpstate with args: -f
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.2.1 build dev'
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.2.1 build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit compiled on Dec 23 2019 17:10:46'
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Obtaining Segment details from master...
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Standby master details
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-----------------------
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:- Standby address = host2
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:- Standby data directory = /opt/hengshi/engine-cluster/data/SegDataDir-1
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:- Standby port = 15432
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:- Standby PID = 3050
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:- Standby status = Standby host passive
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--pg_stat_replication
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--WAL Sender State: streaming
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Sync state: sync
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Sent Location: 0/C000000
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Flush Location: 0/C000000
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Replay Location: 0/C000000
20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
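You can script the same check for the standby master. The sketch below uses the hypothetical name standby_healthy. It looks for three values from the sample output above: the standby is passive, the WAL sender is streaming, and replication is synchronous. The whitespace between each label and its value is matched loosely, because the exact column padding of real gpstate output is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gpstate -f` output on stdin. Succeeds only if
# the standby is passive, its WAL sender is streaming, and replication is
# synchronous, based on the labels in the sample output above.
# [[:space:]]* tolerates column padding between label and value.

standby_healthy() {
  local out
  out="$(cat)"
  grep -qE 'Standby status[[:space:]]*=[[:space:]]*Standby host passive' <<<"$out" &&
    grep -qE 'WAL Sender State:[[:space:]]*streaming' <<<"$out" &&
    grep -qE 'Sync state:[[:space:]]*sync' <<<"$out"
}

# Usage:
#   gpstate -f | standby_healthy && echo "standby master is in sync"
```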
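1.3. FAQ: Enabling HA for the Engine
1.3.1. Master HA
How do I fail over from the primary master to the standby master?
When the primary master fails, the system does not fail over to the standby master automatically. You must switch over manually by running the following commands on the standby master host, host2.
source /opt/hengshi/engine-cluster/export-cluster.sh  # if this file does not exist, copy it from the original primary master
gpactivatestandby -d /opt/hengshi/engine-cluster/data/SegDataDir-1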
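How do I restore the original primary master?
After the original primary master fails, the standby master is first promoted to the new primary master, referred to here as the backup master. To restore the original layout, make the original primary master the new standby master, then switch the roles back manually as follows.
a. Back up the data directory of the original primary master.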
mv /opt/hengshi/engine-cluster/data/SegDataDir-1 /opt/hengshi/engine-cluster/data/SegDataDir-1-backup
b. On the backup master host, run the following command to create a standby master on the original primary master host.
gpinitstandby -s host1
c. On the backup master host, stop the master.
gpstop -m
d. On the original primary master host, run the activation command.
gpactivatestandby -d /opt/hengshi/engine-cluster/data/SegDataDir-1
e. On the backup master host, back up the data directory.
mv /opt/hengshi/engine-cluster/data/SegDataDir-1 /opt/hengshi/engine-cluster/data/SegDataDir-1-backup
f. On the original primary master host, re-create the backup master host as the standby master.
gpinitstandby -s host2
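Steps a to f alternate between the two hosts, which makes it easy to run a step on the wrong machine. The sketch below drives all of them over ssh from one place. It assumes passwordless ssh to both hosts and the paths used in this document. By default it only prints each step as host: command; set EXECUTE=1 to actually run them. The names failback, on_host, EXECUTE, ENV_SH and MASTER_DIR are hypothetical. The commands are passed exactly as written in the steps, so any confirmation prompts from the Greenplum utilities still appear. That is why ssh -t is used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run failback steps a-f from one place over ssh.
# Prints every step as "<host>: <command>"; commands run only when EXECUTE=1.
# Assumes passwordless ssh to both hosts and the paths used in this document.

ENV_SH=/opt/hengshi/engine-cluster/export-cluster.sh
MASTER_DIR=/opt/hengshi/engine-cluster/data/SegDataDir-1

on_host() {
  local host="$1" cmd="$2"
  echo "$host: $cmd"
  if [ "${EXECUTE:-0}" = "1" ]; then
    # -t gives the remote command a terminal so confirmation prompts work.
    ssh -t "$host" ". $ENV_SH && $cmd"
  fi
}

# orig   = original primary master host (host1 in this document)
# backup = current primary, i.e. the activated standby (host2 in this document)
failback() {
  local orig="$1" backup="$2"
  # a. Move aside the old master data directory on the original primary.
  on_host "$orig" "mv $MASTER_DIR $MASTER_DIR-backup" || return 1
  # b. Create a standby master on the original primary host.
  on_host "$backup" "gpinitstandby -s $orig" || return 1
  # c. Stop the current (backup) master.
  on_host "$backup" "gpstop -m" || return 1
  # d. Activate the standby on the original primary host.
  on_host "$orig" "gpactivatestandby -d $MASTER_DIR" || return 1
  # e. Move aside the data directory on the backup master host.
  on_host "$backup" "mv $MASTER_DIR $MASTER_DIR-backup" || return 1
  # f. Re-create the backup master host as the standby master.
  on_host "$orig" "gpinitstandby -s $backup"
}

# Usage: review the plan first, then execute it.
#   failback host1 host2
#   EXECUTE=1 failback host1 host2
```

When EXECUTE=1 is set, each step runs only if the previous one succeeded. A failure therefore stops the sequence instead of continuing with the masters in an inconsistent state.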
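If the standby master fails, how do I start it?
Run the following command to start the standby master:
gpinitstandby -n
How do I remove the standby master?
Run the following command to remove the standby master:
gpinitstandby -r
1.3.2. Segment HA
- After primary and mirror segments switch roles, how do I restore their original roles?
Follow these steps:
- Check the current synchronization status. If the status is not Synchronized, wait until synchronization completes.
gpstate -m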
- Run the gprecoverseg utility with the -r option to return the segments to their preferred roles.
gprecoverseg -r
- Confirm their status.
gpstate -e
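The three steps above can be combined into one script that waits for synchronization before rebalancing. The sketch below uses the hypothetical name wait_and_rebalance. It polls gpstate -m and treats a mirror row as synchronized when its last column is Synchronized, based on the sample gpstate -m output earlier in this document. It then runs gprecoverseg -r followed by gpstate -e. It also adds gprecoverseg's -a option to skip the confirmation prompt; remove -a if you prefer to confirm interactively. It assumes the Greenplum utilities are on PATH, for example after sourcing export-cluster.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until every mirror row in `gpstate -m` reports
# Data Status "Synchronized", then return segments to their preferred roles
# with `gprecoverseg -a -r` and show the result with `gpstate -e`.
# Assumes the Greenplum utilities are on PATH (source export-cluster.sh first).
# Arguments: poll interval in seconds (default 30), max attempts (default 20).

wait_and_rebalance() {
  local interval="${1:-30}" tries="${2:-20}" i
  for ((i = 1; i <= tries; i++)); do
    # Mirror rows: field 4 is a data directory, field 5 a port; the last
    # field is the Data Status column (layout taken from the sample output).
    if gpstate -m | awk '
         $4 ~ /^\// && $5 ~ /^[0-9]+$/ { rows++; if ($NF != "Synchronized") bad++ }
         END { if (rows > 0 && bad == 0) exit 0; exit 1 }'; then
      # -a skips the confirmation prompt; -r rebalances to preferred roles.
      gprecoverseg -a -r && gpstate -e
      return
    fi
    echo "mirrors not synchronized yet (attempt $i/$tries)" >&2
    sleep "$interval"
  done
  echo "gave up waiting for mirror synchronization" >&2
  return 1
}

# Usage: poll every 60 seconds, at most 30 times.
#   wait_and_rebalance 60 30
```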