1. Enabling HA for the Engine

1.1. Overview

1.1.1. Master Mirroring

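When high availability is enabled for the master, there are two master instances: a primary master and a standby master. Clients can connect to and run commands only on the primary master. The standby master stays consistent with the primary master through write-ahead log (WAL) streaming replication.
If the primary master fails, the standby master is not promoted automatically. An administrator must run the gpactivatestandby utility to activate the standby master as the new primary master.
For details, see the Master Mirroring Overview.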

1.1.2. Segment Mirroring

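The engine database stores data across multiple segment instances, each of which is a PostgreSQL instance. Data is distributed among the segments according to the distribution policy defined in the CREATE TABLE statement. Without high availability, a failed segment host must be recovered manually before the database can start.
With HA enabled, every segment gets a companion instance called a mirror. Each segment is then a primary/mirror pair, and the mirror segment stays consistent with its primary through WAL streaming replication.
For details, see the Segment Mirroring Overview.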

1.2. Enabling HA

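This section describes how to enable HA. Assume that HENGSHI SENSE is installed in /opt/hengshi on a host named host1, runs as the user hengshi, and that the engine has 2 segment instances with HA not yet enabled. The host that will hold the mirrors is named host2.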

1.2.1. Install and Initialize the Mirror Host

If HENGSHI SENSE is already installed on host2, go directly to the next step, Enabling Segment HA.

  1. With root privileges, create the service user (hengshi in this example).

     grep hengshi /etc/passwd > /dev/null || sudo useradd -m hengshi
    
  2. Configure passwordless SSH login between host1 and host2.

  3. Create the installation directory /opt/hengshi.

     sudo mkdir -p /opt/hengshi && sudo chown hengshi:hengshi /opt/hengshi
    
  4. Install HENGSHI SENSE.

     sudo su - hengshi                      # switch to the service user
     cd ~/pkgs/hengshi-sense-[version]      # go to the directory the package was extracted to
     ./hs_install -p /opt/hengshi           # run the installer
    
  5. Initialize the OS (requires sudo privileges).

     sudo su - hengshi                      # switch to the service user
     cd /opt/hengshi                        # go to the installation directory
     bin/hengshi-sense-bin init-os all      # initialize the OS
    

Note:
There is no need to start the HENGSHI SENSE service at this point.
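Steps 1 to 5 above can be gathered into one small script for host2. This is a minimal sketch under assumptions: the user, install path, and peer host default to the example values (hengshi, /opt/hengshi, host1); `run`, `prepare_mirror_host`, and `DRY_RUN` are hypothetical names introduced here; `[version]` stays a placeholder for the real package directory; and step 2 is done with the standard OpenSSH `ssh-keygen`/`ssh-copy-id` tools, which is one common approach rather than a method prescribed by the product.

```shell
#!/bin/sh
# Hypothetical helper: execute a command string, or only print it
# when DRY_RUN=1 so the plan can be reviewed first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# Prepare the mirror host. Arguments: service user, install path,
# peer host for the SSH key exchange. Defaults follow this document.
prepare_mirror_host() {
    user=${1:-hengshi}
    prefix=${2:-/opt/hengshi}
    peer=${3:-host1}
    # Step 1: create the service user only if it does not exist yet
    run "grep -q '^$user:' /etc/passwd || sudo useradd -m $user"
    # Step 2: passwordless SSH (repeat on host1 with host2 as the peer)
    run "sudo -iu $user sh -c 'test -f ~/.ssh/id_rsa || ssh-keygen -q -t rsa -N \"\" -f ~/.ssh/id_rsa'"
    run "sudo -iu $user ssh-copy-id $user@$peer"
    # Step 3: installation directory owned by the service user
    run "sudo mkdir -p $prefix && sudo chown $user:$user $prefix"
    # Step 4: install from the extracted package ([version] is a placeholder)
    run "sudo -iu $user sh -c 'cd ~/pkgs/hengshi-sense-[version] && ./hs_install -p $prefix'"
    # Step 5: initialize the OS; the service itself is not started
    run "sudo -iu $user sh -c 'cd $prefix && bin/hengshi-sense-bin init-os all'"
}
```

For example, `DRY_RUN=1 prepare_mirror_host hengshi /opt/hengshi host1` only prints the six commands; drop `DRY_RUN=1` to execute them once they look right.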

1.2.2. Enabling Segment HA

Depending on the configuration, mirror segment instances can be placed on the cluster hosts in different ways.

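  • group (the default): the mirrors of all primary segments on a host are placed together on one other host. If a host fails, the host holding its mirrors takes over, and the number of active segments on that host doubles.
  • spread: guarantees that at most one mirror on each host is promoted to primary segment, which prevents a sudden load increase on any single host after one host fails. Spread mirroring requires the cluster to have more hosts than there are segments per host.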
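The two placement rules reduce to simple arithmetic. The sketch below, using hypothetical function names, encodes the spread precondition (more hosts than segments per host) and the number of active primaries on a surviving host after one host fails: double under group mirroring, at most one extra under spread mirroring.

```shell
#!/bin/sh
# Hypothetical helpers; all arguments are plain integers.

# Spread mirroring is possible only when the cluster has more hosts
# than primary segments per host. Exit status 0 means "possible".
can_use_spread() {
    hosts=$1
    segs_per_host=$2
    [ "$hosts" -gt "$segs_per_host" ]
}

# Group mirroring: all mirrors of the failed host live on one other host,
# so that host runs its own primaries plus every promoted mirror.
group_failover_load() {
    echo $(( $1 * 2 ))
}

# Spread mirroring: the failed host's mirrors sit on different hosts,
# so each surviving host promotes at most one extra segment.
spread_failover_load() {
    echo $(( $1 + 1 ))
}
```

With 2 segments per host, group mirroring leaves 4 active segments on the surviving host, while spread mirroring leaves at most 3; with only 2 hosts, `can_use_spread 2 2` fails, which is consistent with using group mirroring in this example.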

For details on both layouts, see the Segment Mirroring Overview. This document uses group mirroring and describes how to enable HA with the gpaddmirrors utility, as follows.

  1. Create a configuration file in the following format:

    contentID|address|port|data_dir
    

    Fields:

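    • contentID: the content ID of the mirror segment, which is the same as that of its primary segment. For details, see content in the gp_segment_configuration reference.
    • address: host name or IP address of the node.
    • port: listening port of the mirror segment, obtained by adding an offset to the port base of the existing segments.
    • data_dir: data directory of the mirror segment.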

      Below is a sample configuration file, named mirrors.txt, with the following content:

      0|host2|26432|/opt/hengshi/engine-cluster/mirror/SegDataDir0
      1|host2|26433|/opt/hengshi/engine-cluster/mirror/SegDataDir1
      
  2. Run gpaddmirrors to enable segment mirroring.

    source /opt/hengshi/engine-cluster/export-cluster.sh
    gpaddmirrors -a -i mirrors.txt
    
  3. When the mirrors have been added successfully, messages like the following are displayed.

    20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Process results...
    20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-******************************************************************
    20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Mirror segments have been added; data synchronization is in progress.
    20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Data synchronization will continue in the background.
    20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-Use  gpstate -s  to check the resynchronization progress.
    20200313:15:52:54:007684 gpaddmirrors:host1:hengshi-[INFO]:-******************************************************************
    
  4. Verify mirror synchronization. Run 'gpstate -m'; output like the following indicates that the mirrors are synchronized.

    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-Starting gpstate with args: -m
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.2.1 build dev'
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.2.1 build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit compiled on Dec 23 2019 17:10:46'
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-Obtaining Segment details from master...
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--Current GPDB mirror list and status
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--Type = Group
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-   Mirror   Datadir                                        Port    Status    Data Status
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-   host2     /opt/hengshi/engine-cluster/mirror/SegDataDir0   26432   Passive   Synchronized
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:-   host2     /opt/hengshi/engine-cluster/mirror/SegDataDir1   26433   Passive   Synchronized
    20200313:15:54:12:007841 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
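With more segments, the lines of mirrors.txt can be generated instead of typed. A minimal sketch, assuming (as in the example) that content IDs start at 0, mirror ports are consecutive from a chosen base port, and each mirror data directory is named SegDataDir followed by the content ID; `gen_mirror_config` is a hypothetical helper. Check the actual content IDs of the primaries in gp_segment_configuration before using its output.

```shell
#!/bin/sh
# Hypothetical helper: print one "contentID|address|port|data_dir" line
# per primary segment, in the format passed to gpaddmirrors -i.
# Arguments: mirror host, number of primary segments, first mirror port,
# mirror data directory root. The defaults reproduce mirrors.txt above.
gen_mirror_config() {
    host=${1:-host2}
    count=${2:-2}
    base_port=${3:-26432}
    root=${4:-/opt/hengshi/engine-cluster/mirror}
    i=0
    while [ "$i" -lt "$count" ]; do
        # content ID i, port base+i, directory SegDataDir<i>
        printf '%s|%s|%s|%s/SegDataDir%s\n' \
            "$i" "$host" "$((base_port + i))" "$root" "$i"
        i=$((i + 1))
    done
}
```

For example, `gen_mirror_config host2 2 26432 /opt/hengshi/engine-cluster/mirror > mirrors.txt` writes the two sample lines shown above. gpaddmirrors can also write a sample configuration derived from the primary segments with its -o option, which is a useful cross-check.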

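Notes:
1. During verification the status may show Failed, possibly because data is still being synchronized or the mirror has only just started. Wait a while and check again.
2. If the segments hold a large amount of data, synchronization puts heavy load on Greenplum. It is therefore best to enable high availability during a period with no business load.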
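Note 1 suggests checking again while the mirrors are still synchronizing. A minimal sketch of that retry loop, assuming the gpstate -m table layout shown in step 4 (Data Status is the last column); `mirrors_synchronized` and `wait_for_mirrors` are hypothetical names, and the 60-second poll interval is an arbitrary choice.

```shell
#!/bin/sh
# Read `gpstate -m` output on stdin. Succeed only if at least one mirror
# row was found, every row ends in "Synchronized", and none says "Failed".
mirrors_synchronized() {
    awk '
        /Mirror +Datadir/   { in_table = 1; next }   # table header
        in_table && /-----/ { in_table = 0; next }   # closing separator
        in_table {
            rows++
            if ($NF != "Synchronized" || /Failed/) bad++
        }
        END { exit !(rows > 0 && bad == 0) }
    '
}

# Poll until every mirror is synchronized. Needs the engine environment,
# e.g. after sourcing /opt/hengshi/engine-cluster/export-cluster.sh.
wait_for_mirrors() {
    until gpstate -m | mirrors_synchronized; do
        sleep 60
    done
}
```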

1.2.3. Enabling Master HA

Enabling master HA is comparatively simple: the gpinitstandby command does it in one step.

  1. Enable the master mirror (standby master).

    Run gpinitstandby to enable the master mirror. For example:

    gpinitstandby -s host2
    

    The operation has succeeded when messages like the following are displayed.

    20200313:16:09:55:008076 gpinitstandby:host1:hengshi-[INFO]:-Validating environment and parameters for standby initialization...
    20200313:16:09:55:008076 gpinitstandby:host1:hengshi-[INFO]:-Checking for data directory /opt/hengshi/engine-cluster/data/SegDataDir-1 on host2
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:------------------------------------------------------
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master initialization parameters
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:------------------------------------------------------
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master hostname               = bdp2
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master data directory         = /opt/hengshi/engine-cluster/data/SegDataDir-1
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum master port                   = 15432
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master hostname       = host2
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master port           = 15432
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum standby master data directory = /opt/hengshi/engine-cluster/data/SegDataDir-1
    20200313:16:09:56:008076 gpinitstandby:host1:hengshi-[INFO]:-Greenplum update system catalog         = On
    Do you want to continue with standby master initialization? Yy|Nn (default=N):
    > y
    20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Syncing Greenplum Database extensions to standby
    20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-The packages on host2 are consistent.
    20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Adding standby master to catalog...
    20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Database catalog updated successfully.
    20200313:16:09:57:008076 gpinitstandby:host1:hengshi-[INFO]:-Updating pg_hba.conf file...
    20200313:16:09:58:008076 gpinitstandby:host1:hengshi-[INFO]:-pg_hba.conf files updated successfully.
    20200313:16:09:59:008076 gpinitstandby:host1:hengshi-[INFO]:-Starting standby master
    20200313:16:09:59:008076 gpinitstandby:host1:hengshi-[INFO]:-Checking if standby master is running on host: host2  in directory: /opt/hengshi/engine-cluster/data/SegDataDir-1
    20200313:16:10:00:008076 gpinitstandby:host1:hengshi-[INFO]:-Cleaning up pg_hba.conf backup files...
    20200313:16:10:01:008076 gpinitstandby:host1:hengshi-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
    20200313:16:10:01:008076 gpinitstandby:host1:hengshi-[INFO]:-Successfully created standby master on host2
    
  2. Use gpstate to check the status of the standby master.

    gpstate -f

    The command prints output like the following.

    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Starting gpstate with args: -f
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.2.1 build dev'
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.2.1 build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit compiled on Dec 23 2019 17:10:46'
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Obtaining Segment details from master...
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-Standby master details
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-----------------------
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby address          = host2
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby data directory   = /opt/hengshi/engine-cluster/data/SegDataDir-1
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby port             = 15432
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby PID              = 3050
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:-   Standby status           = Standby host passive
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--pg_stat_replication
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--WAL Sender State: streaming
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Sync state: sync
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Sent Location: 0/C000000
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Flush Location: 0/C000000
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--Replay Location: 0/C000000
    20200313:16:13:07:008235 gpstate:host1:hengshi-[INFO]:--------------------------------------------------------------
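In the gpstate -f output the two lines that matter most are the standby status and the WAL sender state. A minimal sketch that checks both, assuming the output format shown above; `standby_healthy` is a hypothetical name.

```shell
#!/bin/sh
# Read `gpstate -f` output on stdin. Succeed only if the standby is
# reported as "Standby host passive" and WAL is being streamed to it.
standby_healthy() {
    awk '
        /Standby status/ && /Standby host passive/ { passive = 1 }
        /WAL Sender State:/ && $NF == "streaming"   { streaming = 1 }
        END { exit !(passive && streaming) }
    '
}
```

Usage: `gpstate -f | standby_healthy && echo "standby OK"`.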

1.3. Engine HA FAQ

1.3.1. Master HA FAQ

  1. How do I switch to the standby master when the primary master fails?

    When the primary master fails, the system does not switch to the standby master automatically; you must switch manually. On the standby master host, host2, run the following commands.

    source /opt/hengshi/engine-cluster/export-cluster.sh    # if this file is missing, copy it over from the original primary master
    gpactivatestandby -d /opt/hengshi/engine-cluster/data/SegDataDir-1
  2. How do I restore the original primary master?

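    After the original primary master fails, the standby master must first be activated as the new primary master (referred to here as the acting master). Then add the original primary master back as the new standby master and switch over manually.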

    a. On the original primary master (host1), back up its data directory.

    mv /opt/hengshi/engine-cluster/data/SegDataDir-1 /opt/hengshi/engine-cluster/data/SegDataDir-1-backup
    

    b. On the acting master (host2), run the following command to initialize the original primary master as the standby master.

    gpinitstandby -s host1
    

    c. On the acting master, stop the master.

    gpstop -m
    

    d. On the original primary master host, run the activation command.

    gpactivatestandby -d /opt/hengshi/engine-cluster/data/SegDataDir-1
    

    e. On the acting master host, back up its data directory.

    mv /opt/hengshi/engine-cluster/data/SegDataDir-1 /opt/hengshi/engine-cluster/data/SegDataDir-1-backup
    

    f. On the original primary master host, recreate the acting master (host2) as the standby master.

    gpinitstandby -s host2
    
  3. If the standby master fails, how do I start it again?
    Run the following command to start the standby master.

    gpinitstandby -n
    
  4. How do I remove the standby master?
    Run the following command to remove the standby master.

    gpinitstandby -r
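The failover and switchback procedures above are fixed sequences of commands on two hosts, so they can be driven from one place. A minimal sketch, assuming passwordless SSH between host1 and host2, the data directory and environment file paths used in this document, and that `-a` is passed to skip the confirmation prompts of gpinitstandby, gpstop, and gpactivatestandby. `on_host`, `fail_over`, `switch_back_master`, and `DRY_RUN` are hypothetical names; review the printed plan with DRY_RUN=1 before running it for real.

```shell
#!/bin/sh
# Paths from this document; adjust if your installation differs.
DATA_DIR=/opt/hengshi/engine-cluster/data/SegDataDir-1
ENV_FILE=/opt/hengshi/engine-cluster/export-cluster.sh

# Run one command on a host over SSH after loading the engine environment.
# With DRY_RUN=1 the command is only printed, prefixed with the host name.
on_host() {
    host=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '[%s] %s\n' "$host" "$*"
    else
        ssh "$host" ". $ENV_FILE && $*"
    fi
}

# Failover: activate the standby while the primary master is down.
fail_over() {
    on_host "${1:-host2}" "gpactivatestandby -a -d $DATA_DIR"
}

# Switchback, steps a to f; stops at the first failing step.
# Caution: mv nests the directory if a "-backup" directory already exists.
switch_back_master() {
    old=${1:-host1}      # original primary master
    acting=${2:-host2}   # standby that was activated
    on_host "$old"    "mv $DATA_DIR $DATA_DIR-backup" &&      # a
    on_host "$acting" "gpinitstandby -a -s $old" &&           # b
    on_host "$acting" "gpstop -a -m" &&                       # c
    on_host "$old"    "gpactivatestandby -a -d $DATA_DIR" &&  # d
    on_host "$acting" "mv $DATA_DIR $DATA_DIR-backup" &&      # e
    on_host "$old"    "gpinitstandby -a -s $acting"           # f
}
```

For example, `DRY_RUN=1 switch_back_master host1 host2` prints the six steps in order, each tagged with the host it would run on.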
    

1.3.2. Segment HA FAQ

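  1. After primary and mirror segments have switched roles, how do I restore their original roles?
    Follow these steps.
    • Check the current synchronization status.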
      gpstate -m
      
      If the status is not Synchronized, wait for synchronization to complete.
    • Run the gprecoverseg utility with the -r option to return the segments to their preferred roles.
      gprecoverseg -r
      
    • Confirm their status.
      gpstate -e
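Before running gprecoverseg -r it helps to confirm both conditions the steps above rely on: every primary/mirror pair is synchronized, and at least one segment is actually outside its preferred role. A minimal sketch that reads them from the gp_segment_configuration catalog (columns content, role, preferred_role, and mode, where mode 's' means synchronized); `segment_role_state` and `rebalance_if_ready` are hypothetical names, and connecting to the postgres database with psql is an assumption about your environment.

```shell
#!/bin/sh
# Content IDs >= 0 are segments; -1 is the master and is excluded.
SEG_SQL="SELECT content, role, preferred_role, mode
         FROM gp_segment_configuration WHERE content >= 0;"

# Read psql -At output (content|role|preferred_role|mode) on stdin and print
# no-data, not-synchronized, needs-rebalance, or in-preferred-roles.
segment_role_state() {
    awk -F'|' '
        $4 != "s" { unsynced = 1 }   # pair still resynchronizing
        $2 != $3  { swapped = 1 }    # segment acting in the other role
        END {
            if (NR == 0)       print "no-data"
            else if (unsynced) print "not-synchronized"
            else if (swapped)  print "needs-rebalance"
            else               print "in-preferred-roles"
        }
    '
}

# Rebalance only when it is both safe and needed.
rebalance_if_ready() {
    state=$(psql -d postgres -At -c "$SEG_SQL" | segment_role_state)
    case $state in
        needs-rebalance)  gprecoverseg -a -r ;;
        not-synchronized) echo "mirrors still resynchronizing; retry later" >&2
                          return 1 ;;
        no-data)          echo "could not read gp_segment_configuration" >&2
                          return 1 ;;
        *)                echo "all segments already in their preferred roles" ;;
    esac
}
```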
      
