Skip to content

Engine Upgrade (GP5 -> GP6)

This article explains how to upgrade GP5 from the historical version to GP6.

Pre-upgrade Environment Preparation

Before upgrading, ensure the environment meets the following requirements:

  • During the upgrade process, all machines in the engine, including Master and Segment, must reserve 2 times the current node's data capacity as free space. One time the space is needed for backing up the original data, and another time for GP6 storage. Use the following command to check the current node's data size:
    shell
    HENGSHI_HOME=/opt/hengshi
    du ${HENGSHI_HOME}/engine-cluster -sch
  • The number of Segments during the upgrade must be the same as the number of nodes before the upgrade.
  • Since the upgrade process relies on SSH, it is recommended to configure SSH passwordless login.

Engine Upgrade Process

The following operations are performed on the Master machine of the engine. During the gpbackup phase, all Segments will concurrently write data to the local directory specified by the backup-dir parameter on their respective machines. Similarly, during gprestore, all Segments will load the backup data files from their respective machines.

Backup GP5 Data

  1. Stop all services and start only the engine.

    shell
    HENGSHI_HOME=/opt/hengshi
    ${HENGSHI_HOME}/bin/hengshi-sense-bin stop all
    ${HENGSHI_HOME}/bin/hengshi-sense-bin start engine
  2. Create a directory for the installation package extraction, such as hengshi-[version].

  3. Download the migration tool.

    shell
    cd hengshi-[version]
    wget https://download.hengshi.com/3rd/pivotal_greenplum_backup_restore-1.15.0-1.tar.gz
  4. Update engine-related configurations.

    shell
    HENGSHI_HOME=/opt/hengshi
    cd ${HENGSHI_HOME}
    test -f conf/hengshi-sense-env.sh || cp conf/hengshi-sense-env.sh.sample conf/hengshi-sense-env.sh
    set_kv_config() {
        local config_file="$1"
        local param="$2"
        local val="$3"
        # edit param=val if exist or insert new param=val
        grep -E "^\s*${param}\s*=" "${config_file}" > /dev/null \
                    || sed -i "$ a ${param}=${val}" "${config_file}"
    }
    set_kv_config conf/hengshi-sense-env.sh HS_PG_DB postgres
    set_kv_config conf/hengshi-sense-env.sh HS_PG_USR postgres
    set_kv_config conf/hengshi-sense-env.sh HS_PG_PWD postgres
    set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_DB postgres
    set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_USR postgres
    set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_PWD postgres
  5. Export GP5 data and specify the directory to store the exported data, such as: ${HENGSHI_HOME}/gpbackup. The free space in this directory must be larger than the current data size.

    shell
    export HENGSHI_HOME=/opt/hengshi
    cd hengshi-[version]
    tar -xf pivotal_greenplum_backup_restore-1.15.0-1.tar.gz -C ${HENGSHI_HOME}/lib/gpdb/gpdb/ #must execute, extract to the current GP5 symlink directory
    bash #launch a new bash
    source ${HENGSHI_HOME}/engine-cluster/export-cluster.sh
    psql postgres -A -t -c "select 'drop view '|| viewname || ' cascade;' from pg_catalog.pg_views where schemaname NOT IN ('pg_catalog', 'information_schema', 'gp_toolkit') order by schemaname, viewname" > drop_views.sql
    cat drop_views.sql
    psql postgres -f drop_views.sql
    psql postgres -c "drop function if exists public.safe_to_number(text)"
    # backup
    gpbackup --dbname postgres --backup-dir ${HENGSHI_HOME}/gpbackup --compression-level 9
    exit #exit new bash

Tip

In the example, the database name is postgres. Specify the actual database name during operation. If there are multiple databases, each database needs to be backed up separately, and a different --backup-dir must be specified. The value of --compression-level ranges from 1 to 9. The higher the value, the greater the compression ratio, but the longer the time required. During testing, it was found that when the level is 6, 100G takes about 1 hour, and the backup data size is approximately 30G. This result is for reference only. Other parameters for the gpbackup command can be found at gpbackup.

Start GP6

  1. Stop GP5 and start GP6.

    shell
    HENGSHI_HOME=/opt/hengshi
    cd hengshi-[version]
    cp -r lib/gpdb-6* ${HENGSHI_HOME}/lib
    cd ${HENGSHI_HOME}
    bin/hengshi-sense-bin stop engine
    mv engine-cluster engine-cluster.gp5.bak
    gpdb_name=$(ls ${HENGSHI_HOME}/lib/gpdb-* -dvr --color=never| head -n 1)
    gpdb_name=${gpdb_name##*/}
    cd ${HENGSHI_HOME}/lib
    rm -f gpdb
    ln -s ${gpdb_name} gpdb
    cd ${HENGSHI_HOME}
    bin/hengshi-sense-bin init engine
    bin/hengshi-sense-bin start engine
  2. After GP6 starts, import engine data.

    shell
    export HENGSHI_HOME=/opt/hengshi
    cd hengshi-[version]
    tar -xf pivotal_greenplum_backup_restore-1.15.0-1.tar.gz -C ${HENGSHI_HOME}/lib/gpdb/gpdb/ #must execute, extract to the current GP6 symlink directory
    bash #launch a new bash
    source ${HENGSHI_HOME}/engine-cluster/export-cluster.sh
    psql postgres -c 'create role dwguest'
    # find all timestamp (14chars)
    find ${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/ -maxdepth 2 | sort
    # restore with a timestamp
    gprestore --backup-dir ${HENGSHI_HOME}/gpbackup --timestamp xxxxxxxxxxxxxx
    exit #exit new bash

    Notes:

    • If issues occur during import and re-import is needed, you can execute the following steps to reinitialize and restart.
      bash
      cd ${HENGSHI_HOME}
      bin/hengshi-sense-bin stop engine
      rm -rf engine-cluster
      bin/hengshi-sense-bin init engine
      bin/hengshi-sense-bin start engine
    • Using this method to import engine data will not include global objects, such as Tablespaces, Databases, Database-wide configuration parameter settings (GUCs), Resource group definitions, Resource queue definitions, Roles, GRANT assignments of roles to databases. Refer to Parallel Backup with gpbackup and gprestore. This may result in missing roles or queues. Refer to the following methods to resolve:
      • Specify the --with-globals option, but it may prompt that roles or queues already exist. You need to check and delete them before import or use the --on-error-continue option to ignore errors. However, this option will ignore all errors, so use it cautiously.
      • Manually create roles and queues. Open the file ${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/YYYYMMDD/YYYYMMDDHHMMSS/gpbackup_YYYYMMDDHHMMSS_metadata.sql to see which roles and queues were created, then manually execute the creation commands. Existing roles and queues can be ignored. If there are authorization operations for roles or queues, execute the corresponding authorization commands. Carefully check to avoid omissions.
    • If safe_to_number is missing, manually create it.
    sql
    CREATE OR REPLACE FUNCTION SAFE_TO_NUMBER(text)
    RETURNS numeric IMMUTABLE STRICT AS
    $
    BEGIN
      RETURN $1::numeric;
    EXCEPTION WHEN OTHERS THEN
      RETURN NULL;
    END
    $ LANGUAGE plpgsql;
    • If the database name is missing, you can specify the --create-db option to automatically create the database. If it already exists, do not specify this option, otherwise an error will occur.
    • You can specify --metadata-only to import only metadata, including table creation, but not data.
    • You can specify --data-only to import only data, excluding table creation.
    • Based on self-testing results, with a compression level of 6, the time taken is approximately 1.5 times the backup duration.
    • For details about the gprestore command, refer to link.
  3. Clean up data after a successful upgrade.

    shell
    HENGSHI_HOME=/opt/hengshi
    cd ${HENGSHI_HOME}
    rm -rf engine-cluster.gp5.bak
    rm -rf lib/gpdb-5*

Rollback After Upgrade Failure

If issues occur during the upgrade process, follow the steps below to perform a rollback.

  1. Stop all HENGSHI SENSE services.

    shell
    HENGSHI_HOME=/opt/hengshi
    ${HENGSHI_HOME}/bin/hengshi-sense-bin stop all
  2. Delete the GP6 data directory.

    shell
    HENGSHI_HOME=/opt/hengshi
    cd ${HENGSHI_HOME}
    test -d engine-cluster.gp5.bak && rm -rf engine-cluster
  3. Restore GP5-related engine data.

    shell
    HENGSHI_HOME=/opt/hengshi
    cd ${HENGSHI_HOME}
    mv engine-cluster.gp5.bak engine-cluster
    gpdb_name=$(ls ${HENGSHI_HOME}/lib/gpdb-5* -dvr --color=never| head -n 1)
    gpdb_name=${gpdb_name##*/}
    rm -f ${HENGSHI_HOME}/lib/gpdb
    cd ${HENGSHI_HOME}/lib
    ln -sf ${gpdb_name} ${HENGSHI_HOME}/lib/gpdb

User Manual for Hengshi Analysis Platform