# Engine Upgrade (GP5 -> GP6)

This article explains how to upgrade the engine from the legacy GP5 version to GP6.
## Pre-upgrade Environment Preparation
Before upgrading, ensure the environment meets the following requirements:
- During the upgrade, every machine in the engine cluster, including the Master and all Segments, must reserve free space equal to 2 times the node's current data size: one copy for backing up the original data and one for GP6 storage. Check the current node's data size with:

  ```shell
  HENGSHI_HOME=/opt/hengshi
  du ${HENGSHI_HOME}/engine-cluster -sch
  ```
- The number of Segments during the upgrade must be the same as the number of nodes before the upgrade.
- Since the upgrade process relies on SSH, it is recommended to configure SSH passwordless login.
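The 2x free-space requirement above can be checked mechanically. Here is a minimal sketch; the helper name `check_free_space` is ours, and it assumes GNU/Linux `du` and `df` and checks the filesystem that holds the given directory:

```shell
# Sketch: verify that the filesystem holding a data directory has at least
# 2x that directory's current size free (helper name is ours, not from the product).
check_free_space() {
    local dir="$1"
    local data_kb free_kb
    data_kb=$(du -sk "$dir" | awk '{print $1}')
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -ge $((data_kb * 2)) ]; then
        echo "free space OK: ${free_kb} KB free, ${data_kb} KB data"
    else
        echo "insufficient: need $((data_kb * 2)) KB free, have ${free_kb} KB"
        return 1
    fi
}

# Example (path from this guide); skipped if the directory does not exist.
HENGSHI_HOME=${HENGSHI_HOME:-/opt/hengshi}
if [ -d "${HENGSHI_HOME}/engine-cluster" ]; then
    check_free_space "${HENGSHI_HOME}/engine-cluster"
fi
```

Run this on every node, not just the Master, since each Segment keeps its own copy of the backup data.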
## Engine Upgrade Process
The following operations are performed on the Master machine of the engine. During the gpbackup phase, all Segments will concurrently write data to the local directory specified by the backup-dir parameter on their respective machines. Similarly, during gprestore, all Segments will load the backup data files from their respective machines.
### Backup GP5 Data
Stop all services and start only the engine.
```shell
HENGSHI_HOME=/opt/hengshi
${HENGSHI_HOME}/bin/hengshi-sense-bin stop all
${HENGSHI_HOME}/bin/hengshi-sense-bin start engine
```
Create a directory for the installation package extraction, such as hengshi-[version].
Download the migration tool.
```shell
cd hengshi-[version]
wget https://download.hengshi.com/3rd/pivotal_greenplum_backup_restore-1.15.0-1.tar.gz
```
Update engine-related configurations.
```shell
HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
test -f conf/hengshi-sense-env.sh || cp conf/hengshi-sense-env.sh.sample conf/hengshi-sense-env.sh
set_kv_config() {
    local config_file="$1"
    local param="$2"
    local val="$3"
    # insert param=val if it is not already present
    grep -E "^\s*${param}\s*=" "${config_file}" > /dev/null \
        || sed -i "$ a ${param}=${val}" "${config_file}"
}
set_kv_config conf/hengshi-sense-env.sh HS_PG_DB postgres
set_kv_config conf/hengshi-sense-env.sh HS_PG_USR postgres
set_kv_config conf/hengshi-sense-env.sh HS_PG_PWD postgres
set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_DB postgres
set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_USR postgres
set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_PWD postgres
```
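To see what `set_kv_config` does, here is a quick self-contained demonstration run against a throwaway file (the function body is copied from the step above; the demo file and values are ours):

```shell
# Demonstration: set_kv_config appends param=val only when the parameter
# is not already present in the file (function copied from the step above).
set_kv_config() {
    local config_file="$1"
    local param="$2"
    local val="$3"
    grep -E "^\s*${param}\s*=" "${config_file}" > /dev/null \
        || sed -i "$ a ${param}=${val}" "${config_file}"
}

demo=$(mktemp)
printf 'HS_PG_DB=mydb\n' > "$demo"
set_kv_config "$demo" HS_PG_DB postgres      # already present: left unchanged
set_kv_config "$demo" HS_ENGINE_DB postgres  # missing: appended
cat "$demo"   # prints HS_PG_DB=mydb, then HS_ENGINE_DB=postgres
rm -f "$demo"
```

Because existing values win, a database name you have already configured in `hengshi-sense-env.sh` is not overwritten by the defaults above.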
Export GP5 data and specify the directory to store the exported data, such as: ${HENGSHI_HOME}/gpbackup. The free space in this directory must be larger than the current data size.
```shell
export HENGSHI_HOME=/opt/hengshi
cd hengshi-[version]
# must execute: extract into the directory the current GP5 symlink points to
tar -xf pivotal_greenplum_backup_restore-1.15.0-1.tar.gz -C ${HENGSHI_HOME}/lib/gpdb/gpdb/
bash  # launch a new bash
source ${HENGSHI_HOME}/engine-cluster/export-cluster.sh
psql postgres -A -t -c "select 'drop view '|| viewname || ' cascade;' from pg_catalog.pg_views where schemaname NOT IN ('pg_catalog', 'information_schema', 'gp_toolkit') order by schemaname, viewname" > drop_views.sql
cat drop_views.sql
psql postgres -f drop_views.sql
psql postgres -c "drop function if exists public.safe_to_number(text)"
# backup
gpbackup --dbname postgres --backup-dir ${HENGSHI_HOME}/gpbackup --compression-level 9
exit  # exit the new bash
```
> **Tip**
>
> In the example, the database name is postgres; substitute the actual database name when you run these commands. If there are multiple databases, back up each one separately and specify a different `--backup-dir` for each. The value of `--compression-level` ranges from 1 to 9: the higher the value, the greater the compression ratio, but the longer the backup takes. In testing, at level 6 about 100G took roughly 1 hour and produced approximately 30G of backup data; this result is for reference only. Other parameters for the gpbackup command can be found in the gpbackup documentation.
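When there are several databases, the per-database backups described in the tip can be scripted. A sketch follows; the database names are placeholders, and the loop only echoes each command so the plan can be reviewed before removing `echo`:

```shell
# Sketch: one gpbackup invocation per database, each with its own --backup-dir.
# Database names below are placeholders, not from this guide.
# "echo" makes this a dry run; remove it to actually back up.
HENGSHI_HOME=${HENGSHI_HOME:-/opt/hengshi}
for db in postgres analytics; do
    echo gpbackup --dbname "$db" \
        --backup-dir "${HENGSHI_HOME}/gpbackup-${db}" \
        --compression-level 9
done
```

Keeping one backup directory per database also keeps each backup's timestamps separate, which simplifies the later gprestore step.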
### Start GP6
Stop GP5 and start GP6.
```shell
HENGSHI_HOME=/opt/hengshi
cd hengshi-[version]
cp -r lib/gpdb-6* ${HENGSHI_HOME}/lib
cd ${HENGSHI_HOME}
bin/hengshi-sense-bin stop engine
mv engine-cluster engine-cluster.gp5.bak
gpdb_name=$(ls ${HENGSHI_HOME}/lib/gpdb-* -dvr --color=never | head -n 1)
gpdb_name=${gpdb_name##*/}
cd ${HENGSHI_HOME}/lib
rm -f gpdb
ln -s ${gpdb_name} gpdb
cd ${HENGSHI_HOME}
bin/hengshi-sense-bin init engine
bin/hengshi-sense-bin start engine
```
After GP6 starts, import engine data.
```shell
export HENGSHI_HOME=/opt/hengshi
cd hengshi-[version]
# must execute: extract into the directory the current GP6 symlink points to
tar -xf pivotal_greenplum_backup_restore-1.15.0-1.tar.gz -C ${HENGSHI_HOME}/lib/gpdb/gpdb/
bash  # launch a new bash
source ${HENGSHI_HOME}/engine-cluster/export-cluster.sh
psql postgres -c 'create role dwguest'
# list all backup timestamps (14 characters each)
find ${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/ -maxdepth 2 | sort
# restore with a timestamp
gprestore --backup-dir ${HENGSHI_HOME}/gpbackup --timestamp xxxxxxxxxxxxxx
exit  # exit the new bash
```
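Instead of copying the timestamp by hand out of the `find` listing, the newest one can be picked programmatically. A sketch, assuming the `<backup-dir>/SegDataDir-1/backups/YYYYMMDD/YYYYMMDDHHMMSS/` layout shown above (the helper name `latest_timestamp` is ours):

```shell
# Sketch: return the most recent backup timestamp (directory name) under a
# gpbackup "backups" directory (helper name is ours, not a gpbackup tool).
latest_timestamp() {
    local backups_dir="$1"
    find "$backups_dir" -mindepth 2 -maxdepth 2 -type d | sort | tail -n 1 | xargs -r basename
}

# Usage (commented out; requires an actual backup tree):
# ts=$(latest_timestamp "${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups")
# gprestore --backup-dir "${HENGSHI_HOME}/gpbackup" --timestamp "$ts"
```

This relies on the timestamp directories sorting lexicographically, which holds for the YYYYMMDDHHMMSS format.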
Notes:
- If issues occur during the import and a re-import is needed, reinitialize and restart the engine as follows:

  ```shell
  cd ${HENGSHI_HOME}
  bin/hengshi-sense-bin stop engine
  rm -rf engine-cluster
  bin/hengshi-sense-bin init engine
  bin/hengshi-sense-bin start engine
  ```
- Importing engine data this way does not include global objects, such as tablespaces, databases, database-wide configuration parameter settings (GUCs), resource group definitions, resource queue definitions, roles, and GRANT assignments of roles to databases; refer to Parallel Backup with gpbackup and gprestore. This may result in missing roles or queues. Resolve it in one of the following ways:
  - Specify the `--with-globals` option. It may then report that roles or queues already exist; check and delete them before the import, or use the `--on-error-continue` option to ignore the errors. However, that option ignores all errors, so use it cautiously.
  - Manually create the roles and queues. Open ${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/YYYYMMDD/YYYYMMDDHHMMSS/gpbackup_YYYYMMDDHHMMSS_metadata.sql to see which roles and queues were created, then run the creation commands by hand; existing roles and queues can be skipped. If there are authorization operations for roles or queues, run the corresponding commands as well. Check carefully to avoid omissions.
- If `safe_to_number` is missing, create it manually:

  ```sql
  CREATE OR REPLACE FUNCTION SAFE_TO_NUMBER(text) RETURNS numeric IMMUTABLE STRICT AS $$
  BEGIN
      RETURN $1::numeric;
  EXCEPTION WHEN OTHERS THEN
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;
  ```
- If the database name is missing, specify the `--create-db` option to create the database automatically. If it already exists, do not specify this option, otherwise an error will occur.
- Specify `--metadata-only` to import only metadata, including table definitions, but no data.
- Specify `--data-only` to import only data, excluding table definitions.
- Based on self-testing, with a compression level of 6 the restore takes approximately 1.5 times the backup duration.
- For details about the gprestore command, see the gprestore documentation.
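For the manual role/queue re-creation note above, the relevant statements can be filtered out of the metadata file. A sketch; the helper name is ours, and the exact shape of the DDL lines in the metadata file is an assumption, so adjust the pattern to what your file actually contains:

```shell
# Sketch: show role- and resource-queue-related DDL from a gpbackup metadata
# file so nothing is missed when re-creating them by hand (helper name is ours).
list_roles_and_queues() {
    grep -hE '^(CREATE|ALTER) (ROLE|RESOURCE QUEUE)' "$@"
}

# Usage (commented out; path pattern from this guide):
# list_roles_and_queues ${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/*/*/gpbackup_*_metadata.sql
```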
Clean up data after a successful upgrade.
```shell
HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
rm -rf engine-cluster.gp5.bak
rm -rf lib/gpdb-5*
```
## Rollback After Upgrade Failure
If issues occur during the upgrade process, follow the steps below to perform a rollback.
Stop all HENGSHI SENSE services.
```shell
HENGSHI_HOME=/opt/hengshi
${HENGSHI_HOME}/bin/hengshi-sense-bin stop all
```
Delete the GP6 data directory.
```shell
HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
test -d engine-cluster.gp5.bak && rm -rf engine-cluster
```
Restore GP5-related engine data.
```shell
HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
mv engine-cluster.gp5.bak engine-cluster
gpdb_name=$(ls ${HENGSHI_HOME}/lib/gpdb-5* -dvr --color=never | head -n 1)
gpdb_name=${gpdb_name##*/}
rm -f ${HENGSHI_HOME}/lib/gpdb
cd ${HENGSHI_HOME}/lib
ln -sf ${gpdb_name} ${HENGSHI_HOME}/lib/gpdb
```
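After the rollback, it is worth confirming that the gpdb symlink points at a GP5 build again before restarting services. A small sketch (the helper name is ours):

```shell
# Sketch: check that lib/gpdb under an installation root links to a gpdb-5*
# directory again (helper name is ours, not from the product).
verify_gp5_link() {
    local target
    target=$(readlink "$1/lib/gpdb") || { echo "no gpdb symlink"; return 1; }
    case "$target" in
        gpdb-5*) echo "rollback OK: lib/gpdb -> $target" ;;
        *)       echo "unexpected link target: $target"; return 1 ;;
    esac
}

# e.g. verify_gp5_link /opt/hengshi
```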