Skip to content

Engine Scaling

Engine scaling can be divided into two methods: vertical scaling and horizontal scaling:

  • Vertical scaling increases the number of segments on a single machine without adding more machines.
  • Horizontal scaling adds more machines while keeping the number of segments per machine unchanged.

The two scaling methods differ only in planning, but the specific operational steps are the same.

Preparatory Work Before System Expansion

Before expansion, please complete the following preparatory tasks:

  • Stop HENGSHI and other services that use the engine.
bash
${HENGSHI_HOME}/bin/hengshi-sense-bin stop hengshi
  • Clean up data inflation, divided into the following two scenarios:

    1. If the daily routine has not yet run ${HENGSHI_HOME}/bin/clean_engine.sh -t -r via crontab.
      • During non-working hours, run ${HENGSHI_HOME}/bin/clean_engine.sh -t -r -f once. The duration of this process varies depending on the actual situation, typically calculated in hours, and this operation cannot be interrupted midway.
      • Then configure it to execute ${HENGSHI_HOME}/bin/clean_engine.sh -t -r daily at midnight.
    2. If the daily routine is already configured to run ${HENGSHI_HOME}/bin/clean_engine.sh -t -r via crontab.
      • During non-working hours, run ${HENGSHI_HOME}/bin/clean_engine.sh -t -r -f once.
  • Modify the hostname and address fields in the gp_segment_configuration table to domain names and addresses that can distinguish different machines within the local network (i.e., change localhost or 127.0.0.1 to intranet domain names or addresses).
    For example, query the current environment:

sql
$ psql
psql (9.4.24)
Type "help" for help.

hengshi=# select  * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port |    hostname     |     address     |                           datadir
------+---------+------+----------------+------+--------+------+-----------------+-----------------+--------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5232 | localhost | localhost | /home/data/engine-cluster/data/SegDataDir-1
    2 |       0 | p    | p              | n    | u      | 5233 | localhost | localhost | /home/data/engine-cluster/data/SegDataDir0
(2 rows)
  • greenplum6
sql
psql (9.4.24)
Type "help" for help.

set allow_system_table_mods='true';
update gp_segment_configuration set address = '<real_ip>' where address = 'localhost';
update gp_segment_configuration set hostname = '<real_hostname>' where hostname = 'localhost';

Add Machine Expansion

Preparations for New Machines Before Expansion

Before horizontal expansion, the following preparations need to be carried out on the new machines.

Please refer to Engine New Node Environment Preparation to complete the preparation of new nodes.

GP Expansion

Clean Up Historical Expansion Schema

For clusters that have previously undergone expansion operations, there may be leftover schemas from the last expansion due to operational oversights. These need to be cleaned up first; otherwise, the expansion will fail.

bash
source /opt/hengshi/engine-cluster/export-cluster.sh
psql postgres -c '\dn'

If a schema named gpexpand exists, it needs to be cleaned up.

bash
source /opt/hengshi/engine-cluster/export-cluster.sh
gpexpand -c

After cleaning, check whether the cleanup was successful (repeat the above psql postgres -c '\dn' method). If it is not completely cleaned, you can choose to clean it manually.

sql
source /opt/hengshi/engine-cluster/export-cluster.sh
psql postgres
drop view gpexpand.expansion_progress;
drop table gpexpand.status;
drop table gpexpand.status_detail;
drop schema gpexpand;

Expansion Operations

Expansion is divided into non-mirrored expansion and mirrored expansion.

Non-Mirrored Expansion

Generate Expansion Configuration File
  1. First, prepare a new-host file and list the new hostnames to be added. Ensure there are no duplicates. If no new hosts are being added, use an empty file. Refer to the example below for adding a new machine with the hostname engine2 to the cluster.
bash
engine2
  1. Execute the following commands to generate the expansion configuration file.
bash
cd /opt/hengshi
source engine-cluster/export-cluster.sh
gpexpand -f new-host
  1. Follow the prompts to input the required information and generate the expansion configuration file.
bash
Would you like to initiate a new System Expansion Yy|Nn (default=N): # Answer 'y'
Are you sure you want to continue with this gpexpand session? Yy|Nn (default=N): # Answer 'y'
How many new primary segments per host do you want to add? (default=0): # If adding new machines, answer '0'; if adding segments to existing machines, specify the number of additional segments, e.g., answer '2' to add 2 segments per machine.
Enter new primary data directory 1: # Provide the root directory for the data, e.g., if the current directory is /opt/hengshi/engine-cluster/data/SegDataDir-1, answer '/opt/hengshi/engine-cluster/data'
Enter new primary data directory 2: # Provide the root directory for the data, e.g., if the current directory is /opt/hengshi/engine-cluster/data/SegDataDir-1, answer '/opt/hengshi/engine-cluster/data'

Wait for the execution to complete. In the example, gpexpand_inputfile_20210806_121646 is the generated expansion configuration file.

bash
Generating configuration file...

20210806:12:16:46:744854 gpexpand:m86:lvliang-[INFO]:-Generating input file...

Input configuration file was written to 'gpexpand_inputfile_20210806_121646'.

Please review the file and make sure that it is correct then re-run
with: gpexpand -i gpexpand_inputfile_20210806_121646

20210806:12:16:46:744854 gpexpand:m86:lvliang-[INFO]:-Exiting...
  1. Explanation of the configuration file format.
bash
<hostname>|<address>|<port>|<datadir>|<dbid>|<content>|<preferred_role>

The preferred_role value can be either p or m, indicating whether the segment is a primary node or a mirror. p represents a primary node, and m represents a mirror.

Assume there are two existing machines, engine1 and engine2, each with one segment. If one additional segment is added to each machine, and the file is named gpexpand_inputfile, the configuration file content would be as follows:

bash
engine1|engine1|25433|/opt/hengshi/engine-cluster/data/SegDataDir1|4|2|p
engine2|engine2|25433|/opt/hengshi/engine-cluster/data/SegDataDir2|5|3|p
Execute Expansion

Run the following command to perform expansion. During the expansion process, you need to use the gpexpand_inputfile_xxxxxxxx_xxxxxx file generated in the previous step.

bash
source /opt/hengshi/engine-cluster/export-cluster.sh
# for greenplum-6
gpexpand -i gpexpand_inputfile_xxxxxxxx_xxxxxx

Tip

During the expansion of greenplum-6, a new gpexpand schema will be created under postgres.

Modify Old Node Configuration for Script Usage
  • Update conf/engine-master-hosts and conf/engine-segment-hosts to the latest non-localhost addresses.
Verify Cluster Functionality After Expansion

First, select an existing data table <DATA_TABLE> within the cluster and two fields <col1>, <col2> from the table for validation analysis. The data table does not need to be large, a few thousand rows will suffice.

sql
source /opt/hengshi/engine-cluster/export-cluster.sh;
psql hengshi;
DROP TABLE IF EXISTS test_redistribution;
CREATE TABLE test_redistribution WITH (APPENDONLY = true, ORIENTATION = COLUMN, COMPRESSTYPE = zlib, COMPRESSLEVEL = 5) as select * from <DATA_TABLE> DISTRIBUTED RANDOMLY;
select <col1>, count(<col2>) from test_redistribution group by <col1>; -- If this step returns statistical results normally, the cluster is functioning correctly
DROP TABLE IF EXISTS test_redistribution;

In the SQL above, if the query statement select <col1>, count(<col2>) from test_redistribution group by <col1>; returns results normally without blocking or errors, it indicates that the cluster is functioning correctly.

Redistributing Tables

Note

After expansion, new tables will be distributed across all segments, but tables created before the expansion will not be automatically redistributed.
Executing Redistributing Tables can take longer for larger datasets. It is recommended to perform this operation during idle periods. If the process takes too long, you can use -d/-e. For more details, refer to gpexpand.
See Redistributing Tables for more detailed information.

  • Automatic Redistribution
    Automatic redistribution will operate on all tables, which may take a significant amount of time. If most tables undergo regular full updates, you can manually redistribute key tables first and allow the business process to automatically refresh the remaining tables during full updates.
bash
source /opt/hengshi/engine-cluster/export-cluster.sh
gpexpand
  • Manual Redistribution for Large Tables
sql
CREATE TABLE schema.<NAME>_tmp WITH (APPENDONLY = true, ORIENTATION = COLUMN, COMPRESSTYPE = zlib, COMPRESSLEVEL = 5) AS SELECT * FROM schema.<NAME> DISTRIBUTED RANDOMLY;
DROP TABLE schema.<NAME>;
ALTER TABLE schema.<NAME>_tmp RENAME TO <NAME>;
ANALYZE <NAME>;
Remove the schema generated by expansion
bash
source /opt/hengshi/engine-cluster/export-cluster.sh
gpexpand -c

Mirrored Expansion

The difference between mirrored expansion and non-mirrored expansion lies in the need to add the configuration file for the mirror in the expansion input file. Other steps remain the same.

Example configuration file: Suppose there are currently two machines, engine1 and engine2, each with 1 segment, and each segment has 1 mirror. When adding a new segment, with each segment having 1 mirror, the configuration is as follows:

bash
engine1|engine1|25434|/opt/hengshi/engine-cluster/data/SegDataDir1|6|2|p
engine2|engine2|25434|/opt/hengshi/engine-cluster/mirror/SegDataDir1|7|2|m
engine2|engine2|25435|/opt/hengshi/engine-cluster/data/SegDataDir2|8|3|p
engine1|engine1|25435|/opt/hengshi/engine-cluster/mirror/SegDataDir2|9|3|m

Rollback

If an error occurs during the engine expansion process, a rollback is required. Refer to the code in the example below to perform the restoration operation.

bash
cd /opt/hengshi
source engine-cluster/export-cluster.sh
gpexpand -r

User Manual for Hengshi Analysis Platform