Engine Scaling
Engine scaling can be divided into two methods: vertical scaling and horizontal scaling. Vertical scaling refers to increasing the number of segments on a single machine. Horizontal scaling involves increasing the number of machines while keeping the number of segments per machine constant. Below is a detailed introduction on how to use these two methods for engine scaling on HENGSHI SENSE.
Preparations Before System Expansion
Complete the following preparations before scaling up.
- Stop HENGSHI and other services that use the engine.
${HENGSHI_HOME}/bin/hengshi-sense-bin stop hengshi
- Clean up data bloat. There are two scenarios:
  - Daily execution of ${HENGSHI_HOME}/bin/clean_engine.sh -t -r via crontab has not been set up:
    * Run ${HENGSHI_HOME}/bin/clean_engine.sh -t -r -f during non-working hours. How long this takes depends on the actual data volume and is typically measured in hours.
    * Then configure crontab to run ${HENGSHI_HOME}/bin/clean_engine.sh -t -r daily at midnight (see the crontab sketch after this list).
  - Daily execution of ${HENGSHI_HOME}/bin/clean_engine.sh -t -r via crontab has already been configured:
    * Run ${HENGSHI_HOME}/bin/clean_engine.sh -t -r -f during non-working hours.
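A minimal crontab sketch for the daily midnight run, assuming HENGSHI is installed under /opt/hengshi (substitute your actual ${HENGSHI_HOME}):
# edit the crontab on the engine machine with `crontab -e`; this entry runs the non-forced cleanup every day at 00:00
0 0 * * * /opt/hengshi/bin/clean_engine.sh -t -r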
- Log in to the engine and change localhost-style addresses to hostnames that distinguish the machines. Modify the hostname and address fields in the gp_segment_configuration table to domain names and addresses that can distinguish the different machines within the local network. For example, query the current environment:
$ psql
psql (9.4.24)
Type "help" for help.
hengshi=# select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+------+-----------------+-----------------+--------------------------------------------------------------
1 | -1 | p | p | n | u | 5232 | localhost | localhost | /home/data/engine-cluster/data/SegDataDir-1
2 | 0 | p | p | n | u | 5233 | localhost | localhost | /home/data/engine-cluster/data/SegDataDir0
(2 rows)
- For greenplum-6:
$ psql
psql (9.4.24)
Type "help" for help.
set allow_system_table_mods='true';
update gp_segment_configuration set address = '<real_ip>' where address = 'localhost';
update gp_segment_configuration set hostname = '<real_hostname>' where hostname = 'localhost';
- For greenplum-5:
$ psql
psql (8.3)
Type "help" for help.
set allow_system_table_mods='DML';
update gp_segment_configuration set address = '<real_ip>' where address = 'localhost';
update gp_segment_configuration set hostname = '<real_hostname>' where hostname = 'localhost';
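After the updates, a quick sanity check (run in the same shell; the query should return zero rows once no localhost entries remain):
psql -c "select dbid, content, hostname, address from gp_segment_configuration where hostname = 'localhost' or address = 'localhost';"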
Scaling by Adding Machines
Preparations for New Machine Before Scaling Up
Before horizontal scaling, the new machine must be prepared.
Refer to Engine New Node Environment Preparation to complete the setup of the new node.
GP Expansion
Clean Up Historical Expansion Schema
If a cluster has been scaled in the past, leftover schemas from a previous expansion may remain due to an incomplete cleanup. Clean them up first; otherwise the new expansion may fail.
source /opt/hengshi/engine-cluster/export-cluster.sh
psql postgres -c '\dn'
If a schema named gpexpand exists, it needs to be cleaned up.
source /opt/hengshi/engine-cluster/export-cluster.sh
gpexpand -c
Check whether the cleanup is complete by running psql postgres -c '\dn' again. If the gpexpand schema still exists, clean it up manually:
source /opt/hengshi/engine-cluster/export-cluster.sh
psql postgres
drop view gpexpand.expansion_progress;
drop table gpexpand.status;
drop table gpexpand.status_detail;
drop schema gpexpand;
Expansion Operation
Expansion can be performed without mirrors or with mirrors.
Expansion Without Mirrors
Generate the Expansion Configuration File
- Prepare a new-host file containing the hostnames to be added, one per line, with no duplicates. If there are no new hosts, use an empty file. The example below adds a machine with the hostname engine2 to the cluster; the new-host file contains:
engine2
- Execute the following command to generate the expansion configuration file. The example includes expansion commands for both greenplum-6 and greenplum-5. Please select the corresponding command based on your environment when executing.
cd /opt/hengshi
source engine-cluster/export-cluster.sh
# for greenplum-6
gpexpand -f new-host
# for greenplum-5
psql -c 'create database gpexpand_schame_db' # temporary database for gpexpand to store its expansion schema; it can be created on the spot
gpexpand -f new-host -D gpexpand_schame_db
- Fill in the information at the prompts to generate the expansion configuration file.
Would you like to initiate a new System Expansion Yy|Nn (default=N): #Answer 'y'
Are you sure you want to continue with this gpexpand session? Yy|Nn (default=N): #Answer 'y'
How many new primary segments per host do you want to add? (default=0): #Answer '0' for adding new machines; Answer the increment number for adding segments on existing machines, e.g., '2' for adding 2 segments per host
Enter new primary data directory 1: #Fill in the root directory of the data directory, e.g., for the existing directory /opt/hengshi/engine-cluster/data/SegDataDir-1, answer '/opt/hengshi/engine-cluster/data'
Enter new primary data directory 2: #Fill in the root directory of the data directory, e.g., for the existing directory /opt/hengshi/engine-cluster/data/SegDataDir-1, answer '/opt/hengshi/engine-cluster/data'
Wait for the command to finish. In the example output below, gpexpand_inputfile_20210806_121646 is the generated expansion configuration file.
Generating configuration file...
20210806:12:16:46:744854 gpexpand:m86:lvliang-[INFO]:-Generating input file...
Input configuration file was written to 'gpexpand_inputfile_20210806_121646'.
Please review the file and make sure that it is correct then re-run
with: gpexpand -i gpexpand_inputfile_20210806_121646
20210806:12:16:46:744854 gpexpand:m86:lvliang-[INFO]:-Exiting...
- Configuration File Format Description.
<hostname>|<address>|<port>|<datadir>|<dbid>|<content>|<preferred_role>
The preferred_role value is either p or m and determines whether the segment is a primary or a mirror: p stands for primary, m stands for mirror.
Assume there are two machines, namely engine1 and engine2, each with 1 segment. Now, an additional segment is added to each machine, and the file is named gpexpand_inputfile. The configuration file content is as follows.
engine1|engine1|25433|/opt/hengshi/engine-cluster/data/SegDataDir1|4|2|p
engine2|engine2|25433|/opt/hengshi/engine-cluster/data/SegDataDir2|5|3|p
Execute the Expansion
Execute the following command to perform the expansion. During the expansion process, you will need to use the gpexpand_inputfile_xxxxxxxx_xxxxxx file generated in the previous step.
source /opt/hengshi/engine-cluster/export-cluster.sh
# for greenplum-6
gpexpand -i gpexpand_inputfile_xxxxxxxx_xxxxxx
# for greenplum-5
gpexpand -i gpexpand_inputfile_xxxxxxxx_xxxxxx -D gpexpand_schame_db # db created when generating the expansion configuration file above
Tip
During a greenplum-6 expansion, a new gpexpand schema is created in the postgres database. During a greenplum-5 expansion, we manually create and specify the gpexpand_schame_db database to store the schema information used during the expansion.
Modify the Old Node Configuration so the Scripts Work Properly
- Update both conf/engine-master-hosts and conf/engine-segment-hosts to the latest non-localhost addresses (see the sketch below).
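A sketch under the assumption that these files simply list one hostname per line (check your existing files for the exact format; engine1 and engine2 are placeholder hostnames). For example, conf/engine-segment-hosts would contain:
engine1
engine2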
Verify That the Cluster Works Normally After Scaling
First, select an existing data table <DATA_TABLE> in the cluster and two of its columns, <col1> and <col2>, for a verification query. The table does not need to be large; a few thousand rows is sufficient.
source /opt/hengshi/engine-cluster/export-cluster.sh;
psql hengshi;
DROP TABLE IF EXISTS test_redistribution;
CREATE TABLE test_redistribution WITH (APPENDONLY = true, ORIENTATION = COLUMN, COMPRESSTYPE = zlib, COMPRESSLEVEL = 5) as select * from <DATA_TABLE> DISTRIBUTED RANDOMLY;
select <col1>, count(<col2>) from test_redistribution group by <col1>; -- This step returns the statistical results normally if the cluster is normal
DROP TABLE IF EXISTS test_redistribution;
In the above SQL, if the query select <col1>, count(<col2>) from test_redistribution group by <col1>; returns results normally without hanging or throwing errors, the cluster is functioning properly.
Redistributing Tables
Note
After the expansion, new tables are distributed across all segments, but tables created before the expansion are not redistributed automatically. Redistribution takes longer as the data volume grows, so it is recommended to run it during idle periods; if it would take too long, you can bound it with the -d/-e options (see gpexpand for details). Refer to Redistributing Tables for more information.
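For example, gpexpand accepts a -d hh:mm:ss duration limit; the two-hour value below is only illustrative:
source /opt/hengshi/engine-cluster/export-cluster.sh
gpexpand -d 02:00:00   # redistribute for at most 2 hours, then stop; re-run gpexpand later to continue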
- Automatic redistribution. Automatic redistribution operates on all tables, which may take a very long time. If most tables are regularly refreshed in full, you can manually redistribute the key tables first and let the business processes refresh the remaining tables through their normal full updates.
source /opt/hengshi/engine-cluster/export-cluster.sh
gpexpand
- Manual redistribution of large tables (a query for finding the largest tables follows this example).
-- rebuild the table with random distribution so its data spreads across all segments, including the new ones
create table schema.<NAME>_tmp WITH (APPENDONLY = true, ORIENTATION = COLUMN, COMPRESSTYPE = zlib, COMPRESSLEVEL = 5) as select * from schema.<NAME> DISTRIBUTED RANDOMLY;
-- replace the original table with the redistributed copy
drop table schema.<NAME>;
alter table schema.<NAME>_tmp rename to <NAME>;
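To decide which tables to redistribute manually first, you can list the largest tables; a minimal sketch (the excluded system schemas are an assumption, adjust the filter to your environment):
-- list the 20 largest user tables by total size
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema', 'gp_toolkit')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;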
Remove the Schema Generated by the Expansion
source /opt/hengshi/engine-cluster/export-cluster.sh
gpexpand -c
Expansion With Mirrors
Compared with non-mirrored expansion, the only difference is that the expansion input file must also contain entries for the mirrors; all other steps are the same.
Configuration file example: assume there are two machines, engine1 and engine2, each with 1 segment, and each segment has 1 mirror. Adding 1 new segment to each machine, each with a mirror on the other machine, the configuration is as follows.
engine1|engine1|25434|/opt/hengshi/engine-cluster/data/SegDataDir1|6|2|p
engine2|engine2|25434|/opt/hengshi/engine-cluster/mirror/SegDataDir1|7|2|m
engine2|engine2|25435|/opt/hengshi/engine-cluster/data/SegDataDir2|8|3|p
engine1|engine1|25435|/opt/hengshi/engine-cluster/mirror/SegDataDir2|9|3|m
Rollback
If an error occurs during engine expansion, roll back the expansion. Refer to the example below for the rollback operation.
cd /opt/hengshi
source engine-cluster/export-cluster.sh
gpexpand -r
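After the rollback, you can check the cluster state with Greenplum's gpstate utility; a quick sketch (run with the engine environment sourced):
source /opt/hengshi/engine-cluster/export-cluster.sh
gpstate   # prints a summary of master and segment status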