# Advanced Engine Configuration

## Engine Description

HENGSHI can be configured with a built-in engine to accelerate datasets, which is useful when customers do not have a powerful query engine of their own. Acceleration is enabled with a switch on the dataset management page. The HENGSHI installation package ships with the PostgreSQL, GreenplumDB, and Doris engines.

Customers can also supply other types of engines, including AWS Athena, AWS Redshift, Alibaba Cloud Hologres, self-hosted PostgreSQL, self-hosted GreenplumDB, self-hosted MySQL, and self-hosted Dameng Database, among others. In addition, when using the PostgreSQL or GreenplumDB engine installed by HENGSHI, the same service can be configured as a data warehouse and used as an output node for data integration.

By default, HENGSHI uses GreenplumDB as both the built-in engine and the data warehouse, supporting dataset import into the engine and serving as the output destination for data integration. In this case, no advanced configuration is required.
## Engine Table Names and the Table Cleanup Mechanism

To avoid read-write conflicts, the table name for a full import is randomly generated and includes the creation time. Each subsequent full import generates a new table, and the old table is no longer used. When a dataset is duplicated, the two datasets share a single table; when a dataset is deleted, its table is not deleted immediately. Instead, a garbage collection mechanism is used: cleanup tasks periodically scan the engine's table list and drop tables that are no longer referenced by any dataset.
## Engine Configuration

The engine configuration must be set in the `conf/hengshi-sense-env.sh` file under the installation root directory before startup, and each variable must be prefixed with the `export` keyword. The relevant fields are described below:
Field | Type | Corresponding Java Field | Description |
---|---|---|---|
HS_ENGINE_TYPE | STRING | ENGINE_TYPE | Engine type; refer to the Engine Type Description. |
HS_ENGINE_IF_EXTERNAL | BOOL | None | Whether the engine is an external (customer-provided) engine; default is `false`. With the HENGSHI built-in engine (`false`), the data warehouse can be enabled on the page; with a customer-provided engine (`true`), it cannot. |
SYSTEM_ENGINE_URL | STRING | SYSTEM_ENGINE_URL | JDBC URL of the user's self-built engine used as the built-in engine. |
INTERNAL_ENGINE_DATASET_PATH | STRING | INTERNAL_ENGINE_DATASET_PATH | Dataset import path; default is `public`. Multi-level paths are separated by commas. |
INTERNAL_ENGINE_TMP_PATH | STRING | INTERNAL_ENGINE_TMP_PATH | Engine temporary path, mainly used for dataset file uploads; default is `hengshi_internal_engine_tmp_schema`. Multi-level paths are separated by commas. |
INTERNAL_ENGINE_OUTPUT_PATH | STRING | INTERNAL_ENGINE_OUTPUT_PATH | Engine public data path, used to provide shared data for HENGSHI, such as the perpetual calendar; default is `common`. Multi-level paths are separated by commas. |
UPDATE_ENGINE_COMMON_DATA | BOOL | UPDATE_ENGINE_COMMON_DATA | Whether to update public data; default is `false`. Existing data is not updated; missing data is imported. |
OPEN_DW_USER | STRING | OPEN_DW_USER | HENGSHI data warehouse username. |
OPEN_DW_DB | STRING | OPEN_DW_DB | Database where the HENGSHI data warehouse resides. |
OPEN_DW_TYPE | STRING | OPEN_DW_TYPE | HENGSHI data warehouse type, either `postgresql` or `greenplum`; default is `greenplum`. |
GREENPLUM_QUERY_USR | STRING | ENGINE_QUERY_USER | Query username provided by HENGSHI for GreenplumDB. |
GREENPLUM_QUERY_PWD | STRING | ENGINE_QUERY_PASSWORD | Query user password provided by HENGSHI for GreenplumDB. |
QUERY_QUEUE | STRING | ENGINE_QUERY_QUEUE | Resource queue the query user belongs to, provided by HENGSHI for GreenplumDB. |
GREENPLUM_ETL_USR | STRING | ENGINE_ETL_USER | ETL username provided by HENGSHI for GreenplumDB. |
GREENPLUM_ETL_PWD | STRING | ENGINE_ETL_PASSWORD | ETL user password provided by HENGSHI for GreenplumDB. |
ETL_QUEUE | STRING | ENGINE_ETL_QUEUE | Resource queue the ETL user belongs to, provided by HENGSHI for GreenplumDB. |
ENGINE_CONN_POOL_SIZE | INTEGER | ENGINE_CONN_POOL_SIZE | Engine connection pool size; default is `10`. |
INTERNAL_ENGINE_CONNECTION_TITLE | STRING | INTERNAL_ENGINE_CONNECTION_TITLE | Title displayed for the engine connection; default is "Engine Connection". |
DATASET_CACHE_MAX_SIZE_MB | INTEGER | DATASET_CACHE_MAX_SIZE_MB | Size limit, in MB, for datasets imported into the engine; default is `50000`. |
DATASET_CACHE_IMPORT_MAX_TIME | INTEGER | DATASET_CACHE_IMPORT_MAX_TIME | Maximum duration, in hours, of a dataset import; default is `3`. This value affects how engine tables are judged for recycling: if tables are recycled prematurely, increase it, at the cost of delaying cleanup of unused tables. |
ENGINE_UPDATE_DEFAULT_RESOURCE_GROUP_RATIO | BOOL | ENGINE_UPDATE_DEFAULT_RESOURCE_GROUP_RATIO | Whether to update the ratio of the default resource group (`default_group`); default is `true`. |
ENGINE_DEFAULT_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_DEFAULT_RESOURCE_GROUP_RATIO | Ratio of the default resource group (`default_group`); default is `10`. |
ENGINE_RESOURCE_GROUP_PREFIX | STRING | ENGINE_RESOURCE_GROUP_PREFIX | Prefix for resource groups created by HENGSHI; multiple HENGSHI services sharing one engine can use different prefixes. Default is `hengshi`. |
ENGINE_PLATFORM_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_PLATFORM_RESOURCE_GROUP_RATIO | Total ratio of the platform resource group; default is `30`. |
ENGINE_TENANT_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_TENANT_RESOURCE_GROUP_RATIO | Total ratio of the tenant resource group; default is `40`. |
TOTAL_TENANT_ENGINE_RESOURCE_UNIT | INTEGER | TOTAL_TENANT_ENGINE_RESOURCE_UNIT | Total unit count of tenant resources; default is `100`. |
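As a reminder, in the shell config file each variable takes the `export` prefix. A minimal illustrative excerpt of `conf/hengshi-sense-env.sh` (values here are examples, not defaults you must use):

```shell
# Illustrative excerpt of conf/hengshi-sense-env.sh: every variable must be
# prefixed with `export` so the HENGSHI process can read it from its environment.
export HS_ENGINE_TYPE="greenplum"
export ENGINE_CONN_POOL_SIZE=10
```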
## Configure Internal or External Engine

HS_ENGINE_IF_EXTERNAL | Meaning |
---|---|
true | Configured as a customer-owned engine |
false | Configured as the HENGSHI built-in engine |

## Engine Type Description

Type Value | Meaning |
---|---|
NONE | Do not use any engine |
GREENPLUM | Use GreenplumDB (default) |
POSTGRESQL | Use PostgreSQL |
ATHENA | Use AWS Athena |
CLIENT_GREENPLUM | Use a customer-owned GreenplumDB |
REDSHIFT | Use AWS Redshift |
MYSQL | Use MySQL |
DAMENG | Use Dameng Database |
HOLOGRES | Use Alibaba Cloud Hologres |
OTHER | Other databases that HENGSHI supports for write operations; the dialect is determined by the protocol part of `jdbc:protocol://`, for example `jdbc:starRocks://` |
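For the `OTHER` type, the dialect is taken from the protocol segment of the JDBC URL. A small shell sketch of that extraction (illustrative only, not HENGSHI's actual code):

```shell
# Extract the protocol part of a jdbc:<protocol>://... URL, which is what
# the OTHER engine type uses to pick the dialect (illustrative, not HENGSHI code).
url="jdbc:starRocks://10.10.10.251:9030/hengshidb"
proto="${url#jdbc:}"   # strip the leading "jdbc:" prefix
proto="${proto%%:*}"   # keep everything before the next ":" -> "starRocks"
echo "$proto"
```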
### Example 1: Using HENGSHI's GreenplumDB as the engine and as a data warehouse

- Shell config file: no changes are needed; this is the default configuration shipped with the data warehouse edition.
- Java config file:

```properties
ENGINE_TYPE=greenplum
ENGINE_DB=jdbc:postgresql://192.168.211.4:15432/postgres?user=hengshi&password=xxx&charSet=UTF-8
ENGINE_QUERY_USER=hengshi_query
ENGINE_QUERY_PASSWORD=xxx
ENGINE_QUERY_QUEUE=hengshi_query_queue
ENGINE_ETL_USER=hengshi_etl
ENGINE_ETL_PASSWORD=xxx
ENGINE_ETL_QUEUE=hengshi_etl_queue
OPEN_DW_TYPE=greenplum
OPEN_DW_USER=dwguest
OPEN_DW_DB=hengshi_hs_dw
```
### Example 2: Using HENGSHI-provided PostgreSQL as the engine and as a data warehouse

- Shell config file: the PostgreSQL data warehouse URL is configured automatically; you only need to set `HS_ENGINE_TYPE` and the data warehouse variables.

```shell
export HS_ENGINE_TYPE="postgresql"
export OPEN_DW_TYPE="postgresql"
export OPEN_DW_USER="dwguest"
export OPEN_DW_DB="hengshi_hs_dw"
```

- Java config file:

```properties
ENGINE_TYPE=postgresql
ENGINE_DB=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
OPEN_DW_TYPE=postgresql
OPEN_DW_USER=dwguest
OPEN_DW_DB=hengshi_hs_dw
```
### Example 3: Using External PostgreSQL as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="postgresql"
export SYSTEM_ENGINE_URL="jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=postgresql
SYSTEM_ENGINE_URL=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common
```
### Example 4: Using External GreenplumDB as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="client_greenplum"
export SYSTEM_ENGINE_URL="jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=client_greenplum
SYSTEM_ENGINE_URL=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common
```
### Example 5: Using AWS Athena as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="athena"
export SYSTEM_ENGINE_URL="jdbc:awsathena://AwsRegion=cn-north-1;User=user;Password=pass;Catalog=AwsDataCatalog;Schema=default;S3OutputLocation=s3://wss-athena-result/result/;S3DataStorageLocation=s3://wss-athena/0-storage/"
export INTERNAL_ENGINE_DATASET_PATH="AwsDataCatalog,enginedb"
export INTERNAL_ENGINE_TMP_PATH="AwsDataCatalog,enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="AwsDataCatalog,enginecommon"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=athena
SYSTEM_ENGINE_URL=jdbc:awsathena://AwsRegion=cn-north-1;User=user;Password=pass;Catalog=AwsDataCatalog;Schema=default;S3OutputLocation=s3://wss-athena-result/result/;S3DataStorageLocation=s3://wss-athena/0-storage/
INTERNAL_ENGINE_DATASET_PATH=AwsDataCatalog,enginedb
INTERNAL_ENGINE_TMP_PATH=AwsDataCatalog,enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=AwsDataCatalog,enginecommon
```
### Example 6: Using AWS Redshift as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="redshift"
export SYSTEM_ENGINE_URL="jdbc:redshift://test.ccveezprunlx.cn-north-1.redshift.amazonaws.com.cn:5439/engine?user=user&password=pass"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=redshift
SYSTEM_ENGINE_URL=jdbc:redshift://test.ccveezprunlx.cn-north-1.redshift.amazonaws.com.cn:5439/engine?user=user&password=pass
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon
```
### Example 7: Using MySQL as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="mysql"
export SYSTEM_ENGINE_URL="jdbc:mysql://192.168.211.4:3306/testdb?user=root&password=Test123@"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=mysql
SYSTEM_ENGINE_URL=jdbc:mysql://192.168.211.4:3306/testdb?user=root&password=Test123@
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon
```
### Example 8: Using Doris as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="other"
export SYSTEM_ENGINE_URL="jdbc:doris://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:doris://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon
```
### Example 9: Using ClickHouse as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="other"
export SYSTEM_ENGINE_URL="jdbc:clickhouse://192.168.2.250:8123/hengshi?user=default&password=hengshipwd&cluster=hengshi_cluster"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:clickhouse://192.168.2.250:8123/hengshi?user=default&password=hengshipwd
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon
```
### Example 10: Using StarRocks as the Engine

- Shell config file:

```shell
export HS_ENGINE_TYPE="other"
export SYSTEM_ENGINE_URL="jdbc:starRocks://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
export HS_ENGINE_IF_EXTERNAL=true # Use an external engine instead of the HENGSHI built-in engine
```

- Java config file:

```properties
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:starRocks://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon
```
## Engine Field Name Encoding

Since HENGSHI SENSE supports different databases as built-in engines, the range of characters allowed in field names varies across databases. Some databases do not support Chinese characters or special characters such as `+-*/#`. When such a database is configured as the built-in engine, engine imports may fail because field names contain unsupported characters. The most common case is file upload, where table headers are highly flexible and may contain special characters.

To address this, HENGSHI SENSE provides a field name encoding feature, which is disabled by default. The encoding algorithm is base62, whose output contains only the characters `a-zA-Z0-9`. When the feature is enabled, field names containing characters outside this range are encoded. To enable it, follow the steps below:

- In the metadb database, update the row of the configuration table whose `config_key` is `engine`:

```sql
update public.configuration set config_value = config_value || '{"fieldBase62Enabled": true}'::jsonb where config_key = 'engine';
```

- Restart the HENGSHI SENSE service.

Once enabled, subsequent engine import tasks will use this feature; previously imported data is not affected.
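The condition that triggers encoding can be sketched with a small shell check; this is illustrative only, not HENGSHI's actual implementation, and `needs_encoding` is a hypothetical helper name:

```shell
# A field name is a candidate for base62 encoding only if it contains
# characters outside [a-zA-Z0-9] (illustrative check, not HENGSHI code).
needs_encoding() {
  case "$1" in
    *[!a-zA-Z0-9]*) echo yes ;;  # at least one character outside a-zA-Z0-9
    *) echo no ;;                # pure alphanumeric names are left unchanged
  esac
}
needs_encoding "sales2024"   # -> no
needs_encoding "price+tax"   # -> yes
```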