
Advanced Engine Configuration

Engine Description

HENGSHI can be configured with a built-in engine to accelerate datasets, which is useful when customers do not have a powerful query engine of their own. Acceleration is enabled with the switch on the dataset management page. The HENGSHI installation package includes the PostgreSQL, GreenplumDB, and Doris engines.
Customers can also provide other types of engines, including AWS Athena, AWS Redshift, Alibaba Cloud Hologres, self-hosted PostgreSQL, self-hosted GreenplumDB, self-hosted MySQL, and self-hosted Dameng Database, among others. In addition, if the PostgreSQL or GreenplumDB installed by HENGSHI is used as the engine, that service can also be configured as a data warehouse and used as an output node for data integration.
By default, HENGSHI uses GreenplumDB as the built-in engine and data warehouse, supporting dataset import into the engine and serving as the output destination for data integration. In this case, no advanced configuration is required.

Engine Table Names and Table Cleanup Mechanism

To avoid read-write conflicts, the table name for a full import is randomly generated and includes the creation time. Each subsequent full import creates a new table, and the old table is no longer used. When a dataset is duplicated, both datasets reuse the same table. When a dataset is deleted, its table is not deleted immediately; instead, a garbage collection mechanism takes over: a cleanup task periodically scans the engine's table list and drops tables that are no longer referenced by any dataset.
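As a point of reference, the sketch below shows one way to observe these imported tables from the command line, assuming the default built-in GreenplumDB engine and the default dataset schema public; the host, port, database, user, and password are placeholders (borrowed from Example 1 below) and must be adapted to your deployment.

```shell
# Minimal sketch: list the tables that full imports have created in the dataset
# schema (INTERNAL_ENGINE_DATASET_PATH, default "public").
# Host, port, database, user, and password are placeholders -- adjust them to your deployment.
PGPASSWORD=xxx psql -h 192.168.211.4 -p 15432 -U hengshi -d postgres -c '\dt public.*'

# Tables that are no longer referenced by any dataset do not need to be dropped
# manually; the periodic cleanup task removes them.
```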

Engine Configuration

The engine must be configured in the conf/hengshi-sense-env.sh file under the installation root directory before the service is started. The export keyword must be added before each variable. The relevant fields are described below:

| Field | Type | Corresponding Java Field | Description |
| --- | --- | --- | --- |
| HS_ENGINE_TYPE | STRING | ENGINE_TYPE | Engine type; refer to the engine type table in the Configure Internal or External Engine section. |
| HS_ENGINE_IF_EXTERNAL | BOOL | None | Whether a customer-provided (external) engine is used instead of the built-in HENGSHI engine; default is false. With the built-in engine, the data warehouse can be enabled on the page; with a customer-provided engine, it cannot. |
| SYSTEM_ENGINE_URL | STRING | SYSTEM_ENGINE_URL | JDBC URL of the customer-provided engine that replaces the built-in engine. |
| INTERNAL_ENGINE_DATASET_PATH | STRING | INTERNAL_ENGINE_DATASET_PATH | Dataset import path; default is public. Multi-level paths are separated by commas. |
| INTERNAL_ENGINE_TMP_PATH | STRING | INTERNAL_ENGINE_TMP_PATH | Engine temporary path, mainly used for dataset file uploads; default is hengshi_internal_engine_tmp_schema. Multi-level paths are separated by commas. |
| INTERNAL_ENGINE_OUTPUT_PATH | STRING | INTERNAL_ENGINE_OUTPUT_PATH | Engine public data path, used to provide shared data for HENGSHI such as the perpetual calendar; default is common. Multi-level paths are separated by commas. |
| UPDATE_ENGINE_COMMON_DATA | BOOL | UPDATE_ENGINE_COMMON_DATA | Whether to update public data; default is false. Existing data is not updated; missing data is imported. |
| OPEN_DW_USER | STRING | OPEN_DW_USER | HENGSHI data warehouse username. |
| OPEN_DW_DB | STRING | OPEN_DW_DB | Database where the HENGSHI data warehouse is located. |
| OPEN_DW_TYPE | STRING | OPEN_DW_TYPE | HENGSHI data warehouse type, postgresql or greenplum; default is greenplum. |
| GREENPLUM_QUERY_USR | STRING | ENGINE_QUERY_USER | Query username provided by HENGSHI for GreenplumDB. |
| GREENPLUM_QUERY_PWD | STRING | ENGINE_QUERY_PASSWORD | Query user password provided by HENGSHI for GreenplumDB. |
| QUERY_QUEUE | STRING | ENGINE_QUERY_QUEUE | Resource queue of the query user in the GreenplumDB provided by HENGSHI. |
| GREENPLUM_ETL_USR | STRING | ENGINE_ETL_USER | ETL username provided by HENGSHI for GreenplumDB. |
| GREENPLUM_ETL_PWD | STRING | ENGINE_ETL_PASSWORD | ETL user password provided by HENGSHI for GreenplumDB. |
| ETL_QUEUE | STRING | ENGINE_ETL_QUEUE | Resource queue of the ETL user in the GreenplumDB provided by HENGSHI. |
| ENGINE_CONN_POOL_SIZE | INTEGER | ENGINE_CONN_POOL_SIZE | Engine connection pool size; default is 10. |
| INTERNAL_ENGINE_CONNECTION_TITLE | STRING | INTERNAL_ENGINE_CONNECTION_TITLE | Title displayed for the engine connection; default is "Engine Connection". |
| DATASET_CACHE_MAX_SIZE_MB | INTEGER | DATASET_CACHE_MAX_SIZE_MB | Size limit for datasets imported into the engine, in MB; default is 50000. |
| DATASET_CACHE_IMPORT_MAX_TIME | INTEGER | DATASET_CACHE_IMPORT_MAX_TIME | Maximum duration of the dataset import process, in hours; default is 3. This affects the engine table recycling judgment: if tables are reclaimed by mistake, increase it, at the cost of delaying cleanup of unused tables. |
| ENGINE_UPDATE_DEFAULT_RESOURCE_GROUP_RATIO | BOOL | ENGINE_UPDATE_DEFAULT_RESOURCE_GROUP_RATIO | Whether to update the ratio of the default resource group (default_group); default is true. |
| ENGINE_DEFAULT_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_DEFAULT_RESOURCE_GROUP_RATIO | Ratio of the default resource group (default_group); default is 10. |
| ENGINE_RESOURCE_GROUP_PREFIX | STRING | ENGINE_RESOURCE_GROUP_PREFIX | Prefix for resource groups created by HENGSHI; multiple HENGSHI services sharing one engine can use different prefixes. Default is hengshi. |
| ENGINE_PLATFORM_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_PLATFORM_RESOURCE_GROUP_RATIO | Total ratio of platform resource groups; default is 30. |
| ENGINE_TENANT_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_TENANT_RESOURCE_GROUP_RATIO | Total ratio of tenant resource groups; default is 40. |
| TOTAL_TENANT_ENGINE_RESOURCE_UNIT | INTEGER | TOTAL_TENANT_ENGINE_RESOURCE_UNIT | Total unit count of tenant resources; default is 100. |
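To illustrate how these fields appear in practice, the following is a minimal, hypothetical fragment of conf/hengshi-sense-env.sh that uses a few of the fields above with their default values; every variable carries the export prefix required above.

```shell
# Illustrative fragment of conf/hengshi-sense-env.sh (values shown are the documented defaults).
# Each variable must be exported so the service picks it up.
export HS_ENGINE_TYPE="greenplum"                 # engine type; see the type table in the next section
export HS_ENGINE_IF_EXTERNAL=false                # false = use the built-in HENGSHI engine
export INTERNAL_ENGINE_DATASET_PATH="public"      # dataset import path
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"       # shared public data path
export ENGINE_CONN_POOL_SIZE=10                   # engine connection pool size
```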

Configure Internal or External Engine

HS_ENGINE_IF_EXTERNAL controls whether the built-in HENGSHI engine or a customer-provided engine is used:

| HS_ENGINE_IF_EXTERNAL | Meaning |
| --- | --- |
| true | Configured as a customer-provided engine |
| false | Configured as the HENGSHI built-in engine |

HS_ENGINE_TYPE selects the engine type and takes one of the following values:

| Type Value | Meaning |
| --- | --- |
| NONE | Do not use an engine |
| GREENPLUM | Use GreenplumDB (default) |
| POSTGRESQL | Use PostgreSQL |
| ATHENA | Use AWS Athena |
| CLIENT_GREENPLUM | Use a customer-provided GreenplumDB (set this value when the customer's own GreenplumDB is used) |
| REDSHIFT | Use AWS Redshift |
| MYSQL | Use MySQL |
| DAMENG | Use Dameng Database |
| HOLOGRES | Use Alibaba Cloud Hologres |
| OTHER | Other databases to which HENGSHI supports writing; the actual database is determined by the protocol part of the connection string (jdbc:protocol://), e.g. jdbc:starRocks:// |
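Taken together, declaring a customer-provided engine amounts to setting both variables, as in the minimal sketch below; complete configurations, including SYSTEM_ENGINE_URL and the path settings, are given in Examples 3 to 10.

```shell
# Minimal sketch only: declare an external (customer-provided) engine and select its type.
# The remaining settings (SYSTEM_ENGINE_URL, INTERNAL_ENGINE_* paths) are shown in the examples below.
export HS_ENGINE_IF_EXTERNAL=true
export HS_ENGINE_TYPE="client_greenplum"
```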

Example 1: Using the GreenplumDB provided by HENGSHI as the engine and using it as a data warehouse

  • Shell config file: no changes are needed; this is the default configuration script provided for the data warehouse version, so no additional settings are required.

  • Java config file

shell
ENGINE_TYPE=greenplum
ENGINE_DB=jdbc:postgresql://192.168.211.4:15432/postgres?user=hengshi&password=xxx&charSet=UTF-8
ENGINE_QUERY_USER=hengshi_query
ENGINE_QUERY_PASSWORD=xxx
ENGINE_QUERY_QUEUE=hengshi_query_queue
ENGINE_ETL_USER=hengshi_etl
ENGINE_ETL_PASSWORD=xxx
ENGINE_ETL_QUEUE=hengshi_etl_queue
OPEN_DW_TYPE=greenplum
OPEN_DW_USER=dwguest
OPEN_DW_DB=hengshi_hs_dw

Example 2: Using HENGSHI-provided PostgreSQL as the engine and using it as a data warehouse

  • Shell config file: the PostgreSQL data warehouse URL is also configured automatically; you only need to change the HS_ENGINE_TYPE value.
shell
export HS_ENGINE_TYPE="postgresql"
export OPEN_DW_TYPE="postgresql"
export OPEN_DW_USER="dwguest"
export OPEN_DW_DB="hengshi_hs_dw"
  • Java config file
shell
ENGINE_TYPE=postgresql
ENGINE_DB=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
OPEN_DW_TYPE=postgresql
OPEN_DW_USER=dwguest
OPEN_DW_DB=hengshi_hs_dw

Example 3: Using an External PostgreSQL as the Engine

  • shell config file
shell
export HS_ENGINE_TYPE="postgresql"
export SYSTEM_ENGINE_URL="jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"

export HS_ENGINE_IF_EXTERNAL=true # Declare using an external engine to replace the HENGSHI engine
  • java config file
shell
ENGINE_TYPE=postgresql
SYSTEM_ENGINE_URL=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Example 4: Using an external GreenplumDB as the engine

  • shell config file
shell
export HS_ENGINE_TYPE="client_greenplum"
export SYSTEM_ENGINE_URL="jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"

export HS_ENGINE_IF_EXTERNAL=true # Declare using an external engine to replace the HENGSHI engine
  • java config file
ENGINE_TYPE=client_greenplum
SYSTEM_ENGINE_URL=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Example 5: Using AWS Athena as the Engine

  • shell config file
shell
export HS_ENGINE_TYPE="athena"
export SYSTEM_ENGINE_URL="jdbc:awsathena://AwsRegion=cn-north-1;User=user;Password=pass;Catalog=AwsDataCatalog;Schema=default;S3OutputLocation=s3://wss-athena-result/result/;S3DataStorageLocation=s3://wss-athena/0-storage/"
export INTERNAL_ENGINE_DATASET_PATH="AwsDataCatalog,public"
export INTERNAL_ENGINE_TMP_PATH="AwsDataCatalog,hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="AwsDataCatalog,common"

export HS_ENGINE_IF_EXTERNAL=true # Declare using an external engine to replace the HENGSHI engine
  • java config file
shell
ENGINE_TYPE=athena
SYSTEM_ENGINE_URL=jdbc:awsathena://AwsRegion=cn-north-1;User=user;Password=pass;Catalog=AwsDataCatalog;Schema=default;S3OutputLocation=s3://wss-athena-result/result/;S3DataStorageLocation=s3://wss-athena/0-storage/
INTERNAL_ENGINE_DATASET_PATH=AwsDataCatalog,public
INTERNAL_ENGINE_TMP_PATH=AwsDataCatalog,hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=AwsDataCatalog,common

Example 6: Using AWS Redshift as the Engine

  • Shell config file
shell
export HS_ENGINE_TYPE="redshift"
export SYSTEM_ENGINE_URL="jdbc:redshift://test.ccveezprunlx.cn-north-1.redshift.amazonaws.com.cn:5439/engine?user=user&password=pass"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"

export HS_ENGINE_IF_EXTERNAL=true # Declare the use of an external engine to replace the HENGSHI engine
  • Java config file
shell
ENGINE_TYPE=redshift
SYSTEM_ENGINE_URL=jdbc:redshift://test.ccveezprunlx.cn-north-1.redshift.amazonaws.com.cn:5439/engine?user=user&password=pass
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Example 7: Using MySQL as the Engine

  • shell config file
shell
export HS_ENGINE_TYPE="mysql"
export SYSTEM_ENGINE_URL="jdbc:mysql://192.168.211.4:3306/public?user=root&password=Test123@"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"

export HS_ENGINE_IF_EXTERNAL=true # Declare using an external engine to replace HENGSHI engine
  • java config file
shell
ENGINE_TYPE=mysql
SYSTEM_ENGINE_URL=jdbc:mysql://192.168.211.4:3306/public?user=root&password=Test123@
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Example 8: Using Doris as the Engine

  • Shell config file
shell
export HS_ENGINE_TYPE="other"
export SYSTEM_ENGINE_URL="jdbc:doris://10.10.10.251:9030/public?user=hengshi&password=hengshi&feHttpPort=8030"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"

export HS_ENGINE_IF_EXTERNAL=true # Declare the use of an external engine to replace the HENGSHI engine
  • Java config file
shell
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:doris://10.10.10.251:9030/public?user=hengshi&password=hengshi&feHttpPort=8030
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Example 9: Using ClickHouse as the Engine

  • Shell config file
shell
export HS_ENGINE_TYPE="other"
export SYSTEM_ENGINE_URL="jdbc:clickhouse://192.168.2.250:8123/public?user=default&password=hengshipwd&cluster=hengshi_cluster"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"

export HS_ENGINE_IF_EXTERNAL=true # Declare using an external engine to replace the HENGSHI engine
  • Java config file
shell
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:clickhouse://192.168.2.250:8123/public?user=default&password=hengshipwd
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Example 10: Using StarRocks as the Engine

  • shell config file
shell
export HS_ENGINE_TYPE="other"
export SYSTEM_ENGINE_URL="jdbc:starRocks://10.10.10.251:9030/public?user=hengshi&password=hengshi&feHttpPort=8030"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"

export HS_ENGINE_IF_EXTERNAL=true # Declare using an external engine to replace the HENGSHI engine
  • java config file
shell
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:starRocks://10.10.10.251:9030/public?user=hengshi&password=hengshi&feHttpPort=8030
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Engine Field Name Encoding

Since HENGSHI SENSE supports different databases as built-in engines, the range of characters allowed in field names varies across databases. Some databases may not support Chinese characters or special characters such as +-*/#. When such a database is configured as the built-in engine, imports may fail because field names contain unsupported characters. The most common case is file upload, where file headers are highly flexible and may contain special characters.

To address this issue, HENGSHI SENSE provides a field name encoding feature, which is disabled by default. The encoding algorithm is base62, which only includes characters a-zA-Z0-9. Once enabled, field names containing characters outside this range will be encoded. To enable this feature, follow these steps:

  1. In the metadb database, modify the row of the configuration table whose config_key is engine (one way to run this statement with psql is sketched after these steps):
  update public.configuration set config_value = config_value || '{"fieldBase62Enabled": true}'::jsonb where config_key = 'engine';
  2. Restart the HENGSHI SENSE service.
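One way to apply the update from step 1 is to run it through psql against the metadb, as sketched below; the host, port, database name, user, and password are placeholders and must be adjusted to your deployment.

```shell
# Hedged sketch: apply the configuration change from step 1 via psql.
# Connection parameters below are placeholders, not actual defaults.
PGPASSWORD=xxx psql -h 127.0.0.1 -p 5432 -U hengshi -d metadb -c \
  "update public.configuration set config_value = config_value || '{\"fieldBase62Enabled\": true}'::jsonb where config_key = 'engine';"

# Afterwards, restart the HENGSHI SENSE service (step 2) so the change takes effect.
```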

After enabling this feature, datasets with non-English field names will need to be re-imported into the engine.
