
Engine Description

HENGSHI SENSE can use an embedded engine to accelerate datasets. This suits customers who do not have a query engine with sufficient computing power. To accelerate a dataset, turn on the switch on the dataset management page. The HENGSHI SENSE installation package includes PostgreSQL, GreenplumDB, and Doris engines.

Customers can also supply their own engine, such as AWS Athena, AWS Redshift, Alibaba Cloud Hologres, or a self-hosted PostgreSQL, GreenplumDB, MySQL, or Dameng database. In addition, if the PostgreSQL or GreenplumDB installed with HENGSHI SENSE is chosen as the engine, that service can also be configured as a data warehouse and used as an output node for data integration.

By default, HENGSHI SENSE uses GreenplumDB as both the embedded engine and the data warehouse. Datasets can be imported into it, and it serves as the output destination for data integration. This default requires no advanced configuration.

Engine Table Names and Table Cleanup Mechanism

To avoid read-write conflicts, each full import writes to a table with a randomly generated name that includes the table's creation time. A later full import creates a new table, and the old table is no longer used. When a dataset is copied, both datasets reuse the same table. When a dataset is deleted, its table is not dropped. Unused tables are therefore reclaimed by a garbage collection mechanism: a cleanup task periodically scans the engine's table list and deletes any table that no dataset references.
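The reclamation rule can be sketched in shell as shown below. This is a minimal illustration under stated assumptions, not the actual HENGSHI SENSE cleanup task. The input files, their layout, and the function name `find_orphan_tables` are hypothetical. The sketch also assumes one plausible reading of `DATASET_CACHE_IMPORT_MAX_TIME` from the configuration table. Under that reading, a table younger than that many hours may belong to an import that is still running, so it is kept even if no dataset references it yet.

```shell
#!/bin/sh
# Hypothetical sketch of the engine-table garbage collection rule.
# $1: file with one "<table_name> <created_epoch_seconds>" per line
# $2: file with one table name per line, referenced by some dataset
# $3: current time in epoch seconds
# $4: grace period in hours (assumed to be DATASET_CACHE_IMPORT_MAX_TIME, default 3)
# Prints tables that are unreferenced AND older than the grace period.
find_orphan_tables() {
  tables_file=$1
  refs_file=$2
  now=$3
  max_hours=${4:-3}
  cutoff=$((now - max_hours * 3600))
  awk -v cutoff="$cutoff" '
    FILENAME == ARGV[1] { referenced[$1] = 1; next }   # first file: references
    !($1 in referenced) && $2 + 0 < cutoff + 0 { print $1 }
  ' "$refs_file" "$tables_file"
}
```

For example, with the current time 10000 and a one-hour grace period, an unreferenced table created at epoch 1000 is reported. An unreferenced table created at 9000 is kept, because an import may still be writing to it.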

Engine Configuration

The engine must be configured before startup in the conf/hengshi-sense-env.sh file under the installation root directory, and each variable must be preceded by the export keyword. The fields are described below:

| Field | Type | Corresponding Java Field | Description |
| --- | --- | --- | --- |
| HS_ENGINE_TYPE | STRING | ENGINE_TYPE | Engine type; see Engine Type Description |
| IS_ENGINE_EMBEDDED | BOOL | None | Whether the engine is the one embedded by HENGSHI SENSE. Default is true. With the embedded engine, the data warehouse can be enabled on the page; with a customer-owned engine, it cannot |
| SYSTEM_ENGINE_URL | STRING | SYSTEM_ENGINE_URL | JDBC URL of a customer-owned engine to be used as the built-in engine |
| INTERNAL_ENGINE_DATASET_PATH | STRING | INTERNAL_ENGINE_DATASET_PATH | Path that datasets are imported into. Default is public; separate the levels of a multi-level path with commas |
| INTERNAL_ENGINE_TMP_PATH | STRING | INTERNAL_ENGINE_TMP_PATH | Temporary path of the engine, mainly used for file uploads in data integration. Default is hengshi_internal_engine_tmp_schema; separate the levels of a multi-level path with commas |
| INTERNAL_ENGINE_OUTPUT_PATH | STRING | INTERNAL_ENGINE_OUTPUT_PATH | Path for public data that HENGSHI SENSE provides as shared data, such as the perpetual calendar. Default is common; separate the levels of a multi-level path with commas |
| UPDATE_ENGINE_COMMON_DATA | BOOL | UPDATE_ENGINE_COMMON_DATA | Whether to update the common data. Default is false: existing common data is not updated, and the data is imported only if it does not exist |
| OPEN_DW_USER | STRING | OPEN_DW_USER | Username of the HENGSHI SENSE data warehouse |
| OPEN_DW_DB | STRING | OPEN_DW_DB | Database of the HENGSHI SENSE data warehouse |
| OPEN_DW_TYPE | STRING | OPEN_DW_TYPE | Type of the HENGSHI SENSE data warehouse: postgresql or greenplum. Default is greenplum |
| GREENPLUM_QUERY_USR | STRING | ENGINE_QUERY_USER | Query username for the GreenplumDB provided by HENGSHI SENSE |
| GREENPLUM_QUERY_PWD | STRING | ENGINE_QUERY_PASSWORD | Password of the query user for the GreenplumDB provided by HENGSHI SENSE |
| QUERY_QUEUE | STRING | ENGINE_QUERY_QUEUE | Resource queue that the query user of the HENGSHI SENSE GreenplumDB belongs to |
| GREENPLUM_ETL_USR | STRING | ENGINE_ETL_USER | ETL username for the GreenplumDB provided by HENGSHI SENSE |
| GREENPLUM_ETL_PWD | STRING | ENGINE_ETL_PASSWORD | Password of the ETL user for the GreenplumDB provided by HENGSHI SENSE |
| ETL_QUEUE | STRING | ENGINE_ETL_QUEUE | Resource queue that the ETL user of the HENGSHI SENSE GreenplumDB belongs to |
| ENGINE_CONN_POOL_SIZE | INTEGER | ENGINE_CONN_POOL_SIZE | Engine connection pool size. Default is 10 |
| INTERNAL_ENGINE_CONNECTION_TITLE | STRING | INTERNAL_ENGINE_CONNECTION_TITLE | Title displayed for the engine connection. Default is "Engine Connection" |
| DATASET_CACHE_MAX_SIZE_MB | INTEGER | DATASET_CACHE_MAX_SIZE_MB | Maximum size of a dataset imported into the engine, in MB. Default is 50000 |
| DATASET_CACHE_IMPORT_MAX_TIME | INTEGER | DATASET_CACHE_IMPORT_MAX_TIME | Maximum duration of an import into the engine, in hours. Default is 3. It is used when deciding whether an engine table can be reclaimed. If tables are misjudged, increase this value; the trade-off is that cleanup of unused tables is delayed |
| ENGINE_UPDATE_DEFAULT_RESOURCE_GROUP_RATIO | BOOL | ENGINE_UPDATE_DEFAULT_RESOURCE_GROUP_RATIO | Whether to update the ratio of the default resource group (default_group). Default is true |
| ENGINE_DEFAULT_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_DEFAULT_RESOURCE_GROUP_RATIO | Ratio of the default resource group (default_group). Default is 10 |
| ENGINE_RESOURCE_GROUP_PREFIX | STRING | ENGINE_RESOURCE_GROUP_PREFIX | Prefix of the resource groups created by HENGSHI SENSE. Multiple HENGSHI SENSE services can share one engine by using different prefixes. Default is hengshi |
| ENGINE_PLATFORM_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_PLATFORM_RESOURCE_GROUP_RATIO | Total ratio of the platform resource group. Default is 30 |
| ENGINE_TENANT_RESOURCE_GROUP_RATIO | INTEGER | ENGINE_TENANT_RESOURCE_GROUP_RATIO | Total ratio of the tenant resource groups. Default is 40 |
| TOTAL_TENANT_ENGINE_RESOURCE_UNIT | INTEGER | TOTAL_TENANT_ENGINE_RESOURCE_UNIT | Default total number of tenant resource units. Default is 100 |
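The sketch below illustrates the export requirement. It assumes that conf/hengshi-sense-env.sh is sourced by the startup scripts before the Java process is launched. The excerpt sets several of the variables above with `export` and falls back to the documented defaults. The `${VAR:-default}` fallback style is an illustrative choice, not necessarily how the shipped file is written. The helper `check_exported` is hypothetical; it reports any variable that a child process would not inherit.

```shell
#!/bin/sh
# Hypothetical excerpt of conf/hengshi-sense-env.sh. Each line uses "export"
# and falls back to the default listed in the table when the variable is unset.
export HS_ENGINE_TYPE="${HS_ENGINE_TYPE:-greenplum}"
export IS_ENGINE_EMBEDDED="${IS_ENGINE_EMBEDDED:-true}"
export INTERNAL_ENGINE_DATASET_PATH="${INTERNAL_ENGINE_DATASET_PATH:-public}"
export INTERNAL_ENGINE_TMP_PATH="${INTERNAL_ENGINE_TMP_PATH:-hengshi_internal_engine_tmp_schema}"
export INTERNAL_ENGINE_OUTPUT_PATH="${INTERNAL_ENGINE_OUTPUT_PATH:-common}"
export ENGINE_CONN_POOL_SIZE="${ENGINE_CONN_POOL_SIZE:-10}"
export DATASET_CACHE_MAX_SIZE_MB="${DATASET_CACHE_MAX_SIZE_MB:-50000}"
export DATASET_CACHE_IMPORT_MAX_TIME="${DATASET_CACHE_IMPORT_MAX_TIME:-3}"

# Hypothetical helper: report variables that are not exported. "env" lists
# only exported variables, i.e. what a child process (such as the JVM) inherits.
check_exported() {
  rc=0
  for name in "$@"; do
    if ! env | grep -q "^${name}="; then
      echo "not exported: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A variable assigned without `export` stays visible to the script itself but is missing from `env`, so `check_exported` flags it.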

Engine Type Description

| IS_ENGINE_EMBEDDED | Meaning |
| --- | --- |
| true | Configured as the engine embedded by HENGSHI SENSE |
| false | Configured as a customer-owned engine |

| Type Value | Meaning |
| --- | --- |
| NONE | Do not use an engine |
| GREENPLUM | Use GreenplumDB (default) |
| POSTGRESQL | Use PostgreSQL |
| ATHENA | Use AWS Athena |
| CLIENT_GREENPLUM | Use a customer-owned GreenplumDB |
| REDSHIFT | Use AWS Redshift |
| MYSQL | Use MySQL |
| DAMENG | Use Dameng database |
| HOLOGRES | Use Alibaba Cloud Hologres |
| OTHER | Another database that HENGSHI SENSE already supports writing to. The database is identified by the protocol part of the JDBC URL (jdbc:protocol://), for example jdbc:starRocks:// |
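For the OTHER type, the database is identified by the protocol part of the JDBC URL. The hypothetical helper below shows which substring that is, using only POSIX parameter expansion. This document does not say whether HENGSHI SENSE compares the protocol case-sensitively (note `starRocks`), so the sketch returns it unchanged.

```shell
#!/bin/sh
# Hypothetical helper: print <protocol> from jdbc:<protocol>://...
jdbc_protocol() {
  case $1 in
    jdbc:*://*) ;;
    *) echo "not a JDBC URL of the form jdbc:<protocol>://...: $1" >&2; return 1 ;;
  esac
  rest=${1#jdbc:}               # drop the leading "jdbc:"
  printf '%s\n' "${rest%%:*}"   # keep everything before the next ":"
}
```

For the Doris, ClickHouse, and StarRocks examples, this yields `doris`, `clickhouse`, and `starRocks`.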

Example 1: Using the HENGSHI SENSE-provided GreenplumDB as the engine and data warehouse

  • Shell configuration file: no settings are required. This is the default configuration that the script provides for the data warehouse edition.

  • Java config file

shell
ENGINE_TYPE=greenplum
ENGINE_DB=jdbc:postgresql://192.168.211.4:15432/postgres?user=hengshi&password=xxx&charSet=UTF-8
ENGINE_QUERY_USER=hengshi_query
ENGINE_QUERY_PASSWORD=xxx
ENGINE_QUERY_QUEUE=hengshi_query_queue
ENGINE_ETL_USER=hengshi_etl
ENGINE_ETL_PASSWORD=xxx
ENGINE_ETL_QUEUE=hengshi_etl_queue
OPEN_DW_TYPE=greenplum
OPEN_DW_USER=dwguest
OPEN_DW_DB=hengshi_hs_dw
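When checking a configuration like the one above, it can help to extract the host, port, database, and individual query parameters from the JDBC URL, for example to test connectivity with a client tool. The two functions below are hypothetical helpers. They assume URLs of the form `jdbc:<protocol>://<host>:<port>/<database>?key=value&...`, as in Examples 1 to 4 and 6 to 10. They do not handle the semicolon-separated Athena URL, and they do not percent-decode values.

```shell
#!/bin/sh
# Hypothetical helpers for jdbc:<protocol>://<host>[:<port>]/<database>?k=v&k=v

# Print host, port, and database on three lines (port may be empty).
jdbc_host_port_db() (
  rest=${1#jdbc:*://}          # drop "jdbc:<protocol>://"
  hostport=${rest%%/*}         # e.g. 192.168.211.4:15432
  case $hostport in
    *:*) host=${hostport%:*}; port=${hostport##*:} ;;
    *)   host=$hostport; port= ;;
  esac
  tail=${rest#"$hostport"}     # e.g. /postgres?user=...
  tail=${tail#/}
  db=${tail%%\?*}              # database name ends at "?"
  printf '%s\n%s\n%s\n' "$host" "$port" "$db"
)

# Print the value of query parameter $2; exit status 1 if it is absent.
jdbc_param() (
  case $1 in *\?*) ;; *) exit 1 ;; esac
  query=${1#*\?}
  set -f                       # split on "&" only, never glob
  IFS='&'
  for pair in $query; do
    if [ "${pair%%=*}" = "$2" ]; then
      printf '%s\n' "${pair#*=}"
      exit 0
    fi
  done
  exit 1
)
```

For the Example 1 URL, `jdbc_host_port_db` prints 192.168.211.4, 15432, and postgres. `jdbc_param "$url" user` prints hengshi.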

Example 2: Using the HENGSHI SENSE-provided PostgreSQL as the engine and data warehouse

  • Shell configuration file: only HS_ENGINE_TYPE needs to be changed; the PostgreSQL data warehouse URL is configured automatically.
shell
export HS_ENGINE_TYPE="postgresql"
export OPEN_DW_TYPE="postgresql"
export OPEN_DW_USER="dwguest"
export OPEN_DW_DB="hengshi_hs_dw"
  • Java config file
shell
ENGINE_TYPE=postgresql
ENGINE_DB=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
OPEN_DW_TYPE=postgresql
OPEN_DW_USER=dwguest
OPEN_DW_DB=hengshi_hs_dw
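The data warehouse rules stated earlier can be summarized as a small check. The page can enable the data warehouse only when two conditions hold: the engine is the embedded one (IS_ENGINE_EMBEDDED=true), and it is the bundled PostgreSQL or GreenplumDB. Separately, OPEN_DW_TYPE accepts only postgresql or greenplum. The function names below are hypothetical. This is a sketch of the documented rules, not of how HENGSHI SENSE itself evaluates these settings.

```shell
#!/bin/sh
# Hypothetical check: can the data warehouse be enabled on the page?
# $1: IS_ENGINE_EMBEDDED (default true), $2: HS_ENGINE_TYPE (default greenplum)
dw_can_be_enabled() {
  embedded=${1:-true}
  engine_type=$(printf '%s' "${2:-greenplum}" | tr '[:upper:]' '[:lower:]')
  # A customer-owned engine cannot serve as the HENGSHI SENSE data warehouse.
  [ "$embedded" = "true" ] || return 1
  # Only the bundled PostgreSQL or GreenplumDB can act as the data warehouse.
  case $engine_type in
    greenplum|postgresql) return 0 ;;
    *) return 1 ;;
  esac
}

# OPEN_DW_TYPE accepts postgresql or greenplum; empty means the default, greenplum.
valid_open_dw_type() {
  case ${1:-greenplum} in
    greenplum|postgresql) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the settings from Example 2 (embedded, postgresql), both checks pass. With a customer-owned engine such as client_greenplum in Example 4, `dw_can_be_enabled` fails.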

Example 3: Using an External PostgreSQL as the Engine

  • Shell configuration file
shell
export HS_ENGINE_TYPE="postgresql"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"
  • Java config file
shell
ENGINE_TYPE=postgresql
SYSTEM_ENGINE_URL=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common
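In each example, most settings keep the same name in both files. A few are renamed according to the "Corresponding Java Field" column of the configuration table, and IS_ENGINE_EMBEDDED has no Java counterpart. The sketch below encodes that mapping so the Java-side lines can be derived from the shell variables for a quick cross-check. `java_field_name` and `render_java_config` are hypothetical helpers. They are not the mechanism the startup script uses to produce the Java config file.

```shell
#!/bin/sh
# Hypothetical mapping from shell variable names to the
# "Corresponding Java Field" column of the configuration table.
java_field_name() {
  case $1 in
    HS_ENGINE_TYPE)      echo ENGINE_TYPE ;;
    GREENPLUM_QUERY_USR) echo ENGINE_QUERY_USER ;;
    GREENPLUM_QUERY_PWD) echo ENGINE_QUERY_PASSWORD ;;
    QUERY_QUEUE)         echo ENGINE_QUERY_QUEUE ;;
    GREENPLUM_ETL_USR)   echo ENGINE_ETL_USER ;;
    GREENPLUM_ETL_PWD)   echo ENGINE_ETL_PASSWORD ;;
    ETL_QUEUE)           echo ENGINE_ETL_QUEUE ;;
    IS_ENGINE_EMBEDDED)  ;;                   # no Java counterpart
    *)                   echo "$1" ;;         # same name in both files
  esac
}

# Print "JAVA_FIELD=value" for each named shell variable that is set.
render_java_config() {
  for name in "$@"; do
    case $name in ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;; esac  # identifiers only
    java=$(java_field_name "$name")
    [ -n "$java" ] || continue
    eval "is_set=\${$name+x}"   # "x" if the variable is set at all
    [ -n "$is_set" ] || continue
    eval "value=\${$name}"
    printf '%s=%s\n' "$java" "$value"
  done
}
```

With the Example 3 shell settings, this prints ENGINE_TYPE=postgresql and the SYSTEM_ENGINE_URL and path lines, and it skips IS_ENGINE_EMBEDDED. That output matches the Example 3 Java config file.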

Example 4: Using an external GreenplumDB as the engine

  • Shell config file
shell
export HS_ENGINE_TYPE="client_greenplum"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8"
export INTERNAL_ENGINE_DATASET_PATH="public"
export INTERNAL_ENGINE_TMP_PATH="hengshi_internal_engine_tmp_schema"
export INTERNAL_ENGINE_OUTPUT_PATH="common"
  • Java config file
shell
ENGINE_TYPE=client_greenplum
SYSTEM_ENGINE_URL=jdbc:postgresql://192.168.211.4:45433/engine?user=hengshi&password=xxx&charSet=UTF-8
INTERNAL_ENGINE_DATASET_PATH=public
INTERNAL_ENGINE_TMP_PATH=hengshi_internal_engine_tmp_schema
INTERNAL_ENGINE_OUTPUT_PATH=common

Example 5: Using AWS Athena as the Engine

  • Shell configuration file
shell
export HS_ENGINE_TYPE="athena"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:awsathena://AwsRegion=cn-north-1;User=user;Password=pass;Catalog=AwsDataCatalog;Schema=default;S3OutputLocation=s3://wss-athena-result/result/;S3DataStorageLocation=s3://wss-athena/0-storage/"
export INTERNAL_ENGINE_DATASET_PATH="AwsDataCatalog,enginedb"
export INTERNAL_ENGINE_TMP_PATH="AwsDataCatalog,enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="AwsDataCatalog,enginecommon"
  • Java config file
shell
ENGINE_TYPE=athena
SYSTEM_ENGINE_URL=jdbc:awsathena://AwsRegion=cn-north-1;User=user;Password=pass;Catalog=AwsDataCatalog;Schema=default;S3OutputLocation=s3://wss-athena-result/result/;S3DataStorageLocation=s3://wss-athena/0-storage/
INTERNAL_ENGINE_DATASET_PATH=AwsDataCatalog,enginedb
INTERNAL_ENGINE_TMP_PATH=AwsDataCatalog,enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=AwsDataCatalog,enginecommon
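The Athena example uses two-level paths such as AwsDataCatalog,enginedb, while most other examples use a single level. The hypothetical helper below splits a path on commas and prints one level per line. A second helper takes the last level as the schema name. Reading the first level as the catalog (matching Catalog=AwsDataCatalog in the URL) and the last as the schema is an interpretation, not documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: print each level of a comma-separated engine path
# (INTERNAL_ENGINE_*_PATH) on its own line.
split_engine_path() (
  set -f          # split on commas only, never glob
  IFS=','
  for level in $1; do
    printf '%s\n' "$level"
  done
)

# Assumption: the last level names the schema/database holding the tables.
engine_path_schema() {
  printf '%s\n' "${1##*,}"
}
```

For example, `split_engine_path AwsDataCatalog,enginetmp` prints AwsDataCatalog and enginetmp. `engine_path_schema public` prints public.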

Example 6: Using AWS Redshift as the Engine

  • Shell configuration file
shell
export HS_ENGINE_TYPE="redshift"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:redshift://test.ccveezprunlx.cn-north-1.redshift.amazonaws.com.cn:5439/engine?user=user&password=pass"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
  • Java config file
shell
ENGINE_TYPE=redshift
SYSTEM_ENGINE_URL=jdbc:redshift://test.ccveezprunlx.cn-north-1.redshift.amazonaws.com.cn:5439/engine?user=user&password=pass
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon

Example 7: Using MySQL as the Engine

  • Shell configuration file
shell
export HS_ENGINE_TYPE="mysql"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:mysql://192.168.211.4:3306/testdb?user=root&password=Test123@"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
  • Java config file
shell
ENGINE_TYPE=mysql
SYSTEM_ENGINE_URL=jdbc:mysql://192.168.211.4:3306/testdb?user=root&password=Test123@
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon

Example 8: Using Doris as the Engine

  • Shell configuration file
shell
export HS_ENGINE_TYPE="other"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:doris://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
  • Java config file
shell
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:doris://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon

Example 9: Using ClickHouse as the Engine

  • Shell configuration file
shell
export HS_ENGINE_TYPE="other"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:clickhouse://192.168.2.250:8123/hengshi?user=default&password=hengshipwd&cluster=hengshi_cluster"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
  • Java config file
shell
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:clickhouse://192.168.2.250:8123/hengshi?user=default&password=hengshipwd
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon

Example 10: Using StarRocks as the Engine

  • Shell configuration file
shell
export HS_ENGINE_TYPE="other"
export IS_ENGINE_EMBEDDED=false
export SYSTEM_ENGINE_URL="jdbc:starRocks://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030"
export INTERNAL_ENGINE_DATASET_PATH="enginedb"
export INTERNAL_ENGINE_TMP_PATH="enginetmp"
export INTERNAL_ENGINE_OUTPUT_PATH="enginecommon"
  • Java config file
shell
ENGINE_TYPE=other
SYSTEM_ENGINE_URL=jdbc:starRocks://10.10.10.251:9030/hengshidb?user=hengshi&password=hengshi&feHttpPort=8030
INTERNAL_ENGINE_DATASET_PATH=enginedb
INTERNAL_ENGINE_TMP_PATH=enginetmp
INTERNAL_ENGINE_OUTPUT_PATH=enginecommon

Engine Field Name Encoding

HENGSHI SENSE supports various databases as built-in engines, and the characters allowed in field names differ from one database to another. Some databases do not accept Chinese characters or special characters such as + - * / #. When such a database is configured as the built-in engine, an import can fail because a field name contains unsupported characters. File uploads are a common case: file headers are free-form and often contain special characters.

To address this, HENGSHI SENSE can encode field names. This feature is disabled by default. It uses base62 encoding, whose output contains only the characters a-z, A-Z, and 0-9. Once the feature is enabled, any field name containing characters outside this range is encoded. To enable it:

  1. In the configuration table of metadb, update the row whose config_key is engine:
update public.configuration set config_value = config_value || '{"fieldBase62Enabled": true}'::jsonb where config_key = 'engine';
  2. Restart the HENGSHI SENSE service.

Once enabled, the encoding applies to subsequent engine import tasks; tables from earlier imports are not affected.
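The encoding step can be illustrated with the generic base62 sketch below. It treats the UTF-8 bytes of a field name as one big-endian number and repeatedly divides that number by 62. Each remainder becomes a digit from the alphabet 0-9, A-Z, a-z. Names containing only those characters are left unchanged. The digit order, the byte interpretation, and any prefix or marker that HENGSHI SENSE adds are all assumptions. The output is therefore not guaranteed to match the column names the engine actually receives. The function names are hypothetical.

```shell
#!/bin/sh
# Generic base62 sketch; the alphabet order is an assumption.
B62_ALPHABET=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

# True if $1 contains any character outside a-z, A-Z, 0-9.
# The set is spelled out because ranges like A-Z depend on the locale.
needs_encoding() {
  case $1 in
    *[!0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Encode the UTF-8 bytes of $1, read as one big-endian integer, in base62.
base62_encode() (
  set -f
  set -- $(printf '%s' "$1" | od -A n -v -t u1)   # byte values, 0-255
  out=
  while [ "$#" -gt 0 ]; do
    # One pass of long division of the base-256 number by 62.
    quotient=
    rem=0
    for d in "$@"; do
      cur=$((rem * 256 + d))
      q=$((cur / 62))
      rem=$((cur % 62))
      # Skip leading zero digits so the number shrinks on every pass.
      if [ -n "$quotient" ] || [ "$q" -ne 0 ]; then
        quotient="$quotient $q"
      fi
    done
    # The remainder is the next base62 digit, least significant first.
    out=$(printf '%s' "$B62_ALPHABET" | cut -c "$((rem + 1))")$out
    set -- $quotient
  done
  printf '%s\n' "$out"
)

# Encode only names that contain characters outside the base62 alphabet.
encode_field_name() {
  if needs_encoding "$1"; then
    base62_encode "$1"
  else
    printf '%s\n' "$1"
  fi
}
```

For instance, `base62_encode a` prints 1Z, since byte 97 = 1*62 + 35 and digit 35 is Z in this alphabet. `encode_field_name amount2024` returns the name unchanged.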

HENGSHI SENSE Platform User Manual