Cluster Installation

This article describes the installation process of HENGSHI SENSE in a cluster environment.

Before installation, please confirm the network environment. If the environment is isolated and cannot connect to the internet, first follow the guidance in Install Dependencies Offline to install the dependency packages, then continue with this article. If the environment has internet access, proceed directly with the installation steps in this article.

Preparations

Before the cluster installation, please complete the following preparations.

Environment Preparation

Please follow the steps below to prepare the environment.

  1. First, refer to the Installation Environment document to prepare the installation environment.
  2. Ensure that the installation devices meet the following conditions.
    • The sudo command is installed on each device.
    • A user with passwordless SSH login is established on each device.
    • The user on each device is configured with passwordless sudo privileges.
    • Each device has a unique hostname.
    • Firewall port restrictions are open between devices, allowing internal network communication.
    • Ensure that the machine executing the cluster installation has Ansible installed.

If you have already completed the environment preparation in steps 1 and 2, skip the prompts below and proceed directly to Configure User and Installation Directory. If you are unsure how to meet the conditions in step 2, refer to the following steps.
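The conditions above can be spot-checked on each device before continuing. The following is a minimal sketch under the assumptions of the examples that follow (sudo and Ansible on PATH, a unique hostname); it only reports findings and changes nothing:

```shell
#!/bin/sh
# Minimal per-node prerequisite spot-check; prints findings, changes nothing.
check_cmd() {
  # Prints "ok <name>" if the command is on PATH, "MISSING <name>" otherwise.
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok $1"
  else
    echo "MISSING $1"
  fi
}
check_cmd sudo
check_cmd ansible               # only required on the machine running the installer
echo "hostname: $(hostname)"    # must be unique across the cluster
```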

  1. Install the sudo command. This command needs to be executed under the root user.
    shell
    yum install -y sudo
  2. Create an execution user on each device. In the example, the user hengshi is used. This operation needs to be executed under the root user.
    shell
    useradd -m hengshi
    passwd hengshi # Set hengshi login password
  3. Set passwordless sudo privileges for the execution user. This operation needs to be executed under the root user.
    shell
    visudo
    Add the following line, then save and exit.
    shell
    hengshi ALL=(ALL)       NOPASSWD: ALL
  4. Ensure each device has a unique hostname. If hostnames are duplicated (for example, multiple machines named localhost), set a new one by editing /etc/hostname directly.
    shell
    sudo vim /etc/hostname
  5. Ensure the machines can reach each other by hostname. Edit the /etc/hosts file on each machine: if the hostname is mapped to 127.0.0.1, delete that entry, add entries like the following, and restart the server.
    shell
    a.b.c.d1 ${Node-A-hostname}
    a.b.c.d2 ${Node-B-hostname}
    a.b.c.d3 ${Node-C-hostname}
  6. Configure the execution user for passwordless SSH login on every machine. Assume the cluster consists of three machines, Node-A, Node-B, and Node-C, and the user hengshi exists on each of them. (Replace Node-A, Node-B, and Node-C with the actual hostnames configured for your machines.)
    • Enter the hengshi password when prompted.
    • When prompted "Are you sure you want to continue connecting (yes/no)?", enter yes.
    • Also run ssh-copy-id against the local machine itself; for example, on Node-A, execute 'ssh-copy-id hengshi@Node-A'.
    shell
    test -e ~/.ssh/id_rsa || { yes "" | ssh-keygen -t rsa -q -P ''; }
    ssh-copy-id hengshi@localhost
    ssh-copy-id hengshi@127.0.0.1
    ssh-copy-id hengshi@${Node-A-hostname}
    ssh-copy-id hengshi@${Node-A-ip}
    ssh-copy-id hengshi@${Node-B-hostname}
    ssh-copy-id hengshi@${Node-B-ip}
    ssh-copy-id hengshi@${Node-C-hostname}
    ssh-copy-id hengshi@${Node-C-ip}
  7. Install Ansible on the machine executing the installation deployment.
    shell
    sudo yum install -y epel-release
    sudo yum install -y ansible
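After completing steps 1 to 7, you can verify passwordless SSH and passwordless sudo from the deployment machine in one pass. A hedged sketch (the hengshi user comes from the examples above; pass your real hostnames as arguments when running it as a script):

```shell
#!/bin/sh
# Verify passwordless SSH plus passwordless sudo for each node given as an argument.
check_node() {
  host="$1"
  # BatchMode=yes makes ssh fail instead of prompting if key login is not set up;
  # sudo -n fails instead of prompting if passwordless sudo is not configured.
  if ssh -o BatchMode=yes "hengshi@${host}" 'sudo -n true && hostname' 2>/dev/null; then
    echo "OK: ${host}"
  else
    echo "FAILED: ${host} (check SSH keys or sudoers)"
  fi
}
for host in "$@"; do
  check_node "$host"
done
```

Each node should print its hostname followed by an `OK` line; any `FAILED` line points at a host whose step-2 conditions are not yet met.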

Configure User and Installation Directory

The following operations should be performed with sudo or root privileges.

The example demonstrates how to configure the user and installation directory across the cluster. The username is hengshi and the installation directory is /opt/hengshi. Assuming three nodes A, B, and C, run the following from the deployment machine; it performs the operations on each node over SSH.

shell
for x in ${Node-A-hostname} ${Node-B-hostname} ${Node-C-hostname}; do
    # Create the hengshi user if it does not exist
    ssh $x "grep hengshi /etc/passwd > /dev/null || sudo useradd -m hengshi"
    # Create the installation directory and set its ownership
    ssh $x "sudo mkdir -p /opt/hengshi && sudo chown hengshi:hengshi /opt/hengshi"
done

SSH Login Confirmation

Assuming three machines A, B, and C, execute the following code for login confirmation.

shell
nodes=(${Node-A-hostname} ${Node-B-hostname} ${Node-C-hostname})
for host in ${nodes[@]}; do
  ssh $host "for x in ${nodes[@]}; do ssh-keygen -R \$x; ssh-keyscan -H \$x >> ~/.ssh/known_hosts; done"
done

SSHD Listening on Non-22 Port on the Server

The machines involved in the installation include the local machine and those configured in the HS_ENGINE_SEGMENTS variable. If any of these hosts runs sshd on a port other than 22, configure the actual port for each such host in the deployment user's ~/.ssh/config.

On the local machine, configure the port both for localhost and for the name returned by the hostname command.

For example: The local machine is configured with the hostname as localhost, and HS_ENGINE_SEGMENTS includes machines A, B, and C, all listening on port 122.

The ~/.ssh/config file needs to include the following configuration, and the same configuration must be synchronized to ~/.ssh/config on every machine.

Host localhost
  Port 122
Host ${Node-A-hostname}
  Port 122
Host ${Node-B-hostname}
  Port 122
Host ${Node-C-hostname}
  Port 122
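You can confirm that the Port directive is picked up without opening a connection: `ssh -G` prints the effective client configuration for a given host. A quick check (node-a stands in for one of your hostnames):

```shell
# Print the effective SSH port resolved from ~/.ssh/config for a host.
# With the configuration above, this should show "port 122".
ssh -G node-a 2>/dev/null | grep -i '^port ' || echo "ssh client not available"
```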

Set Cluster Information

Set the cluster information on the machine where the deployment command needs to be executed.

  • Create a cluster configuration directory. It is recommended that this directory be at the same level as the installation package extraction directory for easy reuse of configurations during upgrades. You can refer to the example below, where the installation package extraction directory is hengshi-sense-[version].

    shell
    mkdir hengshi-sense-[version]/../cluster-conf
    cd hengshi-sense-[version]
    cp ansible/hosts.sample ../cluster-conf/hosts
    cp ansible/vars.yml.sample ../cluster-conf/vars.yml
  • Configure hosts as shown in the example below.

    [metadb] #Internal metadata database
    ${Node-A-hostname}
    
    #[metaslave] #metadb database slave (optional) can be used as a standby when the master is down
    #${Node-B-hostname}
    
    [engine] #Specify one as master
    ${Node-A-hostname} master=true
    ${Node-B-hostname}
    ${Node-C-hostname}
    
    #Note that the number of doris-fe needs to be configured as an odd number
    [doris-fe] # It is recommended to use IP information for configuration, hostname configuration may cause startup failure
    ${Node-A-hostname} master=true
    ${Node-B-hostname}
    ${Node-C-hostname}
    
    [doris-be] # It is recommended to use IP information for configuration, hostname configuration may cause startup failure
    ${Node-A-hostname}
    ${Node-B-hostname}
    ${Node-C-hostname}
    
    [minio]
    ${Node-A-hostname}
    
    [gateway]
    ${Node-A-hostname}
    
    [redis]
    ${Node-A-hostname}
    
    [flink]
    ${Node-A-hostname}
    
    [hengshi]
    ${Node-A-hostname}
    ${Node-B-hostname}
    ${Node-C-hostname}
  • Configure vars.yml according to the instructions in the example below.

    yaml
    temp_work_dir_root: "/tmp"   #Temporary directory, generally does not need to be changed
    install_path: "/opt/hengshi"  #Installation target directory
    gateway_port: 8080
    hengshi_sense_port: 8081
    metadb_port: 54320
    zookeeper_client_port: 2181
    engine_master_port: 15432
    engine_segment_base_port: 25432
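The ports above are also the ones that must be reachable between cluster nodes (see the firewall condition in Preparations). The sketch below assumes firewalld and only prints the commands so they can be reviewed before being run as root on each node; adjust the list if you changed the defaults:

```shell
# Ports taken from the example vars.yml above; adjust if you changed them.
ports="8080 8081 54320 2181 15432 25432"
for p in $ports; do
  # Printed for review; drop the echo to actually apply the rule.
  echo sudo firewall-cmd --permanent --add-port="${p}/tcp"
done
echo sudo firewall-cmd --reload
```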

Installation

Follow the instructions below to complete the installation process.

  1. Set the environment variable ANSIBLE_PLAYBOOK.
    shell
    export ANSIBLE_PLAYBOOK="ansible-playbook -v"
  2. Switch to the user executing the installation, in this example, the username is hengshi.
    shell
    sudo su - hengshi
  3. Navigate to the target directory where the installation package is extracted.
    shell
    cd ~/pkgs/hengshi-sense-[version]
  4. Execute the cluster installation command.
    shell
    ./hs_install -m cluster -c ../cluster-conf    # Execute cluster installation
    The installation process will display prompt messages. When the status of each node is [unreachable=0,failed=0], it indicates a successful installation.
    shell
    PLAY RECAP ****************************************************************
    Node-A : ok=18   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    Node-B : ok=18   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    Node-C : ok=18   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

Configure the System

Before starting the service, please read the Configuration File to set the relevant configurations. If the built-in engine type requires Doris, please read the Doris Engine Configuration.

Start the Service

Please follow the steps below to start the service.

  1. Initialize the OS.

    During initialization, please ensure that the executing user has sudo privileges. After initialization is complete, you can revoke sudo privileges. Switch to the executing user, navigate to the installation directory, and execute the OS initialization command. You can refer to the following example, where the executing user is hengshi and the installation directory is /opt/hengshi.

    shell
    sudo su - hengshi
    cd /opt/hengshi
    bin/hengshi-sense-bin init-os all  # Initialize OS

    Tip

    In an offline environment, you can execute bin/hengshi-sense-bin init-os all-offline to skip dependency installation.

    Check the prompt message. When the status of each node shows [unreachable=0,failed=0], it indicates that the OS initialization was successful.

    sh
    TASK [deploy : init-os kernel] ********************************************************************************************************************************************************************************************************************************
    changed: [Node-A]
    changed: [Node-B]
    changed: [Node-C]
    
    TASK [deploy : init-os deps] **********************************************************************************************************************************************************************************************************************************
    changed: [Node-A]
    changed: [Node-B]
    changed: [Node-C]
    
    PLAY RECAP ****************************************************************************************************************************************************************************************************************************************************
    Node-A              : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    Node-B              : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    Node-C              : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
  2. Initialize HENGSHI SENSE.

    Switch to the executing user and navigate to the installation directory, then execute the HENGSHI SENSE initialization command. You can refer to the following example, where the executing user is hengshi and the installation directory is /opt/hengshi.

    sh
    sudo su - hengshi
    cd /opt/hengshi
    bin/hengshi-sense-bin init all   # Initialize HENGSHI SENSE

    Check the prompt message. When the status of each node shows [unreachable=0 failed=0], it indicates that the HENGSHI SENSE initialization was successful.

    sh
    TASK [operations : metadb init] *******************************************************************************************************************************************************************************************************************************
    skipping: [Node-A]
    skipping: [Node-B]
    skipping: [Node-C]
    
    TASK [operations : engine init] *******************************************************************************************************************************************************************************************************************************
    skipping: [Node-A]
    skipping: [Node-B]
    skipping: [Node-C]
    
    PLAY RECAP ****************************************************************************************************************************************************************************************************************************************************
    Node-A              : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    Node-B              : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    Node-C              : ok=2    changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
  3. Start the service before entering the license.

    Before entering the license, the system does not support multi-node operation. You need to start the service of one instance (e.g., Node-A) first, and then start the services of all instances after updating the license.

    shell
    cd /opt/hengshi
    bin/hengshi-sense-bin start metadb
    bin/hengshi-sense-bin start engine
    bin/hengshi-sense-bin start zookeeper
    bin/hengshi-sense-bin start minio
    bin/hengshi-sense-bin start redis
    bin/hengshi-sense-bin start flink
    ansible-playbook ansible/site.yml -i ansible/hosts --tags start-hengshi -e "target=hengshi"  --limit "Node-A";
  4. Refer to Software License to enter the license.

  5. After the license is successfully authorized, start HENGSHI SENSE normally.

    Switch to the executing user and navigate to the installation directory, then execute the command to start HENGSHI SENSE. Please refer to the example below, where the executing user is hengshi and the installation directory is /opt/hengshi.

    sh
    sudo su - hengshi
    cd /opt/hengshi                 # Enter the installation target directory
    bin/hengshi-sense-bin restart hengshi    # Restart the hengshi service

    Check the prompt message. When the status of each node shows [unreachable=0 failed=0], it indicates that HENGSHI SENSE has started successfully.

    sh
    PLAY RECAP ***********************************************************************
    Node-A              : ok=4    changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    Node-B              : ok=3    changed=2    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
    Node-C              : ok=3    changed=2    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
  6. You can access the service address through a browser to use the HENGSHI SENSE service. If you cannot access it, please check whether the service port HS_HENGSHI_PORT in the configuration file conf/hengshi-sense-env.sh is open to the outside.
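If the page does not load, a first step is to check locally whether anything is listening on the web port and whether it answers HTTP. A sketch assuming the example port 8081; substitute your HS_HENGSHI_PORT value:

```shell
port=8081   # replace with the HS_HENGSHI_PORT value from conf/hengshi-sense-env.sh
# Is the service listening locally?
ss -ltn 2>/dev/null | grep ":${port} " || echo "nothing listening on ${port}"
# Does it answer HTTP? (prints the status code on success)
curl -sS -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:${port}/" 2>/dev/null \
  || echo "connection failed"
```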

Operations After Starting the Service

While the HENGSHI SENSE service is running, back up data regularly to prevent data loss and clean up stale logs promptly to free storage space.

  1. Regular Data Backup.

    It is recommended to back up the metadb database daily, to either a local or a remote device, during off-peak hours (such as midnight) so that users are not affected. The following example backs up data to a remote device at midnight every day. For detailed parameter explanations, please refer to Data Backup.

    sh
    0 0 * * * /opt/hengshi/bin/dbbackup.sh -m metadb -l /BACKUP/PATH -h $REMOTE_IP -r /BACKUP/PATH
  2. Regular Log Cleanup.

    During operation, HENGSHI SENSE generates logs that need to be cleaned regularly to free storage space. The following example schedules regular cleanup of the internal database's rolling logs.

    sh
    0 0 * * * /opt/hengshi/bin/clean_engine.sh -t -r -c -g -p
    */5 * * * * /opt/hengshi/bin/clean_engine.sh -l
  3. Port Exposure on Public Networks.

    In a public network environment, do not expose every HENGSHI service port; unnecessarily exposed component ports invite attacks. When external access is required, expose only the web service port and access it as IP + port.
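With firewalld, one way to follow this guidance is to expose only the web port in the public zone and leave all component ports closed externally. A hedged sketch (8081 is the example web port from vars.yml; commands are printed for review rather than executed):

```shell
web_port=8081   # the only port to expose publicly (example value)
# Printed for review; drop the echo to apply on the public-facing interface.
echo sudo firewall-cmd --zone=public --permanent --add-port="${web_port}/tcp"
echo sudo firewall-cmd --reload
```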

Stop Service

Stop the cluster service with the following command.

shell
bin/hengshi-sense-bin stop all

The service is successfully stopped when the status of each node in the prompt message is [unreachable=0 failed=0].

sh
PLAY RECAP ****************************************************************
Node-A : ok=18   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
Node-B : ok=18   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
Node-C : ok=18   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

Check Service Running Status

You can check the service running status by executing the following command.

shell
bin/hengshi-sense-bin status all

In the displayed information, you can view the running information of HENGSHI modules such as metadb, engine, zookeeper, gateway, minio, redis, and flink. "IS ACTIVE" indicates that the corresponding module is running, "NOT ACTIVE" indicates that the corresponding module has stopped, and "skipping" indicates that the node has not installed the corresponding module.

sh
TASK [operations : metadb status msg] ******************************************************************************************
ok: [Node-A] => {
    "msg": [
        "[metadb]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]

TASK [operations : engine status msg] ******************************************************************************************
ok: [Node-B] => {
    "msg": [
        "[engine]: NOT ACTIVE"
    ]
}

TASK [operations : zookeeper status msg] ***************************************************************************************
ok: [Node-A] => {
    "msg": [
        "[zookeeper]: NOT ACTIVE"
    ]
}
ok: [Node-B] => {
    "msg": [
        "[zookeeper]: NOT ACTIVE"
    ]
}
ok: [Node-C] => {
    "msg": [
        "[zookeeper]: NOT ACTIVE"
    ]
}

TASK [operations : gateway status msg] *****************************************************************************************
ok: [Node-A] => {
    "msg": [
        "[gateway]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]


TASK [operations : hengshi sense status msg] **********************************************************************************
ok: [Node-A] => {
    "msg": [
        "[syslog]: NOT ACTIVE",
        "[hengshi]: NOT ACTIVE",
        "[watchdog]: NOT ACTIVE"
    ]
}
ok: [Node-B] => {
    "msg": [
        "[syslog]: NOT ACTIVE",
        "[hengshi]: NOT ACTIVE",
        "[watchdog]: NOT ACTIVE"
    ]
}
skipping: [Node-C]

TASK [operations : redis status msg] *****************************************************************************************
ok: [Node-A] => {
    "msg": [
        "[redis]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]

TASK [operations : minio status msg] *****************************************************************************************
ok: [Node-A] => {
    "msg": [
        "[minio]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]

TASK [operations : flink status msg] *****************************************************************************************
ok: [Node-A] => {
    "msg": [
        "[flink]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]
