Cluster Installation
This article describes the installation process of HENGSHI SENSE in a cluster environment.
Before installation, please confirm the network environment. If it is an isolated environment and cannot connect to the internet, please first follow the guidance in Install Dependencies Offline to install the dependency packages, and then continue with the instructions in this article. If the network environment can connect to the internet, please proceed directly with the instructions in this article for installation.
Preparations
Before the cluster installation, please complete the following preparations.
Environment Preparation
Please follow the steps below to prepare the environment.
- First, refer to the Installation Environment document to prepare the installation environment.
- Ensure that the installation devices meet the following conditions.
  - The `sudo` command is installed on each device.
  - A user with passwordless SSH login is established on each device.
  - The user on each device is configured with passwordless `sudo` privileges.
  - Each device has a unique hostname.
  - Firewall port restrictions are open between devices, allowing internal network communication.
- Ensure that the machine executing the cluster installation has Ansible installed.
If you have already completed the environment preparation in steps 1 and 2, skip the following instructions and proceed directly to Configure User and Installation Directory to continue the installation. If you are unsure how to satisfy the conditions in step 2, refer to the following instructions for setup.
- Install the `sudo` command. This command needs to be executed as the root user.

  ```shell
  yum install -y sudo
  ```
- Create an execution user on each device. In the example, the user `hengshi` is used. This operation needs to be executed as the root user.

  ```shell
  useradd -m hengshi
  passwd hengshi   # Set the hengshi login password
  ```
- Set passwordless `sudo` privileges for the execution user. This operation needs to be executed as the root user. Run `visudo`, add the following line, then save and exit.

  ```shell
  hengshi ALL=(ALL) NOPASSWD: ALL
  ```
- Ensure each device has a unique hostname. If there are duplicate hostnames, such as localhost, edit the /etc/hostname file directly to change them.

  ```shell
  sudo vim /etc/hostname
  ```
- Ensure each machine can reach the others by hostname. Edit the /etc/hosts file; if a line maps the local hostname to 127.0.0.1, delete it and restart the server. Then add entries such as:

  ```shell
  a.b.c.d1 ${Node-A-hostname}
  a.b.c.d2 ${Node-B-hostname}
  a.b.c.d3 ${Node-C-hostname}
  ```
- Configure the execution user for passwordless SSH login on each machine. Assume the cluster consists of three machines, Node-A, Node-B, and Node-C, and the user `hengshi` runs on each machine. (In actual deployments, replace Node-A, Node-B, and Node-C with the hostnames configured on the machines.) Execute the ssh-copy-id operations on every machine, including against its own hostname and IP; for example, on Node-A, execute `ssh-copy-id hengshi@Node-A`.

  ```shell
  test -e ~/.ssh/id_rsa || { yes "" | ssh-keygen -t rsa -q -P ''; }
  ssh-copy-id hengshi@localhost
  ssh-copy-id hengshi@127.0.0.1
  ssh-copy-id hengshi@${Node-A-hostname}
  ssh-copy-id hengshi@${Node-A-ip}
  ssh-copy-id hengshi@${Node-B-hostname}
  ssh-copy-id hengshi@${Node-B-ip}
  ssh-copy-id hengshi@${Node-C-hostname}
  ssh-copy-id hengshi@${Node-C-ip}
  ```

  - Enter the `hengshi` password as prompted.
  - When prompted with "Are you sure you want to continue connecting (yes/no)?", enter yes.
- Install Ansible on the machine executing the installation deployment.

  ```shell
  sudo yum install -y epel-release
  sudo yum install -y ansible
  ```
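The device conditions above can be spot-checked from the deployment machine with a short loop. This is only a sketch: `node-a`, `node-b`, and `node-c` are hypothetical hostnames standing in for your actual nodes.

```shell
# Spot-check the environment conditions on each node (run as the deployment user).
# node-a, node-b, node-c are placeholder hostnames; substitute your own.
for host in node-a node-b node-c; do
  echo "== $host =="
  # BatchMode makes ssh fail instead of prompting for a password
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
    # sudo -n fails if a password would be required
    ssh "$host" "sudo -n true" 2>/dev/null \
      && echo "passwordless sudo: OK" \
      || echo "passwordless sudo: MISSING"
    ssh "$host" hostname   # each node should print a distinct, non-localhost name
  else
    echo "passwordless SSH login failed"
  fi
done
```

Each node should report passwordless sudo and print a distinct hostname; any "passwordless SSH login failed" line points back to the ssh-copy-id step.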
Configure User and Installation Directory
The following operations should be performed with sudo or root privileges.
The example demonstrates how to configure the user and installation directory on a cluster. The username is hengshi, and the installation directory is /opt/hengshi. Assuming there are three nodes A, B, and C, execute the following operations from the deployment machine against each node.
```shell
# Create the hengshi user, then set the installation directory and its permissions
for x in ${Node-A-hostname} ${Node-B-hostname} ${Node-C-hostname}; do
  ssh $x "grep hengshi /etc/passwd > /dev/null || sudo useradd -m hengshi"
  ssh $x "sudo mkdir -p /opt/hengshi && sudo chown hengshi:hengshi /opt/hengshi"
done
```
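Once the loop has run, a quick check can confirm the user and directory exist with the expected ownership on every node. Again a sketch, with `node-a`, `node-b`, `node-c` as hypothetical hostnames:

```shell
# Confirm the hengshi user and /opt/hengshi ownership on each node.
for x in node-a node-b node-c; do   # placeholder hostnames
  echo "== $x =="
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$x" \
    "id hengshi && stat -c '%U:%G %n' /opt/hengshi" \
    || echo "check failed on $x"
done
```

Each node should print the hengshi uid/gid line and `hengshi:hengshi /opt/hengshi`.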
SSH Login Confirmation
Assuming three machines A, B, and C, execute the following code for login confirmation.
```shell
nodes=(${Node-A-hostname} ${Node-B-hostname} ${Node-C-hostname})
for host in "${nodes[@]}"; do
  ssh $host "for x in ${nodes[@]}; do ssh-keygen -R \$x; ssh-keyscan -H \$x >> ~/.ssh/known_hosts; done"
done
```
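After known_hosts is populated, a non-interactive probe confirms that no node still prompts for a password or host confirmation. A sketch, with the same hypothetical hostnames as above:

```shell
nodes=(node-a node-b node-c)   # replace with your actual hostnames
for host in "${nodes[@]}"; do
  # BatchMode fails instead of prompting, so any remaining prompt shows up as a failure
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
    echo "$host: passwordless SSH OK"
  else
    echo "$host: still prompting or unreachable; re-run ssh-copy-id / ssh-keyscan"
  fi
done
```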
SSHD Listening on Non-22 Port on the Server
The machines involved in the installation include the local machine and those configured in the HS_ENGINE_SEGMENTS variable. If there are cases where the SSH port is not 22, you need to configure the actual port for each host in the deployment user's ~/.ssh/config.
The local machine needs port entries both for localhost and for the name returned by the hostname command.
For example: The local machine is configured with the hostname as localhost, and HS_ENGINE_SEGMENTS includes machines A, B, and C, all listening on port 122.
The .ssh/config file needs to include the following configuration and be synchronized to the .ssh/config on each machine.
```
Host localhost
    Port 122
Host ${Node-A-hostname}
    Port 122
Host ${Node-B-hostname}
    Port 122
Host ${Node-C-hostname}
    Port 122
```
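One way to do the synchronization step is to push the local file to every machine over the non-default port. This is a sketch with hypothetical hostnames `node-a`, `node-b`, `node-c` and the example port 122; `-P`/`-p` pass the port explicitly in case the remote config is not yet in place:

```shell
# Push ~/.ssh/config to each machine and tighten its permissions.
for host in node-a node-b node-c; do   # placeholder hostnames
  scp -P 122 ~/.ssh/config "hengshi@$host:~/.ssh/config" \
    && ssh -p 122 "hengshi@$host" "chmod 600 ~/.ssh/config" \
    || echo "failed to sync config to $host"
done
```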
Set Cluster Information
Set the cluster information on the machine where the deployment command needs to be executed.
Create a cluster configuration directory. It is recommended that this directory be at the same level as the installation package extraction directory for easy reuse of configurations during upgrades. You can refer to the example below, where the installation package extraction directory is hengshi-sense-[version].
```shell
mkdir hengshi-sense-[version]/../cluster-conf
cd hengshi-sense-[version]
cp ansible/hosts.sample ../cluster-conf/hosts
cp ansible/vars.yml.sample ../cluster-conf/vars.yml
```
Configure hosts following the instructions in the example below.
```ini
[metadb]        # Internal metadata database
${Node-A-hostname}

#[metaslave]    # metadb slave (optional), can serve as a standby when the master is down
#${Node-B-hostname}

[engine]        # Specify one node as master
${Node-A-hostname} master=true
${Node-B-hostname}
${Node-C-hostname}

# Note that the number of doris-fe nodes needs to be odd
[doris-fe]
# It is recommended to configure IP addresses; hostnames may cause startup failure
${Node-A-hostname} master=true
${Node-B-hostname}
${Node-C-hostname}

[doris-be]
# It is recommended to configure IP addresses; hostnames may cause startup failure
${Node-A-hostname}
${Node-B-hostname}
${Node-C-hostname}

[minio]
${Node-A-hostname}

[gateway]
${Node-A-hostname}

[redis]
${Node-A-hostname}

[flink]
${Node-A-hostname}

[hengshi]
${Node-A-hostname}
${Node-B-hostname}
${Node-C-hostname}
```
Configure vars.yml according to the instructions in the example below.

```yaml
temp_work_dir_root: "/tmp"     # Temporary directory; generally does not need to be changed
install_path: "/opt/hengshi"   # Installation target directory
gateway_port: 8080
hengshi_sense_port: 8081
metadb_port: 54320
zookeeper_client_port: 2181
engine_master_port: 15432
engine_segment_base_port: 25432
```
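Before running the installer, it can help to verify that none of the configured ports are already occupied on a node. A sketch using `ss`; the port list mirrors the example defaults, so adjust it if you changed them:

```shell
# Check that the ports from vars.yml are not already in use on this node.
for p in 8080 8081 54320 2181 15432 25432; do
  if ss -ltn "sport = :$p" | grep -q ":$p"; then
    echo "port $p is already in use"
  else
    echo "port $p is free"
  fi
done
```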
Installation
Follow the instructions below to complete the installation process.
- Set the environment variable ANSIBLE_PLAYBOOK.

  ```shell
  export ANSIBLE_PLAYBOOK="ansible-playbook -v"
  ```
- Switch to the user executing the installation; in this example, the username is hengshi.

  ```shell
  sudo su - hengshi
  ```
- Navigate to the target directory where the installation package is extracted.

  ```shell
  cd ~/pkgs/hengshi-sense-[version]
  ```
- Execute the cluster installation command.

  ```shell
  ./hs_install -m cluster -c ../cluster-conf   # Execute cluster installation
  ```

  The installation process will display prompt messages. When the status of each node is [unreachable=0, failed=0], the installation succeeded.

  ```shell
  PLAY RECAP ****************************************************************
  Node-A : ok=18 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
  Node-B : ok=18 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
  Node-C : ok=18 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
  ```
Configure the System
Before starting the service, please read the Configuration File to set the relevant configurations. If the built-in engine type requires Doris, please read the Doris Engine Configuration.
Start the Service
Please follow the steps below to start the service.
Initialize the OS.
During initialization, please ensure that the executing user has sudo privileges. After initialization is complete, you can revoke sudo privileges. Switch to the executing user, navigate to the installation directory, and execute the OS initialization command. You can refer to the following example, where the executing user is hengshi and the installation directory is /opt/hengshi.
```shell
sudo su - hengshi
cd /opt/hengshi
bin/hengshi-sense-bin init-os all   # Initialize OS
```
Tip
In an offline environment, you can execute bin/hengshi-sense-bin init-os all-offline to skip dependency installation.
Check the prompt message. When the status of each node shows [unreachable=0,failed=0], it indicates that the OS initialization was successful.
```sh
TASK [deploy : init-os kernel] *********************************************
changed: [Node-A]
changed: [Node-B]
changed: [Node-C]
TASK [deploy : init-os deps] ***********************************************
changed: [Node-A]
changed: [Node-B]
changed: [Node-C]
PLAY RECAP *****************************************************************
Node-A : ok=5 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Node-B : ok=5 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Node-C : ok=5 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
```
Initialize HENGSHI SENSE.
Switch to the executing user and navigate to the installation directory, then execute the HENGSHI SENSE initialization command. You can refer to the following example, where the executing user is hengshi and the installation directory is /opt/hengshi.
```sh
sudo su - hengshi
cd /opt/hengshi
bin/hengshi-sense-bin init all   # Initialize HENGSHI SENSE
```
Check the prompt message. When the status of each node shows [unreachable=0 failed=0], it indicates that the HENGSHI SENSE initialization was successful.
```sh
TASK [operations : metadb init] ********************************************
skipping: [Node-A]
skipping: [Node-B]
skipping: [Node-C]
TASK [operations : engine init] ********************************************
skipping: [Node-A]
skipping: [Node-B]
skipping: [Node-C]
PLAY RECAP *****************************************************************
Node-A : ok=1 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Node-B : ok=1 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Node-C : ok=2 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
```
Start the service before entering the license.
Before entering the license, the system does not support multi-node operation. You need to start the service of one instance (e.g., Node-A) first, and then start the services of all instances after updating the license.
```shell
cd /opt/hengshi
bin/hengshi-sense-bin start metadb
bin/hengshi-sense-bin start engine
bin/hengshi-sense-bin start zookeeper
bin/hengshi-sense-bin start minio
bin/hengshi-sense-bin start redis
bin/hengshi-sense-bin start flink
ansible-playbook ansible/site.yml -i ansible/hosts --tags start-hengshi -e "target=hengshi" --limit "Node-A"
```
Refer to Software License to enter the license.
After the license is successfully authorized, start HENGSHI SENSE normally.
Switch to the executing user and navigate to the installation directory, then execute the command to start HENGSHI SENSE. Please refer to the example below, where the executing user is hengshi and the installation directory is /opt/hengshi.
```sh
sudo su - hengshi
cd /opt/hengshi                         # Enter the installation target directory
bin/hengshi-sense-bin restart hengshi   # Restart the hengshi service
```
Check the prompt message. When the status of each node shows [unreachable=0 failed=0], it indicates that HENGSHI SENSE has started successfully.
```sh
PLAY RECAP ***********************************************************************
Node-A : ok=4 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Node-B : ok=3 changed=2 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0
Node-C : ok=3 changed=2 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0
```
You can access the service address through a browser to use the HENGSHI SENSE service. If you cannot access it, please check whether the service port HS_HENGSHI_PORT in the configuration file conf/hengshi-sense-env.sh is open to the outside.
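A quick way to check from the server itself is to confirm the port is listening and answers HTTP. This sketch assumes the example port 8080; substitute the HS_HENGSHI_PORT value from your configuration:

```shell
PORT=8080   # example value; use the port from conf/hengshi-sense-env.sh
# Is anything listening on the port?
ss -ltn | grep -q ":$PORT " \
  && echo "port $PORT is listening" \
  || echo "port $PORT is NOT listening"
# Does it answer HTTP locally?
curl -s -o /dev/null -w "HTTP %{http_code}\n" "http://127.0.0.1:$PORT/" \
  || echo "connection failed"
```

If the port is listening locally but unreachable from outside, check the firewall rules between the client and the server.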
Operations After Starting the Service
When the HENGSHI SENSE service is running, it is necessary to regularly back up data to prevent data loss and promptly clean up useless logs to free up storage space.
Regular Data Backup.
It is recommended to back up the metadb database daily, either to a local device or a remote device. Schedule backups during non-peak business hours, such as midnight, to avoid affecting users of the service. The following example is a scheduled command that backs up data to a remote device at midnight every day. For detailed parameter explanations, please refer to Data Backup.
```sh
0 0 * * * /opt/hengshi/bin/dbbackup.sh -m metadb -l /BACKUP/PATH -h $REMOTE_IP -r /BACKUP/PATH
```
Regular Log Cleanup.
During operation, HENGSHI SENSE generates operational logs that need to be cleaned regularly to free up storage space. The following example is a command to regularly clean the rolling logs of the internal database daily.
```sh
0 0 * * * /opt/hengshi/bin/clean_engine.sh -t -r -c -g -p
*/5 * * * * /opt/hengshi/bin/clean_engine.sh -l
```
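One way to register both schedules is to write them to a crontab fragment, review it, and load it with crontab. A sketch; the paths and placeholders are taken from the examples above, and `hengshi.cron` is a hypothetical file name:

```shell
# Build a crontab fragment containing the backup and cleanup schedules.
# /BACKUP/PATH and $REMOTE_IP are placeholders; fill in your own values.
cat > hengshi.cron <<'EOF'
0 0 * * * /opt/hengshi/bin/dbbackup.sh -m metadb -l /BACKUP/PATH -h $REMOTE_IP -r /BACKUP/PATH
0 0 * * * /opt/hengshi/bin/clean_engine.sh -t -r -c -g -p
*/5 * * * * /opt/hengshi/bin/clean_engine.sh -l
EOF
cat hengshi.cron   # review, then install as the executing user: crontab hengshi.cron
```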
Port Opening Instructions in Public Network Situations.
In a public network environment, do not expose all HENGSHI service ports unless necessary, to avoid attacks targeting individual components. When external access is required, expose only the web service port, accessed as IP + port.
Stop Service
Stop the cluster service with the following command.
```shell
bin/hengshi-sense-bin stop all
```
The service is successfully stopped when the status of each node in the prompt message is [unreachable=0 failed=0].
```sh
PLAY RECAP ****************************************************************
Node-A : ok=18 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Node-B : ok=18 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Node-C : ok=18 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
```
Check Service Running Status
You can check the service running status by executing the following command.
```shell
bin/hengshi-sense-bin status all
```
In the displayed information, you can view the running information of HENGSHI modules such as metadb, engine, zookeeper, gateway, minio, redis, and flink. "IS ACTIVE" indicates that the corresponding module is running, "NOT ACTIVE" indicates that the corresponding module has stopped, and "skipping" indicates that the node has not installed the corresponding module.
```sh
TASK [operations : metadb status msg] **************************************
ok: [Node-A] => {
    "msg": [
        "[metadb]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]
TASK [operations : engine status msg] **************************************
ok: [Node-B] => {
    "msg": [
        "[engine]: NOT ACTIVE"
    ]
}
TASK [operations : zookeeper status msg] ***********************************
ok: [Node-A] => {
    "msg": [
        "[zookeeper]: NOT ACTIVE"
    ]
}
ok: [Node-B] => {
    "msg": [
        "[zookeeper]: NOT ACTIVE"
    ]
}
ok: [Node-C] => {
    "msg": [
        "[zookeeper]: NOT ACTIVE"
    ]
}
TASK [operations : gateway status msg] *************************************
ok: [Node-A] => {
    "msg": [
        "[gateway]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]
TASK [operations : hengshi sense status msg] *******************************
ok: [Node-A] => {
    "msg": [
        "[syslog]: NOT ACTIVE",
        "[hengshi]: NOT ACTIVE",
        "[watchdog]: NOT ACTIVE"
    ]
}
ok: [Node-B] => {
    "msg": [
        "[syslog]: NOT ACTIVE",
        "[hengshi]: NOT ACTIVE",
        "[watchdog]: NOT ACTIVE"
    ]
}
skipping: [Node-C]
TASK [operations : redis status msg] ***************************************
ok: [Node-A] => {
    "msg": [
        "[redis]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]
TASK [operations : minio status msg] ***************************************
ok: [Node-A] => {
    "msg": [
        "[minio]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]
TASK [operations : flink status msg] ***************************************
ok: [Node-A] => {
    "msg": [
        "[flink]: NOT ACTIVE"
    ]
}
skipping: [Node-B]
skipping: [Node-C]
```