Data Agent
Overview
In HENGSHI SENSE, the Data Agent leverages large-model capabilities to help users make full use of their data. Through a conversational interaction experience, the Data Agent can assist users with tasks ranging from instant analysis of business data to metric creation and dashboard generation. We will continue to integrate Agent capabilities into the product to enhance the efficiency of data analysts and data managers while simplifying workflows and complex tasks.
Installation and Configuration
Prerequisites
Please ensure the following steps are completed to make the Data Agent operational:
- Installation and Startup: Complete the installation of the HENGSHI service by following the Installation and Startup Guide.
- AI Deployment: Complete the installation and deployment of related services by following the AI Deployment Documentation.
Configure Large Model
On the System Settings - Feature Configuration - Data Agent page, configure the relevant information.

User Guide
Before using the Data Agent, some data preparation is required to ensure that the Data Agent understands the unique business context, prioritizes accurate information, and provides consistent, reliable, and goal-aligned responses.
Preparing data for the Data Agent lays the foundation for a high-quality, practical, and context-aware Data Agent experience. If the data is disorganized or ambiguous, the Data Agent may struggle to comprehend it accurately, producing responses that are superficial, factually incorrect, or even misleading.
By dedicating effort to proper data preparation, the Data Agent can fully grasp the business context, accurately extract key information, and deliver responses that are not only stable and reliable but also highly aligned with your objectives, maximizing the effectiveness of the Data Agent.
Note
AI behavior is inherently unpredictable. Even with the same input, AI does not always generate identical responses.
Writing Prompts for AI
Industry Terminology and Private Domain Knowledge
To enable the large model to perform at its best, the Data Agent Console under system settings provides a feature for configuring prompts. You can use natural language in the User/System Prompts to provide the large model with information including, but not limited to, your company's industry background, business logic, analytical direction, and specific instructions. The Data Agent will use these directives to understand your organization's internal language habits, professional terminology, and analytical priorities, thereby improving the quality and relevance of its responses.
Prompts can help the Data Agent respond based on your industry, strategic goals, terminology, or operational logic, ensuring users receive more accurate and relevant data analysis. For example:
- "Big Sale" refers to the period from October 11 to November 11 each year.
- When users mention product-related questions, please retrieve both the product name and product ID.
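Rules like the "Big Sale" example above ultimately describe deterministic logic. As a sanity check for how a correctly applied prompt should behave, the date rule can be expressed as a simple predicate (a hypothetical sketch, not part of the product, assuming both endpoints are inclusive):

```python
from datetime import date

def in_big_sale(d: date) -> bool:
    # "Big Sale": October 11 through November 11 of each year, inclusive
    # (assumed inclusive endpoints; adjust if your definition differs).
    return date(d.year, 10, 11) <= d <= date(d.year, 11, 11)

print(in_big_sale(date(2025, 10, 11)))  # → True
print(in_big_sale(date(2025, 11, 12)))  # → False
```

If the Data Agent's answers disagree with a reference check like this, the prompt wording is usually the first thing to revisit.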

Dataset Analysis Rules
In the Knowledge Management of the dataset, you can use natural language to describe in detail the purpose of the dataset, implicit rules (such as filter conditions), synonyms, and the corresponding fields and metrics for specific business terms, guiding the Data Agent on how to perform certain types of analysis. For example:
- "Small orders" refer to orders where the total quantity under the same order number is less than or equal to 2.
- "Fiscal year" refers to the period starting from December 1 of the previous year to November 30 of the current year. For example, the 2025 fiscal year refers to 2024/12/1 to 2025/11/30, and the 2024 fiscal year refers to 2023/12/1 to 2024/11/30.
- When asked about AAA, also list metrics such as BBB, CCC, and DDD.
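The "small orders" and "fiscal year" rules above are precise enough to write down as code. The following sketch (hypothetical helper names, assuming December dates roll into the next fiscal year per the examples given) shows the logic the Data Agent is expected to infer from such descriptions:

```python
from datetime import date

def fiscal_year(d: date) -> int:
    # Fiscal year runs from Dec 1 of the previous calendar year to Nov 30,
    # so December dates already belong to the *next* fiscal year.
    return d.year + 1 if d.month == 12 else d.year

def is_small_order(line_quantities: list[int]) -> bool:
    # "Small order": total quantity under the same order number <= 2.
    return sum(line_quantities) <= 2

print(fiscal_year(date(2024, 12, 1)))   # → 2025
print(fiscal_year(date(2025, 11, 30)))  # → 2025
print(is_small_order([1, 1]))           # → True
```

Writing the rule this unambiguously in the Knowledge Management description is what lets the Data Agent translate it into correct filter conditions.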

Note
Understanding the best practices of prompt engineering is crucial. AI can be sensitive to the prompts it receives, and how a prompt is constructed affects the AI's understanding and output. Effective prompts share the following characteristics:
- Clear and specific
- Use analogies and descriptive language
- Avoid ambiguity
- Use markdown to write in a structured, topic-based manner
- Break down complex instructions into simple steps whenever possible
Preparing Data for AI
Data Vectorization
To efficiently and accurately locate the most relevant information within massive data assets, it is recommended to perform "vectorization" on the data. Vectorization converts the text information of field/atomic metric names, descriptions, and field values into computable semantic vectors and writes them into a vector database. This enables the Data Agent to perform retrieval and recall based on semantics rather than relying solely on keyword matching.
Benefits of Vectorization:
- Higher Relevance: Understands synonyms, industry terms, and context, reducing missed and false detections.
- Faster Response: Narrows the search scope, reducing the context-filling cost of large models.
- Greater Scalability: Supports semantic associations and knowledge linking across datasets, adapting to multilingual scenarios.
- Continuous Optimization: Enhances Q&A quality over time by combining human review results with "intelligent learning" tasks.
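Conceptually, semantic recall compares embedding vectors instead of keywords. The following is a minimal, self-contained sketch with toy hand-written vectors; in the real product, an embedding model and a vector database fill these roles:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: 1.0 means identical direction, 0.0 unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy pre-computed vectors for two field names. A real system would call
# an embedding model and persist these in a vector database.
index = {
    "GMV": [0.9, 0.1, 0.2],
    "refund_amount": [0.1, 0.8, 0.3],
}

def top_match(query_vec: list[float]) -> str:
    # Retrieve the indexed name whose vector is most similar to the query.
    return max(index, key=lambda name: cosine(query_vec, index[name]))

# A query like "total sales" embeds close to "GMV" even though the two
# strings share no keywords — the point of semantic over keyword search.
print(top_match([0.85, 0.15, 0.25]))  # → GMV
```

This is why vectorized fields can be recalled by synonyms and industry terms that plain keyword matching would miss.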
Steps to Operate:
- Navigate to the target dataset page and click "Vectorization" in the action bar.
- Check the progress in System Settings - Task Management - Execution Plan, and enable scheduled tasks as needed to improve recall stability and coverage.

Note
The maximum number of distinct field values for vectorization is 100,000.
Data Management
Effective data management is the foundation for Data Agent to correctly understand business semantics and metric definitions. By standardizing naming conventions, completing field/metric descriptions, setting appropriate data types, and hiding or cleaning irrelevant objects, you can significantly improve query relevance and response speed, reduce large model context costs, and minimize misunderstandings. It is recommended to perform self-checks using the following checklist before publishing datasets and during routine maintenance. Combining this with "Data Vectorization" and "Intelligent Learning" will yield even better results.
- Dataset Naming: Ensure dataset names are concise and clearly reflect their purpose.
- Field Management: Ensure field names are concise and descriptive, avoiding special characters. Provide detailed explanations of field purposes in the Field Description, such as "Default field for timeline." Field types should also align with their intended use, e.g., fields requiring summation should use numeric types, and date fields should use date types.
- Metric Management: Ensure atomic metric names are concise and descriptive, avoiding special characters. Provide detailed explanations of metric purposes in the Atomic Metric Description.
- Field Hiding: For fields not involved in queries, it is recommended to hide them to reduce the number of tokens sent to the large model, improve response speed, and lower costs.
- Distinguishing Fields and Metrics: Ensure field names and metric names are not similar to avoid confusion. Fields not required for answering questions should be hidden, and unnecessary metrics should be deleted.
- Intelligent Learning: It is recommended to trigger the "Intelligent Learning" task to convert general examples into dataset-specific examples. After execution, manually review the learning results and perform add, delete, or modify operations to enhance the assistant's capabilities.
Enhancing Understanding of Complex Calculations
Predefine reusable business metrics on the data side and expose them in the form of Metrics to achieve higher accuracy, stability, and interpretability in query scenarios.
Practical Recommendations:
- Provide unified definitions for industry-specific metrics (e.g., financial risk control, advertising campaigns, e-commerce conversion) and maintain synonym mappings in the Knowledge Management of the dataset.
- Establish mappings between "business terms → metrics" for easily confused concepts (e.g., "conversion rate," "ROI," "repurchase rate") to avoid free-form field combinations in models.
- Prioritize using "metrics" to carry definitions rather than temporary calculation expressions in single conversations; for critical metrics, establish versioning and change logs to prevent definition drift.
Example (ROI):
- Advertising/E-commerce: ROI = GMV ÷ Advertising Cost. Clearly specify in the metric description whether it includes coupons, deducts refunds and shipping costs, includes platform service fees, uses "payment time/order time" as the statistical basis, and the time window (e.g., day/week/month).
- Manufacturing/Projects: ROI = (Revenue − Cost) ÷ Cost, with the window being the full project cycle or financial period.
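The two ROI conventions above reduce to simple formulas. A minimal sketch (hypothetical function names; the caveats about coupons, refunds, and time windows from the metric description still apply to what goes into the inputs):

```python
def roi_ads(gmv: float, ad_cost: float) -> float:
    # Advertising/e-commerce convention: ROI = GMV / advertising cost.
    return gmv / ad_cost

def roi_project(revenue: float, cost: float) -> float:
    # Manufacturing/project convention: ROI = (revenue - cost) / cost,
    # computed over the full project cycle or financial period.
    return (revenue - cost) / cost

# The same raw numbers yield different ROI values under each convention,
# which is exactly why the metric description must pin one down.
print(roi_ads(5000.0, 1000.0))      # → 5.0
print(roi_project(5000.0, 1000.0))  # → 4.0
```

Encoding the chosen convention once as a governed metric prevents the model from improvising a different formula in each conversation.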
Use Cases
The agent mode of Data Agent has the following features:
- No Restriction on Conversation Source
Data Agent autonomously determines user intent based on the input content, breaks down user requirements, and performs hybrid searches within the user's authorized data scope across the dataset marketplace, app marketplace, and app creation. It then analyzes and queries data from the target data sources to provide answers.
- Complex Problem Decomposition
Data Agent supports not only regular data query tasks but also allows users to input multiple questions at once, especially when there are logical relationships between them. Depending on the complexity of the requirements, Data Agent may perform one or multiple data queries.
- Context Awareness
Data Agent can read the information of the logged-in account, seamlessly understanding indicative pronouns in user input (e.g., "my department" or other user attributes). Additionally, it can read the information of the page the user is currently viewing. When users are on specific pages such as data packages, datasets, or dashboards, the Agent will directly interact based on the information of the current page when handling data query requests.
With these capabilities, Data Agent can transform into multiple roles, such as a visual creation assistant, metric creation assistant, or analyst assistant.
Intelligent Query
After upgrading the Data Agent, Intelligent Query is no longer limited to a specific data range, and there is no need to manually select a range before querying. The agent's tasks may involve finding content, performing ad-hoc analysis, or providing insights.

Visualization Creation
Data Agent allows users to create dashboards from scratch on the dashboard list page based on their needs, or to edit existing dashboards directly, covering chart creation, adding filters, analyzing data with rich-text reports, adjusting dashboard layouts, modifying colors, and performing batch operations on controls.

Intelligent Interpretation
To help business users perform business data analysis, periodic reviews, and data interpretation with Data Agent, we have added the "Intelligent Interpretation" configuration and shortcut button. On the dashboard page, an "Intelligent Interpretation" button appears in Data Agent. When clicked, Data Agent follows the pre-configured interpretation logic to perform real-time data queries, anomaly detection, decomposition, and drill-down, ultimately producing an interpretation report.

In dashboard editing mode, you can click the dropdown menu in the upper-right corner to open the "Intelligent Interpretation Configuration." Here, users can configure fixed interpretation logic based on their business needs, or click a button to let AI analyze the dashboard's structure and data and generate an interpretation logic template. Each chart in the dashboard also supports its own interpretation logic configuration: clicking the "Intelligent Interpretation" button in the upper-right corner of a chart control invokes Data Agent and sends the interpretation command.

The Intelligent Interpretation feature leverages artificial intelligence technology to perform automated analysis on user-specified data ranges. Its core capabilities and boundaries are as follows:
- Data Query and Extraction: Quickly locate and extract relevant information from data sources based on user instructions or built-in analysis logic.
- Data Summarization and Induction: Integrate, summarize, and condense query results across multiple dimensions to reveal key facts, patterns, and current states in the data.
- Generate Descriptive Reports: Output analysis results in the form of structured reports or concise text summaries, helping users understand "what happened in the past" and "what the current situation is."
Note that Intelligent Interpretation does not perform predictive inference. This feature strictly analyzes existing and historical data, and its output is a description and summary of established facts. It cannot predict future data trends, business outcomes, or any probabilistic events that have not yet occurred.
Note
Complex reports and complex tables are not supported by Intelligent Interpretation.
Expression Writing
Data Agent, based on its understanding of HQL, can assist users in writing complex expressions and creating metrics.

Debugging and Tuning
Agent and Workflow modes support different prompt instructions for performance tuning. Refer to the dedicated documentation: Agent Tuning and Workflow Tuning.
Integrating ChatBI
HENGSHI SENSE offers multiple integration methods, allowing you to choose the one that best suits your needs:
IFRAME Integration
Use an iframe to integrate ChatBI into your existing system, enabling seamless connection with the HENGSHI SENSE BI PaaS platform. Iframe integration is simple and easy to use, letting you use HENGSHI ChatBI's conversation components, styles, and features directly, without additional development in your system.
SDK Integration
By integrating ChatBI into your existing system through the SDK, you can implement more complex business logic and achieve finer control, such as customizing the UI. The SDK offers a wealth of configuration options to meet personalized needs. Depending on your development team's tech stack, choose the appropriate SDK integration method. We provide two JavaScript SDKs: Native JS SDK and React JS SDK.
How to choose which SDK to use?
The difference is that Native JS is pure JavaScript and does not depend on any framework, while React JS is built on the React framework and requires React to be installed first.
The Native JS SDK provides UI, functionality, and integration similar to iframe. It directly uses the HENGSHI ChatBI's conversation components, styles, and features. However, through JavaScript control, SDK initialization parameters, and other methods, it allows for custom API requests, request interception, and more.
The React JS SDK, on the other hand, only provides the Completion UI component and the useProvider hook, making it suitable for use in your own React projects.
API Integration
Integrate ChatBI capabilities into your Feishu, DingTalk, WeCom, or Dify workflow through the Backend API to implement customized business logic. For the Dify workflow tool, refer to the attachment HENGSHI AI Workflow Tool v1.0.1.zip.
Enterprise Instant Messaging Tool Data Q&A Bot
You can create an intelligent data Q&A bot through the Enterprise Instant Messaging Tool Data Q&A Bot, linking relevant data in HENGSHI ChatBI to enable intelligent data Q&A within instant messaging tools. Currently supported enterprise instant messaging tools include WeCom, Lark, and DingTalk.
FAQ
How to Troubleshoot Query Failures and Errors?
Diagnosing failures and errors involves multiple aspects. When you encounter an issue, collect the following information and contact a support engineer:
- Click the three-dot menu below the dialog card, select "Execution Log," and then click "Copy Full Log."

- Press the F12 key on your keyboard, or right-click and select "Inspect," to open the browser console, then navigate to "Network" - "Fetch/XHR." Reproduce the error by querying again, then right-click the failed network request and select "Copy" - "Copy Response."

- Go to "System Settings" - "Intelligent Operations" - "System Debugging," set "Unified Settings" to "DEBUG," enable "Real-Time Debugging," reproduce the error by querying again, and then click "Export Logs."

How to Fill in the Vector Database Address?
Simply follow the AI Assistant Deployment Documentation to complete the installation and deployment of the related services. No manual input is required.
Does it support other vector models?
Currently, it is not supported. If needed, please contact the support engineer.
What are the differences between the Data Agent Sidebar and ChatBI?
| Capability | Data Agent Sidebar | ChatBI |
|---|---|---|
| Intelligent querying with specified data sources | ✅ | ✅ |
| Intelligent querying without data source limitations | ✅ | ❌ |
| One-click dashboard creation from conversation charts | ❌ | ✅ |
| Visualized assistance for creation | ✅ | ❌ |
| Metric assistance for creation | ✅ | ❌ |
| Intelligent interpretation | ✅ | ❌ |
What are the differences between Agent Mode, Workflow Mode, and API Mode?
| Capability | Agent Mode | Agent API Mode | Workflow and Workflow API Mode |
|---|---|---|---|
| Intelligent Q&A with specified data sources | ✅ | ✅ | ✅ |
| Intelligent Q&A without data source limitations | ✅ | ✅ | ❌ |
| Visualized assistance for creation | ✅ | ❌ | ❌ |
| Metric assistance for creation | ✅ | ❌ | ❌ |
| Intelligent interpretation | ✅ | ❌ | ❌ |