Dataset Management
A dataset is the foundation of data modeling and serves as an important basis for data analysts to conduct data analysis. A dataset contains information such as fields, metrics, and data, providing a detailed description of granular data. Analytical metrics can also be defined within a dataset.
Dataset Display
Click the Dataset menu in the app to enter the dataset display page. The dataset list has two display styles: the default card style and the list style.
Card Style
List Style
Click the list button to switch to the list style.
In the list style, Access Frequency and Last Access Time are also shown to help users understand how resources are being accessed. Users can use this data to clean up resources that have not been accessed for a long time.
Sorting
The dataset list is sorted in descending order by modification time by default.
In the list view, you can click a column header to sort by that column, for example by Access Popularity.
Search
Enter characters in the search box to search based on the dataset name, filtering out datasets whose names contain the entered characters.
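The default ordering and the name search described above can be pictured with a short sketch. This is only an illustration: the `DatasetEntry` record, its field names, and the case-insensitive matching are assumptions made for the example, not the product's actual data model or matching rules.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DatasetEntry:
    # Hypothetical record; field names are illustrative, not the product's schema.
    name: str
    modified_at: datetime


def list_datasets(entries, query=""):
    """Return entries whose name contains `query`, most recently modified first."""
    # Assumption: matching ignores case; the product may behave differently.
    needle = query.casefold()
    matches = [e for e in entries if needle in e.name.casefold()]
    # Default order from the text: descending by modification time.
    return sorted(matches, key=lambda e: e.modified_at, reverse=True)
```

An empty query matches every name, so calling `list_datasets(entries)` simply returns the full list in its default order.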
Dataset Operations
The entry point for creating a new dataset is located in the top-right menu of the dataset list. Operations for individual datasets are available in the three-dot menu on the dataset card.
Create Dataset
Create a Blank Dataset
Click the Create Dataset button in the upper right corner to create a dataset. For more details, see Create Dataset.
Import Dataset
Click the New Dataset button in the upper right corner and select Import Dataset from the dropdown menu. You can import datasets from other apps or data packages.
Tips
- Only local files, data connections, and SQL query datasets support this feature, so the imported dataset list may not include all datasets in the app.
- After importing a dataset, the dataset metrics and related data models will not be carried over to the new app.
- When the app's data permission is set to User, datasets with accelerated engines enabled cannot be imported.
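The import restrictions in the tips above amount to a simple eligibility check. The sketch below is a hedged illustration: the `Dataset` record, the type labels such as `"sql_query"`, and the permission label `"user"` are hypothetical names chosen for the example and do not come from the product's API.

```python
from dataclasses import dataclass

# Dataset types the tips list as importable; the string labels are hypothetical.
IMPORTABLE_TYPES = {"local_file", "data_connection", "sql_query"}


@dataclass
class Dataset:
    name: str
    dataset_type: str
    engine_enabled: bool = False


def can_import(dataset, app_data_permission):
    """Apply the two import restrictions stated in the tips."""
    if dataset.dataset_type not in IMPORTABLE_TYPES:
        return False
    # Engine-accelerated datasets are blocked when data permission is "User".
    if app_data_permission == "user" and dataset.engine_enabled:
        return False
    return True
```

A dataset that fails this check would simply be missing from the import list, which is why that list may not show every dataset in the source app.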
Delete
A dataset that is referenced by an association model or a chart cannot be deleted; only unreferenced datasets can be. Deleting a dataset moves it to the app's recycle bin rather than removing it permanently.
Datasets stored in the app's recycle bin can be restored or permanently deleted. Resources in the recycle bin are retained for up to 90 days. Only users with app management and app editing permissions can access the recycle bin entry within the app.
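The deletion rules above (a reference check, a soft delete, and a 90-day retention window) can be sketched as follows. This is a minimal model under stated assumptions: the function names, the `reference_counts` mapping, and the dictionary used as a recycle bin are all hypothetical, and the exact moment at which the product purges expired items is not specified in the text.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # "up to 90 days" per the text


def delete_dataset(name, reference_counts, recycle_bin, now):
    """Soft-delete `name` unless a model or chart still references it."""
    if reference_counts.get(name, 0) > 0:
        raise ValueError(f"{name!r} is referenced and cannot be deleted")
    # Record the deletion time; the dataset is not permanently removed yet.
    recycle_bin[name] = now


def purge_expired(recycle_bin, now):
    """Remove entries kept longer than the retention window; return their names."""
    expired = [n for n, deleted_at in recycle_bin.items() if now - deleted_at > RETENTION]
    for n in expired:
        del recycle_bin[n]
    return expired
```

Until an entry is purged, restoring it would amount to removing it from `recycle_bin` and returning it to the dataset list.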
Rename
You can rename the dataset to make it more aligned with business logic.
Create a Copy
Create a copy of the dataset to generate a new dataset.
Copy To
Copy To allows you to duplicate datasets into other apps or data packages, enabling cross-app dataset reuse.
Tips
- Only local files, data connections, and SQL query datasets support this feature; other datasets are not supported.
- When a dataset is copied to another app, the dataset metrics and related data models will not be duplicated.
- For datasets with the engine enabled, copying to another app requires the app's data permissions to be in either the app author or dataset author mode.
Replace Dataset
Replace the dataset with another one. For details, see Replace Dataset.
Hide
After processing the dataset, some intermediate datasets may no longer be needed for charting. In this case, these datasets can be hidden, and they will not be visible during chart creation.
Hidden datasets can still participate in association models or dataset processing; they are just not visible on the dashboard and chart pages.
Once a dataset is hidden, the "Hide" option in the three-dot menu changes to "Show." You can click "Show" to set the dataset to a visible state.
Set as Default Dataset
The system supports setting a dataset as the default dataset. During chart creation, the default dataset in the data package will be displayed, reducing the need to switch and search for datasets while creating charts.
Data Management
On the dataset list page, clicking on a single dataset will take you to the data management page for that dataset. This page features a two-dimensional table structure, displaying the dataset's headers and a portion of the data. Above the headers is the dataset operation menu.
Field Selection
Corresponds to the ① icon in the data management interface. Click the Field Selection icon to open the field selection list, choose the fields you are interested in, and then click Apply. The dataset page will display the data of the selected fields.
Filter Data
Corresponds to the ② icon in the data management interface. Click the Filter Data icon to open the data filtering popup. Add filtering conditions in the popup, then click Confirm. The dataset page will display the filtered data.
Information Display Area
Corresponds to area ③ in the data management interface. It displays the number of rows, columns, and the estimated occupied space size of the dataset. The estimated occupied space size is only shown for direct connection datasets and datasets imported into the engine. This estimated size is calculated based on the number of rows and row data values in the dataset and serves as a reference value, not representing the actual storage size in the database.
Knowledge Management
Corresponds to the ④ icon in the data management interface. Knowledge management aims to improve the accuracy of using datasets in AI analysis. It includes the following features:
- Edit Dataset Description: The Smart Query Assistant will use this description for relevance searches when responding, and large models will also reference it in their replies.
- Edit Weight: The Smart Query Assistant uses this weight to calculate the relevance of this dataset when responding. The weight value is an integer ranging from 1 to 100.
- Trigger the Intelligent Learning task to extract knowledge from the dataset content, enhancing the accuracy of the Smart Query Assistant when answering questions. Additionally, you can View Tasks, View Learning Results, and directly activate the Smart Query function for the current dataset.
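The weight constraint mentioned above (an integer from 1 to 100) can be expressed as a small validation helper. This is an illustrative sketch only: `validate_weight` is a hypothetical name, and treating both 1 and 100 as valid is an assumption about how the range is meant.

```python
def validate_weight(value):
    """Return `value` if it is an integer in 1..100, otherwise raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("weight must be an integer")
    # Assumption: both endpoints of the stated range are allowed.
    if not 1 <= value <= 100:
        raise ValueError("weight must be between 1 and 100")
    return value
```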
Data Management
For file-type datasets, the Data Management menu provides the functionality to Append File Data and Export Data. The Export Data feature allows you to export the dataset's data into an Excel spreadsheet.
For non-file-type datasets, the Data Management menu provides the functionality to Edit Dataset and Export Data. When a dataset is edited, reference checks are performed: fields that are referenced elsewhere cannot be removed.
Data Information
The data information section displays metadata about the dataset, including:
- Dataset Name
- Dataset Type: Indicates the source of the dataset, such as data connection, local file, SQL query, etc. Icons represent the storage type of the dataset.
- Data Connection: The data connection used by the dataset.
- Rows/Columns/Size: Displays the number of rows and columns in the dataset and the space it occupies in the system.
- Original Table: The name of the original table for the data connection dataset.
- Engine Table: The table name in the engine for datasets stored as engine connections.
- Enable Acceleration Engine: Turn on/off the acceleration engine.
- Public Dictionary: Enables the public dictionary, allowing modeling between different data sources. The public dictionary table requires that the columns in the dataset contain only numeric, text, or date-type fields, and the total number of rows must not exceed 500.
- Inherit Upstream Permissions: When enabled, for transformed datasets (multi-table joins, data aggregation, data merging, row-to-column, column-to-row), the permission settings of upstream datasets can be inherited. If the current dataset or its downstream datasets have already been imported into the engine, the inherit upstream permissions feature becomes invalid.
- Update Time: The time when the dataset content was last updated.
- Data Update:
  - Immediate Update: For datasets imported into the engine, an update task will be initiated to re-examine the data from the data source and generate the information needed for data exploration. For direct connection datasets, an immediate update refreshes the metadata.
  - Update Schedule: Set an update schedule. For details, see Update Schedule.
- Update Status: Indicates the status of the most recent data update operation, such as completed, failed, pending, or in progress.
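The Public Dictionary constraints listed above (only numeric, text, or date columns, and no more than 500 rows) lend themselves to a simple pre-check. The sketch below is an assumption-laden illustration: the function name, the type labels `"number"`, `"text"`, and `"date"`, and the shape of `column_types` are hypothetical and are not taken from the product.

```python
ALLOWED_TYPES = {"number", "text", "date"}  # hypothetical type labels
MAX_ROWS = 500  # row limit stated for public dictionary tables


def public_dictionary_issues(column_types, row_count):
    """Return the reasons a dataset fails the stated constraints (empty if none)."""
    issues = []
    for column, col_type in column_types.items():
        if col_type not in ALLOWED_TYPES:
            issues.append(f"column {column!r} has unsupported type {col_type!r}")
    if row_count > MAX_ROWS:
        issues.append(f"{row_count} rows exceeds the limit of {MAX_ROWS}")
    return issues
```

Returning a list of reasons rather than a single boolean makes it easier to tell the user which column or limit is blocking the setting.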
Field Management
Field management primarily involves operations on fields, such as field grouping, creating new fields, modifying field types, etc. For detailed instructions, refer to Field Management.
Metric Management
Metric management involves operations on dataset metrics, including creating new metrics, metric grouping, etc. For detailed instructions, refer to Metric Management.