Dataset
A dataset is a collection of data and serves as the core component of data analysis. Subsequent data exploration and data management are all based on datasets.
Depending on the source and purpose of the dataset, the following methods for creating datasets are provided:
- File Upload: Creates a dataset by directly uploading a file in CSV, XLS, XLSX, or XLSM format. All users with the Data Scientist role can use this feature. When creating a local file dataset, files can be uploaded to various data sources. For the supported data sources and required operations, refer to the documentation on file upload datasets.
- Data Connection: Connects to relational databases within the enterprise, such as Oracle, SQL Server, and MySQL; NoSQL databases, such as Elasticsearch, Solr, and MongoDB; and big data platforms, such as Hive and Impala. You can then select the appropriate data subset through a visual interface to create a dataset.
- SQL Query: Creates a dataset from a custom SQL statement run against a data connection. This method is suitable for users who are proficient in SQL.
- API Query: Converts an HTTP API that returns JSON into a dataset.
- Multi-Table Fusion: Combines multiple pre-created datasets to generate a new dataset.
- Data Aggregation: Creates a new dataset by aggregating an existing dataset. This method is suitable for analyzing datasets with a large number of columns.
- Data Union: Consolidates data from multiple datasets into a single dataset.
- Pivot: Performs row-to-column operations on an existing dataset to generate a new dataset.
- Unpivot: Performs column-to-row operations on an existing dataset to generate a new dataset.
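The last five methods in the list above are all transformations of existing datasets, and their differences are easier to see on a small example. The following is a minimal sketch in plain Python, assuming a dataset can be modeled as a list of row dictionaries. The sample data and the function names (`union`, `fuse`, `aggregate`, `pivot`, `unpivot`) are hypothetical illustrations and are not part of the product; the product's own join types and aggregation functions may differ.

```python
from collections import defaultdict

# Hypothetical sample datasets: each dataset is a list of row dictionaries.
orders = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "East", "quarter": "Q2", "amount": 150},
    {"region": "West", "quarter": "Q1", "amount": 80},
]
more_orders = [{"region": "West", "quarter": "Q2", "amount": 120}]
managers = [
    {"region": "East", "manager": "Ann"},
    {"region": "West", "manager": "Bo"},
]


def union(*datasets):
    """Data Union: stack the rows of several datasets into one dataset."""
    return [dict(row) for dataset in datasets for row in dataset]


def fuse(left, right, key):
    """Multi-Table Fusion (assumed here to be an inner join on one key)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate(rows, group_key, value_key):
    """Data Aggregation (assumed here to be a sum per group)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return [{group_key: k, value_key: v} for k, v in totals.items()]


def pivot(rows, index_key, column_key, value_key):
    """Pivot: each distinct value of column_key becomes a column."""
    wide = {}
    for row in rows:
        target = wide.setdefault(row[index_key], {index_key: row[index_key]})
        target[row[column_key]] = row[value_key]
    return list(wide.values())


def unpivot(rows, index_key, column_key, value_key):
    """Unpivot: each non-index column becomes a row again."""
    return [
        {index_key: row[index_key], column_key: col, value_key: val}
        for row in rows
        for col, val in row.items()
        if col != index_key
    ]


all_orders = union(orders, more_orders)              # 4 rows
with_managers = fuse(all_orders, managers, "region")  # adds a manager column
totals = aggregate(all_orders, "region", "amount")    # one row per region
wide = pivot(all_orders, "region", "quarter", "amount")
long_again = unpivot(wide, "region", "quarter", "amount")
```

In this sketch, `wide` holds one row per region with `Q1` and `Q2` columns, and unpivoting it recovers the original long-format rows, which shows that Pivot and Unpivot are inverse operations on this data.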