Dataset
A dataset is a collection of data and a core component of data analysis. All subsequent data exploration and data management build on datasets.
Depending on the source and purpose of the dataset, the following methods for creating datasets are provided:
- File Upload: Creates a dataset directly from an uploaded file in CSV, XLS, XLSX, or XLSM format. Users with the Data Analysis role can use this feature directly. When creating a local file dataset, the file can be uploaded to a data source. Supported data sources for file uploads are engine connections (built-in data connections), Greenplum, PostgreSQL, and Amazon Redshift. To upload local files to a non-built-in data source, however, the application must already contain a homologous dataset, that is, a dataset built on the same data source.
- Data Connection: Connects to the enterprise's relational databases (such as Oracle, SQL Server, and MySQL), NoSQL databases (such as Elasticsearch, Solr, and MongoDB), and big data platforms (such as Hive and Impala). A visual interface is then used to select the appropriate subset of data and create the dataset.
- SQL Query: Creates a dataset from a custom SQL statement run against a data connection. This method is suitable for users who are proficient in SQL.
- API Query: Converts the response of an HTTP JSON API into a dataset.
- Multi-Table Union: Combines multiple already created datasets to generate a new dataset.
- Data Aggregation: Based on already created datasets, new datasets can be created through aggregation, suitable for data analysis on datasets with many columns.
- Data Union: Combines data from multiple datasets into a single dataset.
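
The API Query, union, and aggregation methods listed above can be pictured with a small, product-independent sketch. It is only an illustration under stated assumptions, not the product's implementation: the function names (`json_to_rows`, `union_rows`, `aggregate_sum`), the `data` record key, and the sample values are all hypothetical, and "union" is assumed here to mean appending rows from several datasets.

```python
import json
from collections import defaultdict


def json_to_rows(payload: str, record_key: str) -> list[dict]:
    """Parse a JSON response body and return its records as rows.

    Assumes the records sit in a list under `record_key` (hypothetical layout).
    """
    records = json.loads(payload)[record_key]
    # Use the union of all keys as columns so every row has the same shape.
    columns = sorted({key for record in records for key in record})
    return [{col: record.get(col) for col in columns} for record in records]


def union_rows(*datasets: list[dict]) -> list[dict]:
    """Append the rows of several datasets; missing columns become None."""
    columns: list[str] = []
    for dataset in datasets:
        for row in dataset:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [
        {col: row.get(col) for col in columns}
        for dataset in datasets
        for row in dataset
    ]


def aggregate_sum(rows: list[dict], group_by: str, measure: str) -> list[dict]:
    """Sum `measure` per distinct value of `group_by`."""
    totals: dict = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return [{group_by: key, measure: total} for key, total in totals.items()]


# Hypothetical sample data: a JSON API response body and an uploaded file's rows.
response_body = (
    '{"data": [{"region": "East", "sales": 120.0},'
    ' {"region": "West", "sales": 80.0}]}'
)
uploaded_rows = [{"region": "East", "sales": 30.0}]

api_rows = json_to_rows(response_body, "data")
combined = union_rows(api_rows, uploaded_rows)
summary = aggregate_sum(combined, "region", "sales")
print(summary)
```

In this sketch each dataset is a list of dictionaries with a shared set of columns, which is enough to show why a union needs aligned columns and why aggregation reduces a dataset to one row per group.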