Data Lineage

Data lineage describes the lifecycle of data: the transformations it goes through on the way from the data source to the data target. It gives a clear view of how a dataset is processed, which is especially useful when the processing path is long, and lets users analyze both the upstream datasets that the current dataset depends on and the downstream datasets generated from it. When updating a dataset, users can check in data lineage that its upstream datasets have already been updated, which helps keep the data timely. Data lineage also displays the objects affected by the dataset, including charts, datasets, data integrations, and data models.

Data lineage involves several data concepts, including the data source, the data target, and the transformation relationships between them.

In the Data Lineage tab, datasets are listed on the left. Clicking a dataset shows its data lineage diagram in the upper section of the right panel and the objects affected by the dataset in the lower section.

Data Lineage Diagram

The data lineage diagram displays the selected dataset, the datasets that the selected dataset depends on, and the datasets generated by the selected dataset. The selected dataset will be highlighted.
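The upstream and downstream relationships in the diagram can be thought of as a directed graph. The following is a minimal sketch of that idea, not the platform's implementation; the dataset names and the functions `build_graph` and `reachable` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build lookup tables from (source, target) pairs.

    Each pair means "target is generated from source".
    """
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(adjacency, start):
    """Return every dataset reachable from start, directly or indirectly."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical lineage: two sources feed an enriched dataset, which feeds a summary.
edges = [
    ("orders", "orders_clean"),
    ("orders_clean", "orders_enriched"),
    ("customers", "orders_enriched"),
    ("orders_enriched", "sales_summary"),
]
upstream, downstream = build_graph(edges)

# Datasets that "orders_enriched" depends on, and datasets generated from it.
depends_on = reachable(upstream, "orders_enriched")
generates = reachable(downstream, "orders_enriched")
```

In this sketch, selecting `orders_enriched` would correspond to a diagram with `orders`, `orders_clean`, and `customers` on the upstream side and `sales_summary` on the downstream side.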

Hovering over a dataset card reveals a jump icon in the upper right corner of the card. Clicking the icon opens the homepage of that dataset.

Each dataset card also shows the update time of the dataset, so users can check whether the update schedules of related datasets are appropriate and whether each dataset has been updated in time. If a dataset update fails, a failure icon is displayed before the update time.
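The kind of check the update times make possible can be sketched as follows. This is an illustrative example only, assuming update times are available as timestamps; the function `find_stale`, the dataset names, and the times are all hypothetical.

```python
from datetime import datetime


def find_stale(upstream_of, updated_at):
    """Return (dataset, upstream) pairs where the upstream was updated later.

    Such a pair suggests the dataset was built from an older version of its
    upstream and that the update schedule may need adjusting.
    """
    stale = []
    for dataset, parents in upstream_of.items():
        for parent in parents:
            if updated_at[parent] > updated_at[dataset]:
                stale.append((dataset, parent))
    return sorted(stale)


# Hypothetical direct dependencies and update times.
upstream_of = {
    "orders_enriched": ["orders_clean", "customers"],
    "sales_summary": ["orders_enriched"],
}
updated_at = {
    "orders_clean": datetime(2024, 1, 2, 1, 0),
    "customers": datetime(2024, 1, 2, 3, 0),
    "orders_enriched": datetime(2024, 1, 2, 2, 0),
    "sales_summary": datetime(2024, 1, 2, 4, 0),
}

stale_pairs = find_stale(upstream_of, updated_at)
```

Here `orders_enriched` was updated at 02:00 but its upstream `customers` was updated at 03:00, so that pair is reported as a candidate for rescheduling.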

Affected Objects

The Affected Objects section lists the charts, datasets, data integrations, and data models affected by the current dataset. Clicking a module shows the affected items of that type, and clicking an item navigates directly to it. For example, clicking the affected object "Datasets Fusion" under Datasets navigates directly to that dataset.

Chart

The chart relationships displayed in data lineage include the following scenarios:

  • Charts directly created using the dataset.

  • Charts created using fields of downstream datasets, where those fields are derived from this dataset. For example, Dataset A and Dataset B are linked to generate Dataset C, and Chart D is then created using only the fields of Dataset C that come from Dataset B. In this case, Dataset B affects Chart D, while Dataset A does not.

  • Charts created using a reused copy of this dataset within a data model. For instance, if Dataset A is dragged into a data model twice and the second instance is named A(2), charts created using A(2) also appear among the affected charts of Dataset A.
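The second scenario depends on tracing individual fields rather than whole datasets. The sketch below illustrates that reasoning with the Dataset A, B, C and Chart D example; it is not how the platform stores lineage, and the field names and the helpers `source_datasets` and `affected_charts` are hypothetical.

```python
def source_datasets(field, field_origins):
    """Return every dataset a field traces back to, including its own dataset.

    A field is a (dataset, field_name) pair; field_origins maps a field to the
    set of fields it is derived from.
    """
    dataset, _ = field
    result = {dataset}
    for parent in field_origins.get(field, ()):
        result |= source_datasets(parent, field_origins)
    return result


def affected_charts(dataset, chart_fields, field_origins):
    """Return the charts that use at least one field tracing back to dataset."""
    return sorted(
        chart
        for chart, fields in chart_fields.items()
        if any(dataset in source_datasets(f, field_origins) for f in fields)
    )


# Dataset C is generated from A and B; each of its fields comes from one of them.
field_origins = {
    ("C", "amount"): {("B", "amount")},
    ("C", "region"): {("A", "region")},
}
# Chart D only uses the field of C that comes from B.
chart_fields = {"D": [("C", "amount")]}

charts_for_a = affected_charts("A", chart_fields, field_origins)
charts_for_b = affected_charts("B", chart_fields, field_origins)
charts_for_c = affected_charts("C", chart_fields, field_origins)
```

Under these assumptions Chart D is listed for Dataset B and Dataset C but not for Dataset A, matching the example above.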

Dataset

The dataset list displayed in data lineage includes only the downstream datasets of the dataset, i.e., datasets that are generated from it directly or indirectly.

Data Integration

The data integration module displayed in the data lineage lists the integration projects where this dataset serves as an input node.

Data Model

The data model list displayed in data lineage includes only the data models in which the dataset is used as an associated table (child table). Data models in which the dataset is the model table are not listed, because changing the dataset necessarily affects its own model, so there is no need to list it here.

Permission Control

Data lineage displays the full link of a dataset and all objects it affects. It is designed to help dataset managers design datasets better and see which objects are affected when they modify a dataset. Viewers do not need these management functions, so data lineage is not visible to app viewers or tenant users.

  • Data lineage is accessible to the following users: owners and collaborators of apps in the personal space of App Creation, managers and editors of apps in the team space of App Creation, and managers and editors of apps in the Dataset Marketplace.

  • Data lineage is not visible to app viewers and tenant users.
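The visibility rules above amount to a simple lookup by area and role. The following sketch only restates those rules in code form; the identifiers such as `personal_space` and the function `can_view_lineage` are hypothetical and do not correspond to any actual platform API.

```python
# (area, role) pairs that may see data lineage, taken from the rules above.
# The string identifiers are assumptions made for this illustration.
LINEAGE_VISIBLE_TO = {
    ("personal_space", "owner"),
    ("personal_space", "collaborator"),
    ("team_space", "manager"),
    ("team_space", "editor"),
    ("dataset_marketplace", "manager"),
    ("dataset_marketplace", "editor"),
}


def can_view_lineage(area, role):
    """Return True if a user with this role in this area may see data lineage."""
    return (area, role) in LINEAGE_VISIBLE_TO


editor_allowed = can_view_lineage("team_space", "editor")
viewer_allowed = can_view_lineage("team_space", "viewer")
```

Any role not in the set, such as an app viewer or a tenant user, is denied by default.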

User Manual for Hengshi Analysis Platform