
Data Lineage

Data lineage describes the lifecycle of data: it shows how data is transformed on its way from data source to data target. When a dataset's processing path is long, data lineage presents that path visually, so you can analyze the upstream datasets the dataset depends on and the downstream datasets generated from it. When a dataset is updated, data lineage lets you confirm that its upstream datasets have already been updated, which helps guarantee data timeliness. Data lineage also displays the objects affected by the dataset, including charts, datasets, data integrations, and data models.

The data concepts involved in data lineage include data source, data target, and transformation relationships.

In the data lineage tab, the list on the left displays the datasets. When you click a dataset, the upper right panel displays its data lineage diagram, and the lower panel displays the objects affected by the dataset.

Data Lineage Diagram

The data lineage diagram displays the selected dataset, the datasets it depends on, and the datasets it helps generate. The selected dataset is highlighted.
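The two directions shown in the diagram can be thought of as traversals of a dependency graph. The following is a minimal sketch, not the platform's implementation: the dataset names, the `DEPENDS_ON` mapping, and both functions are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is built from.
DEPENDS_ON = {
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
    "orders_enriched": ["orders_clean", "customers_clean"],
    "sales_summary": ["orders_enriched"],
}


def upstream(dataset, depends_on):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    queue = deque(depends_on.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(depends_on.get(current, []))
    return seen


def downstream(dataset, depends_on):
    """Return every dataset that `dataset` helps generate, directly or indirectly."""
    # Invert the edges (parent -> children), then reuse the same traversal.
    generates = {}
    for child, parents in depends_on.items():
        for parent in parents:
            generates.setdefault(parent, []).append(child)
    return upstream(dataset, generates)
```

With this sample data, selecting `orders_clean` would give `orders_raw` as its upstream dataset and `orders_enriched` and `sales_summary` as its downstream datasets.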

When you hover over a dataset card, a jump icon appears in the upper right corner of the card. Click it to jump to the homepage of that dataset.

Each card also displays the update time of its dataset, so you can check whether the update schedules of related datasets are appropriate and whether each dataset has been updated on time. When a dataset update fails, a failure icon is displayed before the update time.
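One way to read the update times on the cards is to compare each dataset with the datasets it is built from: if an upstream dataset was updated after the dataset itself, the dataset was probably built from older data and its schedule may need adjusting. The sketch below illustrates that check under this assumption. The names, timestamps, and the `stale_dependencies` function are hypothetical and are not part of the platform.

```python
from datetime import datetime

# Hypothetical lineage: each dataset maps to its direct upstream datasets.
DEPENDS_ON = {
    "orders_clean": ["orders_raw"],
    "orders_enriched": ["orders_clean", "customers_clean"],
}

# Hypothetical last successful update time of each dataset.
LAST_UPDATED = {
    "orders_raw": datetime(2024, 5, 1, 1, 0),
    "orders_clean": datetime(2024, 5, 1, 2, 0),
    "customers_clean": datetime(2024, 5, 1, 4, 0),
    "orders_enriched": datetime(2024, 5, 1, 3, 0),
}


def stale_dependencies(dataset, depends_on, last_updated):
    """Return direct upstream datasets that were updated after `dataset` was."""
    own_time = last_updated[dataset]
    return sorted(
        parent
        for parent in depends_on.get(dataset, [])
        if last_updated[parent] > own_time
    )
```

In this sample, `orders_enriched` was updated at 03:00 but `customers_clean` was updated at 04:00, so `customers_clean` is reported as newer than the dataset built from it.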

Affected Objects

The affected objects panel lists the charts, datasets, data integrations, and data models affected by the current dataset. Clicking a module displays the content in that module that is affected by the dataset, and you can click an item to jump directly to it. For example, clicking a project listed under data integration jumps directly to that data integration project.

Charts

The chart relationships displayed in data lineage include the following cases:

  • Charts directly created using the dataset.
  • Charts created using fields of downstream datasets, where those fields are derived from fields of the dataset. For example: Dataset A and Dataset B are associated to generate Dataset C, and Chart D is then created using fields of Dataset C that are derived from Dataset B. In this case, Dataset B affects Chart D and Dataset A does not.
  • Charts created in a data model using a reused copy of the dataset. For example: if Dataset A is dragged into a data model twice, the second copy is A(2), and charts created using A(2) also appear in the affected charts of A.
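The three cases above amount to tracing each field a chart uses back to the datasets it comes from. The following sketch shows one way to model this. It is illustrative only: the field names, the `FIELD_SOURCES`, `CHART_FIELDS`, and `ALIASES` structures, and both functions are assumptions, not the platform's data model.

```python
# Hypothetical field-level lineage: a (dataset, field) pair maps to the
# (dataset, field) pairs it is derived from.
FIELD_SOURCES = {
    ("C", "region"): [("B", "region")],
    ("C", "amount"): [("A", "amount")],
}

# Hypothetical charts and the fields each one uses.
CHART_FIELDS = {
    "D": [("C", "region")],     # uses a field of C derived from B
    "E": [("A", "amount")],     # created directly on A
    "F": [("A(2)", "amount")],  # created on a reused copy of A in a data model
}

# Hypothetical reused copies in a data model: copy name -> original dataset.
ALIASES = {"A(2)": "A"}


def source_datasets(field, field_sources, aliases):
    """Return every dataset a field traces back to, including its own dataset."""
    stack = [field]
    seen = set()
    datasets = set()
    while stack:
        dataset, name = stack.pop()
        dataset = aliases.get(dataset, dataset)  # treat a reused copy as the original
        if (dataset, name) in seen:
            continue
        seen.add((dataset, name))
        datasets.add(dataset)
        stack.extend(field_sources.get((dataset, name), []))
    return datasets


def affected_charts(dataset, chart_fields, field_sources, aliases):
    """Return the charts that use at least one field traced back to `dataset`."""
    return sorted(
        chart
        for chart, fields in chart_fields.items()
        if any(dataset in source_datasets(f, field_sources, aliases) for f in fields)
    )
```

With this sample data, Dataset B affects only Chart D, while Dataset A affects Charts E and F but not Chart D, matching the examples in the list.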

Datasets

The dataset list displayed in data lineage lists only the downstream datasets of the dataset, i.e., the datasets that it is directly or indirectly involved in generating.

Data Integrations

The data integration module displayed in data lineage lists the integration projects where the dataset serves as an input node.

Data Models

The data models displayed in data lineage list only the models where the dataset serves as an associated table (child table), not those where it serves as the model table. A dataset certainly affects itself, so there is no need to list it here.

Permission Control

Data lineage displays the full chain of datasets and the affected objects, which application viewers and tenant users are not intended to see.

The purpose of data lineage is to help dataset managers design datasets better and see the affected objects when modifying datasets. Viewers do not need these management functions, so data lineage is not visible to application viewers and tenant users.

  • Data lineage is open to the following users: owners and collaborators of applications in the personal space of application creation, managers and editors of applications in the team space of application creation, and managers and editors of applications in the dataset marketplace.
  • Data lineage is not visible to application viewers and tenant users, as it is not intended for them to see the full chain of datasets and all affected objects.
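The visibility rules above can be summarized as a lookup from the area an application lives in to the roles that may see data lineage. This is a sketch of the rule only. The area keys, role names, and the `can_view_lineage` function are hypothetical labels chosen for this example, not identifiers from the platform.

```python
# Hypothetical mapping: area -> roles that can see data lineage.
LINEAGE_ROLES = {
    "personal_space": {"owner", "collaborator"},
    "team_space": {"manager", "editor"},
    "dataset_marketplace": {"manager", "editor"},
}


def can_view_lineage(area, role):
    """Return True if `role` in `area` is allowed to see data lineage."""
    # Roles not listed for an area (such as viewers or tenant users) are denied.
    return role in LINEAGE_ROLES.get(area, set())
```

For example, `can_view_lineage("team_space", "editor")` is true, while a viewer in any area, or a tenant user, is denied because no entry lists that role.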

HENGSHI SENSE Platform User Manual