Data Lineage
Data lineage describes the lifecycle of data, showing how data is transformed on its way from a data source to a data target. When a dataset's processing path is long, data lineage provides a visual representation of that path, so you can analyze the upstream datasets it depends on and the downstream datasets associated with it. When a dataset is updated, data lineage lets you check that its upstream datasets have already been updated, which helps guarantee data timeliness. Data lineage also displays the objects affected by the dataset, including charts, datasets, data integrations, and data models.
Data lineage involves three concepts: data sources, data targets, and transformation processes.
- Data Source: Any type of dataset, but the root of the lineage chain must be a local file dataset, SQL dataset, or direct connection dataset.
- Data Target: Fusion Dataset, Union Dataset, Aggregate Dataset, Pivot Dataset, Unpivot Dataset.
- Transformation Process: Fusion, Union, Aggregate, Pivot, Unpivot.
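The relationships above form a directed graph in which each transformation links its input datasets to the dataset it generates. The following minimal Python sketch, which is not the product's implementation, shows how the upstream and downstream datasets of a dataset can be found by traversing such a graph. The `LineageGraph` class and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage graph.

    Nodes are dataset names; edges point from an input dataset
    to the dataset it helps generate.
    """

    def __init__(self):
        self.children = defaultdict(set)  # dataset -> datasets generated from it
        self.parents = defaultdict(set)   # dataset -> datasets it is generated from
        self.kinds = {}                   # generated dataset -> transformation type

    def add_transformation(self, inputs, output, kind):
        # kind is a transformation process such as "Fusion" or "Union".
        self.kinds[output] = kind
        for src in inputs:
            self.children[src].add(output)
            self.parents[output].add(src)

    def _walk(self, start, edges):
        # Breadth-first traversal collecting every node reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, dataset):
        # All datasets the given dataset depends on, directly or indirectly.
        return self._walk(dataset, self.parents)

    def downstream(self, dataset):
        # All datasets the given dataset helps generate, directly or indirectly.
        return self._walk(dataset, self.children)


g = LineageGraph()
g.add_transformation(["A", "B"], "C", "Fusion")
g.add_transformation(["C"], "D", "Aggregate")
print(sorted(g.upstream("D")))    # ['A', 'B', 'C']
print(sorted(g.downstream("A")))  # ['C', 'D']
```

The same walk answers both directions by switching between the parent and child maps, and the `seen` set keeps each dataset from being visited twice when several paths lead to it.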
On the Data Lineage tab, the dataset list appears on the left. When you click a dataset, the upper-right panel displays its data lineage diagram, and the lower panel displays the objects affected by the dataset.
Data Lineage Diagram
The data lineage diagram displays the selected dataset, the datasets it depends on, and the datasets it helps generate. The selected dataset is highlighted.
When you hover over a dataset card, a jump icon appears in its upper-right corner; click it to go to that dataset's homepage.
Each card also shows the dataset's update time, so you can check whether the update schedules of related datasets are appropriate and whether each dataset has been updated on time. If a dataset update fails, a failure icon appears before its update time.
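As an illustration of the scheduling check described above, the sketch below flags a dataset when its own update failed or when one of its direct upstream datasets was updated more recently than it was, which suggests it may not reflect the latest upstream data. It assumes a hypothetical `updates` mapping from each dataset to its last update time and a failure flag, mirroring what the cards display; it is not how the product performs the check.

```python
from datetime import datetime


def find_stale_datasets(parents, updates):
    """Return datasets that failed or were last updated before an upstream dataset.

    parents: dict mapping a dataset to the set of datasets it is generated from.
    updates: hypothetical dict mapping a dataset to (last_update_time, failed),
             standing in for the update time and failure icon on each card.
    """
    stale = {}
    for child, srcs in parents.items():
        child_time, child_failed = updates[child]
        # Upstream datasets refreshed after the child was last refreshed.
        late = sorted(s for s in srcs if updates[s][0] > child_time)
        if child_failed or late:
            stale[child] = {"failed": child_failed, "newer_upstream": late}
    return stale


updates = {
    "A": (datetime(2024, 5, 1, 2, 0), False),
    "B": (datetime(2024, 5, 1, 4, 0), False),
    "C": (datetime(2024, 5, 1, 3, 0), False),
}
parents = {"C": {"A", "B"}}
print(find_stale_datasets(parents, updates))
# {'C': {'failed': False, 'newer_upstream': ['B']}}
```

Here Dataset C was refreshed at 03:00 but Dataset B only at 04:00, so C is flagged; in practice this would point to moving C's schedule after B's.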
Affected Objects
The Affected Objects panel lists the charts, datasets, data integrations, and data models affected by the current dataset. Click a module to view the affected items in it, then click an item to jump directly to it. For example, clicking a project listed under Data Integrations takes you directly to that data integration project.
Charts
Data lineage lists a chart as affected in the following cases:
- The chart was created directly from the dataset.
- The chart was created from a downstream dataset using fields that originate from the dataset. For example, if Dataset A and Dataset B are associated to generate Dataset C, and Chart D is created using fields of Dataset C that derive from Dataset B, then Dataset B affects Chart D, but Dataset A does not.
- The chart was created from a reused copy of the dataset in a data model. For example, if Dataset A is dragged into a data model twice, the second copy is named A(2), and charts created using A(2) also appear among the charts affected by A.
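The chart rules above amount to field-level lineage: a dataset affects a chart only if a field the chart uses traces back to that dataset. The sketch below assumes a hypothetical `field_sources` mapping from each derived field to the fields it comes from, and a hypothetical `model_copies` mapping that resolves reused copies such as A(2) to the original dataset. All names are illustrative, not the product's data model.

```python
def origins(dataset, field, field_sources):
    """Return every (dataset, field) pair that the given field derives from,
    including the field itself. Fields with no entry in field_sources are
    treated as original to their dataset."""
    result = {(dataset, field)}
    for src_ds, src_field in field_sources.get((dataset, field), []):
        result |= origins(src_ds, src_field, field_sources)
    return result


def affected_charts(dataset, charts, field_sources, model_copies):
    """charts: chart name -> list of (dataset, field) pairs the chart uses.
    model_copies: copy name in a data model (e.g. "A(2)") -> original dataset."""
    hits = set()
    for chart, used in charts.items():
        for ds, field in used:
            ds = model_copies.get(ds, ds)  # treat A(2) as a reuse of A
            if any(src == dataset for src, _ in origins(ds, field, field_sources)):
                hits.add(chart)
                break
    return hits


# Dataset C is generated from A and B: C.x comes from A.x, C.y from B.y.
field_sources = {("C", "x"): [("A", "x")], ("C", "y"): [("B", "y")]}
# Chart D uses only C.y; Chart E uses field x of the model copy A(2).
charts = {"D": [("C", "y")], "E": [("A(2)", "x")]}
model_copies = {"A(2)": "A"}

print(affected_charts("B", charts, field_sources, model_copies))  # {'D'}
print(affected_charts("A", charts, field_sources, model_copies))  # {'E'}
```

Chart D appears for B but not for A because it uses only the field derived from B, matching the second case; Chart E appears for A through its reused copy, matching the third.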
Datasets
The dataset list shows only the downstream datasets of the current dataset, that is, the datasets it directly or indirectly helps generate.
Data Integrations
The Data Integrations module lists the integration projects in which the dataset serves as an input node.
Data Models
The Data Models module lists only the data models in which the dataset serves as an associated table (child table). Models in which it serves as the model table are not listed, because such models are necessarily affected by the dataset, so there is no need to list them here.
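The data integration and data model rules are simple membership filters. A minimal sketch, assuming hypothetical `DataModel` and `IntegrationProject` records that list a model's tables and a project's input nodes; these are illustrative types, not the product's API.

```python
from dataclasses import dataclass, field


@dataclass
class DataModel:
    name: str
    model_table: str
    child_tables: list = field(default_factory=list)  # associated tables


@dataclass
class IntegrationProject:
    name: str
    input_nodes: list = field(default_factory=list)


def affected_models(dataset, models):
    # Only models that use the dataset as an associated (child) table are listed;
    # a model in which the dataset is only the model table does not match.
    return [m.name for m in models if dataset in m.child_tables]


def affected_integrations(dataset, projects):
    # Integration projects in which the dataset is an input node.
    return [p.name for p in projects if dataset in p.input_nodes]


models = [
    DataModel("Sales", "Orders", ["Customers"]),
    DataModel("Customer360", "Customers", ["Regions"]),
]
projects = [IntegrationProject("Sync", ["Customers", "Regions"])]
print(affected_models("Customers", models))          # ['Sales']
print(affected_integrations("Regions", projects))    # ['Sync']
```

"Customer360" is not returned for "Customers" because there it is the model table, not a child table.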
Permission Control
Data lineage displays the full chain of datasets and all affected objects. Its purpose is to help dataset managers design datasets more effectively and see which objects are affected when they modify a dataset. Application viewers and tenant users do not need these management functions, so data lineage is hidden from them.
- Data lineage is visible to the following users: owners and collaborators of applications in the personal space of Application Creation; managers and editors of applications in the team space of Application Creation; and managers and editors of applications in the Dataset Marketplace.
- Data lineage is not visible to application viewers or tenant users, because they are not intended to see the full chain of datasets and all affected objects.
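The visibility rules above can be expressed as a role lookup per location. This sketch is illustrative only: the `Space` enum, the role strings, and `can_view_lineage` are hypothetical and do not reflect how the product stores or checks permissions.

```python
from enum import Enum


class Space(Enum):
    PERSONAL = "personal space of Application Creation"
    TEAM = "team space of Application Creation"
    MARKETPLACE = "Dataset Marketplace"


# Roles allowed to see data lineage in each location, per the rules above.
LINEAGE_ROLES = {
    Space.PERSONAL: {"owner", "collaborator"},
    Space.TEAM: {"manager", "editor"},
    Space.MARKETPLACE: {"manager", "editor"},
}


def can_view_lineage(space, role, is_tenant_user=False):
    # Tenant users never see data lineage.
    if is_tenant_user:
        return False
    # Viewers are excluded because "viewer" is in none of the allowed sets.
    return role in LINEAGE_ROLES.get(space, set())


print(can_view_lineage(Space.TEAM, "editor"))   # True
print(can_view_lineage(Space.TEAM, "viewer"))   # False
```

Keeping the rules in a single table makes the allowed roles for each location easy to review against the list above.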