Skip to content

Relationship Description

This article explains the description of relationships based on the "coffee shop business scenario." The following two tables are the datasets involved.

Dataset 1: Coffee Bean Inventory Table (coffee_beans)

bean_idbean_namestock_kg
1Colombia50
2Ethiopia30
3Brazil20

Dataset 2: Coffee Sales Table (sales)

sale_idbean_idsale_kg
101110
10225
10348

Association Types

Association types refer to the types of joins used when associating datasets, including left join (LEFT JOIN), inner join (INNER JOIN), right join (RIGHT JOIN), and full join (FULL JOIN).

Left Join (LEFT JOIN)

Retains all records from the left dataset, displaying NULL for unmatched records from the right side. The result of the left join between the two datasets above is: all coffee bean inventory + matching sales records (if inventory is unsold, it shows NULL). Left join result:

bean_idbean_namestock_kgsale_idbean_idsale_kg
1Colombia50101110
2Ethiopia3010225
3Brazil20NULLNULLNULL

Inner Join (INNER JOIN)

Retains records that exist in both datasets. The result of the inner join between the two datasets above is: only coffee beans that are both in stock and have been sold are retained. Inner join result:

bean_idbean_namestock_kgsale_idbean_idsale_kg
1Colombia50101110
2Ethiopia3010225

RIGHT JOIN

Retains all records from the right dataset, displaying NULL for unmatched records from the left. The result of the right join between the two datasets above is: all sales records + matching inventory (if a sale was not stocked, it shows NULL).

Right join result:

bean_idbean_namestock_kgsale_idbean_idsale_kg
1Colombia50101110
2Ethiopia3010225
NULLNULLNULL10348

Full Join (FULL JOIN)

Retains all results from both datasets, with unmatched parts filled with NULL. The full join result of the two datasets above is: all inventory and sales records (unmatched parts are filled with NULL). Full join result:

bean_idbean_namestock_kgsale_idbean_idsale_kg
1Colombia50101110
2Ethiopia3010225
3Brazil20NULLNULLNULL
NULLNULLNULL10348

Join Conditions

Join conditions are expressions used in join operations to specify the relationships between datasets. Typically, join conditions are based on columns within the datasets, and by comparing the values in these columns, it is determined which rows should be joined together. For example, in the aforementioned "Coffee Shop Business Scenario," there are two datasets: the "Coffee Bean Inventory Table" contains a bean_id column, and the "Coffee Sales Table" also contains a bean_id column. By using this common bean_id column, a relationship can be established between the two datasets. Correctly understanding and using join conditions is key to efficient model queries.

Single-Column Association

The association condition between two datasets can be described as the equality of a single column. In the example of the "Coffee Shop Business Scenario," the association condition can be written as: {{Coffee Bean Inventory}}.{bean_id} = {{Coffee Sales}}.{bean_id}. In the HENGSHI SENSE data model interface, users can configure this association condition by selecting columns on the Simple Condition page, or by writing expressions on the Expression Condition page.

Multi-Column Join

The join condition between two datasets may require equality on multiple columns. For example, consider the following two datasets: Coffee Orders Table and Coffee Inventory Table. When we want to query the inventory status of the coffee involved in each order, we need to join based on both the coffee type (coffee_type) and roast level (roast_level) columns. The join condition should be written as: {{coffee_orders}}.{coffee_type} = {{coffee_inventory}}.{coffee_type} AND {{coffee_orders}}.{roast_level} = {{coffee_inventory}}.{roast_level}. In the HENGSHI SENSE data model interface, users can configure this join condition by selecting columns on the Simple Condition page, or by writing an expression on the Expression Condition page.

Dataset 3: Coffee Orders Table (coffee_orders)

coffee_typeroast_levelquantity
ColombiaMedium Roast5
EthiopiaLight Roast3
BrazilDark Roast7

Dataset 4: Coffee Inventory Table (coffee_inventory)

coffee_typeroast_levelavailable_quantity
ColombiaMedium Roast32.5
EthiopiaLight Roast18.2
BrazilDark Roast12.8
GuatemalaMedium Roast9.5

According to the above join condition, select the left join type: the Coffee Orders Table is left joined with the Coffee Inventory Table, and the resulting data is as follows:

coffee_typeroast_levelquantityavailable_quantity
ColombiaMedium Roast532.5
EthiopiaLight Roast318.2
BrazilDark Roast712.8

Non-Equi Join

In addition to using "equals" for equi joins, you can also use other comparison operators or nested functions to perform non-equi joins. For example, to find coffee products that fall within the price range acceptable to customers, a non-equi join is required. The join condition is: {{coffee_products}}.{price} >= {{customer_preferences}}.{min_price} AND {{coffee_products}}.{price} <= {{customer_preferences}}.{max_price}. In the HENGSHI SENSE data model interface, users can only configure this join condition by writing an expression on the Expression Condition page.

Dataset 5: Coffee Products Table (coffee_products)

product_idproduct_nameprice
CP001Colombia Special58
CP002Ethiopia Gesha85
CP003Brazil Yellow Bourbon42

Dataset 6: Customer Preferences Table (customer_preferences)

customer_idmin_pricemax_price
CU10013080
CU100250120
CU10032060

According to the above join condition, select the inner join type. When the Coffee Products Table is inner joined with the Customer Preferences Table, the resulting data is as follows:

product_idproduct_namepricecustomer_idmin_pricemax_price
CP001Colombia Special58CU10013080
CP001Colombia Special58CU100250120
CP001Colombia Special58CU10032060
CP002Ethiopia Gesha85CU100250120
CP003Brazil Yellow Bourbon42CU10013080
CP003Brazil Yellow Bourbon42CU10032060

Association Cardinality

The association cardinality of a relationship reflects the data characteristics of the related columns on the "from" and "to" sides. The "one" side indicates that each row in the column contains a unique value, while the "many" side means that multiple rows in the column can contain duplicate values. The association condition is the key basis for determining cardinality. There are four types of association cardinality: one-to-one, one-to-many, many-to-one, and many-to-many. If the relationship from Dataset A to Dataset B is one-to-many, it means that introducing Dataset B will cause data expansion in Dataset A.

Determining the Cardinality of Relationships

Cardinality Determination under Simple Join Conditions

When the join condition is a simple condition, we can define the relationship cardinality by checking whether the values in the relevant columns are unique. The following example uses the Department Member table and the User Information datasets for illustration. The join condition between these two datasets is {{Department Member}}.{user_id} = {{User Information}}.{id}. In the Department Member dataset, there are multiple rows containing the same "user_id", which indicates that the Department Member dataset should be defined as the "many" side of the relationship. In the User Information dataset, each row's "id" is unique, so the User Information table should be defined as the "one" side. The cardinality between Department Member and User Information is determined to be "many-to-one".

Dataset 7: Department Member (depart_member)

l1l2user_id
SalesSales Dept 11
SalesSales Dept 22
SalesSales Dept 33
SalesMarketing1
SalesKey Account2
MarketingMarketing1

Dataset 8: User Information (user)

idname
1jack
2tom
3amy

Cardinality Determination Under Complex Join Conditions

When the join condition is complex, we can assist in determining cardinality by adding a virtual column. Take the Sales Orders table and the Month Info datasets as an example. The join condition between these two datasets is trunc_month({{Sales Orders}}.{day}) = {{Month Info}}.{month}, where the trunc_month function is used to get the first day of the month for a given variable. We add a virtual column trunc_month({{Sales Orders}}.{day}) to the Sales Orders table. If multiple rows have the same value in this column, it means the "Sales Orders" table should be defined as the "many" side of the cardinality. In contrast, each row in the "Month Info" table has a unique "month" value, so the "Month Info" table should be defined as the "one" side. Therefore, the cardinality between Sales Orders and Month Info is determined as many-to-one.

Dataset 9: Sales Orders (orders)

dayorder_numtrunc_month({{Sales Orders}}.{day})
2023-01-012002023-01-01
2023-01-021502023-01-01
2023-01-043002023-01-01
2023-02-011002023-02-01
2023-02-05802023-02-01
2023-02-06902023-02-01
2023-01-012002023-01-01

Dataset 10: Month Info (date)

month
2023-01-01
2023-02-01
2023-03-01

Impact of Join Cardinality on Model Calculations

The "many" side in a join cardinality will cause data expansion on the other side, so setting different cardinality types in the model determines the execution logic of the join calculation. The core rules are as follows:

  1. The model table must always participate in calculations under any circumstances. All datasets along the path from the model table to the dataset used by the measure will also participate in the calculation. Whether the nodes after the dataset used by the measure participate in the calculation is determined by the join cardinality.
  2. To ensure the correctness of each measure, if introducing Measure B while calculating Measure A causes data expansion in the dataset where Measure A resides, then Measure A and Measure B should be calculated separately.

In the following explanation, assume the left table is the model table and the right table is the subordinate table. Different join cardinalities will result in different execution logics:

  1. One-to-One: This cardinality type represents an equal relationship and will not cause data expansion during the data join process. All calculations based on the left and right tables are founded on the join result of the left and right tables.
  2. One-to-Many: This cardinality type means that joining with the right table will cause data expansion in the left table. The calculation basis for measures related only to the left table is the left table itself, while the calculation basis for other measures is the join result of the left and right tables.
  3. Many-to-One: This cardinality type means that joining with the left table will cause data expansion in the right table. However, since the model table always participates in the calculation, all calculations based on the left and right tables are founded on the join result of the left and right tables.
  4. Many-to-Many: Both the left and right tables will experience data expansion due to the join, but the model table must always participate in the calculation. The calculation basis for measures related only to the left table is the left table itself, while the calculation basis for other measures is the join result of the left and right tables.

Activation Mechanism

The model contains a model table and multiple sub-tables. In scenario analysis based on this model, the model table will always be included in the calculation. However, whether the sub-tables are included in the calculation depends on the activation mechanism of the sub-tables. The activation mechanism determines how datasets collaborate with each other during the calculation process. There are two types of activation mechanisms: associate on demand and always associate.

On-Demand Association

On-demand association is the default activation mechanism for relationships. Its core feature is that the right table only participates in the calculation process when its fields are used in the analysis. This mechanism embodies the concept of "on-demand invocation," allowing the right table data to be flexibly introduced based on actual calculation needs, thereby avoiding unnecessary resource consumption.

Always Associate

The "Always Associate" mechanism can only be set when the cardinality of the relationship is "one-to-one" or "many-to-one." The rule is that as long as the left dataset is used in the calculation process, the right dataset will automatically participate in the calculation. This mechanism ensures that the data on the right always works in conjunction with the data on the left. This feature is typically used in scenarios where datasets are used to restrict data permissions.

Filter Conditions

Filter conditions after model association refer to the filtering rules set after completing the association operations between data models. When data models are associated, different structured datasets are combined based on their relationships, which may result in a large amount of data. Some of this data may not be necessary for the current specific analysis scenario. At this point, filter conditions are needed to screen the associated data.

Enabling the Ring Model

There are two options for enabling the ring model feature, which determine whether the current child table can have multiple upstream datasets. Independent Association means the current child table can only have one upstream dataset, while Shared Association means the current child table can have multiple upstream datasets. Enabling Shared Association is typically used in scenarios with multi-dimensional tables and multiple fact tables.

In the scenario shown below, the user dataset serves as user information and is associated with the app, dataset, and dashboard datasets. This allows for simultaneous analysis of the app, dataset, and dashboard datasets from the user dimension.

Limitations of the ring model:

  1. The association cardinality between other tables and the child table with "Shared Association" can be many-to-one (many:1) or one-to-one (1:1).
  2. The relationship path starts from the model table and ends at the child table with "Shared Association"; no other child tables can follow after the child table with "Shared Association".
  3. The child table with "Shared Association" does not support renaming.
  4. In a single model, the same dataset can only exist as a "Shared Association" child table once.

User Manual for Hengshi Analysis Platform