Big Data - Data Warehouse - Upsert Table In Data Warehouse(Part 1)

Hi guys ! It’s me again :laughing: . As scheduled, today we will go together in a series on data warehosue. In the previous section, we learned about the OLAP. In this section, I want to share about my process as well as my challenges and solutions in the upsert table in Data Warehouse
Note :Those who have not read the previous section can review follow this link:
Big Data - Data Warehouse - OLAP

Yooo! :love_you_gesture: :love_you_gesture: :love_you_gesture:

As we have learned in the previous sections, the database in OLTP is highly standardized (very high), while the database in DW is designed to reduce (a lot) normalization, at the same time. OLAP cubes are pre-computed blocks with data in the DW.
Sample database for us to test: AdventureWorks sample databases

I recommend that you use two databases for the most complete hands-on experience

One is for the original database and the other for the database in DW.

After downloading and restoring these databases, the 1st thing we need to do is explore the structure of the tables

We can see that tables like these in OLTP are divided into 6 main groups

  • [HumanResources]
  • [Person]
  • [Production]
  • [Sales]
  • [Purchasing]
  • Other

More than 70 tables in the database :slightly_smiling_face: :slightly_smiling_face: :slightly_smiling_face:

Keep calm and continue exploring the database tables in the data warehouse !

In DW database we have Dim tables and Fact tables and maybe a few other tables

Let’s see the details of the fields of the fact table that we need to upsert
[dbo].[FactInternetSales]

Note: Keep it mind that the more keys a fact table has, the longer it may take to upsert the table.

As we can see, the key fields in the Fact table are divided into groups

  • [Sales]
  • [Customer]
  • [Product]
  • [Date]

Here we have a rough look at the source and target database :smiley:
We will go to the next very important part, probably the most important part after code only :slight_smile:

The 2nd thing : Find out the mapping between fields in [dbo].[FactInternetSales] with tables in OLTP.

This job is time consuming and requires you to understand the table structure of both OLTP and DW :love_you_gesture:

For ease, you should do a mapping like this

This advice I learned from @anon19898721 :innocent: :innocent: :innocent:

Obviously we need to prioritize the fill of the key fields first, the following fields we can completely update based on formula or based on key fields.

Note: When upserting, we need to pay attention to the time, so in DW we have a table [dbo]. [DimDate]

At this point, I am very optimistic to get started on coding :))).

To be able to fill in DW we need to create temporary tables. These temporary tables have the same structure as the one in DW that we want upsert, specifically here [dbo].[FactInternetSales].

These temporary tables make a lot of sense when upserting data into DW,

  • Insert data into the temporary table quickly
  • Use temporary tables to clean, transform data

The table in OLTP is the table that will match our table

and the table image

We need to join these 2 tables together

Prioritize to fill the Date key field first.

We have 6 Date fields in [dbo].[FactInternetSales]

  • [OrderDateKey]
  • [DueDateKey]
  • [ShipDateKey]
  • [OrderDate]
  • [DueDate]
  • [ShipDate]

Let’s take a look at the values of these fields

image
We can see that the [OrderDateKey], [DueDateKey] and [ShipDateKey] are created from fields [OrderDateKey], [DueDateKey] ,[ShipDateKey] .

Now let’s look to the OLTP side to see what we have
image

OK, everything looks pretty good :slight_smile:

Remember to upsert the data, we only choose the appropriate time period at OLTP. This time period must be greater than the time available in the DW that was most recently updated.

This step is not too difficult :v: :v: :v:

Lets’s continue!
We will fill in the [ProductKey] field.
[ProductKey] field in [dbo].[FactInternetSales] (DW) image
and
[ProductID] in [Sales].[SalesOrderDetail] (OLTP) image

It looks pretty easy, let’s insert :))
Oops . We have two problems

Issue: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression

This problem is encountered when inserting into the staging table. Hmm…:thinking: :thinking: :thinking:
I made that fundamental mistake : HAVE NOT checked the [Status] in [dbo].[DimProduct] table. Like this case

image

Solve : We select [ProductKey] where [Status] is ‘Current’.

and

Issue: Cannot insert the value NULL into column [ProductKey]

This problem I have when upsert [dbo].[FactInternetSales] from the staging table, I didn’t know why :no_mouth: :no_mouth: :no_mouth:

A long a long time after :)), I realize that the [dbo].[DimProduct] table does not yet have those products, and therefore we need to update the [dbo].[DimProduct] table

Solve : Update the [dbo].[DimProduct] table.

So we have 4 key fields :v:

Since there are many problems that I have encountered, in the next section I will share the difficulties as well as how to solve them with the remaining keys.

So that’s all for this section ! :smiley:
See you soon in the next section Big Data - Data Warehouse - Upsert Table In Data Warehouse(Part 2) ! :love_you_gesture: :love_you_gesture: :love_you_gesture: :love_you_gesture:

4 Likes

@anon60116140 @Mike : intern mới xịn quá anh chị nhỉ :slight_smile: :rofl: :rofl: :rofl:

1 Like