Big Data - Data Warehouse - Upsert Table In Data Warehouse(Part 2)

Hi guys ! It’s me again :laughing: . As scheduled, today we will go together in a series on data warehosue. In the previous section, I shared about my process as well as my challenges and solutions in the upsert table in Data Warehouse. In this second part I will continue to share about the difficulties that I have not mentioned in the previous section.

Note :Those who have not read the previous section can review follow this link:
Big Data - Data Warehouse - Upsert Table In Data Warehouse(Part 1)

Yooo! :love_you_gesture: :love_you_gesture: :love_you_gesture:

In the previous section I explored the table structure of both OLTP and DW databases and filled out 4 fields.The fields that have been filled in are:

  • [ProductKey]
  • [OrderDateKey]
  • [DueDateKey]
  • [ShipDateKey]

Now i will fill the remaining fields that is

  • [CustomerKey]
  • [SalesOrderNumber]
  • [SalesOrderLineNumber]
  • [PromotionKey]
  • [CurrencyKey]
  • [SalesTerritoryKey]

These fields are also quite difficult to fill, especially the [CustomerKey] field.

Let’s start with [SalesOrderNumber] field.

image

We can see that this field is made up of 2 substring, one is ‘SO’, the other is converted from number. These numbers are automatically incremented starting at 100000.
Let’s see what we have on the OLTP side!

image

If only at a glance, you will think: "Oh, how easy it is, just insert based on the Date fields we have filled is already.

I also thought the same way =)).

Oops ! When inserting date data into DW, I encountered a very basic error

ERROR: duplicate key value violates unique constraint

Hmm…Why? I started checking again. And here is it

I start to select the Date fields included with the [SalesOrderNumber] field.
In [dbo].[FactInternetSales]

And in [Sales].[SalesOrderHeader] (OLTP)

And here I realized that when I was in DW, in [OrderDateKey] field with value ‘20140501’, the value in [SalesOrderNumber] filed was ‘SO100000’, while in OLTP with value ‘20110531’ the value of the[SalesOrderNumber] field was ‘SO4359’.

From here I can come up with a solution

Solve: I just incremented [SalesOrderNumber] field in [dbo].[FactInternetSales] every time a record was inserted

Note: Remember, when inserting, check carefully the field that we choose as a watermark. For simplicity use the familiar clause:

             SELECT  [watermark_column], [want_to_fill_column]
             FROM current_table 
             ORDER BY [want_to_fill_column] DESC

OK, Now let’s get to the field that takes me the most time and also the most headache, [CustomerKey] field. :innocent:

In [dbo].[FactInternetSales] (DW) image

and in [Sales].[SalesOrderHeader] image

This time with more experience, I continued to explore the tables [dbo].[DimCustomer] in DW and table [Sales].[Customer] in OLTP.

  • Table [dbo].[DimCustomer]

image

And

  • Table [Sales].[Customer]

image

The [CustomerKey] in DW and [CustomerID] OLTP do not match.

The first thing I think about is: why are there two fields [PersonID] and [StoreID] in table [Sales].[Customer]? What are the two fields in that table for ?

I go through all the rows with three fields [CustomerID], [PersonID] and [StoreID] and notice an interesting thing

image

  • There are some personID rows that have a value of null, the storeID has a value other than null
  • The majority of the PersionID rows have a value other than null and then storeID has a null value

So far, I have not guessed anything, I see that 2 fields are [PersonID] and [StoreID] are the foreign keys of this table, so let’s see which of these two fields are the primary keys of which table.

[PersonID] in the [Sales] table. [Customer] is the [BusinessEntityID] in the table [Person].[BusinessEntityAddress]

   and [Person]. [Person]



So we will map [CustomerID] field in table [Sales].[SalesOrderHeader] with table [dbo].[FactInternetSales] in DW based on [EmailAddress], [FirstName] , [MiddleName] and [LastName]

A long time later, I realized, maybe the system has updated so some new has a [StoreID] field to mark that these two [PersonID] value are different but the same person and are stored by the same a [StoreID] !
Like this example

image

Ok, it’s easier now. We just need to build a temporary table to map all [CustomerID] field values with corresponding field [PersonID] values with no null values.

Now I can insert the [CustomerKey] field.

The other key fields I have inserted do not take much effort, so I will not cover them here.

From the upsert table process into DW, I draw the following things

  • The values of fields in OLTP and DW may not match. Be careful with that!
  • Always append the watermark to other key fields when using the SELECT statement
  • Checks [State] fields such as state to uniquely identify a selected value
  • It may be necessary to insert Dim tables during upserting into DW.

OK, So that’s all for this section ! :smiley:

See you soon ! :love_you_gesture: :love_you_gesture: :love_you_gesture: :love_you_gesture:

2 Likes