Hi guys ! It’s me again . As scheduled, today we will go together in a series on data warehosue. In the previous section, I shared about my process as well as my challenges and solutions in the upsert table in Data Warehouse. In this second part I will continue to share about the difficulties that I have not mentioned in the previous section.
Note :Those who have not read the previous section can review follow this link:
Big Data - Data Warehouse - Upsert Table In Data Warehouse(Part 1)
In the previous section I explored the table structure of both OLTP and DW databases and filled out 4 fields.The fields that have been filled in are:
Now i will fill the remaining fields that is
These fields are also quite difficult to fill, especially the [CustomerKey] field.
Let’s start with [SalesOrderNumber] field.
We can see that this field is made up of 2 substring, one is ‘SO’, the other is converted from number. These numbers are automatically incremented starting at 100000.
Let’s see what we have on the OLTP side!
If only at a glance, you will think: "Oh, how easy it is, just insert based on the Date fields we have filled is already.
I also thought the same way =)).
Oops ! When inserting date data into DW, I encountered a very basic error
ERROR: duplicate key value violates unique constraint
Hmm…Why? I started checking again. And here is it
I start to select the Date fields included with the [SalesOrderNumber] field.
And in [Sales].[SalesOrderHeader] (OLTP)
And here I realized that when I was in DW, in [OrderDateKey] field with value ‘20140501’, the value in [SalesOrderNumber] filed was ‘SO100000’, while in OLTP with value ‘20110531’ the value of the[SalesOrderNumber] field was ‘SO4359’.
From here I can come up with a solution
Solve: I just incremented [SalesOrderNumber] field in [dbo].[FactInternetSales] every time a record was inserted
Note: Remember, when inserting, check carefully the field that we choose as a watermark. For simplicity use the familiar clause:
SELECT [watermark_column], [want_to_fill_column] FROM current_table ORDER BY [want_to_fill_column] DESC
OK, Now let’s get to the field that takes me the most time and also the most headache, [CustomerKey] field.
In [dbo].[FactInternetSales] (DW)
and in [Sales].[SalesOrderHeader]
This time with more experience, I continued to explore the tables [dbo].[DimCustomer] in DW and table [Sales].[Customer] in OLTP.
- Table [dbo].[DimCustomer]
- Table [Sales].[Customer]
The [CustomerKey] in DW and [CustomerID] OLTP do not match.
The first thing I think about is: why are there two fields [PersonID] and [StoreID] in table [Sales].[Customer]? What are the two fields in that table for ?
I go through all the rows with three fields [CustomerID], [PersonID] and [StoreID] and notice an interesting thing
- There are some personID rows that have a value of null, the storeID has a value other than null
- The majority of the PersionID rows have a value other than null and then storeID has a null value
So far, I have not guessed anything, I see that 2 fields are [PersonID] and [StoreID] are the foreign keys of this table, so let’s see which of these two fields are the primary keys of which table.
[PersonID] in the [Sales] table. [Customer] is the [BusinessEntityID] in the table [Person].[BusinessEntityAddress]
and [Person]. [Person]
So we will map [CustomerID] field in table [Sales].[SalesOrderHeader] with table [dbo].[FactInternetSales] in DW based on [EmailAddress], [FirstName] , [MiddleName] and [LastName]
A long time later, I realized, maybe the system has updated so some new has a [StoreID] field to mark that these two [PersonID] value are different but the same person and are stored by the same a [StoreID] !
Like this example
Ok, it’s easier now. We just need to build a temporary table to map all [CustomerID] field values with corresponding field [PersonID] values with no null values.
Now I can insert the [CustomerKey] field.
The other key fields I have inserted do not take much effort, so I will not cover them here.
From the upsert table process into DW, I draw the following things
- The values of fields in OLTP and DW may not match. Be careful with that!
- Always append the watermark to other key fields when using the SELECT statement
- Checks [State] fields such as state to uniquely identify a selected value
- It may be necessary to insert Dim tables during upserting into DW.
OK, So that’s all for this section !
See you soon !