Big Data - Data Warehouse - Architecture

Hi guys! It’s me again :smiley:. As planned, today we continue our series on data warehouses. In the previous part, we looked at the basics of data warehouses. In this part, we will learn about the basic architecture of a data warehouse.

Note: If you haven’t read the previous part yet, you can find it at this link:
Big Data - Data Warehouse - Things to know

Yooo :v:

Basically, data warehouse architectures can be divided into three types:

  • Data Warehouse Architecture: Basic
  • Data Warehouse Architecture: With Staging Area
  • Data Warehouse Architecture: With Staging Area and Data Marts

In this post, we will look at the data warehouse architecture with a staging area and data marts, also known as the three-tier data warehouse architecture.

A picture is worth more than a thousand words


Looking at the picture above, you can see that the three-tier architecture of the data warehouse includes:

  1. Bottom Tier (Data Warehouse Server)
  2. Middle Tier (OLAP Server)
  3. Top Tier (Front-end Tools)

We will go through each part in detail. Let’s focus!

  1. Bottom-tier

    The bottom tier is the data warehouse (DW) server, which is usually a relational database management system (RDBMS). It may also contain several specialized data marts and a metadata repo.
    Data is extracted from the data sources through gateways. A gateway is an application program interface (API) that lets client programs generate SQL code to be executed at the server. Popular gateways include JDBC, ODBC …
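To make the gateway idea concrete, here is a minimal Python sketch. It assumes Python’s DB-API (the built-in `sqlite3` module) as a stand-in for an ODBC/JDBC gateway and an in-memory SQLite database as the DW server. The `sales` table, the `north_sales_mart` view playing the role of a data mart, and the `total_by_product` helper are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DW server (an RDBMS), and the
# DB-API connection plays the role of the gateway (an ODBC/JDBC analogue).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical warehouse table holding sales facts from several regions.
cur.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "laptop", 2022, 1200.0),
        ("north", "phone", 2022, 800.0),
        ("south", "laptop", 2022, 1500.0),
        ("south", "phone", 2023, 700.0),
    ],
)

# Hypothetical data mart: a subject-specific subset of the warehouse,
# modeled here as a view restricted to the 'north' region.
cur.execute("CREATE VIEW north_sales_mart AS SELECT * FROM sales WHERE region = 'north'")


def total_by_product(connection, source):
    """Send SQL through the 'gateway'; the query itself runs at the server.

    'source' must be a trusted table/view name (it is interpolated, not bound).
    """
    query = f"SELECT product, SUM(amount) FROM {source} GROUP BY product ORDER BY product"
    return connection.execute(query).fetchall()


print(total_by_product(conn, "sales"))             # whole warehouse
print(total_by_product(conn, "north_sales_mart"))  # data mart only
```

The client only builds and sends SQL text; the grouping and summing happen inside the database, which is the division of labor the gateway description above refers to.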

Note: When the amount of data is very large, JDBC tends to be inefficient

  2. Middle-tier
    The middle tier is an OLAP server, which needs fast query capabilities. It is typically implemented using either the ROLAP or the MOLAP model.
  • First, ROLAP (relational OLAP) is an extended relational DBMS that maps operations on multidimensional data to standard relational operations.

    At this point, you may ask: why is this mapping necessary? Because OLAP works with multidimensional cubes, while a relational database stores data in tables, so cube operations have to be translated into relational ones.

  • Second, MOLAP (multidimensional OLAP) is a special-purpose server that directly implements multidimensional data and operations.
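A minimal sketch of the ROLAP idea, assuming a tiny hypothetical star schema in SQLite (one fact table, one time dimension). The helper `rollup_sql` is hypothetical, not taken from any OLAP product: it translates a cube request (which dimension attributes to keep, plus an optional value to slice on) into an ordinary `SELECT ... GROUP BY` query.

```python
import sqlite3

# Assumption: a tiny star schema with one fact table and one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales (time_id INTEGER, region TEXT, amount REAL);
INSERT INTO dim_time VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2'), (3, 2024, 'Q1');
INSERT INTO fact_sales VALUES (1, 'north', 100), (2, 'north', 150),
                              (3, 'north', 200), (1, 'south', 50);
""")


def rollup_sql(group_by, slice_filter=None):
    """Translate a cube request into a relational SELECT ... GROUP BY.

    group_by: dimension attributes to keep (fewer attributes = roll-up).
    slice_filter: optional (attribute, value) pair fixing one dimension (slice).
    Attribute names are assumed trusted; only the slice value is bound as a parameter.
    """
    cols = ", ".join(group_by)
    sql = (f"SELECT {cols}, SUM(f.amount) FROM fact_sales f "
           "JOIN dim_time t ON f.time_id = t.time_id")
    params = []
    if slice_filter:
        sql += f" WHERE {slice_filter[0]} = ?"
        params.append(slice_filter[1])
    sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql, params


# Roll up from (year, quarter, region) to (year, region).
sql, params = rollup_sql(["t.year", "f.region"])
print(conn.execute(sql, params).fetchall())

# Slice on region = 'north', then aggregate by year.
sql, params = rollup_sql(["t.year"], ("f.region", "north"))
print(conn.execute(sql, params).fetchall())
```

Each cube operation becomes a join plus a `GROUP BY`, so the relational engine does the work at query time.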

Note: This does not mean MOLAP is always the better choice. Whether to choose ROLAP or MOLAP depends on the situation, because each approach has its own advantages and disadvantages.
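For contrast with ROLAP, here is a toy MOLAP-style sketch in plain Python. It assumes a dense two-dimensional array (year by region) whose cells hold pre-aggregated values. The dimension members and helper names are hypothetical, and real MOLAP servers use specialized storage engines rather than nested lists.

```python
# Assumption: a toy MOLAP-style cube stored as a dense array indexed by
# dimension positions, instead of relational rows.
years = [2023, 2024]
regions = ["north", "south"]
year_idx = {y: i for i, y in enumerate(years)}
region_idx = {r: i for i, r in enumerate(regions)}

# cube[y][r] holds the aggregated sales amount for one (year, region) cell.
cube = [[0.0 for _ in regions] for _ in years]

# Hypothetical facts, aggregated into the array when the cube is loaded.
facts = [(2023, "north", 250.0), (2023, "south", 50.0), (2024, "north", 200.0)]
for year, region, amount in facts:
    cube[year_idx[year]][region_idx[region]] += amount


def cell(year, region):
    # Direct array lookup: no join or GROUP BY at query time.
    return cube[year_idx[year]][region_idx[region]]


def rollup_over_regions(year):
    # Roll-up to the year level = summing along the region axis.
    return sum(cube[year_idx[year]])


print(cell(2024, "north"))
print(rollup_over_regions(2023))

# Every combination of members gets a slot, even when it has no data.
empty_cells = sum(1 for row in cube for value in row if value == 0.0)
print(empty_cells)
```

Lookups and roll-ups here are plain array operations, but the array reserves a cell for every combination of dimension members, including empty ones, so sparse data wastes space. Trade-offs like this are one reason the ROLAP/MOLAP choice depends on the workload.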

  3. Top-tier

    The top tier presents the results provided by the OLAP server to users through front-end tools such as data mining tools, reporting tools …
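As a rough illustration of what a reporting tool does with OLAP output, this sketch formats hypothetical result rows as a fixed-width text report. `render_report` is a made-up helper, not the API of any real reporting product.

```python
# Hypothetical OLAP result rows: (year, region, total_amount).
rows = [(2023, "north", 250.0), (2023, "south", 50.0), (2024, "north", 200.0)]


def render_report(rows, headers=("Year", "Region", "Total")):
    """Format OLAP query results as a fixed-width text report."""
    table = [tuple(str(h) for h in headers)] + [
        (str(year), region, f"{amount:,.2f}") for year, region, amount in rows
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = []
    for n, row in enumerate(table):
        lines.append("  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip())
        if n == 0:
            # Separator line under the header row.
            lines.append("  ".join("-" * w for w in widths))
    return "\n".join(lines)


print(render_report(rows))
```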

  4. Appendix
    Above, we came across the term “metadata repo”. So what is it?

    As the name suggests, it stores metadata, that is, the information that defines the data warehouse objects. You can think of the metadata repo as the glue that connects the components of a data warehouse system.

A metadata repo stores important information such as:

  • The structure of the DW, such as the data warehouse schema, data mart locations, …
  • Operational metadata: the state history of the stored data (active, archived, or purged) and monitoring information about the data warehouse …
  • Mapping information from the operational databases (the source RDBMSs and their contents) to the data warehouse, including the rules for data cleaning and transformation.
  • Business information, such as data ownership, business definitions, and business terms.
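The categories above could be modeled roughly as follows. This is a minimal sketch with hypothetical class and field names (`TableMetadata`, `MetadataRepository`, and the source `orders_db.orders`), where each field is commented with the kind of metadata it represents; production metadata repositories are far richer.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """One hypothetical entry in a metadata repository."""
    name: str                # structure: DW object name
    schema: dict             # structure: column -> type
    location: str            # structure: e.g. which data mart holds it
    status: str = "active"   # operational: active / archived / purged
    source: str = ""         # mapping: originating operational table
    transform_rules: list = field(default_factory=list)  # mapping: cleaning/transform rules
    owner: str = ""          # business: data owner
    definition: str = ""     # business: meaning in business terms


class MetadataRepository:
    """Hypothetical registry that other DW components can query."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def lookup(self, name):
        return self._entries[name]

    def set_status(self, name, status):
        # Record a state change (operational metadata).
        if status not in {"active", "archived", "purged"}:
            raise ValueError(f"unknown status: {status}")
        self._entries[name].status = status

    def by_location(self, location):
        # Structural query: which objects live in a given data mart.
        return sorted(e.name for e in self._entries.values() if e.location == location)


repo = MetadataRepository()
repo.register(TableMetadata(
    name="fact_sales",
    schema={"time_id": "INTEGER", "region": "TEXT", "amount": "REAL"},
    location="sales_mart",
    source="orders_db.orders",  # hypothetical source table
    transform_rules=["trim region names", "convert amount to USD"],
    owner="Sales team",
    definition="Revenue per region per period",
))
repo.set_status("fact_sales", "archived")
print(repo.lookup("fact_sales").status)
print(repo.by_location("sales_mart"))
```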

So that’s all for this part! :smiley:

Above, we came across the concept of OLAP. So what is it? To answer this question, we will learn about OLAP in the next part :ok_hand:

See you soon ! :love_you_gesture: :love_you_gesture: :love_you_gesture:


Haven’t seen mentor @anon19898721 come by to review this sharing post yet :smiley:
