Analysis and Research for a Data Warehouse System
A data warehouse is a complex system that must be capable of delivering quality data. An operational database is one that an organization uses to run its day-to-day activities. Operational databases are designed to handle rapid transaction processing with frequent updates, so speed is important to them. They are most commonly operated by office staff, and they hold on the order of megabytes to gigabytes of data. Database consistency checks and constraints are rigidly enforced. They contain the most current information needed to run organizational functions.
A data warehouse differs in several ways. It is used by management for making decisions, following trends, and pulling reports. A data warehouse is typically used offline, has few users, and is enormous: gigabytes to terabytes. It contains decades of data, which are read-only; rows are added but never updated. The data in the data warehouse are time sensitive: each row is time stamped so that data can be trended over time. The queries run against data warehouses are complex. Data warehouses are decision support databases, used to make strategic decisions about the organization.
Businesses put data warehouses in place to learn about trends in organizational data that affect the business strategically. This type of analysis and reporting is called OLAP: online analytical processing. Management uses OLAP tools on the data warehouse to run reports and make decisions. This would be impossible with an operational data store, since an operational data store contains only data that is true at the current time. For example, an operational database that tracks inventory would show only what is in stock now, not what was in stock last week. A data warehouse would contain snapshots of the operational data at many points in time, and so would show inventory both as of now and as of last month.
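The inventory example can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The table names (inventory, inventory_snapshot), the take_snapshot helper, the item, the quantities, and the dates are all hypothetical and chosen only for illustration; a real warehouse would use a dedicated platform and a richer schema.

```python
import sqlite3

# In-memory database standing in for both systems (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: one row per item, overwritten in place.
    CREATE TABLE inventory (
        item     TEXT PRIMARY KEY,
        quantity INTEGER NOT NULL
    );
    -- Warehouse table: one row per item per snapshot date, append-only.
    CREATE TABLE inventory_snapshot (
        snapshot_date TEXT NOT NULL,
        item          TEXT NOT NULL,
        quantity      INTEGER NOT NULL,
        PRIMARY KEY (snapshot_date, item)
    );
""")


def take_snapshot(conn, snapshot_date):
    """Append the current operational state to the warehouse with a date stamp."""
    conn.execute(
        "INSERT INTO inventory_snapshot (snapshot_date, item, quantity) "
        "SELECT ?, item, quantity FROM inventory",
        (snapshot_date,),
    )


# Hypothetical data: stock level last month, captured in the warehouse.
conn.execute("INSERT INTO inventory VALUES ('widget', 120)")
take_snapshot(conn, "2024-01-01")

# The operational row is updated in place, so the old value is lost there.
conn.execute("UPDATE inventory SET quantity = 85 WHERE item = 'widget'")
take_snapshot(conn, "2024-02-01")

# The operational database knows only the current quantity.
current = conn.execute(
    "SELECT quantity FROM inventory WHERE item = 'widget'"
).fetchall()

# The warehouse keeps every time-stamped snapshot, so a trend can be read off.
history = conn.execute(
    "SELECT snapshot_date, quantity FROM inventory_snapshot "
    "WHERE item = 'widget' ORDER BY snapshot_date"
).fetchall()

print(current)
print(history)
```

Note that the snapshot table is only ever inserted into, never updated, which mirrors the read-only, append-only character of warehouse data.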
Two terms need to be distinguished: data warehouse and data mart. A data warehouse houses all the data that an organization would need to make decisions about any part of itself. A data mart houses only a fraction of the data that the organization uses. A data mart can be implemented as a subset of a data warehouse, as a building block for one, or as an independent entity.
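As a rough illustration of the "subset" case, a data mart can be pictured as a filtered view over a warehouse table. The sketch below again uses Python's sqlite3 module; the warehouse_sales table, the marketing_mart view, and the sample rows are hypothetical, and a real data mart is often a separately stored and loaded database instead of a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse table covering every department (hypothetical schema).
    CREATE TABLE warehouse_sales (
        sale_date  TEXT NOT NULL,
        department TEXT NOT NULL,
        amount     REAL NOT NULL
    );
    -- Data mart as a subset: only the rows one department needs.
    CREATE VIEW marketing_mart AS
        SELECT sale_date, amount
        FROM warehouse_sales
        WHERE department = 'marketing';
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("2024-01-15", "marketing", 200.0),
        ("2024-01-15", "finance", 950.0),
        ("2024-02-15", "marketing", 310.0),
    ],
)

# The warehouse holds all rows; the mart exposes only the marketing subset.
warehouse_rows = conn.execute(
    "SELECT COUNT(*) FROM warehouse_sales"
).fetchone()[0]
mart_rows = conn.execute(
    "SELECT sale_date, amount FROM marketing_mart ORDER BY sale_date"
).fetchall()

print(warehouse_rows)
print(mart_rows)
```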
There are two questions to consider when we think about a data warehouse: how do we get data from the operational databases into the data warehouse, and how do we design the data warehouse itself?
Getting data from the operational database to the data warehouse occurs via a process called ETL: extraction, transformation, and loading. Data are extracted periodically from the operational database into a temporary holding area, where they are transformed – time stamped and converted from the data definition of the operational...