Data Warehousing
The proliferation of cheaper and cheaper computer storage and processing power has resulted in organizations amassing huge volumes of business data, at a much greater level of detail than was possible even five years ago. Such data can yield invaluable information to both management decision makers in particular and knowledge workers in general. Unfortunately the data may not be easily accessible to these data consumers for reasons which include combinations of the following:
- The schema under which the data are stored are too complicated for access by a non-specialist user; IT staff must write special programs or generate reports in response to queries (a business that most IT staff are trying to get out of). Special programs rarely work because generally a user does not know what s/he wants to query next until s/he sees the result of the previous query. In short, users want to be able to slice and dice data at will, and present the results in an informative way.
- The data are stored in multiple disparate computer systems. Even if end user query tools are available for some of the systems, it is impractical for users to learn how to use several tools. Combining results (e.g. headcounts from HR and fixed overheads from the GL) is nearly impossible.
- Running queries against production transactional systems causes degradation in performance of these systems.
- Running queries against transactional data stores (which are not optimized for this) can be excruciatingly slow.
A data warehouse is a separate, central data store. Data from transactional systems are loaded into this store. Before loading, the data are sometimes summarized and are often cleansed. This cleansing includes correcting some errors and inconsistencies between source systems and mapping disparate coding systems (e.g. disparate cost center codes between plants). The data schema of the data warehouse is designed and optimized for business analysis. It is more intuitive to end users so that end users can query the data with easy-to-use query tools, while not impacting the performance of transactional systems.
Simplifying the data schema in a data warehouse, and making it more visual is commonly done by using a dimensional data model where dimensions might include products (audio CDs, DVDs, video cassettes, audio cassettes etc.), store locations, time (Jan-99, Feb-99 etc) and so on. A breed of software tools, called OLAP tools, are rapidly gaining in popularity for managing dimensional data. OLAP tools use a multidimensional metaphor for data management and navigation.
A data mart is usually a central store of data with a specific purpose in mind. A good example is a financial data mart which holds an organization's budgeting, planning forecasting and financial reporting information. This type of data mart would most likely support multi-user read and write to accommodate, say, the distributed data collection of departmental budgets, prior to consolidation. A financial data mart can be visualized as sitting on top of a more homogeneous (and probably more detailed) data warehouse.
The question of whether to build your data warehouse first or (one or more data marts first) is the subject of some controversy among data warehousing industry gurus. While building a comprehensive data warehouse first has appeal, the implementation of the project can run into years. In contrast, creating a data mart may take only a few weeks and can be earning its keep with a comparably lower investment of time and money. Furthermore, creating a data mart can be a learning process for a larger data warehousing project.
Now, more than ever before, can almost all businesses afford to implement a data warehouse, and avail themselves of the business intelligence it provides. As a Microsoft Solution Provider, we are well placed to provide very cost-effective business intelligence solutions to a much broader audience.