The Recent Advances In Data Warehouse Information Technology Essay
A data warehouse is a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes and business intelligence (Inmon and Hackathorn, 1994). The meaning of each of the key terms in this definition follows:
• Subject-oriented:
A data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects may include customers, patients, students, products, and time.
• Integrated:
The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization. This means that the data warehouse holds the one version of “the truth.”
• Time-variant:
Data in the data warehouse contain a time dimension so that they may be used to study trends and changes.
• Non-updatable:
Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end users.
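The time-variant and non-updatable properties can be illustrated with a minimal Python sketch. The class name `WarehouseTable`, its methods, and the student credit figures are hypothetical and chosen only for illustration; a real warehouse would enforce these properties in the database itself.

```python
from datetime import date


class WarehouseTable:
    """Append-only table: rows are loaded with a snapshot date and never edited in place."""

    def __init__(self):
        self._rows = []  # (snapshot_date, student_id, credits_earned)

    def load(self, snapshot_date, student_id, credits_earned):
        # Loading appends a new dated row; existing rows are left untouched.
        self._rows.append((snapshot_date, student_id, credits_earned))

    def history(self, student_id):
        # The time dimension lets us read the trend for one subject in date order.
        return sorted((d, c) for d, s, c in self._rows if s == student_id)


table = WarehouseTable()
table.load(date(2024, 1, 31), "S1", 12)
table.load(date(2024, 6, 30), "S1", 27)
table.load(date(2024, 6, 30), "S2", 15)
trend = table.history("S1")
```

Because the class exposes no update or delete method, end users can only read the history, while each refresh from the operational systems adds a new dated snapshot.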
A data warehouse is not just a consolidation of all the operational databases in an organization. Because of its focus on business intelligence, external data, and time-variant data (not just current status), a data warehouse is a unique kind of database. Data warehousing is the process whereby organizations create and maintain data warehouses, extract meaning from their informational assets, and use that meaning to inform decision making. Since its beginnings about 18 years ago, data warehousing has evolved so rapidly that it is now one of the hottest topics in information systems. A 1996 study of 62 data warehousing projects showed an average return on investment of 321 percent, with an average payback period of 2.73 years. Studies have also shown that approximately 40 percent of data warehousing projects fail, primarily due to insufficient attention to organizational issues of data ownership and definition.
A Brief History
Data warehousing emerged because of advances in the field of information systems over several decades. Some key advances were the following:
• Improvements in database technology, particularly the development of the relational data model and relational database management systems (RDBMSs)
• Advances in computer hardware, particularly the emergence of affordable mass storage and parallel computer architectures
• The emergence of end-user computing, facilitated by powerful, intuitive computer interfaces and tools
• Advances in middleware products that enable enterprise database connectivity across heterogeneous platforms (Hackathorn, 1993)
The key discovery that triggered the development of data warehousing was the recognition (and subsequent definition) of the fundamental differences between operational (or transaction processing) systems (sometimes called systems of record because their role is to keep the official, legal record of the organization) and informational (or decision-support) systems. In 1988, Devlin and Murphy (1988) published the first article describing the architecture of a data warehouse, based on this distinction. In 1992, Inmon published the first book describing data warehousing and has subsequently become one of the most prolific authors in this field.
The Need for Data Warehousing
Two major factors drive the need for data warehousing in most organizations today:
1. A business requires an integrated, company-wide view of high-quality information.
2. The information systems department must separate informational from operational systems to improve performance dramatically in managing company data.
Need for a Company-wide View
Data in operational systems are typically fragmented and inconsistent. They are distributed across a variety of incompatible hardware and software platforms. For example, one file containing customer data may be located on a UNIX-based server running an Oracle DBMS, whereas another is located on an IBM mainframe running the DB2 DBMS. Yet, for decision-making purposes, it is often necessary to provide a single, corporate view of that information.
When consolidating all data into a single format, some of the issues that must be resolved are as follows:
• Inconsistent key structures
• Synonyms
• Free-form fields versus structured fields
• Inconsistent data values
• Missing data
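The issues listed above can be made concrete with a small Python sketch that consolidates customer records from two systems. The record layouts, field names, and values are all invented for illustration, and the cleaning rules are deliberately simplistic; this is one possible approach, not a prescribed method.

```python
import re

# Hypothetical extracts from two systems of record (all names and values are invented).
billing = [
    {"cust_no": "00123", "name": "Ann Lee", "gender": "F", "phone": "555-0101"},
    {"cust_no": "00456", "name": "Bob Ray", "gender": "M", "phone": ""},
]
support = [
    {"customer_id": 123, "sex": "female", "notes": "prefers email; ann@example.com"},
    {"customer_id": 789, "sex": "male", "notes": ""},
]

# Inconsistent data values: map every spelling to one code.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def normalize_key(raw):
    """Inconsistent key structures: '00123' and 123 become the same integer key."""
    return int(str(raw).strip())


def extract_email(free_text):
    """Free-form versus structured fields: pull an e-mail address out of a notes field."""
    match = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", free_text)
    return match.group(0) if match else None


def consolidate(billing_rows, support_rows):
    merged = {}
    for row in billing_rows:
        key = normalize_key(row["cust_no"])
        merged[key] = {
            "customer_key": key,
            "name": row["name"],
            "gender": GENDER_CODES.get(row["gender"].lower()),
            "phone": row["phone"] or None,  # missing data: empty string becomes None
            "email": None,
        }
    for row in support_rows:
        # Synonyms: customer_id means the same as cust_no, and sex the same as gender.
        key = normalize_key(row["customer_id"])
        rec = merged.setdefault(
            key,
            {"customer_key": key, "name": None, "gender": None, "phone": None, "email": None},
        )
        rec["gender"] = rec["gender"] or GENDER_CODES.get(row["sex"].lower())
        rec["email"] = extract_email(row["notes"])
    return [merged[k] for k in sorted(merged)]


customers = consolidate(billing, support)
```

Each helper addresses one of the listed issues, so the single `customers` list gives one record per customer even though the sources disagree on keys, field names, and encodings.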
Why do organizations need to bring data together from various systems of record?
Ultimately, of course, the reason is to be more profitable, to be more competitive, or to grow by adding value for customers. This can be accomplished by increasing the speed and flexibility of decision making, improving business processes, or gaining a clearer understanding of customer behavior.
[...] related to the health of students, or whether poor academic performers cost more to support, for example, due to increased health care as well as other costs. In general, certain trends in organizations encourage the need for data warehousing; these trends include the following:
• No single system of record
Almost no organization has only one database. Because of the heterogeneous needs for data in different operational settings, because of corporate mergers and acquisitions, and due to the sheer size of many organizations, multiple operational databases exist.
• Multiple systems are not synchronized
It is difficult, if not impossible, to make separate databases consistent. Even if the metadata are controlled and made consistent by one data administrator, the data values for the same attributes may not agree. This is because of different update cycles and separate places where the same data are captured for each system. Thus, to get one view of the organization, the data from the separate systems must be periodically consolidated and synchronized into one additional database. We will see that there can actually be two such consolidated databases, one called an operational data store and the other called an enterprise data warehouse, both of which we include under the topic of data warehousing.
• Organizations want to analyze the activities in a balanced way
Many organizations have implemented some form of a balanced scorecard: metrics that show organization results in financial, human, customer satisfaction, product quality, and other terms simultaneously. To ensure that this multidimensional view of the organization shows consistent results, a data warehouse is necessary. When questions arise in the balanced scorecard, analytical software working with the data warehouse can be used to “drill down,” “slice and dice,” visualize, and in other ways mine business intelligence.
• Customer relationship management
Organizations in all sectors are realizing that there is value in having a total picture of their interactions with customers across all touch points. Different touch points (e.g., for a bank, these touch points include ATM, online banking, teller, electronic funds transfers, investment portfolio management, and loans) are supported by separate operational systems. Thus, without a data warehouse, a teller may not know to try to cross-sell a customer one of the bank’s mutual funds if a large, atypical automatic deposit transaction appears on the teller’s screen. A total picture of the activity with a given customer requires a consolidation of data from various operational systems.
• Supplier relationship management
Managing the supply chain has also become a critical element in reducing costs and raising product quality for many organizations. Organizations want to create strategic supplier partnerships based on a total picture of their activities with suppliers, from billing, to meeting delivery dates, to quality control, to pricing, to support. Data about these different activities can be locked inside separate operational systems (e.g., accounts payable, shipping and receiving, production scheduling, and maintenance). ERP systems have improved this situation by bringing many of these data into one database. However, ERP systems tend to be designed to optimize operational, not informational or analytical, processing, which we discuss next.
Need to Separate Operational and Informational Systems
An operational system is a system that is used to run a business in real time, based on current data. Examples of operational systems are sales order processing, reservation systems, and patient registration. Operational systems must process large volumes of relatively simple read/write transactions, while providing fast response. Operational systems are also referred to as systems of record.
Informational systems are designed to support decision making based on historical point-in-time and prediction data. They are also designed for complex queries or data-mining applications. Examples of informational systems are sales trend analysis, customer segmentation, and human resources planning.
Operational systems are used by clerks, administrators, salespersons, and others who must process business transactions. Informational systems are used by managers, executives, business analysts, and (increasingly) by customers who are searching for status information or who are decision makers.
The need to separate operational and informational systems is based on three primary factors:
1. A data warehouse centralizes data that are scattered throughout disparate operational systems and makes them readily available for decision support applications.
2. A properly designed data warehouse adds value to data by improving their quality and consistency.
3. A separate data warehouse eliminates much of the contention for resources that results when informational applications are confounded with operational processing.
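The separation described above can be sketched with two independent in-memory SQLite databases, one standing in for the operational system and one for the warehouse. The table names, columns, sample rows, and the full-refresh load are assumptions made for this illustration; real warehouses typically load incrementally and on a schedule.

```python
import sqlite3

# Two separate databases: analytical queries never touch the operational one.
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

operational.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_month TEXT, "
    "amount REAL, status TEXT)"
)
warehouse.execute(
    "CREATE TABLE sales_history (order_id INTEGER, order_month TEXT, amount REAL)"
)

operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01", 100.0, "open"),
        (2, "2024-01", 50.0, "open"),
        (3, "2024-02", 75.0, "open"),
    ],
)


def ship_order(order_id):
    """Operational workload: a short, simple read/write transaction on current data."""
    operational.execute(
        "UPDATE orders SET status = 'shipped' WHERE order_id = ?", (order_id,)
    )
    operational.commit()


def refresh_warehouse():
    """Periodic load from the operational system (a full refresh, for simplicity)."""
    rows = operational.execute(
        "SELECT order_id, order_month, amount FROM orders"
    ).fetchall()
    warehouse.execute("DELETE FROM sales_history")
    warehouse.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", rows)
    warehouse.commit()


def monthly_sales():
    """Informational workload: an aggregate query that scans historical data."""
    return warehouse.execute(
        "SELECT order_month, SUM(amount) FROM sales_history "
        "GROUP BY order_month ORDER BY order_month"
    ).fetchall()


ship_order(1)
refresh_warehouse()
sales_by_month = monthly_sales()
```

Because `monthly_sales` reads only from the warehouse copy, a long-running trend query does not compete with `ship_order` for the operational database, which is the contention the third factor refers to.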
For a manager, decision making is the most important task, and the speed of those decisions strongly affects the organization. Decision making rests on data warehousing. Recent technological advances in data warehousing have contributed to the emergence of business intelligence that supports management decision making. One recent advance in database management is data warehousing, whereby copies of the databases in a firm are maintained in one place and made accessible to staff at any location. Simply stated, a data warehouse is a collection of data that supports management decision making. Typically, a data warehouse is housed on an enterprise mainframe server. It is a central repository for all or significant parts of the data that a firm's various business systems collect.
Data warehouses are used for reporting. Data from the source systems are cleansed, transformed, catalogued, and made available to managers and other business professionals for data mining, online analytical processing, market research, and decision support. The concepts and competitive analytics arising from the latest trends, research, and growth in business intelligence and data warehousing give managers a broad body of knowledge that is useful in decision making. The various applications of data warehousing make it an integral part of management.
The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the “business data warehouse.” Initially, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments.
Data warehouses are based on multidimensional models. They are important for decision making, and they also support corporate initiatives such as performance management, B2B and B2C management, and customer relationship management.
Significance, Description and Development of Data warehousing
The data warehouse is where information is collected, sorted, and stored centrally. Data warehousing is a mechanism for storing and distributing information. It describes the process of defining, populating, and using a data warehouse. This process emphasizes the capture of data from diverse sources for useful analysis and access. Data mining and micromarketing are two ways in which the information is utilized.
[Figure: the interplay of data warehousing with data mining and micromarketing. Diagram labels: Data warehousing; Dissemination of information; Executives and other company employees; Channel partners; Micromarketing; Data mining.]
2.1 Components of data warehousing
• The data warehouse: where data are physically stored
• Software: to copy original databases and transfer them to the warehouse
• Interactive software: to process enquiries
• Directory: for the categories of information kept in the warehouse
Kambayashi, Yahiko & Lee, Dik Lun(1998) Advances in Database Technologies: ER’98 workshops on data warehousing & data mining & mobile data access, and collaborative work support.
Warehouse data are derived from the data held in operational systems. Operational data sources connect to wrappers/monitors, whose function is to select, clean, and transform the data. The monitors also track changes in the source data and propagate them to the integrator. The integrator’s job is to combine the data selected from the data sources. After integration, the data are propagated into warehouse storage.
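The flow from sources through wrappers/monitors and the integrator into warehouse storage can be sketched in a few lines of Python. The two source layouts (`crm_rows`, `erp_rows`), the function names, and the sample values are hypothetical; the sketch only illustrates the division of responsibilities described above.

```python
# Hypothetical source extracts with different layouts and units.
crm_rows = [{"Name": " alice smith ", "Rev": "120.50"}]
erp_rows = [
    {"cust": "ALICE SMITH", "amount_cents": 2950},
    {"cust": "bob jones", "amount_cents": 1000},
]


def wrap_crm(row):
    """Wrapper for the CRM source: select, clean, and transform one row."""
    return {"customer": row["Name"].strip().title(), "revenue": float(row["Rev"])}


def wrap_erp(row):
    """Wrapper for the ERP source: same target layout, cents converted to currency units."""
    return {"customer": row["cust"].strip().title(), "revenue": row["amount_cents"] / 100}


def monitor(previous_snapshot, current_snapshot):
    """Monitor: return the rows that are new or changed since the previous snapshot."""
    return [row for row in current_snapshot if row not in previous_snapshot]


def integrate(warehouse, *streams):
    """Integrator: combine wrapped rows from all sources into warehouse storage."""
    for stream in streams:
        for rec in stream:
            warehouse[rec["customer"]] = warehouse.get(rec["customer"], 0.0) + rec["revenue"]
    return warehouse


# Initial load: every source row is new, so the previous snapshot is empty.
warehouse = integrate(
    {},
    (wrap_crm(r) for r in monitor([], crm_rows)),
    (wrap_erp(r) for r in monitor([], erp_rows)),
)

# Later, the ERP source gains one row; only that change is propagated.
erp_rows_later = erp_rows + [{"cust": "Bob Jones", "amount_cents": 500}]
changes = monitor(erp_rows, erp_rows_later)
integrate(warehouse, (wrap_erp(r) for r in changes))
```

Keeping the wrappers source-specific and the integrator source-agnostic means a new data source needs only a new wrapper, under the assumption that every wrapper emits the same target layout.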
There are two approaches to creating the warehouse data:
• Bottom-up approach
• Top-down approach
The bottom-up approach is considered the most feasible and the most useful for system performance. The data are obtained from the primary sources according to the warehouse's requirements, which are known in advance, and are then selected, transformed, and integrated by data acquisition tools. In the top-down approach, by contrast, the data are obtained from the primary sources whenever a query is posed. The bottom-up approach is used for answering queries immediately and analyzing data efficiently, because the data are always present in the warehouse. A third, hybrid approach combines aspects of the bottom-up and top-down methods: it uses both the data stored in the warehouse and data obtained from the primary sources.
The metadata contain information about the creation, management, and use of the data warehouse, and serve as a bridge between the users of the warehouse and the data it contains. The OLAP server is used to access the warehouse data and present them in a multidimensional way to the front-end tools. It does so by interpreting the clients’ queries and converting them into the complex SQL queries required to access the warehouse data. It may also access data from the primary sources if a client’s query needs operational data. Lastly, the OLAP server passes the multidimensional views of the data to the front-end tools, which format the data according to the client’s requirements.
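The multidimensional view an OLAP server presents can be approximated with a short Python sketch. The fact rows, dimension names, and the `roll_up` and `slice_` helpers are invented for illustration, and the in-memory aggregation stands in for the SQL that a real OLAP server would generate against the warehouse.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, units_sold).
DIMENSIONS = ("year", "region", "product")
FACTS = [
    ("2023", "East", "Widget", 10),
    ("2023", "West", "Widget", 5),
    ("2024", "East", "Gadget", 7),
    ("2024", "East", "Widget", 3),
]


def roll_up(facts, keep):
    """Aggregate the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)


def slice_(facts, dimension, value):
    """Fix one dimension to a single value and return the matching facts."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]


units_by_year = roll_up(FACTS, ("year",))
east_by_product = roll_up(slice_(FACTS, "region", "East"), ("product",))
```

Here `units_by_year` is a roll-up across region and product, while `east_by_product` first slices the data to one region and then rolls it up by product, which mirrors the kind of view a front-end tool would request.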
Kambayashi, Yahiko & Lee, Dik Lun(1998) Advances in Database Technologies: ER’98 workshops on data warehousing & data mining & mobile data access, & collaborative work support.
2.2 View Maintenance:
A data warehouse stores integrated information from multiple data sources in materialized views (MVs) over the source data. The data sources may be heterogeneous, distributed, and autonomous. When the data in any source (the base data) change, the MVs at the data warehouse (DW) must be updated. This process of updating a materialized view in response to changes in the underlying source data is called view maintenance. View maintenance can give rise to inconsistencies, because a finite but unpredictable amount of time is required for:
• Propagating changes from the data source (DS) to the DW, and
• Computing view updates in response to these changes.
The inconsistencies at the data warehouse arise because the changes that take place at the data sources are random and dynamic.
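A minimal Python sketch of incremental view maintenance follows. It assumes a single materialized view that stores a row count and a total per region, and a queue of pending changes that models the propagation delay; the class name, change format, and data are hypothetical.

```python
def recompute(base_rows):
    """Reference result: rebuild the view from scratch over the base data."""
    view = {}
    for key, amount in base_rows:
        count, total = view.get(key, (0, 0))
        view[key] = (count + 1, total + amount)
    return view


class MaterializedTotals:
    """Materialized view of (row_count, total) per key, maintained from change records."""

    def __init__(self):
        self.view = {}

    def apply(self, change):
        op, key, amount = change
        count, total = self.view.get(key, (0, 0))
        if op == "insert":
            count, total = count + 1, total + amount
        elif op == "delete":
            count, total = count - 1, total - amount
        else:
            raise ValueError(f"unknown operation: {op}")
        if count == 0:
            self.view.pop(key, None)  # no base rows left for this key
        else:
            self.view[key] = (count, total)


# Initial state: the view agrees with the base data at the source.
base = [("East", 10), ("West", 5)]
mv = MaterializedTotals()
for key, amount in base:
    mv.apply(("insert", key, amount))

# The source changes, but the changes have not yet reached the warehouse.
base.append(("East", 7))
base.remove(("West", 5))
pending = [("insert", "East", 7), ("delete", "West", 5)]
stale_before_propagation = mv.view != recompute(base)

# Once the pending changes are propagated and applied, the view is consistent again.
for change in pending:
    mv.apply(change)
consistent_after_propagation = mv.view == recompute(base)
```

The window during which `stale_before_propagation` holds corresponds to the unpredictable delay described above; the row count is kept alongside the total so that a key can be dropped safely when its last base row is deleted.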
2.3 Recent developments in data warehouses
Recent developments in more frequent updates
• Updates can be downloaded in bulk and drop modes, so that users can still see the other data.
• Business requirements demand current data: for example, trading partners who access the web site must be provided with current data.
• For international firms, there is no good time to load the warehouse, so they provide a frequent flow of data.
Recent developments in clickstream data
• Clickstream data result from clicks at a web site.
• A dialog manager handles user communications.
• The clickstream data are filtered, parsed, and sent to a data warehouse, where they are analyzed.
• Software is available to analyze the clickstream data.
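The filter, parse, and analyze steps for clickstream data can be sketched in Python as follows. The log line format, the session identifiers, the `shop.example.com` URLs, and the rule that static assets are filtered out are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical raw log lines: timestamp, session id, HTTP method, URL.
RAW_CLICKS = [
    "2024-05-01T10:00:00 s1 GET https://shop.example.com/products?id=7",
    "2024-05-01T10:00:02 s1 GET https://shop.example.com/static/logo.png",
    "2024-05-01T10:00:09 s2 GET https://shop.example.com/cart",
    "malformed line",
    "2024-05-01T10:01:00 s1 GET https://shop.example.com/cart",
]


def parse(line):
    """Parse one log line into a record, or return None if it is malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    timestamp, session, _method, url = parts
    return {"timestamp": timestamp, "session": session, "path": urlparse(url).path}


def keep(click):
    """Filter out malformed lines and requests for static assets."""
    return click is not None and not click["path"].startswith("/static/")


def load_clicks(lines):
    """Return the cleaned records that would be sent on to the warehouse."""
    return [click for click in map(parse, lines) if keep(click)]


def page_views(clicks):
    """A simple analysis over the loaded data: number of views per page."""
    return Counter(click["path"] for click in clicks)


clicks = load_clicks(RAW_CLICKS)
views = page_views(clicks)
```

With the sample data, `views` counts two visits to `/cart` and one to `/products`, while the malformed line and the image request never reach the analysis step.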
2.5 The main users of data warehouses
Analysts:
Analysts analyze data to extract relevant information from existing records and form their views on that basis.
Managers:
Managers are involved in the strategic decisions of an organization. They need past and current data to forecast future scenarios, and data warehouses provide the relevant data they require.
Executives:
Executives require large amounts of data for report generation and record maintenance.
Operational personnel:
Any system involves a number of processes that carry out various operations. For a process to run smoothly, supporting data are required, and these come from databases.
Customers and suppliers:
Customers and suppliers require various data related to product ingredients or usage, and for many other purposes. Data warehouses help meet these requirements.
Taniar, David (2009) Progressive Methods in data warehousing & business intelligence, concepts & competitive analytics, advances in data warehousing & data mining.
Conclusion:
In this assignment, we have discussed techniques for integrating data to support effective decision making. Data warehousing, which has grown rapidly in the last few years, presents new challenges every day. We have discussed the recent advances in data warehousing and have drawn on the multidimensionality of data in the warehouse to introduce the concepts of constraints and the data cube. The topics discussed here are intended to spark keen interest in the field of data warehousing.