The concepts of data warehouse and data mining in an organization

Introduction

Today, most information and data are managed and organized using information technology and information systems. Information systems are now widely used in every industry to store data and information for future use. Data warehousing and data mining are common processes in the information technology field. A data warehouse is used to store a huge volume of data, and data mining can be defined as the process of extracting patterns from data.

Data warehouse

A data warehouse works as an electronic storage area for an organization's data. Data warehouses are designed to support reporting and analysis across an organization. Retrieving and analyzing data; extracting, transforming and loading (ETL) data; and managing data are the fundamental components of data warehousing. A data warehouse has specific characteristics, which include the following:

1. Subject-Oriented

Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject.

2. Integrated

Data is stored in a consistently accepted format, with uniform measurements, naming conventions, physical characteristics and encoding structures.

3. Non-Volatile

Stable information that does not change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.

4. Time-Variant

Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.

5. Process-Oriented

It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.

6. Accessible

Provides end-users with easy access to information.

There are three Data Warehouse Models:

• Enterprise warehouse

– collects all of the information about subjects across the entire organization

• Data Mart

– a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected subjects, such as a marketing data mart

• Virtual warehouse

– a set of views over operational databases. Only some of the possible summary views may be materialized (see the short sketch after this list)
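As a rough illustration of the virtual warehouse idea, here is a minimal sketch that defines a summary view directly over an operational table, using Python's built-in sqlite3 module. The table and column names (sales, region, amount) are invented for the example.

    import sqlite3

    # An in-memory "operational" database standing in for a production system.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("North", "A", 120.0), ("North", "B", 80.0), ("South", "A", 200.0)],
    )

    # A virtual warehouse is a set of views over the operational data:
    # nothing is copied, and the summary is computed only when the view is queried.
    conn.execute(
        "CREATE VIEW sales_by_region AS "
        "SELECT region, SUM(amount) AS total_amount FROM sales "
        "GROUP BY region ORDER BY region"
    )

    for row in conn.execute("SELECT region, total_amount FROM sales_by_region"):
        print(row)  # ('North', 200.0) then ('South', 200.0)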

Data Warehouse Concepts

In data warehousing, there are several core concepts, which are described below:

1. Dimensional Data Model: The dimensional data model is commonly used in data warehousing systems. This section describes this modelling technique and its two common schema types, the star schema and the snowflake schema. It differs from the 3rd normal form model, which is commonly used for transactional (OLTP) systems. A few terms need to be defined to understand dimensional data modelling (a small sketch follows these definitions):

Dimension: A category of information.


For example, the time dimension.

Attribute: A unique level within a dimension.

For example, Month is an attribute in the Time Dimension.

Hierarchy: The specification of levels that represents the relationships between different attributes within a dimension.

For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
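To make these terms concrete, the following is a minimal sketch of a star schema in Python with pandas. The fact table, the Time dimension and the column names are invented for the example, not taken from the text above.

    import pandas as pd

    # Time dimension: one row per day, carrying the Year -> Quarter -> Month -> Day hierarchy.
    time_dim = pd.DataFrame({
        "date_key": [20230101, 20230102, 20230401],
        "day":      [1, 2, 1],
        "month":    ["Jan", "Jan", "Apr"],
        "quarter":  ["Q1", "Q1", "Q2"],
        "year":     [2023, 2023, 2023],
    })

    # Fact table: numeric measures keyed by the dimension's key.
    sales_fact = pd.DataFrame({
        "date_key": [20230101, 20230102, 20230401],
        "amount":   [100.0, 150.0, 200.0],
    })

    # A typical star-schema query: join the fact to the dimension,
    # then roll the measure up the hierarchy (here to year and quarter).
    joined = sales_fact.merge(time_dim, on="date_key")
    print(joined.groupby(["year", "quarter"])["amount"].sum())

A snowflake schema would differ only in that the dimension itself is normalized into several linked tables (for example, a separate Month table).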

– Slowly Changing Dimension: This is a common issue facing data warehousing practitioners. This section explains the problem and describes the three ways of handling it, with examples.
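As a rough illustration (the naming is common practice rather than taken from the text above), the three usual ways of handling a slowly changing dimension are labelled Type 1, Type 2 and Type 3. The sketch below shows all three for a hypothetical customer who moves city; the keys and column names are made up.

    from datetime import date

    # Original dimension row: the customer lives in Chicago.
    customer = {"customer_key": 1, "name": "Alice", "city": "Chicago"}

    # Type 1: overwrite the attribute in place; no history is kept.
    type1 = dict(customer, city="Boston")

    # Type 2: keep the old row and add a new row with validity dates,
    # so the full history is preserved.
    type2_rows = [
        dict(customer, valid_from=date(2020, 1, 1), valid_to=date(2023, 6, 30), current=False),
        dict(customer, customer_key=2, city="Boston",
             valid_from=date(2023, 7, 1), valid_to=None, current=True),
    ]

    # Type 3: add a column holding the previous value; only limited history is kept.
    type3 = dict(customer, city="Boston", previous_city="Chicago")

    print(type1, type2_rows, type3, sep="\n")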

– Conceptual Data Model: A conceptual data model identifies the relationships between the different entities. Characteristics of a conceptual data model include:

Includes the important entities and the relationships among them.

No attributes are specified.

No primary key is specified.

The figure below is an example of a conceptual data model.

[Figure: Conceptual data model]

From the figure above, we can see that the only information shown via the conceptual data model is the entities that describe the data and the relationships between those entities. No other information is shown through the conceptual data model.

– Logical Data Model: A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model include:

* Includes all entities and the relationships between them.

* All attributes for each entity are specified.

* The primary key for each entity is specified.

* Foreign keys (keys identifying the relationships between different entities) are specified.

* Normalization occurs at this level.

The steps for designing the logical data model are as follows:

1. Identify primary keys for all entities.

2. Locate the relationships between different entities.

3. Discover all attributes for each entity.

4. Resolve many-to-many relationships.

5. Normalization.

The figure below is an example of a logical data model.

[Figure: Logical data model]

The differences between a conceptual data model and a logical data model are listed below:

* In a logical data model, primary keys are present, whereas in a conceptual data model, no primary key is present.

* In a logical data model, all attributes of an entity are specified, whereas no attributes are specified in a conceptual data model.

* In a conceptual data model, relationships are simply stated, not specified, so we only know that two entities are related, but we do not specify which attributes are used for the relationship. In a logical data model, the relationships between entities are specified using primary keys and foreign keys. (A brief sketch of the distinction follows this list.)
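A minimal sketch of that distinction, using plain Python dictionaries; the Customer and Order entities, their attributes and their keys are invented for the illustration.

    # Conceptual model: only the entities and the fact that they are related.
    conceptual = {
        "entities": ["Customer", "Order"],
        "relationships": [("Customer", "places", "Order")],
    }

    # Logical model: every attribute is listed, the primary key is marked,
    # and the relationship is expressed through a foreign key.
    logical = {
        "Customer": {
            "attributes": ["customer_id", "name", "email"],
            "primary_key": "customer_id",
        },
        "Order": {
            "attributes": ["order_id", "customer_id", "order_date", "total"],
            "primary_key": "order_id",
            "foreign_keys": {"customer_id": "Customer.customer_id"},
        },
    }

    print(conceptual)
    print(logical)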

– Physical Data Model: Represents how the model will actually be implemented in the database, including table names, column names, column data types, primary keys and foreign keys.

– Conceptual, Logical, and Physical Data Model: Different levels of abstraction for a data model. This part compares and contrasts the three types of data models.


– Data Integrity: What data integrity is and how it is enforced in data warehousing.

– OLAP: Stands for On-Line Analytical Processing. The first attempt to provide a definition of OLAP was by Dr. Codd, who proposed 12 rules for OLAP. It was later discovered that this particular white paper was sponsored by one of the OLAP tool vendors, which called its objectivity into question. The OLAP Report has since proposed the FASMI test: Fast Analysis of Shared Multidimensional Information.

– Bill Inmon vs. Ralph Kimball: These two data warehousing heavyweights have different views on the relationship between the data warehouse and the data mart. In the data warehousing field, we frequently hear discussions about whether a person's or organization's viewpoint falls into Bill Inmon's camp or into Ralph Kimball's camp. The difference between the two is described below.

Bill Inmon’s paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

Ralph Kimball’s paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

(Source: http://www.1keydata.com/datawarehousing/concepts.html)

There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouse in most enterprises is closer to Ralph Kimball's idea. This is because most data warehouses start out as a departmental effort, and hence they originate as a data mart. Only when more data marts are built later do they evolve into a data warehouse.

There are many theories that can be used in implementing a data warehouse, depending on the nature of the data and the requirements of the system needed. These concepts are adapted from http://www.1keydata.com/datawarehousing/inmon-kimball.html.

The benefits of a data warehouse to the organization

* The ability to handle server tasks related to querying that are not supported by most operational systems.

* Can be completed within a reasonable time frame.

* The setup does not require highly technically skilled workers.

* Data warehouses are unique in that they can act as a repository for cleaned data from transaction processing systems.

* Can produce reports and data extracts, which can also draw on outside data sources.

* Provides historical information for competitive analysis.

* Improved data quality and completeness.

* Enhances disaster recovery plans by providing another data backup source.

Data Mining

Introduction

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase profits, cut costs, or both. Data mining is also called data or knowledge discovery. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different angles or dimensions, categorize it, and summarize the relationships identified. In technical terms, data mining is the process of finding correlations or patterns among fields in large relational databases. The Knowledge Discovery in Databases (KDD) process consists of several steps, leading from the collection of raw data to some form of new knowledge. The process consists of the following steps² (a small end-to-end sketch follows the list):


* Data cleaning: also known as data cleansing, this is a phase in which noisy data and irrelevant data are removed from the collection.

* Data integration: at this point, multiple data sources, often heterogeneous, may be combined into a common source.

* Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection.

* Data transformation: also known as data consolidation, it is a phase in which the selected data is transformed into forms suitable for the mining procedure.

* Data mining: it is the crucial step in which intelligent techniques are applied to extract potentially useful patterns.

* Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified based on given measures.

* Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.
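As a rough, end-to-end illustration of these steps (the data, column names and choice of technique are invented, not taken from the text above), the sketch below cleans a small pandas DataFrame, selects and transforms two numeric fields, and mines them with k-means clustering from scikit-learn.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Raw collection, with a missing value and a duplicate row to clean out.
    raw = pd.DataFrame({
        "customer": ["a", "b", "c", "d", "d"],
        "age":      [25, 47, None, 52, 52],
        "spend":    [120.0, 450.0, 90.0, 610.0, 610.0],
    })

    # Data cleaning: remove duplicates and rows with missing values.
    clean = raw.drop_duplicates().dropna()

    # Data selection: keep only the fields relevant to the analysis.
    selected = clean[["age", "spend"]]

    # Data transformation: scale the values into a form suitable for mining.
    scaled = StandardScaler().fit_transform(selected)

    # Data mining: apply a clustering technique to extract patterns.
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

    # Pattern evaluation / knowledge representation: inspect the discovered groups.
    print(clean.assign(cluster=model.labels_))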

Function

Data mining is mainly a set of tools for turning stored data into knowledge. It enables an organization to determine relationships among internal factors and external factors in each study. While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Data mining consists of five major elements³ (a brief sketch follows the list):

* Extract, transform, and load transaction data onto the data warehouse system.

* Store and administer the data in a multidimensional database system.

* Provide data access to business analysts and information technology professionals.

* Analyze the data with application software.

* Present the data in a useful format, such as a graph or chart.
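To illustrate the last few elements in a very small way (all names and figures are invented), the sketch below stores transaction data in a pandas pivot table as a stand-in for a multidimensional store, analyzes it, and presents it as a chart with matplotlib.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Transaction data after it has been extracted, transformed and loaded.
    transactions = pd.DataFrame({
        "region":  ["North", "North", "South", "South"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "sales":   [120.0, 150.0, 90.0, 200.0],
    })

    # Store the data in a (tiny) multidimensional structure: region x quarter.
    cube = transactions.pivot_table(index="region", columns="quarter",
                                    values="sales", aggfunc="sum")

    # Analyze: total sales per region across quarters.
    print(cube.sum(axis=1))

    # Present the data in a useful format, such as a bar chart.
    cube.plot(kind="bar")
    plt.ylabel("sales")
    plt.tight_layout()
    plt.savefig("sales_by_region.png")  # or plt.show()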

² http://www.exinfm.com/pdffiles/intro_dm.pdf

³ http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

Data Mining Concepts

The data mining process consists of five steps⁴ (see the short sketch after this list):

* State the problem

* Collect the data

* Perform pre-processing

* Approximate the model (mine the data)

* Interpret the model and draw the conclusions
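As a small sketch of the last two steps (the dataset and the choice of a decision tree are illustrative assumptions, not prescribed by the text above), the example below approximates a model on a built-in scikit-learn dataset and then interprets it by printing the learned rules.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Collect the data: a built-in sample dataset stands in for real collection.
    data = load_iris()

    # Pre-processing is trivial here; real data would need cleaning first.
    X, y = data.data, data.target

    # Approximate the model (mine the data): fit a small decision tree.
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # Interpret the model and draw the conclusions: read the learned rules.
    print(export_text(tree, feature_names=list(data.feature_names)))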

⁴ http://media.wiley.com/product_data/excerpt/24/04712285/0471228524-1.pdf
