Data mining is a process that applies association rules to very large collections of data. It combines statistics, data collection, and the recognition of patterns and trends. The most common type of data mining today is the use of programs embedded within other programs that record computing habits, websites visited, and much more information about both the user and the computer. Analysis of this data yields profiles that are used by all kinds of organizations. The programs that do the data mining are frequently included in purchased and downloaded software. However, they are also frequently installed on a computer by a virus, worm, or other malicious tool.
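As a rough illustration of what a rule of association looks like in practice, the sketch below computes two standard measures, support and confidence, for the rule "shoppers who buy diapers also buy wipes". The baskets and item names are invented for the example, and the function names are hypothetical rather than taken from any particular tool.

```python
# Hypothetical shopping baskets; every item here is invented for illustration.
baskets = [
    {"diapers", "wipes", "milk"},
    {"diapers", "wipes"},
    {"milk", "bread"},
    {"diapers", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, baskets) / base


# Rule under test: {diapers} -> {wipes}
rule_support = support({"diapers", "wipes"}, baskets)
rule_confidence = confidence({"diapers"}, {"wipes"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule with high support and high confidence describes a pattern that is both common and reliable, which is the kind of trend a profile would be built on.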
There is no inherent danger in data mining; it is the use of the information that is most troubling. When the collected information can include credit card numbers and bank account details, the potential for misuse is obvious. "No inherent danger" means that data mining software does not damage a computer; it only records what the computer is doing.
One of the major reasons for collecting such vast amounts of information is resale. Online merchants would like to know where and when each online shopper spends money. They can then use that information to aim their marketing, and its timing, at a more targeted market.
The data that is mined, or collected, can come from a vast array of sources. As stated above, the most common form of data mining is the infection of a computer. Data can also be extracted from POS (Point Of Sale) transactions, all kinds of databases, online resources, articles, etc. Although these are easy ways to gather information, there are also offline sources and databases that can be included through old-style manual data entry.
With all the information it receives, a company must filter the data to find what its consumers need. A company often wants only one type of information, and it can take time to work through everything to see, for example, who buys Huggies brand diapers. All of this information must be fused together and cleaned. Many data sources have problems, such as inconsistencies or poor quality, so the data must be cleaned and checked for consistent formats.
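The cleaning and filtering described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the records, the field names, and the two date formats are all invented, and real sources would need many more checks.

```python
from datetime import datetime

# Hypothetical raw records fused from two sources that use different formats.
raw_records = [
    {"customer": " Alice ", "product": "huggies diapers", "date": "2023-01-15"},
    {"customer": "BOB", "product": "Huggies Diapers", "date": "01/17/2023"},
    {"customer": "", "product": "Milk", "date": "2023-01-18"},        # missing customer
    {"customer": "Carol", "product": "Bread", "date": "not a date"},  # unreadable date
]

# Assumed date formats used by the two sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return the date as an ISO string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize formats and drop records that fail basic quality checks."""
    cleaned = []
    for record in records:
        customer = record["customer"].strip().title()
        product = record["product"].strip().title()
        date = parse_date(record["date"].strip())
        if not customer or not product or date is None:
            continue  # poor-quality record: skip it
        cleaned.append({"customer": customer, "product": product, "date": date})
    return cleaned


cleaned_records = clean(raw_records)

# Filter the cleaned data for the one type of information the company wants.
huggies_buyers = [r["customer"] for r in cleaned_records if r["product"] == "Huggies Diapers"]
print(huggies_buyers)
```

Normalizing names and dates before filtering matters here: without it, "huggies diapers" and "Huggies Diapers" would be counted as two different products.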
Meta-Data is data about data: it describes how, when, and by whom a particular set of data was collected and how it is formatted. Meta-Data is essential for understanding the information stored in a data warehouse.
A data warehouse is a special database of cleaned-up data and Meta-Data. Both the data and the Meta-Data are sent to the data warehouse.
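To make the relationship between data, Meta-Data, and the warehouse concrete, here is a toy sketch. The class names, field names, and sample values are all hypothetical, and a real data warehouse is a database system rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Metadata:
    """Data about a data set: who collected it, when, how, and in what format."""
    collected_by: str
    collected_on: date
    method: str
    fmt: str


@dataclass
class Warehouse:
    """Toy stand-in for a data warehouse: cleaned data stored with its metadata."""
    entries: dict = field(default_factory=dict)

    def load(self, name, rows, metadata):
        """Store a cleaned data set together with the metadata that describes it."""
        self.entries[name] = {"rows": rows, "metadata": metadata}

    def describe(self, name):
        """Return the metadata for a stored data set."""
        return self.entries[name]["metadata"]


warehouse = Warehouse()
warehouse.load(
    "pos_sales",  # hypothetical data set name
    [{"customer": "Alice", "product": "Huggies Diapers"}],
    Metadata(
        collected_by="store register",  # invented sample values
        collected_on=date(2023, 1, 15),
        method="POS transaction log",
        fmt="CSV",
    ),
)

info = warehouse.describe("pos_sales")
print(info.collected_by, info.collected_on.isoformat(), info.method, info.fmt)
```

Keeping the Meta-Data next to the rows means anyone reading the warehouse later can tell where a data set came from and how to interpret its format.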