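Book Details

Data Mining: Concepts and Techniques (English Edition, 2nd Edition)

Author: Jiawei Han (Canada)

Publisher: China Machine Press

Publication date: 2006-04-01

ISBN: 9787111188285

Price: ¥79.00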

Synopsis

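Gregory Piatetsky-Shapiro, President of KDnuggets: "The second edition updates and improves the already rich and comprehensive coverage of the first edition and adds important new topics, such as mining stream data, mining social networks, and mining spatial, multimedia, and other complex data. This book will be an excellent textbook for courses on data mining and knowledge discovery."

Hans-Peter Kriegel, University of Munich, Germany: "The second edition offers the most complete and comprehensive account of the important knowledge and technical innovations in data mining. Compared with the already quite comprehensive first edition, it presents the field's latest research results, such as mining stream, time-series, and sequence data and mining spatial, multimedia, text, and Web data. It is a must-read for all teachers, researchers, developers, and users in data mining and knowledge discovery."

Our ability to generate and collect data is growing rapidly. Beyond the data produced by the increasing computerization of most business, scientific, and government transactions, the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the Internet have surrounded our lives with enormous volumes of data. This explosive growth makes the need for new techniques and automated tools that turn this data into useful information and knowledge more urgent than ever.

The first edition was voted the most popular data mining book by KDnuggets readers and is a highly readable textbook. Written from a database perspective, it gives a comprehensive and systematic introduction to the basic concepts, methods, and techniques of data mining and to research progress in the field, with a focus on feasibility, usefulness, effectiveness, and scalability. Since the first edition was published, however, data mining research has advanced considerably, producing new methods, systems, and applications. The second edition strengthens this coverage, adding several chapters on the latest methods for mining complex types of data, including stream data, sequence data, graph-structured data, social network data, and multirelational data.

The book is suitable as an elective textbook for senior undergraduates in computer science and related majors, and is especially suitable as a core textbook for graduate students. It also serves as an essential reference for anyone engaged in data mining research and application development.

Key features of this book:
● Offers a comprehensive, practical treatment of the concepts and techniques readers need to know, drawn from real business data.
● Incorporates reader feedback, technical changes in the data mining field, and additional material on statistics and machine learning.
● Includes many algorithms and implementation examples, all written in easy-to-understand pseudocode and suitable for real, large-scale data mining projects.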

About the Authors

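Jiawei Han is a professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He has received numerous honors and awards for his productive research in data mining and database systems, including the 2004 ACM SIGKDD Innovation Award. He is also editor-in-chief of ACM Transactions on Knowledge Discovery from Data and a member of the editorial boards of IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Micheline Kamber holds a master's degree in computer science from Concordia University, Canada, and is currently engaged in postdoctoral research at Simon Fraser University, Canada.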

Foreword
Preface
Chapter 1 Introduction
1.1   What Motivated Data Mining? Why Is It Important?
1.2  So, What Is Data Mining?
1.3   Data Mining-On What Kind of Data?
1.3.1  Relational Databases
1.3.2 Data Warehouses
1.3.3  Transactional Databases
1.3.4  Advanced Data and Information Systems and Advanced Applications
1.4   Data Mining Functionalities-What Kinds of Patterns Can Be Mined?
1.4.1  Concept/Class Description: Characterization and Discrimination
1.4.2  Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Prediction
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6  Evolution Analysis
1.5   Are All of the Patterns Interesting?
1.6   Classification of Data Mining Systems
1.7   Data Mining Task Primitives
1.8   Integration of a Data Mining System with a Database or Data Warehouse System
1.9   Major Issues in Data Mining
1.10  Summary
Exercises
Bibliographic Notes
Chapter 2 Data Preprocessing
2.1   Why Preprocess the Data?
2.2   Descriptive Data Summarization
2.2.1  Measuring the Central Tendency
2.2.2 Measuring the Dispersion of Data
2.2.3  Graphic Displays of Basic Descriptive Data Summaries
2.3   Data Cleaning
2.3.1  Missing Values
2.3.2 Noisy Data
2.3.3 Data Cleaning as a Process
2.4   Data Integration and Transformation
2.4.1  Data Integration
2.4.2  Data Transformation
2.5   Data Reduction
2.5.1  Data Cube Aggregation
2.5.2 Attribute Subset Selection
2.5.3  Dimensionality Reduction
2.5.4 Numerosity Reduction
2.6   Data Discretization and Concept Hierarchy Generation
2.6.1 Discretization and Concept Hierarchy Generation for Numerical Data
2.6.2  Concept Hierarchy Generation for Categorical Data
2.7   Summary
Exercises
Bibliographic Notes
Chapter 3 Data Warehouse and OLAP Technology: An Overview
3.1   What Is a Data Warehouse?
3.1.1  Differences between Operational Database Systems and Data Warehouses
3.1.2  But, Why Have a Separate Data Warehouse?
3.2   A Multidimensional Data Model
3.2.1 From Tables and Spreadsheets to Data Cubes
3.2.2  Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
3.2.3  Examples for Defining Star, Snowflake, and Fact Constellation Schemas
3.2.4  Measures: Their Categorization and Computation
3.2.5 Concept Hierarchies
3.2.6 OLAP Operations in the Multidimensional Data Model
3.2.7 A Starnet Query Model for Querying Multidimensional Databases
3.3   Data Warehouse Architecture
3.3.1  Steps for the Design and Construction of Data Warehouses
3.3.2 A Three-Tier Data Warehouse Architecture
3.3.3  Data Warehouse Back-End Tools and Utilities
3.3.4 Metadata Repository
3.3.5  Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP
3.4   Data Warehouse Implementation
3.4.1  Efficient Computation of Data Cubes
3.4.2 Indexing OLAP Data
3.4.3 Efficient Processing of OLAP Queries
3.5   From Data Warehousing to Data Mining
3.5.1  Data Warehouse Usage
3.5.2  From On-Line Analytical Processing to On-Line Analytical Mining
3.6   Summary
Exercises
Bibliographic Notes
Chapter 4 Data Cube Computation and Data Generalization
4.1   Efficient Methods for Data Cube Computation
4.1.1  A Road Map for the Materialization of Different Kinds of Cubes
4.1.2  Multiway Array Aggregation for Full Cube Computation
4.1.3  BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
4.1.4  Star-cubing: Computing Iceberg Cubes Using a Dynamic Star-tree Structure
4.1.5  Precomputing Shell Fragments for Fast High-Dimensional OLAP
4.1.6 Computing Cubes with Complex Iceberg Conditions
4.2   Further Development of Data Cube and OLAP
4.3   Attribute-Oriented Induction-An Alternative Method for Data Generalization and Concept Description
4.3.1  Attribute-Oriented Induction for Data Characterization
4.3.2  Efficient Implementation of Attribute-Oriented Induction
4.3.3  Presentation of the Derived Generalization
4.3.4  Mining Class Comparisons: Discriminating between Different Classes
4.3.5  Class Description: Presentation of Both Characterization and Comparison
4.4   Summary
Exercises
Bibliographic Notes
Chapter 5 Mining Frequent Patterns, Associations, and Correlations
5.1   Basic Concepts and a Road Map
5.1.1 Market Basket Analysis: A Motivating Example
5.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules
5.1.3  Frequent Pattern Mining: A Road Map
5.2   Efficient and Scalable Frequent Itemset Mining Methods
5.2.1  The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation
5.2.2  Generating Association Rules from Frequent Itemsets
5.2.3  Improving the Efficiency of Apriori
5.2.4  Mining Frequent Itemsets without Candidate Generation
5.2.5  Mining Frequent Itemsets Using Vertical Data Format
5.2.6  Mining Closed Frequent Itemsets
5.3   Mining Various Kinds of Association Rules
5.3.1  Mining Multilevel Association Rules
5.3.2 Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
5.4   From Association Mining to Correlation Analysis
5.4.1  Strong Rules Are Not Necessarily Interesting: An Example
5.4.2 From Association Analysis to Correlation Analysis
5.5   Constraint-Based Association Mining
5.5.1  Metarule-Guided Mining of Association Rules
5.5.2 Constraint Pushing: Mining Guided by Rule Constraints
5.6  Summary
Exercises
Bibliographic Notes
Chapter 6 Classification and Prediction
6.1   What Is Classification? What Is Prediction?
6.2   Issues Regarding Classification and Prediction
6.2.1  Preparing the Data for Classification and Prediction
6.2.2 Comparing Classification and Prediction Methods
6.3   Classification by Decision Tree Induction
6.3.1  Decision Tree Induction
6.3.2 Attribute Selection Measures
6.3.3 Tree Pruning
6.3.4 Scalability and Decision Tree Induction
6.4  Bayesian Classification
6.4.1  Bayes' Theorem
6.4.2  Naive Bayesian Classification
6.4.3  Bayesian Belief Networks
6.4.4 Training Bayesian Belief Networks
6.5   Rule-Based Classification
6.5.1  Using IF-THEN Rules for Classification
6.5.2  Rule Extraction from a Decision Tree
6.5.3  Rule Induction Using a Sequential Covering Algorithm
6.6   Classification by Backpropagation
6.6.1  A Multilayer Feed-Forward Neural Network
6.6.2 Defining a Network Topology
6.6.3  Backpropagation
6.6.4 Inside the Black Box: Backpropagation and Interpretability
6.7   Support Vector Machines
6.7.1  The Case When the Data Are Linearly Separable
6.7.2 The Case When the Data Are Linearly Inseparable
6.8  Associative Classification: Classification by Association Rule Analysis
6.9   Lazy Learners (or Learning from Your Neighbors)
6.9.1  k-Nearest-Neighbor Classifiers
6.9.2 Case-Based Reasoning
6.10  Other Classification Methods
6.10.1 Genetic Algorithms
6.10.2 Rough Set Approach
6.10.3 Fuzzy Set Approaches
6.11  Prediction
6.11.1 Linear Regression
6.11.2 Nonlinear Regression
6.11.3 Other Regression-Based Methods
7.6.2  OPTICS: Ordering Points to Identify the Clustering Structure
7.6.3  DENCLUE: Clustering Based on Density Distribution Functions
7.7  Grid-Based Methods
7.7.1  STING: STatistical INformation Grid
7.7.2 WaveCluster: Clustering Using Wavelet Transformation
7.8  Model-Based Clustering Methods
7.8.1  Expectation-Maximization
7.8.2 Conceptual Clustering
7.8.3 Neural Network Approach
7.9   Clustering High-Dimensional Data
7.9.1 CLIQUE: A Dimension-Growth Subspace Clustering Method
7.9.2 PROCLUS: A Dimension-Reduction Subspace Clustering Method
7.9.3  Frequent Pattern-Based Clustering Methods
7.10  Constraint-Based Cluster Analysis
7.10.1 Clustering with Obstacle Objects
7.10.2 User-Constrained Cluster Analysis
7.10.3 Semi-Supervised Cluster Analysis
7.11  Outlier Analysis
7.11.1 Statistical Distribution-Based Outlier Detection
7.11.2 Distance-Based Outlier Detection
7.11.3 Density-Based Local Outlier Detection
7.11.4 Deviation-Based Outlier Detection
7.12  Summary
Exercises
Bibliographic Notes
Chapter 8 Mining Stream, Time-Series, and Sequence Data
8.1   Mining Data Streams
8.1.1  Methodologies for Stream Data Processing and Stream Data Systems
8.1.2 Stream OLAP and Stream Data Cubes
8.1.3  Frequent-Pattern Mining in Data Streams
8.1.4 Classification of Dynamic Data Streams
8.1.5  Clustering Evolving Data Streams
8.2   Mining Time-Series Data
8.2.1  Trend Analysis
8.2.2 Similarity Search in Time-Series Analysis
8.3   Mining Sequence Patterns in Transactional Databases
8.3.1  Sequential Pattern Mining: Concepts and Primitives
8.3.2  Scalable Methods for Mining Sequential Patterns
8.3.3  Constraint-Based Mining of Sequential Patterns
8.3.4  Periodicity Analysis for Time-Related Sequence Data
8.4   Mining Sequence Patterns in Biological Data
8.4.1  Alignment of Biological Sequences
8.4.2 Hidden Markov Model for Biological Sequence Analysis
8.5   Summary
Exercises
Bibliographic Notes
Chapter 9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
9.1   Graph Mining
9.1.1  Methods for Mining Frequent Subgraphs
9.1.2 Mining Variant and Constrained Substructure Patterns
9.1.3  Applications: Graph Indexing, Similarity Search, Classification, and Clustering
9.2   Social Network Analysis
9.2.1  What Is a Social Network?
9.2.2  Characteristics of Social Networks
9.2.3  Link Mining: Tasks and Challenges
9.2.4  Mining on Social Networks
9.3   Multirelational Data Mining
9.3.1  What Is Multirelational Data Mining?
9.3.2  ILP Approach to Multirelational Classification
9.3.3 Tuple ID Propagation
9.3.4 Multirelational Classification Using Tuple ID Propagation
9.3.5  Multirelational Clustering with User Guidance
9.4   Summary
Exercises
Bibliographic Notes
Chapter  10 Mining Object, Spatial, Multimedia, Text, and Web Data
10.1 Multidimensional Analysis and Descriptive Mining of Complex Data Objects
10.1.1 Generalization of Structured Data
10.1.2 Aggregation and Approximation in Spatial and Multimedia Data Generalization
10.1.3 Generalization of Object Identifiers and Class/Subclass Hierarchies
10.1.4 Generalization of Class Composition Hierarchies
10.1.5 Construction and Mining of Object Cubes
10.1.6 Generalization-Based Mining of Plan Databases by Divide-and-Conquer
10.2  Spatial Data Mining
10.2.1 Spatial Data Cube Construction and Spatial OLAP
10.2.2 Mining Spatial Association and Co-location Patterns
10.2.3 Spatial Clustering Methods
10.2.4 Spatial Classification and Spatial Trend Analysis
10.2.5 Mining Raster Databases
10.3  Multimedia Data Mining
10.3.1 Similarity Search in Multimedia Data
10.3.2 Multidimensional Analysis of Multimedia Data
10.3.3 Classification and Prediction Analysis of Multimedia Data
10.3.4 Mining Associations in Multimedia Data
10.3.5 Audio and Video Data Mining
10.4  Text Mining
10.4.1 Text Data Analysis and Information Retrieval
10.4.2 Dimensionality Reduction for Text
10.4.3 Text Mining Approaches
10.5  Mining the World Wide Web
10.5.1 Mining the Web Page Layout Structure
10.5.2 Mining the Web's Link Structures to Identify Authoritative Web Pages
10.5.3 Mining Multimedia Data on the Web
10.5.4 Automatic Classification of Web Documents
10.5.5 Web Usage Mining
10.6  Summary
Exercises
Bibliographic Notes
Chapter 11 Applications and Trends in Data Mining
11.1  Data Mining Applications
11.1.1 Data Mining for Financial Data Analysis
11.1.2 Data Mining for the Retail Industry
11.1.3 Data Mining for the Telecommunication Industry
11.1.4 Data Mining for Biological Data Analysis
11.1.5 Data Mining in Other Scientific Applications
11.1.6 Data Mining for Intrusion Detection
11.2  Data Mining System Products and Research Prototypes
11.2.1 How to Choose a Data Mining System
11.2.2 Examples of Commercial Data Mining Systems
11.3  Additional Themes on Data Mining
11.3.1 Theoretical Foundations of Data Mining
11.3.2 Statistical Data Mining
11.3.3 Visual and Audio Data Mining
11.3.4 Data Mining and Collaborative Filtering
11.4  Social Impacts of Data Mining
11.4.1 Ubiquitous and Invisible Data Mining
11.4.2 Data Mining, Privacy, and Data Security
11.5  Trends in Data Mining
11.6  Summary
Exercises
Bibliographic Notes
Appendix  An Introduction to Microsoft's OLE DB for Data Mining
A.1 Model Creation
A.2 Model Training
A.3 Model Prediction and Browsing
Bibliography
Index
