Book Details
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (English edition)
Authors: Ian H. Witten (New Zealand), Eibe Frank (New Zealand)
Publisher: China Machine Press (机械工业出版社)
Publication date: 2003-09-01
ISBN: 9787111127697
List price: ¥40.00
Description
"This book is a milestone in bringing together data mining, data analysis, information theory, and machine learning." (Jim Gray, Microsoft Research, Turing Award winner)

This is an excellent textbook that combines data mining algorithms with data mining practice. Drawing on their extensive experience, the authors give an accessible yet thorough introduction to the concepts of data mining and the techniques it uses, machine learning in particular, and offer sound advice on applying machine learning tools to data mining. The key elements of data mining are presented throughout the book's many examples. The book also introduces Weka, a Java-based software system that can be used to analyze datasets, find applicable patterns, and carry out sound analyses, and that can also serve as the basis for developing your own machine learning schemes.

Key features of this book:
Explains how the data mining algorithms work.
Uses examples to help readers choose an appropriate algorithm for their situation and to compare and evaluate the results of different methods.
Covers techniques for improving performance, including data preprocessing and combining the output of different methods.
Provides the Weka software used in the book together with additional learning materials, downloadable from http://www.mkp.com/datamining.
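As a rough illustration of the workflow the listing describes for Weka (load a dataset, fit a model, evaluate it), here is a minimal sketch in Java. It is not an excerpt from the book: it assumes a recent Weka release, where the C4.5-style tree is weka.classifiers.trees.J48 (package names have changed across Weka versions), and "weather.arff" is a placeholder for any dataset in ARFF format.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;

    public class WekaSketch {
        public static void main(String[] args) throws Exception {
            // Load a dataset in ARFF format; "weather.arff" is a placeholder path.
            Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
            data.setClassIndex(data.numAttributes() - 1); // last attribute is the class

            // Train a C4.5-style decision tree (J48 is Weka's implementation).
            J48 tree = new J48();
            tree.buildClassifier(data);
            System.out.println(tree);

            // Estimate performance with 10-fold cross-validation.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }

Chapter 8 of the book ("Nuts and bolts: Machine learning algorithms in Java") covers command-line and embedded use of the Weka classes in detail.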
About the Authors
Ian H. Witten is a professor in the Department of Computer Science at the University of Waikato, New Zealand. He is a member of the ACM and the Royal Society of New Zealand, and belongs to professional computing, information retrieval, and engineering societies in the UK, the US, Canada, and New Zealand. He has written several books, contributes to many technical journals, and has published numerous papers.

Eibe Frank is a graduate of the Department of Computer Science at the University of Karlsruhe, Germany, and is currently a researcher in the machine learning group at the University of Waikato, New Zealand. He is frequently invited to present his research at machine learning conferences and has published many papers in machine learning journals.
Contents
Foreword
Preface
1 What's it all about?
1.1 Data mining and machine learning
Describing structural patterns
Machine learning
Data mining
1.2 Simple examples: The weather problem and others
The weather problem
Contact lenses: An idealized problem
Irises: A classic numeric dataset
CPU performance: Introducing numeric prediction
Labor negotiations: A more realistic example
Soybean classification: A classic machine learning success
1.3 Fielded applications
Decisions involving judgment
Screening images
Load forecasting
Diagnosis
Marketing and sales
1.4 Machine learning and statistics
1.5 Generalization as search
Enumerating the concept space
Bias
1.6 Data mining and ethics
1.7 Further reading
2 Input: Concepts, instances, attributes
2.1 What's a concept?
2.2 What's in an example?
2.3 What's in an attribute?
2.4 Preparing the input
Gathering the data together
ARFF format
Attribute types
Missing values
Inaccurate values
Getting to know your data
2.5 Further reading
3 Output: Knowledge representation
3.1 Decision tables
3.2 Decision trees
3.3 Classification rules
3.4 Association rules
3.5 Rules with exceptions
3.6 Rules involving relations
3.7 Trees for numeric prediction
3.8 Instance-based representation
3.9 Clusters
3.10 Further reading
4 Algorithms: The basic methods
4.1 Inferring rudimentary rules
Missing values and numeric attributes
Discussion
4.2 Statistical modeling
Missing values and numeric attributes
Discussion
4.3 Divide and conquer: Constructing decision trees
Calculating information
Highly branching attributes
Discussion
4.4 Covering algorithms: Constructing rules
Rules versus trees
A simple covering algorithm
Rules versus decision lists
4.5 Mining association rules
Item sets
Association rules
Generating rules efficiently
Discussion
4.6 Linear models
Numeric prediction
Classification
Discussion
4.7 Instance-based learning
The distance function
Discussion
4.8 Further reading
5 Credibility: Evaluating what's been learned
5.1 Training and testing
5.2 Predicting performance
5.3 Cross-validation
5.4 Other estimates
Leave-one-out
The bootstrap
5.5 Comparing data mining schemes
5.6 Predicting probabilities
Quadratic loss function
Informational loss function
Discussion
5.7 Counting the cost
Lift charts
ROC curves
Cost-sensitive learning
Discussion
5.8 Evaluating numeric prediction
5.9 The minimum description length principle
5.10 Applying MDL to clustering
5.11 Further reading
6 Implementations: Real machine learning schemes
6.1 Decision trees
Numeric attributes
Missing values
Pruning
Estimating error rates
Complexity of decision tree induction
From trees to rules
C4.5: Choices and options
Discussion
6.2 Classification rules
Criteria for choosing tests
Missing values, numeric attributes
Good rules and bad rules
Generating good rules
Generating good decision lists
Probability measure for rule evaluation
Evaluating rules using a test set
Obtaining rules from partial decision trees
Rules with exceptions
Discussion
6.3 Extending linear classification: Support vector machines
The maximum margin hyperplane
Nonlinear class boundaries
Discussion
6.4 Instance-based learning
Reducing the number of exemplars
Pruning noisy exemplars
Weighting attributes
Generalizing exemplars
Distance functions for generalized exemplars
Generalized distance functions
Discussion
6.5 Numeric prediction
Model trees
Building the tree
Pruning the tree
Nominal attributes
Missing values
Pseudo-code for model tree induction
Locally weighted linear regression
Discussion
6.6 Clustering
Iterative distance-based clustering
Incremental clustering
Category utility
Probability-based clustering
The EM algorithm
Extending the mixture model
Bayesian clustering
Discussion
7 Moving on: Engineering the input and output
7.1 Attribute selection
Scheme-independent selection
Searching the attribute space
Scheme-specific selection
7.2 Discretizing numeric attributes
Unsupervised discretization
Entropy-based discretization
Other discretization methods
Entropy-based versus error-based discretization
Converting discrete to numeric attributes
7.3 Automatic data cleansing
Improving decision trees
Robust regression
Detecting anomalies
7.4 Combining multiple models
Bagging
Boosting
Stacking
Error-correcting output codes
7.5 Further reading
8 Nuts and bolts: Machine learning algorithms in Java
8.1 Getting started
8.2 Javadoc and the class library
Classes, instances, and packages
The weka.core package
The weka.classifiers package
Other packages
Indexes
8.3 Processing datasets using the machine learning programs
Using M5'
Generic options
Scheme-specific options
Classifiers
Meta-learning schemes
Filters
Association rules
Clustering
8.4 Embedded machine learning
A simple message classifier
8.5 Writing new learning schemes
An example classifier
Conventions for implementing classifiers
Writing filters
An example filter
Conventions for writing filters
9 Looking forward
9.1 Learning from massive datasets
9.2 Visualizing machine learning
Visualizing the input
Visualizing the output
9.3 Incorporating domain knowledge
9.4 Text mining
Finding key phrases for documents
Finding information in running text
Soft parsing
9.5 Mining the World Wide Web
9.6 Further reading
References
Index
About the authors