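Book Details
Python大数据分析与应用实战 (Python Big Data Analysis and Applications in Practice)
Authors: 余本国 (Yu Benguo), 刘宁 (Liu Ning), 李春报 (Li Chunbao)
Publisher: 电子工业出版社 (Publishing House of Electronics Industry)
Publication date: 2021-12-01
ISBN: 9787121421976
List price: ¥109.00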
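Summary
This book covers practical applications of big data analysis and artificial intelligence. Its 9 chapters work through 8 large data analysis case studies that systematically introduce commonly used data analysis methods.
The 8 case studies cover data visualization methods; machine learning algorithms such as regression, clustering, decision trees, and naive Bayes; and deep learning algorithms. The programs in every chapter were written under Python 3.8.5 and rely on commonly used Python libraries such as Pandas, NumPy, and Matplotlib. The case studies are independent of one another, so readers can pick the chapters that match their interests.
The book is hands-on and written in accessible language, with the aim of helping readers acquire the relevant skills quickly. The full source code of every case study is explained and thoroughly commented, and it can be reused in the reader's own programming environment after minor modification. The book suits newcomers to big data analysis and artificial intelligence, serves as a practical reference for readers with some background, and is also appropriate for undergraduates, graduate students, and anyone interested in Python.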
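About the Authors
余本国 (Yu Benguo) holds a Ph.D. and is a master's supervisor. He currently works at the School of Biomedical Information and Engineering, Hainan Medical University (海南医学院), where he teaches Advanced Mathematics, Calculus, the Python language, and Fundamentals of Big Data Analysis. In 2012 he was a visiting scholar at York University in Canada. His books include 《Python数据分析基础》, 《基于Python的大数据分析基础及实战》, 《Python在机器学习中的应用》, 《PyTorch深度学习入门与实战》, and 《Python编程与数据分析应用》.
刘宁 (Liu Ning) earned a master's degree in Signal and Information Processing from Shenzhen University and currently works on smart city and digital government projects. He has published the SCI-indexed paper "Content-based image retrieval using high-dimensional information geometry" and books including 《高维信息几何与几何不变量》.
李春报 (Li Chunbao) is a senior laboratory technician at the Modern Educational Technology Center of Hainan Medical University (海南医学院), where he works on the informatization of education. He also serves as chief supervisor of the Hainan Informatization Association and as an expert with the Hainan Cybersecurity Association.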
Table of Contents
Chapter 1 Python Syntax Basics
1.1 Installing Anaconda
1.1.1 Code completion
1.1.2 Variable explorer
1.1.3 Installing third-party libraries
1.2 Syntax basics
1.2.1 Strings, lists, tuples, dictionaries, and sets
1.2.2 Conditionals, loops, and functions
1.2.3 Exceptions
1.2.4 Special functions
1.3 Getting started with the basic Python libraries
1.3.1 Getting started with NumPy
1.3.2 Getting started with Pandas
1.3.3 Getting started with Matplotlib
1.4 Chapter summary
Chapter 2 Acquiring, Modeling, and Analyzing Weather Data
2.1 Preparation
2.2 Acquiring weather data by web scraping
2.2.1 Parsing web pages
2.2.2 Scraping weather data from a static page
2.2.3 Scraping historical weather data
2.3 Visualizing weather data
2.3.1 Inspecting basic information about the data
2.3.2 Transforming the data format
2.3.3 Line chart of temperature trends
2.3.4 Year-over-year temperature comparison chart
2.3.5 Bar chart of weather conditions
2.3.6 Bubble cloud chart of weather conditions with Tableau
2.3.7 Pie chart of wind direction shares
2.3.8 Wind rose chart with the windrose library
2.4 Machine learning in weather forecasting
2.4.1 Basic concepts of linear regression
2.4.2 Predicting temperature with simple linear regression
2.4.3 Predicting temperature with multiple linear regression
2.5 Chapter summary
Chapter 3 Building Character Data in a Nurturing Game
3.1 Preparation
3.2 Analyzing the basic state of the data with the Pyecharts library
3.2.1 Distribution map of infection counts
3.2.2 Distribution map of the outbreak
3.2.3 Stacked chart of symptom status
3.2.4 Line chart of discharges and deaths
3.2.5 Outbreak heat map
3.2.6 Pictorial chart of the outbreak distribution
3.2.7 Population movement diagram
3.3 Analysis of infection cases
3.3.1 Basic statistics
3.3.2 Histogram of the infection period
3.3.3 Word cloud of fatal cases
3.4 Forecasting the epidemic trend
3.4.1 Predicting the number of infections with the logistic equation
3.4.2 Forecasting the epidemic with the SIR model
3.4.3 Comparing the Logistic and SIR models
3.5 Chapter summary
Chapter 4 Aviation Data Analysis
4.1 Preparation
4.2 Basic statistical analysis
4.2.1 Inspecting basic information about the data
4.2.2 Distribution of airlines and aircraft types
4.2.3 3D map of flight counts by city
4.2.4 Sankey diagram of departures from Beijing Capital Airport
4.2.5 Showing routes with a relationship graph
4.3 Computing the shortest flight time with the Floyd algorithm
4.3.1 Introduction to the Floyd algorithm
4.3.2 Steps of the Floyd algorithm
4.3.3 Implementing the algorithm
4.3.4 Analysis of results
4.4 Chapter summary
Chapter 5 Text Data Analysis of a Citizen Service Hotline
5.1 Preparation
5.2 Basic analysis
5.2.1 Basic information on the data distribution
5.2.2 Average daily ticket volume
5.2.3 Call time analysis
5.2.4 Ticket type analysis
5.3 Showing ticket content with a word cloud
5.3.1 Segmenting ticket text into words
5.3.2 Removing stop words
5.3.3 Word frequency statistics
5.3.4 Word cloud of issues reported by citizens
5.3.5 Saving the data
5.4 Automatic ticket classification and routing with naive Bayes
5.4.1 Requirements overview
5.4.2 Basic concepts of the naive Bayes model
5.4.3 Steps of the naive Bayes text classification algorithm
5.4.4 Implementation
5.5 Mining hot issues with the K-Means algorithm and PCA dimensionality reduction
5.5.1 Application scenario
5.5.2 Basic principles of the K-Means algorithm and PCA
5.5.3 Steps of the hot-issue mining algorithm
5.5.4 Implementation
5.6 Chapter summary
Chapter 6 Credit Risk Control with Decision Trees
6.1 Preparation
6.2 Basic analysis of the dataset
6.2.1 Checking data size and missing values
6.2.2 Plotting histograms to view the data distribution
6.2.3 Three ways to plot a histogram
6.2.4 Checking for outliers with box plots
6.2.5 Handling outliers and missing values
6.2.6 Showing the preprocessed data with violin plots
6.3 Modeling credit data with decision trees
6.3.1 Introduction to decision tree principles
6.3.2 Decision tree credit modeling workflow
6.3.3 Implementing the decision tree risk control algorithm with scikit-learn
6.3.4 Model optimization
6.4 Chapter summary
Chapter 7 Classifying Garbage Images with Deep Learning
7.1 Preparation
7.2 Basic principles of deep learning
7.2.1 Basic principles of CNNs
7.2.2 Introduction to the Keras library
7.3 Implementing CNN-based garbage image classification with Keras
7.3.1 Algorithm workflow
7.3.2 Data preprocessing
7.3.3 Implementing the CNN model
7.4 Optimizing the CNN model
7.4.1 Choosing an optimizer
7.4.2 Choosing a loss function
7.4.3 Adjusting the model
7.4.4 Image augmentation
7.4.5 Changing the learning rate
7.5 Applying the model
7.6 Chapter summary
Chapter 8 Analysis of Collaborative Filtering and Matrix Factorization Recommendation Algorithms
8.1 Preparation
8.2 Analyzing short-video completion with collaborative filtering
8.2.1 Principles of user-based collaborative filtering
8.2.2 Algorithm workflow
8.2.3 Implementation
8.3 Predicting short-video completion with matrix factorization
8.3.1 Algorithm principles
8.3.2 Implementing the SVD algorithm with the Surprise library
8.4 Performance of the methods on the test set
8.5 Chapter summary
Chapter 9 Text Data Analysis of 《红楼梦》 (Dream of the Red Chamber)
9.1 Preparation
9.1.1 Programming environment
9.1.2 Overview of the data
9.2 Word segmentation
9.2.1 Reading the data
9.2.2 Data preprocessing
9.2.3 Segmenting words and removing stop words
9.2.4 Creating word clouds
9.3 Text clustering analysis
9.3.1 Building the TF-IDF matrix of segmented words
9.3.2 K-Means clustering
9.3.3 MDS dimensionality reduction
9.3.4 PCA dimensionality reduction
9.3.5 HC clustering
9.3.6 Visualizing high-dimensional data with t-SNE
9.4 LDA topic model
9.5 Social network analysis of characters
9.6 Chapter summary
Appendix A Looking Up Request Headers for Data Scraping
Appendix B Installing the GraphViz Library
Appendix C Installing TensorFlow on Windows 10
References
Acknowledgments