书籍详情
并行处理基础(英文版)
作者:(美)Harry F.Jordan,(美)Gita Alaghband著
出版社:清华大学出版社
出版时间:2003-10-01
ISBN:9787302073826
定价:¥56.00
购买这本书可以去
内容简介
本书作者在多年从事并行处理教学和研究的基础上,从并行体系结构、算法和语言三者结合的角度全面地介绍了计算机半行处理所涉及到的主要内容,主要包括并行处理的发展、向量处理、集中与分布多处理、互联网络、同步与通信、性能分析、并行时序、并行I/O、并行库等。从提出问题到理论分析和程序算法实现,循序渐进,科学严谨,具有很好的理论性和实践性。本书的写作语言朴实,易于读者阅读和理解,每章后的习题和参考文献又为读者提供了进一步思考相关问题的线索。本书适用于计算机科学勤务计算机工程专业的研究生、高年级本科生作为有关“计算机并行处理”课程的教材,同时也可作为计算机并行处理领域研究人员的科研参考书。
作者简介
暂缺《并行处理基础(英文版)》作者简介
目录
Preface
Chapter 1: Parallel Machines and Computations
1.1 The Evolution of Parallel Architectures
1.1.1 Parallelism in Sequential Computers
1.1.2 Vector or SIMD Computers
1.1.3 Multiprocessors or MIMD Computers
1.2 Interconnection Networks
1.3 Application of Architectural Parallelism
1.4 Getting Started in SIMD and MIMD Programming
1.5 Parallelism in Algorithms
1.6 Conclusion
1.7 Bibliographic Notes
Chapter 2: Potential for Parallel Computations
2.1 Parameters Characterizing Algorithm Parallelism
2.2 Prefix Problem
2.3 Parallel Prefix Algorithms
2.3.1 Upper/Lower Parallel Prefix Algorithm
2.3.2 Odd/Even Parallel Prefix Algorithm
2.3.3 Ladner and Fischer''s Parallel Prefix
2.4 Characterizing Algorithm Behavior for Large Problem Size
2.5 Programming Parallel Prefix
2.6 Speedup and Efficiency of Parallel Algorithms
2.7 The Performance Perspective
2.7.1 Factors That Influence Performance
2.7.2 A Simple Performance Model--Amdahl''s Law
2.7.3 Average Execution Rate
2.8 Conclusion
2.9 Bibliographic Notes
Chapter 3: Vector Algorithms and Architectures
3.1 Vector and Matrix Algorithms
3.2 A Vector Architecture---Single Instruction Multiple Data
3.3 An SIMD Instruction Set
3.3.1 Registers and Memories of an SIMD Computer
3.3.2 Vector, Control Unit, and Cooperative Instructions
3.3.3 Data-Dependent Conditional Operations
3.3.4 Vector Length and Strip Mining
3.3.5 Routing Data Among the PEs
3.4 The Prime Memory System
3.5 Use of the PE Index to Solve Storage Layout Problems
3.6 SIMD Language Constructs--Fortran 90
3.6.1 Arrays and Array Sections
3.6.2 Array Assignment and Array Expressions
3.6.3 Fortran 90 Array Intrinsic Functions
3.6.4 Examples of SIMD Operations in Fortran 90
3.7 Pipelined SIMD Vector Computers
3.7.1 Pipelined SIMD Processor Structure
Processor/Memory Interaction
Number and Types of Pipelines
Implementation of Arithmetic
3.7.2 The Memory Interface of a Pipelined SIMD Computer
3.7.3 Performance of Pipelined SIMD Computers
3.8 Vector Architecture Summary
3.9 Bibliographic Notes
Chapter 4: MIMD Computers or Multiprocessors
4.1 Shared Memory and Message-Passing Architectures
4.1.1 Mixed-Type Multiprocessor Architectures
4.1.2 Characteristics of Shared Memory and Message Passing
4.1.3 Switching Topologies for Message Passing Architectures
4.1.4 Direct and Indirect Networks
4.1.5 Classification of Real Systems
4.2 Overview of Shared Memory Multiprocessor Programming
4.2.1 Data Sharing and Process Management
4.2.2 Synchronization
4.2.3 Atomicity and Synchronization
4.2.4 Work Distribution
4.2.5 Many Processes Executing One Program
4.3 Shared Memory Programming Alternatives and Scope
4.3.1 Process Management--Starting, Stopping, and Hierarchy
4.3.2 Data Access by Parallel Processes
4.3.3 Work Distribution
4.3.4 Multiprocessor Synchronization
Atomicity
Hardware and Software Synchronization Mechanisms
Fairness and Mutual Exclusion
4.4 A Shared Memory Multiprocessor Programming Language
4.4.1 The OpenMP Language Extension
Execution Model
Process Control
Parallel Scope of Variables
Work Distribution
Synchronization and Memory Consistency
4.4.2 The OpenMP Fortran Applications Program Interface API
Constructs of the OpenMP Fortran API
4.4.3 OpenMP Fortran Examples and Discussion
4.5 Pipelined MIMD--Multithreading
4.6 Summary and Conclusions
4.7 Bibliographic Notes
Chapter 5: Distributed Memory Multiprocessors
5.1 Distributing Data and Operations Among Processor/Memory Pairs
5.2 Programming with Message Passing
5.2.1 The Communicating Sequential Processes CSP Language
5.2.2 A Distributed Memory Programming Example: Matrix Multiply
5.3 Characterization of Communication
5.3.1 Point-to-Point Communications
5.3.2 Variable Classes in a Distributed Memory Program
5.3.3 High-Level Communication Operations
5.3.4 Distributed Gauss Elimination with High-Level Communications
5.3.5 Process Topology Versus Processor Topology
5.4 The Message Passing Interface, MPI
5.4.1 Basic Concepts in MPI
Communicator Structure
The Envelope
The Data
Point-to-Point Communication Concepts
Collective Communications Concepts
5.4.2 An Example MPI Program--Matrix Multiplication
5.5 Hardware Managed Communication--Distributed Cache
5.5.1 Cache Coherence
5.5.2 Shared Memory Consistency
5.6 Conclusion---Shared Versus Distributed Memory Multiprocessors
5.7 Bibliographic Notes
Chapter 6: Interconneetion Networks
6.1 Network Characteristics
6.2 Permutations
6.3 Static Networks
6.3.1 Mesh
6.3.2 Ring
6.3.3 Tree
6.3.4 Cube Networks
6.3.5 Performance
6.4 Dynamic Networks
6.4.1 Bus
6.4.2 Crossbar
6.4.3 Multistage Interconnection Networks MINs
Benes Network
Butterfly Network
Omega Network
6.4.4 Combining Networks--Mutual Exclusion Free Synchronization
6.4.5 Performance
6.5 Conclusion
6.6 Bibliographic Notes
Chapter 7: Data Dependence and Parallelism
7.1 Discovering Parallel Operations in Sequential Code
7.2 Variables with Complex Names
7.2.1 Nested Loops
7.2.2 Variations on the Array Reference Disambiguation Problem
7.3 Sample Compiler Techniques
7.3.1 Loop Transformations
7.3.2 Loop Restructuring
7.3.3 Loop Replacement Transformations
7.3.4 Anti- and Output Dependence Removal Transformations
7.4 Data Flow Principles
7.4.1 Data Flow Concepts
7.4.2 Graphical Representation of Data Flow Computations
7.4.3 Data Flow Conditionals
7.4.4 Data Flow Iteration
7.4.5 Data Flow Function Application and Recursion
7.4.6 Structured Values in Data Flow--Arrays
7.5 Data Flow Architectures
7.5.1 The MIT Static Data Flow Architecture
7.5.2 Dynamic Data Flow Computers
Manchester Data Flow Computer
The MIT Tagged-Token Data Flow Machine
7.5.3 Issues to Be Addressed by Data Flow Machines
7.6 Systolic Arrays
7.7 Conclusion
7.8 Bibliographic Notes
Chapter 8: Implementing Synchronization and Data Sharing
8.1 The Character of Information Conveyed by Synchronization
8.2 Synchronizing Different Kinds of Cooperative Computations
8.2.1 One Producer with One or More Consumers
8.2.2 Global Reduction
8.2.3 Global Prefix
8.2.4 Cooperative Update of a Partitioned Structure
8.2.5 Managing a Shared Task Set
8.2.6 Cooperative List Manipulation
8.2.7 Parallel Access Queue Using Fetch&add
8.2.8 Histogram--Fine Granularity, Data-Dependent Synchronization
8.3 Waiting Mechanisms
8.3.1 Hardware Waiting
8.3.2 Software Waiting
8.3.3 Multilevel Waiting
8.4 Mutual Exclusion Using Atomic Read and Write
8.5 Proving a Synchronization Implementation Correct
8.5.1 Implementing Produce/Consume Using Locks
8.5.2 Temporal Logic
8.5.3 Proof of Correctness
8.6 Alternative Implementations of Synchronization--Barrier
8.6.1 Features of Barrier Synchronization
8.6.2 Characterization of Barrier Implementations
8.7 Conclusion
8.8 Bibliographic Notes
Chapter 9: Parallel Processor Performance
9.1 Amdahl'' s Law Revisited
9.1.1 The Effect of Work Granularity on Amdahl'' s Law
9.1.2 Least Squares Estimation of Amdahl''s Law Parameters
9.2 Parameterized Execution Time
9.2.1 Pipelined Vector Machine Performance
9.2.2 Pipelined Multiprocessor Performance
Program Sections with Restricted Parallelism
Programs Requiring Critical Sections for Parallel Execution
Granularity Correction to the Critical Section Model
9.2.3 Multiple Pipelined Multiprocessors
9.3 Performance of Barrier Synchronization
9.3.1 Accounting for Barrier Performance
9.3.2 Instrumentation for Barrier Measurement
9.3.3 Examples of Barrier Performance Measurement
9.4 Statistical Models for Static and Dynamic Parallel Loops
9.4.1 Dynamic Scheduling Model
Dynamically Scheduled Iterations of Equal Length
Dynamically Scheduled Iterations of Different Lengths
9.4.2 Static Scheduling Model
Prescheduled Iterations of Equal Length
Prescheduled Iterations of Different Lengths
9.4.3 Comparison with Experimental Results
9.5 Conclusions
9.6 Bibliographic Notes
Chapter 10: Temporal Behavior of Parallel Programs
10.1 Temporal Characterization of Cache Behavior
10.1.1 A Temporal Locality Metric for Cache Behavior
10.1.2 Example Application of the Locality Metric to Bubble Sort
10.2 Read Sharing in Multiprocessors with Distributed Caches
10.2.1 A Simple Example of Read Sharing
10.2.2 The KSR-1 Architecture
10.2.3 Read Multiplicity Metric
10.2.4 Experiments
10.2.5 Programmed Poststore and Prefetch
10.3 Message Waiting in Message Passing Multiprocessors
10.4 Conclusion
10.5 Bibliographic Notes
Chapter 11: Parallel I/O
11.1 The Parallel I/O Problem
11.1.1 Data Dependence and I/O
11.1.2 I/O Format Conversion
11.1.3 Numerical Examples of I/O Latency and Bandwidth
Requirements
11.2 Hardware for Parallel I/O
11.2.1 Control of the Primary Memory Side of the Transfer
11.2.2 I/O Channel Concurrency
11.2.3 Peripheral Device Parallelism
11.3 Parallel Access Disk Arrays--RAID
11.4 Parallel Formatted I/O in Shared Memory Multiprocessors
11.4.1 Parallel Input Using C I/O Routines fread0 and sscanf0
Application Independent Input Operations
Application Dependent Input Operations
11.4.2 Parallel Output Using C I/O Routines sprintf0 and fwrite0
Application Dependent Output Code
Application Independent Part of the Output Code
11.5 Collective I/O in Multiprocessors--MPI-IO
11.5.1 MPI-2 I/O Concepts
11.5.2 MPI-IO Examples
Cyclically Mapped Read from a Striped Disk Array
MPI-IO for Distributed Matrix Multiply
11.6 Conclusions
11.7 Bibliographic Notes
Appendix A: Routines of the MPI Message Passing Library
A. 1 Point-to-Point Communications Routines
A.2 Collective Communications Routines
A.3 MPI Data Types and Constructors
A.4 Communicators, Process Groups, and Topologies
A.5 MPI Environment and Error Handling
A.6 Summary and MPI-2 Extensions
Appendix B: Synchronization Mechanisms
B. 1 Hardware Level Synchronization
B.2 Language Level Synchronization
B.3 Waiting Mechanism
Bibliography
Index
Chapter 1: Parallel Machines and Computations
1.1 The Evolution of Parallel Architectures
1.1.1 Parallelism in Sequential Computers
1.1.2 Vector or SIMD Computers
1.1.3 Multiprocessors or MIMD Computers
1.2 Interconnection Networks
1.3 Application of Architectural Parallelism
1.4 Getting Started in SIMD and MIMD Programming
1.5 Parallelism in Algorithms
1.6 Conclusion
1.7 Bibliographic Notes
Chapter 2: Potential for Parallel Computations
2.1 Parameters Characterizing Algorithm Parallelism
2.2 Prefix Problem
2.3 Parallel Prefix Algorithms
2.3.1 Upper/Lower Parallel Prefix Algorithm
2.3.2 Odd/Even Parallel Prefix Algorithm
2.3.3 Ladner and Fischer''s Parallel Prefix
2.4 Characterizing Algorithm Behavior for Large Problem Size
2.5 Programming Parallel Prefix
2.6 Speedup and Efficiency of Parallel Algorithms
2.7 The Performance Perspective
2.7.1 Factors That Influence Performance
2.7.2 A Simple Performance Model--Amdahl''s Law
2.7.3 Average Execution Rate
2.8 Conclusion
2.9 Bibliographic Notes
Chapter 3: Vector Algorithms and Architectures
3.1 Vector and Matrix Algorithms
3.2 A Vector Architecture---Single Instruction Multiple Data
3.3 An SIMD Instruction Set
3.3.1 Registers and Memories of an SIMD Computer
3.3.2 Vector, Control Unit, and Cooperative Instructions
3.3.3 Data-Dependent Conditional Operations
3.3.4 Vector Length and Strip Mining
3.3.5 Routing Data Among the PEs
3.4 The Prime Memory System
3.5 Use of the PE Index to Solve Storage Layout Problems
3.6 SIMD Language Constructs--Fortran 90
3.6.1 Arrays and Array Sections
3.6.2 Array Assignment and Array Expressions
3.6.3 Fortran 90 Array Intrinsic Functions
3.6.4 Examples of SIMD Operations in Fortran 90
3.7 Pipelined SIMD Vector Computers
3.7.1 Pipelined SIMD Processor Structure
Processor/Memory Interaction
Number and Types of Pipelines
Implementation of Arithmetic
3.7.2 The Memory Interface of a Pipelined SIMD Computer
3.7.3 Performance of Pipelined SIMD Computers
3.8 Vector Architecture Summary
3.9 Bibliographic Notes
Chapter 4: MIMD Computers or Multiprocessors
4.1 Shared Memory and Message-Passing Architectures
4.1.1 Mixed-Type Multiprocessor Architectures
4.1.2 Characteristics of Shared Memory and Message Passing
4.1.3 Switching Topologies for Message Passing Architectures
4.1.4 Direct and Indirect Networks
4.1.5 Classification of Real Systems
4.2 Overview of Shared Memory Multiprocessor Programming
4.2.1 Data Sharing and Process Management
4.2.2 Synchronization
4.2.3 Atomicity and Synchronization
4.2.4 Work Distribution
4.2.5 Many Processes Executing One Program
4.3 Shared Memory Programming Alternatives and Scope
4.3.1 Process Management--Starting, Stopping, and Hierarchy
4.3.2 Data Access by Parallel Processes
4.3.3 Work Distribution
4.3.4 Multiprocessor Synchronization
Atomicity
Hardware and Software Synchronization Mechanisms
Fairness and Mutual Exclusion
4.4 A Shared Memory Multiprocessor Programming Language
4.4.1 The OpenMP Language Extension
Execution Model
Process Control
Parallel Scope of Variables
Work Distribution
Synchronization and Memory Consistency
4.4.2 The OpenMP Fortran Applications Program Interface API
Constructs of the OpenMP Fortran API
4.4.3 OpenMP Fortran Examples and Discussion
4.5 Pipelined MIMD--Multithreading
4.6 Summary and Conclusions
4.7 Bibliographic Notes
Chapter 5: Distributed Memory Multiprocessors
5.1 Distributing Data and Operations Among Processor/Memory Pairs
5.2 Programming with Message Passing
5.2.1 The Communicating Sequential Processes CSP Language
5.2.2 A Distributed Memory Programming Example: Matrix Multiply
5.3 Characterization of Communication
5.3.1 Point-to-Point Communications
5.3.2 Variable Classes in a Distributed Memory Program
5.3.3 High-Level Communication Operations
5.3.4 Distributed Gauss Elimination with High-Level Communications
5.3.5 Process Topology Versus Processor Topology
5.4 The Message Passing Interface, MPI
5.4.1 Basic Concepts in MPI
Communicator Structure
The Envelope
The Data
Point-to-Point Communication Concepts
Collective Communications Concepts
5.4.2 An Example MPI Program--Matrix Multiplication
5.5 Hardware Managed Communication--Distributed Cache
5.5.1 Cache Coherence
5.5.2 Shared Memory Consistency
5.6 Conclusion---Shared Versus Distributed Memory Multiprocessors
5.7 Bibliographic Notes
Chapter 6: Interconneetion Networks
6.1 Network Characteristics
6.2 Permutations
6.3 Static Networks
6.3.1 Mesh
6.3.2 Ring
6.3.3 Tree
6.3.4 Cube Networks
6.3.5 Performance
6.4 Dynamic Networks
6.4.1 Bus
6.4.2 Crossbar
6.4.3 Multistage Interconnection Networks MINs
Benes Network
Butterfly Network
Omega Network
6.4.4 Combining Networks--Mutual Exclusion Free Synchronization
6.4.5 Performance
6.5 Conclusion
6.6 Bibliographic Notes
Chapter 7: Data Dependence and Parallelism
7.1 Discovering Parallel Operations in Sequential Code
7.2 Variables with Complex Names
7.2.1 Nested Loops
7.2.2 Variations on the Array Reference Disambiguation Problem
7.3 Sample Compiler Techniques
7.3.1 Loop Transformations
7.3.2 Loop Restructuring
7.3.3 Loop Replacement Transformations
7.3.4 Anti- and Output Dependence Removal Transformations
7.4 Data Flow Principles
7.4.1 Data Flow Concepts
7.4.2 Graphical Representation of Data Flow Computations
7.4.3 Data Flow Conditionals
7.4.4 Data Flow Iteration
7.4.5 Data Flow Function Application and Recursion
7.4.6 Structured Values in Data Flow--Arrays
7.5 Data Flow Architectures
7.5.1 The MIT Static Data Flow Architecture
7.5.2 Dynamic Data Flow Computers
Manchester Data Flow Computer
The MIT Tagged-Token Data Flow Machine
7.5.3 Issues to Be Addressed by Data Flow Machines
7.6 Systolic Arrays
7.7 Conclusion
7.8 Bibliographic Notes
Chapter 8: Implementing Synchronization and Data Sharing
8.1 The Character of Information Conveyed by Synchronization
8.2 Synchronizing Different Kinds of Cooperative Computations
8.2.1 One Producer with One or More Consumers
8.2.2 Global Reduction
8.2.3 Global Prefix
8.2.4 Cooperative Update of a Partitioned Structure
8.2.5 Managing a Shared Task Set
8.2.6 Cooperative List Manipulation
8.2.7 Parallel Access Queue Using Fetch&add
8.2.8 Histogram--Fine Granularity, Data-Dependent Synchronization
8.3 Waiting Mechanisms
8.3.1 Hardware Waiting
8.3.2 Software Waiting
8.3.3 Multilevel Waiting
8.4 Mutual Exclusion Using Atomic Read and Write
8.5 Proving a Synchronization Implementation Correct
8.5.1 Implementing Produce/Consume Using Locks
8.5.2 Temporal Logic
8.5.3 Proof of Correctness
8.6 Alternative Implementations of Synchronization--Barrier
8.6.1 Features of Barrier Synchronization
8.6.2 Characterization of Barrier Implementations
8.7 Conclusion
8.8 Bibliographic Notes
Chapter 9: Parallel Processor Performance
9.1 Amdahl'' s Law Revisited
9.1.1 The Effect of Work Granularity on Amdahl'' s Law
9.1.2 Least Squares Estimation of Amdahl''s Law Parameters
9.2 Parameterized Execution Time
9.2.1 Pipelined Vector Machine Performance
9.2.2 Pipelined Multiprocessor Performance
Program Sections with Restricted Parallelism
Programs Requiring Critical Sections for Parallel Execution
Granularity Correction to the Critical Section Model
9.2.3 Multiple Pipelined Multiprocessors
9.3 Performance of Barrier Synchronization
9.3.1 Accounting for Barrier Performance
9.3.2 Instrumentation for Barrier Measurement
9.3.3 Examples of Barrier Performance Measurement
9.4 Statistical Models for Static and Dynamic Parallel Loops
9.4.1 Dynamic Scheduling Model
Dynamically Scheduled Iterations of Equal Length
Dynamically Scheduled Iterations of Different Lengths
9.4.2 Static Scheduling Model
Prescheduled Iterations of Equal Length
Prescheduled Iterations of Different Lengths
9.4.3 Comparison with Experimental Results
9.5 Conclusions
9.6 Bibliographic Notes
Chapter 10: Temporal Behavior of Parallel Programs
10.1 Temporal Characterization of Cache Behavior
10.1.1 A Temporal Locality Metric for Cache Behavior
10.1.2 Example Application of the Locality Metric to Bubble Sort
10.2 Read Sharing in Multiprocessors with Distributed Caches
10.2.1 A Simple Example of Read Sharing
10.2.2 The KSR-1 Architecture
10.2.3 Read Multiplicity Metric
10.2.4 Experiments
10.2.5 Programmed Poststore and Prefetch
10.3 Message Waiting in Message Passing Multiprocessors
10.4 Conclusion
10.5 Bibliographic Notes
Chapter 11: Parallel I/O
11.1 The Parallel I/O Problem
11.1.1 Data Dependence and I/O
11.1.2 I/O Format Conversion
11.1.3 Numerical Examples of I/O Latency and Bandwidth
Requirements
11.2 Hardware for Parallel I/O
11.2.1 Control of the Primary Memory Side of the Transfer
11.2.2 I/O Channel Concurrency
11.2.3 Peripheral Device Parallelism
11.3 Parallel Access Disk Arrays--RAID
11.4 Parallel Formatted I/O in Shared Memory Multiprocessors
11.4.1 Parallel Input Using C I/O Routines fread0 and sscanf0
Application Independent Input Operations
Application Dependent Input Operations
11.4.2 Parallel Output Using C I/O Routines sprintf0 and fwrite0
Application Dependent Output Code
Application Independent Part of the Output Code
11.5 Collective I/O in Multiprocessors--MPI-IO
11.5.1 MPI-2 I/O Concepts
11.5.2 MPI-IO Examples
Cyclically Mapped Read from a Striped Disk Array
MPI-IO for Distributed Matrix Multiply
11.6 Conclusions
11.7 Bibliographic Notes
Appendix A: Routines of the MPI Message Passing Library
A. 1 Point-to-Point Communications Routines
A.2 Collective Communications Routines
A.3 MPI Data Types and Constructors
A.4 Communicators, Process Groups, and Topologies
A.5 MPI Environment and Error Handling
A.6 Summary and MPI-2 Extensions
Appendix B: Synchronization Mechanisms
B. 1 Hardware Level Synchronization
B.2 Language Level Synchronization
B.3 Waiting Mechanism
Bibliography
Index
猜您喜欢