Book Details
High Performance Cluster Computing: Architectures and Systems (Volume 1, English Edition)
Author: Rajkumar Buyya
Publisher: Posts and Telecom Press
Publication date: 2002-01-01
ISBN: 9787115103468
List price: ¥75.00
About the Book
This wide-ranging volume covers cluster computing architectures, networking, protocols and I/O, process scheduling, resource sharing and load balancing, and dissections of today's representative cluster systems. Each chapter is written by internationally recognized experts in its area, which gives the book considerable academic value and makes it an authoritative guide to the field. Drawing on contributions from more than 100 experienced practitioners in high performance cluster computing, it provides current information on essentially every key system-related issue in the field. Whether you are a developer, researcher, administrator, teacher, student, or manager working in high performance parallel computing, this book is a rare classic.
About the Author
Rajkumar Buyya is a researcher in the School of Computer Science and Software Engineering at Monash University in Melbourne, Australia. He served as guest editor of the special issue on high performance cluster computing of the journal Parallel and Distributed Computing Practices, and co-authored the books Mastering C++ and Microprocessor x86 Programming. He currently chairs the IEEE Computer Society's Task Force on Cluster Computing.
Table of Contents
Ⅰ Requirements and General Issues
1 Cluster Computing at a Glance
1.1 Introduction
1.1.1 Eras of Computing
1.2 Scalable Parallel Computer Architectures
1.3 Towards Low Cost Parallel Computing and Motivations
1.4 Windows of Opportunity
1.5 A Cluster Computer and its Architecture
1.6 Cluster Classifications
1.7 Commodity Components for Clusters
1.7.1 Processors
1.7.2 Memory and Cache
1.7.3 Disk and I/O
1.7.4 System Bus
1.7.5 Cluster Interconnects
1.7.6 Operating Systems
1.8 Network Services/Communication SW
1.9 Cluster Middleware and Single System Image (SSI)
1.9.1 Single System Image Levels/Layers
1.9.2 Single System Image Boundaries
1.9.3 Single System Image Benefits
1.9.4 Middleware Design Goals
1.9.5 Key Services of SSI and Availability Infrastructure
1.10 Resource Management and Scheduling (RMS)
1.11 Programming Environments and Tools
1.11.1 Threads
1.11.2 Message Passing Systems (MPI and PVM)
1.11.3 Distributed Shared Memory (DSM) Systems
1.11.4 Parallel Debuggers and Profilers
1.11.5 Performance Analysis Tools
1.11.6 Cluster Administration Tools
1.12 Cluster Applications
1.13 Representative Cluster Systems
1.13.1 The Berkeley Network of Workstations (NOW) Project
1.13.2 The High Performance Virtual Machine (HPVM) Project
1.13.3 The Beowulf Project
1.13.4 Solaris MC: A High Performance Operating System for Clusters
1.13.5 A Comparison of the Four Cluster Environments
1.14 Cluster of SMPs (CLUMPS)
1.15 Summary and Conclusions
1.15.1 Hardware and Software Trends
1.15.2 Cluster Technology Trends
1.15.3 Future Cluster Technologies
1.15.4 Final Thoughts
1.16 Bibliography
2 Cluster Setup and its Administration
2.1 Introduction
2.2 Setting up the Cluster
2.2.1 Starting from Scratch
2.2.2 Directory Services inside the Cluster
2.2.3 DCE Integration
2.2.4 Global Clock Synchronization
2.2.5 Heterogeneous Clusters
2.2.6 Some Experiences with PoPC Clusters
2.3 Security
2.3.1 Security Policies
2.3.2 Finding the Weakest Point in NOWs and COWs
2.3.3 A Little Help from a Front-end
2.3.4 Security Versus Performance Tradeoffs
2.3.5 Clusters of Clusters
2.4 System Monitoring
2.4.1 Unsuitability of General Purpose Monitoring Tools
2.4.2 Subjects of Monitoring
2.4.3 Self Diagnosis and Automatic Corrective Procedures
2.5 System Tuning
2.5.1 Developing Custom Models for Bottleneck Detection
2.5.2 Focusing on Throughput or Focusing on Latency
2.5.3 I/O Implications
2.5.4 Caching Strategies
2.5.5 Fine-tuning the OS
2.6 Bibliography
3 Constructing Scalable Services
3.1 Introduction
3.2 Environment
3.2.1 Faults, Delays, and Mobility
3.2.2 Scalability Definition and Measurement
3.2.3 Weak Consistency
3.2.4 Assumptions Summary
3.2.5 Model Definition and Requirements
3.3 Resource Sharing
3.3.1 Introduction
3.3.2 Previous Study
3.3.3 Flexible Load Sharing Algorithm
3.3.4 Resource Location Study
3.3.5 Algorithm Analysis
3.4 Resource Sharing Enhanced Locality
3.4.1 State Metric
3.4.2 Basic Algorithm Preserving Mutual Interests
3.4.3 Considering Proximity for Improved Performance
3.4.4 Estimating Proximity (Latency)
3.4.5 Simulation Runs
3.4.6 Simulation Results
3.5 Prototype Implementation and Extension
3.5.1 PVM Resource Manager
3.5.2 Resource Manager Extension to Further Enhance Locality
3.5.3 Initial Performance Measurement Results
3.6 Conclusions and Future Study
3.7 Bibliography
4 Dependable Clustered Computing
4.1 Introduction
4.1.1 Structure
4.2 Two Worlds Converge
4.2.1 Dependable Parallel Computing
4.2.2 Mission/Business Critical Computing
4.3 Dependability Concepts
4.3.1 Faults, Errors, Failures
4.3.2 Dependability Attributes
4.3.3 Dependability Means
4.4 Cluster Architectures
4.4.1 Share-Nothing Versus Share-Storage
4.4.2 Active/Standby Versus Active/Active
4.4.3 Interconnects
4.5 Detecting and Masking Faults
4.5.1 Self-Testing
4.5.2 Processor, Memory, and Buses
4.5.3 Watchdog Hardware Timers
4.5.4 Loosing the Software Watchdog
4.5.5 Assertions, Consistency Checking, and ABFT
4.6 Recovering from Faults
4.6.1 Checkpointing and Rollback
4.6.2 Transactions
4.6.3 Failover and Failback
4.6.4 Reconfiguration
4.7 The Practice of Dependable Clustered Computing
4.7.1 Microsoft Cluster Server
4.7.2 NCR LifeKeeper
4.7.3 Oracle Fail Safe and Parallel Server
4.8 Bibliography
5 Deploying a High Throughput Computing Cluster
5.1 Introduction
5.2 Condor Overview
5.3 Software Development
5.3.1 Layered Software Architecture
5.3.2 Layered Resource Management Architecture
5.3.3 Protocol Flexibility
5.3.4 Remote File Access
5.3.5 Checkpointing
5.4 System Administration
5.4.1 Access Policies
5.4.2 Reliability
5.4.3 Problem Diagnosis via System Logs
5.4.4 Monitoring and Accounting
5.4.5 Security
5.4.6 Remote Customers
5.5 Summary
5.6 Bibliography
6 Performance Models and Simulation
6.1 Introduction
6.2 New Performance Issues
6.2.1 Profit-Effective Parallel Computing
6.2.2 Impact of Heterogeneity and Nondedication
6.2.3 Communication Interactions
6.3 A Cost Model for Effective Parallel Computing
6.3.1 The Memory Hierarchy
6.3.2 Parallel Program Structures
6.3.3 The Cost Model and Memory Access Time Prediction
6.3.4 Validation of the Framework and its Models
6.4 Conclusions
6.5 Bibliography
7 Metacomputing: Harnessing Informal Supercomputers
7.1 General Introduction
7.1.1 Why Do We Need Metacomputing?
7.1.2 What Is a Metacomputer?
7.1.3 The Parts of a Metacomputer
7.2 The Evolution of Metacomputing
7.2.1 Introduction
7.2.2 Some Early Examples
7.3 Metacomputer Design Objectives and Issues
7.3.1 General Principles
7.3.2 Underlying Hardware and Software Infrastructure
7.3.3 Middleware: The Metacomputing Environment
7.4 Metacomputing Projects
7.4.1 Introduction
7.4.2 Globus
7.4.3 Legion
7.4.4 WebFlow
7.5 Emerging Metacomputing Environments
7.5.1 Introduction
7.5.2 Summary
7.6 Summary and Conclusions
7.6.1 Introduction
7.6.2 Summary of the Reviewed Metacomputing Environments
7.6.3 Some Observations
7.6.4 Metacomputing Trends
7.6.5 The Impact of Metacomputing
7.7 Bibliography
8 Specifying Resources and Services in Metacomputing Systems
8.1 The Need for Resource Description Tools
8.2 Schemes for Specifying Hardware and Software Resources
8.2.1 Resource Specification in Local HPC Systems
8.2.2 Resource Specification in Distributed Client-Server Systems
8.2.3 The Metacomputing Directory Service (MDS)
8.2.4 The Resource Description Language (RDL)
8.3 Resource and Service Description (RSD)
8.3.1 Requirements
8.3.2 Architecture
8.3.3 Graphical Interface
8.3.4 Language Interface
8.3.5 Internal Data Representation
8.3.6 Implementation
8.4 Summary
8.5 Bibliography
Ⅱ Networking, Protocols, and I/O
9 High Speed Networks
9.1 Introduction
9.1.1 Choice of High Speed Networks
9.1.2 Evolution in Interconnect Trends
9.2 Design Issues
9.2.1 Goals
9.2.2 General Architecture
9.2.3 Design Details
9.3 Fast Ethernet
9.3.1 Fast Ethernet Migration
9.4 High Performance Parallel Interface (HiPPI)
9.4.1 HiPPI-SC (Switch Control)
9.4.2 Serial HiPPI
9.4.3 High Speed SONET Extensions
9.4.4 HiPPI Connection Management
9.4.5 HiPPI Interfaces
9.4.6 Array System: The HiPPI Interconnect
9.5 Asynchronous Transfer Mode (ATM)
9.5.1 Concepts
9.5.2 ATM Adapter
9.5.3 ATM API Basics
9.5.4 Performance Evaluation of ATM
9.5.5 Issues in Distributed Networks for ATM Networks
9.6 Scalable Coherent Interface (SCI)
9.6.1 Data Transfer via SCI
9.6.2 Advantages of SCI
9.7 ServerNet
9.7.1 Scalability and Reliability as Main Goals
9.7.2 Driver and Management Software
9.7.3 Remarks
9.8 Myrinet
9.8.1 Fitting Everybody's Needs
9.8.2 Software and Performance
9.8.3 Remarks
9.9 Memory Channel
9.9.1 Bringing Together Simplicity and Performance
9.9.2 Software and Performance
9.9.3 Remarks
9.10 Synfinity
9.10.1 Pushing Networking to the Technological Limits
9.10.2 Remarks
9.11 Bibliography
10 Lightweight Messaging Systems
10.1 Introduction
10.2 Latency/Bandwidth Evaluation of Communication Performance
10.3 Traditional Communication Mechanisms for Clusters
10.3.1 TCP, UDP, IP, and Sockets
10.3.2 RPC
10.3.3 MPI and PVM
10.3.4 Active Messages
10.4 Lightweight Communication Mechanisms
10.4.1 What We Need for Efficient Cluster Computing
10.4.2 Typical Techniques to Optimize Communication
10.4.3 The Importance of Efficient Collective Communications
10.4.4 A Classification of Lightweight Communication Systems
10.5 Kernel-level Lightweight Communications
10.5.1 Industry-standard API Systems
10.5.2 Best-Performance Systems
10.6 User-level Lightweight Communications
10.6.1 BIP
10.6.2 Fast Messages
10.6.3 Hewlett-Packard Active Messages (HPAM)
10.6.4 U-Net for ATM
10.6.5 Virtual Interface Architecture (VIA)
10.7 A Comparison Among Message Passing Systems
10.7.1 Clusters Versus MPPs
10.7.2 Standard Interface Approach Versus Other Approaches
10.7.3 User-level Versus Kernel-level
10.8 Bibliography
11 Active Messages
11.1 Introduction
11.2 Requirements
11.2.1 Top-down Requirement
11.2.2 Bottom-up Requirement
11.2.3 Architecture and Implementation
11.2.4 Summary
11.3 AM Programming Model
11.3.1 Endpoints and Bundles
11.3.2 Transport Operations
11.3.3 Error Model
11.3.4 Programming Examples
11.4 AM Implementation
11.4.1 Endpoints and Bundles
11.4.2 Transport Operations
11.4.3 NIC Firmware
11.4.4 Message Delivery and Flow Control
11.4.5 Events and Error Handling
11.4.6 Virtual Networks
11.5 Analysis
11.5.1 Meeting the Requirements
11.6 Programming Models on AM
11.6.1 Message Passing Interface (MPI)
11.6.2 Fast Sockets
11.7 Future Work
11.7.1 Bandwidth Performance
11.7.2 Flow Control and Error Recovery
11.7.3 Shared Memory Protocol
11.7.4 Endpoint Scheduling
11.7.5 Multidevice Support
11.7.6 Memory Management on NIC
11.8 Bibliography
12 Xpress Transport Protocol
12.1 Network Services for Cluster Computing
12.2 A New Approach
12.3 XTP Functionality
12.3.1 Multicast
12.3.2 Multicast Group Management (MGM)
12.3.3 Priority
12.3.4 Rate and Burst Control
12.3.5 Connection Management
12.3.6 Selectable Error Control
12.3.7 Selectable Flow Control
12.3.8 Selective Retransmission
12.3.9 Selective Acknowledgment
12.3.10 Maximum Transmission Unit (MTU) Detection
12.3.11 Out-of-band Data
12.3.12 Alignment
12.3.13 Traffic Descriptors
12.4 Performance
12.4.1 Throughput
12.4.2 Message Throughput
12.4.3 End-to-end Latency
12.5 Applications
12.5.1 Multicast
12.5.2 Gigabyte Files
12.5.3 High Performance
12.5.4 Image Distribution
12.5.5 Digital Telephone
12.5.6 Video File Server
12.5.7 Priority Support
12.5.8 Real-time Systems
12.5.9 Interoperability
12.6 XTP's Future in Cluster Computing
12.7 Bibliography
13 Congestion Management in ATM Clusters
13.1 Introduction to ATM Networking
13.1.1 Integrated Broadband Solution
13.1.2 Virtual Connection Setup
13.1.3 Quality of Service
13.1.4 Traffic and Congestion Management
13.2 Existing Methodologies
13.3 Simulation of ATM on LAN
13.3.1 Different Types of Traffic
13.3.2 Analysis of Results
13.3.3 Heterogeneous Traffic Condition
13.3.4 Summary
13.4 Migration Planning
13.4.1 LAN to Directed Graph
13.4.2 A Congestion Locator Algorithm
13.4.3 An Illustration
13.5 Conclusions
13.6 Bibliography
14 Load Balancing Over Networks
14.1 Introduction
14.2 Methods
14.2.1 Factors Affecting Balancing Methods
14.2.2 Simple Balancing Methods
14.2.3 Advanced Balancing Methods
14.3 Common Errors
14.3.1 Overflow
14.3.2 Underflow
14.3.3 Routing Errors
14.3.4 Induced Network Errors
14.4 Practical Implementations
14.4.1 General Network Traffic Implementations
14.4.2 Web-specific Implementations
14.4.3 Other Application Specific Implementations
14.5 Summary
14.6 Bibliography
15 Multiple Path Communication
15.1 Introduction
15.2 Heterogeneity in Networks and Applications
15.2.1 Varieties of Communication Networks
15.2.2 Exploiting Multiple Communication Paths
15.3 Multiple Path Communication
15.3.1 Performance-Based Path Selection
15.3.2 Performance-Based Path Aggregation
15.3.3 PBPD Library
15.4 Case Study
15.4.1 Multiple Path Characteristics
15.4.2 Communication Patterns of Parallel Applications
15.4.3 Experiments and Results
15.5 Summary and Conclusion
15.6 Bibliography
16 Network RAM
16.1 Introduction
16.1.1 Issues in Using Network RAM
16.2 Remote Memory Paging
16.2.1 Implementation Alternatives
16.2.2 Reliability
16.2.3 Remote Paging Prototypes
16.3 Network Memory File Systems
16.3.1 Using Network Memory as a File Cache
16.3.2 Network RamDisks
16.4 Applications of Network RAM in Databases
16.4.1 Transaction-Based Systems
16.5 Summary
16.5.1 Conclusions
16.5.2 Future Trends
16.6 Bibliography
17 Distributed Shared Memory
17.1 Introduction
17.2 Data Consistency
17.2.1 Data Location
17.2.2 Write Synchronization
17.2.3 Double Faulting
17.2.4 Relaxing Consistency
17.2.5 Application/Type-specific Consistency
17.3 Network Performance Issues
17.4 Other Design Issues
17.4.1 Synchronization
17.4.2 Granularity
17.4.3 Address-Space Structure
17.4.4 Replacement Policy and Secondary Storage
17.4.5 Heterogeneity Support
17.4.6 Fault Tolerance
17.4.7 Memory Allocation
17.4.8 Data Persistence
17.5 Conclusions
17.6 Bibliography
18 Parallel I/O for Clusters: Methodologies and Systems
18.1 Introduction
18.2 A Case for Cluster I/O Systems
18.3 The Parallel I/O Problem
18.3.1 Regular Problems
18.3.2 Irregular Problems
18.3.3 Out-of-Core Computation
18.4 File Abstraction
18.5 Methods and Techniques
18.5.1 Two-Phase Method
18.5.2 Disk-Directed I/O
18.5.3 Two-Phase Data Administration
18.6 Architectures and Systems
18.6.1 Runtime Modules and Libraries
18.6.2 MPI-IO
18.6.3 Parallel File Systems
18.6.4 Parallel Database Systems
18.7 The ViPIOS Approach
18.7.1 Design Principles
18.7.2 System Architecture
18.7.3 Data Administration
18.8 Conclusions and Future Trends
18.9 Bibliography
19 Software RAID and Parallel Filesystems
19.1 Introduction
19.1.1 I/O Problems
19.1.2 Using Clusters to Increase the I/O Performance
19.2 Physical Placement of Data
19.2.1 Increasing the Visibility of the Filesystems
19.2.2 Data Striping
19.2.3 Log-Structured Filesystems
19.2.4 Solving the Small-Write Problem
19.2.5 Network-Attached Devices
19.3 Caching
19.3.1 Multilevel Caching
19.3.2 Cache-Coherence Problems
19.3.3 Cooperative Caching
19.4 Prefetching
19.4.1 Parallel Prefetching
19.4.2 Transparent Informed Prefetching
19.4.3 Scheduling Parallel Prefetching and Caching
19.5 Interfaces
19.5.1 Traditional Interface
19.5.2 Shared File Pointers
19.5.3 Access Methods
19.5.4 Data Distribution
19.5.5 Collective I/O
19.5.6 Extensible Systems
19.6 Bibliography
Ⅲ Process Scheduling, Load Sharing, and Balancing
20 Job and Resource Management Systems
20.1 Motivation and Historical Evolution
20.1.1 A Need for Job Management
20.1.2 Job Management Systems on Workstation Clusters
20.1.3 Primary Application Fields
20.2 Components and Architecture of Job- and Resource Management Systems
20.2.1 Prerequisites
20.2.2 User Interface
20.2.3 Administrative Environment
20.2.4 Managed Objects: Queues, Hosts, Resources, Jobs, Policies
20.2.5 A Modern Architectural Approach
20.3 The State-of-the-Art in RMS
20.3.1 Automated Policy Based Resource Management
20.3.2 The State-of-the-Art of Job Support
20.4 Challenges for the Present and the Future
20.4.1 Open Interfaces
20.4.2 Resource Control and Mainframe-Like Batch Processing
20.4.3 Heterogeneous Parallel Environments
20.4.4 RMS in a WAN Environment
20.5 Summary
20.6 Bibliography
21 Scheduling Parallel Jobs on Clusters
21.1 Introduction
21.2 Background
21.2.1 Cluster Usage Modes
21.2.2 Job Types and Requirements
21.3 Rigid Jobs with Process Migration
21.3.1 Process Migration
21.3.2 Case Study: PVM with Migration
21.3.3 Case Study: MOSIX
21.4 Malleable Jobs with Dynamic Parallelism
21.4.1 Identifying Idle Workstations
21.4.2 Case Study: Condor and WoDi
21.4.3 Case Study: Piranha and Linda
21.5 Communication-Based Coscheduling
21.5.1 Demand-Based Coscheduling
21.5.2 Implicit Coscheduling
21.6 Batch Scheduling
21.6.1 Admission Controls
21.6.2 Case Study: Utopia/LSF
21.7 Summary
21.8 Bibliography
22 Load Sharing and Fault Tolerance Manager
22.1 Introduction
22.2 Load Sharing in Cluster Computing
22.3 Fault Tolerance by Means of Checkpointing
22.3.1 Checkpointing a Single Process
22.3.2 Checkpointing of Communicating Processes
22.4 Integration of Load Sharing and Fault Tolerance
22.4.1 Environment and Architecture
22.4.2 Process Allocation
22.4.3 Failure Management
22.4.4 Performance Study
22.5 Related Works
22.6 Conclusion
22.7 Bibliography
23 Parallel Program Scheduling Techniques
23.1 Introduction
23.2 The Scheduling Problem for Network Computing Environments
23.2.1 The DAG Model
23.2.2 Generation of a DAG
23.2.3 The Cluster Model
23.2.4 NP-Completeness of the DAG Scheduling Problem
23.2.5 Basic Techniques in DAG Scheduling
23.3 Scheduling Tasks to Machines Connected via Fast Networks
23.3.1 The ISH Algorithm
23.3.2 The MCP Algorithm
23.3.3 The ETF Algorithm
23.3.4 Analytical Performance Bounds
23.4 Scheduling Tasks to Arbitrary Processor Networks
23.4.1 The Message Routing Issue
23.4.2 The MH Algorithm
23.4.3 The DLS Algorithm
23.4.4 The BSA Algorithm
23.5 CASCH: A Parallelization and Scheduling Tool
23.5.1 User Programs
23.5.2 Lexical Analyzer and Parser
23.5.3 Weight Estimator
23.5.4 DAG Generation
23.5.5 Scheduling/Mapping Tool
23.5.6 Communication Inserter
23.5.7 Code Generation
23.5.8 Graphical User Interface
23.6 Summary and Concluding Remarks
23.7 Bibliography
24 Customized Dynamic Load Balancing
24.1 Introduction
24.1.1 Related Work
24.2 Dynamic Load Balancing (DLB)
24.2.1 Load Balancing Strategies
24.2.2 Discussion
24.3 DLB Modeling and Decision Process
24.3.1 Modeling Parameters
24.3.2 Modeling the Strategies: Total Cost Derivation
24.3.3 Decision Process: Using the Model
24.4 Compiler and Runtime Systems
24.4.1 Runtime System
24.4.2 Code Generation
24.5 Experimental Results
24.5.1 Network Characterization
24.5.2 MXM: Matrix Multiplication
24.5.3 TRFD
24.5.4 AC: Adjoint Convolution
24.5.5 Modeling Results: MXM, TRFD, and AC
24.6 Summary
24.7 Bibliography
25 Mapping and Scheduling on Heterogeneous Systems
25.1 Introduction
25.2 Mapping and Scheduling
25.2.1 The Mapping Problem
25.2.2 The Scheduling Problem
25.3 The Issues of Task Granularity and Partitioning
25.3.1 Two Strategies of Scheduling in Clustering
25.3.2 Some Effective Partitioning Algorithms
25.4 Static Scheduling and Dynamic Scheduling
25.4.1 Related Work in Heterogeneous Systems
25.4.2 Future Work Relating to Heterogeneous Systems
25.5 Load Balancing Issues
25.5.1 Load Balancing in Homogeneous Environment
25.5.2 Heterogeneous Computing Environment (HCE)
25.6 Summary
25.7 Bibliography
Ⅳ Representative Cluster Systems
26 Beowulf
26.1 Searching for Beowulf
26.1.1 The Beowulf Model: Satisfying a Critical Need
26.1.2 A Short History of Large Achievements
26.1.3 Application Domains
26.1.4 Other Sources of Information
26.2 System Architecture Evolution
26.2.1 The Processor
26.2.2 The Network
26.2.3 Putting It All Together
26.3 Prevailing Software Practices
26.3.1 Small Scale Software Provides Big Scale Performance
26.3.2 The Linux Operating System
26.4 Next Steps in Beowulf-Class Computing
26.4.1 Grendel: Towards Uniform System Software
26.4.2 Large System Scaling
26.4.3 Data-Intensive Computation
26.5 Beowulf in the 21st Century
26.5.1 Processing Nodes
26.5.2 Storage
26.5.3 System Area Networks
26.5.4 The $1M TFLOPS Beowulf
26.5.5 The Software Barrier
26.5.6 Not the Final Word
26.6 Bibliography
27 RWC PC Cluster Ⅱ and SCore Cluster System Software
27.1 Introduction
27.2 Building a Compact PC Cluster Using Commodity Hardware
27.2.1 Overview
27.2.2 Networks
27.2.3 Processor Card
27.2.4 Chassis Design
27.2.5 Cooling System
27.3 SCore Parallel Operating System Environment on Top of Unix
27.3.1 Software Overview
27.3.2 PM High Performance Communication Driver and Library
27.3.3 MPI on PM
27.3.4 SCore-D Parallel Operating System
27.3.5 MPC++ Multi-Thread Template Library
27.4 Performance Evaluation
27.4.1 PM Basic Performance
27.4.2 MPI Basic Performance
27.4.3 NAS Parallel Benchmarks Result
27.4.4 SCore-D Gang Scheduling Overhead
27.5 Concluding Remarks
27.6 Bibliography
28 COMPaS: A Pentium Pro PC-Based SMP Cluster
28.1 COMPaS: A Pentium Pro PC-Based SMP Cluster
28.2 Building PC-Based SMP Cluster
28.2.1 Pentium Pro PC-Based SMP Node
28.2.2 Inter-Node Communication on 100Base-T Ethernet
28.2.3 NICAM: User-Level Communication Layer of Myrinet for SMP Cluster
28.3 Programming for SMP Cluster
28.3.1 All Message Passing Programming
28.3.2 All Shared Memory Programming
28.3.3 Hybrid Shared Memory/Distributed Memory Programming
28.4 Case Studies: Benchmark Results on COMPaS
28.4.1 Explicit Laplace Equation Solver
28.4.2 Matrix-Matrix Multiplication
28.4.3 Sparse Matrix Conjugate Gradient Kernel
28.4.4 Radix Sort
28.5 Guidelines for Programming in PC-Based SMP Cluster
28.6 Summary
28.7 Bibliography
29 The NanOS Cluster Operating System
29.1 Introduction
29.1.1 Design Objectives
29.2 Architecture Overview
29.2.1 NanOS Microkernel
29.2.2 Membership Service
29.2.3 Object Request Broker
29.2.4 HIDRA Support for High Availability
29.3 NanOS
29.3.1 An Object-Oriented Microkernel
29.3.2 Microkernel Architecture
29.4 MCMM
29.4.1 MCMM Protocol
29.5 HIDRA
29.5.1 Overview of HIDRA
29.5.2 Replication Models
29.5.3 Object Request Broker
29.5.4 Coordinator-Cohort Replication Model
29.6 Summary
29.7 Bibliography
30 BSP-Based Adaptive Parallel Processing
30.1 Introduction
30.2 The Bulk-Synchronous Parallel Model
30.2.1 Cluster of Workstations as a BSP Computer
30.2.2 Program Reorganization for Parallel Computing on Dedicated Cluster: Plasma Simulation
30.3 Parallel Computing on Nondedicated Workstations
30.3.1 Nondedicated Workstations as Transient Processors
30.3.2 Approaches to Adaptive Parallelism
30.4 Adaptive Parallelism in the BSP Model
30.4.1 Protocol for Replication and Recovery
30.4.2 Performance of Adaptive Replication
30.5 A Programming Environment for Adaptive BSP
30.5.1 Dynamic Extensions to the Oxford BSP Library
30.5.2 The Replication Layer
30.5.3 The User Layer
30.6 Application of A-BSP to Parallel Computations
30.6.1 Maximum Independent Set
30.6.2 Plasma Simulation
30.6.3 Results
30.7 Application of A-BSP to Nondedicated Workstations
30.8 Conclusions
30.9 Bibliography
31 MARS: An Adaptive Parallel Programming Environment
31.1 Motivation and Goals
31.2 Related Work
31.2.1 Exploiting Idle Time
31.2.2 Adaptive Schedulers
31.3 The Available Capacity of NOWs
31.3.1 Node Idleness
31.3.2 Aggregate Idle Time
31.4 The MARS Approach
31.4.1 MARS Infrastructure
31.4.2 Parallel Programming Methodology
31.4.3 The MARS Scheduler
31.5 Experimental Results
31.5.1 Efficiency and Adaptability
31.5.2 Fault Tolerance and Intrusion
31.6 Conclusion and Future Work
31.7 Bibliography
32 The Gardens Approach to Adaptive Parallel Computing
32.1 Introduction
32.2 Related Work
32.3 Communication
32.3.1 Active Messages
32.3.2 Global Objects
32.3.3 Poll Procedure Annotations
32.4 Adaptation and Tasking
32.4.1 Multitasking
32.4.2 Blocking
32.4.3 Task Migration
32.4.4 Gardens Screen Saver
32.5 Performance Results
32.6 Summary
32.7 Bibliography
33 The ParPar System: A Software MPP
33.1 Introduction
33.2 The ParPar System
33.2.1 Hardware Base
33.2.2 Software Structure
33.2.3 Design Principles
33.2.4 Control Protocols
33.2.5 Data Network
33.3 System Configuration and Control
33.3.1 Dynamic Reconfiguration
33.3.2 Reliability and Availability
33.3.3 The Master Control
33.4 Job Control
33.4.1 Job Initiation
33.4.2 Job Termination
33.4.3 Debugging
33.5 Scheduling
33.5.1 Adaptive Partitioning
33.5.2 Gang Scheduling
33.6 Parallel I/O
33.6.1 Terminal I/O
33.6.2 Parallel Files
33.7 Project Status
33.8 Bibliography
34 Pitt Parallel Computer
34.1 Introduction
34.2 The Operating System
34.2.1 Internode Communication
34.2.2 Typical Usage
34.2.3 A Problem Suite for Research
34.3 The Laplace Problem
34.3.1 A One-Dimensional Example
34.3.2 A Two-Dimensional Example
34.4 Technical Description of the Laplace Program
34.5 User Description of the Laplace Operating System
34.6 Linear Simultaneous Equations
34.6.1 A Calculation Example
34.6.2 Technical Description
34.6.3 User Description
34.7 An Example Application
34.8 Summary
34.9 Bibliography
35 The RS/6000 SP System: A Scalable Parallel Cluster
35.1 Dual Personalities
35.2 SP System Architecture
35.3 SP System Structure
35.3.1 SP Communications Services
35.3.2 SP System Management
35.3.3 SP Globalized Resources
35.3.4 SP Availability Services
35.3.5 SP Programming Model and Environment
35.4 Concluding Remarks
35.5 Bibliography
36 A Scalable and Highly Available Cluster Web Server
36.1 Introduction
36.1.1 The Internet and the Need for Clustered Web Servers
36.1.2 Availability
36.1.3 Scalability
36.2 Web Servers and Dynamic Content
36.2.1 Introduction
36.2.2 Static Files on the Web
36.2.3 Common Gateway Interface
36.2.4 Web Server Application Programming Interfaces
36.2.5 FastCGI
36.2.6 Servlets
36.2.7 Summary
36.3 Fine-Grain Load Balancing
36.3.1 Introduction
36.3.2 Domain Name System (DNS)
36.3.3 Round-Robin DNS
36.3.4 Load Imbalances with Round-Robin DNS
36.3.5 Packet Forwarding for Fine-Grain Load Balancing
36.3.6 Summary
36.4 Shared Filesystems and Scalable I/O
36.4.1 Introduction
36.4.2 Shared Fileservers
36.4.3 Wide Striping
36.4.4 Scalable I/O: Virtual Shared Disk Architecture
36.4.5 Real-Time Support for Multimedia Content
36.4.6 Summary
36.5 Scalable Database Access on the Web
36.5.1 Introduction
36.5.2 On-Line Commerce and Databases
36.5.3 Connection Management for Scalability
36.5.4 Java Database Connectivity (JDBC)
36.5.5 Caching
36.5.6 Parallel Databases
36.5.7 Advanced Metadata Management
36.5.8 Summary
36.6 High Availability
36.6.1 Introduction
36.6.2 High Availability Infrastructure
36.6.3 Web Server and Router Recovery
36.6.4 Filesystem and I/O System Recovery
36.6.5 Database Recovery
36.6.6 Summary
36.7 Conclusions
36.8 Bibliography
Index