Contents
Preface
Acknowledgements
CHAPTER 1 Introduction
  1.1 Heterogeneous Parallel Computing
  1.2 Architecture of a Modern GPU
  1.3 Why More Speed or Parallelism
  1.4 Speeding Up Real Applications
  1.5 Challenges in Parallel Programming
  1.6 Parallel Programming Languages and Models
  1.7 Overarching Goals
  1.8 Organization of the Book
  References
CHAPTER 2 Data Parallel Computing
  2.1 Data Parallelism
  2.2 CUDA C Program Structure
  2.3 A Vector Addition Kernel
  2.4 Device Global Memory and Data Transfer
  2.5 Kernel Functions and Threading
  2.6 Kernel Launch
  2.7 Summary
    Function Declarations
    Kernel Launch
    Built-in (Predefined) Variables
    Run-time API
  2.8 Exercises
  References
CHAPTER 3 Scalable Parallel Execution
  3.1 CUDA Thread Organization
  3.2 Mapping Threads to Multidimensional Data
  3.3 Image Blur: A More Complex Kernel
  3.4 Synchronization and Transparent Scalability
  3.5 Resource Assignment
  3.6 Querying Device Properties
  3.7 Thread Scheduling and Latency Tolerance
  3.8 Summary
  3.9 Exercises
CHAPTER 4 Memory and Data Locality
  4.1 Importance of Memory Access Efficiency
  4.2 Matrix Multiplication
  4.3 CUDA Memory Types
  4.4 Tiling for Reduced Memory Traffic
  4.5 A Tiled Matrix Multiplication Kernel
  4.6 Boundary Checks
  4.7 Memory as a Limiting Factor to Parallelism
  4.8 Summary
  4.9 Exercises
……
CHAPTER 17 Parallel Programming and Computational Thinking
  17.1 Goals of Parallel Computing
  17.2 Problem Decomposition