Programming Massively Parallel Processors (大规模并行处理器程序设计)

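Price: ¥26 (72% of the ¥36 list price). Condition: 85% of new (八五品)

Only 1 copy available

Ships from Wuhan, Hubei

Verified seller, escrow-protected transaction, fast dispatch, after-sales guarantee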

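Author: [US] Kirk (柯克)

Publisher: Tsinghua University Press (清华大学出版社)

Publication date: 2010-07

Edition: 1

Binding: paperback

Seller's stock code: 二楼三计算机 (second floor, 3, computing)

Listed on: 2024-08-15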

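Seller: 诚诚书店2010 (Chengcheng Bookstore 2010)

Shop open for thirteen years

Real-name verified and certified

Items on sale: none shown
Average dispatch time: 12 hours
Positive-review rate: not yet available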

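Item details

Condition: 85% of new (八五品)

Standard book information

Author: [US] Kirk (柯克)
Publisher: Tsinghua University Press (清华大学出版社)
Publication date: 2010-07
Edition: 1
ISBN: 9787302229735
List price: ¥36.00
Binding: paperback
Format: 16mo (16开)
Paper: offset paper
Series: 大学计算机教育国外著名教材系列 (Renowned Foreign Textbooks for University Computer Education)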

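[Synopsis]: This book introduces the basic concepts of parallel programming and GPU architecture and examines in detail the techniques used to build parallel programs. Case studies walk through the whole development process, from the ideas behind parallel computing to a practical, efficient parallel program.

  Features of the book

  Introduces the ideas behind parallel computing, so that readers can carry this way of thinking about problems into high-performance parallel computing.

  Introduces the use of CUDA, a software development tool that NVIDIA created specifically for massively parallel environments.

  Shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability.

[Table of contents]:

Preface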

Acknowledgments

Dedication

CHAPTER 1 INTRODUCTION

　1.1　GPUs as Parallel Computers

　1.2　Architecture of a Modern GPU

　1.3　Why More Speed or Parallelism?

　1.4　Parallel Programming Languages and Models

　1.5　Overarching Goals

　1.6　Organization of the Book

CHAPTER 2　HISTORY OF GPU COMPUTING

　2.1　Evolution of Graphics Pipelines

　　2.1.1　The Era of Fixed-Function Graphics Pipelines

　　2.1.2　Evolution of Programmable Real-Time Graphics

　　2.1.3　Unified Graphics and Computing Processors

　　2.1.4　GPGPU: An Intermediate Step

　2.2　GPU Computing

　　2.2.1　Scalable GPUs

　　2.2.2　Recent Developments

　2.3　Future Trends

CHAPTER 3　INTRODUCTION TO CUDA

　3.1　Data Parallelism

　3.2　CUDA Program Structure

　3.3　A Matrix-Matrix Multiplication Example

　3.4　Device Memories and Data Transfer

　3.5　Kernel Functions and Threading

　3.6　Summary

　　3.6.1　Function declarations

　　3.6.2　Kernel launch

　　3.6.3　Predefined variables

　　3.6.4　Runtime API

CHAPTER 4　CUDA THREADS

　4.1　CUDA Thread Organization

　4.2　Using blockIdx and threadIdx

　4.3　Synchronization and Transparent Scalability

　4.4　Thread Assignment

　4.5　Thread Scheduling and Latency Tolerance

　4.6　Summary

　4.7　Exercises

CHAPTER 5　CUDA™ MEMORIES

　5.1　Importance of Memory Access Efficiency

　5.2　CUDA Device Memory Types

　5.3　A Strategy for Reducing Global Memory Traffic

　5.4　Memory as a Limiting Factor to Parallelism

　5.5　Summary

　5.6　Exercises

CHAPTER 6　PERFORMANCE CONSIDERATIONS

　6.1　More on Thread Execution

　6.2　Global Memory Bandwidth

　6.3　Dynamic Partitioning of SM Resources

　6.4　Data Prefetching

　6.5　Instruction Mix

　6.6　Thread Granularity

　6.7　Measured Performance and Summary

　6.8　Exercises

CHAPTER 7　FLOATING POINT CONSIDERATIONS

　7.1　Floating-Point Format

　　7.1.1　Normalized Representation of M

　　7.1.2　Excess Encoding of E

　7.2　Representable Numbers

　7.3　Special Bit Patterns and Precision

　7.4　Arithmetic Accuracy and Rounding

　7.5　Algorithm Considerations

　7.6　Summary

　7.7　Exercises

CHAPTER 8　APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION

　8.1　Application Background

　8.2　Iterative Reconstruction

　8.3　Computing F^H d

　　Step 1. Determine the Kernel Parallelism Structure

　　Step 2. Getting Around the Memory Bandwidth Limitation

　　Step 3. Using Hardware Trigonometry Functions

　　Step 4. Experimental Performance Tuning

　8.4　Final Evaluation

　8.5　Exercises

CHAPTER 9　APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS

CHAPTER 10　PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING

CHAPTER 11　A BRIEF INTRODUCTION TO OPENCL™

CHAPTER 12　CONCLUSION AND FUTURE OUTLOOK

APPENDIX A　MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE

APPENDIX B　GPU COMPUTE CAPABILITIES

Index
