GPU Programming Examples

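The GPU is one of the representative multi-core technologies: many lower-power cores are integrated on a single chip, the clock frequency of each core stays roughly constant (typically 1 to 3 GHz), and the design focus shifts to integrating more cores. A GPU is a specialized multi-core processor. The experiments in this post were run on a Lenovo DeepComp 7000G (深騰7000G) GPU cluster. The cluster has 100 nodes, each with two quad-core CPUs (Intel Xeon) and 16 GB of memory; 16 of the nodes are fitted with one GPU card and 18 with two.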

Compiling the GPU program: nvcc -o vectoradd vectoradd.cu

Running: for convenience I wrote a simple shell script, shown below:

# $@ is the program name passed to this script, e.g. vectoradd
if [ -f $@.log ]; then
    rm $@.log
fi
if [ -f $@.err ]; then
    rm $@.err
fi
bsub -q c2050 -o $@.log -e $@.err ./$@

Examples: 1. Vector addition

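#include <stdio.h>
#define N 200000   /* vector length */
#define M 500      /* assumed to be the number of threads per block */
__global__ void kernelVectorAdd(int *dev_a, int *dev_b, int *dev_c)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N)   /* guard against threads past the end of the vectors */
        dev_c[tid] = dev_a[tid] + dev_b[tid];
}
/* The source listing lost everything between "tid<" and the ">>>" of the kernel launch.
   The kernel body above and the host code down to the launch are reconstructed;
   the input values, the launch configuration and the printed format are assumptions. */
int a[N], b[N], c[N];
int main(void)
{
    int *dev_a, *dev_b, *dev_c;
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }
    cudaMalloc((void **)&dev_a, N * sizeof(int));
    cudaMalloc((void **)&dev_b, N * sizeof(int));
    cudaMalloc((void **)&dev_c, N * sizeof(int));
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);
    kernelVectorAdd<<<(N + M - 1) / M, M>>>(dev_a, dev_b, dev_c);
    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    for (int i = 0; i < N; i++)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}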
This one is fairly simple; reading the code is enough to understand it.
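The index arithmetic in the vector addition kernel can be checked without a GPU. The following is a minimal host-side sketch in plain C, not the author's code: it loops over every (block, thread) pair a launch would create and applies the same formula, tid = blockIdx.x * blockDim.x + threadIdx.x, together with the bounds check. The function names blocks_needed and emulate_vector_add are hypothetical, and it assumes the block count is obtained by ceiling division.

```c
#include <assert.h>

/* Number of blocks needed so that blocks * threads_per_block >= n
   (ceiling division). */
static int blocks_needed(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Host-side stand-in for the kernel: visits every (block, thread) pair
   and applies the kernel's index formula and bounds check.
   Returns how many simulated threads fell outside [0, n) and did nothing. */
static int emulate_vector_add(const int *a, const int *b, int *c,
                              int n, int threads_per_block)
{
    int skipped = 0;
    int blocks = blocks_needed(n, threads_per_block);
    for (int block = 0; block < blocks; block++) {                    /* blockIdx.x */
        for (int thread = 0; thread < threads_per_block; thread++) {  /* threadIdx.x */
            int tid = block * threads_per_block + thread;             /* blockDim.x is threads_per_block */
            if (tid < n)
                c[tid] = a[tid] + b[tid];
            else
                skipped++;
        }
    }
    return skipped;
}
```

With a vector length of 200000 and 500 threads per block, blocks_needed gives 400 and no thread is skipped. When the length is not a multiple of the block size, the last block has idle threads, which is why the kernel needs the bounds check.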
2. Matrix multiplication
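#include <stdio.h>
/* Two further #include lines in the original are not recoverable. */
#define N 1000
/* CPU reference: c = a * b for width x width row-major matrices.
   Only the signature and "for(i=0; i" survive; the body is the standard triple loop. */
void matrixMul(int *a, int *b, int *c, int width) {
    int i, j, k;
    for (i = 0; i < width; i++)
        for (j = 0; j < width; j++) {
            int sum = 0;
            for (k = 0; k < width; k++)
                sum += a[i * width + k] * b[k * width + j];
            c[i * width + j] = sum;
        }
}
/* [Missing from the source: the GPU kernel, the start of main and the host setup.
   The listing resumes at the end of the kernel launch, whose kernel name and
   launch configuration are lost:  ...>>>(dev_a, dev_b, dev_c, N);] */
    cudaDeviceSynchronize();   /* the original called the deprecated cudaThreadSynchronize() */
    cudaMemcpy(c, dev_c, N * N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    int m, n;
    for (m = 0; m /* [the listing is truncated here] */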
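The matrix multiplication kernel itself is not legible in the listing, so the following is only a host-side sketch in plain C of the common one-thread-per-element mapping, not the author's kernel. It assumes row-major square matrices and a two-dimensional launch with tile x tile threads per block; the names matmul_one_element, emulate_matmul_grid and tile are hypothetical. In an actual CUDA kernel, row and col would come from blockIdx, blockDim and threadIdx instead of the nested loops.

```c
#include <assert.h>

/* What a single GPU thread would do: compute one element c[row][col]
   of the width x width product, with all matrices stored row-major. */
static void matmul_one_element(const int *a, const int *b, int *c,
                               int width, int row, int col)
{
    int sum = 0;
    for (int k = 0; k < width; k++)
        sum += a[row * width + k] * b[k * width + col];
    c[row * width + col] = sum;
}

/* Host-side stand-in for a 2D launch with tile x tile threads per block. */
static void emulate_matmul_grid(const int *a, const int *b, int *c,
                                int width, int tile)
{
    assert(width > 0 && tile > 0);
    int blocks = (width + tile - 1) / tile;   /* blocks per grid dimension */
    for (int by = 0; by < blocks; by++)
        for (int bx = 0; bx < blocks; bx++)
            for (int ty = 0; ty < tile; ty++)
                for (int tx = 0; tx < tile; tx++) {
                    int row = by * tile + ty;   /* blockIdx.y * blockDim.y + threadIdx.y */
                    int col = bx * tile + tx;   /* blockIdx.x * blockDim.x + threadIdx.x */
                    if (row < width && col < width)   /* skip threads outside the matrix */
                        matmul_one_element(a, b, c, width, row, col);
                }
}
```

Comparing the output of such an emulation (or of the real kernel) against the CPU function matrixmul from the listing is a simple way to validate the GPU result.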
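3. Experimental results:
The final output is saved in *.log. If an error occurs during execution, the error messages are saved in *.err. The original post showed a screenshot of the results here.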