Principles of Parallel Programming hw1
馮浩然 (Feng Haoran) 1600013009
1 Intro
2 Implementation
1. Helper functions
#include
#include
#include
#include
#include
#include
#include
#include
using namespace std;
#pragma comment(lib, "curand.lib")
#define n 1024
#define tile 32
/*
 * generate random cuda matrix with "curand.h", and copy back to the host as a normal matrix
 * cm represents "cuda matrix", m represents "matrix"
 */
void generator(float *cm, float *m)
{
    /* ... */
}

/*
 * check if the result is correct
 * dst represents the transposed, src represents the previous
 */
bool check(float *dst, float *src)
{
    /* ... */
    return true;
}

int main()
{
    /* ... */
}
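The body of check is not preserved in the listing above, so the following is only a minimal host-side sketch of what such a correctness check might look like. It assumes row-major square matrices and that element (i, j) of src should land at (j, i) of dst. The names check_transpose and transpose_reference, and the explicit size parameter n, are hypothetical and not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the original check): returns true when dst is the
// transpose of src, with both stored row-major as n x n arrays.
bool check_transpose(const float *dst, const float *src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; i++) {
        for (std::size_t j = 0; j < n; j++) {
            // A transpose only copies values, so exact comparison is safe here.
            if (dst[j * n + i] != src[i * n + j])
                return false;
        }
    }
    return true;
}

// Hypothetical CPU reference transpose, handy for producing a known-good result.
std::vector<float> transpose_reference(const std::vector<float> &src, std::size_t n)
{
    assert(src.size() == n * n);
    std::vector<float> dst(n * n);
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
    return dst;
}
```

In a CUDA program, a check like this would run on the host after the device result has been copied back.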
2. Main transpose function
2.1 ***** method
2.2 Optimization step1
2.3 Optimization step2
/*
 * transpose matrix src, and store the result in matrix dst
 * dst represents the transposed, src represents the previous
 * optimized step2: a unit is a 32 * 32 matrix, move by 4 * 1 elements
 */
__global__ void matrix_trans_3(float *dst, float *src)
{
    /* ... */
    __syncthreads();
    i = blockIdx.y * tile + threadIdx.x;
    j = blockIdx.x * tile + threadIdx.y * 4;
    ind = j * n + i;
    tile_i = threadIdx.x;
    tile_j = threadIdx.y * 4;
    for (int i = 0; i < 4; i++)
    {
        /* ... */
    }
}

int main()
{
    /* ... */
}
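To make the index arithmetic of the step2 kernel easier to follow, here is a sequential CPU emulation, offered as a sketch rather than as the original kernel. It assumes a block of 32 x 8 threads, each moving 4 x 1 elements of a 32 x 32 tile. The write phase reuses the index expressions that follow __syncthreads() above. The load phase, the buffer layout, and the names tiled_transpose_emulated, TILE, ROWS_PER_THREAD and buf are assumptions, since that part of the kernel is not preserved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential CPU emulation of a tiled transpose in the spirit of the step2
// kernel. Loops over (by, bx) stand in for blockIdx, loops over (ty, tx)
// stand in for threadIdx. Names and the load phase are assumptions.
constexpr std::size_t TILE = 32;
constexpr std::size_t ROWS_PER_THREAD = 4;

std::vector<float> tiled_transpose_emulated(const std::vector<float> &src, std::size_t n)
{
    assert(n % TILE == 0 && src.size() == n * n);
    std::vector<float> dst(n * n);
    const std::size_t blocks = n / TILE;
    const std::size_t threads_y = TILE / ROWS_PER_THREAD; // 8 thread rows per block

    for (std::size_t by = 0; by < blocks; by++) {
        for (std::size_t bx = 0; bx < blocks; bx++) {
            float buf[TILE][TILE]; // stands in for the shared-memory tile

            // Phase 1 (assumed): each thread copies 4 x 1 elements into the tile.
            for (std::size_t ty = 0; ty < threads_y; ty++) {
                for (std::size_t tx = 0; tx < TILE; tx++) {
                    std::size_t i = bx * TILE + tx;                   // source column
                    std::size_t j = by * TILE + ty * ROWS_PER_THREAD; // first source row
                    for (std::size_t k = 0; k < ROWS_PER_THREAD; k++)
                        buf[ty * ROWS_PER_THREAD + k][tx] = src[(j + k) * n + i];
                }
            }

            // On the GPU, __syncthreads() would separate the two phases here.

            // Phase 2: same index expressions as the kernel lines after the barrier.
            for (std::size_t ty = 0; ty < threads_y; ty++) {
                for (std::size_t tx = 0; tx < TILE; tx++) {
                    std::size_t i = by * TILE + tx;                   // destination column
                    std::size_t j = bx * TILE + ty * ROWS_PER_THREAD; // first destination row
                    std::size_t ind = j * n + i;
                    std::size_t tile_i = tx;
                    std::size_t tile_j = ty * ROWS_PER_THREAD;
                    for (std::size_t k = 0; k < ROWS_PER_THREAD; k++)
                        dst[ind + k * n] = buf[tile_i][tile_j + k];
                }
            }
        }
    }
    return dst;
}
```

Note that the block indices are swapped between the two phases: the tile read at block (bx, by) of src is written at block (by, bx) of dst, while the transposition inside the tile happens through the swapped buffer indices. Both the reads from src and the writes to dst then walk along rows as tx varies, which is the usual motivation for staging a transpose through a tile.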
3 Execution and performance
4 Special notes
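The ** is located at manycore@master:/home/manycore/users/feng.haoran, where matrix_trans_1, 2, 3 and hw1_1, 2, 3 are the compiled executables
(1 is the ***** implementation, 2 is after the step1 optimization, 3 is after the step2 optimization)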