CUDA Study Notes 05: All the Ways to Allocate Memory

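In yesterday's group discussion, someone kept insisting that I allocate memory at a specified location. I was completely baffled: in five years of programming I had never even thought about this. It is like being forced to allocate out of the L1 cache in C++, and all I could think was "I simply can't do that!!!". To make clear to everyone what CUDA programming actually is, the next few posts will walk through CUDA's hardware and software architecture. The hardware part really should come first, but here I want to start with memory allocation in CUDA.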

The cudaMallocHost function

__host__ cudaError_t cudaMallocHost (void **ptr, size_t size)
Allocates page-locked memory on the host.
The host memory returned is page-locked (pinned): the pages it occupies always stay resident in physical memory and are never swapped out.
Parameters
ptr - pointer to allocated host memory
size - requested allocation size in bytes

Returns - cudaSuccess, cudaErrorMemoryAllocation
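As a rough sketch of how pinned memory is typically used, the example below allocates a page-locked host buffer and copies it to the device on a stream. The buffer size and the names h_buf and d_buf are assumptions made for illustration, not something from the notes above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;                 // illustrative element count
    const size_t bytes = n * sizeof(float);

    float *h_buf = nullptr;                   // page-locked host buffer
    float *d_buf = nullptr;                   // ordinary device buffer

    cudaError_t err = cudaMallocHost((void **)&h_buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (size_t i = 0; i < n; ++i) h_buf[i] = 1.0f;

    err = cudaMalloc((void **)&d_buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        cudaFreeHost(h_buf);
        return 1;
    }

    // With a pinned source buffer the copy can run asynchronously on the stream.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);                      // pinned memory is released with cudaFreeHost, not free()
    return 0;
}
```

Note that page-locked memory is a limited resource; pinning very large buffers can hurt overall system performance.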

The cudaMalloc function

__host__ __device__ cudaError_t cudaMalloc (void **devPtr, size_t size)
Allocates memory on the device.
This allocates ordinary device memory; it plays the same role as malloc in C.
Parameters
devPtr - pointer to allocated device memory
size - requested allocation size in bytes
Returns - cudaSuccess, cudaErrorMemoryAllocation
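A minimal sketch of the malloc-like usage, with the return code checked. The element count and the name d_data are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 256;                     // illustrative element count
    int *d_data = nullptr;

    // Like malloc, we only say how many bytes we want; the address is chosen for us.
    cudaError_t err = cudaMalloc((void **)&d_data, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // d_data is a device address: it must not be dereferenced in host code.
    cudaMemset(d_data, 0, n * sizeof(int));

    cudaFree(d_data);                         // counterpart of free()
    return 0;
}
```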

The cudaMalloc3D function

__host__ cudaError_t cudaMalloc3D (cudaPitchedPtr *pitchedDevPtr, cudaExtent extent)
Allocates logical 1D, 2D, or 3D memory objects on the device.
Parameters
pitchedDevPtr - pointer to allocated pitched device memory
extent - requested allocation size (width field in bytes); cudaExtent is a struct with three size_t fields: width, height, and depth
Returns - cudaSuccess, cudaErrorMemoryAllocation
Allocates 1D, 2D, or 3D (pitched) linear memory; it can also be used when allocating memory for textures.
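To make the pitched layout concrete, here is a small sketch that allocates a 3D volume of floats and fills it from a kernel. The volume dimensions, the kernel name fill, and the launch configuration are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes v into every element of a pitched 3D volume.
__global__ void fill(cudaPitchedPtr p, int width, int height, int depth, float v) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= width || y >= height || z >= depth) return;

    // Rows are p.pitch bytes apart; slices are p.pitch * height bytes apart.
    char *slice = (char *)p.ptr + (size_t)z * p.pitch * height;
    float *row = (float *)(slice + (size_t)y * p.pitch);
    row[x] = v;
}

int main() {
    const int width = 64, height = 32, depth = 16;   // in elements

    // For cudaMalloc3D the extent width is given in bytes.
    cudaExtent extent = make_cudaExtent(width * sizeof(float), height, depth);
    cudaPitchedPtr p;
    cudaError_t err = cudaMalloc3D(&p, extent);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc3D failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("requested row: %zu bytes, pitch: %zu bytes\n",
           width * sizeof(float), p.pitch);

    dim3 block(8, 8, 4);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y,
              (depth + block.z - 1) / block.z);
    fill<<<grid, block>>>(p, width, height, depth, 1.0f);
    cudaDeviceSynchronize();

    cudaFree(p.ptr);
    return 0;
}
```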

The cudaMalloc3DArray function, used in much the same way as cudaMalloc3D above

__host__ cudaError_t cudaMalloc3DArray (cudaArray_t *array, const cudaChannelFormatDesc *desc, cudaExtent extent, unsigned int flags)
Allocates an array on the device.
Parameters
array - pointer to allocated array in device memory
desc - requested channel format; cudaChannelFormatDesc is a struct holding the bit widths of the x, y, z, and w channels plus the channel kind f
extent - requested allocation size (width field in elements)
flags - flags for extensions
Returns - cudaSuccess, cudaErrorMemoryAllocation
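A short sketch of allocating a 3D CUDA array of floats and reading back its properties. The dimensions are illustrative assumptions; note that, unlike cudaMalloc3D, the extent width here is in elements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // One 32-bit float channel per element.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    // Width is in elements here, not bytes.
    cudaExtent extent = make_cudaExtent(64, 32, 16);

    cudaArray_t arr = nullptr;
    cudaError_t err = cudaMalloc3DArray(&arr, &desc, extent, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc3DArray failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The array is opaque; query its properties instead of touching a raw pointer.
    cudaChannelFormatDesc gotDesc;
    cudaExtent gotExtent;
    unsigned int gotFlags = 0;
    cudaArrayGetInfo(&gotDesc, &gotExtent, &gotFlags, arr);
    printf("array extent: %zu x %zu x %zu elements\n",
           gotExtent.width, gotExtent.height, gotExtent.depth);

    cudaFreeArray(arr);                       // arrays are released with cudaFreeArray
    return 0;
}
```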

cudaMallocArray: array memory allocation

__host__ cudaError_t cudaMallocArray (cudaArray_t *array, const cudaChannelFormatDesc *desc, size_t width, size_t height, unsigned int flags)
Allocates an array on the device.
Parameters
array - pointer to allocated array in device memory
desc - requested channel format
width - requested array allocation width
height - requested array allocation height
flags - requested properties of allocated array
Returns - cudaSuccess, cudaErrorMemoryAllocation
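As an illustration, the sketch below allocates a 2D CUDA array and uploads host data into it. The image size and the variable names are assumptions for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t width = 128, height = 64;    // in elements, illustrative
    std::vector<float> h_img(width * height, 1.0f);

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    cudaError_t err = cudaMallocArray(&arr, &desc, width, height, cudaArrayDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocArray failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the host image into the array; row widths and pitches are in bytes.
    err = cudaMemcpy2DToArray(arr, 0, 0, h_img.data(),
                              width * sizeof(float),   // source pitch
                              width * sizeof(float),   // bytes per row to copy
                              height, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy2DToArray failed: %s\n", cudaGetErrorString(err));
    }

    cudaFreeArray(arr);
    return err == cudaSuccess ? 0 : 1;
}
```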

The cudaMallocManaged function

__host__ cudaError_t cudaMallocManaged (void **devPtr, size_t size, unsigned int flags)
Allocates memory that will be automatically managed by the Unified Memory system.
Parameters
devPtr - pointer to allocated device memory
size - requested allocation size in bytes
flags - must be either cudaMemAttachGlobal or cudaMemAttachHost (defaults to cudaMemAttachGlobal)
Returns - cudaSuccess, cudaErrorMemoryAllocation, cudaErrorNotSupported, cudaErrorInvalidValue
Memory allocated with cudaMallocManaged() can be used both inside kernel functions and in main (host code), which is exactly what unified addressing is meant to achieve.

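This greatly reduces the complexity of **, because before CUDA 6 there was no unified addressing and running a computation on the GPU took a few somewhat tedious steps: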

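1. Allocate space in device memory
2. Copy the data from host memory to device memory
3. Launch the CUDA kernel to do the computation
4. Copy the processed data from device memory back to host memory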
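The four steps above can be sketched roughly as follows. The kernel scale, the array size, and the launch configuration are assumptions made up for this illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_data(n, 1.0f);

    // Step 1: allocate space in device memory.
    float *d_data = nullptr;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return 1;

    // Step 2: copy the data from host memory to device memory.
    cudaMemcpy(d_data, h_data.data(), bytes, cudaMemcpyHostToDevice);

    // Step 3: launch the kernel.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    // Step 4: copy the result back (this cudaMemcpy waits for the kernel to finish).
    cudaMemcpy(h_data.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    printf("h_data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    return 0;
}
```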

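The biggest advantage of unified addressing is that it removes the manual data copies. I say "manual" because the data still has to be copied even with unified addressing; that part is simply done automatically now, so the programmer no longer has to worry about it. For that reason, unified addressing does not noticeably improve a program's running speed. It is purely a convenience.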
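For comparison with the explicit-copy workflow, here is a minimal sketch of the same kind of computation using managed memory. The kernel scale and the sizes are illustrative assumptions; the one thing to watch is the synchronization before the host reads the data again.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    float *data = nullptr;

    // One allocation, visible from both host and device code.
    cudaError_t err = cudaMallocManaged((void **)&data, n * sizeof(float),
                                        cudaMemAttachGlobal);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) data[i] = 1.0f;   // written directly by the host

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(data, n, 2.0f);    // same pointer used by the kernel

    // No explicit cudaMemcpy, but the host must wait before touching the data again.
    cudaDeviceSynchronize();
    printf("data[0] = %f\n", data[0]);

    cudaFree(data);
    return 0;
}
```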

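The cudaMallocMipmappedArray function (texture memory allocation)

__host__ cudaError_t cudaMallocMipmappedArray (cudaMipmappedArray_t *mipmappedArray, const cudaChannelFormatDesc *desc, cudaExtent extent, unsigned int numLevels, unsigned int flags)
Allocates a mipmapped array on the device.
Parameters
mipmappedArray - pointer to allocated mipmapped array in device memory
desc - requested channel format
extent - requested allocation size (width field in elements)
numLevels - number of mipmap levels to allocate
flags - flags for extensions
Returns - cudaSuccess, cudaErrorMemoryAllocation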

The cudaMallocPitch function

__host__ cudaError_t cudaMallocPitch (void **devPtr, size_t *pitch, size_t width, size_t height)
Allocates pitched memory on the device.
Parameters
devPtr - pointer to allocated pitched device memory
pitch - pitch for allocation
width - requested pitched allocation width (in bytes)
height - requested pitched allocation height
Returns - cudaSuccess, cudaErrorMemoryAllocation
Allocates linear memory of the specified size for a 2D region; each row may be padded, and the actual row stride in bytes is returned in pitch.
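A small sketch of pitched allocation in practice: allocate a 2D region, see what pitch comes back, and upload a tightly packed host matrix into it. The matrix size and names are assumptions for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t width = 100, height = 64;    // in elements, illustrative
    const size_t rowBytes = width * sizeof(float);

    float *d_mat = nullptr;
    size_t pitch = 0;                         // filled in by cudaMallocPitch
    cudaError_t err = cudaMallocPitch((void **)&d_mat, &pitch, rowBytes, height);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocPitch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("requested row: %zu bytes, pitch: %zu bytes\n", rowBytes, pitch);

    // The host matrix is tightly packed, so its pitch is just rowBytes.
    std::vector<float> h_mat(width * height, 1.0f);
    cudaMemcpy2D(d_mat, pitch, h_mat.data(), rowBytes,
                 rowBytes, height, cudaMemcpyHostToDevice);

    // In device code, row r starts at (float *)((char *)d_mat + r * pitch).

    cudaFree(d_mat);
    return 0;
}
```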

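OK, those are the memory allocation functions CUDA provides. As you can see, whether in C/C++ or in CUDA, allocating memory only requires specifying a size. Where exactly that memory ends up is decided for us (in CUDA, by the runtime and driver when the call executes), and we cannot directly specify the physical location of an allocation. This is a good thing: the programmer only needs to request the memory and then use it, without worrying about memory management (management here meaning management of the concrete physical location). NVIDIA has already taken care of that for us.

In short, the exact location of an allocation is decided by the CUDA runtime and driver, not by the programmer.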
