Serial code for sorting an array of 8-bit unsigned numbers:
void countingsort( unsigned char* a, unsigned long a_size )
{
    const unsigned long numberofcounts = 256;       // one counter per possible 8-bit value
    unsigned long count[ numberofcounts ] = { 0 };  // count array is initialized to zero by the compiler
    // count how many times each value occurs
    for( unsigned long i = 0; i < a_size; i++ )
        count[ a[ i ] ]++;
    // fill the array with the number of 0's that were counted, followed by the number of 1's, and then 2's and so on
    unsigned long n = 0;
    for( unsigned long i = 0; i < numberofcounts; i++ )
        for( unsigned long j = 0; j < count[ i ]; j++ )
            a[ n++ ] = (unsigned char)i;
}
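Test code:
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;
typedef unsigned int uint;
const int numdata = 100;
const uint k = 256;
// the function bodies were missing from this listing; the ones below are a reconstruction
void countsort(vector<uint>& inputvec) {
    vector<uint> count(k, 0);                        // one counter per value in [0, k)
    for (uint v : inputvec) count[v]++;              // count occurrences of each value
    size_t n = 0;
    for (uint i = 0; i < k; i++)                     // write value i back count[i] times
        for (uint j = 0; j < count[i]; j++) inputvec[n++] = i;
}
void printdata(vector<uint>& testvec) {
    for (uint v : testvec) cout << v << " ";
    cout << endl;
}
int main() {
    vector<uint> testvec(numdata);
    for (uint& v : testvec) v = rand() % k;          // assumed: random test values in [0, k)
    cout << "test data before sorting : " << endl;
    printdata(testvec);
    countsort(testvec);                              // counting sort
    cout << "test data after sorting : " << endl;
    printdata(testvec);
    return 0;
}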
Summary: counting sort is best suited to cases where the value range k is small and the amount of data n is large (k << n).
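To make the k << n condition concrete, here is a rough cost comparison against a comparison-based sort. The sizes n = 10^6 and k = 256 are illustrative assumptions, not measurements from this post, and constant factors are ignored.

```latex
\begin{align*}
T_{\text{counting}}(n, k) &= \Theta(n + k), &
T_{\text{comparison}}(n) &= \Omega(n \log n) \\
n = 10^{6},\ k = 256:\quad n + k &\approx 1.0 \times 10^{6}, &
n \log_2 n &\approx 2.0 \times 10^{7}
\end{align*}
```

When k grows toward n or beyond (for example, full 32-bit keys), the k term and the size of the count array start to dominate, and the advantage disappears.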
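Parallel counting sort, reference:
CUDA counting sort:
1. Copy the data to be sorted from the CPU to the GPU (if the data already lives on the GPU, this transfer cost can be skipped; recommendation: use host pinned memory)
2. Initialize the count array (for the GPU-only scenario, allocate it once, then use cudaMemset to reset it to its initial values before each sort)
3. Count the occurrences of each value
4. Fill the output array
5. Copy the sorted data from the GPU back to the CPU (if the data stays on the GPU, this transfer cost can be skipped; recommendation: use host pinned memory)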
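Steps 2 to 4 above can be sketched on the CPU as a rough analogue. This is not CUDA code and not the implementation this post refers to: `std::thread` workers stand in for GPU threads, the atomic `fetch_add` stands in for an atomic add on a shared count array, and the transfer steps 1 and 5 have no equivalent here. The function name `parallel_counting_sort`, the `num_threads` parameter, and the prefix-sum approach to step 4 are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original post): CPU analogue of steps 2-4.
std::vector<unsigned char> parallel_counting_sort(const std::vector<unsigned char>& input,
                                                  unsigned num_threads = 4)
{
    const std::size_t k = 256;                 // value range of 8-bit keys
    if (num_threads == 0) num_threads = 1;

    // Step 2: reset the count array (the role cudaMemset plays on the GPU).
    std::vector<std::atomic<unsigned long>> count(k);
    for (auto& c : count) c.store(0);

    // Step 3: counting. Each worker scans its own slice of the input and
    // atomically increments the shared counter for every value it sees.
    const std::size_t n = input.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&input, &count, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                count[input[i]].fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    // Step 4: fill the output. An exclusive prefix sum gives every value the
    // offset where its run starts, so the runs can be written independently.
    std::vector<std::size_t> offset(k);
    std::size_t running = 0;
    for (std::size_t v = 0; v < k; ++v) {
        offset[v] = running;
        running += count[v].load();
    }
    assert(running == n);                      // every input element was counted once

    std::vector<unsigned char> output(n);
    workers.clear();
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Worker t writes the runs for values t, t + num_threads, ...
            for (std::size_t v = t; v < k; v += num_threads)
                std::fill_n(output.begin() + offset[v], count[v].load(),
                            static_cast<unsigned char>(v));
        });
    }
    for (auto& w : workers) w.join();
    return output;
}
```

Calling `parallel_counting_sort(data, 4)` returns a sorted copy of `data`. On some toolchains the program has to be built with `-pthread`. A single shared count array keeps the sketch short, but heavy contention on a few hot counters is the usual reason real implementations give each thread (or thread block) a private count array and merge them afterwards.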
to be continued...
