Ascend PyTorch Operator Adaptation Layer Development

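Adaptation method

Find the NPU TBE operator whose function matches the PyTorch operator, compute the size of the output tensor from the operator's semantics, then build the input/output/attr required by the TBE operator prototype and pass them to ACL, which executes the TBE operator.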

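Note: where the TBE operator implementation source files are stored depends on how the toolkit development package was installed:

• If installed as the root user: /usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/

• If installed as a non-root user: ~/.local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/

Developers can read the operator implementation source files to determine what an operator does.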

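Storage path and naming format

Adaptation files for NPU TBE operators are stored in the pytorch/aten/src/ATen/native/npu directory. File names use UpperCamelCase, in the format <operator name> + KernelNpu.cpp, for example AddKernelNpu.cpp.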
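The naming rule can be sketched as a small helper. This is only an illustration: adapter_file_name is a hypothetical function, not part of the Ascend adapter, and it assumes the UpperCamelCase rule means the example file is spelled AddKernelNpu.cpp.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the Ascend adapter): builds the adapter
// file name for an operator. The first letter is capitalised so the result
// follows the UpperCamelCase rule, then the assumed "KernelNpu.cpp" suffix
// is appended.
std::string adapter_file_name(std::string op_name) {
    if (!op_name.empty()) {
        op_name[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(op_name[0])));
    }
    return op_name + "KernelNpu.cpp";
}
```

Under these assumptions, adapter_file_name("add") yields the file name used in the example above.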

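Adaptation steps

Note: the adaptation layer is developed in C++.

Include the dependency header files.

#include "ATen/native/npu/utils/CalcuOpUtil.h"
#include "ATen/native/npu/utils/KernelNpuOutputSize.h"
#include "ATen/native/npu/utils/NpuUtils.h"

Note: "CalcuOpUtil.h" mainly contains functions related to the ACL interface.

"KernelNpuOutputSize.h" mainly contains the functions that infer an operator's output shape.

"NpuUtils.h" mainly contains shared utility functions.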

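Define the main adaptation functions for the add operator.

Based on the dispatch definition of the add operator in native_functions.yaml, the operator adaptation should contain the following functions:

o add_npu_input: constructs the input NPUTensorDesc objects

o add_npu_output: constructs the output NPUTensorDesc objects

o add_npu_attr: constructs the attr properties of the NPU TBE add operator

o add_out_npu: operator adaptation function (the NPU dispatch function in the yaml file that accepts an output tensor); the other parameter supports both Tensor and Scalar

o add_npu: operator adaptation function (the NPU dispatch function in the yaml file); the other parameter supports both Tensor and Scalar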

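Implement the function add_npu_input.

// Adaptation function add_npu_input for "self": "Tensor" and "other": "Tensor"
SmallVector<NPUTensorDesc, N> add_npu_input(const Tensor& self, const Tensor& other) {
  ...
  return inputs;
}

// Adaptation function add_npu_input for "self": "Tensor" and "other": "Scalar"
SmallVector<NPUTensorDesc, N> add_npu_input(const Tensor& self, const Scalar& other) {
  ...
}

Implement the function add_npu_output.

Construct the output tensor object of add_npu_output into an NPUTensorDesc object.

// Adaptation function add_npu_output for an output of type "Tensor"
SmallVector<NPUTensorDesc, N> add_npu_output(const Tensor& result) {
  ...
}

Note:

In general, an operator's output needs no special handling; calling createnpuoutputtensordesc directly is enough.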

Implement the function add_npu_attr.

Following the attr specification of the NPU TBE operator prototype, convert the arguments into the attr properties that the prototype requires.

// Adaptation function add_npu_attr for "other": "Tensor" and "alpha": "Scalar"
SmallVector<NPUAttrDesc, N> add_npu_attr(const Tensor& self, const Tensor& other, Scalar alpha) {
  ...
  return attrs;
}

// Adaptation function adds_npu_attr for "other": "Scalar" and "alpha": "Scalar"
SmallVector<NPUAttrDesc, N> adds_npu_attr(const Tensor& self, const Scalar& other, const Scalar& alpha) {
  ...
  return attrs;
}

Implement the function add_out_npu.

Tensor& add_out_npu(Tensor& result, const Tensor& self, const Tensor& other, Scalar alpha) {
  if (...) {
    ...
  } else if (self.dim() == 0 && !self.is_npu()) {
    ...
  } else {
    ...
    // constructs the attr of the NPUAttrDesc
    auto attrs = add_npu_attr(self, other, alpha);
    // executing the NPU operator
    CalcuOpUtil::execute_npu_operate("Axpy", inputs, outputs, attrs);
  }
  return result;
}

Note: the difference between add_out_npu and add_npu is that add_out_npu lets the caller specify the output tensor explicitly and writes the result into that tensor.
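The flow of an out-variant adaptation function (collect inputs, outputs and attrs, launch the operator, return the caller's tensor) can be sketched without any NPU dependency. Everything below is a hypothetical stand-in: MockTensor, MockAttr, mock_execute and mock_add_out are not Ascend or PyTorch APIs, and the launch is simulated on the CPU using the add semantics result = self + alpha * other.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the tensor and attr descriptors.
struct MockTensor {
    std::vector<float> data;
};

struct MockAttr {
    std::string name;
    float value;
};

// Stand-in for the operator launch. It ignores the operator name and
// computes out[i] = x[i] + alpha * y[i] on the CPU.
void mock_execute(const std::string& op_name,
                  const std::vector<const MockTensor*>& inputs,
                  const std::vector<MockTensor*>& outputs,
                  const std::vector<MockAttr>& attrs) {
    (void)op_name;
    const float alpha = attrs.at(0).value;
    const MockTensor& x = *inputs.at(0);
    const MockTensor& y = *inputs.at(1);
    MockTensor& out = *outputs.at(0);
    out.data.resize(x.data.size());
    for (std::size_t i = 0; i < x.data.size(); ++i) {
        out.data[i] = x.data[i] + alpha * y.data.at(i);
    }
}

// Out variant: writes into the tensor supplied by the caller and returns
// a reference to that same tensor.
MockTensor& mock_add_out(MockTensor& result, const MockTensor& self,
                         const MockTensor& other, float alpha) {
    std::vector<const MockTensor*> inputs = {&self, &other};
    std::vector<MockTensor*> outputs = {&result};
    std::vector<MockAttr> attrs = {{"alpha", alpha}};
    mock_execute("Axpy", inputs, outputs, attrs);
    return result;
}
```

The point of the sketch is the ownership model: the caller allocates result, and the function only fills it in.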

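Implement the function add_npu.

a. Define and implement the operator's shape inference function, which computes the output size from the input arguments.

Naming convention for shape inference functions:

"NPU adaptation function name" + "_" + "output" + "_" + "size", for example add_npu_output_size();

Note: shape inference functions are defined and implemented under pytorch/aten/src/ATen/native/npu/utils, in the header KernelNpuOutputSize.h and the source file KernelNpuOutputSize.cpp.

 In KernelNpuOutputSize.h, functions are ordered by function name.

// Shape inference function for "self": "Tensor" and "other": "Tensor"
SmallVector<int64_t, SIZE> add_npu_output_size(const Tensor& self, const Tensor& other) { ... }

// Shape inference function for "self": "Tensor" and "other": "Scalar"
IntArrayRef add_npu_output_size(const Tensor& self, const Scalar& other) { ... }

Note: when the two arguments satisfy PyTorch's broadcasting rules, broadcast_ops_npu_output_size automatically expands them to a common size.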
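To make the broadcasting rule concrete, here is a minimal sketch of broadcast shape inference in plain C++. broadcast_output_size is a hypothetical helper written for illustration; it follows the standard PyTorch broadcasting rules and is not the implementation of broadcast_ops_npu_output_size.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: computes the broadcast shape of two shapes.
// Dimensions are compared from the trailing end; a missing dimension
// counts as 1, and two dimensions are compatible when they are equal
// or when one of them is 1.
std::vector<int64_t> broadcast_output_size(const std::vector<int64_t>& a,
                                           const std::vector<int64_t>& b) {
    const std::size_t ndim = std::max(a.size(), b.size());
    std::vector<int64_t> out(ndim, 1);
    for (std::size_t i = 0; i < ndim; ++i) {
        const int64_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
        const int64_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
        if (da != db && da != 1 && db != 1) {
            throw std::invalid_argument("shapes are not broadcastable");
        }
        out[ndim - 1 - i] = (da == 1) ? db : da;
    }
    return out;
}
```

For example, shapes {4, 1, 3} and {5, 1} broadcast to {4, 5, 3}, while {2, 3} and {4, 3} are rejected.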

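b. Call the corresponding shape inference function to compute the output size.

c. Using the output size, call at::empty_with_format to create the output tensor. The function lets you specify the format of the output tensor; the default is NCHW.

Note: the current format-setting rule is a mix of two rules: anchor diffusion from heavy operators, and the continuity rule.

 Heavy operators such as convolution and matmul support only one specific format. When adapting them, specify the required format explicitly; that format then diffuses to the neighboring operators.

 The continuity rule applies to operators that are not sensitive to format: set the operator's format to that of its first input tensor.

 Convolution on the NPU supports only the NC1HWC0 format, so NC1HWC0 must be specified explicitly.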
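The mixed format rule can be sketched as a small decision function. MockFormat and choose_output_format are hypothetical names used only for illustration, and only the convolution case stated above is encoded as a heavy-operator anchor.

```cpp
#include <string>

// Hypothetical format enum covering the two formats named in the text.
enum class MockFormat { NCHW, NC1HWC0 };

// Hypothetical decision function for the output format of an operator.
MockFormat choose_output_format(const std::string& op_name,
                                MockFormat first_input_format) {
    // Heavy-operator anchor: convolution requires NC1HWC0, so the format
    // is set explicitly regardless of the input format.
    if (op_name == "convolution") {
        return MockFormat::NC1HWC0;
    }
    // Continuity rule: format-insensitive operators follow the format of
    // their first input tensor.
    return first_input_format;
}
```

A format-insensitive operator such as add therefore inherits NC1HWC0 when it directly follows a convolution, which is how the anchor format diffuses.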

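d. Pass the constructed output tensor and the other arguments to add_out_npu to perform the computation.

// Adaptation function add_npu for "self": "Tensor" and "other": "Tensor"
// Call the corresponding shape inference function to compute the output size
Tensor add_npu(const Tensor& self, const Tensor& other, Scalar alpha) {
  ...
}

// Adaptation function add_npu for "self": "Tensor" and "other": "Scalar"
// Call the corresponding shape inference function to compute the output size
Tensor add_npu(const Tensor& self, Scalar other, Scalar alpha) {
  ...
}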
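Steps a to d can be sketched end to end with mock types. The names below (MockTensor, mock_add_output_size, mock_empty, mock_add_out, mock_add) are hypothetical stand-ins rather than the adapter's real functions, the shape inference handles only equal shapes, and the computation runs on the CPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical tensor stand-in: a shape plus flat storage.
struct MockTensor {
    std::vector<int64_t> sizes;
    std::vector<float> data;
};

// Steps a and b: shape inference. This sketch assumes both inputs have
// the same shape, so the output takes the shape of self.
std::vector<int64_t> mock_add_output_size(const MockTensor& self,
                                          const MockTensor& other) {
    (void)other;
    return self.sizes;
}

// Step c: allocate a zero-filled output tensor of the inferred size.
MockTensor mock_empty(const std::vector<int64_t>& sizes) {
    const int64_t numel = std::accumulate(sizes.begin(), sizes.end(),
                                          int64_t{1},
                                          std::multiplies<int64_t>());
    return MockTensor{sizes,
                      std::vector<float>(static_cast<std::size_t>(numel), 0.0f)};
}

// Out variant: writes self + alpha * other into the caller's tensor.
MockTensor& mock_add_out(MockTensor& result, const MockTensor& self,
                         const MockTensor& other, float alpha) {
    for (std::size_t i = 0; i < result.data.size(); ++i) {
        result.data[i] = self.data.at(i) + alpha * other.data.at(i);
    }
    return result;
}

// Step d: the functional variant infers the size, allocates the output,
// and delegates the computation to the out variant.
MockTensor mock_add(const MockTensor& self, const MockTensor& other,
                    float alpha) {
    const std::vector<int64_t> output_size = mock_add_output_size(self, other);
    MockTensor result = mock_empty(output_size);
    mock_add_out(result, self, other, alpha);
    return result;
}
```

Keeping the arithmetic in the out variant means both entry points share one code path, which is the relationship between add_npu and add_out_npu described above.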
