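CANN Learning Resources Open-Source Repo: Intermediate Operator Development, Part 2: Calling an Operator via pybind
Calling an operator through pybind still performs the computation through the aclnn interface. The only difference is that the aclnn interface is wrapped as a Python function so that it can be called from Python. First, install the dependencies (in the AtomGit AI environment they are already satisfied):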
pip install torch==2.9.0;pip install torch-npu==2.9.0;pip install pybind11;pip install setuptools; pip install wheel
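Before building, it can help to confirm that the packages from the pip command are actually present. The following is a small stdlib-only sketch using importlib.metadata; the 2.9.0 pins come from the command above, while the handling of local suffixes (such as "+cpu") and suffixed releases (such as "2.9.0.post1") is an assumption about how your wheels are versioned.

```python
"""Report packages from the install step that are missing or at an unexpected version."""
from importlib import metadata

# Distribution names as passed to pip; None means "any version is acceptable".
REQUIRED = {
    "torch": "2.9.0",
    "torch-npu": "2.9.0",
    "pybind11": None,
    "setuptools": None,
    "wheel": None,
}


def check_versions(required):
    """Return {name: problem} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "missing"
            continue
        public = found.split("+")[0]  # drop a local suffix such as "+cpu"
        # Also accept suffixed releases such as "2.9.0.post1".
        if wanted is not None and public != wanted and not public.startswith(wanted + "."):
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_versions(REQUIRED)
    for pkg, issue in issues.items():
        print(f"{pkg}: {issue}")
    print("all dependencies OK" if not issues else f"{len(issues)} problem(s) found")
```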
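pybind11 (Python bindings for C++11) is a lightweight C++ library that exposes C++ code to Python by creating Python bindings, so that Python can call C++ functions, classes, and objects directly. It is commonly used for low-level operator extensions in frameworks such as PyTorch, connecting a high-performance C++ implementation seamlessly to a Python front end.
Next, write the pybind wrapper code that wraps the aclnnAddCustomTemplate interface as the Python function npu_add_custom_template. The PYBIND11_MODULE macro from pybind11 turns the C++ function npu_add_custom_template into an extension-module interface that Python can import and call. custom_op.cpp is as follows: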
#include <torch/extension.h>
#include <torch/csrc/autograd/custom_function.h>
#include "pytorch_npu_helper.hpp" // defines the EXEC_NPU_CMD macro, which executes aclnn two-stage interfaces
using torch::autograd::Function;
using torch::autograd::AutogradContext;
using tensor_list = std::vector<at::Tensor>;
using namespace at;
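// Entry point exposed to Python: returns z = x + y computed by the custom aclnn operator
at::Tensor npu_add_custom_template(const at::Tensor &x, const at::Tensor &y) {
    at::Tensor z = at::empty_like(x); // output with the same shape, dtype and device (NPU) as x
    EXEC_NPU_CMD(aclnnAddCustomTemplate, x, y, z); // two-stage aclnn call: query workspace size, then launch
    return z;
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    // m.def takes 3 arguments: the function name used from Python, the address/pointer of the C++ function, and an optional docstring
    m.def("npu_add_custom_template", &npu_add_custom_template, "torch add");
}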
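An .hpp file is a C++ header file. The .hpp suffix stands for "C++ header file"; the main difference from .h is that .hpp is conventionally used for headers containing C++-specific constructs (classes, templates, and so on). Here pytorch_npu_helper.hpp is a helper header for adapting PyTorch to Ascend NPUs. It defines macros such as EXEC_NPU_CMD that simplify calling ACLNN (Ascend Compute Library for Neural Network) two-stage interfaces, where the caller first queries the required workspace size and then launches the computation.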
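To make the two-stage convention concrete without an NPU, here is a pure-Python mock of the calling pattern. Everything in it is hypothetical (fake_add_get_workspace_size, fake_add, exec_npu_cmd are illustrative names, not CANN or torch_npu APIs); it only mirrors the shape of the real C interfaces, roughly aclnnXxxGetWorkspaceSize(inputs..., out, &workspaceSize, &executor) followed by aclnnXxx(workspace, workspaceSize, executor, stream), and the sequence a helper such as EXEC_NPU_CMD automates.

```python
"""Pure-Python mock of the aclnn two-stage call (hypothetical names, no NPU needed)."""
from dataclasses import dataclass

ACLNN_SUCCESS = 0


@dataclass
class Executor:
    """Stands in for aclOpExecutor: records what stage 2 must compute."""
    x: list
    y: list
    out: list


def fake_add_get_workspace_size(x, y, out):
    """Stage 1: validate arguments, return (status, workspace_size, executor)."""
    if not (len(x) == len(y) == len(out)):
        return 1, 0, None  # non-zero status rejects mismatched shapes
    # A real kernel may need scratch memory; this toy add needs none.
    return ACLNN_SUCCESS, 0, Executor(x, y, out)


def fake_add(workspace, workspace_size, executor, stream):
    """Stage 2: run the computation described by the executor."""
    for i, (a, b) in enumerate(zip(executor.x, executor.y)):
        executor.out[i] = a + b
    stream.append("add")  # record the launch, like enqueueing work on a stream
    return ACLNN_SUCCESS


def exec_npu_cmd(get_ws, run, *tensors, stream):
    """Roughly the sequence a helper like EXEC_NPU_CMD performs for the caller."""
    status, ws_size, executor = get_ws(*tensors)
    if status != ACLNN_SUCCESS:
        raise RuntimeError(f"GetWorkspaceSize failed with status {status}")
    workspace = bytearray(ws_size)  # allocate the workspace stage 1 asked for
    status = run(workspace, ws_size, executor, stream)
    if status != ACLNN_SUCCESS:
        raise RuntimeError(f"launch failed with status {status}")


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
    z = [0.0] * len(x)  # like at::empty_like(x): the caller allocates the output
    stream = []
    exec_npu_cmd(fake_add_get_workspace_size, fake_add, x, y, z, stream=stream)
    print(z, stream)  # [1.5, 2.5, 3.5] ['add']
```

The split exists so that the caller, not the operator, owns memory allocation: stage 1 reports how much scratch space is needed, and stage 2 runs with the buffer the caller provides.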
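Once the code is written, it must be compiled into an extension module for Python to use. The pattern is fairly standard: call the setup function from setuptools with the extension module's name, version, author, description, and similar metadata, together with the source files to compile. Here setup.py specifies custom_op.cpp as the source file and names the compiled module custom_ops_lib, so that Python can use import custom_ops_lib to reach the interface exposed by the PYBIND11_MODULE macro above. With setup.py written, build and install the pybind extension module: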
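The article does not reproduce setup.py. Below is a minimal sketch of what such a file could look like, assuming torch_npu's `NpuExtension` helper from `torch_npu.utils.cpp_extension` together with PyTorch's `BuildExtension`, that custom_op.cpp and pytorch_npu_helper.hpp sit in the same directory as setup.py, and that the ACL headers ship under torch_npu's `include/third_party/acl/inc`; verify these against your torch_npu installation. The distribution name `custom_ops`, version `1.0`, and module name `custom_ops_lib` are chosen to match the wheel and .so names in the build output.

```python
# setup.py: a sketch, not the article's actual file.
# Assumptions: NpuExtension exists in torch_npu.utils.cpp_extension, custom_op.cpp
# and pytorch_npu_helper.hpp are next to this file, and the ACL headers are under
# <torch_npu>/include/third_party/acl/inc.
import os

import torch_npu
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension
from torch_npu.utils.cpp_extension import NpuExtension

TORCH_NPU_DIR = os.path.dirname(os.path.abspath(torch_npu.__file__))

ext = NpuExtension(
    name="custom_ops_lib",  # module name used by "import custom_ops_lib"
    sources=["custom_op.cpp"],
    extra_compile_args=[
        "-I" + os.path.join(TORCH_NPU_DIR, "include/third_party/acl/inc"),
    ],
)

setup(
    name="custom_ops",  # distribution name, as in dist/custom_ops-1.0-*.whl
    version="1.0",
    description="pybind wrapper for aclnnAddCustomTemplate",
    ext_modules=[ext],
    cmdclass={"build_ext": BuildExtension},
)
```

PyTorch's extension build passes the last component of the extension's `name` to the compiler as `TORCH_EXTENSION_NAME`, which is why custom_op.cpp can use that macro in `PYBIND11_MODULE` instead of hard-coding `custom_ops_lib`.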
export LD_LIBRARY_PATH=${HOME}/vendors/customize/op_api/lib/:$LD_LIBRARY_PATH;cd Sources/pybind_op/;python3 setup.py build bdist_wheel;pip3 install dist/custom_ops*.whl --force-reinstall
Output:
...
running install_scripts
creating build/bdist.linux-aarch64/wheel/custom_ops-1.0.dist-info/WHEEL
creating 'dist/custom_ops-1.0-cp311-cp311-linux_aarch64.whl' and adding 'build/bdist.linux-aarch64/wheel' to it
adding 'custom_ops_lib.cpython-311-aarch64-linux-gnu.so'
adding 'custom_ops-1.0.dist-info/METADATA'
adding 'custom_ops-1.0.dist-info/WHEEL'
adding 'custom_ops-1.0.dist-info/top_level.txt'
adding 'custom_ops-1.0.dist-info/RECORD'
removing build/bdist.linux-aarch64/wheel
path string is NULLpath string is NULLDefaulting to user installation because normal site-packages is not writeable
Processing ./dist/custom_ops-1.0-cp311-cp311-linux_aarch64.whl
Installing collected packages: custom-ops
Successfully installed custom-ops-1.0
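The wheel and .so names in this log encode where the build can be used. Following the wheel file-name convention from PEP 427, `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`, this wheel targets CPython 3.11 on aarch64 Linux, so a different interpreter or architecture needs a rebuild. A stdlib-only sketch that splits the name and prints the extension-module suffix the current interpreter expects, for comparison with `.cpython-311-aarch64-linux-gnu.so`:

```python
"""Split a wheel filename into its PEP 427 fields (stdlib only, a sketch)."""
import sysconfig
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Parse '{dist}-{ver}(-{build})?-{python}-{abi}-{platform}.whl'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return WheelTags(dist, ver, build, py, abi, plat)


if __name__ == "__main__":
    tags = parse_wheel_name("custom_ops-1.0-cp311-cp311-linux_aarch64.whl")
    print(tags.python, tags.abi, tags.platform)  # cp311 cp311 linux_aarch64
    # Suffix this interpreter looks for when importing a compiled module
    print(sysconfig.get_config_var("EXT_SUFFIX"))
```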
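With the pybind extension module installed, import it with import custom_ops_lib and call the custom operator's interface like any ordinary Python function. Note that the custom operator runs on the NPU, so the inputs must be converted to NPU tensors before the call, and the output must be converted back to a CPU tensor before it is printed or saved. Code: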
# Call the pybind-wrapped npu_add_custom_template function from Python to run our custom operator
import torch
import torch_npu  # registers the NPU device with PyTorch
import custom_ops_lib
torch.npu.config.allow_internal_format = False  # disable NPU-internal formats; tensors use the base format
length = [8, 2048]
x = torch.rand(length, device='cpu', dtype=torch.float16)
y = torch.rand(length, device='cpu', dtype=torch.float16)
golden = x + y  # reference result computed on the CPU
output_npu = custom_ops_lib.npu_add_custom_template(x.npu(), y.npu())  # inputs moved to the NPU
print("is same:", torch.allclose(golden, output_npu.cpu(), rtol=0.001, atol=0.001))  # compare on the CPU
Run:
source ${HOME}/vendors/customize/bin/set_env.bash; python Sources/pybind_op/test_op.py
Output:
path string is NULLpath string is NULL[W326 21:01:05.514101901 compiler_depend.ts:338] Warning: Cannot create tensor with interal format while allow_internel_format=False, tensor will be created with base format. (function operator())
opType=AddCustomTemplate, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1048384, block_initial_space=1048576, rsv=0, magic=5aa5bccd
CANN Version: 8.5.0, TimeStamp: 202507
Core 0 executed 1 times in total
opType=AddCustomTemplate, DumpHead: AIV-1, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1048384, block_initial_space=1048576, rsv=0, magic=5aa5bccd
CANN Version: 8.5.0, TimeStamp: 202507
Core 1 executed 1 times in total
...
opType=AddCustomTemplate, DumpHead: AIV-7, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1048384, block_initial_space=1048576, rsv=0, magic=5aa5bccd
CANN Version: 8.5.0, TimeStamp: 202507
Core 7 executed 1 times in total
is same: True
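The "is same" line comes from torch.allclose, which treats two elements as equal when |a - b| <= atol + rtol * |b|, with b being the second argument (here the NPU result moved to the CPU). A stdlib-only sketch of that element-wise rule, useful for reasoning about what rtol=0.001 and atol=0.001 allow; NaN handling follows the default equal_nan=False, and the defaults below mirror torch.allclose's rtol=1e-05 and atol=1e-08.

```python
"""Pure-Python version of the element-wise criterion used by torch.allclose."""


def allclose(a, b, rtol=1e-5, atol=1e-8):
    """Return True if every pair satisfies |x - y| <= atol + rtol * |y|."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # Any NaN makes the comparison False, matching equal_nan=False.
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Same tolerances as the test script: at |y| = 1.0 the allowed gap is about 0.002.
    print(allclose([1.0], [1.0015], rtol=1e-3, atol=1e-3))  # True
    print(allclose([1.0], [1.003], rtol=1e-3, atol=1e-3))   # False
    # The rule is asymmetric because rtol scales only |y|:
    print(allclose([0.0], [1.0], rtol=1.0, atol=0.0), allclose([1.0], [0.0], rtol=1.0, atol=0.0))  # True False
```

Note that Python's math.isclose uses a different, symmetric rule (it scales by the larger magnitude), so it is not a drop-in substitute for this check.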