CANN Learning Resources Open-Source Repo: Intermediate Operator Development (2) — Calling Operators via pybind

Posted by 黄生 on 2026/03/26 21:38:52
[Abstract] Calling an operator via pybind still performs the computation through the aclnn interface; the aclnn interface is simply wrapped as a Python function so it can be called conveniently from Python. First install the dependencies (in the atomgit ai environment they are already satisfied): pip install torch==2.9.0;pip install torch-npu==2.9.0;pip install pybind11;pip install setuptools; pi...

Calling an operator via pybind still performs the computation through the aclnn interface; the aclnn interface is simply wrapped as a Python function so it can be called conveniently from Python. First install the dependencies (in the atomgit ai environment they are already satisfied):

pip install torch==2.9.0;pip install torch-npu==2.9.0;pip install pybind11;pip install setuptools; pip install wheel

pybind11 (Python bindings for C++11) is a lightweight C++ library for exposing C++ code to Python by creating Python bindings, letting Python call C++ functions, classes, and objects directly. It is commonly used for low-level operator extensions in frameworks such as PyTorch, providing a seamless bridge between a high-performance C++ implementation and the Python front end.

Next, write the pybind wrapper code that exposes the aclnnAddCustomTemplate interface as a Python function named npu_add_custom_template. Using pybind11's PYBIND11_MODULE macro, the C++ npu_add_custom_template function is packaged into an extension module that Python can import and call. custom_op.cpp is as follows:

#include <torch/extension.h>
#include <torch/csrc/autograd/custom_function.h>
#include "pytorch_npu_helper.hpp"  // defines the EXEC_NPU_CMD macro used to invoke the two-phase aclnn interface

using torch::autograd::Function;
using torch::autograd::AutogradContext;
using tensor_list = std::vector<at::Tensor>;

using namespace at;
at::Tensor npu_add_custom_template(const at::Tensor &x, const at::Tensor &y) {
    at::Tensor z = at::empty_like(x);
    EXEC_NPU_CMD(aclnnAddCustomTemplate, x, y, z);
    return z;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    // m.def takes three arguments: the name exposed to Python,
    // a pointer to the C++ function, and an optional docstring
    m.def("npu_add_custom_template", &npu_add_custom_template, "torch add");
}

An .hpp file is a C++ header; the suffix stands for "C++ Header File". The main difference from .h is that .hpp is conventionally used for headers containing C++-specific constructs (classes, templates, and so on). Here, pytorch_npu_helper.hpp is a helper header from the Ascend NPU adaptation for PyTorch; it defines macros such as EXEC_NPU_CMD to simplify calling the two-phase ACLNN (Ascend Compute Library for Neural Network) interfaces.

Once the code is complete, it must be compiled into an extension module that Python can call. The pattern is fairly fixed: the setup function from the setuptools library specifies the extension module's name, version, author, description, and the source files to compile. Here setup.py lists custom_op.cpp as the source file and names the built module custom_ops_lib, so that in Python we can import custom_ops_lib to use the interface exposed by the PYBIND11_MODULE macro above. With setup.py written, compile and install the pybind-wrapped extension module:
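The article describes setup.py without reproducing it. A minimal sketch along these lines could work; note that the include/library paths and the custom aclnn library name (`cust_opapi`) are assumptions that must be adjusted to match your environment and the build command shown below:

```python
# setup.py — minimal sketch for building the pybind extension.
# ASSUMPTIONS: the torch_npu include path and the custom operator
# library name/location are illustrative, not taken from the article.
import os

import torch_npu
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

PYTORCH_NPU_PATH = os.path.dirname(os.path.abspath(torch_npu.__file__))
# Matches the LD_LIBRARY_PATH used in the build command below
CUSTOM_OP_LIB = os.path.join(os.environ["HOME"], "vendors/customize/op_api/lib")

setup(
    name="custom_ops",                  # wheel name: custom_ops-1.0-*.whl
    version="1.0",
    ext_modules=[
        CppExtension(
            "custom_ops_lib",           # import name: import custom_ops_lib
            sources=["custom_op.cpp"],
            include_dirs=[os.path.join(PYTORCH_NPU_PATH, "include")],
            library_dirs=[CUSTOM_OP_LIB],
            libraries=["cust_opapi"],   # assumed name of the aclnn wrapper library
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

CppExtension takes care of adding the PyTorch include directories and linking against libtorch, which is why the C++ file can include torch/extension.h without any extra flags.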

export LD_LIBRARY_PATH=${HOME}/vendors/customize/op_api/lib/:$LD_LIBRARY_PATH
cd Sources/pybind_op/
python3 setup.py build bdist_wheel
pip3 install dist/custom_ops*.whl --force-reinstall

Output:

...
running install_scripts
creating build/bdist.linux-aarch64/wheel/custom_ops-1.0.dist-info/WHEEL
creating 'dist/custom_ops-1.0-cp311-cp311-linux_aarch64.whl' and adding 'build/bdist.linux-aarch64/wheel' to it
adding 'custom_ops_lib.cpython-311-aarch64-linux-gnu.so'
adding 'custom_ops-1.0.dist-info/METADATA'
adding 'custom_ops-1.0.dist-info/WHEEL'
adding 'custom_ops-1.0.dist-info/top_level.txt'
adding 'custom_ops-1.0.dist-info/RECORD'
removing build/bdist.linux-aarch64/wheel
path string is NULLpath string is NULLDefaulting to user installation because normal site-packages is not writeable
Processing ./dist/custom_ops-1.0-cp311-cp311-linux_aarch64.whl
Installing collected packages: custom-ops
Successfully installed custom-ops-1.0

After installing the pybind-wrapped extension module, import it with import custom_ops_lib and call the custom operator's interface just like an ordinary Python function. Note that the custom operator runs on the NPU, so the input data must be converted to NPU tensors when calling it, and the output must be converted back to a CPU tensor before printing or saving. The code:

# Call the pybind-wrapped npu_add_custom_template function from Python
# to run the custom operator we developed
import torch
import torch_npu
import custom_ops_lib

torch.npu.config.allow_internal_format = False

length = [8, 2048]
x = torch.rand(length, device='cpu', dtype=torch.float16)
y = torch.rand(length, device='cpu', dtype=torch.float16)
golden = x + y
output_npu = custom_ops_lib.npu_add_custom_template(x.npu(), y.npu())

print("is same:", torch.allclose(golden, output_npu.cpu(), rtol=0.001, atol=0.001))

Run:

source ${HOME}/vendors/customize/bin/set_env.bash; python Sources/pybind_op/test_op.py

Output:

path string is NULLpath string is NULL[W326 21:01:05.514101901 compiler_depend.ts:338] Warning: Cannot create tensor with interal format while allow_internel_format=False, tensor will be created with base format. (function operator())
opType=AddCustomTemplate, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1048384, block_initial_space=1048576, rsv=0, magic=5aa5bccd
CANN Version: 8.5.0, TimeStamp: 202507
Core 0 executed 1 times in total
opType=AddCustomTemplate, DumpHead: AIV-1, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1048384, block_initial_space=1048576, rsv=0, magic=5aa5bccd
CANN Version: 8.5.0, TimeStamp: 202507
Core 1 executed 1 times in total
...
opType=AddCustomTemplate, DumpHead: AIV-7, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1048384, block_initial_space=1048576, rsv=0, magic=5aa5bccd
CANN Version: 8.5.0, TimeStamp: 202507
Core 7 executed 1 times in total
is same: True
[Disclaimer] This content comes from a blogger in the Huawei Cloud Developer Community and does not represent the views or positions of Huawei Cloud or the Huawei Cloud Developer Community. Reposts must credit the source (Huawei Cloud community), the article link, and the author; otherwise the author and the community reserve the right to pursue liability. If you find suspected plagiarism in the community, please report it by email with supporting evidence to cloudbbs@huaweicloud.com; confirmed infringing content will be removed immediately.