
Creating MindSpore TBE Operators with MindStudio (Part 1): Creating the Operator

Posted by 塞恩斯 on 2025/02/10 00:43:54

[Abstract] Creating MindSpore TBE Operators with MindStudio (Part 1): Creating the Operator

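I consulted a great deal of material while working through this and learned a lot from the blogs of many experienced developers. My thanks to all of them.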

1. Environment Preparation

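For environment configuration, see "Configuring the MindStudio Development Environment on an Ubuntu Virtual Machine". Newer MindStudio releases include an SSH remote-connection option. If you need remote connections, you may have to install a different version from the one used here.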

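Once MindStudio is configured, you also need a matching MindSpore environment if you are developing MindSpore operators. For details, see "Environment Dependencies for Developing MindSpore Operators". The Anaconda setup steps are included below.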

wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
bash Anaconda3-2024.10-1-Linux-x86_64.sh
# Press Enter through the prompts and answer "yes" whenever asked
source ~/.bashrc
conda create -n mindspore python=3.9
conda activate mindspore
conda config --set show_channel_urls yes
vi ~/.condarc

Clear the existing contents and enter the following:

channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud

Then run:

conda clean -i
conda install mindspore=2.4.10 -c mindspore -c conda-forge

Software versions used:

CANN: 8.0.0
MindStudio: 7.0.0 x86_64
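Before moving on, it can help to confirm that the active environment matches the Python 3.9 and MindSpore 2.4.10 installed above. The following is a minimal sketch that uses only the standard library. `check_env` and the expected-version constants are illustrative names and are not part of MindStudio or MindSpore. The sketch does not check the CANN version. MindSpore also provides `mindspore.run_check()` for a fuller installation check.

```python
import importlib.util
import sys

# Versions installed earlier in this article; edit to match your setup.
EXPECTED_PYTHON = (3, 9)
EXPECTED_MINDSPORE = "2.4.10"


def check_env():
    """Compare the running interpreter and installed MindSpore with the expected versions."""
    report = {
        "python": tuple(sys.version_info[:2]) == EXPECTED_PYTHON,
        "mindspore_installed": importlib.util.find_spec("mindspore") is not None,
        "mindspore_version_ok": False,
    }
    if report["mindspore_installed"]:
        import mindspore  # imported lazily so the check still runs without MindSpore
        report["mindspore_version_ok"] = mindspore.__version__ == EXPECTED_MINDSPORE
    return report


if __name__ == "__main__":
    for item, ok in check_env().items():
        print(f"{item}: {ok}")
```

Run it inside the activated `mindspore` conda environment. Every entry should print `True`.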

2. Hands-on Steps

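The workflow for creating a MindSpore TBE operator is described below. Four files are mainly edited: xxx_impl.py and xxx.py, plus test_xxx_impl.py and xxx_case_timestamp.json for testing. Operator testing itself is not covered in this article.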
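The same operator name recurs across these files and in the registration code, so a small helper can keep the names consistent. The sketch below is hypothetical. `op_names` is not a MindStudio API, and the pattern simply mirrors the addcustom example used later in this article (registered name `Addcustom`, kernel `addcustom_impl`, binary `addcustom.so`). The timestamp format in the case JSON name is also an assumption.

```python
from datetime import datetime


def op_names(base, now=None):
    """Derive the names for one custom operator from a lowercase base name.

    Hypothetical helper: the pattern follows the addcustom example and is a
    convention for consistency, not a rule enforced by MindSpore or MindStudio.
    """
    now = now or datetime.now()
    return {
        "impl_file": f"{base}_impl.py",
        "op_file": f"{base}.py",
        "test_file": f"test_{base}_impl.py",
        # Assumed timestamp format (YYYYMMDDhhmmss) for the test-case JSON
        "case_file": f"{base}_case_{now:%Y%m%d%H%M%S}.json",
        "reg_op_name": base.capitalize(),  # name passed to TBERegOp
        "kernel_name": f"{base}_impl",
        "binfile_name": f"{base}.so",
    }


if __name__ == "__main__":
    for key, value in op_names("addcustom").items():
        print(f"{key}: {value}")
```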

2.1 Registering the Operator

Open MindStudio, configure the CANN version, and select IR template. For the template file, choose to write your own. This creates a new JSON IR file.

The attribute entries can be deleted here.

Click OK and save the file to the target directory.

2.2 Configuring the Python Interpreter

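Load the Python SDK. If hovering the mouse over the code then shows the operator information, the interpreter is configured correctly.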

2.3 Implementing the Operator

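For the implementation, refer to the official tutorial (Operator Implementation). The implementation file lives under mindspore/impl and is named addcustom_impl.py here. This example does not require any changes to addcustom.py.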

from functools import reduce

from tbe import tvm
import tbe.dsl as tbe
from tbe.common.register import register_op_compute
from tbe.common.utils import para_check
from tbe.common.utils import shape_util
from mindspore.ops.op_info_register import op_info_register, TBERegOp, DataType

# Upper bound on the number of elements in the broadcast output
SHAPE_SIZE_LIMIT = 2147483648

@register_op_compute("addcustom")
def addcustom_compute(x, y, z):
    """
    The compute function of the Addcustom implementation.
    """
    # Convert the shapes to lists
    shape_x = shape_util.shape_to_list(x.shape)
    shape_y = shape_util.shape_to_list(y.shape)

    # shape_max is the broadcast shape: the larger of shape_x and shape_y in each dimension
    shape_x, shape_y, shape_max = shape_util.broadcast_shapes(shape_x, shape_y,
                                                              param_name_input1="input_x",
                                                              param_name_input2="input_y")
    shape_size = reduce(lambda a, b: a * b, shape_max)
    if shape_size > SHAPE_SIZE_LIMIT:
        raise RuntimeError("the shape is too large to calculate")

    # Broadcast both inputs to shape_max
    input_x = tbe.broadcast(x, shape_max)
    input_y = tbe.broadcast(y, shape_max)

    # Compute input_x + input_y element-wise
    res = tbe.vadd(input_x, input_y)

    return res

# Operator registration info: define the kernel info of Addcustom.
# "Addcustom" is the registered operator name.
addcustom_op_info = (
    TBERegOp("Addcustom")
    # Fusion strategy: OPAQUE means the operator is not fused
    .fusion_type("OPAQUE")
    .partial_flag(True)
    .async_flag(False)
    # Name of the generated operator binary
    .binfile_name("addcustom.so")
    .compute_cost(10)
    .kernel_name("addcustom_impl")
    # Input and output information
    .input(0, "x", False, "required", "all")
    .input(1, "y", False, "required", "all")
    .output(0, "z", False, "required", "all")
    # Supported dtype/format combination for x, y and z
    .dtype_format(DataType.F16_Default, DataType.F16_Default, DataType.F16_Default)
    .get_op_info()
)
# Entry function of the operator: describes how the operator is compiled.
# Binding kernel info with the kernel implementation: the decorator binds the
# registration info above and registers the operator with the backend when executed.
@op_info_register(addcustom_op_info)
def addcustom_impl(x, y, z, kernel_name="addcustom_impl"):
    """
    The entry function of the Addcustom implementation.
    """
    # Get the shapes of the input tensors (the dtype is read below)
    shape_x = x.get("shape")
    shape_y = y.get("shape")

    # Check the input dtype. The trailing comma makes this a tuple;
    # ("float16") without it would just be the string "float16".
    check_tuple = ("float16",)
    input_data_type = x.get("dtype").lower()
    para_check.check_dtype(input_data_type, check_tuple, param_name="input_x")

    # shape_max takes the larger of shape_x and shape_y in each dimension
    shape_x, shape_y, shape_max = shape_util.broadcast_shapes(shape_x, shape_y,
                                                              param_name_input1="x",
                                                              param_name_input2="y")
    # If the last dimension of all three shapes is 1, drop it (unless the shape has only
    # one dimension). A trailing size-1 axis has the same memory layout as no axis at all
    # (e.g. 2*3 == 2*3*1), and dropping it improves the efficiency of later scheduling.
    if shape_x[-1] == 1 and shape_y[-1] == 1 and shape_max[-1] == 1:
        shape_x = shape_x if len(shape_x) == 1 else shape_x[:-1]
        shape_y = shape_y if len(shape_y) == 1 else shape_y[:-1]
        shape_max = shape_max if len(shape_max) == 1 else shape_max[:-1]

    # Reserve the input tensors with TVM's placeholder interface; each call returns a tensor object
    data_x = tvm.placeholder(shape_x, name="data_1", dtype=input_data_type)
    data_y = tvm.placeholder(shape_y, name="data_2", dtype=input_data_type)

    with tvm.target.cce():
        # Computation
        res = addcustom_compute(data_x, data_y, z)
        # Automatic scheduling
        sch = tbe.auto_schedule(res)
    # Build configuration
    config = {"print_ir": False,
              "name": kernel_name,
              "tensor_list": [data_x, data_y, res]}

    tbe.build(sch, config)

# Call the operator directly to check that it compiles (this does not check numerical results)
if __name__ == '__main__':
    input_output_dict = {"shape": (5, 6, 7), "format": "ND", "ori_shape": (5, 6, 7),
                         "ori_format": "ND", "dtype": "float16"}
    addcustom_impl(input_output_dict, input_output_dict, input_output_dict, kernel_name="add")
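You can experiment with the shape handling in addcustom_compute and addcustom_impl without CANN installed. The sketch below re-implements two pieces in plain Python: the NumPy-style broadcasting rule with the same element-count limit, and the trailing-size-1 trimming. It illustrates the idea under that assumption and is not the actual `shape_util.broadcast_shapes` implementation, which may perform additional checks.

```python
from functools import reduce

SHAPE_SIZE_LIMIT = 2147483648  # same limit as in addcustom_impl.py


def broadcast_shapes(shape_x, shape_y):
    """Left-pad the shorter shape with 1s, then broadcast dimension by dimension."""
    ndim = max(len(shape_x), len(shape_y))
    sx = [1] * (ndim - len(shape_x)) + list(shape_x)
    sy = [1] * (ndim - len(shape_y)) + list(shape_y)
    shape_max = []
    for a, b in zip(sx, sy):
        # Two dimensions are compatible if they are equal or one of them is 1
        if a != b and 1 not in (a, b):
            raise ValueError(f"shapes {shape_x} and {shape_y} cannot be broadcast")
        shape_max.append(b if a == 1 else a)
    if reduce(lambda p, q: p * q, shape_max, 1) > SHAPE_SIZE_LIMIT:
        raise RuntimeError("the shape is too large to calculate")
    return sx, sy, shape_max


def drop_trailing_one(shape_x, shape_y, shape_max):
    """Drop a trailing size-1 axis shared by all three shapes, keeping at least one axis."""
    def trim(shape):
        return shape if len(shape) == 1 else shape[:-1]

    if shape_x[-1] == 1 and shape_y[-1] == 1 and shape_max[-1] == 1:
        return trim(shape_x), trim(shape_y), trim(shape_max)
    return shape_x, shape_y, shape_max


if __name__ == "__main__":
    print(broadcast_shapes((5, 6, 7), (6, 1)))
    print(drop_trailing_one([2, 3, 1], [1, 3, 1], [2, 3, 1]))
```

The first call prints `([5, 6, 7], [1, 6, 1], [5, 6, 7])`: `(6, 1)` is padded to `[1, 6, 1]` before broadcasting. The second prints `([2, 3], [1, 3], [2, 3])`, matching the 2*3 == 2*3*1 reasoning in the comment above.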

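Run the script. If it generates a kernel_meta directory containing the corresponding .json and .o files, the operator is written correctly. To verify anything beyond that, see the operator testing guide.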
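That check can be scripted. This sketch assumes that the build writes `<kernel_name>.json` and `<kernel_name>.o` into a `kernel_meta` directory under the current working directory. Adjust `meta_dir` if your setup places it elsewhere. `check_kernel_meta` is a hypothetical helper, not a CANN or MindStudio tool.

```python
from pathlib import Path


def check_kernel_meta(kernel_name, meta_dir="kernel_meta"):
    """Return True if meta_dir holds both <kernel_name>.json and <kernel_name>.o.

    Assumes the output file names follow kernel_name; verify against your build.
    """
    meta = Path(meta_dir)
    return all((meta / f"{kernel_name}{ext}").is_file() for ext in (".json", ".o"))


if __name__ == "__main__":
    # The __main__ block of addcustom_impl.py builds with kernel_name="add"
    print("kernel_meta OK" if check_kernel_meta("add") else "kernel_meta incomplete")
```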

Recommended reading to go with this article:
Operator Code Implementation
MindSpore Framework TBE Operator Development: The Complete Workflow

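[Statement] This content comes from a blogger on the Huawei Cloud Developer Community and does not represent the views or position of Huawei Cloud or the Huawei Cloud Developer Community. Reposts must credit the source (Huawei Cloud Community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find content in this community that you suspect is plagiarized, please report it by email with supporting evidence. Once verified, the community will promptly delete the infringing content. Report email: cloudbbs@huaweicloud.com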
