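June 2025 CANN Ascend C Operator Development Certification (Intermediate): Environment (ascend910b) and Code

Posted by 红目香薰 on 2025/07/08 19:09:04

[Abstract] After a lot of references, study, and pitfalls, I finally got this working. As far as I know, this is currently the only write-up of its kind online. I passed, and several students I coached passed as well, which confirms the logic is correct. I hope it helps you too.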

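Preface

I'll open with the certificate. It took me a little over a week of references, study, and pitfalls to finally get this working.

The certificate shows that I passed on June 5, but I started on May 27.

It really wasn't easy. On June 5 I finally worked out how to use the AscendC::Div function, and with that I passed the exam.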

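Environment

I used Huawei's ModelArts environment and installed everything online; I did not use a physical board. If you have a physical board, check which model it is. Mine is ascend910b, and the settings throughout this post must match it. If yours is ascend310b, change them accordingly.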

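Tips

Here are some points to watch out for. This is not the full list; message me privately if you need more.

1. Environment pitfalls

After a restart, be sure to reinstall the environment and then run this again: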

source ~/.bashrc

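2. Load a special file

Don't ask why, just run it. The script sets up the CANN toolkit environment variables for the current shell.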

source /home/ma-user/Ascend/ascend-toolkit/set_env.sh

3. Files to modify

Four files need to be modified in total:

SigmoidCustom/SigmoidCustom/CMakePresets.json
SigmoidCustom/SigmoidCustom/op_host/sigmoid_custom_tiling.h
SigmoidCustom/SigmoidCustom/op_host/sigmoid_custom.cpp
SigmoidCustom/SigmoidCustom/op_kernel/sigmoid_custom.cpp

Check all four files carefully. Miss any one of them and you will not get the correct final result.

4. Basic code

The basic code lives in:

SigmoidCustom/SigmoidCustom/op_host/sigmoid_custom_tiling.h
SigmoidCustom/SigmoidCustom/op_host/sigmoid_custom.cpp

The code in these two files is relatively easy to understand.

Add to sigmoid_custom_tiling.h:

TILING_DATA_FIELD_DEF(uint32_t, totalLength);
TILING_DATA_FIELD_DEF(uint32_t, tileNum);

Add to sigmoid_custom.cpp:

const uint32_t BLOCK_DIM = 8;
const uint32_t TILE_NUM = 8;
uint32_t totalLength = context->GetInputShape(0)->GetOriginShape().GetShapeSize();
context->SetBlockDim(BLOCK_DIM);
tiling.set_totalLength(totalLength);
tiling.set_tileNum(TILE_NUM);
tiling.SaveToBuffer(context->GetRawTilingData()->GetData(),
                    context->GetRawTilingData()->GetCapacity());
context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
size_t *currentWorkspace = context->GetWorkspaceSizes(1);
currentWorkspace[0] = 0;
return ge::GRAPH_SUCCESS;
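
To see what these tiling values turn into, here is a minimal host-side sketch in plain C++ (not Ascend C) that repeats the integer arithmetic the kernel's Init and Process perform on totalLength, the block dimension, and tileNum. The names TileSplit, SplitForCore, TileOffset, and kBufferNum are hypothetical helpers made up for illustration, and the 16384-element input used below is an assumed example size, not a value from the exam.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper struct (not part of CANN): the lengths the kernel derives.
struct TileSplit {
    uint32_t blockLength;  // elements handled by one core
    uint32_t tileLength;   // elements moved by one CopyIn/CopyOut call
    uint32_t loopCount;    // iterations of the Process() loop on one core
};

constexpr uint32_t kBufferNum = 2;  // mirrors BUFFER_NUM in the kernel

// Repeats the integer divisions done in Init() and Process().
inline TileSplit SplitForCore(uint32_t totalLength, uint32_t blockDim, uint32_t tileNum)
{
    assert(blockDim != 0 && tileNum != 0);
    TileSplit s{};
    s.blockLength = totalLength / blockDim;
    s.tileLength = s.blockLength / tileNum / kBufferNum;
    assert(s.tileLength != 0);  // the input must be large enough to fill one tile
    s.loopCount = s.blockLength / s.tileLength;
    return s;
}

// Element offset of tile `progress` on core `blockIdx`, counted from the start of the input.
inline uint32_t TileOffset(const TileSplit &s, uint32_t blockIdx, uint32_t progress)
{
    return s.blockLength * blockIdx + progress * s.tileLength;
}
```

With BLOCK_DIM = 8 and TILE_NUM = 8, an assumed input of 16384 elements gives each core 2048 elements, split into 16 tiles of 128 elements. Because every step is an integer division, any remainder is silently dropped, so an input length that is not a multiple of 8 * 8 * 2 = 128 would leave some tail elements unprocessed in this scheme.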

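5. Core code (withheld)

I have not released the core part of the code here. It took me more than a week and it was genuinely hard. The general direction is given; you will need to explore the rest yourself. Because of that, the Compute function below does not compile as shown: xLocal, tmpTensor1, tmpTensor2, and tmpTensor3 are used, but their declarations are among the withheld lines.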

#include "kernel_operator.h"
using namespace AscendC;
constexpr int32_t BUFFER_NUM = 2;
class KernelSigmoid {
public:
    __aicore__ inline KernelSigmoid() {}
    __aicore__ inline void Init(GM_ADDR x, GM_ADDR y, uint32_t totalLength, uint32_t tileNum)
    {
        // Candidate fills in the initialization code
        ASSERT(GetBlockNum() != 0 && "block dim can not be zero!");
        this->blockLength = totalLength / GetBlockNum();
        this->tileNum = tileNum;
        ASSERT(tileNum != 0 && "tile num can not be zero!");
        this->tileLength = this->blockLength / tileNum / BUFFER_NUM;
        xGm.SetGlobalBuffer((__gm__ DTYPE_X *)x + this->blockLength * GetBlockIdx(),
                            this->blockLength);
        yGm.SetGlobalBuffer((__gm__ DTYPE_Y *)y + this->blockLength * GetBlockIdx(),
                            this->blockLength);
        pipe.InitBuffer(inQueueX, BUFFER_NUM, this->tileLength * sizeof(half));
        pipe.InitBuffer(outQueueY, BUFFER_NUM, this->tileLength * sizeof(half));
        pipe.InitBuffer(tmpBuffer1, this->tileLength * sizeof(half));
        pipe.InitBuffer(tmpBuffer2, this->tileLength * sizeof(half));
        pipe.InitBuffer(tmpBuffer3, this->tileLength * sizeof(half));
        pipe.InitBuffer(tmpBuffer4, this->tileLength * sizeof(half));
    }
    __aicore__ inline void Process()
    {
        // Define "loopCount"; take the tiling into account
        int32_t loopCount = this->blockLength / this->tileLength;
        for (int32_t i = 0; i < loopCount; i++) {
            CopyIn(i);
            Compute(i);
            CopyOut(i);
        }
    }

private:
    __aicore__ inline void CopyIn(int32_t progress)
    {
        // Candidate fills in the operator code
        LocalTensor<DTYPE_X> xLocal = inQueueX.AllocTensor<DTYPE_X>();
        DataCopy(xLocal, xGm[progress * this->tileLength ], this->tileLength);
        inQueueX.EnQue(xLocal);
    }
    __aicore__ inline void Compute(int32_t progress)
    {
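        // Candidate fills in the compute code. This is the core part and is withheld.
        // Dequeue the current tile's data from the input queue

        // Allocate space from the output queue for the result

        // Get temporary buffers for the intermediate results

        // Get a temporary buffer for the all-ones vector

        // Initialize the all-ones vector

        // Constants used in the sigmoid computation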
        DTYPE_X inputVal1 = -1.0;
        DTYPE_X inputVal2 = 1.0;
        // Step 1: x = -x
        Muls(tmpTensor1, xLocal, inputVal1, this->tileLength);
        // Step 2: exp(-x)
        Exp(tmpTensor2, tmpTensor1, this->tileLength);
        // Step 3: 1 + exp(-x)
        Adds(tmpTensor3, tmpTensor2, inputVal2, this->tileLength);
        // Step 4: 1 / (1 + exp(-x)), the final sigmoid result

        // Enqueue the result into the output queue

        // Free the space held by the input data
        inQueueX.FreeTensor(xLocal);
    }
    __aicore__ inline void CopyOut(int32_t progress)
    {
        // Candidate fills in the operator code
        AscendC::LocalTensor<half> yLocal = outQueueY.DeQue<half>();
        DataCopy(yGm[progress * this->tileLength], yLocal, this->tileLength);
        outQueueY.FreeTensor(yLocal);
    }

private:
    TPipe pipe;
    //create queue for input, in this case depth is equal to buffer num
    TQue<QuePosition::VECIN, BUFFER_NUM> inQueueX;
    //create queue for output, in this case depth is equal to buffer num
    TQue<QuePosition::VECOUT, BUFFER_NUM> outQueueY;
    GlobalTensor<half> xGm;
    GlobalTensor<half> yGm;

    // Candidate adds custom member variables
    TBuf<QuePosition::VECCALC> tmpBuffer1, tmpBuffer2, tmpBuffer3, tmpBuffer4;
    uint32_t blockLength;
    uint32_t tileNum;
    uint32_t tileLength;
};
extern "C" __global__ __aicore__ void sigmoid_custom(GM_ADDR x, GM_ADDR y, GM_ADDR workspace, GM_ADDR tiling) {
    GET_TILING_DATA(tiling_data, tiling);
    KernelSigmoid op;
    // Add the Init and Process calls
    op.Init(x, y, tiling_data.totalLength, tiling_data.tileNum);
    op.Process();
}
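
To check a kernel's output against known-good values, a host-side reference can help. The sketch below is ordinary C++ with plain loops, one loop per step named in the Compute comments: negate, exponentiate, add one, then divide an all-ones vector by the result. It is not Ascend C and it is not the withheld code. SigmoidReference is a hypothetical name, and float is used here for simplicity where the kernel works on half.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference only: scalar loops standing in for the four vector steps.
inline std::vector<float> SigmoidReference(const std::vector<float> &x)
{
    const std::size_t n = x.size();
    std::vector<float> neg(n), expNeg(n), denom(n), y(n);
    std::vector<float> ones(n, 1.0f);  // the all-ones vector mentioned in the comments

    for (std::size_t i = 0; i < n; ++i) {
        neg[i] = x[i] * -1.0f;          // step 1: -x
    }
    for (std::size_t i = 0; i < n; ++i) {
        expNeg[i] = std::exp(neg[i]);   // step 2: exp(-x)
    }
    for (std::size_t i = 0; i < n; ++i) {
        denom[i] = expNeg[i] + 1.0f;    // step 3: 1 + exp(-x)
    }
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = ones[i] / denom[i];      // step 4: 1 / (1 + exp(-x))
    }
    return y;
}
```

Two quick sanity checks for any sigmoid implementation: the value at 0 is 0.5, and sigmoid(x) + sigmoid(-x) equals 1. When comparing against a half-precision kernel, expect to need a looser tolerance than for float.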

6. Grant permissions

Grant execute permission to everything in advance. Run this under SigmoidCustom/SigmoidCustom/:

chmod +x -R *

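Summary

The six tips above will definitely come in handy once you have watched all the learning videos. Once you understand the code logic, you can write most of it yourself. I hope these six points are of some value. If you still can't get it working, contact me.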
