香橙派AI Pro算子开发(三)kernel直调Add算子的Tiling算法
一、代码准备
这里的代码是从官方仓库拷贝而来,可以参考卡面篇文章香橙派AI Pro算子开发(一)
git clone https://gitee.com/ascend/samples
cd samples/operator/ascendc/tutorials/AddCustomSample
二、代码执行
进入目录后简介的结构为如下,这里有在框架里调用add算子,也有利用kernel直接调用的方式,这里先去解读Kernel调用方式
AddCustomSample
|--FrameworkLaunch
|--KernelLaunch
|--README.md
进入指定目录后执行run.sh脚本
cd KernelLaunch/AddKernelInvocationTilingNeo
bash run.sh -r npu -v Ascend310B4
编译输出信息如下:
(base) HwHiAiUser@orangepiaipro:~/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo$ bash run.sh -r npu -v Ascend310B4
Current compile soc version is Ascend310B4
[INFO]: /usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/custom/ is empty
[INFO]: /usr/local/Ascend/ascend-toolkit/latest/opp/op_proto/custom/ is empty
[INFO]: / is empty
[INFO]: / is empty
[INFO]: /usr/local/Ascend/ascend-toolkit/latest/opp/framework/custom/ is empty
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build
[ 2%] Creating directories for 'ascendc_kernels_npu_precompile'
[ 5%] No download step for 'ascendc_kernels_npu_precompile'
[ 8%] No update step for 'ascendc_kernels_npu_precompile'
[ 11%] No patch step for 'ascendc_kernels_npu_precompile'
[ 13%] Performing configure step for 'ascendc_kernels_npu_precompile'
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
DYNAMIC_MODE
INCLUDE_DIR
-- Build files have been written to: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_precompile-prefix/src/ascendc_kernels_npu_precompile-build
[ 16%] Performing build step for 'ascendc_kernels_npu_precompile'
[100%] Building CXX object CMakeFiles/precompile_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o
[100%] Built target precompile_obj
[100%] Built target check_src_template
[ 19%] No install step for 'ascendc_kernels_npu_precompile'
[ 22%] Completed 'ascendc_kernels_npu_precompile'
[ 22%] Built target ascendc_kernels_npu_precompile
[ 25%] Creating directories for 'ascendc_kernels_npu_preprocess'
[ 27%] No download step for 'ascendc_kernels_npu_preprocess'
[ 30%] No update step for 'ascendc_kernels_npu_preprocess'
[ 33%] No patch step for 'ascendc_kernels_npu_preprocess'
[ 36%] Performing configure step for 'ascendc_kernels_npu_preprocess'
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_preprocess-prefix/src/ascendc_kernels_npu_preprocess-build
[ 38%] Performing build step for 'ascendc_kernels_npu_preprocess'
[100%] Building CXX object CMakeFiles/preprocess_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o
[100%] Building CXX object CMakeFiles/m200_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o
[100%] Built target preprocess_obj
[100%] Built target m200_obj
/usr/local/Ascend/ascend-toolkit/latest/tools/ccec_compiler/bin/ld.lld -m aicorelinux -Ttext=0 /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_preprocess-prefix/src/ascendc_kernels_npu_preprocess-build/CMakeFiles/m200_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o -o /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_preprocess-prefix/src/ascendc_kernels_npu_preprocess-build/CMakeFiles/m200_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o[100%] Built target merge_m200_obj_text
[100%] Built target _host_cpp
[ 41%] No install step for 'ascendc_kernels_npu_preprocess'
[ 44%] Completed 'ascendc_kernels_npu_preprocess'
[ 44%] Built target ascendc_kernels_npu_preprocess
[ 50%] Creating directories for 'ascendc_kernels_npu_device'
[ 50%] Creating directories for 'ascendc_kernels_npu_host'
[ 55%] No download step for 'ascendc_kernels_npu_device'
[ 55%] No download step for 'ascendc_kernels_npu_host'
[ 61%] No update step for 'ascendc_kernels_npu_device'
[ 61%] No update step for 'ascendc_kernels_npu_host'
[ 66%] No patch step for 'ascendc_kernels_npu_device'
[ 66%] No patch step for 'ascendc_kernels_npu_host'
[ 72%] Performing configure step for 'ascendc_kernels_npu_host'
[ 72%] Performing configure step for 'ascendc_kernels_npu_device'
-- The C compiler identification is GNU 11.4.0
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Generating done
-- Configuring done
-- Build files have been written to: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_device-prefix/src/ascendc_kernels_npu_device-build
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
ASCEND_PYTHON_EXECUTABLE
-- Build files have been written to: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_host-prefix/src/ascendc_kernels_npu_host-build
[ 75%] Performing build step for 'ascendc_kernels_npu_device'
[ 77%] Performing build step for 'ascendc_kernels_npu_host'
[100%] Building CXX object CMakeFiles/device_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/auto_gen/ascendc_kernels_npu/auto_gen_add_custom.cpp.o
[100%] Building CXX object CMakeFiles/host_bisheng_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o
[100%] Built target host_bisheng_obj
[ 80%] Performing install step for 'ascendc_kernels_npu_host'
Consolidate compiler generated dependencies of target host_bisheng_obj
[100%] Building CXX object CMakeFiles/host_bisheng_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o
[100%] Built target host_bisheng_obj
Install the project...
-- Install configuration: "Debug"
-- Installing: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_host_dir/./objects-Debug/host_bisheng_obj/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o
[ 83%] Completed 'ascendc_kernels_npu_host'
[ 83%] Built target ascendc_kernels_npu_host
[100%] Built target device_obj
/usr/local/Ascend/ascend-toolkit/latest/tools/ccec_compiler/bin/ld.lld -m aicorelinux -Ttext=0 /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_device-prefix/src/ascendc_kernels_npu_device-build/CMakeFiles/device_obj.dir/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/auto_gen/ascendc_kernels_npu/auto_gen_add_custom.cpp.o -static -o /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_merge_obj_dir/device.o
[100%] Built target merge_device_obj
[ 86%] No install step for 'ascendc_kernels_npu_device'
[ 88%] Completed 'ascendc_kernels_npu_device'
[ 88%] Built target ascendc_kernels_npu_device
[ 88%] Built target ascendc_kernels_npu_merge_obj
[ 91%] Building CXX object CMakeFiles/ascendc_kernels_npu_host_stub_obj.dir/auto_gen/ascendc_kernels_npu/host_stub.cpp.o
[ 91%] Built target ascendc_kernels_npu_host_stub_obj
[ 94%] Linking CXX shared library lib/libascendc_kernels_npu.so
/usr/local/Ascend/ascend-toolkit/latest/bin/ascendc_pack_kernel /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/CMakeFiles/ascendc_kernels_npu_host_stub_obj.dir/auto_gen/ascendc_kernels_npu/host_stub.cpp.o /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_merge_obj_dir/device.o 0 /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/CMakeFiles/ascendc_kernels_npu_host_stub_obj.dir/auto_gen/ascendc_kernels_npu/host_stub.cpp.o
recompile: /usr/bin/c++ -fPIC -g -Wl,-z,relro -Wl,-z,now -Wl,-z,noexecstack -shared -Wl,-soname,libascendc_kernels_npu.so -o lib/libascendc_kernels_npu.so CMakeFiles/ascendc_kernels_npu_host_stub_obj.dir/auto_gen/ascendc_kernels_npu/host_stub.cpp.o /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/build/ascendc_kernels_npu_host_dir/objects-Debug/host_bisheng_obj/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/add_custom.cpp.o -L/usr/local/Ascend/ascend-toolkit/latest/lib64 -L/usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascend310B4/lib /usr/local/Ascend/ascend-toolkit/latest/lib64/libascendc_runtime.a -lascend_dump -lc_sec
[ 94%] Built target ascendc_kernels_npu
[ 97%] Building CXX object CMakeFiles/ascendc_kernels_bbit.dir/main.cpp.o
[100%] Linking CXX executable ascendc_kernels_bbit
[100%] Built target ascendc_kernels_bbit
-- Install configuration: "Debug"
-- Installing: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/out/lib/libascendc_kernels_npu.so
-- Installing: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/out/include
-- Installing: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/out/include/ascendc_kernels_npu
-- Installing: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/out/include/ascendc_kernels_npu/aclrtlaunch_add_custom.h
-- Installing: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/out/include/ascendc_kernels_npu/aclrtlaunch_triple_chevrons_func.h
-- Installing: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/out/bin/ascendc_kernels_bbit
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.281.846 [task_fail_callback_manager.cc:52] 249713 TaskFailCallBackManager: Constructor.
[EVENT] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.513.720 [msprof_callback_impl.cpp:336] >>> (tid:249713) Started to register profiling ctrl callback.
[EVENT] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.514.173 [msprof_callback_impl.cpp:343] >>> (tid:249713) Started to register profiling hash id callback.
[INFO] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.514.331 [prof_atls_plugin.cpp:117] (tid:249713) RegisterProfileCallback, callback type is 7
[EVENT] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.514.456 [msprof_callback_impl.cpp:350] >>> (tid:249713) Started to register profiling enable host freq callback.
[INFO] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.514.569 [prof_atls_plugin.cpp:117] (tid:249713) RegisterProfileCallback, callback type is 8
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.524.154 [runtime.cc:5471] 249713 GetVisibleDevices: ASCEND_RT_VISIBLE_DEVICES param was not set
[INFO] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.527.436 [prof_atls_plugin.cpp:210] (tid:249713) Module[7] register callback of ctrl handle.
[INFO] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.528.041 [prof_atls_plugin.cpp:210] (tid:249713) Module[69] register callback of ctrl handle.
[INFO] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.559.400 [prof_atls_plugin.cpp:210] (tid:249713) Module[48] register callback of ctrl handle.
[INFO] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:08.559.740 [prof_atls_plugin.cpp:210] (tid:249713) Module[45] register callback of ctrl handle.
[INFO] GE(249713,ascendc_kernels_bbit):2025-02-08-15:03:09.437.930 [op_tiling_manager.cc:109]249713 ~FuncPerfScope:[GEPERFTRACE] The time cost of OpTilingManager::LoadSo is [878009] micro second.
[INFO] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:09.463.351 [prof_atls_plugin.cpp:210] (tid:249713) Module[6] register callback of ctrl handle.
[EVENT] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.688.198 [msprof_callback_impl.cpp:89] >>> (tid:249713) MsprofCtrlCallback called, type: 255
[EVENT] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.688.994 [ai_drv_dev_api.cpp:333] >>> (tid:249713) Succeeded to DrvGetApiVersion version: 0x72313
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.689.470 [client_manager.cpp:462][GetClientRunMode][tid:249713] runningMode:0
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.689.620 [client_manager.cpp:126][GetInstance][tid:249713] [ClientManager] Current mode:2
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.689.749 [thread_mode_manager.cpp:70][Open][tid:249713] [ThreadModeManager] enter into open process deviceId[0] rankSize[0]
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.690.900 [thread_mode_manager.cpp:280][HandleAICPUPackage][tid:249713] begin load aicpu package dstPath[/home/HwHiAiUser/], srcpath[/usr/local/Ascend/ascend-toolkit/latest/opp/Ascend/aicpu/] file[Ascend-aicpu_syskernels.tar.gz]
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.691.060 [package_worker.cpp:338][LoadAICPUPackageForThreadMode][tid:249713] Package checkcode is [57460226]
[WARNING] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.691.197 [package_worker.cpp:342][LoadAICPUPackageForThreadMode][tid:249713] Open aicpu_package_install.info verifyFile[/home/HwHiAiUser/aicpu_package_install.info], strerror[File exists]
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.691.675 [thread_mode_manager.cpp:280][HandleAICPUPackage][tid:249713] begin load aicpu package dstPath[/home/HwHiAiUser/], srcpath[/usr/local/Ascend/ascend-toolkit/latest/opp/Ascend/aicpu/] file[Ascend-aicpu_extend_syskernels.tar.gz]
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.691.837 [package_worker.cpp:338][LoadAICPUPackageForThreadMode][tid:249713] Package checkcode is [8052802]
[WARNING] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.691.967 [package_worker.cpp:342][LoadAICPUPackageForThreadMode][tid:249713] Open aicpu_package_install.info verifyFile[/home/HwHiAiUser/extend_aicpu_package_install.info], strerror[File exists]
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.824.070 [thread_mode_manager.cpp:159][SetAICPUProfilingCallback][tid:249713] [ThreadModeManager] profiling callback is nullptr, skip set aicpu profiling callback
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.824.500 [aicpusd_interface_process.cpp:525][TryGetLogLevelFromParentProcess][tid:249713] get ASCEND_GLOBAL_LOG_LEVEL [] and ASCEND_GLOBAL_EVENT_ENABLE []
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.824.771 [aicpusd_interface_process.cpp:467][GetCurrentRunMode][tid:249713] Current aicpu mode is offline (call by api).
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.824.844 [aicpusd_drv_manager.cpp:327][MarkMdc][tid:249713] Get hardware version[7] success.
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.824.892 [aicpusd_drv_manager.cpp:190][GetNormalAicpuInfo][tid:249713] aicpuBitMap[8], aicpuNum[1].
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.824.932 [aicpusd_drv_manager.cpp:224][GetCcpuInfo][tid:249713] ccpuBitMap[7].
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.824.990 [aicpusd_drv_manager.cpp:110][GetNormalAicpuDCpuInfo][tid:249713] GetNormalAicpuDCpuInfo, deviceId[0], aicpu_num[1], aicpu_os_sched[281470681743361], ccpu_num[255082402676739], ccpu_os_sched[255082402676737], dcpu_num[255082402676736], dcpu_os_sched[281470681743361], tscpu_num[187647121162240], tscpu_os_sched[0].
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.825.025 [aicpusd_drv_manager.cpp:306][InitDrvMgrCaluniqueVfId][tid:249713] InitDrvMgr uniqueVfId=0, deviceId=0
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.825.055 [aicpusd_drv_manager.cpp:377][InitDrvMgr][tid:249713] host pid[249713], host proc name[], vf id[0], first aicpu index[0], aicpu num[1], dcpu base index[3], dcpu num[0].
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.825.271 [aicpusd_resource_manager.cpp:257][InitBufManager][tid:249713] Aicpu schedule SetBuffCfg successed!
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.825.762 [aicpusd_worker.cpp:74][ThreadPool][tid:249713] ThreadPool
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.826.015 [aicpusd_worker.cpp:274][AddPidToTask][tid:249769] Bind pid by hal.[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.826.084 [aicpusd_worker.cpp:276][AddPidToTask][tid:249769] AddPidToTask by halBindCgroup
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.844.404 [aicpusd_worker.cpp:323][SetAffinityBySelf][tid:249769] [hw]SetAffinityBySelf, physIndex[3], devNum[0]
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.844.488 [aicpusd_worker.cpp:389][SetAffinity][tid:249769] aicpu bind tid by self, index[0], deviceId[0], res[0].
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.844.754 [aicpusd_cust_so_manager.cpp:77][InitAicpuCustSoManager][tid:249713] cust so dir name is /home/HwHiAiUser/cust_aicpu_0_0_249713/.
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.850.695 [raw_device.cc:237] 249713 Init: isAddrFlat:0
[EVENT] DRV(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.851.152 [grp_mng.c:81][bufmng] [halGrpCreate 81] Create grp. (grp_name=private_buff_grp_249713; grp_id=3; max_size=0; cache_flag=0)
[EVENT] DRV(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.851.216 [grp_mng.c:144][bufmng] [halGrpAddProc 144] add grp succ, name:private_buff_grp_249713, pid:249713, grp_id:3, admin:1 alloc:1 read:1 write:1
[EVENT] DRV(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.851.397 [buff_recycle.c:758][bufmng] [procMngInit 758] poolId 3 add task node uid 4 pid 249713
[EVENT] DRV(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.851.439 [drv_buff_mbuf.c:65][bufmng] [mbufSetPrivFlag 65] Set mbuf priv flag sucess. (flag=0, g_mbuf_priv_flag=0)
[EVENT] DRV(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.851.469 [grp_mng.c:212][bufmng] [halGrpAttach 212] grp attach, grp_name:private_buff_grp_249713, grp_id:3, timeout:1000
[EVENT] DRV(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.851.503 [drv_buff_memzone.c:450][bufmng] [memzone_cfg 450] BuffCfg success. (cfg num=4; huge_prior=393216; normal=393216; huge_only=393216; dvpp_huge_prior=393216; dvpp_normal=393216; dvpp_huge_only=393216)
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.854.603 [tsd_client.cpp:172][TsdCapabilityGet][tid:249713] TsdCapabilityGet Begin.
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.854.817 [engine.cc:76] 249713 Engine: Constructor.
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.854.952 [stars_engine.cc:41] 249713 StarsEngine: Constructor.
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.857.437 [npu_driver.cc:5784] 249772 GetDeviceStatus: GetDeviceStatus status=1.
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.860.889 [device_error_proc.cc:446] 249713 GetTschCapability: Tsch not support capability feature, use old solution.
[INFO] TDT(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.861.141 [client_manager.cpp:195][SetProfilingCallback][tid:249713] [TsdClient] set profiling callback success
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.897.916 [aicpusd_cust_so_manager.cpp:401][DeleteCustSoDir][tid:249713] Access cust so dir /home/HwHiAiUser/cust_aicpu_0_0_249713/ failed, error is No such file or directory.
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.898.002 [dump_task.cpp:1949][ClearResource][tid:249713] clear all resource of data dump
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.898.043 [aicpusd_interface.cpp:229][StopAICPUScheduler][tid:249713] Success to stop aicpu scheduler, deviceId[0], hostPid[249713].
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.898.723 [stars_engine.cc:48] 249713 ~StarsEngine: Destructor.
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.898.935 [engine.cc:82] 249713 ~Engine: Destructor.
[EVENT] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.899.453 [msprof_callback_impl.cpp:89] >>> (tid:249713) MsprofCtrlCallback called, type: 3
[EVENT] PROFILING(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.899.627 [ai_drv_dev_api.cpp:333] >>> (tid:249713) Succeeded to DrvGetApiVersion version: 0x72313
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:10.901.189 [aicpusd_worker.cpp:168][WaitForStop][tid:249713] WaitForStop begin.
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:13.996.432 [aicpusd_worker.cpp:174][WaitForStop][tid:249713] WaitForStop end.
[EVENT] CCECPU(249713,ascendc_kernels_bbit):2025-02-08-15:03:13.997.779 [aicpusd_mpi_mgr.cpp:48][PrintStatisticInfo][tid:249713] Mpi Dvpp event statistic: [0]
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:14.452.367 [task_fail_callback_manager.cc:57] 249713 ~TaskFailCallBackManager: Destructor.
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:14.460.369 [runtime.cc:2033] 249713 ~Runtime: deconstruct runtime
[INFO] RUNTIME(249713,ascendc_kernels_bbit):2025-02-08-15:03:14.461.268 [runtime.cc:2040] 249713 ~Runtime: wait monitor success, use=0.
56f5ce95a320d042f4678bff6c2d84cc output/golden.bin
56f5ce95a320d042f4678bff6c2d84cc output/output_z.bin
error ratio: 0.0000, tolrence: 0.0010
test pass
三、代码解读
进入文件的结构如下:
│-- CMakeLists.txt # CMake 配置文件(用于编译)
│-- README.md # 说明文档
│-- add_custom.cpp # 主要 C++ 代码,可能是 Kernel 计算相关的实现
│-- add_custom_tiling.h # 头文件,用于自定义 Tiling 操作
│-- data_utils.h # 数据工具类(可能用于 I/O 处理)
│-- main.cpp # 入口文件,主程序
│-- run.sh # 运行脚本,可能用于编译和执行
│-- cmake/ # CMake 相关文件夹
│-- scripts/ # 可能包含 Shell/Python 脚本(用于辅助运行)
这里核心的代码就是add_custom.cpp、main.cpp和add_custom_tiling.h文件。
3.1 Tiling技术解读
关于tiling可以查看文章:GPU上的Tiling技术
也可以查看官方文档:Ascend C算子优化实用技巧04——Tiling优化
什么是Tiling策略?
大多数情况下,AI Core内部的Unified Buffer,无法完整的容纳算子的输入与输出,需要每次搬运一部分输入进行计算然后搬出,再搬运下一部分输入进行计算,直到得到完整的最终结果,这个数据切分、分块计算的过程称之为Tiling,切分数据的算法称为Tiling算法或者Tiling策略。
总之,Tiling是一种为了降低数据交换带来的延迟所研发的策略,其具体内容这里不再过多叙述,这里强烈建议看完前两篇博客。
3.1 add_custom.cpp
关于add_custom.cpp的基础解读,可以看香橙派AI Pro算子开发(二)kernel直调Add算子
/**
* @file add_custom.cpp
*
* Copyright (C) 2024. Huawei Technologies Co., Ltd. All rights reserved.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/
#include "add_custom_tiling.h"
#include "kernel_operator.h"
constexpr int32_t BUFFER_NUM = 2; // tensor num for each queue
class KernelAdd {
public:
__aicore__ inline KernelAdd() {}
__aicore__ inline void Init(GM_ADDR x, GM_ADDR y, GM_ADDR z, uint32_t totalLength, uint32_t tileNum)
{
//AscendC::GetBlockNum,用于配置当前任务的核心数量
this->blockLength = totalLength / AscendC::GetBlockNum();
//获取tileNum数量
this->tileNum = tileNum;
this->tileLength = this->blockLength / tileNum / BUFFER_NUM;
xGm.SetGlobalBuffer((__gm__ half *)x + this->blockLength * AscendC::GetBlockIdx(), this->blockLength);
yGm.SetGlobalBuffer((__gm__ half *)y + this->blockLength * AscendC::GetBlockIdx(), this->blockLength);
zGm.SetGlobalBuffer((__gm__ half *)z + this->blockLength * AscendC::GetBlockIdx(), this->blockLength);
pipe.InitBuffer(inQueueX, BUFFER_NUM, this->tileLength * sizeof(half));
pipe.InitBuffer(inQueueY, BUFFER_NUM, this->tileLength * sizeof(half));
pipe.InitBuffer(outQueueZ, BUFFER_NUM, this->tileLength * sizeof(half));
}
__aicore__ inline void Process()
{
int32_t loopCount = this->tileNum * BUFFER_NUM;
for (int32_t i = 0; i < loopCount; i++) {
CopyIn(i);
Compute(i);
CopyOut(i);
}
}
private:
__aicore__ inline void CopyIn(int32_t progress)
{
AscendC::LocalTensor<half> xLocal = inQueueX.AllocTensor<half>();
AscendC::LocalTensor<half> yLocal = inQueueY.AllocTensor<half>();
AscendC::DataCopy(xLocal, xGm[progress * this->tileLength], this->tileLength);
AscendC::DataCopy(yLocal, yGm[progress * this->tileLength], this->tileLength);
inQueueX.EnQue(xLocal);
inQueueY.EnQue(yLocal);
}
__aicore__ inline void Compute(int32_t progress)
{
AscendC::LocalTensor<half> xLocal = inQueueX.DeQue<half>();
AscendC::LocalTensor<half> yLocal = inQueueY.DeQue<half>();
AscendC::LocalTensor<half> zLocal = outQueueZ.AllocTensor<half>();
AscendC::Add(zLocal, xLocal, yLocal, this->tileLength);
outQueueZ.EnQue<half>(zLocal);
inQueueX.FreeTensor(xLocal);
inQueueY.FreeTensor(yLocal);
}
__aicore__ inline void CopyOut(int32_t progress)
{
AscendC::LocalTensor<half> zLocal = outQueueZ.DeQue<half>();
AscendC::DataCopy(zGm[progress * this->tileLength], zLocal, this->tileLength);
outQueueZ.FreeTensor(zLocal);
}
private:
AscendC::TPipe pipe;
AscendC::TQue<AscendC::QuePosition::VECIN, BUFFER_NUM> inQueueX, inQueueY;
AscendC::TQue<AscendC::QuePosition::VECOUT, BUFFER_NUM> outQueueZ;
AscendC::GlobalTensor<half> xGm;
AscendC::GlobalTensor<half> yGm;
AscendC::GlobalTensor<half> zGm;
uint32_t blockLength;
uint32_t tileNum;
uint32_t tileLength;
};
extern "C" __global__ __aicore__ void add_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, AddCustomTilingData tiling)
{
KernelAdd op;
op.Init(x, y, z, tiling.totalLength, tiling.tileNum);
op.Process();
}
3.2 main.cpp
/**
* @file main.cpp
*
* Copyright (C) 2024. Huawei Technologies Co., Ltd. All rights reserved.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/
#include "add_custom_tiling.h"
#include "data_utils.h"
#ifndef ASCENDC_CPU_DEBUG
#include "acl/acl.h"
#include "aclrtlaunch_add_custom.h"
#else
#include "tikicpulib.h"
extern "C" __global__ __aicore__ void add_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, AddCustomTilingData tiling);
#endif
int32_t main(int32_t argc, char *argv[])
{
uint32_t blockDim = 8;
//tilingSize,这是什么,为什么要设计tilingsize呢?
size_t tilingSize = 2 * sizeof(uint32_t);
//总输入数据大小
size_t inputByteSize = 8 * 2048 * sizeof(uint16_t);
//总输出数据大小
size_t outputByteSize = 8 * 2048 * sizeof(uint16_t);
//CPU Debug模式
#ifdef ASCENDC_CPU_DEBUG
// 分配tiling数据
uint8_t *tiling = (uint8_t *)AscendC::GmAlloc(tilingSize);
ReadFile("./input/input_tiling.bin", tilingSize, tiling, tilingSize);
uint8_t *x = (uint8_t *)AscendC::GmAlloc(inputByteSize);
uint8_t *y = (uint8_t *)AscendC::GmAlloc(inputByteSize);
uint8_t *z = (uint8_t *)AscendC::GmAlloc(outputByteSize);
ReadFile("./input/input_x.bin", inputByteSize, x, inputByteSize);
ReadFile("./input/input_y.bin", inputByteSize, y, inputByteSize);
AscendC::SetKernelMode(KernelMode::AIV_MODE);
ICPU_RUN_KF(add_custom, blockDim, x, y, z,
*reinterpret_cast<AddCustomTilingData *>(tiling)); // use this macro for cpu debug
WriteFile("./output/output_z.bin", z, outputByteSize);
AscendC::GmFree((void *)x);
AscendC::GmFree((void *)y);
AscendC::GmFree((void *)z);
AscendC::GmFree((void *)tiling);
//Ascend设备模式
#else
CHECK_ACL(aclInit(nullptr));
int32_t deviceId = 0;
CHECK_ACL(aclrtSetDevice(deviceId));
aclrtStream stream = nullptr;
CHECK_ACL(aclrtCreateStream(&stream));
AddCustomTilingData *tiling;
uint8_t *xHost, *yHost, *zHost;
uint8_t *xDevice, *yDevice, *zDevice;
//存储任务切分信息
CHECK_ACL(aclrtMallocHost((void **)(&tiling), tilingSize));
ReadFile("./input/input_tiling.bin", tilingSize, tiling, tilingSize);
CHECK_ACL(aclrtMallocHost((void **)(&xHost), inputByteSize));
CHECK_ACL(aclrtMallocHost((void **)(&yHost), inputByteSize));
CHECK_ACL(aclrtMallocHost((void **)(&zHost), outputByteSize));
CHECK_ACL(aclrtMalloc((void **)&xDevice, inputByteSize, ACL_MEM_MALLOC_HUGE_FIRST));
CHECK_ACL(aclrtMalloc((void **)&yDevice, inputByteSize, ACL_MEM_MALLOC_HUGE_FIRST));
CHECK_ACL(aclrtMalloc((void **)&zDevice, outputByteSize, ACL_MEM_MALLOC_HUGE_FIRST));
ReadFile("./input/input_x.bin", inputByteSize, xHost, inputByteSize);
ReadFile("./input/input_y.bin", inputByteSize, yHost, inputByteSize);
CHECK_ACL(aclrtMemcpy(xDevice, inputByteSize, xHost, inputByteSize, ACL_MEMCPY_HOST_TO_DEVICE));
CHECK_ACL(aclrtMemcpy(yDevice, inputByteSize, yHost, inputByteSize, ACL_MEMCPY_HOST_TO_DEVICE));
ACLRT_LAUNCH_KERNEL(add_custom)(blockDim, stream, xDevice, yDevice, zDevice, tiling);
CHECK_ACL(aclrtSynchronizeStream(stream));
CHECK_ACL(aclrtMemcpy(zHost, outputByteSize, zDevice, outputByteSize, ACL_MEMCPY_DEVICE_TO_HOST));
WriteFile("./output/output_z.bin", zHost, outputByteSize);
CHECK_ACL(aclrtFree(xDevice));
CHECK_ACL(aclrtFree(yDevice));
CHECK_ACL(aclrtFree(zDevice));
CHECK_ACL(aclrtFreeHost(xHost));
CHECK_ACL(aclrtFreeHost(yHost));
CHECK_ACL(aclrtFreeHost(zHost));
CHECK_ACL(aclrtFreeHost(tiling));
CHECK_ACL(aclrtDestroyStream(stream));
CHECK_ACL(aclrtResetDevice(deviceId));
CHECK_ACL(aclFinalize());
#endif
return 0;
}
3.3 add_custom_tiling.h
/**
* @file add_custom_tiling.h
*
* Copyright (C) 2024. Huawei Technologies Co., Ltd. All rights reserved.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/
#ifndef ADD_CUSTOM_TILING_H
#define ADD_CUSTOM_TILING_H
#include <cstdint>
struct AddCustomTilingData {
uint32_t totalLength;
uint32_t tileNum;
};
#endif
4. 性能分析
说到底,Tiling是一个用于算子执行加速的模块,那么让我们看看到底有没有加速,这里就需要用到Ascend算子性能分析工具了msProf op了,有需要视频讲解的可以查看Ascend官方讲解MindStudio算子开发调优系列工具分享&赋能。
也可以访问官网的相关文档 算子调优(msProf)
- 导入环境变量
export LD_LIBRARY_PATH=${install_path}/samples/operator/ascendc/0_introduction/3_add_kernellaunch/AddKernelInvocationNeo/out/lib/:$LD_LIBRARY_PATH
#install_path换成实际路径
export LD_LIBRARY_PATH=/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationNeo/out/lib/:$LD_LIBRARY_PATH
如果出现文件权限问题,如
$ msprof op ascendc_kernels_bbit -output ./msprof_out/
2025-02-08 17:19:34 [ERROR] The current file should not be allowed to have write permissions to any other users
2025-02-08 17:19:34 [ERROR] output parent dir permission wrong
use msprof op --help or msprof op simulator --help for more details
执行以下指令即可
chmod 700 .
kernel直调场景下如果使用
msprof op ascendc_kernels_bbit
会报错:
2025-02-08 18:47:12 [ERROR] Analyzing profiling data failed for this chip type is not supported.
2025-02-08 18:47:12 [ERROR] Profiling data parse failed. Please check
这里没有运行成果非模拟状态的,可能是版本不支持了,这里也没有去细究,后续采用模拟来验证,查看流水线情况。
ln -s /usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascend310B4/lib/libruntime_camodel.so /home/HwHiAiUser/MyAscend/samples/libruntime.so
export LD_LIBRARY_PATH=/home/HwHiAiUser/MyAscend/samples/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascend310B4/lib:/home/HwHiAiUser/MyAscend/samples:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationNeo/out/lib/:$LD_LIBRARY_PATH
msprof op simulator ./ascendc_kernels_bbit
得到以下输出结果
(base) HwHiAiUser@orangepiaipro:~/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationNeo$ msprof op simulator ./ascendc_kernels_bbit
2025-02-08 19:33:40 [INFO] Op profiling analysis start.
2025-02-08 19:33:40 [INFO] Running simulation task: Binary Simulation Running, use simulator in LD_LIBRARY_PATH
[INFO] RUNTIME(167805,msopprof):2025-02-08-19:33:40.217.707 [task_fail_callback_manager.cc:52] 167805 TaskFailCallBackManager: Constructor.[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:33:40.386.583 [task_fail_callback_manager.cc:52] 167807 TaskFailCallBackManager: Constructor.
[EVENT] PROFILING(167807,ascendc_kernels_bbit):2025-02-08-19:33:40.672.562 [msprof_callback_impl.cpp:336] >>> (tid:167807) Started to register profiling ctrl callback.
[EVENT] PROFILING(167807,ascendc_kernels_bbit):2025-02-08-19:33:40.673.150 [msprof_callback_impl.cpp:343] >>> (tid:167807) Started to register profiling hash id callback.
[INFO] PROFILING(167807,ascendc_kernels_bbit):2025-02-08-19:33:40.673.322 [prof_atls_plugin.cpp:117] (tid:167807) RegisterProfileCallback, callback type is 7
[EVENT] PROFILING(167807,ascendc_kernels_bbit):2025-02-08-19:33:40.673.474 [msprof_callback_impl.cpp:350] >>> (tid:167807) Started to register profiling enable host freq callback.
[INFO] PROFILING(167807,ascendc_kernels_bbit):2025-02-08-19:33:40.673.612 [prof_atls_plugin.cpp:117] (tid:167807) RegisterProfileCallback, callback type is 8
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:33:40.688.333 [runtime.cc:5471] 167807 GetVisibleDevices: ASCEND_RT_VISIBLE_DEVICES param was not set
[INFO] GE(167807,ascendc_kernels_bbit):2025-02-08-19:33:41.842.787 [op_tiling_manager.cc:109]167807 ~FuncPerfScope:[GEPERFTRACE] The time cost of OpTilingManager::LoadSo is [964785] micro second.
[INFO] Config file is found, path is /usr/local/Ascend/ascend-toolkit/8.0.0/aarch64-linux/simulator/Ascend310B1/lib/config_stars.json.
[TmSim]: Run in serial mode, HW concurrency is:4, actual thread num is: 1
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 0: chip_id 0 die_id 0, dev_core_id 0, subcore 0 of 2 (CUBE), spr COREID 0x0000 (0)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 0: chip_id 0 die_id 0, dev_core_id 0, subcore 1 of 2 (VEC), spr COREID 0x0018 (24)
[INFO] AicWrapper attach AIC 0, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 1: chip_id 0 die_id 0, dev_core_id 1, subcore 0 of 2 (CUBE), spr COREID 0x0001 (1)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 1: chip_id 0 die_id 0, dev_core_id 1, subcore 1 of 2 (VEC), spr COREID 0x0019 (25)
[INFO] AicWrapper attach AIC 1, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 2: chip_id 0 die_id 0, dev_core_id 2, subcore 0 of 2 (CUBE), spr COREID 0x0002 (2)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 2: chip_id 0 die_id 0, dev_core_id 2, subcore 1 of 2 (VEC), spr COREID 0x001A (26)
[INFO] AicWrapper attach AIC 2, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 3: chip_id 0 die_id 0, dev_core_id 3, subcore 0 of 2 (CUBE), spr COREID 0x0003 (3)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 3: chip_id 0 die_id 0, dev_core_id 3, subcore 1 of 2 (VEC), spr COREID 0x001B (27)
[INFO] AicWrapper attach AIC 3, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 4: chip_id 0 die_id 0, dev_core_id 4, subcore 0 of 2 (CUBE), spr COREID 0x0004 (4)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 4: chip_id 0 die_id 0, dev_core_id 4, subcore 1 of 2 (VEC), spr COREID 0x001C (28)
[INFO] AicWrapper attach AIC 4, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 5: chip_id 0 die_id 0, dev_core_id 5, subcore 0 of 2 (CUBE), spr COREID 0x0005 (5)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 5: chip_id 0 die_id 0, dev_core_id 5, subcore 1 of 2 (VEC), spr COREID 0x001D (29)
[INFO] AicWrapper attach AIC 5, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 6: chip_id 0 die_id 0, dev_core_id 6, subcore 0 of 2 (CUBE), spr COREID 0x0006 (6)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 6: chip_id 0 die_id 0, dev_core_id 6, subcore 1 of 2 (VEC), spr COREID 0x001E (30)
[INFO] AicWrapper attach AIC 6, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 7: chip_id 0 die_id 0, dev_core_id 7, subcore 0 of 2 (CUBE), spr COREID 0x0007 (7)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 7: chip_id 0 die_id 0, dev_core_id 7, subcore 1 of 2 (VEC), spr COREID 0x001F (31)
[INFO] AicWrapper attach AIC 7, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 8: chip_id 0 die_id 0, dev_core_id 8, subcore 0 of 2 (CUBE), spr COREID 0x0008 (8)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 8: chip_id 0 die_id 0, dev_core_id 8, subcore 1 of 2 (VEC), spr COREID 0x0020 (32)
[INFO] AicWrapper attach AIC 8, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 9: chip_id 0 die_id 0, dev_core_id 9, subcore 0 of 2 (CUBE), spr COREID 0x0009 (9)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 9: chip_id 0 die_id 0, dev_core_id 9, subcore 1 of 2 (VEC), spr COREID 0x0021 (33)
[INFO] AicWrapper attach AIC 9, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 10: chip_id 0 die_id 0, dev_core_id 10, subcore 0 of 2 (CUBE), spr COREID 0x000A (10)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 10: chip_id 0 die_id 0, dev_core_id 10, subcore 1 of 2 (VEC), spr COREID 0x0022 (34)
[INFO] AicWrapper attach AIC 10, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 11: chip_id 0 die_id 0, dev_core_id 11, subcore 0 of 2 (CUBE), spr COREID 0x000B (11)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 11: chip_id 0 die_id 0, dev_core_id 11, subcore 1 of 2 (VEC), spr COREID 0x0023 (35)
[INFO] AicWrapper attach AIC 11, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 12: chip_id 0 die_id 0, dev_core_id 12, subcore 0 of 2 (CUBE), spr COREID 0x000C (12)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 12: chip_id 0 die_id 0, dev_core_id 12, subcore 1 of 2 (VEC), spr COREID 0x0024 (36)
[INFO] AicWrapper attach AIC 12, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 13: chip_id 0 die_id 0, dev_core_id 13, subcore 0 of 2 (CUBE), spr COREID 0x000D (13)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 13: chip_id 0 die_id 0, dev_core_id 13, subcore 1 of 2 (VEC), spr COREID 0x0025 (37)
[INFO] AicWrapper attach AIC 13, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 14: chip_id 0 die_id 0, dev_core_id 14, subcore 0 of 2 (CUBE), spr COREID 0x000E (14)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 14: chip_id 0 die_id 0, dev_core_id 14, subcore 1 of 2 (VEC), spr COREID 0x0026 (38)
[INFO] AicWrapper attach AIC 14, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 15: chip_id 0 die_id 0, dev_core_id 15, subcore 0 of 2 (CUBE), spr COREID 0x000F (15)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 15: chip_id 0 die_id 0, dev_core_id 15, subcore 1 of 2 (VEC), spr COREID 0x0027 (39)
[INFO] AicWrapper attach AIC 15, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 16: chip_id 0 die_id 0, dev_core_id 16, subcore 0 of 2 (CUBE), spr COREID 0x0010 (16)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 16: chip_id 0 die_id 0, dev_core_id 16, subcore 1 of 2 (VEC), spr COREID 0x0028 (40)
[INFO] AicWrapper attach AIC 16, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 17: chip_id 0 die_id 0, dev_core_id 17, subcore 0 of 2 (CUBE), spr COREID 0x0011 (17)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 17: chip_id 0 die_id 0, dev_core_id 17, subcore 1 of 2 (VEC), spr COREID 0x0029 (41)
[INFO] AicWrapper attach AIC 17, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 18: chip_id 0 die_id 0, dev_core_id 18, subcore 0 of 2 (CUBE), spr COREID 0x0012 (18)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 18: chip_id 0 die_id 0, dev_core_id 18, subcore 1 of 2 (VEC), spr COREID 0x002A (42)
[INFO] AicWrapper attach AIC 18, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 19: chip_id 0 die_id 0, dev_core_id 19, subcore 0 of 2 (CUBE), spr COREID 0x0013 (19)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 19: chip_id 0 die_id 0, dev_core_id 19, subcore 1 of 2 (VEC), spr COREID 0x002B (43)
[INFO] AicWrapper attach AIC 19, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 20: chip_id 0 die_id 0, dev_core_id 20, subcore 0 of 2 (CUBE), spr COREID 0x0014 (20)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 20: chip_id 0 die_id 0, dev_core_id 20, subcore 1 of 2 (VEC), spr COREID 0x002C (44)
[INFO] AicWrapper attach AIC 20, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 21: chip_id 0 die_id 0, dev_core_id 21, subcore 0 of 2 (CUBE), spr COREID 0x0015 (21)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 21: chip_id 0 die_id 0, dev_core_id 21, subcore 1 of 2 (VEC), spr COREID 0x002D (45)
[INFO] AicWrapper attach AIC 21, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 22: chip_id 0 die_id 0, dev_core_id 22, subcore 0 of 2 (CUBE), spr COREID 0x0016 (22)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 22: chip_id 0 die_id 0, dev_core_id 22, subcore 1 of 2 (VEC), spr COREID 0x002E (46)
[INFO] AicWrapper attach AIC 22, num_vec_core=2, num_subcore=3
[WARNING] pem_log.cc:212 config_aicore_log config.json cannot found, SIM.rotating of log use default value
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 23: chip_id 0 die_id 0, dev_core_id 23, subcore 0 of 2 (CUBE), spr COREID 0x0017 (23)
[PEM_AIC_LOG] [DEBUG] [0000000000] init_core_spr (196):: Core 23: chip_id 0 die_id 0, dev_core_id 23, subcore 1 of 2 (VEC), spr COREID 0x002F (47)
[INFO] AicWrapper attach AIC 23, num_vec_core=2, num_subcore=3
[INFO] Chip 0 AIC / Scheduler / Soc periods: 200.0000 / 200.0000 / 105.0000
[INFO] chip 0 die 0 device created
================================================================================
>>>>
>>>> " PEM MODEL "
>>>> Total no. of 1 chip(s) Model Init Success!
>>>>
================================================================================
[INFO] Model Start Time: 2025-02-08 19:33:44
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:33:44.294.606 [raw_device.cc:237] 167807 Init: isAddrFlat:0x1
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:33:44.307.485 [engine.cc:76] 167807 Engine: Constructor.
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:33:44.307.849 [stars_engine.cc:41] 167807 StarsEngine: Constructor.
[DRVSTUB_LOG] driver_api.c:539 sendSwapBuf:swapbuf_base_addr:10000000
[DRVSTUB_LOG] driver_api.c:540 sendSwapBuf:sq:0 swapbuf_addr:10000000
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:33:44.319.329 [npu_driver.cc:5784] 167887 GetDeviceStatus: GetDeviceStatus status=1.
[DRVSTUB_LOG] driver_api.c:539 sendSwapBuf:swapbuf_base_addr:10000000
[DRVSTUB_LOG] driver_api.c:540 sendSwapBuf:sq:1 swapbuf_addr:10000040
2025-02-08 19:33:44 [INFO] Start profiling on kernel: add_custom_0
[PEM_AIC_LOG] [INFO] [0000000268] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000013783] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @13783 Main CCU executing `END' instr.
[PEM_AIC_LOG] [INFO] [0000013984] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000024668] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @24668 Main CCU executing `END' instr.
[PEM_AIC_LOG] [INFO] [0000024869] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000035451] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @35451 Main CCU executing `END' instr.
[PEM_AIC_LOG] [INFO] [0000035652] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000046330] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @46330 Main CCU executing `END' instr.
[PEM_AIC_LOG] [INFO] [0000046531] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000057333] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @57333 Main CCU executing `END' instr.
[PEM_AIC_LOG] [INFO] [0000057534] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000068245] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @68245 Main CCU executing `END' instr.
[PEM_AIC_LOG] [INFO] [0000068446] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000079049] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @79049 Main CCU executing `END' instr.
[PEM_AIC_LOG] [INFO] [0000079250] kickstart (293):: Core 0 subcore 0 start pc 0x10d76000
[PEM_AIC_LOG] [INFO] [0000089964] aic_done_callback (78):: Core 0 subcore 0 early_end 0 done
[INFO] pem.cc:605 proc_end_pem_ostd_num_is_one @89964 Main CCU executing `END' instr.
[DRVSTUB_LOG] driver_api.c:2196 send_stars_interrupt:get cq_0 base_addr: 10020000
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:35:47.802.298 [stars_engine.cc:48] 167807 ~StarsEngine: Destructor.
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:35:47.802.625 [engine.cc:82] 167807 ~Engine: Destructor.
[INFO] Model Stop Time: 2025-02-08 19:35:47
Model RUN TIME: 123509 ms
[INFO] Total tick: 90251
[INFO] Model stopped successfully.
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:35:48.641.886 [task_fail_callback_manager.cc:57] 167807 ~TaskFailCallBackManager: Destructor.
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:35:48.651.767 [runtime.cc:2033] 167807 ~Runtime: deconstruct runtime
[INFO] RUNTIME(167807,ascendc_kernels_bbit):2025-02-08-19:35:48.652.673 [runtime.cc:2040] 167807 ~Runtime: wait monitor success, use=0.
2025-02-08 19:35:48 [INFO] Profiling running finished. All task success.
2025-02-08 19:35:49 [WARN] Extract start pc failed from register dump files
2025-02-08 19:36:00 [WARN] file is empty or not exist. filepath: /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationNeo/OPPROF_20250208193340_LZJYVOPEHPVCQZFK/add_custom_0/0/dump/core0.veccore0.ifu.icache.log
2025-02-08 19:36:11 [INFO] Extract 839 relations from kernel
2025-02-08 19:36:11 [INFO] Parse 2667 addr2line relations
2025-02-08 19:37:10 [INFO] Core operator results run in simulator as follow:
core_name duration_time(us) running_time(us)
core0.veccore0 170.28 4.82
core0.cubecore0 177.84 82.49
2025-02-08 19:37:13 [INFO] Profiling results saved in /home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationNeo/OPPROF_20250208193340_LZJYVOPEHPVCQZFK
2025-02-08 19:37:13 [INFO] Profiling data parse finished.
2025-02-08 19:37:13 [INFO] Op profiling finish. Welcome to next use.
[INFO] RUNTIME(167805,msopprof):2025-02-08-19:37:13.295.111 [task_fail_callback_manager.cc:57] 167805 ~TaskFailCallBackManager: Destructor.(base) HwHiAiUser@orangepiaipro:~/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationNeo$
利用这个得到的相关是相关的分析文件,产出路径是同目录下的文件夹:OPPROF_xxxx,将文件导入到MindStudioInsight,可以看到指令执行情况,
没有tiling策略:
有tiling策略:
新开一个终端
cd samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/
export LD_LIBRARY_PATH=/home/HwHiAiUser/MyAscend/samples/operator/ascendc/tutorials/AddCustomSample/KernelLaunch/AddKernelInvocationTilingNeo/out/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascend310B4/lib:/home/HwHiAiUser/MyAscend/samples:$LD_LIBRARY_PATH
msprof op simulator ./ascendc_kernels_bbit
从这个结果看,增加了tiling策略,时间上反而提升了,可能是add算子过于简单导致。此处留坑,期待日后解决。
- 点赞
- 收藏
- 关注作者
评论(0)