NVIDIA CUDA Programming in Practice (1): Environment Check and Getting Started
I used the environment provided by Huawei Cloud's AI Gallery for developers: pick any Notebook example and run it to enter, then switch to the time-limited free GPU environment.
Stage 1: Environment Check
1. Verify the CUDA driver installation
nvidia-smi
Output:
Thu Jun 5 16:57:18 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:00:0D.0 Off | 0 |
| N/A 32C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
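Besides the full status table, nvidia-smi can print just the fields you care about, which is handy in scripts. For example, its query flags can pull out the GPU name, driver version, and total memory as CSV:
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv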
2. Check the CUDA toolkit
nvcc --version
Output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
Note the apparent mismatch: nvidia-smi reports the highest CUDA version the installed driver supports (11.4 here), while nvcc reports the version of the installed toolkit (9.0). This is normal; the driver only needs to be new enough for the toolkit in use.
3. Verify that the CUDA samples run
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery
The CUDA samples are not installed in this environment, so this step has to be skipped; Stage 3 below builds an equivalent device query from scratch.
Stage 2: A First CUDA Program
1. Create hello.cu
#include <stdio.h>

// Kernel: runs on the GPU; each thread prints its own index.
__global__ void helloFromGPU() {
    printf("Hello World from GPU thread %d!\n", threadIdx.x);
}

int main() {
    printf("Hello World from CPU!\n");
    helloFromGPU<<<1, 5>>>();   // launch 1 block of 5 threads
    cudaDeviceSynchronize();    // wait for the GPU before the process exits
    return 0;
}
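One thing this listing glosses over: if the kernel launch fails (say, with an invalid configuration), the program simply prints nothing from the GPU. As a sketch of my own, the same program with error checking added via the standard cudaGetLastError and cudaDeviceSynchronize return codes:
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void helloFromGPU() {
    printf("Hello World from GPU thread %d!\n", threadIdx.x);
}

int main() {
    printf("Hello World from CPU!\n");
    helloFromGPU<<<1, 5>>>();
    cudaError_t err = cudaGetLastError();   // reports launch-configuration errors
    if (err != cudaSuccess)
        printf("Launch failed: %s\n", cudaGetErrorString(err));
    err = cudaDeviceSynchronize();          // reports errors raised while the kernel ran
    if (err != cudaSuccess)
        printf("Kernel failed: %s\n", cudaGetErrorString(err));
    return 0;
}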
2. Compile and run
nvcc hello.cu -o hello
./hello
Expected output:
Hello World from CPU!
Hello World from GPU thread 0!
Hello World from GPU thread 1!
Hello World from GPU thread 2!
Hello World from GPU thread 3!
Hello World from GPU thread 4!
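Note that the five GPU lines can appear in any order, since the threads execute concurrently and the flush order of device-side printf is not guaranteed. As a small variation of my own (not part of the original exercise), launching more than one block shows how blockIdx.x complements threadIdx.x:
#include <stdio.h>

__global__ void helloFromBlocks() {
    // blockIdx.x identifies the block, threadIdx.x the thread within it
    printf("Hello from block %d, thread %d!\n", blockIdx.x, threadIdx.x);
}

int main() {
    helloFromBlocks<<<2, 5>>>();   // 2 blocks x 5 threads = 10 messages
    cudaDeviceSynchronize();       // wait for the device-side printf to flush
    return 0;
}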
Stage 3: Querying Device Information
1. Create device_info.cu
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int deviceCount;
    cudaGetDeviceCount(&deviceCount);
    for (int i = 0; i < deviceCount; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);   // fill in the properties of device i
        printf("Device %d: %s\n", i, prop.name);
        printf(" Compute capability: %d.%d\n", prop.major, prop.minor);
        printf(" Global memory: %.2f GB\n", prop.totalGlobalMem / 1024.0 / 1024.0 / 1024.0);
        printf(" Shared memory per block: %.2f KB\n", prop.sharedMemPerBlock / 1024.0);
        printf(" Warp size: %d threads\n", prop.warpSize);
        printf(" Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf(" Max threads per multiprocessor: %d\n", prop.maxThreadsPerMultiProcessor);
        printf(" Multiprocessor count: %d\n", prop.multiProcessorCount);
    }
    return 0;
}
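On a machine with more than one GPU you would normally choose a device before allocating memory or launching kernels. A minimal sketch using cudaSetDevice (picking device 1 here is an arbitrary choice for illustration):
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount > 1) {
        cudaSetDevice(1);   // all later CUDA calls in this thread target device 1
    }
    int current = -1;
    cudaGetDevice(&current);
    printf("Using device %d of %d\n", current, deviceCount);
    return 0;
}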
2. Compile and run
nvcc device_info.cu -o device_info
./device_info
On a Tesla P100 you should see something like:
Device 0: Tesla P100-PCIE-16GB
Compute capability: 6.0
Global memory: 15.90 GB
Shared memory per block: 48.00 KB
Warp size: 32 threads
Max threads per block: 1024
Max threads per multiprocessor: 2048
Multiprocessor count: 56
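Compute capability 6.0 is also what you would pass to the compiler to target the P100 explicitly (nvcc -arch=sm_60). The raw properties can be combined into derived figures as well; as a sketch of my own, the theoretical peak memory bandwidth follows from two further cudaDeviceProp fields:
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // memoryClockRate is in kHz, memoryBusWidth in bits;
    // the factor of 2 accounts for double-data-rate memory
    double gbps = 2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8.0) / 1.0e6;
    printf("Theoretical peak memory bandwidth: %.0f GB/s\n", gbps);
    return 0;
}
For the P100's HBM2 this works out to roughly 732 GB/s.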
Stage 4: Simple Vector Addition
1. Create vector_add.cu
#include <stdio.h>
#include <cuda_runtime.h>

#define N 5

// Kernel: each thread adds one pair of elements.
__global__ void vectorAdd(int *a, int *b, int *c) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (tid < N) {                                    // guard against extra threads
        c[tid] = a[tid] + b[tid];
    }
}

int main() {
    int a[N] = {1, 2, 3, 4, 5};
    int b[N] = {10, 20, 30, 40, 50};
    int c[N] = {0};
    int *d_a, *d_b, *d_c;

    // Allocate device memory for the three arrays.
    cudaMalloc(&d_a, N * sizeof(int));
    cudaMalloc(&d_b, N * sizeof(int));
    cudaMalloc(&d_c, N * sizeof(int));

    // Copy the inputs from host to device.
    cudaMemcpy(d_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    // One block of N threads, one thread per element.
    vectorAdd<<<1, N>>>(d_a, d_b, d_c);

    // Copy the result back to the host.
    cudaMemcpy(c, d_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) {
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
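None of the CUDA runtime calls above check their return values, which keeps the listing short but hides failures silently. A common pattern is to wrap every call in a checking macro; a minimal sketch (the CHECK name is my own, not a CUDA API):
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Abort with a readable message when a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",          \
                    __FILE__, __LINE__, cudaGetErrorString(err)); \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Usage: CHECK(cudaMalloc(&d_a, N * sizeof(int)));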
2. Compile and run
nvcc vector_add.cu -o vector_add
./vector_add
Output:
1 + 10 = 11
2 + 20 = 22
3 + 30 = 33
4 + 40 = 44
5 + 50 = 55
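One final caveat: the <<<1, N>>> launch only works while N fits in a single block (1024 threads on the P100). For larger arrays, the usual pattern is to fix a block size and compute the grid size from it. A sketch assuming a hypothetical N of 1,000,000, replacing the single launch line in vector_add.cu; the tid < N guard in the kernel then masks off the surplus threads in the last block:
    int threadsPerBlock = 256;
    // round up so every element gets a thread, even when N is not a multiple of 256
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_c);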