Linux-install-cuda
How to install CUDA on CentOS 7
Check the GPU model
Run lspci | grep -i vga to list the graphics controller. If the shell reports -bash: lspci: command not found, install the command first with yum install pciutils.
[root@exxk ~]# lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation Device 25e2 (rev a1)
From the output, take the device ID 25e2 and append it to https://admin.pci-ids.ucw.cz/read/PC/10de/, for example https://admin.pci-ids.ucw.cz/read/PC/10de/25e2. Opening that page shows something like:
Main -> PCI Devices -> Vendor 10de -> Device 10de:25e2
Name: GA107BM [GeForce RTX 3050 Mobile]
This tells us that device 25e2 is a GeForce RTX 3050 Mobile.
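The lookup above can be scripted. The sketch below only builds the pci-ids URL and does not query the site. It assumes the lspci line format shown above, or the bracketed [10de:25e2] form that lspci -nn prints. The function name pci_ids_url is hypothetical.

```shell
#!/usr/bin/env bash
# Build the pci-ids.ucw.cz lookup URL from one line of lspci output.
# Handles "NVIDIA Corporation Device 25e2" (plain lspci) and
# "[10de:25e2]" (lspci -nn). 10de is NVIDIA's PCI vendor ID.
pci_ids_url() {
  local line="$1" dev
  # Try the bracketed vendor:device form first, then the plain form;
  # normalise the hex digits to lower case.
  dev=$(printf '%s\n' "$line" | sed -n \
    -e 's/.*\[10de:\([0-9a-fA-F]\{4\}\)].*/\1/p' \
    -e 's/.*NVIDIA Corporation Device \([0-9a-fA-F]\{4\}\).*/\1/p' \
    | head -n 1 | tr 'A-F' 'a-f')
  if [ -z "$dev" ]; then
    echo "no NVIDIA device ID found in: $line" >&2
    return 1
  fi
  printf 'https://admin.pci-ids.ucw.cz/read/PC/10de/%s\n' "$dev"
}

# Usage on the target machine:
#   lspci -nn | grep -i vga | while read -r l; do pci_ids_url "$l"; done
```

The lspci -nn form is a little more robust than the plain form, because it prints the numeric IDs even when the local ID database already knows the device name.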
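On https://www.nvidia.cn/Download/index.aspx?lang=cn, look up the driver for the GeForce RTX 3050 Mobile (listed there as GeForce RTX 3050 Laptop GPU), choose Linux 64-bit, and download NVIDIA-Linux-x86_64-550.78.run. Copy the file to the CentOS 7 machine and run the following commands to install the driver: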
# Disable the built-in nouveau driver (takes effect after reboot) --- begin -----------
[root@exxk ~]# lsmod | grep nouveau
nouveau 2543616 0
drm_ttm_helper 12288 1 nouveau
ttm 90112 2 drm_ttm_helper,nouveau
i2c_algo_bit 12288 1 nouveau
mxm_wmi 12288 1 nouveau
drm_display_helper 176128 1 nouveau
drm_kms_helper 225280 4 drm_display_helper,nouveau
drm 643072 6 drm_kms_helper,drm_display_helper,drm_ttm_helper,ttm,nouveau
video 69632 2 ideapad_laptop,nouveau
wmi 36864 5 video,wmi_bmof,ideapad_laptop,mxm_wmi,nouveau
[root@exxk ~]# vi /lib/modprobe.d/dist-blacklist.conf
# Comment out the nvidiafb line:
#blacklist nvidiafb
# Then append these two lines (without the leading #):
#blacklist nouveau
#options nouveau modeset=0
[root@exxk ~]# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@exxk ~]# dracut /boot/initramfs-$(uname -r).img $(uname -r)
[root@exxk ~]# reboot
[root@exxk ~]# lsmod | grep nouveau
# After the reboot, if lsmod | grep nouveau prints nothing, nouveau has been disabled
# Disable the built-in nouveau driver (takes effect after reboot) --- end -----------
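Instead of hand-editing dist-blacklist.conf, you can put the same two nouveau lines into a separate drop-in file. You can also make the post-reboot lsmod check explicit. This is only a sketch: the file name blacklist-nouveau.conf and both helper names are assumptions, and the sketch does not touch the nvidiafb line. The initramfs still has to be rebuilt with dracut and the machine rebooted, as in the steps above.

```shell
#!/usr/bin/env bash
# Write the nouveau blacklist to its own file (any *.conf under
# /etc/modprobe.d is read by modprobe). Prints the path it wrote.
write_nouveau_blacklist() {
  local dir="${1:-/etc/modprobe.d}"
  local conf="$dir/blacklist-nouveau.conf"   # hypothetical file name
  mkdir -p "$dir"
  # Overwrite instead of appending so re-running never duplicates lines.
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
  echo "$conf"
}

# Read lsmod-formatted text on stdin; succeed if the nouveau module itself
# is listed (only the first column is checked, not the "Used by" column).
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage (as root), then rebuild the initramfs and reboot:
#   write_nouveau_blacklist
#   if lsmod | nouveau_loaded; then echo "nouveau still loaded"; else echo "nouveau disabled"; fi
```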
# Install dependencies ------------ begin --------------------------
# Install kernel-devel, kernel-headers and gcc
[root@exxk ~]# yum install kernel-devel kernel-headers gcc -y
[root@exxk ~]# uname -r
3.10.0-1160.118.1.el7.x86_64
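# 发布 is the Release field as labeled by yum in a Chinese locale; in an English locale use grep Release
[root@exxk ~]# yum info kernel-devel kernel-headers | grep 发布
发布 :1160.118.1.el7
发布 :1160.118.1.el7
# The kernel version must match the installed kernel-devel version
# If they differ, check whether the kernel needs to be upgraded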
[root@exxk ~]# yum check-update kernel
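# List the kernels available in the boot menu
[root@exxk ~]# cat /boot/grub2/grub.cfg |grep menuentry
# Set the default boot entry to the kernel that matches the installed kernel-devel
[root@exxk ~]# grub2-set-default 'CentOS Linux (3.10.0-1160.118.1.el7.x86_64) 7 (Core)'
[root@exxk ~]# reboot
# Install dependencies ------------ end --------------------------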
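The driver installer builds its kernel module against the kernel-devel headers, which is why the versions above must line up. A minimal check, assuming rpm -q kernel-devel prints one name-version-release.arch line per installed package (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Succeed only if a kernel-devel package for the given kernel release is
# installed. Reads `rpm -q kernel-devel` output on stdin, one package per
# line, e.g. kernel-devel-3.10.0-1160.118.1.el7.x86_64
kernel_devel_matches() {
  local running="$1"   # e.g. the output of `uname -r`
  # -x: whole-line match, -F: fixed string (dots are not regex wildcards)
  grep -qxF "kernel-devel-$running"
}

# Usage on the CentOS 7 host:
#   if rpm -q kernel-devel | kernel_devel_matches "$(uname -r)"; then
#     echo "kernel-devel matches the running kernel"
#   else
#     echo "mismatch: install kernel-devel-$(uname -r) or boot the matching kernel"
#   fi
```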
# Install the driver; just accept each prompt
[root@exxk ~]# sh NVIDIA-Linux-x86_64-550.78.run
# Check the driver status
[root@exxk ~]# nvidia-smi
Sat May 11 14:43:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78 Driver Version: 550.78 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 ... Off | 00000000:01:00.0 Off | N/A |
| N/A 47C P0 10W / 60W | 1MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Install the NVIDIA Container Toolkit with the commands below. Make sure the NVIDIA driver for your Linux distribution is already installed. The host does not need the CUDA Toolkit, only the NVIDIA driver.
[root@exxk ~]# curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Optional: enable the experimental packages (yum-config-manager is provided by yum-utils)
[root@exxk ~]# yum-config-manager --enable nvidia-container-toolkit-experimental
[root@exxk ~]# yum install -y nvidia-container-toolkit
Configure containerd (for Kubernetes) with the following commands:
[root@exxk ~]# nvidia-ctk runtime configure --runtime=containerd
INFO[0000] Loading config from /etc/containerd/config.toml
INFO[0000] Wrote updated config to /etc/containerd/config.toml
INFO[0000] It is recommended that containerd daemon be restarted.
[root@exxk ~]# systemctl restart containerd
Then pull the PaddlePaddle GPU image (CUDA 12.0, cuDNN 8.9, TensorRT 8.6):
nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.6.1-gpu-cuda12.0-cudnn8.9-trt8.6
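Before restarting containerd, you can confirm that the nvidia-ctk step actually registered a runtime named nvidia. This is a sketch resting on one assumption: the generated TOML contains a table header ending in containerd.runtimes.nvidia]. If the check reports a miss, open the file and inspect it directly. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if containerd's config declares an "nvidia" runtime table.
# Returns 2 if the config file cannot be read.
containerd_has_nvidia_runtime() {
  local conf="${1:-/etc/containerd/config.toml}"
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 2
  fi
  # Fixed-string match on the assumed table-header suffix.
  grep -qF 'containerd.runtimes.nvidia]' "$conf"
}

# Usage on the host:
#   containerd_has_nvidia_runtime && systemctl restart containerd
```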
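Installing a newer CUDA (deprecated; the container approach is used instead)
Install the latest CUDA 12.4.1 (PaddlePaddle requires CUDA 12.0, so substitute 12.0.1 for the version below)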
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
sh cuda_12.4.1_550.54.15_linux.run
┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer │
│ - [ ] Driver │
│ [ ] 550.54.15 │
│ + [X] CUDA Toolkit 12.4 │
│ [X] CUDA Demo Suite 12.4 │
│ [X] CUDA Documentation 12.4 │
│ - [ ] Kernel Objects │
│ [ ] nvidia-fs │
│ Options │
│ Install │
│ │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-12.4/
Please make sure that
- PATH includes /usr/local/cuda-12.4/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.4/lib64, or, add /usr/local/cuda-12.4/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.4/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 550.00 is required for CUDA 12.4 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
Add the environment variables: open /etc/profile with vi /etc/profile, append the following lines, then run source /etc/profile:
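export PATH=${PATH}:/usr/local/cuda-12.4/bin
# ${VAR:+...} avoids a leading ':' (which would add the current directory) when LD_LIBRARY_PATH is empty
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda-12.4/lib64
Verify: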
[root@exxk ~]# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Uninstall:
[root@exxk ~]# cd /usr/local/cuda-12.4/bin
[root@exxk bin]# ./cuda-uninstaller
┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Uninstaller │
│ [X] CUDA_Toolkit_12.4 │
│ [X] CUDA_Demo_Suite_12.4 │
│ [X] CUDA_Documentation_12.4 │
│ Done │
│ │
│ Up/Down: Move | 'Enter': Select │
└──────────────────────────────────────────────────────────────────────────────┘
Successfully uninstalled
[root@exxk bin]# cd /usr/local/
[root@exxk local]# rm -rf cuda-12.4
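Because the CUDA installer runs with "Driver" deselected, it only warns when the existing driver is too old. You can check the version up front with a sketch like the one below. The minimums come from the installer warnings quoted in this post: 550.00 for CUDA 12.4 and 352.00 for CUDA 7.5. Other toolkit versions need their own values. The helper names are hypothetical, and version_ge relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Succeed if dotted version $1 >= $2: after a version sort, the smaller
# value comes first, so $1 >= $2 exactly when $2 sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the driver version that nvidia-smi reports for the first GPU.
driver_version() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Usage on the host:
#   if version_ge "$(driver_version)" 550.00; then
#     echo "driver is new enough for CUDA 12.4"
#   fi
```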
Installing an older CUDA (deprecated)
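Download: on the developer.nvidia.com download page, select the options matching your environment and download CUDA. I chose runfile (local) as the installer type. After the download finishes, upload the file to the CentOS machine, run sh cuda_7.5.18_linux.run, and press Ctrl+C to skip reading the license text.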
[root@exxk ~]# sh cuda_7.5.18_linux.run
Logging to /tmp/cuda_install_22223.log
Using more to view the EULA.
End User License Agreement
--------------------------
--More--(0%)
Do you accept the previously read EULA? (accept/decline/quit): accept
# Critical: a newer driver was already installed earlier, so do NOT install this one.
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): n
Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
Enter CUDA Samples Location [ default is /root ]:
Installing the CUDA Toolkit in /usr/local/cuda-7.5 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-7.5_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-7.5
Samples: Installed in /root, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-7.5/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-7.5/lib64, or, add /usr/local/cuda-7.5/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-7.5/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-7.5/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 352.00 is required for CUDA 7.5 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_22223.log
Signal caught, cleaning up
Add the environment variables: open /etc/profile with vi /etc/profile, append the following lines, then run source /etc/profile:
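export PATH=${PATH}:/usr/local/cuda/bin
# ${VAR:+...} avoids a leading ':' (which would add the current directory) when LD_LIBRARY_PATH is empty
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/lib64
Verify the environment: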
[root@exxk ~]# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
# If you installed the CUDA samples earlier, you can build and run deviceQuery:
[root@exxk ~]# cd /root/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery
[root@exxk deviceQuery]# yum install gcc gcc-c++
[root@exxk deviceQuery]# make
"/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
[root@exxk deviceQuery]# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3050 Laptop GPU"
CUDA Driver Version / Runtime Version 12.4 / 7.5
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 3873 MBytes (4060872704 bytes)
MapSMtoCores for SM 8.6 is undefined. Default to use 128 Cores/SM
MapSMtoCores for SM 8.6 is undefined. Default to use 128 Cores/SM
(16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1500 MHz (1.50 GHz)
Memory Clock rate: 6001 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.4, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = NVIDIA GeForce RTX 3050 Laptop GPU
Result = PASS
Uninstall:
/usr/local/cuda-7.5/bin/uninstall_cuda_7.5.pl
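The two toolkits in this post ship different uninstallers: bin/cuda-uninstaller for 12.4 and bin/uninstall_cuda_7.5.pl for 7.5. A small sketch that finds whichever one a toolkit directory contains (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Print the path of the uninstaller inside a CUDA toolkit directory,
# preferring the newer bin/cuda-uninstaller; fail if neither is present.
find_cuda_uninstaller() {
  local root="$1" f
  if [ -x "$root/bin/cuda-uninstaller" ]; then
    echo "$root/bin/cuda-uninstaller"
    return 0
  fi
  for f in "$root"/bin/uninstall_cuda_*.pl; do
    # With no match the glob stays literal, so confirm the file exists.
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  echo "no CUDA uninstaller found under $root/bin" >&2
  return 1
}

# Usage: find_cuda_uninstaller /usr/local/cuda-7.5
```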