Linux-install-cuda

How to install CUDA on CentOS 7

  1. Check the GPU model with lspci | grep -i vga. If the shell reports -bash: lspci: command not found, install it with yum install pciutils.

    [root@exxk ~]# lspci | grep -i vga
    01:00.0 VGA compatible controller: NVIDIA Corporation Device 25e2 (rev a1)
  2. Take the device ID 25e2 from the output and append it to https://admin.pci-ids.ucw.cz/read/PC/10de/ (10de is NVIDIA's PCI vendor ID), giving https://admin.pci-ids.ucw.cz/read/PC/10de/25e2. Opening that URL shows a page like this:

    Main -> PCI Devices -> Vendor 10de -> Device 10de:25e2

    Name: GA107BM [GeForce RTX 3050 Mobile]

    This shows that device 25e2 is a GeForce RTX 3050 Mobile.
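
The lookup in steps 1 and 2 can be scripted. The sketch below is only an illustration: the helper names pci_device_id and pci_ids_url are made up here, and the parsing assumes the "Device XXXX" form that lspci prints when its local database does not know the card's name.

```shell
# Print the 4-hex-digit device ID from lspci lines read on stdin.
# Assumes the "Device 25e2" form shown in the output above.
pci_device_id() {
    sed -n 's/.*Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

# Print the pci-ids lookup URL for device ID $1.
# $2 is the vendor ID and defaults to 10de (NVIDIA), as in the URL above.
pci_ids_url() {
    printf 'https://admin.pci-ids.ucw.cz/read/PC/%s/%s\n' "${2:-10de}" "$1"
}

# Demo on the sample line from step 1.
line='01:00.0 VGA compatible controller: NVIDIA Corporation Device 25e2 (rev a1)'
pci_ids_url "$(printf '%s\n' "$line" | pci_device_id)"
```

On a real host the input would come from lspci | grep -i vga | pci_device_id. Running lspci -nn also prints the numeric vendor:device pair in brackets, which avoids the parsing altogether.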

  3. On https://www.nvidia.cn/Download/index.aspx?lang=cn, find the driver for the GeForce RTX 3050 Laptop GPU (the name the site uses for the GeForce RTX 3050 Mobile), select Linux 64-bit, and download NVIDIA-Linux-x86_64-550.78.run.

  4. Copy the file to the CentOS 7 machine and run the commands below to install the driver.

    # Blacklist the stock nouveau driver (takes effect after reboot) ---begin---
    [root@exxk ~]# lsmod | grep nouveau
    nouveau 2543616 0
    drm_ttm_helper 12288 1 nouveau
    ttm 90112 2 drm_ttm_helper,nouveau
    i2c_algo_bit 12288 1 nouveau
    mxm_wmi 12288 1 nouveau
    drm_display_helper 176128 1 nouveau
    drm_kms_helper 225280 4 drm_display_helper,nouveau
    drm 643072 6 drm_kms_helper,drm_display_helper,drm_ttm_helper,ttm,nouveau
    video 69632 2 ideapad_laptop,nouveau
    wmi 36864 5 video,wmi_bmof,ideapad_laptop,mxm_wmi,nouveau
    [root@exxk ~]# vi /lib/modprobe.d/dist-blacklist.conf
    # Comment out the nvidiafb entry so that it reads:
    #   #blacklist nvidiafb
    # Then add these two lines (without a leading #):
    #   blacklist nouveau
    #   options nouveau modeset=0
    [root@exxk ~]# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
    [root@exxk ~]# dracut /boot/initramfs-$(uname -r).img $(uname -r)
    [root@exxk ~]# reboot
    [root@exxk ~]# lsmod | grep nouveau
    # After the reboot, no output from lsmod | grep nouveau means the blacklist worked
    # Blacklist the stock nouveau driver (takes effect after reboot) ---end---

    # Install build dependencies ---begin---
    # Install kernel-devel, kernel-headers, and gcc
    [root@exxk ~]# yum install kernel-devel kernel-headers gcc -y
    [root@exxk ~]# uname -r
    3.10.0-1160.118.1.el7.x86_64
    [root@exxk ~]# yum info kernel-devel kernel-headers | grep 发布
    发布 :1160.118.1.el7
    发布 :1160.118.1.el7
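    # "发布" is the Release field in Chinese-locale yum output (grep Release under an English locale).
    # The release must match the running kernel reported by uname -r.
    # If it does not, check whether a kernel update is available
    [root@exxk ~]# yum check-update kernel
    # List the kernels in the boot menu
    [root@exxk ~]# cat /boot/grub2/grub.cfg |grep menuentry
    # Boot the kernel whose version matches the installed kernel-devel and kernel-headers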
    [root@exxk ~]# grub2-set-default 'CentOS Linux (3.10.0-1160.118.1.el7.x86_64) 7 (Core)'
    [root@exxk ~]# reboot
    # Install build dependencies ---end---

    # Install the GPU driver; accept every prompt
    [root@exxk ~]# ./NVIDIA-Linux-x86_64-550.78.run
    # Show driver information
    [root@exxk ~]# nvidia-smi
    Sat May 11 14:43:05 2024
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:01:00.0 Off |                  N/A |
    | N/A   47C    P0             10W /   60W |       1MiB /   4096MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
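
Before launching the driver installer, the two preconditions from the session above (nouveau unloaded, kernel-devel matching the running kernel) can be checked in one go. This is a minimal sketch: the function names are invented for illustration, and the second check assumes the package naming that rpm -q kernel-devel prints on CentOS 7.

```shell
# Succeeds when the lsmod listing on stdin shows the nouveau module loaded.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Succeeds when stdin (one package per line, in `rpm -q kernel-devel`
# format) contains a kernel-devel package for kernel release $1.
has_matching_kernel_devel() {
    grep -qxF -- "kernel-devel-$1"
}

# Demo on sample data; on a real host feed in `lsmod` and
# `rpm -q kernel-devel`, and pass "$(uname -r)" as the release.
if printf '%s\n' 'drm 643072 6 drm_kms_helper,ttm' | nouveau_loaded; then
    echo 'nouveau is still loaded: blacklist it and reboot'
else
    echo 'nouveau is not loaded'
fi
if printf '%s\n' 'kernel-devel-3.10.0-1160.118.1.el7.x86_64' |
        has_matching_kernel_devel '3.10.0-1160.118.1.el7.x86_64'; then
    echo 'kernel-devel matches the kernel release'
else
    echo 'kernel-devel does not match the kernel release'
fi
```

The installer builds a kernel module against the kernel-devel tree, which is why a mismatch with the running kernel makes the build fail.
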
  5. Install the NVIDIA Container Toolkit with the commands below. The NVIDIA driver for your Linux distribution must already be installed; the CUDA Toolkit itself is not needed on the host.

    [root@exxk ~]# curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    [root@exxk ~]# yum-config-manager --enable nvidia-container-toolkit-experimental
    [root@exxk ~]# yum install -y nvidia-container-toolkit
  6. Configure containerd (for Kubernetes) with the commands below.

    [root@exxk ~]# nvidia-ctk runtime configure --runtime=containerd
    INFO[0000] Loading config from /etc/containerd/config.toml
    INFO[0000] Wrote updated config to /etc/containerd/config.toml
    INFO[0000] It is recommended that containerd daemon be restarted.
    [root@exxk ~]# systemctl restart containerd
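
To confirm that the configure step took effect, one can look for the NVIDIA runtime in the containerd config. The sketch below rests on an assumption that is not shown in the log above: that nvidia-ctk adds a runtime entry whose BinaryName points at nvidia-container-runtime. The function name and the sample fragment are illustrative only, so inspect your own config.toml if the check fails.

```shell
# Succeeds when the containerd config on stdin mentions the NVIDIA runtime
# binary. Assumption: nvidia-ctk writes a runtime entry whose BinaryName
# points at nvidia-container-runtime.
has_nvidia_runtime() {
    grep -q 'nvidia-container-runtime'
}

# Demo on a hand-written fragment, not a real config file.
sample='[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"'
if printf '%s\n' "$sample" | has_nvidia_runtime; then
    echo 'NVIDIA runtime configured'
else
    echo 'NVIDIA runtime missing'
fi
```

On the host the check would read has_nvidia_runtime < /etc/containerd/config.toml.
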
  7. Pull the PaddlePaddle image

    nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.6.1-gpu-cuda12.0-cudnn8.9-trt8.6
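
The image tag pins CUDA 12.0, while nvidia-smi above reports CUDA Version: 12.4, which is the newest CUDA runtime the installed driver supports. The image can use the GPU as long as its CUDA version does not exceed that number. A rough sketch of the comparison follows; the helper names are invented here, and the parsing assumes the tag and banner formats shown in this post.

```shell
# Print the CUDA version from an nvidia-smi banner line read on stdin.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print the CUDA version embedded in an image tag such as "...-cuda12.0-...".
image_cuda_version() {
    printf '%s\n' "$1" | sed -n 's/.*cuda\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed when major.minor version $1 is less than or equal to $2.
version_le() {
    [ "${1%%.*}" -lt "${2%%.*}" ] ||
        { [ "${1%%.*}" -eq "${2%%.*}" ] && [ "${1#*.}" -le "${2#*.}" ]; }
}

# Demo with the values from this post; on a real host pipe in `nvidia-smi`.
banner='| NVIDIA-SMI 550.78 Driver Version: 550.78 CUDA Version: 12.4 |'
image='registry.baidubce.com/paddlepaddle/paddle:2.6.1-gpu-cuda12.0-cudnn8.9-trt8.6'
drv=$(printf '%s\n' "$banner" | driver_cuda_version)
img=$(image_cuda_version "$image")
if version_le "$img" "$drv"; then
    echo "image CUDA $img is supported by the driver (up to $drv)"
else
    echo "image CUDA $img is newer than the driver supports ($drv)"
fi
```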

Installing a recent CUDA version (deprecated; use the container-based setup instead)

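  1. Install CUDA 12.4.1, the latest release at the time of writing. PaddlePaddle needs CUDA 12.0, so substitute 12.0.1 below if that is your target; the installer file name also embeds a driver version, so take the exact URL from NVIDIA's download page.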

    wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
    sh cuda_12.4.1_550.54.15_linux.run
    ┌──────────────────────────────────────────────────────────────────────────────┐
    │ CUDA Installer │
    │ - [ ] Driver │
    │ [ ] 550.54.15 │
    │ + [X] CUDA Toolkit 12.4 │
    │ [X] CUDA Demo Suite 12.4 │
    │ [X] CUDA Documentation 12.4 │
    │ - [ ] Kernel Objects │
    │ [ ] nvidia-fs │
    │ Options │
    │ Install │
    │ │
    │ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
    └──────────────────────────────────────────────────────────────────────────────┘
    ===========
    = Summary =
    ===========

    Driver: Not Selected
    Toolkit: Installed in /usr/local/cuda-12.4/

    Please make sure that
    - PATH includes /usr/local/cuda-12.4/bin
    - LD_LIBRARY_PATH includes /usr/local/cuda-12.4/lib64, or, add /usr/local/cuda-12.4/lib64 to /etc/ld.so.conf and run ldconfig as root

    To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.4/bin
    ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 550.00 is required for CUDA 12.4 functionality to work.
    To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

    Logfile is /var/log/cuda-installer.log
  2. Add the environment variables: open /etc/profile with vi /etc/profile, append the lines below, then run source /etc/profile.

    export PATH=${PATH}:/usr/local/cuda-12.4/bin
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-12.4/lib64
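
Two details of these export lines are worth guarding against. Sourcing /etc/profile repeatedly appends the same directory again, and when LD_LIBRARY_PATH starts out empty the result begins with a colon, an empty entry that the dynamic loader treats as the current directory. Below is a small sketch of an append that avoids both; path_append and CUDA_HOME are names chosen for this example and are not defined by the CUDA installer.

```shell
# Print list $1 with directory $2 appended, unless $2 is already in it.
# An empty $1 yields just $2, so no empty entry is created.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        "::")     printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

CUDA_HOME=/usr/local/cuda-12.4   # example variable; adjust to your install path
PATH=$(path_append "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export PATH LD_LIBRARY_PATH
```

Placed in /etc/profile instead of the two plain export lines, this leaves both variables unchanged on a second source.
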
  3. Verify

    [root@exxk ~]# nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2024 NVIDIA Corporation
    Built on Thu_Mar_28_02:18:24_PDT_2024
    Cuda compilation tools, release 12.4, V12.4.131
    Build cuda_12.4.r12.4/compiler.34097967_0
  4. Uninstall

    [root@exxk ~]# cd /usr/local/cuda-12.4/bin
    [root@exxk bin]# ./cuda-uninstaller
    ┌──────────────────────────────────────────────────────────────────────────────┐
    │ CUDA Uninstaller │
    │ [X] CUDA_Toolkit_12.4 │
    │ [X] CUDA_Demo_Suite_12.4 │
    │ [X] CUDA_Documentation_12.4 │
    │ Done │
    │ │
    │ Up/Down: Move | 'Enter': Select │
    └──────────────────────────────────────────────────────────────────────────────┘
    Successfully uninstalled
    [root@exxk bin]# cd /usr/local/
    [root@exxk local]# rm -rf cuda-12.4
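
The uninstaller removes the toolkit, but the export lines added to /etc/profile earlier stay behind and keep pointing at the deleted directory. One cautious way to clean them up is to write a filtered copy and review it before replacing the file. A sketch, with a made-up helper name:

```shell
# Copy stdin to stdout, dropping every line that mentions the string $1.
drop_lines_mentioning() {
    # grep exits 1 when it prints nothing; that is not an error here.
    grep -vF -- "$1" || true
}

# Demo on sample text resembling the profile edits above.
printf '%s\n' \
    'export EDITOR=vi' \
    'export PATH=${PATH}:/usr/local/cuda-12.4/bin' \
    'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-12.4/lib64' |
    drop_lines_mentioning /usr/local/cuda-12.4
```

On the host this could be run as drop_lines_mentioning /usr/local/cuda-12.4 < /etc/profile > /tmp/profile.new, followed by a diff of the two files before copying the new one into place.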

Installing an old CUDA version (deprecated)

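  1. Download: on developer.nvidia.com, pick the options that match your environment and download CUDA. I chose the runfile (local) installer. Once the download finishes, upload the file to the CentOS machine.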

  2. Run sh cuda_7.5.18_linux.run and press Ctrl+C to skip reading the license text.

    [root@exxk ~]# sh cuda_7.5.18_linux.run
    Logging to /tmp/cuda_install_22223.log
    Using more to view the EULA.
    End User License Agreement
    --------------------------
    --More--(0%)
    Do you accept the previously read EULA? (accept/decline/quit): accept
    # Critical: a newer driver was installed earlier, so do not install this one.
    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): n
    Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
    Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
    Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
    Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
    Enter CUDA Samples Location [ default is /root ]:
    Installing the CUDA Toolkit in /usr/local/cuda-7.5 ...
    Missing recommended library: libGLU.so
    Missing recommended library: libX11.so
    Missing recommended library: libXi.so
    Missing recommended library: libXmu.so

    Installing the CUDA Samples in /root ...
    Copying samples to /root/NVIDIA_CUDA-7.5_Samples now...
    Finished copying samples.

    ===========
    = Summary =
    ===========

    Driver: Not Selected
    Toolkit: Installed in /usr/local/cuda-7.5
    Samples: Installed in /root, but missing recommended libraries

    Please make sure that
    - PATH includes /usr/local/cuda-7.5/bin
    - LD_LIBRARY_PATH includes /usr/local/cuda-7.5/lib64, or, add /usr/local/cuda-7.5/lib64 to /etc/ld.so.conf and run ldconfig as root

    To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-7.5/bin
    To uninstall the NVIDIA Driver, run nvidia-uninstall

    Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-7.5/doc/pdf for detailed information on setting up CUDA.

    ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 352.00 is required for CUDA 7.5 functionality to work.
    To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

    Logfile is /tmp/cuda_install_22223.log
    Signal caught, cleaning up
  3. Add the environment variables: open /etc/profile with vi /etc/profile, append the lines below, then run source /etc/profile.

    export PATH=${PATH}:/usr/local/cuda/bin
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
  4. Verify the environment

    [root@exxk ~]# nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2015 NVIDIA Corporation
    Built on Tue_Aug_11_14:27:32_CDT_2015
    Cuda compilation tools, release 7.5, V7.5.17
    # If the CUDA samples were installed earlier, you can also run the following:
    [root@exxk ~]# cd /root/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery
    [root@exxk deviceQuery]# yum install gcc gcc-c++
    [root@exxk deviceQuery]# make
    "/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery.o -c deviceQuery.cpp
    "/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery deviceQuery.o
    mkdir -p ../../bin/x86_64/linux/release
    cp deviceQuery ../../bin/x86_64/linux/release
    [root@exxk deviceQuery]# ./deviceQuery
    ./deviceQuery Starting...

    CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 1 CUDA Capable device(s)

    Device 0: "NVIDIA GeForce RTX 3050 Laptop GPU"
    CUDA Driver Version / Runtime Version 12.4 / 7.5
    CUDA Capability Major/Minor version number: 8.6
    Total amount of global memory: 3873 MBytes (4060872704 bytes)
    MapSMtoCores for SM 8.6 is undefined. Default to use 128 Cores/SM
    MapSMtoCores for SM 8.6 is undefined. Default to use 128 Cores/SM
    (16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
    GPU Max Clock rate: 1500 MHz (1.50 GHz)
    Memory Clock rate: 6001 Mhz
    Memory Bus Width: 128-bit
    L2 Cache Size: 1572864 bytes
    Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
    Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 2 copy engine(s)
    Run time limit on kernels: No
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    Device supports Unified Addressing (UVA): Yes
    Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.4, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = NVIDIA GeForce RTX 3050 Laptop GPU
    Result = PASS
  5. Uninstall: run /usr/local/cuda-7.5/bin/uninstall_cuda_7.5.pl
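
For scripted checks, the summary at the end of the deviceQuery output is the part to look at. The sketch below pulls the runtime version and the PASS marker out of that output; the helper names are hypothetical and the patterns assume the output format shown above. Note that the runtime here is 7.5 while the driver supports 12.4. The 7.5 toolkit predates this GPU, which is presumably why MapSMtoCores reports SM 8.6 as undefined.

```shell
# Print the CUDA runtime version from the deviceQuery summary line on stdin.
runtime_version() {
    sed -n 's/.*CUDA Runtime Version = \([0-9][0-9.]*\).*/\1/p'
}

# Succeeds when the deviceQuery output on stdin contains a PASS result.
device_query_passed() {
    grep -q 'Result = PASS'
}

# Demo on two lines copied from the output above.
out='deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.4, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = NVIDIA GeForce RTX 3050 Laptop GPU
Result = PASS'
printf '%s\n' "$out" | runtime_version
printf '%s\n' "$out" | device_query_passed && echo 'deviceQuery passed'
```

On the host the input would come from ./deviceQuery itself, for example ./deviceQuery | device_query_passed.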
