vGPU

Install dependencies

sudo apt install unzip

Download driver

Download package NVIDIA-GRID-Ubuntu-KVM-525.85.07-525.85.05-528.24.zip
Create token for licensing
Extract

mkdir -p /mnt/nfs/vGPU
unzip NVIDIA-GRID-Ubuntu-KVM-525.85.07-525.85.05-528.24.zip -d /mnt/nfs/vGPU

Remove current NVIDIA driver if existing

bash /mnt/local/NVIDIA-Linux-x86_64-450.216.04.run --uninstall

Host machine (compute node)

Disable Nouveau and enable nvfio

echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u -k all

Disable X server

sudo init 3

Ref: link

Install driver

cd /mnt/nfs/vGPU/Host_Drivers
sudo apt install ./nvidia-vgpu-ubuntu-525_525.85.07_amd64.deb

Ref: link

Configuring vGPU [Deprecated]

Check domain/bus/slot/function

admin@cluter-gn1:~$ lsmod | grep vfio
nvidia_vgpu_vfio       65536  272
mdev                   28672  1 nvidia_vgpu_vfio

admin@cluter-gn1:~$ lspci | grep NVIDIA
89:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
8a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
b2:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
b3:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)

admin@cluter-gn1:~$ virsh nodedev-list --cap pci| grep 89_00_0
pci_0000_89_00_0

admin@cluter-gn1:~$ virsh nodedev-dumpxml pci_0000_89_00_0 | egrep 'domain|bus|slot|function'
    <domain>0</domain>
    <bus>137</bus>
    <slot>0</slot>
    <function>0</function>

Create vGPU manually

Change to the root user

sudo -i

Seach nvidia profiles
- 8Q - 8G/vGPU
- 16Q - 16G/vGPU

cd /sys/bus/pci/devices/0000:89:00.0/mdev_supported_types
grep -l "V100DX-8Q" nvidia-*/name

The output:

nvidia-197/name

Check available instances, the result should be greater than 0

cat nvidia-197/available_instances

Generate uuid for the vGPU

# uuidgen
b87b1cd3-feb8-4ca6-88af-33b3c9f81425

Write the UUID that you obtained in the previous step to the create file in the registration information directory for the vGPU type that you want to create

echo "b87b1cd3-feb8-4ca6-88af-33b3c9f81425" > nvidia-197/create

Make the mdev device file that you created to represent the vGPU persistent.

mdevctl define --auto --uuid b87b1cd3-feb8-4ca6-88af-33b3c9f81425

Confirm that the vGPU was created

ls -l /sys/bus/mdev/devices/

or

mdevctl list

Create all vGPU

cd /mnt/nfs/vGPU
sudo ./admin-create-all-vgpus.sh -r 8

-r 8: choose 8GB of vGPU memory

Adding One or More vGPUs to a Linux with KVM Hypervisor VM by Using virsh

virsh edit vgn1

Add device entries

<device>
...
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source>
        <address uuid=''/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source>
        <address uuid=''/>
      </source>
    </hostdev>
</device>

Start/Restart the VM

virsh start vgn1

VMs

Access to the VM via `virsh`

admin@cluter-gn1:/mnt/nfs/vGPU$ virsh list --all
 Id   Name   State
----------------------
 2    vgn1   running

admin@cluter-gn1:/mnt/nfs/vGPU$ virsh console vgn1
Connected to domain 'vgn1'
Escape character is ^] (Ctrl + ])

admin@cluter-vgn1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:05:41:79 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.101/24 brd 172.20.0.255 scope global enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe05:4179/64 scope link 
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:89:d9:16 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.41/24 brd 172.16.0.255 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe89:d916/64 scope link 
       valid_lft forever preferred_lft forever
4: enp20s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:51:d3:51 brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.101/24 brd 192.168.33.255 scope global enp20s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe51:d351/64 scope link 
       valid_lft forever preferred_lft forever

Install SSH server

sudo apt install openssh-server

Exit console: Ctrl + ]

Access to the VM via `ssh`

Copy ssh key

ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@172.16.0.101

SSH to the VM

ssh ubuntu@172.16.0.101

Copy NVIDIA Guest drive and token

scp /mnt/nfs/vGPU/Guest_Drivers/nvidia-linux-grid-525_525.85.05_amd64.deb ubuntu@172.16.0.101:/home/admin
scp /mnt/nfs/vGPU/client_configuration_token_03-23-2023-09-07-06.tok ubuntu@172.16.0.101:/home/admin

Install NVIDIA driver

sudo apt install ./nvidia-linux-grid-525_525.85.05_amd64.deb

Change FeatureType from 0 to 2

sudo nano /etc/nvidia/gridd.conf

# Description: Set Feature to be enabled
# Data type: integer
# Possible values:
#    0 => for unlicensed state
#    1 => for NVIDIA vGPU (Optional, autodetected as per vGPU type)
#    2 => for NVIDIA RTX Virtual Workstation
#    4 => for NVIDIA Virtual Compute Server
# All other values reserved
FeatureType=2

Restart VM

sudo reboot

SSH to VM
Copy token

sudo cp client_configuration_token_03-23-2023-09-07-06.tok /etc/nvidia/ClientConfigToken/

Change mode

chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_03-23-2023-09-07-06.tok

Restart nvidia-gridd deamon

sudo systemctl restart nvidia-gridd.service

Test

nvidia-smi -q

Rebooting the VM may save your time when the vGPU does not recognize the license

sudo reboot

vGPU

Install dependencies

Install dependencies

Download driver

Remove current NVIDIA driver if existing

Host machine (compute node)

Disable Nouveau and enable nvfio

Disable X server

Install driver

Configuring vGPU [Deprecated]

Create vGPU manually

Create all vGPU

Adding One or More vGPUs to a Linux with KVM Hypervisor VM by Using virsh

VMs

Access to the VM via virsh

Access to the VM via ssh

Access to the VM via `virsh`

Access to the VM via `ssh`