GPU check#

Check NVIDIA driver

[1]:

!nvidia-smi

Thu Apr 13 17:18:51 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10-4Q       On   | 00000000:00:07.0 Off |                  N/A |
| N/A   N/A    P0    N/A /  N/A |    128MiB /  3932MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Import PyTorch to check GPU devices

[2]:

import torch

Check the version of PyTorch

[3]:

torch.__version__

[3]:

'2.0.0+cu117'

Check if PyTorch can call GPUs

[4]:

torch.cuda.is_available()

[4]:

True

Check the number of the GPUs of the computer in PyTorch

[5]:

gpu_num = torch.cuda.device_count()
print(gpu_num)

Check the GPU type of the computer in PyTorch

[6]:

for i in range(gpu_num):
    print('GPU {}.: {}'.format(i,torch.cuda.get_device_name(i)))

GPU 0.: NVIDIA A10-4Q

Import TensorFlow to check GPU devices

[7]:

import tensorflow as tf

2023-04-13 17:18:52.308723: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-13 17:18:52.422989: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-13 17:18:52.453216: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-04-13 17:18:52.918465: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.2/lib64:/home/limingbo/anaconda3/envs/tensorflow/lib:/home/limingbo/TensorRT-7.2.3.4/lib
2023-04-13 17:18:52.918653: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.2/lib64:/home/limingbo/anaconda3/envs/tensorflow/lib:/home/limingbo/TensorRT-7.2.3.4/lib
2023-04-13 17:18:52.918661: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Check the version of TensorFlow

[8]:

tf.__version__

[8]:

'2.10.1'

Check if TensorFlow can call GPUs

[9]:

tf.test.is_gpu_available()

WARNING:tensorflow:From /tmp/ipykernel_345917/2294581100.py:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.

2023-04-13 17:18:54.000431: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-13 17:18:54.001471: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:54.012039: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:54.012246: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.755971: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.756159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.756317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.756460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /device:GPU:0 with 1415 MB memory:  -> device: 0, name: NVIDIA A10-4Q, pci bus id: 0000:00:07.0, compute capability: 8.6

[9]:

True

Check physical GPUs in TensorFlow

[10]:

tf.config.list_physical_devices('GPU')

2023-04-13 17:18:56.761259: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.761449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.761587: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

[10]:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Check the GPU type of the computer in TensorFlow

[11]:

from tensorflow.python.client import device_lib

local_device_protos = device_lib.list_local_devices()
for x in local_device_protos:
    if x.device_type == 'GPU':
        print(x.physical_device_desc)

device: 0, name: NVIDIA A10-4Q, pci bus id: 0000:00:07.0, compute capability: 8.6

2023-04-13 17:18:56.765979: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.766168: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.766340: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.766507: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.766647: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-13 17:18:56.766791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /device:GPU:0 with 1415 MB memory:  -> device: 0, name: NVIDIA A10-4Q, pci bus id: 0000:00:07.0, compute capability: 8.6

Import Jax and jaxlib to check GPU devices

[12]:

import jax
import jaxlib

Check the version of jax and jaxlib

[13]:

jax.__version__

[13]:

'0.4.8'

[14]:

jaxlib.__version__

[14]:

'0.4.7'

Check all the devices from the default backend of jax

[15]:

jax.devices()

[15]:

[StreamExecutorGpuDevice(id=0, process_index=0, slice_index=0)]

Check the total number of devices of jax

[16]:

jax.device_count()

[16]:

Check the number of JAX processes associated with the backend of jax

[17]:

jax.process_count()

[17]:

Check the backend of jaxlib

[18]:

from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)

gpu

Import SecretFlow to verify that no errors are reported in this environemnt

[19]:

import secretflow