Running Python on Debian 11 produces the following torch error:
- Python 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] on linux
- Type "help", "copyright", "credits" or "license" for more information.
- >>> import torch
- >>> torch.cuda.is_available()
- /root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
- return torch._C._cuda_getDeviceCount() > 0
- False
- >>>
The same error appears when running Stable Diffusion 2.0:
- (sd_GPU) root@debian:/home/LYF/stablediffusion/stablediffusion-main# python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt models/ldm/sd_v2/768model.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
- /root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
- warn(f"Failed to load image Python extension: {e}")
- Global seed set to 42
- Loading model from models/ldm/sd_v2/768model.ckpt
- Global Step: 110000
- No module 'xformers'. Proceeding without it.
- LatentDiffusion: Running in v-prediction mode
- DiffusionWrapper has 865.91 M params.
- making attention of type 'vanilla' with 512 in_channels
- Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
- making attention of type 'vanilla' with 512 in_channels
- Traceback (most recent call last):
- File "/home/LYF/stablediffusion/stablediffusion-main/scripts/txt2img.py", line 289, in <module>
- main(opt)
- File "/home/LYF/stablediffusion/stablediffusion-main/scripts/txt2img.py", line 190, in main
- model = load_model_from_config(config, f"{opt.ckpt}")
- File "/home/LYF/stablediffusion/stablediffusion-main/scripts/txt2img.py", line 43, in load_model_from_config
- model.cuda()
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 127, in cuda
- return super().cuda(device=device)
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 689, in cuda
- return self._apply(lambda t: t.cuda(device))
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
- module._apply(fn)
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
- module._apply(fn)
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
- module._apply(fn)
- [Previous line repeated 1 more time]
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 602, in _apply
- param_applied = fn(param)
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 689, in <lambda>
- return self._apply(lambda t: t.cuda(device))
- File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
- torch._C._cuda_init()
- RuntimeError: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice
-
The GPU is a GTX 980 Ti, the system is Debian 11, and the CUDA version is cuda-11.4:
- (sd_GPU) root@debian:/home/LYF/stablediffusion/stablediffusion-main# nvidia-smi
- Fri Jan 13 10:04:08 2023
- +-----------------------------------------------------------------------------+
- | NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
- |-------------------------------+----------------------+----------------------+
- | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
- | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
- | | | MIG M. |
- |===============================+======================+======================|
- | 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
- | 22% 43C P8 27W / 250W | 1MiB / 6075MiB | 0% Default |
- | | | N/A |
- +-------------------------------+----------------------+----------------------+
-
- +-----------------------------------------------------------------------------+
- | Processes: |
- | GPU GI CI PID Type Process name GPU Memory |
- | ID ID Usage |
- |=============================================================================|
- | No running processes found |
- +-----------------------------------------------------------------------------+
-
-
So what is the problem here? Is it the CUDA version, or the GPU driver? Many thanks!
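This error message reports a HIP initialization failure. HIP is AMD's GPU compute framework, the counterpart of CUDA. The calls to hipGetDeviceCount() and NumHipDevices() are not in your own code: the warning is raised inside PyTorch itself (c10/hip/HIPFunctions.cpp), which means the PyTorch package installed in the sd_GPU environment is a ROCm (AMD) build. Your GTX 980 Ti is an NVIDIA card and nvidia-smi shows the driver working, so the most likely cause is neither the CUDA version nor the driver but the wrong PyTorch build. Uninstall it and install a CUDA build of PyTorch instead.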
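Because HIP belongs to AMD's stack, a quick way to confirm the diagnosis is to ask PyTorch which backend it was built for. The following is a minimal sketch: `classify_torch_build` and `report` are hypothetical helper names introduced here, while `torch.version.cuda` and `torch.version.hip` are the attributes PyTorch exposes (each is `None` when the build does not target that backend).

```python
from typing import Optional


def classify_torch_build(cuda_version: Optional[str], hip_version: Optional[str]) -> str:
    """Classify a PyTorch build from the values of torch.version.cuda / torch.version.hip."""
    if hip_version is not None:
        return "rocm"  # built for AMD GPUs; cannot drive an NVIDIA card
    if cuda_version is not None:
        return "cuda"  # built for NVIDIA GPUs
    return "cpu"       # no GPU backend compiled in


def report() -> str:
    """Describe the torch build in the current environment (hypothetical helper)."""
    try:
        import torch  # imported lazily so the classifier works without torch installed
    except ImportError:
        return "torch is not installed in this environment"
    build = classify_torch_build(
        torch.version.cuda, getattr(torch.version, "hip", None)
    )
    return f"torch {torch.__version__}: {build} build"
```

Running `print(report())` inside the sd_GPU environment should say whether the installed package is a ROCm, CUDA, or CPU-only build; anything other than a CUDA build will not see the GTX 980 Ti.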
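In addition, try updating PyTorch and pytorch-lightning. Rather than relying on a fixed version number, pick the current PyTorch release and its CUDA variant from the install selector on pytorch.org, and install the torchvision release that matches it; the `undefined symbol` warning from torchvision in your log usually indicates that torch and torchvision come from mismatched versions. pytorch-lightning can be installed or upgraded with pip: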
- pip install -U pytorch-lightning
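If other versions of PyTorch are also installed on your machine, you can use conda or virtualenv to create an isolated Python environment so that the different versions do not conflict.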
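To see which versions the active environment actually resolves, the standard library's `importlib.metadata` can be queried without importing the packages themselves. This is only a sketch; `installed_versions` and `format_report` are hypothetical helper names, and the package list in the usage note is an assumption based on the log above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def format_report(versions: Dict[str, Optional[str]]) -> str:
    """Render one 'name: version' line per package."""
    return "\n".join(
        f"{name}: {ver or 'not installed'}" for name, ver in versions.items()
    )
```

For example, `print(format_report(installed_versions(["torch", "torchvision", "pytorch-lightning"])))` run inside the sd_GPU environment lists the three versions side by side, which makes a torch/torchvision mismatch easier to spot.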