mirror of
https://github.com/0xSojalSec/airllm.git
synced 2026-03-07 22:33:47 +00:00
update README adding troubleshooting
@@ -118,6 +118,55 @@ Most of the parameters in run_dpo_training.sh are the same as Anima's training parameters. There are a few
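
We believe that the model's fundamental reasoning ability and its coding knowledge are better improved through large-scale pre-training. DPO and RLHF are better suited to controlling the model's output or to training domain knowledge.
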
# Troubleshooting

### 1. CUDA library path issues

If training or inference fails with one of the following errors, the cause may be the CUDA library path:

```bash
libbitsandbytes_cpu.so: undefined symbol: cquantize_blockwise_fp16_nf4
```

```bash
ERROR: python: undefined symbol: cudaRuntimeGetVersion
```

```bash
CUDA SETUP: libcudart.so path is None
```

Solution: add the following line to `~/.bashrc`:

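```bash
# Debian/Ubuntu x86-64 library dir; ${VAR:+...} avoids a trailing ":" if LD_LIBRARY_PATH is unset
# (an empty entry makes the dynamic loader also search the current directory).
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```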

Then reload it:

```bash
source ~/.bashrc
```

See also:

https://github.com/TimDettmers/bitsandbytes/issues/85
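
To check whether the `LD_LIBRARY_PATH` fix is needed (or has taken effect), it can help to see where the dynamic linker cache finds `libcudart` and whether `/usr/lib/x86_64-linux-gnu` is already an entry of `LD_LIBRARY_PATH`. This is a minimal sketch, assuming a glibc-based Linux system where `ldconfig -p` lists the linker cache; `on_ld_path` is a hypothetical helper, not part of airllm or bitsandbytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of airllm or bitsandbytes): succeeds if the
# given directory is an entry of LD_LIBRARY_PATH.
on_ld_path() {
  local dir="$1"
  # Wrap both sides in ":" so the match only hits whole entries.
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Show where the dynamic linker cache thinks libcudart lives (assumes glibc;
# ldconfig may live in /sbin, outside a normal user's PATH).
LDCONFIG=$(command -v ldconfig || echo /sbin/ldconfig)
"$LDCONFIG" -p 2>/dev/null | grep libcudart || echo "libcudart not found in the ldconfig cache"

if on_ld_path /usr/lib/x86_64-linux-gnu; then
  echo "/usr/lib/x86_64-linux-gnu is on LD_LIBRARY_PATH"
else
  echo "/usr/lib/x86_64-linux-gnu is NOT on LD_LIBRARY_PATH"
fi
```

If `ldconfig` reports `libcudart` under a different directory, that directory is probably the one to add to `LD_LIBRARY_PATH` instead.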

### 2. CUDA issues

If you encounter the following error:

```bash
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
```
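
it is probably a CUDA driver or toolkit installation problem. The error means a float16 (`Half`) matrix multiplication ran on the CPU, which typically happens when the model is loaded in half precision but PyTorch cannot use the GPU. Check that CUDA is installed correctly, for example by running:
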
```bash
nvidia-smi
```

See also:

https://stackoverflow.com/q/73530569/21230266
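
`nvidia-smi` only confirms the driver; the model can still end up on the CPU if the installed PyTorch build cannot use CUDA. A minimal sketch that checks both layers in order, assuming `python3` on `PATH` is the interpreter used for training/inference; `check_cuda` is a hypothetical helper name, not something this project ships.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of airllm): checks for a working
# NVIDIA driver and a PyTorch build that can use CUDA. Assumes the Python
# environment used for training/inference is the one on PATH as python3.
check_cuda() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: NVIDIA driver may not be installed"
    return 1
  fi
  if ! nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi failed: driver installed but not working"
    return 1
  fi
  # A CPU-only torch build (or no torch at all) also ends up on the CPU path.
  if ! python3 -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)' 2>/dev/null; then
    echo "PyTorch cannot use CUDA (CPU-only build, or torch not installed)"
    return 1
  fi
  echo "CUDA is visible to PyTorch"
}
```

Source the file and run `check_cuda`: it prints the first check that fails and returns 1, or reports success and returns 0.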

# Contributing

Contributions to this project are welcome 🙏