From e4f918d8e95d24af154ccb08ea789f855baaf5d5 Mon Sep 17 00:00:00 2001
From: Yu Li
Date: Tue, 25 Jul 2023 11:45:56 +0800
Subject: [PATCH] update README adding troubleshooting

---
 rlhf/README.md | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/rlhf/README.md b/rlhf/README.md
index 60b6d8b..288892c 100644
--- a/rlhf/README.md
+++ b/rlhf/README.md
@@ -118,6 +118,55 @@ Most of the parameters in run_dpo_training.sh are the same as Anima's training parameters. There are a few
 
 We believe that improving the model's basic reasoning ability and its store of foundational coding knowledge is better done through large-scale pretraining, while DPO and RLHF are better suited to controlling model output or training on domain knowledge.
 
+# Troubleshooting
+
+### 1. CUDA library path issues
+
+If you hit any of the following errors during training or inference, the CUDA library path is probably misconfigured:
+
+
+```bash
+libbitsandbytes_cpu.so: undefined symbol: cquantize_blockwise_fp16_nf4
+```
+
+```bash
+ERROR: python: undefined symbol: cudaRuntimeGetVersion
+```
+
+```bash
+CUDA SETUP: libcudart.so path is None
+```
+Solution:
+Add the following line to ~/.bashrc:
+
+```bash
+export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
+```
+
+Then reload it:
+
+```bash
+source ~/.bashrc
+```
+
+For reference, see:
+https://github.com/TimDettmers/bitsandbytes/issues/85
+
+### 2. CUDA installation issues
+If you hit the following error:
+
+```bash
+RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
+```
+This error means a half-precision op ran on the CPU, which usually points to a problem with the CUDA driver or toolkit installation. Run the following command to check whether CUDA is installed correctly:
+
+```bash
+nvidia-smi
+```
+
+For reference, see:
+https://stackoverflow.com/q/73530569/21230266
+
 # Contributing
 
 Contributions to this project are welcome 🙏
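
Before applying the `LD_LIBRARY_PATH` fix from item 1 of the Troubleshooting section, it can help to confirm which entries on the path actually contain a CUDA runtime library. The following is a minimal sketch, not part of the patch: the helper name `find_libcudart_dirs` is hypothetical, and it assumes the runtime is installed under a file name starting with `libcudart.so`. It uses only the Python standard library.

```python
import glob
import os


def find_libcudart_dirs(ld_library_path):
    """Return the directories in a path list that contain a libcudart.so* file.

    `ld_library_path` is a string in LD_LIBRARY_PATH format, i.e. directories
    joined by os.pathsep (":" on Linux). Hypothetical helper for illustration.
    """
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            # Skip empty entries such as those produced by a trailing ":".
            continue
        # Escape the directory so special characters in it are not treated as glob patterns.
        pattern = os.path.join(glob.escape(directory), "libcudart.so*")
        if glob.glob(pattern):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    dirs = find_libcudart_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    if dirs:
        print("libcudart found in:", dirs)
    else:
        print("no LD_LIBRARY_PATH entry contains libcudart.so*")
```

If the script reports no matching entry, the `export LD_LIBRARY_PATH=...` line from the Troubleshooting section is a likely fix, provided the directory it adds really holds the library on your machine.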