mirror of
https://github.com/0xSojalSec/airllm.git
synced 2026-03-07 22:33:47 +00:00
update readme

[**Quickstart**](#quickstart) | [**Configurations**](#configurations)

**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed.
<a href="https://github.com/lyogavin/Anima/stargazers"></a>
[](https://pepy.tech/project/airllm)
[2024/04/20] AirLLM already supports Llama3 natively. Run Llama3 70B on a single 4GB GPU.
[2023/12/25] v2.8.2: Support running 70B large language models on macOS.
[2023/12/20] v2.7: Support AirLLMMixtral.
[2023/12/20] v2.6: Added AutoModel, which automatically detects the model type; no need to provide a model class to initialize the model.
[2023/12/18] v2.5: Added prefetching to overlap model loading and compute, for a 10% speed improvement.
[2023/12/03] Added support for **ChatGLM**, **QWen**, **Baichuan**, **Mistral**, **InternLM**!
[2023/12/02] Added support for safetensors. All top-10 models on the open LLM leaderboard are now supported.
[2023/12/01] airllm 2.0. Supports compression: **3x runtime speed-up!**
[2023/11/20] airllm initial version!
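The prefetching added in v2.5 overlaps loading the next layer from disk with computing the current one. A minimal sketch of the idea (illustrative only, not airllm's actual implementation; `load` and `compute` are stand-ins for disk I/O and GPU work):

```python
import threading
import queue

def run_layers(layers, load, compute):
    """Pipeline: a background thread loads layer k+1 while layer k computes."""
    q = queue.Queue(maxsize=1)  # bounded: at most one prefetched layer in flight

    def loader():
        for layer in layers:
            q.put(load(layer))   # disk-bound work happens here
        q.put(None)              # sentinel: no more layers

    threading.Thread(target=loader, daemon=True).start()

    results = []
    weights = q.get()
    while weights is not None:
        results.append(compute(weights))  # compute overlaps the next load
        weights = q.get()
    return results

# toy stand-ins for disk load and GPU compute
print(run_layers([1, 2, 3], load=lambda l: l * 10, compute=lambda w: w + 1))
```

The bounded queue is what gives the overlap without unbounded memory use: only one layer's weights are ever prefetched ahead.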
## Table of Contents
* [Quick start](#quickstart)
First, install the airllm pip package.
```bash
pip install airllm
```
If the package cannot be found, it may be due to pip's default index mirror. Try specifying the original index:

```bash
pip install -i https://pypi.org/simple/ airllm
```
Then, initialize AirLLMLlama2, passing in the huggingface repo ID of the model or its local path, and inference can be performed similarly to a regular transformer model.
(*You can also specify the path for saving the split, layered model via **layer_shards_saving_path** when initializing AirLLMLlama2.*)

```python
from airllm import AutoModel

# any huggingface repo ID or a local path works here
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ['What is the capital of United States?']
input_tokens = model.tokenizer(input_text, return_tensors="pt")
generation_output = model.generate(input_tokens['input_ids'].cuda(),
                                   max_new_tokens=20,
                                   return_dict_in_generate=True)
output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
```
Note: During inference, the original model will first be decomposed and saved layer-wise. Please ensure there is sufficient disk space in the huggingface cache directory.
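As a rough back-of-the-envelope check (our own estimate, not an official figure), the layer-wise copy needs about one full set of the weights on disk:

```python
def layered_copy_gib(n_params, bytes_per_param=2):
    """Rough disk needed for the layer-wise split: about one full copy of the weights."""
    return n_params * bytes_per_param / 2**30

# a 70B-parameter model stored in fp16 (2 bytes per parameter)
print(f"{layered_copy_gib(70e9):.0f} GiB")  # → 130 GiB
```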
## Model Compression - 3x Inference Speed Up!
We just added block-wise quantization-based model compression, which can further **speed up inference** by up to **3x**, with **almost negligible accuracy loss!** (See [this paper](https://arxiv.org/abs/2212.09720) for more performance evaluation and why we use block-wise quantization.)

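The core idea of block-wise quantization can be illustrated in a few lines of numpy (a conceptual sketch only; the real implementation follows the linked paper, not this code):

```python
import numpy as np

def quantize_blockwise(x, block_size=64):
    """Map a float32 array to int8 with one scale per block of values."""
    pad = (-x.size) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                      # guard all-zero blocks
    q = np.round(blocks / scales).astype(np.int8)  # 4x smaller than float32
    return q, scales

def dequantize_blockwise(q, scales, n):
    return (q.astype(np.float32) * scales).reshape(-1)[:n]

x = np.random.randn(10_000).astype(np.float32)
q, scales = quantize_blockwise(x)
max_err = np.abs(dequantize_blockwise(q, scales, x.size) - x).max()
```

A per-block scale keeps the rounding error proportional to each block's own magnitude, which is why outliers in one block don't degrade the rest of the tensor.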
#### How to enable model compression speed-up:
When initializing the model, the following configurations are supported:
* **compression**: supported options: 4bit, 8bit for 4-bit or 8-bit block-wise quantization; defaults to None (no compression)
* **profiling_mode**: set to True to output time consumption; defaults to False
* **layer_shards_saving_path**: optionally, another path to save the split model
Example colabs here:
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
## Supported Models
#### [HF open llm leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) top models
**Including but not limited to the following:** (most of the open models are based on llama2, so they should be supported by default)
@12/01/23
| Rank | Model | Supported | Model Class |
| ------------- | ------------- | ------------- | ------------- |
| 1 | TigerResearch/tigerbot-70b-chat-v2 | ✅ | AirLLMLlama2 |
| 2 | upstage/SOLAR-0-70b-16bit | ✅ | AirLLMLlama2 |
| 3 | ICBU-NPU/FashionGPT-70B-V1.1 | ✅ | AirLLMLlama2 |
| 4 | sequelbox/StellarBright | ✅ | AirLLMLlama2 |
| 5 | bhenrym14/platypus-yi-34b | ✅ | AirLLMLlama2 |
| 6 | MayaPH/GodziLLa2-70B | ✅ | AirLLMLlama2 |
| 7 | 01-ai/Yi-34B | ✅ | AirLLMLlama2 |
| 8 | garage-bAInd/Platypus2-70B-instruct | ✅ | AirLLMLlama2 |
| 9 | jondurbin/airoboros-l2-70b-2.2.1 | ✅ | AirLLMLlama2 |
| 10 | chargoddard/Yi-34B-Llama | ✅ | AirLLMLlama2 |
| ? | mistralai/Mistral-7B-Instruct-v0.1 | ✅ | AirLLMMistral |
| ? | mistralai/Mixtral-8x7B-v0.1 | ✅ | AirLLMMixtral |
#### [opencompass leaderboard](https://opencompass.org.cn/leaderboard-llm) top models
**Including but not limited to the following:** (most of the open models are based on llama2, so they should be supported by default)
@12/01/23
| Rank | Model | Supported | Model Class |
| ------------- | ------------- | ------------- | ------------- |
| 1 | GPT-4 | closed.ai😓 | N/A |
| 2 | TigerResearch/tigerbot-70b-chat-v2 | ✅ | AirLLMLlama2 |
| 3 | THUDM/chatglm3-6b-base | ✅ | AirLLMChatGLM |
| 4 | Qwen/Qwen-14B | ✅ | AirLLMQWen |
| 5 | 01-ai/Yi-34B | ✅ | AirLLMLlama2 |
| 6 | ChatGPT | closed.ai😓 | N/A |
| 7 | OrionStarAI/OrionStar-Yi-34B-Chat | ✅ | AirLLMLlama2 |
| 8 | Qwen/Qwen-14B-Chat | ✅ | AirLLMQWen |
| 9 | Duxiaoman-DI/XuanYuan-70B | ✅ | AirLLMLlama2 |
| 10 | internlm/internlm-20b | ✅ | AirLLMInternLM |
| 26 | baichuan-inc/Baichuan2-13B-Chat | ✅ | AirLLMBaichuan |

#### Example of other models (ChatGLM, QWen, Baichuan, Mistral, etc.):
<details>
`safetensors_rust.SafetensorError: Error while deserializing header: MetadataInco`
If you run into this error, the most likely cause is that you have run out of disk space. The process of splitting the model is very disk-consuming. See [this](https://huggingface.co/TheBloke/guanaco-65B-GPTQ/discussions/12). You may need to extend your disk space, clear the huggingface [.cache](https://huggingface.co/docs/datasets/cache) and rerun.
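To see whether disk space is actually the problem, a quick check (assuming the default huggingface cache location `~/.cache/huggingface`):

```python
import shutil
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface"   # default HF cache dir
cache_bytes = (sum(f.stat().st_size for f in cache.rglob("*") if f.is_file())
               if cache.exists() else 0)

total, used, free = shutil.disk_usage(Path.home())
print(f"free disk: {free / 2**30:.1f} GiB, huggingface cache: {cache_bytes / 2**30:.1f} GiB")
```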
### 2. ValueError: max() arg is an empty sequence
Most likely you are loading a QWen or ChatGLM model with the Llama2 class. Try the following:
## Contribution
Contributions, ideas and discussions are welcome!
with open("README.md", "r") as fh:

setuptools.setup(
    name="airllm",
    version="2.8.6",
    author="Gavin Li",
    author_email="gavinli@animaai.cloud",
    description="AirLLM allows single 4GB GPU card to run 70B large language models without quantization, distillation or pruning.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/lyogavin/airllm",
    packages=setuptools.find_packages(),
    install_requires=[
        'tqdm',