diff --git a/README.md b/README.md
index 1e23904..eb2dc45 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-![airllm_logo](https://github.com/lyogavin/Anima/blob/main/assets/airllm_logo_sm.png?v=3&raw=true)
+![airllm_logo](https://github.com/lyogavin/airllm/blob/main/assets/airllm_logo_sm.png?v=3&raw=true)
 
 [**Quickstart**](#quickstart) |
 [**Configurations**](#configurations) |
diff --git a/air_llm/README.md b/air_llm/README.md
index 71fb2f9..eb2dc45 100644
--- a/air_llm/README.md
+++ b/air_llm/README.md
@@ -1,4 +1,4 @@
-![airllm_logo](https://github.com/lyogavin/Anima/blob/main/assets/airllm_logo_sm.png?v=3&raw=true)
+![airllm_logo](https://github.com/lyogavin/airllm/blob/main/assets/airllm_logo_sm.png?v=3&raw=true)
 
 [**Quickstart**](#quickstart) |
 [**Configurations**](#configurations) |
@@ -8,8 +8,6 @@
 
 **AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed.
 
-AirLLM优化inference内存,4GB单卡GPU可以运行70B大语言模型推理。不需要任何损失模型性能的量化和蒸馏,剪枝等模型压缩。
-
 ![GitHub Repo stars](https://img.shields.io/github/stars/lyogavin/Anima?style=social)
 [![Downloads](https://static.pepy.tech/personalized-badge/airllm?period=total&units=international_system&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/airllm)
@@ -28,36 +26,22 @@ AirLLM优化inference内存,4GB单卡GPU可以运行70B大语言模型推理
 [2024/04/20] AirLLM supports Llama3 natively already. Run Llama3 70B on 4GB single GPU.
 
-AirLLM天然支持Llama3 70B。4GB显存运行Llama3 70B大模型。
-
 [2023/12/25] v2.8.2: Support MacOS running 70B large language models.
 
-支持苹果系统运行70B大模型!
-
 [2023/12/20] v2.7: Support AirLLMMixtral.
 
 [2023/12/20] v2.6: Added AutoModel, automatically detect model type, no need to provide model class to initialize model.
-提供AuoModel,自动根据repo参数检测模型类型,自动初始化模型。
-
 [2023/12/18] v2.5: added prefetching to overlap the model loading and compute. 10% speed improvement.
 
 [2023/12/03] added support of **ChatGLM**, **QWen**, **Baichuan**, **Mistral**, **InternLM**!
 
-支持ChatGLM, QWEN, Baichuan, Mistral, InternLM!
-
 [2023/12/02] added support for safetensors. Now support all top 10 models in open llm leaderboard.
 
-支持safetensor系列模型,现在open llm leaderboard前10的模型都已经支持。
-
 [2023/12/01] airllm 2.0. Support compressions: **3x run time speed up!**
 
-airllm2.0。支持模型压缩,速度提升3倍。
-
 [2023/11/20] airllm Initial verion!
 
-airllm发布。
-
 ## Table of Contents
 
 * [Quick start](#quickstart)
@@ -75,13 +59,10 @@ airllm发布。
 
 First, install the airllm pip package.
 
-首先安装airllm包。
-
 ```bash
 pip install airllm
 ```
 
-如果找不到package,可能是因为默认的镜像问题。可以尝试制定原始镜像:
 ```bash
 pip install -i https://pypi.org/simple/ airllm
 ```
@@ -90,12 +71,8 @@ pip install -i https://pypi.org/simple/ airllm
 
 Then, initialize AirLLMLlama2, pass in the huggingface repo ID of the model being used, or the local path, and inference can be performed similar to a regular transformer model.
 
-然后,初始化AirLLMLlama2,传入所使用模型的huggingface repo ID,或者本地路径即可类似于普通的transformer模型进行推理。
-
 (*You can also specify the path to save the splitted layered model through **layer_shards_saving_path** when init AirLLMLlama2.*
-*如果需要指定另外的路径来存储分层的模型可以在初始化AirLLMLlama2是传入参数:**layer_shards_saving_path**。*)
-
 ```python
 from airllm import AutoModel
@@ -133,15 +110,11 @@ print(output)
 
 Note: During inference, the original model will first be decomposed and saved layer-wise. Please ensure there is sufficient disk space in the huggingface cache directory.
 
-注意:推理过程会首先将原始模型按层分拆,转存。请保证huggingface cache目录有足够的磁盘空间。
-
 ## Model Compression - 3x Inference Speed Up!
 
 We just added model compression based on block-wise quantization-based model compression.
 Which can further **speed up the inference speed** for up to **3x** , with **almost ignorable accuracy loss!** (see more performance evaluation and why we use block-wise quantization in [this paper](https://arxiv.org/abs/2212.09720))
 
-我们增加了基于block-wise quantization的模型压缩,推理速度提升3倍几乎没有精度损失。精度评测可以参考此paper:[this paper](https://arxiv.org/abs/2212.09720)
-
 ![speed_improvement](https://github.com/lyogavin/Anima/blob/main/assets/airllm2_time_improvement.png?v=2&raw=true)
 
 #### How to enable model compression speed up:
@@ -166,8 +139,6 @@ While in our case the bottleneck is mainly at the disk loading, we only need to
 
 When initialize the model, we support the following configurations:
 
-初始化model的时候,可以指定以下的配置参数:
-
 * **compression**: supported options: 4bit, 8bit for 4-bit or 8-bit block-wise quantization, or by default None for no compression
 * **profiling_mode**: supported options: True to output time consumptions or by default False
 * **layer_shards_saving_path**: optionally another path to save the splitted model
@@ -194,51 +165,6 @@ Example colabs here:
 
 Open In Colab
 
-## Supported Models
-
-#### [HF open llm leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) top models
-
-
-**Including but not limited to the following:** (Most of the open models are based on llama2, so should be supported by default)
-
-@12/01/23
-
-| Rank | Model | Supported | Model Class |
-| ------------- | ------------- | ------------- | ------------- |
-| 1 | TigerResearch/tigerbot-70b-chat-v2 | ✅ | AirLLMLlama2 |
-| 2 | upstage/SOLAR-0-70b-16bit | ✅ | AirLLMLlama2 |
-| 3 | ICBU-NPU/FashionGPT-70B-V1.1 | ✅ | AirLLMLlama2 |
-| 4 | sequelbox/StellarBright | ✅ | AirLLMLlama2 |
-| 5 | bhenrym14/platypus-yi-34b | ✅ | AirLLMLlama2 |
-| 6 | MayaPH/GodziLLa2-70B | ✅ | AirLLMLlama2 |
-| 7 | 01-ai/Yi-34B | ✅ | AirLLMLlama2 |
-| 8 | garage-bAInd/Platypus2-70B-instruct | ✅ | AirLLMLlama2 |
-| 9 | jondurbin/airoboros-l2-70b-2.2.1 | ✅ | AirLLMLlama2 |
-| 10 | chargoddard/Yi-34B-Llama | ✅ | AirLLMLlama2 |
-| ? | mistralai/Mistral-7B-Instruct-v0.1 | ✅ | AirLLMMistral |
-| ? | mistralai/Mixtral-8x7B-v0.1 | ✅ | AirLLMMixtral |
-
-
-#### [opencompass leaderboard](https://opencompass.org.cn/leaderboard-llm) top models
-
-**Including but not limited to the following:** (Most of the open models are based on llama2, so should be supported by default)
-
-@12/01/23
-
-| Rank | Model | Supported | Model Class |
-| ------------- | ------------- | ------------- | ------------- |
-| 1 | GPT-4 | closed.ai😓 | N/A |
-| 2 | TigerResearch/tigerbot-70b-chat-v2 | ✅ | AirLLMLlama2 |
-| 3 | THUDM/chatglm3-6b-base | ✅ | AirLLMChatGLM |
-| 4 | Qwen/Qwen-14B | ✅| AirLLMQWen |
-| 5 | 01-ai/Yi-34B | ✅ | AirLLMLlama2 |
-| 6 | ChatGPT | closed.ai😓 | N/A |
-| 7 | OrionStarAI/OrionStar-Yi-34B-Chat | ✅ | AirLLMLlama2 |
-| 8 | Qwen/Qwen-14B-Chat | ✅ | AirLLMQWen |
-| 9 | Duxiaoman-DI/XuanYuan-70B | ✅ | AirLLMLlama2 |
-| 10 | internlm/internlm-20b | ✅ | AirLLMInternLM |
-| 26 | baichuan-inc/Baichuan2-13B-Chat | ✅ | AirLLMBaichuan |
-
 #### example of other models (ChatGLM, QWen, Baichuan, Mistral, etc):
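[Editorial note outside the diff: the tables removed above map each repo to an AirLLM model class, while the v2.6 changelog entry says AutoModel now detects the class automatically. As a purely illustrative toy sketch of that idea — `detect_model_class` is a hypothetical helper, not AirLLM's actual detection code — the mapping could look like:]

```python
# Toy illustration of AutoModel-style class detection (NOT AirLLM's real code):
# guess the wrapper class named in the removed tables from a Hugging Face repo ID.

def detect_model_class(repo_id: str) -> str:
    """Map a repo ID to an AirLLM class name (hypothetical helper)."""
    name = repo_id.lower()
    # check "mixtral" before "mistral": "mistralai/Mixtral-..." matches both
    rules = [
        ("chatglm", "AirLLMChatGLM"),
        ("qwen", "AirLLMQWen"),
        ("baichuan", "AirLLMBaichuan"),
        ("internlm", "AirLLMInternLM"),
        ("mixtral", "AirLLMMixtral"),
        ("mistral", "AirLLMMistral"),
    ]
    for needle, cls in rules:
        if needle in name:
            return cls
    # most open models are llama2-based, so default to the Llama2 wrapper
    return "AirLLMLlama2"

print(detect_model_class("Qwen/Qwen-14B"))   # AirLLMQWen
print(detect_model_class("01-ai/Yi-34B"))    # AirLLMLlama2
```

Substring matching on repo IDs is only a stand-in for whatever config-based detection the library actually performs.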
@@ -333,8 +259,6 @@ safetensors_rust.SafetensorError: Error while deserializing header: MetadataInco
 
 If you run into this error, most possible cause is you run out of disk space. The process of splitting model is very disk-consuming. See [this](https://huggingface.co/TheBloke/guanaco-65B-GPTQ/discussions/12). You may need to extend your disk space, clear huggingface [.cache](https://huggingface.co/docs/datasets/cache) and rerun.
 
-如果你碰到这个error,很有可能是空间不足。可以参考一下[这个](https://huggingface.co/TheBloke/guanaco-65B-GPTQ/discussions/12) 可能需要扩大硬盘空间,删除huggingface的[.cache](https://huggingface.co/docs/datasets/cache),然后重新run。
-
 ### 2. ValueError: max() arg is an empty sequence
 
 Most likely you are loading QWen or ChatGLM model with Llama2 class. Try the following:
@@ -392,7 +316,6 @@ BibTex entry:
 ```
 
-
 ## Contribution
 
 Welcomed contributions, ideas and discussions!
 
diff --git a/air_llm/setup.py b/air_llm/setup.py
index fdffe05..7f26515 100644
--- a/air_llm/setup.py
+++ b/air_llm/setup.py
@@ -5,13 +5,13 @@ with open("README.md", "r") as fh:
 
 setuptools.setup(
     name="airllm",
-    version="2.8.3",
+    version="2.8.6",
     author="Gavin Li",
     author_email="gavinli@animaai.cloud",
     description="AirLLM allows single 4GB GPU card to run 70B large language models without quantization, distillation or pruning.",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="https://github.com/lyogavin/Anima/tree/main/air_llm",
+    url="https://github.com/lyogavin/airllm",
     packages=setuptools.find_packages(),
     install_requires=[
         'tqdm',
diff --git a/assets/airllm_logo.png b/assets/airllm_logo.png
index 5693eba..a5f7196 100644
Binary files a/assets/airllm_logo.png and b/assets/airllm_logo.png differ
diff --git a/assets/airllm_logo_sm.png b/assets/airllm_logo_sm.png
index e2dfe04..1239a28 100644
Binary files a/assets/airllm_logo_sm.png and b/assets/airllm_logo_sm.png differ
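[Editorial note outside the diff: the README text patched above keeps the note that the original model is first decomposed and saved layer-wise, so that only one layer needs to be resident in memory at a time. As a toy sketch of that idea only — the pickle shard format and the `save_layer_shards` / `run_sharded` helpers are invented for illustration and are not AirLLM's implementation — the load-compute-discard loop might look like:]

```python
import os
import pickle
import tempfile

# Toy sketch of layer-sharded inference (NOT AirLLM's real code): persist each
# "layer" to its own file, then run inference holding one layer in memory at a time.

def save_layer_shards(layers, shard_dir):
    """Write one pickle file per layer, mimicking the split-and-save step."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(shard_dir, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(layer, f)
        paths.append(path)
    return paths

def run_sharded(paths, x):
    """Apply each layer in order, loading it from disk and discarding it after use."""
    for path in paths:
        with open(path, "rb") as f:
            layer = pickle.load(f)   # only this layer is resident now
        x = [x_i * layer["scale"] + layer["bias"] for x_i in x]
        del layer                    # free it before loading the next shard
    return x

with tempfile.TemporaryDirectory() as d:
    # two toy "layers": peak memory is one layer, not the whole stack
    layers = [{"scale": 2.0, "bias": 1.0}, {"scale": 0.5, "bias": 0.0}]
    paths = save_layer_shards(layers, d)
    print(run_sharded(paths, [1.0, 2.0]))  # [1.5, 2.5]
```

This is why the patched README warns about disk space: the split-and-save step trades disk usage (plus per-layer load time, which prefetching and compression then mitigate) for a much smaller GPU memory footprint.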