diff --git a/README.md b/README.md
index 1e23904..eb2dc45 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-
+
[**Quickstart**](#quickstart) |
[**Configurations**](#configurations) |
diff --git a/air_llm/README.md b/air_llm/README.md
index 71fb2f9..eb2dc45 100644
--- a/air_llm/README.md
+++ b/air_llm/README.md
@@ -1,4 +1,4 @@
-
+
[**Quickstart**](#quickstart) |
[**Configurations**](#configurations) |
@@ -8,8 +8,6 @@
**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed.
-AirLLM优化inference内存,4GB单卡GPU可以运行70B大语言模型推理。不需要任何损失模型性能的量化和蒸馏,剪枝等模型压缩。
-

[](https://pepy.tech/project/airllm)
@@ -28,36 +26,22 @@ AirLLM优化inference内存,4GB单卡GPU可以运行70B大语言模型推理
[2024/04/20] AirLLM now supports Llama3 natively. Run Llama3 70B on a single 4GB GPU.
-AirLLM天然支持Llama3 70B。4GB显存运行Llama3 70B大模型。
-
[2023/12/25] v2.8.2: Support running 70B large language models on macOS.
-支持苹果系统运行70B大模型!
-
[2023/12/20] v2.7: Support AirLLMMixtral.
[2023/12/20] v2.6: Added AutoModel, which automatically detects the model type; no need to provide a model class to initialize the model.
-提供AuoModel,自动根据repo参数检测模型类型,自动初始化模型。
-
[2023/12/18] v2.5: Added prefetching to overlap model loading and compute, for a 10% speed improvement.
[2023/12/03] added support of **ChatGLM**, **QWen**, **Baichuan**, **Mistral**, **InternLM**!
-支持ChatGLM, QWEN, Baichuan, Mistral, InternLM!
-
[2023/12/02] Added support for safetensors. Now supports all top 10 models on the open LLM leaderboard.
-支持safetensor系列模型,现在open llm leaderboard前10的模型都已经支持。
-
[2023/12/01] airllm 2.0. Supports compression: **3x runtime speed up!**
-airllm2.0。支持模型压缩,速度提升3倍。
-
[2023/11/20] Initial version of airllm!
-airllm发布。
-
## Table of Contents
* [Quick start](#quickstart)
@@ -75,13 +59,10 @@ airllm发布。
First, install the airllm pip package.
-首先安装airllm包。
-
```bash
pip install airllm
```
-如果找不到package,可能是因为默认的镜像问题。可以尝试制定原始镜像:
+If the package cannot be found, it may be a problem with your default pip index mirror; try the official PyPI index:
```bash
pip install -i https://pypi.org/simple/ airllm
```
@@ -90,12 +71,8 @@ pip install -i https://pypi.org/simple/ airllm
Then, initialize AirLLMLlama2, pass in the huggingface repo ID of the model being used, or the local path, and inference can be performed similarly to a regular transformer model.
-然后,初始化AirLLMLlama2,传入所使用模型的huggingface repo ID,或者本地路径即可类似于普通的transformer模型进行推理。
-
(*You can also specify the path to save the split layered model through **layer_shards_saving_path** when initializing AirLLMLlama2.*)
-*如果需要指定另外的路径来存储分层的模型可以在初始化AirLLMLlama2是传入参数:**layer_shards_saving_path**。*)
-
```python
from airllm import AutoModel
@@ -133,15 +110,11 @@ print(output)
Note: During inference, the original model will first be decomposed and saved layer-wise. Please ensure there is sufficient disk space in the huggingface cache directory.
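
The layer-wise scheme this note refers to can be sketched in plain Python (a toy illustration only, not airllm's actual implementation — the file layout, names, and "layers" here are all made up): the model is persisted one layer per file, and at inference time each layer is loaded, applied, and freed before the next is read, so peak memory is one layer rather than the whole model.

```python
import json
import os
import tempfile

# Toy sketch of layer-wise inference: only one layer's weights are
# resident in memory at any time.

def save_layers(layers, directory):
    """Persist each layer to its own file (stand-in for layer shards)."""
    for i, layer in enumerate(layers):
        with open(os.path.join(directory, f"layer_{i}.json"), "w") as f:
            json.dump(layer, f)
    return len(layers)

def run_layerwise(x, directory, num_layers):
    """Run the layers in sequence, loading each from disk on demand."""
    for i in range(num_layers):
        with open(os.path.join(directory, f"layer_{i}.json")) as f:
            w = json.load(f)                 # load just this layer's weights
        x = [w["scale"] * v + w["bias"] for v in x]  # apply the layer
        del w                                # free it before loading the next
    return x

with tempfile.TemporaryDirectory() as d:
    n = save_layers([{"scale": 2.0, "bias": 1.0},
                     {"scale": 0.5, "bias": 0.0}], d)
    print(run_layerwise([1.0, 2.0], d, n))   # → [1.5, 2.5]
```

The trade-off is the one the note warns about: disk space (and disk bandwidth) replaces GPU memory as the limiting resource.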
-注意:推理过程会首先将原始模型按层分拆,转存。请保证huggingface cache目录有足够的磁盘空间。
-
## Model Compression - 3x Inference Speed Up!
We just added model compression based on block-wise quantization, which can further **speed up inference** by up to **3x**, with **almost negligible accuracy loss!** (see more performance evaluation and why we use block-wise quantization in [this paper](https://arxiv.org/abs/2212.09720))
-我们增加了基于block-wise quantization的模型压缩,推理速度提升3倍几乎没有精度损失。精度评测可以参考此paper:[this paper](https://arxiv.org/abs/2212.09720)
-

#### How to enable the model compression speed-up:
@@ -166,8 +139,6 @@ While in our case the bottleneck is mainly at the disk loading, we only need to
When initializing the model, we support the following configurations:
-初始化model的时候,可以指定以下的配置参数:
-
* **compression**: supported options: 4bit, 8bit for 4-bit or 8-bit block-wise quantization, or by default None for no compression
* **profiling_mode**: supported options: True to output time consumptions or by default False
* **layer_shards_saving_path**: optionally another path to save the split model
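
Putting the options above together, initialization looks roughly like this (a sketch: the repo ID and the shard path are placeholders, and the keyword names follow the list above — check your installed version for the exact signature):

```python
from airllm import AutoModel

model = AutoModel.from_pretrained(
    "garage-bAInd/Platypus2-70B-instruct",  # any supported HF repo ID, or a local path
    compression='4bit',                      # or '8bit'; omit (None) for no compression
    profiling_mode=False,                    # True prints time consumption per stage
    layer_shards_saving_path='/data/airllm_shards',  # placeholder: where split layers go
)
```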
@@ -194,51 +165,6 @@ Example colabs here:
-## Supported Models
-
-#### [HF open llm leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) top models
-
-
-**Including but not limited to the following:** (Most of the open models are based on llama2, so should be supported by default)
-
-@12/01/23
-
-| Rank | Model | Supported | Model Class |
-| ------------- | ------------- | ------------- | ------------- |
-| 1 | TigerResearch/tigerbot-70b-chat-v2 | ✅ | AirLLMLlama2 |
-| 2 | upstage/SOLAR-0-70b-16bit | ✅ | AirLLMLlama2 |
-| 3 | ICBU-NPU/FashionGPT-70B-V1.1 | ✅ | AirLLMLlama2 |
-| 4 | sequelbox/StellarBright | ✅ | AirLLMLlama2 |
-| 5 | bhenrym14/platypus-yi-34b | ✅ | AirLLMLlama2 |
-| 6 | MayaPH/GodziLLa2-70B | ✅ | AirLLMLlama2 |
-| 7 | 01-ai/Yi-34B | ✅ | AirLLMLlama2 |
-| 8 | garage-bAInd/Platypus2-70B-instruct | ✅ | AirLLMLlama2 |
-| 9 | jondurbin/airoboros-l2-70b-2.2.1 | ✅ | AirLLMLlama2 |
-| 10 | chargoddard/Yi-34B-Llama | ✅ | AirLLMLlama2 |
-| ? | mistralai/Mistral-7B-Instruct-v0.1 | ✅ | AirLLMMistral |
-| ? | mistralai/Mixtral-8x7B-v0.1 | ✅ | AirLLMMixtral |
-
-
-#### [opencompass leaderboard](https://opencompass.org.cn/leaderboard-llm) top models
-
-**Including but not limited to the following:** (Most of the open models are based on llama2, so should be supported by default)
-
-@12/01/23
-
-| Rank | Model | Supported | Model Class |
-| ------------- | ------------- | ------------- | ------------- |
-| 1 | GPT-4 | closed.ai😓 | N/A |
-| 2 | TigerResearch/tigerbot-70b-chat-v2 | ✅ | AirLLMLlama2 |
-| 3 | THUDM/chatglm3-6b-base | ✅ | AirLLMChatGLM |
-| 4 | Qwen/Qwen-14B | ✅| AirLLMQWen |
-| 5 | 01-ai/Yi-34B | ✅ | AirLLMLlama2 |
-| 6 | ChatGPT | closed.ai😓 | N/A |
-| 7 | OrionStarAI/OrionStar-Yi-34B-Chat | ✅ | AirLLMLlama2 |
-| 8 | Qwen/Qwen-14B-Chat | ✅ | AirLLMQWen |
-| 9 | Duxiaoman-DI/XuanYuan-70B | ✅ | AirLLMLlama2 |
-| 10 | internlm/internlm-20b | ✅ | AirLLMInternLM |
-| 26 | baichuan-inc/Baichuan2-13B-Chat | ✅ | AirLLMBaichuan |
-
#### Example of other models (ChatGLM, QWen, Baichuan, Mistral, etc):
@@ -333,8 +259,6 @@ safetensors_rust.SafetensorError: Error while deserializing header: MetadataInco
If you run into this error, the most likely cause is that you have run out of disk space. Splitting the model consumes a lot of disk space. See [this](https://huggingface.co/TheBloke/guanaco-65B-GPTQ/discussions/12). You may need to extend your disk space, clear the huggingface [.cache](https://huggingface.co/docs/datasets/cache), and rerun.
-如果你碰到这个error,很有可能是空间不足。可以参考一下[这个](https://huggingface.co/TheBloke/guanaco-65B-GPTQ/discussions/12) 可能需要扩大硬盘空间,删除huggingface的[.cache](https://huggingface.co/docs/datasets/cache),然后重新run。
-
### 2. ValueError: max() arg is an empty sequence
Most likely you are loading a QWen or ChatGLM model with the Llama2 class. Try the following:
@@ -392,7 +316,6 @@ BibTex entry:
```
-
## Contribution
Contributions, ideas, and discussions are welcome!
diff --git a/air_llm/setup.py b/air_llm/setup.py
index fdffe05..7f26515 100644
--- a/air_llm/setup.py
+++ b/air_llm/setup.py
@@ -5,13 +5,13 @@ with open("README.md", "r") as fh:
setuptools.setup(
name="airllm",
- version="2.8.3",
+ version="2.8.6",
author="Gavin Li",
author_email="gavinli@animaai.cloud",
description="AirLLM allows single 4GB GPU card to run 70B large language models without quantization, distillation or pruning.",
long_description=long_description,
long_description_content_type="text/markdown",
- url="https://github.com/lyogavin/Anima/tree/main/air_llm",
+ url="https://github.com/lyogavin/airllm",
packages=setuptools.find_packages(),
install_requires=[
'tqdm',
diff --git a/assets/airllm_logo.png b/assets/airllm_logo.png
index 5693eba..a5f7196 100644
Binary files a/assets/airllm_logo.png and b/assets/airllm_logo.png differ
diff --git a/assets/airllm_logo_sm.png b/assets/airllm_logo_sm.png
index e2dfe04..1239a28 100644
Binary files a/assets/airllm_logo_sm.png and b/assets/airllm_logo_sm.png differ