Merge branch 'main' of github.com:lyogavin/airllm

2026-03-07 22:33:47 +00:00 · 2024-08-01 08:48:46 -05:00
parent a60256ef57 d0760a348d
commit 52893f1f34
1 changed files with 2 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
 [**Example notebooks**](#example-python-notebook) | 
 [**FAQ**](#faq)

-**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed. And you can run 405B Llama3.1 on 8GB vmem.
+**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run **405B Llama3.1** on **8GB vram** now.

 <a href="https://github.com/lyogavin/airllm/stargazers">![GitHub Repo stars](https://img.shields.io/github/stars/lyogavin/airllm?style=social)</a>
 [![Downloads](https://static.pepy.tech/personalized-badge/airllm?period=total&units=international_system&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/airllm)
@@ -24,7 +24,7 @@

 ## Updates

-[2024/07/30] Support Llama3.1 405B ([example notebook](https://colab.research.google.com/github/lyogavin/airllm/blob/main/air_llm/examples/run_llama3.1_405B.ipynb)). Support 4bit quantization.
+[2024/07/30] Support Llama3.1 **405B** ([example notebook](https://colab.research.google.com/github/lyogavin/airllm/blob/main/air_llm/examples/run_llama3.1_405B.ipynb)). Support **8bit/4bit quantization**.

 [2024/04/20] AirLLM supports Llama3 natively already. Run Llama3 70B on 4GB single GPU.