Mirror of https://github.com/0xSojalSec/airllm.git (synced 2026-04-26 08:06:16 +00:00)
Update README.md
```diff
@@ -6,7 +6,7 @@
 [**Example notebooks**](#example-python-notebook) | [**FAQ**](#faq)
 
-**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run 405B Llama3.1 on 8GB vmem now.
+**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run **405B Llama3.1** on **8GB vmem** now.
 
 <a href="https://github.com/lyogavin/airllm/stargazers"></a>
 [](https://pepy.tech/project/airllm)
```
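The paragraph being reformatted above is the library's core claim: inference proceeds layer by layer, so only one transformer layer's weights need to be resident on the GPU at a time. For context, here is a minimal usage sketch along the lines of AirLLM's own README; the `AutoModel` API is the library's, but the model ID and generation parameters are illustrative and not part of this commit.

```python
# Minimal AirLLM sketch: weights are streamed layer by layer from disk,
# so a 70B model can run on a single ~4GB GPU (trading speed for memory).
from airllm import AutoModel

# Illustrative model ID, not from the commit.
model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

input_tokens = model.tokenizer(
    ["What is the capital of the United States?"],
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=128,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)
print(model.tokenizer.decode(generation_output.sequences[0]))
```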
```diff
@@ -24,7 +24,7 @@
 
 ## Updates
 
-[2024/07/30] Support Llama3.1 405B ([example notebook](https://colab.research.google.com/github/lyogavin/airllm/blob/main/air_llm/examples/run_llama3.1_405B.ipynb)). Support 4bit quantization.
+[2024/07/30] Support Llama3.1 **405B** ([example notebook](https://colab.research.google.com/github/lyogavin/airllm/blob/main/air_llm/examples/run_llama3.1_405B.ipynb)). Support **8bit/4bit quantization**.
 
 [2024/04/20] AirLLM supports Llama3 natively already. Run Llama3 70B on 4GB single GPU.
```
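The updated changelog entry advertises 8bit/4bit quantization. As a hedged sketch of how that is enabled, assuming the `compression` keyword documented in AirLLM's README (which requires `bitsandbytes`); the model ID is again illustrative:

```python
# Sketch of enabling AirLLM's block-wise weight quantization:
# compression='4bit' (or '8bit') shrinks the stored weights, cutting
# per-layer disk-to-GPU transfer time and memory use.
# Assumes bitsandbytes is installed; model ID is illustrative.
from airllm import AutoModel

model = AutoModel.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    compression='4bit',
)
```

Since this compresses the weights rather than quantizing the whole compute path, the main practical win, per the project's description, is faster layer loading; it is this option that underlies the 8GB figure quoted above.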