mirror of
https://github.com/0xSojalSec/airllm.git
synced 2026-03-07 22:33:47 +00:00
Update README.md
@@ -6,7 +6,7 @@
 [**Example notebooks**](#example-python-notebook) |
 [**FAQ**](#faq)

-**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run **405B Llama3.1** on **8GB vmem** now.
+**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run **405B Llama3.1** on **8GB vram** now.

 <a href="https://github.com/lyogavin/airllm/stargazers"></a>
 [](https://pepy.tech/project/airllm)