Update README.md

Gavin Li
2024-07-30 22:47:20 -05:00
committed by GitHub
parent 72215acdb4
commit 029f011660

@@ -6,7 +6,7 @@
[**Example notebooks**](#example-python-notebook) |
[**FAQ**](#faq)
-**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed.
+**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed. And you can run 405B Llama3.1 on 8GB vmem.
<a href="https://github.com/lyogavin/airllm/stargazers">![GitHub Repo stars](https://img.shields.io/github/stars/lyogavin/airllm?style=social)</a>
[![Downloads](https://static.pepy.tech/personalized-badge/airllm?period=total&units=international_system&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/airllm)