From d0760a348d1422b37d03ef97a64b6e015a1eec8b Mon Sep 17 00:00:00 2001
From: Gavin Li
Date: Wed, 31 Jul 2024 11:41:04 -0500
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3fadbf7..f833f79 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
 [**Example notebooks**](#example-python-notebook) | [**FAQ**](#faq)

-**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run **405B Llama3.1** on **8GB vmem** now.
+**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run **405B Llama3.1** on **8GB vram** now.

 ![GitHub Repo stars](https://img.shields.io/github/stars/lyogavin/airllm?style=social) [![Downloads](https://static.pepy.tech/personalized-badge/airllm?period=total&units=international_system&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/airllm)
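The README line being patched claims AirLLM can run 70B-parameter models on a 4GB GPU without quantization, distillation, or pruning. The memory-saving idea behind that claim can be sketched as loading one layer's weights at a time, running it, and freeing it before the next. The sketch below is a deliberately simplified illustration of that layer-by-layer pattern using NumPy stand-ins; the names, shapes, and helper functions are illustrative assumptions, not AirLLM's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, N_LAYERS = 8, 4

# Stand-in for per-layer weight shards that AirLLM would keep on disk.
layer_shards = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.1 for _ in range(N_LAYERS)]

def load_layer(i):
    # Hypothetical loader: in the real library this would read one
    # layer's shard from disk into GPU memory on demand.
    return layer_shards[i]

def layered_forward(x):
    # Peak memory holds a single layer's weights, not all N_LAYERS,
    # which is why a 70B model can fit in a few GB of VRAM.
    for i in range(N_LAYERS):
        w = load_layer(i)   # load this layer only
        x = np.tanh(x @ w)  # compute its activation
        del w               # free before loading the next layer
    return x

out = layered_forward(rng.standard_normal(HIDDEN))
print(out.shape)  # (8,)
```

Because each layer is discarded after use, the working set is one layer plus activations; the trade-off is disk I/O per layer on every forward pass rather than a one-time load.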