From 3340916c339dd35a8614a795e5a87ae45dc3bb0c Mon Sep 17 00:00:00 2001
From: Gavin Li
Date: Tue, 30 Jul 2024 22:54:11 -0500
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ad73682..564d446 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
 
 [**Example notebooks**](#example-python-notebook) | [**FAQ**](#faq)
 
-**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed. And you can run 405B Llama3.1 on 8GB vmem.
+**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation, or pruning. And you can now run 405B Llama3.1 on 8GB of VRAM.
 
 ![GitHub Repo stars](https://img.shields.io/github/stars/lyogavin/airllm?style=social) [![Downloads](https://static.pepy.tech/personalized-badge/airllm?period=total&units=international_system&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/airllm)
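
The README paragraph changed by this patch claims that 70B-parameter models can run on a 4GB GPU with no compression. The technique behind that claim is layer-by-layer inference: only one layer's weights are resident in memory at a time, loaded from disk just before use and released afterward. Below is a minimal toy sketch of that idea, not AirLLM's actual implementation; the function names (`save_layers`, `layered_forward`) and the NumPy/ReLU stand-in for a transformer layer are illustrative assumptions.

```python
import os
import tempfile

import numpy as np


def save_layers(weights, out_dir):
    # Persist each layer's weight matrix to its own file on disk,
    # analogous to sharding a checkpoint per layer.
    paths = []
    for i, w in enumerate(weights):
        path = os.path.join(out_dir, f"layer_{i}.npy")
        np.save(path, w)
        paths.append(path)
    return paths


def layered_forward(x, layer_paths):
    # Forward pass that holds only one layer's weights in memory at a time.
    for path in layer_paths:
        w = np.load(path)            # load this layer's weights from disk
        x = np.maximum(x @ w, 0.0)   # apply the layer (a toy ReLU MLP here)
        del w                        # release the weights before the next layer
    return x
```

Peak weight memory is the size of the largest single layer rather than the whole model, which is why the model-size ceiling for a fixed GPU rises so dramatically; the trade-off is repeated disk I/O per forward pass.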