diff --git a/air_llm/README.md b/air_llm/README.md
index 1906363..93b4be8 100644
--- a/air_llm/README.md
+++ b/air_llm/README.md
@@ -1,9 +1,16 @@
 ![airllm_logo](https://github.com/lyogavin/Anima/blob/main/assets/airllm_logo_sm.png?v=3&raw=true)
 
+[**Quickstart**](#quickstart) |
+[**Configurations**](#configurations) |
+[**MacOS**](#macos) |
+[**Example notebooks**](#example-python-notebook) |
+[**FAQ**](#faq)
+
 **AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed. AirLLM优化inference内存,4GB单卡GPU可以运行70B大语言模型推理。不需要任何损失模型性能的量化和蒸馏,剪枝等模型压缩。
 
+
 ## Updates
 
 [2023/12/25] v2.8.2: Support MacOS running 70B large language models.
@@ -350,6 +357,23 @@ input_tokens = model.tokenizer(input_text,
 )
 ```
 
+## Citing AirLLM
+
+If you find
+AirLLM useful in your research and wish to cite it, please use the following
+BibTeX entry:
+
+```
+@software{airllm2023,
+  author = {Gavin Li},
+  title = {AirLLM: scaling large language models on low-end commodity computers},
+  url = {https://github.com/lyogavin/Anima/tree/main/air_llm},
+  version = {0.0},
+  year = {2023},
+}
+```
+
+
 ## Contribution