Nextra docs

This commit is contained in:
Alex
2023-09-07 12:36:39 +01:00
parent 94738d8fc4
commit 4f735a5d11
23 changed files with 5436 additions and 0 deletions


@@ -0,0 +1,4 @@
## To customise the main prompt, navigate to `/application/prompt/combine_prompt.txt`
You can try editing it to see how the model responds.


@@ -0,0 +1,60 @@
## How to train on other documentation
This AI can use any documentation, but it first needs to be prepared for similarity search.
![video-example-of-how-to-do-it](https://d3dg1063dc54p9.cloudfront.net/videos/how-to-vectorise.gif)
Start by going to the `/scripts/` folder and opening `ingest.py`.
You will see that it uses the `.rst` files in the folder to create `index.faiss` and `index.pkl`.
It currently uses OpenAI to create the vector store, so make sure your documentation is not too big. Ingesting the Pandas docs cost me around $3-4.
For most open-source projects you can usually find the documentation in a `docs/` folder on GitHub.
### 1. Find documentation in `.rst`/`.md` format and create a folder for it in your scripts directory
Name it `inputs/`
Put all your `.rst`/`.md` files in there
The search is recursive, so you don't need to flatten the folder structure
If there are no `.rst`/`.md` files, convert whatever you find to `.txt` and feed that in (don't forget to change the extension in the script)
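The recursive search described in step 1 can be sketched as follows. This is a hypothetical illustration, not the actual `ingest.py` implementation; the function name and defaults are assumptions.

```python
from pathlib import Path

def collect_docs(root="inputs", formats=(".rst", ".md")):
    """Recursively collect documentation files under root, however deeply nested."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in formats
    )
```

Because the walk is recursive, nested folders such as `inputs/user_guide/io.rst` are picked up without any restructuring.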
### 2. Create a `.env` file in the `scripts/` folder
And write your OpenAI API key inside:
`OPENAI_API_KEY=<your-api-key>`
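Scripts like this typically read the key from the environment (a `.env` loader such as python-dotenv ends up in the same place). A minimal sketch of that lookup, with an assumed helper name:

```python
import os

def get_openai_key():
    """Fetch the OpenAI API key from the environment, failing loudly if missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to scripts/.env")
    return key
```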
### 3. Run `scripts/ingest.py`
`python ingest.py ingest`
It will tell you how much the ingestion will cost before proceeding
### 4. Move the `index.faiss` and `index.pkl` generated in `scripts/output` to the `application/` folder
### 5. Run the web app
Once you run it, it will use the new context relevant to your documentation
Make sure you select **default** in the dropdown in the UI
## Customisation
You can learn more about the `ingest.py` options by running:
`python ingest.py --help`
| Option | Description |
|:---------------------------------|:------------------------------------------------------------------------------------------------------------------------------|
| **ingest** | Runs the `ingest` function, converting documentation to the FAISS index format |
| --dir TEXT | List of paths to directories for index creation, e.g. `--dir inputs --dir inputs2` [default: `inputs`] |
| --file TEXT | File paths to use (optional; overrides `--dir`), e.g. `--file inputs/1.md --file inputs/2.md` |
| --recursive / --no-recursive | Whether to recursively search in subdirectories [default: recursive] |
| --limit INTEGER | Maximum number of files to read |
| --formats TEXT | List of required extensions (with the leading dot). Currently supported: `.rst`, `.md`, `.pdf`, `.docx`, `.csv`, `.epub`, `.html` [default: `.rst`, `.md`] |
| --exclude / --no-exclude | Whether to exclude hidden files (dotfiles) [default: exclude] |
| -y, --yes | Whether to skip the price confirmation |
| --sample / --no-sample | Whether to output a sample of the first 5 split documents [default: no-sample] |
| --token-check / --no-token-check | Whether to group small documents and split large ones. Improves semantics [default: token-check] |
| --min_tokens INTEGER | Minimum number of tokens below which documents are grouped [default: 150] |
| --max_tokens INTEGER | Maximum number of tokens above which documents are split [default: 2000] |
| | |
| **convert** | Creates documentation in `.md` format from source code |
| --dir TEXT | Path to a directory with source code, e.g. `--dir inputs` [default: `inputs`] |
| --formats TEXT | Source code language from which to create documentation. Supports `py`, `js` and `java`, e.g. `--formats py` [default: `py`] |
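The `--token-check` behaviour described in the table (group small documents, split large ones) can be sketched roughly like this. The function name, the whitespace tokeniser, and the merge/split strategy are all simplifying assumptions, not the real `ingest.py` logic:

```python
def token_check(chunks, min_tokens=150, max_tokens=2000,
                n_tokens=lambda s: len(s.split())):
    """Group chunks below min_tokens, then split chunks above max_tokens."""
    # Group: fold a chunk into the previous one while that one is still small.
    merged = []
    for chunk in chunks:
        if merged and n_tokens(merged[-1]) < min_tokens:
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    # Split: break any chunk that is still over max_tokens into pieces.
    result = []
    for chunk in merged:
        words = chunk.split()
        for i in range(0, len(words), max_tokens):
            result.append(" ".join(words[i:i + max_tokens]))
    return result
```

The point of the pass is that tiny fragments make poor embeddings and oversized ones dilute retrieval, so normalising chunk sizes tends to improve semantic search quality.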


@@ -0,0 +1,32 @@
Fortunately, there are many providers for LLMs, and some models can even be run locally.
There are two models used in the app:
1. Embeddings
2. Text generation
By default we use OpenAI's models, but if you want to change this or even run them locally, it's very simple!
### Go to the `.env` file or set environment variables:
`LLM_NAME=<your Text generation>`
`API_KEY=<api_key for Text generation>`
`EMBEDDINGS_NAME=<llm for embeddings>`
`EMBEDDINGS_KEY=<api_key for embeddings>`
`VITE_API_STREAMING=<true or false (true if using openai, false for all others)>`
You don't need to provide keys if you are happy for users to provide their own, but make sure you set `LLM_NAME` and `EMBEDDINGS_NAME`
Options:
`LLM_NAME` (openai, manifest, cohere, Arc53/docsgpt-14b, Arc53/docsgpt-7b-falcon)
`EMBEDDINGS_NAME` (openai_text-embedding-ada-002, huggingface_sentence-transformers/all-mpnet-base-v2, huggingface_hkunlp/instructor-large, cohere_medium)
That's it!
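The variables above can be generated programmatically. A hypothetical helper that writes them into a `.env` file (the variable names come from this page; the helper itself, its defaults, and the "streaming only for openai" rule it encodes are illustrative):

```python
def write_env(path, llm_name="openai", api_key="",
              embeddings_name="openai_text-embedding-ada-002", embeddings_key=""):
    """Write the DocsGPT environment variables described above to a .env file."""
    # Per the docs: streaming is true when using openai, false for all others.
    streaming = "true" if llm_name == "openai" else "false"
    lines = [
        f"LLM_NAME={llm_name}",
        f"API_KEY={api_key}",
        f"EMBEDDINGS_NAME={embeddings_name}",
        f"EMBEDDINGS_KEY={embeddings_key}",
        f"VITE_API_STREAMING={streaming}",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```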
### Hosting everything locally and privately (for using our optimised open-source models)
If you are working with important data and don't want anything to leave your premises, make sure you set `SELF_HOSTED_MODEL` to `true` in your `.env` file, and for `LLM_NAME` you can use anything that is on Hugging Face


@@ -0,0 +1,19 @@
If your AI uses external knowledge and is not explicit enough, that's OK, because we try to make DocsGPT friendly.
But if you want to adjust it, here is a simple way.
Go to `application/prompts/chat_combine_prompt.txt`
And change it to:
```
You are DocsGPT, a friendly and helpful AI assistant by Arc53 that provides help with documents. You give thorough answers with code examples if possible.
Write an answer for the question below based on the provided context.
If the context provides insufficient information, reply "I cannot answer".
You have access to chat history, and can use it to help answer the question.
----------------
{summaries}
```
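At run time, the `{summaries}` placeholder at the end of the template is filled with the retrieved context before the prompt is sent to the model. A minimal sketch of that substitution, with assumed names (`build_prompt` is not a DocsGPT function):

```python
# Shortened copy of the combine-style template above; the real one lives
# in application/prompts/chat_combine_prompt.txt.
template = (
    "Write an answer for the question below based on the provided context.\n"
    'If the context provides insufficient information, reply "I cannot answer".\n'
    "----------------\n"
    "{summaries}"
)

def build_prompt(summaries):
    """Fill the {summaries} placeholder with the retrieved document context."""
    return template.format(summaries=summaries)
```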


@@ -0,0 +1,18 @@
{
  "Customising-prompts": {
    "title": "🏗️️ Customising Prompts",
    "href": "/Guides/Customising-prompts"
  },
  "How-to-train-on-other-documentation": {
    "title": "📥 Training on docs",
    "href": "/Guides/How-to-train-on-other-documentation"
  },
  "How-to-use-different-LLM": {
    "title": "⚙️️ How to use different LLMs",
    "href": "/Guides/How-to-use-different-LLM"
  },
  "My-AI-answers-questions-using-external-knowledge": {
    "title": "💭️ Avoiding hallucinations",
    "href": "/Guides/My-AI-answers-questions-using-external-knowledge"
  }
}