This commit is contained in:
digvijay shelar
2024-05-28 20:32:16 +05:30
parent 41cb765255
commit e5bec957a1
16 changed files with 123 additions and 131 deletions

View File

@@ -1,10 +1,25 @@
import Image from 'next/image'
# Customizing the Main Prompt
Customizing the main prompt for DocsGPT gives you the ability to tailor the AI's responses to your specific requirements. By modifying the prompt text, you can achieve more accurate and relevant answers. Here's how you can do it:
1. Navigate to `/application/prompts/combine_prompt.txt`.
1. Navigate to `SideBar -> Settings`.
2.In Settings select the `Active Prompt` now you will be able to see various prompts style.x
3.Click on the `edit icon` on the prompt of your choice and you will be able to see the current prompt for it,you can now customise the prompt as per your choice.
### Video Demo
<Image src="/prompts.gif" alt="prompts" width={800} height={500} />
2. Open the `combine_prompt.txt` file and modify the prompt text to suit your needs. You can experiment with different phrasings and structures to observe how the model responds. The main prompt serves as guidance to the AI model on how to generate responses.
## Example Prompt Modification

View File

@@ -1,63 +0,0 @@
## How to train on other documentation
This AI can utilize any documentation, but it requires preparation for similarity search. Follow these steps to get your documentation ready:
**Step 1: Prepare Your Documentation**
![video-example-of-how-to-do-it](https://d3dg1063dc54p9.cloudfront.net/videos/how-to-vectorise.gif)
Start by going to `/scripts/` folder.
If you open this file, you will see that it uses RST files from the folder to create a `index.faiss` and `index.pkl`.
It currently uses OPENAI to create the vector store, so make sure your documentation is not too large. Using Pandas cost me around $3-$4.
You can typically find documentation on GitHub in the `docs/` folder for most open-source projects.
### 1. Find documentation in .rst/.md format and create a folder with it in your scripts directory.
- Name it `inputs/`.
- Put all your .rst/.md files in there.
- The search is recursive, so you don't need to flatten them.
If there are no .rst/.md files, convert whatever you find to a .txt file and feed it. (Don't forget to change the extension in the script).
### Step 2: Configure Your OpenAI API Key
1. Create a .env file in the scripts/ folder.
- Add your OpenAI API key inside: OPENAI_API_KEY=<your-api-key>.
### Step 3: Run the Ingestion Script
`python ingest.py ingest`
It will provide you with the estimated cost.
### Step 4: Move `index.faiss` and `index.pkl` generated in `scripts/output` to `application/` folder.
### Step 5: Run the Web App
Once you run it, it will use new context relevant to your documentation.Make sure you select default in the dropdown in the UI.
## Customization
You can learn more about options while running ingest.py by running:
- Make sure you select 'default' from the dropdown in the UI.
## Customization
You can learn more about options while running ingest.py by executing:
`python ingest.py --help`
| Options | |
|:--------------------------------:|:------------------------------------------------------------------------------------------------------------------------------:|
| **ingest** | Runs 'ingest' function, converting documentation to Faiss plus Index format |
| --dir TEXT | List of paths to directory for index creation. E.g. --dir inputs --dir inputs2 [default: inputs] |
| --file TEXT | File paths to use (Optional; overrides directory) E.g. --files inputs/1.md --files inputs/2.md |
| --recursive / --no-recursive | Whether to recursively search in subdirectories [default: recursive] |
| --limit INTEGER | Maximum number of files to read |
| --formats TEXT | List of required extensions (list with .) Currently supported: .rst, .md, .pdf, .docx, .csv, .epub, .html [default: .rst, .md] |
| --exclude / --no-exclude | Whether to exclude hidden files (dotfiles) [default: exclude] |
| -y, --yes | Whether to skip price confirmation |
| --sample / --no-sample | Whether to output sample of the first 5 split documents. [default: no-sample] |
| --token-check / --no-token-check | Whether to group small documents and split large. Improves semantics. [default: token-check] |
| --min_tokens INTEGER | Minimum number of tokens to not group. [default: 150] |
| --max_tokens INTEGER | Maximum number of tokens to not split. [default: 2000] |
| | |
| **convert** | Creates documentation in .md format from source code |
| --dir TEXT | Path to a directory with source code. E.g. --dir inputs [default: inputs] |
| --formats TEXT | Source code language from which to create documentation. Supports py, js and java. E.g. --formats py [default: py] |

View File

@@ -0,0 +1,44 @@
import { Callout } from 'nextra/components'
import Image from 'next/image'
import { Steps } from 'nextra/components'
## How to train on other documentation
Training on other documentation sources can greatly enhance the versatility and depth of DocsGPT's knowledge. By incorporating diverse materials, you can broaden the AI's understanding and improve its ability to generate insightful responses across a range of topics. Here's a step-by-step guide on how to effectively train DocsGPT on additional documentation sources:
**Get your document ready**:
Make sure you have the document on which you want to train on ready with you on the device which you are using .You can also use links to the documentation to train on.
<Callout type="warning" emoji="⚠️">
Note: The document should be either of the given file formats .pdf, .txt, .rst, .docx, .md, .zip and limited to 25mb.You can also train using the link of the documentation.
</Callout>
### Video Demo
<Image src="/docs.gif" alt="prompts" width={800} height={500} />
<Steps>
### Step1
Navigate to the sidebar where you will find `Source Docs` option,here you will find 3 options built in which are default,Web Search and None.
### Step 2
Click on the `Upload icon` just beside the source docs options,now borwse and upload the document which you want to train on or select the `remote` option if you have to insert the link of the documentation.
### Step 3
Now you will be able to see the name of the file uploaded under the Uploaded Files ,now click on `Train`,once you click on train it might take some time to train on the document. You will be able to see the `Training progress` and once the training is completed you can click the `finish` button and there you go your docuemnt is uploaded.
### Step 4
Go to `New chat` and from the side bar select the document you uploaded under the `Source Docs` and go ahead with your chat, now you can ask qestions regarding the document you uploaded and you will get the effective answer based on it.
</Steps>

View File

@@ -1,48 +1,17 @@
# Setting Up Local Language Models for Your App
Your app relies on two essential models: Embeddings and Text Generation. While OpenAI's default models work seamlessly, you have the flexibility to switch providers or even run the models locally.
Setting up local language models for your app can significantly enhance its capabilities, enabling it to understand and generate text in multiple languages without relying on external APIs. By integrating local language models, you can improve privacy, reduce latency, and ensure continuous functionality even in offline environments. Here's a comprehensive guide on how to set up local language models for your application:
## Step 1: Configure Environment Variables
Navigate to the `.env` file or set the following environment variables:
```env
LLM_NAME=<your Text Generation model>
API_KEY=<API key for Text Generation>
EMBEDDINGS_NAME=<LLM for Embeddings>
EMBEDDINGS_KEY=<API key for Embeddings>
VITE_API_STREAMING=<true or false>
```
You can omit the keys if users provide their own. Ensure you set `LLM_NAME` and `EMBEDDINGS_NAME`.
## Step 2: Choose Your Models
**Options for `LLM_NAME`:**
- openai ([More details](https://platform.openai.com/docs/models))
- anthropic ([More details](https://docs.anthropic.com/claude/reference/selecting-a-model))
- manifest ([More details](https://python.langchain.com/docs/integrations/llms/manifest))
- cohere ([More details](https://docs.cohere.com/docs/llmu))
- llama.cpp ([More details](https://python.langchain.com/docs/integrations/llms/llamacpp))
- huggingface (Arc53/DocsGPT-7B by default)
- sagemaker ([Mode details](https://aws.amazon.com/sagemaker/))
## Steps:
1.Visit the chat screen and you will be to see the default LLM selected
Note: for huggingface you can choose any model inside application/llm/huggingface.py or pass llm_name on init, loads
2.Click on it and you will get a drop down of various LLM's available to choose
**Options for `EMBEDDINGS_NAME`:**
- openai_text-embedding-ada-002
- huggingface_sentence-transformers/all-mpnet-base-v2
- huggingface_hkunlp/instructor-large
- cohere_medium
3.Choose the LLM of your choice
If you want to be completely local, set `EMBEDDINGS_NAME` to `huggingface_sentence-transformers/all-mpnet-base-v2`.
### Video Demo
<Image src="/llms.gif" alt="llms" />
For llama.cpp Download the required model and place it in the `models/` folder.
Alternatively, for local Llama setup, run `setup.sh` and choose option 1. The script handles the DocsGPT model addition.
## Step 3: Local Hosting for Privacy
If working with sensitive data, host everything locally by setting `LLM_NAME`, llama.cpp or huggingface, use any model available on Hugging Face, for llama.cpp you need to convert it into gguf format.
That's it! Your app is now configured for local and private hosting, ensuring optimal security for critical data.