Merge branch 'main' into code-ingestion
.github/workflows/ci.yml (new file, vendored, 44 changes)
@@ -0,0 +1,44 @@
name: Build and push DocsGPT Docker image

on:
  workflow_dispatch:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      - name: Login to DockerHub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Login to ghcr.io
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GHCR_TOKEN }}

      # Runs a single command using the runners shell
      - name: Build and push Docker images to docker.io and ghcr.io
        uses: docker/build-push-action@v2
        with:
          file: './application/Dockerfile'
          platforms: linux/amd64
          context: ./application
          push: true
          tags: |
            ${{ secrets.DOCKER_USERNAME }}/docsgpt:latest
            ghcr.io/${{ github.repository_owner }}/docsgpt:latest
.gitignore (vendored, 30 changes)
@@ -133,4 +133,32 @@ dmypy.json
# macOS
.DS_Store

application/vectors/
#frontend
# Logs
frontend/logs
frontend/*.log
frontend/npm-debug.log*
frontend/yarn-debug.log*
frontend/yarn-error.log*
frontend/pnpm-debug.log*
frontend/lerna-debug.log*

frontend/node_modules
frontend/dist
frontend/dist-ssr
frontend/*.local

# Editor directories and files
frontend/.vscode/*
frontend/!.vscode/extensions.json
frontend/.idea
frontend/.DS_Store
frontend/*.suo
frontend/*.ntvs*
frontend/*.njsproj
frontend/*.sln
frontend/*.sw?

application/vectors/

**/inputs

CONTRIBUTING.md (new file, 38 changes)
@@ -0,0 +1,38 @@
# Welcome to the DocsGPT contributing guidelines

Thank you for choosing this project to contribute to. We are all very grateful!

# We accept different types of contributions

📣 Discussions - where you can start a new topic or answer some questions

🐞 Issues - how we track tasks; sometimes they are bugs that need fixing, sometimes they are new features

🛠️ Pull requests - how you can suggest changes to our repository, either to work on an existing issue or to add new features

📚 Wiki - where our documentation lives


## 🐞 Issues and Pull requests

We value contributions to our issues in the form of discussion or suggestions. We recommend that you check out existing issues and our [Roadmap](https://github.com/orgs/arc53/projects/2).

If you want to contribute by writing code, there are a few things you should know first:
we have a frontend (React, Vite) and a backend (Python).

### If you are looking to contribute to Frontend (⚛️React, Vite):
The current frontend is being migrated from /application to /frontend with a new design, so please contribute to the new one. Check out this [Milestone](https://github.com/arc53/DocsGPT/milestone/1) and its issues, as well as the [Figma](https://www.figma.com/file/OXLtrl1EAy885to6S69554/DocsGPT?node-id=0%3A1&t=hjWVuxRg9yi5YkJ9-1).
Please try to follow the guidelines.


### If you are looking to contribute to Backend (🐍Python):
Check out our issues and contribute to /application or /scripts (ignore the old ingest_rst.py and ingest_rst_sphinx.py files; they will be deprecated soon).
Currently we don't have any tests (which would be useful 😉), but before submitting your PR, make sure that after you ingest some test data it is queryable.

### Workflow:
Create a fork, make your changes on the forked repository, and submit them in the form of a pull request.

## Questions / collaboration
Please join our [Discord](https://discord.gg/n5BX8dh8rU); don't hesitate, we are very friendly and welcoming to new contributors.

# Thank you so much for considering contributing to DocsGPT! 🙏
README.md (12 changes)
@@ -25,8 +25,8 @@ Say goodbye to time-consuming manual searches, and let <strong>DocsGPT</strong>
You can find our [Roadmap](https://github.com/orgs/arc53/projects/2) here; please don't hesitate to contribute or create issues, it helps us make DocsGPT better!

-## Screenshot
-<img width="1440" alt="image" src="https://user-images.githubusercontent.com/15183589/216717215-adc6ea2d-5b35-4694-ac0d-e39a396025f4.png">
+## Preview
+![video-example-of-docs-gpt]()

## [Live preview](https://docsgpt.arc53.com/)

@@ -34,11 +34,11 @@ You can find our [Roadmap](https://github.com/orgs/arc53/projects/2) here, pleas

## Project structure
-application - flask app (main application)
+- Application - flask app (main application)

-extensions - chrome extension
+- Extensions - chrome extension

-scripts - script that creates similarity search index and store for other libraries
+- Scripts - script that creates similarity search index and store for other libraries.

## QuickStart
Please note: the current vector database uses the pandas Python documentation, so responses will be related to it. If you want to use other docs, please follow the guide below.

@@ -57,7 +57,7 @@ Copy .env_sample and create .env with your openai api token

## [Guides](https://github.com/arc53/docsgpt/wiki)

## [Interested in contributing?](https://github.com/arc53/DocsGPT/blob/main/CONTRIBUTING.md)

## [How to use any other documentation](https://github.com/arc53/docsgpt/wiki/How-to-train-on-other-documentation)

application/app.py
@@ -5,8 +5,8 @@ import datetime
from flask import Flask, request, render_template
# os.environ["LANGCHAIN_HANDLER"] = "langchain"
import faiss
-from langchain import OpenAI
-from langchain.chains import VectorDBQAWithSourcesChain
+from langchain import OpenAI, VectorDBQA
+from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
import requests

@@ -24,6 +24,13 @@ dotenv.load_dotenv()
with open("combine_prompt.txt", "r") as f:
    template = f.read()

+# check if OPENAI_API_KEY is set
+if os.getenv("OPENAI_API_KEY") is not None:
+    api_key_set = True
+else:
+    api_key_set = False


app = Flask(__name__)

@@ -31,14 +38,18 @@ app = Flask(__name__)

@app.route("/")
def home():
-    return render_template("index.html")
+    return render_template("index.html", api_key_set=api_key_set)


@app.route("/api/answer", methods=["POST"])
def api_answer():
    data = request.get_json()
    question = data["question"]
-    api_key = data["api_key"]
+    if not api_key_set:
+        api_key = data["api_key"]
+    else:
+        api_key = os.getenv("OPENAI_API_KEY")

    # check if the vectorstore is set
    if "active_docs" in data:
        vectorstore = "vectors/" + data["active_docs"]

@@ -57,11 +68,23 @@ def api_answer():
    # create a prompt template
    c_prompt = PromptTemplate(input_variables=["summaries", "question"], template=template)
    # create a chain with the prompt template and the store
-    chain = VectorDBQAWithSourcesChain.from_llm(llm=OpenAI(openai_api_key=api_key, temperature=0), vectorstore=store, combine_prompt=c_prompt)
+    # chain = VectorDBQA.from_llm(llm=OpenAI(openai_api_key=api_key, temperature=0), vectorstore=store, combine_prompt=c_prompt)
+    # chain = VectorDBQA.from_chain_type(llm=OpenAI(openai_api_key=api_key, temperature=0), chain_type='map_reduce',
+    #                                    vectorstore=store)
+    qa_chain = load_qa_chain(OpenAI(openai_api_key=api_key, temperature=0), chain_type="map_reduce",
+                             combine_prompt=c_prompt)
+    chain = VectorDBQA(combine_documents_chain=qa_chain, vectorstore=store)

    # fetch the answer
-    result = chain({"question": question})
+    result = chain({"query": question})
    print(result)

    # some formatting for the frontend
+    result['answer'] = result['result']
    result['answer'] = result['answer'].replace("\\n", "<br>")
    result['answer'] = result['answer'].replace("SOURCES:", "")
    # mock result
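
For reference, here is a minimal sketch of calling the reworked endpoint (this is illustrative and not part of the diff: the localhost URL and port are assumptions about how the Flask app is run, and the field values are placeholders; `active_docs` is optional and selects a folder under vectors/):

```python
import requests

# Hypothetical local call against the /api/answer route defined above.
# Adjust the host/port to however you run the Flask app.
resp = requests.post(
    "http://localhost:5000/api/answer",
    json={
        "question": "How do I merge two DataFrames?",  # placeholder question
        "api_key": "sk-...",       # only used when OPENAI_API_KEY is not set server-side
        "active_docs": "default",  # hypothetical folder name under vectors/ (optional)
    },
)
print(resp.json()["answer"])
```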
application/requirements.txt
@@ -20,24 +20,32 @@ idna==3.4
imagesize==1.4.1
itsdangerous==2.1.2
Jinja2==3.1.2
-langchain==0.0.76
+joblib==1.2.0
+langchain==0.0.81
lxml==4.9.2
MarkupSafe==2.1.2
marshmallow==3.19.0
marshmallow-enum==1.5.1
multidict==6.0.4
mypy-extensions==0.4.3
nltk==3.8.1
numpy==1.24.1
openai==0.26.4
packaging==23.0
pandas==1.5.3
Pillow==9.4.0
pycryptodomex==3.17
pydantic==1.10.4
Pygments==2.14.0
PyPDF2==3.0.1
python-dateutil==2.8.2
python-dotenv==0.21.1
python-pptx==0.6.21
pytz==2022.7.1
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
six==1.16.0
snowballstemmer==2.2.0
Sphinx==6.1.3
sphinxcontrib-applehelp==1.0.4
@@ -47,12 +55,15 @@ sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
SQLAlchemy==1.4.46
tenacity==8.2.1
tiktoken==0.1.2
tokenizers==0.13.2
tqdm==4.64.1
transformers==4.26.0
typer==0.7.0
typing-inspect==0.8.0
typing_extensions==4.4.0
urllib3==1.26.14
Werkzeug==2.2.2
XlsxWriter==3.0.8
yarl==1.8.2

application/static/dist/css/output.css (vendored, 10 changes)
@@ -785,16 +785,6 @@ video {
  color: rgb(17 24 39 / var(--tw-text-opacity));
}

-.text-green-500 {
-  --tw-text-opacity: 1;
-  color: rgb(34 197 94 / var(--tw-text-opacity));
-}

-.text-red-500 {
-  --tw-text-opacity: 1;
-  color: rgb(239 68 68 / var(--tw-text-opacity));
-}

.opacity-75 {
  opacity: 0.75;
}

application/static/favicon/android-chrome-192x192.png (new binary file, 37 KiB)
application/static/favicon/android-chrome-512x512.png (new binary file, 352 KiB)
application/static/favicon/apple-touch-icon.png (new binary file, 34 KiB)
application/static/favicon/favicon-16x16.png (new binary file, 631 B)
application/static/favicon/favicon-32x32.png (new binary file, 1.7 KiB)
application/static/favicon/favicon.ico (new binary file, 15 KiB)

application/static/favicon/site.webmanifest (new file, 1 change)
@@ -0,0 +1 @@
{"name":"","short_name":"","icons":[{"src":"/android-chrome-192x192.png","sizes":"192x192","type":"image/png"},{"src":"/android-chrome-512x512.png","sizes":"512x512","type":"image/png"}],"theme_color":"#ffffff","background_color":"#ffffff","display":"standalone"}

application/templates/index.html
@@ -3,6 +3,11 @@
<head>
    <title>DocsGPT 🦖 Preview</title>
    <link href="{{url_for('static',filename='dist/css/output.css')}}" rel="stylesheet">
+   <link rel="favicon" href="{{ url_for('static', filename='favicon/favicon.ico') }}">
+   <link rel="apple-touch-icon" sizes="180x180" href="{{ url_for('static', filename='favicon/apple-touch-icon.png') }}">
+   <link rel="icon" type="image/png" sizes="32x32" href="{{ url_for('static', filename='favicon/favicon-32x32.png') }}">
+   <link rel="icon" type="image/png" sizes="16x16" href="{{ url_for('static', filename='favicon/favicon-16x16.png') }}">
+   <link rel="manifest" href="{{ url_for('static', filename='favicon//site.webmanifest') }}">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">

@@ -18,7 +23,9 @@
        <h1 class="text-lg font-medium">DocsGPT 🦖 Preview</h1>
        <div>
            <a href="https://github.com/arc53/docsgpt" class="text-blue-500 hover:text-blue-800 text-sm">About</a>
+           {% if not api_key_set %}
+           <button class="text-sm text-yellow-500 hover:text-yellow-800" onclick="resetApiKey()">Reset Key</button>
+           {% endif %}
        </div>
    </header>
    <div class="lg:flex ml-2 mr-2">

@@ -72,6 +79,8 @@ This will return a new DataFrame with all the columns from both tables, and only
    <div class="flex items-center justify-center h-full">

    </div>

+   {% if not api_key_set %}
    <div class="fixed z-10 overflow-y-auto top-0 w-full left-0 hidden" id="modal">
        <div class="flex items-center justify-center min-height-100vh pt-4 px-4 pb-20 text-center sm:block sm:p-0">
            <div class="fixed inset-0 transition-opacity">

@@ -95,15 +104,16 @@ This will return a new DataFrame with all the columns from both tables, and only
                </div>
            </div>
        </div>
+   {% endif %}
    <script>
        function docsIndex() {
            // loads latest index from https://raw.githubusercontent.com/arc53/DocsHUB/main/combined.json
            // and stores it in localStorage
-           fetch('https://raw.githubusercontent.com/arc53/DocsHUB/main/combined.json')
+           fetch('https://d3dg1063dc54p9.cloudfront.net/combined.json')
                .then(response => response.json())
                .then(data => {
                    console.log('Success:', data);
                    localStorage.setItem("docsIndex", JSON.stringify(data));
                    localStorage.setItem("docsIndexDate", Date.now());
                    generateOptions()
                }

@@ -119,8 +129,6 @@ This will return a new DataFrame with all the columns from both tables, and only
            // create option for each key in docsIndex
            for (var key in docsIndex) {
                var option = document.createElement("option");
-               console.log(key)
-               console.log(docsIndex[key])
                if (docsIndex[key].name == docsIndex[key].language) {
                    option.text = docsIndex[key].name + " " + docsIndex[key].version;
                    option.value = docsIndex[key].name + "/" + ".project" + "/" + docsIndex[key].version + "/";

@@ -134,20 +142,27 @@ This will return a new DataFrame with all the columns from both tables, and only
                }
            }
            // check if api_key is set
+           {% if not api_key_set %}
+           if (localStorage.getItem('apiKey') === null) {
+               console.log("apiKey is not set")
+               document.getElementById('modal').classList.toggle('hidden')
+           }
+           {% endif %}
            if (localStorage.getItem('docsIndex') === null) {
                console.log("docsIndex is not set")
                docsIndex()
            }
            else if (localStorage.getItem("docsIndexDate") < Date.now() - 900000) {
                console.log("docsIndex is older than 15 minutes")
                docsIndex()
            }

            generateOptions()

    </script>
+   {% if not api_key_set %}
    <script src="{{url_for('static',filename='src/authapi.js')}}"></script>
+   {% endif %}
    <script src="{{url_for('static',filename='src/chat.js')}}"></script>
    <script src="{{url_for('static',filename='src/choiceChange.js')}}"></script>

frontend/.eslintignore (new file, 17 changes)
@@ -0,0 +1,17 @@
node_modules/
dist/
prettier.config.cjs
.eslintrc.cjs
env.d.ts
public/
assets/
vite-env.d.ts
.prettierignore
package-lock.json
package.json
postcss.config.cjs
prettier.config.cjs
tailwind.config.cjs
tsconfig.json
tsconfig.node.json
vite.config.ts

frontend/.eslintrc.cjs (new file, 43 changes)
@@ -0,0 +1,43 @@
module.exports = {
  env: {
    browser: true,
    es2021: true,
    node: true,
  },
  extends: [
    'eslint:recommended',
    'plugin:@typescript-eslint/recommended',
    'plugin:react/recommended',
    'plugin:prettier/recommended',
  ],
  overrides: [],
  parser: '@typescript-eslint/parser',
  parserOptions: {
    ecmaVersion: 'latest',
    sourceType: 'module',
  },
  plugins: ['react'],
  rules: {
    'react/react-in-jsx-scope': 'off',
  },
  settings: {
    'import/parsers': {
      '@typescript-eslint/parser': ['.ts', '.tsx'],
    },
    react: {
      version: 'detect',
    },
    'import/resolver': {
      node: {
        paths: ['src'],
        extensions: ['.js', '.jsx', '.ts', '.tsx'],
      },
    },
  },
  'prettier/prettier': [
    'error',
    {
      endOfLine: 'auto',
    },
  ],
}

frontend/.husky/pre-commit (new executable file, 6 changes)
@@ -0,0 +1,6 @@
#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

# npm test
cd frontend
npx lint-staged

frontend/.prettierignore (new file, 17 changes)
@@ -0,0 +1,17 @@
node_modules/
dist/
prettier.config.cjs
.eslintrc.cjs
env.d.ts
public/
assets/
vite-env.d.ts
.prettierignore
package-lock.json
package.json
postcss.config.cjs
prettier.config.cjs
tailwind.config.cjs
tsconfig.json
tsconfig.node.json
vite.config.ts

frontend/index.html (new file, 12 changes)
@@ -0,0 +1,12 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>DocsGPT 🦖</title>
  </head>
  <body>
    <div id="root" class="h-screen"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>

frontend/package-lock.json (generated, new file, 9590 changes)

frontend/package.json (new file, 50 changes)
@@ -0,0 +1,50 @@
{
  "name": "frontend",
  "private": true,
  "version": "0.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "tsc && vite build",
    "preview": "vite preview",
    "lint": "eslint ./src --ext .jsx,.js,.ts,.tsx",
    "lint-fix": "eslint ./src --ext .jsx,.js,.ts,.tsx --fix",
    "format": "prettier ./src --write",
    "prepare": "cd .. && husky install frontend/.husky"
  },
  "lint-staged": {
    "**/*.{js,jsx,ts,tsx}": [
      "npm run lint-fix",
      "npm run format"
    ]
  },
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "react-router-dom": "^6.8.1"
  },
  "devDependencies": {
    "@types/react": "^18.0.27",
    "@types/react-dom": "^18.0.10",
    "@typescript-eslint/eslint-plugin": "^5.51.0",
    "@typescript-eslint/parser": "^5.51.0",
    "@vitejs/plugin-react": "^3.1.0",
    "autoprefixer": "^10.4.13",
    "eslint": "^8.33.0",
    "eslint-config-prettier": "^8.6.0",
    "eslint-config-standard-with-typescript": "^34.0.0",
    "eslint-plugin-import": "^2.27.5",
    "eslint-plugin-n": "^15.6.1",
    "eslint-plugin-prettier": "^4.2.1",
    "eslint-plugin-promise": "^6.1.1",
    "eslint-plugin-react": "^7.32.2",
    "husky": "^8.0.0",
    "lint-staged": "^13.1.1",
    "postcss": "^8.4.21",
    "prettier": "^2.8.4",
    "prettier-plugin-tailwindcss": "^0.2.2",
    "tailwindcss": "^3.2.4",
    "typescript": "^4.9.5",
    "vite": "^4.1.0"
  }
}

frontend/postcss.config.cjs (new file, 6 changes)
@@ -0,0 +1,6 @@
module.exports = {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
}

frontend/prettier.config.cjs (new file, 7 changes)
@@ -0,0 +1,7 @@
module.exports = {
  trailingComma: 'all',
  tabWidth: 2,
  semi: true,
  singleQuote: true,
  printWidth: 80,
}

frontend/public/vite.svg (new file, 1 change, 1.5 KiB)
@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="iconify iconify--logos" width="31.88" height="32" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 257"><defs><linearGradient id="IconifyId1813088fe1fbc01fb466" x1="-.828%" x2="57.636%" y1="7.652%" y2="78.411%"><stop offset="0%" stop-color="#41D1FF"></stop><stop offset="100%" stop-color="#BD34FE"></stop></linearGradient><linearGradient id="IconifyId1813088fe1fbc01fb467" x1="43.376%" x2="50.316%" y1="2.242%" y2="89.03%"><stop offset="0%" stop-color="#FFEA83"></stop><stop offset="8.333%" stop-color="#FFDD35"></stop><stop offset="100%" stop-color="#FFA800"></stop></linearGradient></defs><path fill="url(#IconifyId1813088fe1fbc01fb466)" d="M255.153 37.938L134.897 252.976c-2.483 4.44-8.862 4.466-11.382.048L.875 37.958c-2.746-4.814 1.371-10.646 6.827-9.67l120.385 21.517a6.537 6.537 0 0 0 2.322-.004l117.867-21.483c5.438-.991 9.574 4.796 6.877 9.62Z"></path><path fill="url(#IconifyId1813088fe1fbc01fb467)" d="M185.432.063L96.44 17.501a3.268 3.268 0 0 0-2.634 3.014l-5.474 92.456a3.268 3.268 0 0 0 3.997 3.378l24.777-5.718c2.318-.535 4.413 1.507 3.936 3.838l-7.361 36.047c-.495 2.426 1.782 4.5 4.151 3.78l15.304-4.649c2.372-.72 4.652 1.36 4.15 3.788l-11.698 56.621c-.732 3.542 3.979 5.473 5.943 2.437l1.313-2.028l72.516-144.72c1.215-2.423-.88-5.186-3.54-4.672l-25.505 4.922c-2.396.462-4.435-1.77-3.759-4.114l16.646-57.705c.677-2.35-1.37-4.583-3.769-4.113Z"></path></svg>

frontend/src/App.css (new file, 4 changes)
@@ -0,0 +1,4 @@
html,
body {
  min-height: 100vh;
}

frontend/src/App.tsx (new file, 55 changes)
@@ -0,0 +1,55 @@
import { useEffect, useState } from 'react';
import { Routes, Route } from 'react-router-dom';
import Navigation from './components/Navigation/Navigation';
import DocsGPT from './components/DocsGPT';
import APIKeyModal from './components/APIKeyModal';
import './App.css';

export default function App() {
  //Currently using primitive state management. Will most likely be replaced with Redux.
  const [isMobile, setIsMobile] = useState(true);
  const [isMenuOpen, setIsMenuOpen] = useState(true);
  const [isApiModalOpen, setIsApiModalOpen] = useState(true);
  const [apiKey, setApiKey] = useState('');

  const handleResize = () => {
    if (window.innerWidth > 768 && isMobile) {
      setIsMobile(false);
    } else {
      setIsMobile(true);
    }
  };

  useEffect(() => {
    window.addEventListener('resize', handleResize);
    handleResize();

    return () => {
      window.removeEventListener('resize', handleResize);
    };
  }, []);

  return (
    <div
      className={`${
        isMobile ? 'flex-col' : 'flex-row'
      } relative flex transition-all`}
    >
      <APIKeyModal
        apiKey={apiKey}
        setApiKey={setApiKey}
        isApiModalOpen={isApiModalOpen}
        setIsApiModalOpen={setIsApiModalOpen}
      />
      <Navigation
        isMobile={isMobile}
        isMenuOpen={isMenuOpen}
        setIsMenuOpen={setIsMenuOpen}
        setIsApiModalOpen={setIsApiModalOpen}
      />
      <Routes>
        <Route path="/" element={<DocsGPT isMenuOpen={isMenuOpen} />} />
      </Routes>
    </div>
  );
}

frontend/src/components/APIKeyModal.tsx (new file, 60 changes)
@@ -0,0 +1,60 @@
import { useState } from 'react';

export default function APIKeyModal({
  isApiModalOpen,
  setIsApiModalOpen,
  apiKey,
  setApiKey,
}: {
  isApiModalOpen: boolean;
  setIsApiModalOpen: React.Dispatch<React.SetStateAction<boolean>>;
  apiKey: string;
  setApiKey: React.Dispatch<React.SetStateAction<string>>;
}) {
  const [formError, setFormError] = useState(false);

  const handleResetKey = () => {
    if (!apiKey) {
      setFormError(true);
    } else {
      setFormError(false);
      setIsApiModalOpen(false);
    }
  };

  return (
    <div
      className={`${
        isApiModalOpen ? 'visible' : 'hidden'
      } absolute z-30 h-screen w-screen bg-gray-alpha`}
    >
      <div className="mx-auto mt-24 flex w-128 flex-col gap-4 rounded-lg bg-white p-6 shadow-lg">
        <p className="text-xl text-jet">OpenAI API Key</p>
        <p className="text-lg leading-5 text-gray-500">
          Before you can start using DocsGPT we need you to provide an API key
          for llm. Currently, we support only OpenAI but soon many more. You can
          find it here.
        </p>
        <input
          type="text"
          className="h-10 w-full border-b-2 border-jet focus:outline-none"
          value={apiKey}
          maxLength={100}
          placeholder="API Key"
          onChange={(e) => setApiKey(e.target.value)}
        />
        <div className="flex justify-between">
          {formError && (
            <p className="text-sm text-red-500">Please enter a valid API key</p>
          )}
          <button
            onClick={handleResetKey}
            className="ml-auto h-10 w-20 rounded-lg bg-violet-800 text-white transition-all hover:bg-violet-700"
          >
            Save
          </button>
        </div>
      </div>
    </div>
  );
}

frontend/src/components/DocsGPT.tsx (new file, 7 changes)
@@ -0,0 +1,7 @@
export default function DocsGPT({ isMenuOpen }: { isMenuOpen: boolean }) {
  return (
    <div className={`${isMenuOpen ? 'md:ml-72 lg:ml-96' : 'ml-16'}`}>
      Docs GPT Chat Placeholder
    </div>
  );
}

frontend/src/components/Navigation/Navigation.tsx (new file, 103 changes)
@@ -0,0 +1,103 @@
import React, { useState } from 'react';
import Arrow1 from './imgs/arrow.svg';
import Key from './imgs/key.svg';
import Info from './imgs/info.svg';
import Link from './imgs/link.svg';

function MobileNavigation() {
  return <div>Mobile Navigation</div>;
}

function DesktopNavigation({
  isMenuOpen,
  setIsMenuOpen,
  setIsApiModalOpen,
}: {
  isMenuOpen: boolean;
  setIsMenuOpen: React.Dispatch<React.SetStateAction<boolean>>;
  setIsApiModalOpen: React.Dispatch<React.SetStateAction<boolean>>;
}) {
  return (
    <div
      className={`${
        isMenuOpen ? 'w-72 lg:w-96' : 'w-16'
      } fixed flex h-screen flex-col border-r-2 border-gray-100 bg-gray-50 transition-all`}
    >
      <div
        className={`${
          isMenuOpen ? 'w-full' : 'w-16'
        } ml-auto h-16 border-b-2 border-gray-100`}
      >
        <button
          className="float-right mr-5 mt-5 h-5 w-5"
          onClick={() => setIsMenuOpen(!isMenuOpen)}
        >
          <img
            src={Arrow1}
            alt="menu toggle"
            className={`${
              isMenuOpen ? 'rotate-0' : 'rotate-180'
            } m-auto w-3 transition-all`}
          />
        </button>
      </div>

      {isMenuOpen && (
        <>
          <div className="flex-grow border-b-2 border-gray-100"></div>

          <div className="flex h-16 flex-col border-b-2 border-gray-100">
            <div
              className="my-auto mx-4 flex h-12 cursor-pointer gap-4 rounded-md hover:bg-gray-100"
              onClick={() => setIsApiModalOpen(true)}
            >
              <img src={Key} alt="key" className="ml-2 w-6" />
              <p className="my-auto text-eerie-black">Reset Key</p>
            </div>
          </div>

          <div className="flex h-48 flex-col border-b-2 border-gray-100">
            <div className="my-auto mx-4 flex h-12 cursor-pointer gap-4 rounded-md hover:bg-gray-100">
              <img src={Info} alt="info" className="ml-2 w-5" />
              <p className="my-auto text-eerie-black">About</p>
            </div>

            <div className="my-auto mx-4 flex h-12 cursor-pointer gap-4 rounded-md hover:bg-gray-100">
              <img src={Link} alt="link" className="ml-2 w-5" />
              <p className="my-auto text-eerie-black">Discord</p>
            </div>

            <div className="my-auto mx-4 flex h-12 cursor-pointer gap-4 rounded-md hover:bg-gray-100">
              <img src={Link} alt="link" className="ml-2 w-5" />
              <p className="my-auto text-eerie-black">Github</p>
            </div>
          </div>
        </>
      )}
    </div>
  );
}

export default function Navigation({
  isMobile,
  isMenuOpen,
  setIsMenuOpen,
  setIsApiModalOpen,
}: {
  isMobile: boolean;
  isMenuOpen: boolean;
  setIsMenuOpen: React.Dispatch<React.SetStateAction<boolean>>;
  setIsApiModalOpen: React.Dispatch<React.SetStateAction<boolean>>;
}) {
  if (isMobile) {
    return <MobileNavigation />;
  } else {
    return (
      <DesktopNavigation
        isMenuOpen={isMenuOpen}
        setIsMenuOpen={setIsMenuOpen}
        setIsApiModalOpen={setIsApiModalOpen}
      />
    );
  }
}

frontend/src/components/Navigation/PastChat.tsx (new file, 1 change)
@@ -0,0 +1 @@
export default function PastChat() {}

frontend/src/components/Navigation/imgs/arrow.svg (new file, 3 changes, 200 B)
@@ -0,0 +1,3 @@
<svg width="8" height="12" viewBox="0 0 8 12" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M7.41 10.59L2.83 6L7.41 1.41L6 0L0 6L6 12L7.41 10.59Z" fill="black" fill-opacity="0.54"/>
</svg>

frontend/src/components/Navigation/imgs/info.svg (new file, 3 changes, 273 B)
@@ -0,0 +1,3 @@
<svg width="20" height="20" viewBox="0 0 20 20" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M10 0C4.48 0 0 4.48 0 10C0 15.52 4.48 20 10 20C15.52 20 20 15.52 20 10C20 4.48 15.52 0 10 0ZM11 15H9V9H11V15ZM11 7H9V5H11V7Z" fill="black" fill-opacity="0.54"/>
</svg>

frontend/src/components/Navigation/imgs/key.svg (new file, 3 changes, 337 B)
@@ -0,0 +1,3 @@
<svg width="22" height="12" viewBox="0 0 22 12" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M11.65 4C10.83 1.67 8.61 0 6 0C2.69 0 0 2.69 0 6C0 9.31 2.69 12 6 12C8.61 12 10.83 10.33 11.65 8H16V12H20V8H22V4H11.65ZM6 8C4.9 8 4 7.1 4 6C4 4.9 4.9 4 6 4C7.1 4 8 4.9 8 6C8 7.1 7.1 8 6 8Z" fill="black" fill-opacity="0.54"/>
</svg>

frontend/src/components/Navigation/imgs/link.svg (new file, 3 changes, 293 B)
@@ -0,0 +1,3 @@
<svg width="18" height="18" viewBox="0 0 18 18" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M16 16H2V2H9V0H2C0.89 0 0 0.9 0 2V16C0 17.1 0.89 18 2 18H16C17.1 18 18 17.1 18 16V9H16V16ZM11 0V2H14.59L4.76 11.83L6.17 13.24L16 3.41V7H18V0H11Z" fill="black" fill-opacity="0.54"/>
</svg>

frontend/src/index.css (new file, 355 changes)
@@ -0,0 +1,355 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

/*! normalize.css v8.0.1 | MIT License | github.com/necolas/normalize.css */

/* Document
   ========================================================================== */

/**
 * 1. Correct the line height in all browsers.
 * 2. Prevent adjustments of font size after orientation changes in iOS.
 */

html {
  line-height: 1.15; /* 1 */
  -webkit-text-size-adjust: 100%; /* 2 */
}

/* Sections
   ========================================================================== */

/**
 * Remove the margin in all browsers.
 */

body {
  margin: 0;
}

/**
 * Render the `main` element consistently in IE.
 */

main {
  display: block;
}

/**
 * Correct the font size and margin on `h1` elements within `section` and
 * `article` contexts in Chrome, Firefox, and Safari.
 */

h1 {
  font-size: 2em;
  margin: 0.67em 0;
}

/* Grouping content
   ========================================================================== */

/**
 * 1. Add the correct box sizing in Firefox.
 * 2. Show the overflow in Edge and IE.
 */

hr {
  box-sizing: content-box; /* 1 */
  height: 0; /* 1 */
  overflow: visible; /* 2 */
}

/**
 * 1. Correct the inheritance and scaling of font size in all browsers.
 * 2. Correct the odd `em` font sizing in all browsers.
 */

pre {
  font-family: monospace, monospace; /* 1 */
  font-size: 1em; /* 2 */
}

/* Text-level semantics
   ========================================================================== */

/**
 * Remove the gray background on active links in IE 10.
 */

a {
  background-color: transparent;
}

/**
 * 1. Remove the bottom border in Chrome 57-
 * 2. Add the correct text decoration in Chrome, Edge, IE, Opera, and Safari.
 */

abbr[title] {
  border-bottom: none; /* 1 */
  text-decoration: underline; /* 2 */
  text-decoration: underline dotted; /* 2 */
}

/**
 * Add the correct font weight in Chrome, Edge, and Safari.
 */

b,
strong {
  font-weight: bolder;
}

/**
 * 1. Correct the inheritance and scaling of font size in all browsers.
 * 2. Correct the odd `em` font sizing in all browsers.
 */

code,
kbd,
samp {
  font-family: monospace, monospace; /* 1 */
  font-size: 1em; /* 2 */
}

/**
 * Add the correct font size in all browsers.
 */

small {
  font-size: 80%;
}

/**
 * Prevent `sub` and `sup` elements from affecting the line height in
 * all browsers.
 */

sub,
sup {
  font-size: 75%;
  line-height: 0;
  position: relative;
  vertical-align: baseline;
}

sub {
  bottom: -0.25em;
}

sup {
  top: -0.5em;
}

/* Embedded content
   ========================================================================== */

/**
 * Remove the border on images inside links in IE 10.
 */

img {
  border-style: none;
}

/* Forms
   ========================================================================== */

/**
 * 1. Change the font styles in all browsers.
 * 2. Remove the margin in Firefox and Safari.
 */

button,
input,
optgroup,
select,
textarea {
  font-family: inherit; /* 1 */
  font-size: 100%; /* 1 */
  line-height: 1.15; /* 1 */
  margin: 0; /* 2 */
}

/**
 * Show the overflow in IE.
 * 1. Show the overflow in Edge.
 */

button,
input {
  /* 1 */
  overflow: visible;
}

/**
 * Remove the inheritance of text transform in Edge, Firefox, and IE.
 * 1. Remove the inheritance of text transform in Firefox.
 */

button,
select {
  /* 1 */
  text-transform: none;
}

/**
 * Correct the inability to style clickable types in iOS and Safari.
 */

button,
[type='button'],
[type='reset'],
[type='submit'] {
  -webkit-appearance: button;
}

/**
 * Remove the inner border and padding in Firefox.
 */

button::-moz-focus-inner,
[type='button']::-moz-focus-inner,
[type='reset']::-moz-focus-inner,
[type='submit']::-moz-focus-inner {
  border-style: none;
  padding: 0;
}

/**
 * Restore the focus styles unset by the previous rule.
 */

button:-moz-focusring,
[type='button']:-moz-focusring,
[type='reset']:-moz-focusring,
[type='submit']:-moz-focusring {
  outline: 1px dotted ButtonText;
}

/**
 * Correct the padding in Firefox.
 */

fieldset {
  padding: 0.35em 0.75em 0.625em;
}

/**
 * 1. Correct the text wrapping in Edge and IE.
 * 2. Correct the color inheritance from `fieldset` elements in IE.
 * 3. Remove the padding so developers are not caught out when they zero out
 *    `fieldset` elements in all browsers.
 */

legend {
  box-sizing: border-box; /* 1 */
  color: inherit; /* 2 */
  display: table; /* 1 */
  max-width: 100%; /* 1 */
  padding: 0; /* 3 */
  white-space: normal; /* 1 */
}

/**
 * Add the correct vertical alignment in Chrome, Firefox, and Opera.
 */

progress {
  vertical-align: baseline;
}

/**
 * Remove the default vertical scrollbar in IE 10+.
 */

textarea {
  overflow: auto;
}

/**
 * 1. Add the correct box sizing in IE 10.
 * 2. Remove the padding in IE 10.
 */

[type='checkbox'],
[type='radio'] {
  box-sizing: border-box; /* 1 */
  padding: 0; /* 2 */
}

/**
 * Correct the cursor style of increment and decrement buttons in Chrome.
 */

[type='number']::-webkit-inner-spin-button,
[type='number']::-webkit-outer-spin-button {
  height: auto;
}

/**
 * 1. Correct the odd appearance in Chrome and Safari.
 * 2. Correct the outline style in Safari.
 */

[type='search'] {
  -webkit-appearance: textfield; /* 1 */
  outline-offset: -2px; /* 2 */
}

/**
 * Remove the inner padding in Chrome and Safari on macOS.
 */

[type='search']::-webkit-search-decoration {
  -webkit-appearance: none;
}

/**
 * 1. Correct the inability to style clickable types in iOS and Safari.
 * 2. Change font properties to `inherit` in Safari.
 */

::-webkit-file-upload-button {
  -webkit-appearance: button; /* 1 */
  font: inherit; /* 2 */
}

/* Interactive
   ========================================================================== */

/*
 * Add the correct display in Edge, IE 10+, and Firefox.
 */

details {
  display: block;
}

/*
 * Add the correct display in all browsers.
 */

summary {
  display: list-item;
}

/* Misc
   ========================================================================== */

/**
 * Add the correct display in IE 10+.
 */

template {
  display: none;
}

/**
 * Add the correct display in IE 10.
 */

[hidden] {
  display: none;
}

frontend/src/main.tsx (new file, 13 changes)
@@ -0,0 +1,13 @@
import React from 'react';
import ReactDOM from 'react-dom/client';
import App from './App';
import { BrowserRouter } from 'react-router-dom';
import './index.css';

ReactDOM.createRoot(document.getElementById('root') as HTMLElement).render(
  <React.StrictMode>
    <BrowserRouter>
      <App />
    </BrowserRouter>
  </React.StrictMode>,
);

frontend/src/vite-env.d.ts (vendored, new file, 1 change)
@@ -0,0 +1 @@
/// <reference types="vite/client" />

frontend/tailwind.config.cjs (new file, 18 changes)
@@ -0,0 +1,18 @@
/** @type {import('tailwindcss').Config} */
module.exports = {
  content: ['./index.html', './src/**/*.{js,ts,jsx,tsx}'],
  theme: {
    extend: {
      spacing: {
        112: '28rem',
        128: '32rem',
      },
      colors: {
        'eerie-black': '#212121',
        jet: '#343541',
        'gray-alpha': 'rgba(0,0,0, .1)',
      },
    },
  },
  plugins: [],
};

frontend/tsconfig.json (new file, 21 changes)
@@ -0,0 +1,21 @@
{
  "compilerOptions": {
    "target": "ESNext",
    "useDefineForClassFields": true,
    "lib": ["DOM", "DOM.Iterable", "ESNext"],
    "allowJs": false,
    "skipLibCheck": true,
    "esModuleInterop": false,
    "allowSyntheticDefaultImports": true,
    "strict": true,
    "forceConsistentCasingInFileNames": true,
    "module": "ESNext",
    "moduleResolution": "Node",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "noEmit": true,
    "jsx": "react-jsx"
  },
  "include": ["src"],
  "references": [{ "path": "./tsconfig.node.json" }]
}

frontend/tsconfig.node.json (new file, 9 changes)
@@ -0,0 +1,9 @@
{
  "compilerOptions": {
    "composite": true,
    "module": "ESNext",
    "moduleResolution": "Node",
    "allowSyntheticDefaultImports": true
  },
  "include": ["vite.config.ts"]
}

frontend/vite.config.ts (new file, 7 changes)
@@ -0,0 +1,7 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

// https://vitejs.dev/config/
export default defineConfig({
  plugins: [react()],
})

scripts/ingest.py (new file, 64 changes)
@@ -0,0 +1,64 @@
import sys
import nltk
import dotenv
import typer

from typing import List, Optional

from langchain.text_splitter import RecursiveCharacterTextSplitter

from parser.file.bulk import SimpleDirectoryReader
from parser.schema.base import Document
from parser.open_ai_func import call_openai_api, get_user_permission

dotenv.load_dotenv()

app = typer.Typer(add_completion=False)

nltk.download('punkt', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)

# Splits all files in specified folder to documents
@app.command()
def ingest(directory: Optional[str] = typer.Option("inputs",
                                                   help="Path to the directory for index creation."),
           files: Optional[List[str]] = typer.Option(None,
                                                     help="""File paths to use (Optional; overrides directory).
                                                     E.g. --files inputs/1.md --files inputs/2.md"""),
           recursive: Optional[bool] = typer.Option(True,
                                                    help="Whether to recursively search in subdirectories."),
           limit: Optional[int] = typer.Option(None,
                                               help="Maximum number of files to read."),
           formats: Optional[List[str]] = typer.Option([".rst", ".md"],
                                                       help="""List of required extensions (list with .)
                                                       Currently supported: .rst, .md, .pdf, .docx, .csv, .epub"""),
           exclude: Optional[bool] = typer.Option(True, help="Whether to exclude hidden files (dotfiles).")):
    """
    Creates index from specified location or files.
    By default /inputs folder is used, .rst and .md are parsed.
    """
    raw_docs = SimpleDirectoryReader(input_dir=directory, input_files=files, recursive=recursive,
                                     required_exts=formats, num_files_limit=limit,
                                     exclude_hidden=exclude).load_data()
    raw_docs = [Document.to_langchain_format(raw_doc) for raw_doc in raw_docs]
    print(raw_docs)
    # Here we split the documents, as needed, into smaller chunks.
    # We do this due to the context limits of the LLMs.
    text_splitter = RecursiveCharacterTextSplitter()
    docs = text_splitter.split_documents(raw_docs)

    # Here we check for command line arguments for bot calls.
    # If no argument exists or the permission_bypass_flag argument is not '-y',
    # user permission is requested to call the API.
    if len(sys.argv) > 1:
        permission_bypass_flag = sys.argv[1]
        if permission_bypass_flag == '-y':
            call_openai_api(docs)
        else:
            get_user_permission(docs)
    else:
        get_user_permission(docs)


if __name__ == "__main__":
    app()
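
A usage sketch for the command above (the `--directory` and `--formats` flag names follow from the Typer option definitions; `typer.testing.CliRunner` is used here so the snippet stays self-contained, but the command-line equivalent is `python ingest.py --directory inputs --formats .md --formats .rst`):

```python
from typer.testing import CliRunner

from ingest import app  # the Typer app defined above

runner = CliRunner()
# Equivalent to: python ingest.py --directory inputs --formats .md --formats .rst
result = runner.invoke(
    app, ["--directory", "inputs", "--formats", ".md", "--formats", ".rst"]
)
print(result.output)
```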

@@ -29,6 +29,18 @@ def convert_rst_to_txt(src_dir, dst_dir):
                f"-D source_suffix=.rst " \
                f"-C {dst_dir} "
            sphinx_main(args.split())
+       elif file.endswith(".md"):
+           # Rename the .md file to .rst file
+           src_file = os.path.join(root, file)
+           dst_file = os.path.join(root, file.replace(".md", ".rst"))
+           os.rename(src_file, dst_file)
+           # Convert the .rst file to .txt file using sphinx-build
+           args = f". -b text -D extensions=sphinx.ext.autodoc " \
+                  f"-D master_doc={dst_file} " \
+                  f"-D source_suffix=.rst " \
+                  f"-C {dst_dir} "
+           sphinx_main(args.split())


def num_tokens_from_string(string: str, encoding_name: str) -> int:
    # Function to convert string to tokens and estimate user cost.
scripts/parser/__init__.py (new file, 1 change)
@@ -0,0 +1 @@

scripts/parser/file/base.py (new file, 20 changes)
@@ -0,0 +1,20 @@
"""Base reader class."""
from abc import abstractmethod
from typing import Any, List

from langchain.docstore.document import Document as LCDocument

from parser.schema.base import Document


class BaseReader:
    """Utilities for loading data from a directory."""

    @abstractmethod
    def load_data(self, *args: Any, **load_kwargs: Any) -> List[Document]:
        """Load data from the input directory."""

    def load_langchain_documents(self, **load_kwargs: Any) -> List[LCDocument]:
        """Load data in LangChain document format."""
        docs = self.load_data(**load_kwargs)
        return [d.to_langchain_format() for d in docs]

scripts/parser/file/base_parser.py (new file, 38 changes)
@@ -0,0 +1,38 @@
"""Base parser and config class."""

from abc import abstractmethod
from pathlib import Path
from typing import Dict, List, Optional, Union


class BaseParser:
    """Base class for all parsers."""

    def __init__(self, parser_config: Optional[Dict] = None):
        """Init params."""
        self._parser_config = parser_config

    def init_parser(self) -> None:
        """Init parser and store it."""
        parser_config = self._init_parser()
        self._parser_config = parser_config

    @property
    def parser_config_set(self) -> bool:
        """Check if parser config is set."""
        return self._parser_config is not None

    @property
    def parser_config(self) -> Dict:
        """Return the parser config, raising if it has not been set."""
        if self._parser_config is None:
            raise ValueError("Parser config not set.")
        return self._parser_config

    @abstractmethod
    def _init_parser(self) -> Dict:
        """Initialize the parser with the config."""

    @abstractmethod
    def parse_file(self, file: Path, errors: str = "ignore") -> Union[str, List[str]]:
        """Parse file."""
scripts/parser/file/bulk.py (new file, 158 changes)
@@ -0,0 +1,158 @@
"""Simple reader that reads files of different formats from a directory."""
import logging
from pathlib import Path
from typing import Callable, Dict, List, Optional, Union

from parser.file.base import BaseReader
from parser.file.base_parser import BaseParser
from parser.file.docs_parser import DocxParser, PDFParser
from parser.file.epub_parser import EpubParser
from parser.file.markdown_parser import MarkdownParser
from parser.file.rst_parser import RstParser
from parser.file.tabular_parser import PandasCSVParser
from parser.schema.base import Document

DEFAULT_FILE_EXTRACTOR: Dict[str, BaseParser] = {
    ".pdf": PDFParser(),
    ".docx": DocxParser(),
    ".csv": PandasCSVParser(),
    ".epub": EpubParser(),
    ".md": MarkdownParser(),
    ".rst": RstParser(),
}


class SimpleDirectoryReader(BaseReader):
    """Simple directory reader.

    Can read files into separate documents, or concatenates
    files into one document text.

    Args:
        input_dir (str): Path to the directory.
        input_files (List): List of file paths to read (Optional; overrides input_dir)
        exclude_hidden (bool): Whether to exclude hidden files (dotfiles).
        errors (str): how encoding and decoding errors are to be handled,
            see https://docs.python.org/3/library/functions.html#open
        recursive (bool): Whether to recursively search in subdirectories.
            False by default.
        required_exts (Optional[List[str]]): List of required extensions.
            Default is None.
        file_extractor (Optional[Dict[str, BaseParser]]): A mapping of file
            extension to a BaseParser class that specifies how to convert that file
            to text. See DEFAULT_FILE_EXTRACTOR.
        num_files_limit (Optional[int]): Maximum number of files to read.
            Default is None.
        file_metadata (Optional[Callable[str, Dict]]): A function that takes
            in a filename and returns a Dict of metadata for the Document.
            Default is None.
    """

    def __init__(
        self,
        input_dir: Optional[str] = None,
        input_files: Optional[List] = None,
        exclude_hidden: bool = True,
        errors: str = "ignore",
        recursive: bool = True,
        required_exts: Optional[List[str]] = None,
        file_extractor: Optional[Dict[str, BaseParser]] = None,
        num_files_limit: Optional[int] = None,
        file_metadata: Optional[Callable[[str], Dict]] = None,
    ) -> None:
        """Initialize with parameters."""
        super().__init__()

        if not input_dir and not input_files:
            raise ValueError("Must provide either `input_dir` or `input_files`.")

        self.errors = errors

        self.recursive = recursive
        self.exclude_hidden = exclude_hidden
        self.required_exts = required_exts
        self.num_files_limit = num_files_limit

        if input_files:
            self.input_files = []
            for path in input_files:
                input_file = Path(path)
                self.input_files.append(input_file)
        elif input_dir:
            self.input_dir = Path(input_dir)
            self.input_files = self._add_files(self.input_dir)

        self.file_extractor = file_extractor or DEFAULT_FILE_EXTRACTOR
        self.file_metadata = file_metadata

    def _add_files(self, input_dir: Path) -> List[Path]:
        """Add files."""
        input_files = sorted(input_dir.iterdir())
        new_input_files = []
        dirs_to_explore = []
        for input_file in input_files:
            if input_file.is_dir():
                if self.recursive:
                    dirs_to_explore.append(input_file)
            elif self.exclude_hidden and input_file.name.startswith("."):
                continue
            elif (
                self.required_exts is not None
                and input_file.suffix not in self.required_exts
            ):
                continue
            else:
                new_input_files.append(input_file)

        for dir_to_explore in dirs_to_explore:
            sub_input_files = self._add_files(dir_to_explore)
            new_input_files.extend(sub_input_files)

        if self.num_files_limit is not None and self.num_files_limit > 0:
            new_input_files = new_input_files[0 : self.num_files_limit]

        # print total number of files added
        logging.debug(
            f"> [SimpleDirectoryReader] Total files added: {len(new_input_files)}"
        )

        return new_input_files

    def load_data(self, concatenate: bool = False) -> List[Document]:
        """Load data from the input directory.

        Args:
            concatenate (bool): whether to concatenate all files into one document.
                If set to True, file metadata is ignored.
                False by default.

        Returns:
            List[Document]: A list of documents.

        """
        data: Union[str, List[str]] = ""
        data_list: List[str] = []
        metadata_list = []
        for input_file in self.input_files:
            if input_file.suffix in self.file_extractor:
                parser = self.file_extractor[input_file.suffix]
                if not parser.parser_config_set:
                    parser.init_parser()
                data = parser.parse_file(input_file, errors=self.errors)
            else:
                # do standard read
                with open(input_file, "r", errors=self.errors) as f:
                    data = f.read()
            if isinstance(data, List):
                data_list.extend(data)
            else:
                data_list.append(str(data))
            if self.file_metadata is not None:
                metadata_list.append(self.file_metadata(str(input_file)))

        if concatenate:
            return [Document("\n".join(data_list))]
        elif self.file_metadata is not None:
            return [Document(d, extra_info=m) for d, m in zip(data_list, metadata_list)]
        else:
            return [Document(d) for d in data_list]
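
A minimal usage sketch (the paths are illustrative; the arguments match the constructor signature above):

```python
from parser.file.bulk import SimpleDirectoryReader

# Read every .md and .rst file under ./inputs, including subdirectories.
reader = SimpleDirectoryReader(
    input_dir="inputs",
    required_exts=[".md", ".rst"],
    recursive=True,
)
docs = reader.load_data()
print(f"Loaded {len(docs)} documents")
```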
scripts/parser/file/docs_parser.py (new file, 59 changes)
@@ -0,0 +1,59 @@
"""Docs parser.

Contains parsers for docx, pdf files.

"""
from pathlib import Path
from typing import Dict

from parser.file.base_parser import BaseParser


class PDFParser(BaseParser):
    """PDF parser."""

    def _init_parser(self) -> Dict:
        """Init parser."""
        return {}

    def parse_file(self, file: Path, errors: str = "ignore") -> str:
        """Parse file."""
        try:
            import PyPDF2
        except ImportError:
            raise ValueError("PyPDF2 is required to read PDF files.")
        text_list = []
        with open(file, "rb") as fp:
            # Create a PDF object
            pdf = PyPDF2.PdfReader(fp)

            # Get the number of pages in the PDF document
            num_pages = len(pdf.pages)

            # Iterate over every page
            for page in range(num_pages):
                # Extract the text from the page
                page_text = pdf.pages[page].extract_text()
                text_list.append(page_text)
        text = "\n".join(text_list)

        return text


class DocxParser(BaseParser):
    """Docx parser."""

    def _init_parser(self) -> Dict:
        """Init parser."""
        return {}

    def parse_file(self, file: Path, errors: str = "ignore") -> str:
        """Parse file."""
        try:
            import docx2txt
        except ImportError:
            raise ValueError("docx2txt is required to read Microsoft Word files.")

        text = docx2txt.process(file)

        return text

scripts/parser/file/epub_parser.py (new file, 43 changes)
@@ -0,0 +1,43 @@
"""Epub parser.

Contains parsers for epub files.
"""

from pathlib import Path
from typing import Dict

from parser.file.base_parser import BaseParser


class EpubParser(BaseParser):
    """Epub Parser."""

    def _init_parser(self) -> Dict:
        """Init parser."""
        return {}

    def parse_file(self, file: Path, errors: str = "ignore") -> str:
        """Parse file."""
        try:
            import ebooklib
            from ebooklib import epub
        except ImportError:
            raise ValueError("`EbookLib` is required to read Epub files.")
        try:
            import html2text
        except ImportError:
            raise ValueError("`html2text` is required to parse Epub files.")

        text_list = []
        book = epub.read_epub(file, options={"ignore_ncx": True})

        # Iterate through all chapters.
        for item in book.get_items():
            # Chapters are typically located in epub documents items.
            if item.get_type() == ebooklib.ITEM_DOCUMENT:
                text_list.append(
                    html2text.html2text(item.get_content().decode("utf-8"))
                )

        text = "\n".join(text_list)
        return text

130
scripts/parser/file/markdown_parser.py
Normal file
@@ -0,0 +1,130 @@
"""Markdown parser.

Contains parser for md files.

"""
import re
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union, cast

from parser.file.base_parser import BaseParser


class MarkdownParser(BaseParser):
    """Markdown parser.

    Extract text from markdown files.
    Returns a list of (header, text) tuples, pairing each header with the text under it.

    """

    def __init__(
        self,
        *args: Any,
        remove_hyperlinks: bool = True,
        remove_images: bool = True,
        # remove_tables: bool = True,
        **kwargs: Any,
    ) -> None:
        """Init params."""
        super().__init__(*args, **kwargs)
        self._remove_hyperlinks = remove_hyperlinks
        self._remove_images = remove_images
        # self._remove_tables = remove_tables

    def markdown_to_tups(self, markdown_text: str) -> List[Tuple[Optional[str], str]]:
        """Convert a markdown file to a list of (header, text) tuples.

        Each header is paired with the text under it.

        """
        markdown_tups: List[Tuple[Optional[str], str]] = []
        lines = markdown_text.split("\n")

        current_header = None
        current_text = ""

        for line in lines:
            header_match = re.match(r"^#+\s", line)
            if header_match:
                # Flush the previous section; skip empty sections so
                # consecutive headers do not emit blank entries.
                if current_header is not None and current_text:
                    markdown_tups.append((current_header, current_text))

                current_header = line
                current_text = ""
            else:
                current_text += line + "\n"
        markdown_tups.append((current_header, current_text))

        if current_header is not None:
            # pass linting, assert keys are defined
            markdown_tups = [
                (re.sub(r"#", "", cast(str, key)).strip(), re.sub(r"<.*?>", "", value))
                for key, value in markdown_tups
            ]
        else:
            markdown_tups = [
                (key, re.sub("\n", "", value)) for key, value in markdown_tups
            ]

        return markdown_tups

    def remove_images(self, content: str) -> str:
        """Remove image embeds of the ![[...]] form from markdown content."""
        pattern = r"!{1}\[\[(.*)\]\]"
        content = re.sub(pattern, "", content)
        return content

    # def remove_tables(self, content: str) -> List[List[str]]:
    #     """Convert markdown tables to nested lists."""
    #     table_rows_pattern = r"((\r?\n){2}|^)([^\r\n]*\|[^\r\n]*(\r?\n)?)+(?=(\r?\n){2}|$)"
    #     table_cells_pattern = r"([^\|\r\n]*)\|"
    #
    #     table_rows = re.findall(table_rows_pattern, content, re.MULTILINE)
    #     table_lists = []
    #     for row in table_rows:
    #         cells = re.findall(table_cells_pattern, row[2])
    #         cells = [cell.strip() for cell in cells if cell.strip()]
    #         table_lists.append(cells)
    #     return str(table_lists)

    def remove_hyperlinks(self, content: str) -> str:
        """Replace markdown hyperlinks with their link text."""
        pattern = r"\[(.*?)\]\((.*?)\)"
        content = re.sub(pattern, r"\1", content)
        return content

    def _init_parser(self) -> Dict:
        """Initialize the parser with the config."""
        return {}

    def parse_tups(
        self, filepath: Path, errors: str = "ignore"
    ) -> List[Tuple[Optional[str], str]]:
        """Parse file into tuples."""
        with open(filepath, "r") as f:
            content = f.read()
        if self._remove_hyperlinks:
            content = self.remove_hyperlinks(content)
        if self._remove_images:
            content = self.remove_images(content)
        # if self._remove_tables:
        #     content = self.remove_tables(content)
        markdown_tups = self.markdown_to_tups(content)
        return markdown_tups

    def parse_file(
        self, filepath: Path, errors: str = "ignore"
    ) -> Union[str, List[str]]:
        """Parse file into a list of strings."""
        tups = self.parse_tups(filepath, errors=errors)
        results = []
        # TODO: don't include headers right now
        for header, value in tups:
            if header is None:
                results.append(value)
            else:
                results.append(f"\n\n{header}\n{value}")
        return results
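A quick illustration of `markdown_to_tups` on a tiny document (hypothetical input; header markup and HTML tags are stripped from the output):

parser = MarkdownParser()
tups = parser.markdown_to_tups("# Title\nintro\n## Section\nbody")
# -> [("Title", "intro\n"), ("Section", "body\n")]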
171
scripts/parser/file/rst_parser.py
Normal file
@@ -0,0 +1,171 @@
"""reStructuredText parser.

Contains parser for rst files.

"""
import re
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union, cast

from parser.file.base_parser import BaseParser


class RstParser(BaseParser):
    """reStructuredText parser.

    Extract text from .rst files.
    Returns a list of (header, text) tuples, pairing each header with the text under it.

    """

    def __init__(
        self,
        *args: Any,
        remove_hyperlinks: bool = True,
        remove_images: bool = True,
        remove_table_excess: bool = True,
        remove_interpreters: bool = True,
        remove_directives: bool = True,
        remove_whitespaces_excess: bool = True,
        # Be careful with remove_characters_excess; it might cause data loss.
        remove_characters_excess: bool = True,
        **kwargs: Any,
    ) -> None:
        """Init params."""
        super().__init__(*args, **kwargs)
        self._remove_hyperlinks = remove_hyperlinks
        self._remove_images = remove_images
        self._remove_table_excess = remove_table_excess
        self._remove_interpreters = remove_interpreters
        self._remove_directives = remove_directives
        self._remove_whitespaces_excess = remove_whitespaces_excess
        self._remove_characters_excess = remove_characters_excess

    def rst_to_tups(self, rst_text: str) -> List[Tuple[Optional[str], str]]:
        """Convert a reStructuredText file to a list of (header, text) tuples.

        Each header is paired with the text under it.

        """
        rst_tups: List[Tuple[Optional[str], str]] = []
        lines = rst_text.split("\n")

        current_header = None
        current_text = ""

        for i, line in enumerate(lines):
            header_match = re.match(r"^[^\S\n]*[-=]+[^\S\n]*$", line)
            # A section header is an underline of "-" or "=" whose length
            # matches the title line directly above it.
            if (
                header_match
                and i > 0
                and len(lines[i - 1].strip()) == len(header_match.group().strip())
            ):
                if current_header is not None and current_text:
                    # Remove the next heading's title line from the current block.
                    if current_text.endswith(lines[i - 1] + "\n"):
                        current_text = current_text[:len(current_text) - len(lines[i - 1] + "\n")]
                    rst_tups.append((current_header, current_text))

                current_header = lines[i - 1]
                current_text = ""
            else:
                current_text += line + "\n"
        rst_tups.append((current_header, current_text))

        # TODO: Format for rst
        #
        # if current_header is not None:
        #     # pass linting, assert keys are defined
        #     rst_tups = [
        #         (re.sub(r"#", "", cast(str, key)).strip(), re.sub(r"<.*?>", "", value))
        #         for key, value in rst_tups
        #     ]
        # else:
        #     rst_tups = [
        #         (key, re.sub("\n", "", value)) for key, value in rst_tups
        #     ]

        if current_header is None:
            rst_tups = [
                (key, re.sub("\n", "", value)) for key, value in rst_tups
            ]
        return rst_tups

    def remove_images(self, content: str) -> str:
        """Remove ".. image::" lines."""
        pattern = r"\.\. image:: (.*)"
        content = re.sub(pattern, "", content)
        return content

    def remove_hyperlinks(self, content: str) -> str:
        """Replace external hyperlinks with their link text."""
        pattern = r"`(.*?) <(.*?)>`_"
        content = re.sub(pattern, r"\1", content)
        return content

    def remove_directives(self, content: str) -> str:
        """Removes reStructuredText directive markers such as ".. note::"."""
        pattern = r"\.\.([^:]+)::"
        content = re.sub(pattern, "", content)
        return content

    def remove_interpreters(self, content: str) -> str:
        """Removes reStructuredText interpreted text role markers such as ":ref:"."""
        pattern = r":(\w+):"
        content = re.sub(pattern, "", content)
        return content

    def remove_table_excess(self, content: str) -> str:
        """Remove grid table separator lines ("+----+----+")."""
        pattern = r"^\+[-]+\+[-]+\+$"
        content = re.sub(pattern, "", content, flags=re.MULTILINE)
        return content

    def remove_whitespaces_excess(self, content: List[Tuple[str, Any]]) -> List[Tuple[str, Any]]:
        """Collapse runs of 2 or more whitespace characters into a single space."""
        pattern = r"\s{2,}"
        content = [(key, re.sub(pattern, " ", value)) for key, value in content]
        return content

    def remove_characters_excess(self, content: List[Tuple[str, Any]]) -> List[Tuple[str, Any]]:
        """Truncate runs of 3 or more identical non-whitespace characters to exactly three."""
        pattern = r"(\S)\1{2,}"
        content = [(key, re.sub(pattern, r"\1\1\1", value, flags=re.MULTILINE)) for key, value in content]
        return content

    def _init_parser(self) -> Dict:
        """Initialize the parser with the config."""
        return {}

    def parse_tups(
        self, filepath: Path, errors: str = "ignore"
    ) -> List[Tuple[Optional[str], str]]:
        """Parse file into tuples."""
        with open(filepath, "r") as f:
            content = f.read()
        if self._remove_hyperlinks:
            content = self.remove_hyperlinks(content)
        if self._remove_images:
            content = self.remove_images(content)
        if self._remove_table_excess:
            content = self.remove_table_excess(content)
        if self._remove_directives:
            content = self.remove_directives(content)
        if self._remove_interpreters:
            content = self.remove_interpreters(content)
        rst_tups = self.rst_to_tups(content)
        if self._remove_whitespaces_excess:
            rst_tups = self.remove_whitespaces_excess(rst_tups)
        if self._remove_characters_excess:
            rst_tups = self.remove_characters_excess(rst_tups)
        return rst_tups

    def parse_file(
        self, filepath: Path, errors: str = "ignore"
    ) -> Union[str, List[str]]:
        """Parse file into a list of strings."""
        tups = self.parse_tups(filepath, errors=errors)
        results = []
        # TODO: don't include headers right now
        for header, value in tups:
            if header is None:
                results.append(value)
            else:
                results.append(f"\n\n{header}\n{value}")
        return results
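And the equivalent illustration for `rst_to_tups`, where a header is a title line followed by a matching "=" or "-" underline (hypothetical input, called directly so the later whitespace cleanup does not apply):

parser = RstParser()
tups = parser.rst_to_tups("Title\n=====\n\nbody text")
# -> [("Title", "\nbody text\n")]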
115
scripts/parser/file/tabular_parser.py
Normal file
@@ -0,0 +1,115 @@
"""Tabular parser.

Contains parsers for tabular data files.

"""
from pathlib import Path
from typing import Any, Dict, List, Union

from parser.file.base_parser import BaseParser


class CSVParser(BaseParser):
    """CSV parser.

    Args:
        concat_rows (bool): whether to concatenate all rows into one document.
            If set to False, a Document will be created for each row.
            True by default.

    """

    def __init__(self, *args: Any, concat_rows: bool = True, **kwargs: Any) -> None:
        """Init params."""
        super().__init__(*args, **kwargs)
        self._concat_rows = concat_rows

    def _init_parser(self) -> Dict:
        """Init parser."""
        return {}

    def parse_file(self, file: Path, errors: str = "ignore") -> Union[str, List[str]]:
        """Parse file.

        Returns:
            Union[str, List[str]]: a string or a List of strings.

        """
        try:
            import csv
        except ImportError:
            raise ValueError("csv module is required to read CSV files.")
        text_list = []
        with open(file, "r") as fp:
            csv_reader = csv.reader(fp)
            for row in csv_reader:
                text_list.append(", ".join(row))
        if self._concat_rows:
            return "\n".join(text_list)
        else:
            return text_list


class PandasCSVParser(BaseParser):
    r"""Pandas-based CSV parser.

    Parses CSVs using the separator detection from Pandas' `read_csv` function.
    If special parameters are required, use the `pandas_config` dict.

    Args:
        concat_rows (bool): whether to concatenate all rows into one document.
            If set to False, a Document will be created for each row.
            True by default.

        col_joiner (str): Separator to use for joining cols per row.
            Set to ", " by default.

        row_joiner (str): Separator to use for joining each row.
            Only used when `concat_rows=True`.
            Set to "\n" by default.

        pandas_config (dict): Options for the `pandas.read_csv` function call.
            Refer to https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
            for more information.
            Set to empty dict by default, which means pandas will try to figure
            out the separators, table head, etc. on its own.

    """

    def __init__(
        self,
        *args: Any,
        concat_rows: bool = True,
        col_joiner: str = ", ",
        row_joiner: str = "\n",
        pandas_config: dict = {},
        **kwargs: Any
    ) -> None:
        """Init params."""
        super().__init__(*args, **kwargs)
        self._concat_rows = concat_rows
        self._col_joiner = col_joiner
        self._row_joiner = row_joiner
        self._pandas_config = pandas_config

    def _init_parser(self) -> Dict:
        """Init parser."""
        return {}

    def parse_file(self, file: Path, errors: str = "ignore") -> Union[str, List[str]]:
        """Parse file."""
        try:
            import pandas as pd
        except ImportError:
            raise ValueError("pandas module is required to read CSV files.")

        df = pd.read_csv(file, **self._pandas_config)

        text_list = df.apply(
            lambda row: (self._col_joiner).join(row.astype(str).tolist()), axis=1
        ).tolist()

        if self._concat_rows:
            return (self._row_joiner).join(text_list)
        else:
            return text_list
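A usage sketch for both CSV parsers (hypothetical file; pandas is only needed for the second call):

rows = CSVParser(concat_rows=False).parse_file(Path("data.csv"))  # one string per row
text = PandasCSVParser(pandas_config={"sep": ";"}).parse_file(Path("data.csv"))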
72
scripts/parser/open_ai_func.py
Normal file
@@ -0,0 +1,72 @@
import faiss
import pickle
import tiktoken
from typing import Tuple
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings


def num_tokens_from_string(string: str, encoding_name: str) -> Tuple[int, float]:
    # Function to convert string to tokens and estimate user cost.
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    # Priced at $0.0004 per 1,000 tokens (OpenAI's ada embedding rate).
    total_price = ((num_tokens / 1000) * 0.0004)
    return num_tokens, total_price


def call_openai_api(docs):
    # Function to create a vector store from the documents and save it to disk.
    import time
    from tqdm import tqdm

    # Seed the store with the first document, then embed the rest one by one.
    docs_test = [docs[0]]
    # remove the first element from docs
    docs.pop(0)
    # cut first n docs if you want to restart
    # docs = docs[:n]
    c1 = 0
    store = FAISS.from_documents(docs_test, OpenAIEmbeddings())
    for i in tqdm(docs, desc="Embedding 🦖", unit="docs", total=len(docs), bar_format='{l_bar}{bar}| Time Left: {remaining}'):
        try:
            store.add_texts([i.page_content], metadatas=[i.metadata])
        except Exception as e:
            # On a failure (e.g. a rate limit), save progress, wait, and retry once.
            print(e)
            print("Error on ", i)
            print("Saving progress")
            print(f"stopped at {c1} out of {len(docs)}")
            faiss.write_index(store.index, "docs.index")
            store_index_bak = store.index
            # The raw FAISS index cannot be pickled, so detach it and save it separately.
            store.index = None
            with open("faiss_store.pkl", "wb") as f:
                pickle.dump(store, f)
            print("Sleeping for 60 seconds and trying again")
            time.sleep(60)
            faiss.write_index(store_index_bak, "docs.index")
            store.index = store_index_bak
            store.add_texts([i.page_content], metadatas=[i.metadata])
        c1 += 1

    faiss.write_index(store.index, "docs.index")
    store.index = None
    with open("faiss_store.pkl", "wb") as f:
        pickle.dump(store, f)


def get_user_permission(docs):
    # Function to ask user permission to call the OpenAI api and spend their OpenAI funds.
    # Here we convert the docs list to a string and calculate the number of OpenAI tokens the string represents.
    # docs_content = (" ".join(docs))
    docs_content = ""
    for doc in docs:
        docs_content += doc.page_content

    tokens, total_price = num_tokens_from_string(string=docs_content, encoding_name="cl100k_base")
    # Here we print the number of tokens and the approx user cost with some visually appealing formatting.
    print(f"Number of Tokens = {format(tokens, ',d')}")
    print(f"Approx Cost = ${format(total_price, ',.2f')}")
    # Here we check for user permission before calling the API; an empty answer counts as yes.
    user_input = input("Price Okay? (Y/N) \n").lower()
    if user_input == "y" or user_input == "":
        call_openai_api(docs)
    else:
        print("The API was not called. No money was spent.")
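For the reverse path, a hedged sketch of how the two artifacts written above could be loaded back, mirroring the detach/reattach trick used in `call_openai_api` (file names as written by that function):

import faiss
import pickle

index = faiss.read_index("docs.index")
with open("faiss_store.pkl", "rb") as f:
    store = pickle.load(f)
store.index = index  # reattach the raw FAISS index to the unpickled store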
35
scripts/parser/schema/base.py
Normal file
@@ -0,0 +1,35 @@
"""Base schema for readers."""
from dataclasses import dataclass

from langchain.docstore.document import Document as LCDocument

from parser.schema.schema import BaseDocument


@dataclass
class Document(BaseDocument):
    """Generic interface for a data document.

    This document connects to data sources.

    """

    def __post_init__(self) -> None:
        """Post init."""
        if self.text is None:
            raise ValueError("text field not set.")

    @classmethod
    def get_type(cls) -> str:
        """Get Document type."""
        return "Document"

    def to_langchain_format(self) -> LCDocument:
        """Convert struct to LangChain document format."""
        metadata = self.extra_info or {}
        return LCDocument(page_content=self.text, metadata=metadata)

    @classmethod
    def from_langchain_format(cls, doc: LCDocument) -> "Document":
        """Convert struct from LangChain document format."""
        return cls(text=doc.page_content, extra_info=doc.metadata)
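A round-trip sketch between the two document formats (values are hypothetical):

doc = Document(text="hello", extra_info={"source": "a.md"})
lc_doc = doc.to_langchain_format()             # LCDocument(page_content="hello", metadata={"source": "a.md"})
doc2 = Document.from_langchain_format(lc_doc)  # back to a parser Document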
64
scripts/parser/schema/schema.py
Normal file
@@ -0,0 +1,64 @@
"""Base schema for data structures."""
from abc import abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

from dataclasses_json import DataClassJsonMixin


@dataclass
class BaseDocument(DataClassJsonMixin):
    """Base document.

    Generic abstract interface that captures both index structs
    as well as documents.

    """

    # TODO: consolidate fields from Document/IndexStruct into base class
    text: Optional[str] = None
    doc_id: Optional[str] = None
    embedding: Optional[List[float]] = None

    # extra fields
    extra_info: Optional[Dict[str, Any]] = None

    @classmethod
    @abstractmethod
    def get_type(cls) -> str:
        """Get Document type."""

    def get_text(self) -> str:
        """Get text."""
        if self.text is None:
            raise ValueError("text field not set.")
        return self.text

    def get_doc_id(self) -> str:
        """Get doc_id."""
        if self.doc_id is None:
            raise ValueError("doc_id not set.")
        return self.doc_id

    @property
    def is_doc_id_none(self) -> bool:
        """Check if doc_id is None."""
        return self.doc_id is None

    def get_embedding(self) -> List[float]:
        """Get embedding.

        Errors if embedding is None.

        """
        if self.embedding is None:
            raise ValueError("embedding not set.")
        return self.embedding

    @property
    def extra_info_str(self) -> Optional[str]:
        """Extra info string."""
        if self.extra_info is None:
            return None

        return "\n".join([f"{k}: {str(v)}" for k, v in self.extra_info.items()])
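`extra_info_str` flattens the metadata dict into one `key: value` pair per line. For example, using the `Document` subclass above (hypothetical values):

doc = Document(text="hello", extra_info={"source": "a.md", "page": 3})
print(doc.extra_info_str)
# source: a.md
# page: 3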