Alex
|
f61d112cea
|
feat: process pdfs synthetically if model does not support file natively (#2263)
* feat: process pdfs synthetically if model does not support file natively
* fix: small code optimisations
|
2026-01-15 02:30:33 +02:00 |
|
Alex
|
2c55c6cd9a
|
fix: tiktoken import in markdown parser
|
2026-01-12 23:04:20 +00:00 |
|
Alex
|
df57053613
|
feat: improve crawlers and update chunk filtering (#2250)
|
2026-01-06 00:52:12 +02:00 |
|
Alex
|
05c835ed02
|
feat: enable OCR for docling when parsing attachments and update file extractor (#2246)
|
2025-12-31 02:08:49 +02:00 |
|
Alex
|
9e7f1ad1c0
|
Add Amazon S3 support and synchronization features (#2244)
* Add Amazon S3 support and synchronization features
* refactor: remove unused variable in load_data test
|
2025-12-30 20:26:51 +02:00 |
|
Alex
|
aef3e0b4bb
|
chore: update workflow permissions and fix paths in settings (#2227)
* chore: update workflow permissions and fix paths in settings
* dep
* dep upgrades
|
2025-12-25 14:26:01 +02:00 |
|
Alex
|
98e949d2fd
|
Patches (#2218)
* feat: implement URL validation to prevent SSRF
* feat: add zip extraction security
* ruff fixes
|
2025-12-24 17:05:35 +02:00 |
|
Alex
|
ccd29b7d4e
|
feat: implement Docling parsers (#2202)
* feat: implement Docling parsers
* fix office
* docling-ocr-fix
* Docling smart ocr
* ruff fix
---------
Co-authored-by: Pavel <pabin@yandex.ru>
|
2025-12-23 18:33:51 +02:00 |
|
Alex
|
e0a9f08632
|
refactor and deps (#2184)
|
2025-12-10 23:53:59 +02:00 |
|
Harshit Ranjan
|
695191d888
|
added error saving vector store (#2081)
* added error saving vector store
* fixed code formatting
* added tests for embedding pipeline
|
2025-10-31 16:29:35 +02:00 |
|
Alex
|
03452ffd9f
|
feat: add GitHub access token support and fix file content fetching logic (#2032)
|
2025-10-07 16:53:14 +03:00 |
|
ManishMadan2882
|
2d0e97b66d
|
(feat:oauth) provider as state, not args
|
2025-09-25 04:55:25 +05:30 |
|
ManishMadan2882
|
d317f6473d
|
(feat:gdrive) upload files only
|
2025-09-22 20:19:56 +05:30 |
|
ManishMadan2882
|
da2f8477e6
|
(feat:drive) oauth for drive.file scope, picker
|
2025-09-17 19:37:01 +05:30 |
|
ManishMadan2882
|
7896526f19
|
(feat:load_files) search feature
|
2025-09-05 10:35:23 +05:30 |
|
ManishMadan2882
|
f7f6042579
|
(feat:connector) paginate files
|
2025-09-04 07:58:12 +05:30 |
|
ManishMadan2882
|
7e2cbdd88c
|
(feat:connector) redirect url as backend overhead
|
2025-09-03 09:57:13 +05:30 |
|
ManishMadan2882
|
f9b2c95695
|
(feat:connector) sync, simply re-ingest
|
2025-09-02 18:06:04 +05:30 |
|
ManishMadan2882
|
384ad3e0ac
|
(feat:connector) raw sync flow
|
2025-09-02 13:34:31 +05:30 |
|
ManishMadan2882
|
f39ac9945f
|
(feat:auth) follow connector-session
|
2025-08-28 00:53:19 +05:30 |
|
GH Action - Upstream Sync
|
f08067a161
|
Merge branch 'main' of https://github.com/arc53/DocsGPT
|
2025-08-27 01:36:38 +00:00 |
|
Alex
|
545caacfa3
|
feat: prevent NUL character ingestion failures
|
2025-08-26 23:30:57 +01:00 |
|
ManishMadan2882
|
578c68205a
|
(feat:connectors) abstracting auth, base class
|
2025-08-26 02:46:36 +05:30 |
|
ManishMadan2882
|
f09f1433a9
|
(feat:connectors) separate layer
|
2025-08-26 01:38:36 +05:30 |
|
ManishMadan2882
|
2410bd8654
|
(fix:driveLoader) folder ingesting
|
2025-08-22 19:07:52 +05:30 |
|
ManishMadan2882
|
92d6ae54c3
|
(fix:google-oauth) no explicit datetime compare
|
2025-08-22 13:35:03 +05:30 |
|
ManishMadan2882
|
8c3f75e3e2
|
(feat:ingestion) google drive loader
|
2025-08-22 13:32:40 +05:30 |
|
ManishMadan2882
|
b2b04268e9
|
(feat:drive) oauth flow
|
2025-08-21 02:46:32 +05:30 |
|
GH Action - Upstream Sync
|
9903fad1e9
|
Merge branch 'main' of https://github.com/arc53/DocsGPT
|
2025-08-07 01:55:18 +00:00 |
|
Alex
|
9281fac898
|
fix: improve error logging for index creation and add PARSE_IMAGE_REMOTE setting
|
2025-08-06 10:40:20 +01:00 |
|
ManishMadan2882
|
ba260e3382
|
(fix:faiss) not save tmp dir
|
2025-08-06 02:53:39 +05:30 |
|
ManishMadan2882
|
1356d71839
|
(lint) ruff fix
|
2025-08-05 15:37:39 +05:30 |
|
ManishMadan2882
|
a61e44d175
|
(feat:dir_tree) improvement
|
2025-08-02 01:48:43 +05:30 |
|
ManishMadan2882
|
c92d778894
|
(feat:chunker) do not combine text
|
2025-07-31 02:13:55 +05:30 |
|
ManishMadan2882
|
bbce872ac5
|
(fix:chunker) combine metadata as well
|
2025-07-04 02:19:58 +05:30 |
|
ManishMadan2882
|
0f7ebcd8e4
|
(feat:dir-reader) store mime types, file size in db
|
2025-07-03 18:09:19 +05:30 |
|
ManishMadan2882
|
82fc19e7b7
|
(fix:dir-reader) conflict of same filename in dir
|
2025-07-03 17:28:12 +05:30 |
|
ManishMadan2882
|
2ef23fe1b3
|
(feat:dir-reader) maintain dir structure in db
|
2025-07-03 01:24:22 +05:30 |
|
ManishMadan2882
|
fd905b1a06
|
(feat:dir-reader) save tokens with filenames
|
2025-07-02 16:30:29 +05:30 |
|
ManishMadan2882
|
e1aa2cc0b8
|
(fix:ingestion) store file name as metadata, not path
|
2025-05-09 02:26:35 +05:30 |
|
Alex
|
481df4d604
|
fix: enhance error logging with exception info across multiple modules
|
2025-05-05 13:12:39 +01:00 |
|
Pavel
|
57a6fb31b2
|
periodic header injection
|
2025-03-31 22:28:04 +04:00 |
|
asminkarki012
|
c70be12bfd
|
fix[csv_parser]:missing header
|
2025-03-28 22:46:11 +05:45 |
|
Alex
|
d47232246a
|
fix: remove old pypdf
|
2025-02-06 19:59:42 +00:00 |
|
Pavel
|
fddee69f92
|
web loader fix
Changes web loader to the correct output.
|
2025-01-17 19:13:23 +03:00 |
|
Pavel
|
13fcbe3e74
|
scraper with markdownify
|
2025-01-15 01:08:09 +03:00 |
|
Alex
|
41b4c28430
|
fix: linting
|
2024-12-23 17:41:44 +00:00 |
|
Pavel
|
b41a989051
|
test version
|
2024-12-23 16:59:27 +00:00 |
|
GH Action - Upstream Sync
|
628f83172a
|
Merge branch 'main' of https://github.com/arc53/DocsGPT
|
2024-11-22 01:25:17 +00:00 |
|
Alex
|
a0a05b676f
|
Merge pull request #1303 from jayantp2003/bugfix/859-large-zip-breaking-stream-endpoint
Bugfix/859 large zip breaking stream endpoint
|
2024-11-21 17:34:21 +00:00 |
|