14 Commits

Author SHA1 Message Date
Alex
2c55c6cd9a fix: tiktoken import in markdown parser 2026-01-12 23:04:20 +00:00
Alex
df57053613 feat: improve crawlers and update chunk filtering (#2250) 2026-01-06 00:52:12 +02:00
Alex
9e7f1ad1c0 Add Amazon S3 support and synchronization features (#2244)
* Add Amazon S3 support and synchronization features

* refactor: remove unused variable in load_data test
2025-12-30 20:26:51 +02:00
Alex
98e949d2fd Patches (#2218)
* feat: implement URL validation to prevent SSRF

* feat: add zip extraction security

* ruff fixes
2025-12-24 17:05:35 +02:00
Alex
e0a9f08632 refactor and deps (#2184) 2025-12-10 23:53:59 +02:00
Harshit Ranjan
695191d888 added error saving vector store (#2081)
* added error saving vector store

* fixed code formating

* added tests for embedding pipeline
2025-10-31 16:29:35 +02:00
Alex
03452ffd9f feat: add GitHub access token support and fix file content fetching logic (#2032) 2025-10-07 16:53:14 +03:00
ManishMadan2882
7d8ed2d102 ruff-fix 2025-10-01 02:23:53 +05:30
ManishMadan2882
402d5e054b (fix:test_markdown) patch tokenizer 2025-09-30 13:35:03 +05:30
ManishMadan2882
e24a0ac686 (test:parsers) github, reddit 2025-09-29 20:33:05 +05:30
ManishMadan2882
8c91b1c527 (tests:parsers) remote 2025-09-29 19:39:24 +05:30
ManishMadan2882
2b38f80d04 (test:files) epub, image, rst 2025-09-29 17:39:20 +05:30
ManishMadan2882
282bd35f52 (test:files) html, md, json 2025-09-29 17:23:15 +05:30
ManishMadan2882
ba496a772b (test) doc parsers coverage 2025-09-26 16:07:12 +05:30