docs: add missing feature docs for Matrix E2EE thumbnails, LINE media, and CJK memory

- Matrix: note encrypted thumbnail behavior in E2EE rooms (#54711)
- LINE: add outbound media section for image/video/audio sends (#45826)
- Memory: document CJK trigram tokenization and chunk sizing
This commit is contained in:
Vincent Koc
2026-03-29 17:04:33 +09:00
parent 3aac43e30b
commit f897aba69a
3 changed files with 13 additions and 0 deletions

View File

@@ -422,6 +422,7 @@ Notes:
- `vectorWeight` + `textWeight` is normalized to 1.0 in config resolution, so weights behave as percentages.
- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
- If FTS5 can't be created, we keep vector-only search (no hard failure).
- **CJK support**: FTS5 uses configurable trigram tokenization with a short-substring fallback so Chinese, Japanese, and Korean text is searchable without breaking mixed-length queries. CJK-heavy text is also weighted correctly during chunk size estimation, and surrogate-pair characters are preserved during fine splits.
This isn't "IR-theory perfect", but it's simple, fast, and tends to improve recall/precision on real notes.
If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization