skills: add video-quote-finder with timestamp links

2026-03-07 22:44:16 +00:00 · 2026-02-16 22:39:09 +02:00
parent 89ce1460e1
commit 61726a2fbd
4 changed files with 153 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -262,6 +262,14 @@ ClawHub is a minimal skill registry. With ClawHub enabled, the agent can search

 [ClawHub](https://clawhub.com)

+### Example skill: video-quote-finder
+
+Use `skills/video-quote-finder` to locate where a quote appears in a YouTube video and return timestamp links.
+
+Original prompt:
+
+> "OK I want to make a PR with this skill back to openclaw... make a new one that will search for a point on the video and give me the timestamp. For example I want to find the timestamp in this video where peter says 'I think vibe coding is a slur' https://youtu.be/YFjfBk8HI5o?si=DTT2nVt0HQ4dSIoV"
+
 ## Chat commands

 Send these in WhatsApp/Telegram/Slack/Google Chat/Microsoft Teams/WebChat (group commands are owner-only):
--- a/skills/video-quote-finder/SKILL.md
+++ b/skills/video-quote-finder/SKILL.md
@@ -0,0 +1,35 @@
+---
+name: video-quote-finder
+description: Find where a quote appears in a YouTube video and return timestamped links. Use when users ask "where in this video does X say Y", "find the timestamp for this line", or "locate quote in this YouTube video".
+---
+
+# Video Quote Finder
+
+Find quote timestamps in YouTube videos using the `summarize` CLI transcript extraction with timestamps.
+
+## Quick start
+
+```bash
+python3 skills/video-quote-finder/scripts/find_quote_timestamp.py \
+  "https://youtu.be/YFjfBk8HI5o" \
+  "I think vibe coding is a slur"
+```
+
+## Workflow
+
+1. Extract transcript with timestamps via `summarize --extract --timestamps`.
+2. Score transcript lines against the requested quote.
+3. Return best match + top alternatives.
+4. Include direct YouTube links with `t=<seconds>`.
+
+## Output format
+
+- `best_match` timestamp + line + score
+- `best_link` with timestamp
+- up to 5 candidate timestamps with links
+
+## Notes
+
+- Requires `summarize` CLI (`@steipete/summarize`) in PATH.
+- Works best when YouTube captions are available.
+- If no exact match is found, uses fuzzy matching and suggests alternatives.
--- a/skills/video-quote-finder/references/usage.md
+++ b/skills/video-quote-finder/references/usage.md
@@ -0,0 +1,15 @@
+# Usage
+
+## Find timestamp for a quote
+
+```bash
+python3 skills/video-quote-finder/scripts/find_quote_timestamp.py \
+  "https://youtu.be/YFjfBk8HI5o?si=DTT2nVt0HQ4dSIoV" \
+  "I think vibe coding is a slur"
+```
+
+## Tips
+
+- Start with exact quote text.
+- If no match, use a distinctive 3-8 word fragment.
+- Prefer phrase fragments unlikely to repeat frequently.
--- a/skills/video-quote-finder/scripts/find_quote_timestamp.py
+++ b/skills/video-quote-finder/scripts/find_quote_timestamp.py
@@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+import argparse
+import re
+import subprocess
+import sys
+from difflib import SequenceMatcher
+
+TS_LINE = re.compile(r"^\[(\d{1,2}:\d{2}(?::\d{2})?)\]\s*(.*)$")
+
+
+def ts_to_seconds(ts: str) -> int:
+    parts = [int(x) for x in ts.split(':')]
+    if len(parts) == 2:
+        m, s = parts
+        return m * 60 + s
+    h, m, s = parts
+    return h * 3600 + m * 60 + s
+
+
+def with_timestamp_url(url: str, ts: str) -> str:
+    sec = ts_to_seconds(ts)
+    joiner = '&' if '?' in url else '?'
+    return f"{url}{joiner}t={sec}s"
+
+
+def run_extract(url: str) -> str:
+    cmd = ["summarize", url, "--extract", "--timestamps"]
+    p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
+    if p.returncode != 0:
+        raise RuntimeError(p.stderr.strip() or "summarize failed")
+    return p.stdout
+
+
+def normalize(s: str) -> str:
+    return re.sub(r"\s+", " ", s.lower()).strip()
+
+
+def score(quote: str, line: str) -> float:
+    q = normalize(quote)
+    l = normalize(line)
+    if not q or not l:
+        return 0.0
+    if q in l:
+        return 1.0
+
+    q_words = set(q.split())
+    l_words = set(l.split())
+    overlap = len(q_words & l_words) / max(1, len(q_words))
+    ratio = SequenceMatcher(None, q, l).ratio()
+    return 0.6 * overlap + 0.4 * ratio
+
+
+def find_matches(text: str, quote: str):
+    matches = []
+    for line in text.splitlines():
+        m = TS_LINE.match(line)
+        if not m:
+            continue
+        ts, body = m.group(1), m.group(2)
+        s = score(quote, body)
+        if s >= 0.35:
+            matches.append((s, ts, body))
+    matches.sort(key=lambda x: x[0], reverse=True)
+    return matches[:5]
+
+
+def main():
+    ap = argparse.ArgumentParser(description="Find quote timestamp in YouTube transcript")
+    ap.add_argument("url")
+    ap.add_argument("quote")
+    args = ap.parse_args()
+
+    try:
+        text = run_extract(args.url)
+        matches = find_matches(text, args.quote)
+    except Exception as e:
+        print(f"ERROR: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    if not matches:
+        print("No matches found. Try a shorter quote fragment.")
+        sys.exit(2)
+
+    best = matches[0]
+    best_link = with_timestamp_url(args.url, best[1])
+    print(f"best_match: [{best[1]}] score={best[0]:.2f} :: {best[2]}")
+    print(f"best_link: {best_link}")
+    print("candidates:")
+    for s, ts, body in matches:
+        print(f"- [{ts}] score={s:.2f} :: {body}")
+        print(f"  link: {with_timestamp_url(args.url, ts)}")
+
+
+if __name__ == "__main__":
+    main()