Show HN: Gemini can now natively embed video, so I built sub-second video search

Original link: https://github.com/ssrajadh/sentrysearch

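## SentrySearch: semantic search over dashcam footage

SentrySearch lets you search dashcam video quickly using natural language. It splits videos into chunks, embeds each chunk directly as video with Google's Gemini Embedding model, and stores the embeddings in a local ChromaDB database. The user types a query (for example, "red truck running a stop sign"); the query is embedded too and matched against the video embeddings. The top match is automatically trimmed and saved as a clip.

**Key features:**

* **Direct video embedding:** no transcription or captioning is needed; Gemini processes the video pixels directly.
* **Cost optimization:** preprocessing (downscaling to 480p at 5 fps) cuts upload size, and still-frame skipping reduces API cost (indexing 1 hour costs about $2.50).
* **Easy setup:** clone the GitHub repository ([https://github.com/ssrajadh/sentrysearch](https://github.com/ssrajadh/sentrysearch)), install the dependencies, and provide a Gemini API key.
* **Customizable:** chunk duration, overlap, and preprocessing can all be adjusted.

SentrySearch supports MP4 video and relies on a heuristic for still-frame detection. Gemini Embedding 2 is currently in preview, so API behavior and pricing may change; smarter chunking is listed as future work.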

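## Gemini and sub-second video search: summary

A developer (sohamrj) built a command-line tool that uses Google's Gemini Embedding 2 to search video footage in under a second. Gemini can now convert raw video directly into 768-dimensional vectors, bypassing traditional approaches such as transcription or frame captioning. A natural-language query (for example, "green car overtaking") can therefore be compared directly against video content.

The tool indexes footage into ChromaDB, then uses natural-language search to trim the matching clip automatically. Initial tests on dashcam footage put the indexing cost at about $2.50 per hour, which skipping static frames can reduce.

The discussion highlighted both the potential and the concerns. The technique offers powerful search for applications such as security, dashcam review, and possibly content moderation, but it also raises significant privacy questions. The concerns center on pervasive surveillance, with AI continuously indexing and analyzing video streams and triggering on specific individuals or activities. The developer acknowledged these concerns and hopes that open-source, local models can address the privacy problem. Alternatives to Gemini are also being explored.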

Original text

Semantic search over dashcam footage. Type what you're looking for, get a trimmed clip back.

SentrySearch splits your dashcam videos into overlapping chunks, embeds each chunk directly as video using Google's Gemini Embedding model, and stores the vectors in a local ChromaDB database. When you search, your text query is embedded into the same vector space and matched against the stored video embeddings. The top match is automatically trimmed from the original file and saved as a clip.
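
As a rough illustration of the chunking step, the sketch below computes overlapping time windows for a video of known duration. The function name `chunk_windows` and the way it handles the final partial window are assumptions made for illustration, not SentrySearch's actual code; the 30-second chunk and 5-second overlap defaults mirror the CLI options documented further down.

```python
def chunk_windows(duration_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) windows in seconds covering a video of duration_s.

    Hypothetical helper: consecutive windows start (chunk_s - overlap_s)
    seconds apart, and the last window is clipped to the video's end.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    step = chunk_s - overlap_s
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the video is fully covered
        start += step
    return windows


if __name__ == "__main__":
    for start, end in chunk_windows(60):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

With these defaults, a 60-second video yields three windows, and each boundary region appears in two neighbouring chunks.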

  1. Clone and install:
git clone https://github.com/ssrajadh/sentrysearch.git
cd sentrysearch
python -m venv venv && source venv/bin/activate
pip install -e .
  2. Set up your API key:
sentrysearch init

This prompts for your Gemini API key, writes it to .env, and validates it with a test embedding.

  3. Index your footage:
sentrysearch index /path/to/dashcam/footage
  4. Search:
sentrysearch search "red truck running a stop sign"

ffmpeg is required for video chunking and trimming. If you don't have it system-wide, the bundled imageio-ffmpeg is used automatically.

Manual setup: If you prefer not to use sentrysearch init, you can copy .env.example to .env and add your key from aistudio.google.com/apikey manually.

$ sentrysearch init
Enter your Gemini API key (get one at https://aistudio.google.com/apikey): ****
Validating API key...
Setup complete. You're ready to go — run `sentrysearch index <directory>` to get started.

If a key is already configured, you'll be asked whether to overwrite it.

$ sentrysearch index /path/to/dashcam/footage
Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 1/4]
Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 2/4]
...
Indexed 12 new chunks from 3 files. Total: 12 chunks from 3 files.

Options:

  • --chunk-duration 30 — seconds per chunk
  • --overlap 5 — overlap between chunks
  • --no-preprocess — skip downscaling/frame rate reduction (send raw chunks)
  • --target-resolution 480 — target height in pixels for preprocessing
  • --target-fps 5 — target frame rate for preprocessing
  • --no-skip-still — embed all chunks, even ones with no visual change

$ sentrysearch search "red truck running a stop sign"
  #1 [0.87] front_2024-01-15_14-30.mp4 @ 02:15-02:45
  #2 [0.74] left_2024-01-15_14-30.mp4 @ 02:10-02:40
  #3 [0.61] front_2024-01-20_09-15.mp4 @ 00:30-01:00

Saved clip: ./match_front_2024-01-15_14-30_02m15s-02m45s.mp4

Options: --results N, --output-dir DIR, --no-trim to skip auto-trimming.
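
The trimming step can be pictured as a single ffmpeg call. The sketch below only builds the argument list and the output filename; it does not run anything. The helper names are hypothetical, the filename pattern is inferred from the sample output above, and stream copy (`-c copy`) is an assumption about how the clip might be cut, not a statement about what SentrySearch does internally.

```python
from pathlib import Path


def _mmss(seconds):
    """Format a second count as e.g. 02m15s (hypothetical helper)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}m{secs:02d}s"


def clip_name(source, start_s, end_s):
    """Build an output name in the style of the sample 'Saved clip' line."""
    return f"match_{Path(source).stem}_{_mmss(start_s)}-{_mmss(end_s)}.mp4"


def trim_command(source, start_s, end_s, output_dir="."):
    """Return an ffmpeg argument list that cuts [start_s, end_s] from source."""
    out_path = Path(output_dir) / clip_name(source, start_s, end_s)
    return [
        "ffmpeg", "-y",             # overwrite an existing output file
        "-ss", str(start_s),        # seek to the start of the match
        "-i", str(source),
        "-t", str(end_s - start_s), # clip length in seconds
        "-c", "copy",               # assumption: copy streams without re-encoding
        str(out_path),
    ]


if __name__ == "__main__":
    print(" ".join(trim_command("front_2024-01-15_14-30.mp4", 135, 165)))
```

Stream copy is fast but can only cut on keyframes, so the clip boundaries may be off by a fraction of a second; re-encoding would be the more precise alternative.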

$ sentrysearch stats
Total chunks:  47
Source files:  12

Add --verbose to the index or search command for debug info (embedding dimensions, API response times, similarity scores).

Gemini Embedding 2 can natively embed video — raw video pixels are projected into the same 768-dimensional vector space as text queries. There's no transcription, no frame captioning, no text middleman. A text query like "red truck at a stop sign" is directly comparable to a 30-second video clip at the vector level. This is what makes sub-second semantic search over hours of footage practical.
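
To make "directly comparable at the vector level" concrete, here is a minimal sketch that ranks chunk vectors against a query vector by cosine similarity. It uses toy 3-dimensional vectors in place of real 768-dimensional embeddings. The function names are hypothetical, and cosine similarity is an assumed metric; in SentrySearch the lookup is handled by ChromaDB, whose configured distance function is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, top_n=3):
    """Return the top_n (chunk_id, score) pairs, best match first."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy vectors standing in for video-chunk embeddings.
    chunks = {
        "front@02:15": [1.0, 0.0, 0.0],
        "left@02:10": [1.0, 1.0, 0.0],
        "front@00:30": [0.0, 1.0, 0.0],
    }
    query = [1.0, 0.0, 0.0]  # stands in for the embedded text query
    for chunk_id, score in rank_chunks(query, chunks):
        print(f"[{score:.2f}] {chunk_id}")
```

A brute-force scan like this is linear in the number of chunks, which is already fast for a few thousand vectors; a vector database adds persistence and indexing on top.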

Indexing 1 hour of footage costs ~$2.50 with Gemini's embedding API (default settings: 30s chunks, 5s overlap). The API bills by video duration, so this cost is driven by the number of chunks, not file size.
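
For a back-of-envelope feel for how chunk settings move that figure, the sketch below counts the seconds of video submitted for one hour of footage and scales the ~$2.50 baseline proportionally. This is an assumption-laden estimate: it presumes cost is linear in embedded seconds, ignores still-frame skipping, and uses hypothetical function names. It is not derived from Gemini's actual price list.

```python
def embedded_seconds(duration_s, chunk_s=30.0, overlap_s=5.0):
    """Total seconds of video sent for embedding, counting overlap twice."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s
    total = 0.0
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        total += end - start
        if end >= duration_s:
            break
        start += step
    return total


def scaled_cost(duration_s, chunk_s, overlap_s, baseline_cost_per_hour=2.50):
    """Scale the README's default-settings figure to other chunk settings.

    Assumes cost is proportional to embedded seconds (an assumption).
    """
    baseline = embedded_seconds(3600.0)  # defaults: 30 s chunks, 5 s overlap
    return baseline_cost_per_hour * embedded_seconds(duration_s, chunk_s, overlap_s) / baseline


if __name__ == "__main__":
    print(embedded_seconds(3600.0))                  # 4315.0 with the defaults
    print(round(scaled_cost(3600.0, 60.0, 5.0), 2))  # one hour, 60 s chunks
```

Under these assumptions, overlap is the only overhead: with the defaults, an hour of footage is submitted as 4315 seconds of video, so shrinking the overlap or lengthening chunks trims the bill only modestly.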

Two built-in optimizations are on by default, and they help in different ways:

  • Preprocessing (on by default) — chunks are downscaled to 480p at 5fps before embedding. This reduces upload size and token count but does not reduce the number of API calls, so it primarily improves speed rather than cost.
  • Still-frame skipping (on by default) — chunks with no meaningful visual change (e.g. a parked car) are skipped entirely. This saves real API calls and directly reduces cost. The savings depend on your footage — Sentry Mode recordings with hours of idle time benefit the most, while action-packed driving footage may have nothing to skip.
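
The preprocessing step maps naturally onto ffmpeg's `scale` and `fps` filters. The sketch below builds such a command without running it. The helper name is hypothetical and the exact filter chain SentrySearch uses is not shown in this README, so treat it as one plausible way to get 480p at 5 fps.

```python
def preprocess_command(source, dest, target_height=480, target_fps=5):
    """Return an ffmpeg argument list that downscales and reduces frame rate.

    scale=-2:H keeps the aspect ratio and rounds the width to an even number,
    which most video encoders require.
    """
    video_filter = f"scale=-2:{target_height},fps={target_fps}"
    return [
        "ffmpeg", "-y",       # overwrite an existing output file
        "-i", str(source),
        "-vf", video_filter,  # apply the downscale and frame-rate filters
        str(dest),
    ]


if __name__ == "__main__":
    print(" ".join(preprocess_command("chunk_001.mp4", "chunk_001_small.mp4")))
```

The two parameters correspond to the --target-resolution and --target-fps options.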

The cost of search queries is negligible (text embedding only).

Tuning options:

  • --chunk-duration / --overlap — longer chunks with less overlap = fewer API calls = lower cost
  • --no-skip-still — embed every chunk even if nothing is happening
  • --target-resolution / --target-fps — adjust preprocessing quality
  • --no-preprocess — send raw chunks to the API

Limitations & Future Work

  • Still-frame detection is heuristic — it uses JPEG file size comparison across sampled frames. It may occasionally skip chunks with subtle motion or embed chunks that are truly static. Disable with --no-skip-still if you need every chunk indexed.
  • Search quality depends on chunk boundaries — if an event spans two chunks, the overlapping window helps but isn't perfect. Smarter chunking (e.g. scene detection) could improve this.
  • Gemini Embedding 2 is in preview — API behavior and pricing may change.
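
The still-frame heuristic can be sketched as follows. The intuition, presumably, is that frames from a static scene compress to nearly the same JPEG size, while motion changes the size. The function takes byte sizes of already-encoded sampled frames; the name `looks_still`, the spread measure, and the 2% tolerance are all assumptions made for illustration, not the project's actual rule or threshold.

```python
def looks_still(jpeg_sizes, tolerance=0.02):
    """Guess whether a chunk is static from the JPEG sizes of sampled frames.

    Returns True when the spread between the largest and smallest frame is
    within `tolerance` of the mean size. With fewer than two frames there is
    nothing to compare, so the chunk is kept (returns False).
    """
    if len(jpeg_sizes) < 2:
        return False
    mean_size = sum(jpeg_sizes) / len(jpeg_sizes)
    if mean_size == 0:
        return False
    spread = (max(jpeg_sizes) - min(jpeg_sizes)) / mean_size
    return spread <= tolerance


if __name__ == "__main__":
    parked = [50_000, 50_200, 49_900]   # near-identical frames
    driving = [50_000, 61_000, 47_000]  # scene content keeps changing
    print(looks_still(parked), looks_still(driving))
```

A size-based check like this is cheap but blind to motion that leaves compressed size unchanged, which is the failure mode the first limitation above describes.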

This works with any footage in mp4 format, not just Tesla Sentry Mode. The directory scanner recursively finds all .mp4 files regardless of folder structure.
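
A recursive scan of this kind takes only a few lines with `pathlib`. The sketch below is an illustration with a hypothetical function name, not the project's scanner; note that the `*.mp4` pattern is case-sensitive on most Linux filesystems, so files named `.MP4` would need an extra pattern.

```python
from pathlib import Path


def find_mp4_files(root):
    """Return every .mp4 file under root, at any depth, in sorted order."""
    return sorted(p for p in Path(root).rglob("*.mp4") if p.is_file())


if __name__ == "__main__":
    for path in find_mp4_files("."):
        print(path)
```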

  • Python 3.10+
  • ffmpeg on PATH, or use bundled ffmpeg via imageio-ffmpeg (installed by default)
  • Gemini API key (get one free at aistudio.google.com/apikey)