Show HN: AutoShorts – Local, GPU-accelerated AI video pipeline for creators

Original link: https://github.com/divyaprakash0426/autoshorts

## AutoShorts: AI-Driven Gameplay Clip Generation

AutoShorts automatically creates viral-ready short videos from long-form gameplay recordings. Using AI scene analysis (via OpenAI or Google Gemini), it identifies engaging moments (action, fails, or highlights) and intelligently crops and renders them into vertical video (9:16 aspect ratio) optimized for platforms like TikTok and Reels. Key features include AI-driven subtitle generation (with customizable styles and optional AI captions) and AI voiceovers in 20+ languages via ChatterBox TTS, with emotion control and even voice cloning. The entire pipeline is GPU-accelerated, using NVENC for encoding and PyTorch for analysis. A robust fallback system keeps it running even when individual components fail. Installation is streamlined via Makefile or Docker (GPU required). Configuration options, detailed in the `.env` file, allow customizing the AI provider, analysis parameters, subtitle settings, and output preferences. AutoShorts prioritizes speed and reliability, giving content creators an automated, configurable workflow.

## AutoShorts: A Local AI Video Pipeline

Developer divyaprakash0426 built **AutoShorts**, a local, GPU-accelerated AI video pipeline for content creators who want an alternative to expensive, slow cloud AI tools. Built for a terminal-centric workflow (Arch/Nushell), it prioritizes hardware utilization and privacy. The pipeline uses **PyTorch & decord** for scene analysis (action density and spectral flux), **ChatterBox** for local text-to-speech, and **NVENC** for fast rendering, avoiding API costs and data-privacy concerns. The project is Dockerized, ships with a Makefile, and welcomes collaboration. Divyaprakash is specifically looking for contributions on **smart auto-zoom** (using YOLO/RT-DETR) and a **voice engine upgrade** (ChatterBox Turbo or NVIDIA TTS); feedback on the pipeline architecture is also welcome. Project on GitHub: [github.com/divyaprakash0426/autoshorts](https://github.com/divyaprakash0426/autoshorts)

Original text

Automatically generate viral-ready vertical short clips from long-form gameplay footage using AI-powered scene analysis, GPU-accelerated rendering, and optional AI voiceovers.

AutoShorts analyzes your gameplay videos to identify the most engaging moments—action sequences, funny fails, or highlight achievements—then automatically crops, renders, and adds subtitles or AI voiceovers to create ready-to-upload short-form content.

Badges: Python | PyTorch | CUDA | Docker | License: MIT


Here are some shorts automatically generated from gameplay footage:


🎯 AI-Powered Scene Analysis

  • Multi-Provider Support: Choose between OpenAI (GPT-5-mini, GPT-4o) or Google Gemini for scene analysis (see the call sketch after this list)
  • Semantic Analysis Modes:
    • action — Focus on intense combat/action moments
    • funny — Detect fail compilations and humorous moments
    • highlight — Find memorable achievements and clutch plays
    • mixed — Auto-detect the best category for each clip (recommended)
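
For a feel of how a provider call for this step can look, here is a minimal, illustrative sketch using the official OpenAI Python client with a text-only prompt. The clip description, prompt wording, and response handling are invented for this example and are not AutoShorts' actual implementation.

```python
import os
from openai import OpenAI  # official OpenAI Python client

# Illustrative only: AutoShorts' real prompts, inputs, and parsing differ.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
clip_description = "Player clutches a 1v3 on 5 HP while teammates scream on voice chat."

response = client.chat.completions.create(
    model=os.getenv("OPENAI_MODEL", "gpt-5-mini"),
    messages=[{
        "role": "user",
        "content": (
            "Classify this gameplay moment as action, funny, or highlight, "
            "and rate how engaging it is from 0 to 1:\n" + clip_description
        ),
    }],
)
print(response.choices[0].message.content)
```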

🎙️ Subtitle Generation

  • Speech Mode: Uses OpenAI Whisper to transcribe voice/commentary (see the sketch after this list)
  • AI Captions Mode: AI-generated contextual captions for gameplay without voice
  • Caption Styles: gaming, dramatic, funny, minimal, or auto
  • PyCaps Integration: Multiple visual templates including hype, retro-gaming, neo-minimal
  • AI Enhancement: Semantic tagging and emoji suggestions (e.g., "HEADSHOT! 💀🔥")
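
As a rough illustration of the speech mode, the snippet below uses the openai-whisper package to turn commentary into timestamped segments that a caption styler could render later. The file path and model size are placeholders, and the pipeline's actual transcription settings may differ.

```python
import whisper  # openai-whisper package

# Minimal sketch of the "speech" subtitle mode: transcribe commentary into
# timestamped segments.
model = whisper.load_model("base")               # model size is illustrative
result = model.transcribe("generated/clip.mp4")  # placeholder path

for seg in result["segments"]:
    print(f"{seg['start']:6.2f} - {seg['end']:6.2f}  {seg['text'].strip()}")
```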

🔊 AI Voiceover (ChatterBox TTS)

  • Local TTS Generation: No cloud API needed for voice synthesis
  • Emotion Control: Adjustable emotion/exaggeration levels for English
  • Multilingual Support: 20+ languages including Japanese, Korean, Chinese, Spanish, French, and more
  • Voice Cloning: Optional reference audio for custom voice styles
  • Smart Mixing: Automatic ducking of game audio when voiceover plays
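
Ducking of this kind can be expressed with FFmpeg's sidechaincompress filter. The command below is a hand-rolled illustration (filenames, thresholds, and filter settings are invented), not the exact mix AutoShorts performs.

```bash
# Compress the game audio whenever the voiceover is loud, then mix both tracks.
ffmpeg -i scene.mp4 -i voiceover.wav -filter_complex \
  "[1:a]asplit=2[vo][key]; \
   [0:a][key]sidechaincompress=threshold=0.05:ratio=8:attack=5:release=300[game]; \
   [game][vo]amix=inputs=2:duration=first[aout]" \
  -map 0:v -map "[aout]" -c:v copy -c:a aac ducked.mp4
```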

⚡ GPU-Accelerated Pipeline

  • Scene Detection: Custom implementation using decord + PyTorch on GPU
  • Audio Analysis: torchaudio on GPU for fast RMS and spectral flux calculation (sketched below)
  • Video Analysis: GPU streaming via decord for stable motion estimation
  • Image Processing: cupy (CUDA-accelerated NumPy) for blur and transforms
  • Rendering: PyTorch + NVENC hardware encoder for ultra-fast rendering
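
As a sketch of what GPU-side audio analysis like this can look like, the snippet below computes per-frame RMS and spectral flux with torchaudio. Window sizes, the file path, and the exact feature definitions are illustrative rather than the repository's actual code.

```python
import torch
import torchaudio

device = "cuda" if torch.cuda.is_available() else "cpu"
waveform, sr = torchaudio.load("gameplay/clip.wav")  # placeholder path
waveform = waveform.to(device)

# Magnitude spectrogram (power=1.0) computed on the GPU when available
spectrogram = torchaudio.transforms.Spectrogram(
    n_fft=2048, hop_length=512, power=1.0
).to(device)
spec = spectrogram(waveform)                # (channels, freq, frames)

# RMS energy per frame
frames = waveform.unfold(-1, 2048, 512)     # (channels, frames, 2048)
rms = frames.pow(2).mean(dim=-1).sqrt()

# Spectral flux: positive spectral change between consecutive frames
flux = (spec[..., 1:] - spec[..., :-1]).clamp(min=0).sum(dim=-2)
```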

📐 Smart Video Processing

  • Scenes ranked by combined action score (audio 0.6 + video 0.4 weights; see the sketch after this list)
  • Configurable aspect ratio (default 9:16 for TikTok/Shorts/Reels)
  • Smart cropping with optional blurred background for non-vertical footage
  • Retry logic during rendering to avoid spurious failures
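
The weighted ranking works out to something like the following toy example (the per-scene values are invented for illustration):

```python
# Audio is weighted 0.6 and video 0.4, as described above.
AUDIO_W, VIDEO_W = 0.6, 0.4

scenes = [  # illustrative per-scene activity scores in [0, 1]
    {"start": 12.0, "audio": 0.9, "video": 0.4},
    {"start": 73.5, "audio": 0.3, "video": 0.8},
]

for scene in scenes:
    scene["action"] = AUDIO_W * scene["audio"] + VIDEO_W * scene["video"]

scenes.sort(key=lambda s: s["action"], reverse=True)  # highest-scoring scenes first
print(scenes[0])  # the clip starting at 12.0s, with an action score of about 0.7
```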

🛡️ Robust Fallback System

AutoShorts is designed to work even when optimal components fail:

| Component | Primary | Fallback |
| --- | --- | --- |
| Video Encoding | NVENC (GPU) | libx264 (CPU) |
| Subtitle Rendering | PyCaps (styled) | FFmpeg burn-in (basic) |
| AI Analysis | OpenAI/Gemini API | Heuristic scoring (local) |
| TTS Device | CUDA (GPU) | CPU inference |
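
One way to implement the encoder fallback from the table is simply to retry the FFmpeg invocation with a CPU codec, as in this simplified sketch (the real pipeline's renderer and retry logic are more involved):

```python
import subprocess

def encode(src: str, dst: str) -> None:
    """Encode with NVENC if possible, otherwise fall back to libx264."""
    for codec in ("h264_nvenc", "libx264"):
        cmd = ["ffmpeg", "-y", "-i", src, "-c:v", codec, "-c:a", "copy", dst]
        if subprocess.run(cmd, capture_output=True).returncode == 0:
            return
    raise RuntimeError("both NVENC and libx264 encoding failed")

encode("gameplay/raw_scene.mp4", "generated/scene.mp4")  # illustrative paths
```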

Requirements

  • NVIDIA GPU with CUDA support (RTX series recommended for NVENC + TTS)
  • NVIDIA Drivers compatible with CUDA 12.x
  • Python 3.10
  • FFmpeg 4.4.2 (for Decord compatibility)
  • CUDA Toolkit with nvcc (for building Decord from source)
  • System libraries: libgl1, libglib2.0-0

Option 1: Makefile Installation (Recommended)

The Makefile handles everything automatically—environment creation, dependency installation, and building Decord with CUDA support.

git clone https://github.com/divyaprakash0426/autoshorts.git
cd autoshorts

# Run the installer (uses conda/micromamba automatically)
make install

# Activate the environment
overlay use .venv/bin/activate.nu    # For Nushell
# OR
source .venv/bin/activate            # For Bash/Zsh

The Makefile will:

  1. Download micromamba if conda/mamba is not found
  2. Create a Python 3.10 environment with FFmpeg 4.4.2
  3. Install NV Codec Headers for NVENC support
  4. Build Decord from source with CUDA enabled
  5. Install all pip requirements

Option 2: Docker (GPU Required)

Prerequisite: NVIDIA Container Toolkit must be installed.

# Build the image
docker build -t autoshorts .

# Run with GPU access
docker run --rm \
    --gpus all \
    -v $(pwd)/gameplay:/app/gameplay \
    -v $(pwd)/generated:/app/generated \
    --env-file .env \
    autoshorts

Note: The --gpus all flag is essential for NVENC and CUDA acceleration.


Copy .env.example to .env and configure:

Key Configuration Options

| Category | Variable | Description |
| --- | --- | --- |
| AI Provider | AI_PROVIDER | openai, gemini, or local |
| | AI_ANALYSIS_ENABLED | Enable/disable AI scene analysis |
| | OPENAI_MODEL | Model for analysis (e.g., gpt-5-mini) |
| | AI_SCORE_WEIGHT | How much to weight AI vs heuristic (0.0-1.0) |
| Semantic Analysis | SEMANTIC_GOAL | action, funny, highlight, or mixed |
| | CANDIDATE_CLIP_COUNT | Number of clips to analyze |
| Subtitles | ENABLE_SUBTITLES | Enable subtitle generation |
| | SUBTITLE_MODE | speech (Whisper), ai_captions, or none |
| | CAPTION_STYLE | gaming, dramatic, funny, minimal, auto |
| | PYCAPS_TEMPLATE | Visual template for captions |
| TTS Voiceover | ENABLE_TTS | Enable ChatterBox voiceover |
| | TTS_LANGUAGE | Language code (e.g., en, ja, es) |
| | TTS_EMOTION_LEVEL | Emotion intensity or auto |
| Video Output | TARGET_RATIO_W/H | Aspect ratio (default 9:16) |
| | SCENE_LIMIT | Max clips per source video |
| | MIN/MAX_SHORT_LENGTH | Clip duration bounds (seconds) |

See .env.example for the complete list with detailed descriptions.
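
As an illustration, a minimal .env might combine a few of the variables from the table above. The values below are examples, not the project's defaults; check .env.example for the exact names and accepted values.

```
AI_PROVIDER=gemini
AI_ANALYSIS_ENABLED=true
SEMANTIC_GOAL=mixed
ENABLE_SUBTITLES=true
SUBTITLE_MODE=ai_captions
CAPTION_STYLE=auto
ENABLE_TTS=true
TTS_LANGUAGE=en
TARGET_RATIO_W=9
TARGET_RATIO_H=16
SCENE_LIMIT=3
```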


Usage

  1. Place source videos in the gameplay/ directory

  2. Run the script:

  3. Generated clips are saved to generated/

generated/
├── video_name scene-0.mp4          # Rendered short clip
├── video_name scene-0_sub.json     # Subtitle data
├── video_name scene-0.ffmpeg.log   # Render log
├── video_name scene-1.mp4
└── ...

Development

pip install ruff
ruff check .

Tests mock GPU availability and can run in standard CI environments.
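
The repo's tests aren't reproduced here, but mocking GPU availability in pytest typically looks something like the sketch below (the helper and test names are made up for illustration):

```python
import torch

def pick_device() -> str:
    """Illustrative helper standing in for the pipeline's device selection."""
    return "cuda" if torch.cuda.is_available() else "cpu"

def test_falls_back_to_cpu_without_gpu(monkeypatch):
    # Pretend no GPU is present, as a CPU-only CI runner would report.
    monkeypatch.setattr(torch.cuda, "is_available", lambda: False)
    assert pick_device() == "cpu"
```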

For faster iteration during development, you can skip expensive steps using these environment variables in your .env:

| Variable | Description |
| --- | --- |
| DEBUG_SKIP_ANALYSIS=1 | Skip AI scene analysis (uses cached/heuristic scores) |
| DEBUG_SKIP_RENDER=1 | Skip video rendering (useful for testing analysis only) |
| DEBUG_RENDERED_CLIPS="path1:category,path2" | Test with specific pre-rendered clips |

Example workflow for testing subtitles only:

# In .env
DEBUG_SKIP_ANALYSIS=1
DEBUG_SKIP_RENDER=1
DEBUG_RENDERED_CLIPS="generated/test_clip.mp4:action"

Troubleshooting

| Issue | Solution |
| --- | --- |
| "CUDA not available" | Ensure --gpus all (Docker) or the CUDA toolkit is installed |
| NVENC error | Falls back to libx264 automatically; check the GPU driver |
| PyCaps fails | Falls back to FFmpeg burn-in subtitles automatically |
| Decord EOF hang | Increase DECORD_EOF_RETRY_MAX or set DECORD_SKIP_TAIL_FRAMES=300 |
| API rate limits | Switch to gpt-5-mini (10M free tokens/day) or use the local provider |

This project builds upon the excellent work of:


This project is licensed under the MIT License.

"Buy Me A Coffee"
