Show HN: DeepDream for Video with Temporal Consistency

Original link: https://github.com/jeremicna/deepdream-video-pytorch

This project is a PyTorch implementation of DeepDream extended with video processing. It builds on the original neural-dream and adds **optical flow estimation** and **occlusion masking** to produce temporally consistent, smooth DeepDream videos. The core functionality lives in `video_dream.py`, which uses RAFT optical flow to warp the previous frame's dream into the current frame, minimizing flicker and keeping hallucinations attached to moving objects. Occlusion masks prevent "ghosting" where objects overlap. Key features include an adjustable blend between the raw video and the dreamed frames, an option to disable temporal consistency and process frames independently, and control over temporary file handling. It supports the standard DeepDream parameters such as layer selection, octaves, and iteration count; thanks to temporal consistency, a single iteration per frame is recommended for video. Installation requires PyTorch, torchvision, OpenCV, NumPy, and Pillow (via `pip install -r requirements.txt`). Models (e.g., Inception/GoogLeNet) must be downloaded separately with `python models/download_models.py`. The project also documents fixes for out-of-memory errors and slow processing, recommending a smaller image size, the cudnn backend, and fewer iterations.

## DeepDream for Video with Temporal Consistency - Summary

A developer forked a PyTorch implementation of DeepDream and added video support, producing smoother DeepDream videos with far less flicker. The project uses optical flow to keep hallucinations consistent from frame to frame and occlusion masks to prevent ghosting. It runs on GPU, CPU, and Apple Silicon and supports several pretrained image classifiers such as GoogLeNet.

The Hacker News discussion debated the technique's potential and artistic value. Some compared the visuals to psychedelic experiences, while others found them unsettling. One long comment recounted a filmmaker's years of experimenting with AI tools, from early DeepDream attempts to current projects using Stable Diffusion and Luma Dream Machine, highlighting both excitement and initial skepticism from industry peers.

The conversation also touched on AI's future in filmmaking: whether it will democratize high-quality production or merely lead to stylistic homogenization. Commenters worried that AI could devalue artistic intent and blur the line between imitation and genuine creative vision. Some pointed to Corridor Crew as an example of professionals embracing AI, while others criticized them as acting more like "reactors" than VFX artists.

Original text

This is a fork of neural-dream, a PyTorch implementation of DeepDream. This fork introduces optical flow estimation and occlusion masking to apply DeepDream to videos with temporal consistency.

  • Standard DeepDream: The original single-image implementation.
  • Video DeepDream: New CLI (video_dream.py) that uses RAFT Optical Flow to warp previous dream frames into the current frame, ensuring smooth transitions and object tracking.
  • Occlusion Masking: Automatically detects when objects move in front of one another to prevent "ghosting" artifacts.
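
The warping and masking steps described above can be pictured with torchvision's RAFT implementation. The sketch below is illustrative only and is not the repository's actual code; the frame handling in `video_dream.py` may differ, and the forward-backward consistency check is just one common way to build an occlusion mask.

```python
# Illustrative sketch of flow-based warping with an occlusion mask (not the repo's code).
# Assumes torchvision >= 0.13 (raft_small + Raft_Small_Weights); frames are
# (1, 3, H, W) float tensors in [0, 1] with H and W divisible by 8.
import torch
import torch.nn.functional as F
from torchvision.models.optical_flow import raft_small, Raft_Small_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
weights = Raft_Small_Weights.DEFAULT
raft = raft_small(weights=weights).eval().to(device)

def estimate_flow(src, dst):
    """Flow field mapping pixels of `src` to `dst`, shape (1, 2, H, W)."""
    a, b = weights.transforms()(src, dst)            # RAFT-specific normalization
    with torch.no_grad():
        return raft(a.to(device), b.to(device))[-1]  # last refinement iteration

def warp(image, flow):
    """Sample `image` at locations displaced by `flow` (backward warping)."""
    _, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=flow.device),
                            torch.arange(w, device=flow.device), indexing="ij")
    grid = torch.stack((xs, ys)).float() + flow[0]   # absolute pixel coordinates
    grid[0] = 2.0 * grid[0] / (w - 1) - 1.0          # normalize x to [-1, 1]
    grid[1] = 2.0 * grid[1] / (h - 1) - 1.0          # normalize y to [-1, 1]
    grid = grid.permute(1, 2, 0).unsqueeze(0)        # (1, H, W, 2) for grid_sample
    return F.grid_sample(image.to(flow.device), grid, align_corners=True)

def occlusion_mask(flow_fwd, flow_bwd, tol=1.0):
    """Forward-backward consistency: 1 where flow is reliable, 0 where occluded."""
    err = (flow_fwd + warp(flow_bwd, flow_fwd)).norm(dim=1, keepdim=True)
    return (err < tol).float()                        # tol in pixels (assumed threshold)

# Dummy frames stand in for real video frames.
prev_frame = torch.rand(1, 3, 360, 640)
curr_frame = torch.rand(1, 3, 360, 640)
prev_dream = torch.rand(1, 3, 360, 640)

# Per frame: warp the previous dreamed frame into the current frame, then keep it
# only where the flow is consistent (blending with the raw frame is sketched later).
flow_c2p = estimate_flow(curr_frame, prev_frame)      # current -> previous
flow_p2c = estimate_flow(prev_frame, curr_frame)      # previous -> current
warped_dream = warp(prev_dream, flow_c2p)
mask = occlusion_mask(flow_c2p, flow_p2c)
```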

With temporal consistency

mallard_demo.mp4
highway_demo.mp4

With frames processed independently

mallard_independent_demo.mp4
highway_independent_demo.mp4
mallard.mp4
highway.mp4

This project requires the following key packages:

  • PyTorch
  • torchvision
  • OpenCV
  • NumPy
  • Pillow

Install Dependencies:

pip install -r requirements.txt

Download Models: Run the download script to fetch the standard Inception/GoogLeNet models:

python models/download_models.py

To download all compatible models:

python models/download_models.py -models all-caffe-googlenet

To dream on a video, use the video_dream.py script. This wrapper accepts specific video arguments and any argument accepted by the standard image dreamer (e.g., layers, octaves, iterations).

Basic Video Command:

python video_dream.py -content_video input.mp4 -output_video output.mp4 -num_iterations 1

Note: For video processing, we recommend using -num_iterations 1. The temporal consistency from optical flow means each frame builds on the previous dream, so fewer iterations per frame are needed compared to single images.

1. Video-Specific Arguments

  • -content_video: Path to the source video file. Default is input.mp4.
  • -output_video: Path where the final video will be saved. Default is output.mp4.
  • -blend: Mix ratio (0.0 - 1.0) between the raw video frame and the warped previous dream; higher values (closer to 1.0) use more of the raw frame, lower values (closer to 0.0) preserve more of the previous hallucinations. Default is 0.5. A sketch of this blend follows the list.
  • -independent: Flag; if set, disables temporal consistency (optical flow) and every frame is dreamed on independently (causes flickering). Disabled by default.
  • -update_interval: Update the output video file on disk every N frames, allowing you to preview progress while running. Default is 5.
  • -temp_dir: Directory to store extracted frames, flow data, and masks during processing. Default is temp.
  • -keep_temp: Flag; if set, the temporary directory is not deleted after processing finishes. Disabled by default.
  • -verbose: Flag; enable detailed logs (prints DeepDream iteration logs for every frame). Disabled by default.
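
The effect of -blend can be pictured as a simple per-pixel mix. This is only an illustration of the parameter's direction (higher keeps more of the raw frame), not the repository's exact compositing; the occlusion-mask fallback shown here is an assumption.

```python
# Illustrative only: a linear per-pixel mix consistent with how -blend is described.
# raw_frame: current video frame; warped_dream: previous dream warped by optical flow;
# mask: occlusion mask (1 = flow reliable, 0 = occluded). All tensors in [0, 1].
def blend_frames(raw_frame, warped_dream, mask, blend=0.5):
    mixed = blend * raw_frame + (1.0 - blend) * warped_dream
    # Where the flow is unreliable, fall back to the raw frame (assumed behavior).
    return mask * mixed + (1.0 - mask) * raw_frame
```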

2. Standard DeepDream Arguments

All of the following arguments are from the single frame implementation, and you can mix and match any of these with the video-specific arguments above. Refer to neural-dream for more information on single frame parameters.

Example combining video and standard args:

python video_dream.py -content_video test.mp4 -dream_layers inception_4d -num_iterations 1 -octave_scale 0.7 -image_size 512

For single image processing only:

python neural_dream.py -content_image <image.jpg> -dream_layers inception_4c -num_iterations 10

Note: Paths to images should not contain the ~ character; use relative or absolute paths.

  • -image_size: Maximum side length (in pixels) of the generated image. Default is 512.
  • -gpu: Zero-indexed ID of the GPU to use; for CPU mode set -gpu to c; for MPS mode (Apple Silicon) set -gpu to mps.
  • -dream_weight: How much to weight DeepDream. Default is 1e3.
  • -tv_weight: Weight of total-variation (TV) regularization; helps smooth the image. Default 0.
  • -l2_weight: Weight of latent state regularization. Default 0.
  • -num_iterations: Number of iterations. Default is 10. For video, use 1 (temporal consistency reduces the need for multiple iterations per frame).
  • -init: Initialization method: image (content image) or random (noise). Default image.
  • -jitter: Apply jitter to image. Default 32.
  • -layer_sigma: Apply gaussian blur to image. Default 0 (disabled).
  • -optimizer: lbfgs or adam. Default adam.
  • -learning_rate: Learning rate (step size). Default 1.5.
  • -normalize_weights: Divide dream weights by the number of channels.
  • -loss_mode: Loss mode: bce, mse, mean, norm, or l2. Default l2.
  • -output_image: Name of the output image. Default out.png.
  • -output_start_num: Number to start output image names at. Default 1.
  • -print_iter: Print progress every N iterations.
  • -save_iter: Save image every N iterations.
  • -dream_layers: Comma-separated list of layer names to use.
  • -channels: Comma-separated list of channels to use.
  • -channel_mode: Selection mode: all, strong, avg, weak, or ignore.
  • -channel_capture: once or octave_iter.
  • -num_octaves: Number of octaves per iteration. Default 4.
  • -octave_scale: Resize value. Default 0.6.
  • -octave_iter: Iterations (steps) per octave. Default 50.
  • -octave_mode: normal, advanced, manual_max, manual_min, or manual.
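
To make the octave-related options above concrete, here is a generic multi-scale DeepDream loop. It is a simplified sketch, not neural-dream's implementation: `layer_activation` is a hypothetical callable returning the activations of the layer chosen with -dream_layers, and the objective is a plain L2-style response.

```python
# Simplified sketch of the octave loop (not neural-dream's actual code).
# `layer_activation(x)` is a hypothetical callable returning the activations of the
# selected layer for an image batch x of shape (1, C, H, W).
import torch
import torch.nn.functional as F

def dream_octaves(img, layer_activation, num_octaves=4, octave_scale=0.6,
                  octave_iter=50, lr=1.5):
    h, w = img.shape[-2:]
    # Smallest size first: with 4 octaves and scale 0.6 the sizes are roughly
    # 0.6^3, 0.6^2, 0.6 and 1.0 times the original resolution.
    sizes = [(max(1, round(h * octave_scale ** i)), max(1, round(w * octave_scale ** i)))
             for i in reversed(range(num_octaves))]
    detail = torch.zeros((1, img.shape[1]) + sizes[0], device=img.device)
    for size in sizes:
        base = F.interpolate(img, size=size, mode="bilinear", align_corners=False)
        detail = F.interpolate(detail, size=size, mode="bilinear", align_corners=False)
        x = (base + detail).clone().requires_grad_(True)
        for _ in range(octave_iter):
            loss = layer_activation(x).pow(2).mean()   # maximize the layer response
            loss.backward()
            with torch.no_grad():
                g = x.grad
                x += lr * g / (g.abs().mean() + 1e-8)  # normalized gradient ascent step
                x.grad.zero_()
        detail = (x - base).detach()                    # carry hallucinated detail upward
    return (base + detail).detach()
```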

Laplacian Pyramid and Other Options

  • -lap_scale: Number of layers in laplacian pyramid. Default 0 (disabled).
  • -sigma: Strength of gaussian blur in pyramids. Default 1.
  • -zoom: Amount to zoom in.
  • -zoom_mode: percentage or pixel.
  • -tile_size: Desired tile size. Default 0 (disabled).
  • -overlap_percent: Percentage of overlap for tiles. Default 50.
  • -original_colors: Set to 1 to keep content image colors.
  • -model_file: Path to .pth file. Default is VGG-19.
  • -model_type: caffe, pytorch, keras, or auto.
  • -backend: nn, cudnn, openmp, or mkl.
  • -cudnn_autotune: Use built-in cuDNN autotuner (slower start, faster run).

Frequently Asked Questions

Problem: The program runs out of memory (OOM).
Solution:

  1. Reduce -image_size (e.g., to 512 or 256).
  2. If using GPU, use -backend cudnn.
  3. For video: Reduce the input video resolution before processing.
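
For example, a lower-memory video run combining these suggestions (file names are placeholders):

python video_dream.py -content_video input.mp4 -output_video output.mp4 -num_iterations 1 -image_size 256 -backend cudnn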

Problem: Video processing is very slow.
Solution: Video DeepDreaming is computationally expensive; it runs the full DeepDream process per frame, plus optical flow calculations.

  • Use -num_iterations 1 (recommended for video; temporal consistency means fewer iterations are needed).
  • Reduce -octave_iter (e.g., to 10 or 20).
  • Use a smaller -image_size.
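
For example, a faster run combining the suggestions above (file names are placeholders):

python video_dream.py -content_video input.mp4 -output_video output.mp4 -num_iterations 1 -octave_iter 10 -image_size 256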

By default, neural-dream uses the nn backend.

  • Use cuDNN: -backend cudnn (GPU only, reduces memory).
  • Reduce Size: -image_size 256 (Halves memory usage).

With default settings, standard execution uses ~1.3 GB GPU memory.

You can use multiple devices with -gpu and -multidevice_strategy. Example: -gpu 0,1,2,3 -multidevice_strategy 3,6,12 splits layers across 4 GPUs. See ProGamerGov/neural-dream for details.
