This is a fork of neural-dream, a PyTorch implementation of DeepDream. The fork adds optical flow estimation and occlusion masking so DeepDream can be applied to videos with temporal consistency.
- Standard DeepDream: The original single-image implementation.
- Video DeepDream: A new CLI (`video_dream.py`) that uses RAFT optical flow to warp previous dream frames into the current frame, ensuring smooth transitions and object tracking (a minimal sketch of the warp-and-blend idea follows this list).
- Occlusion Masking: Automatically detects when objects move in front of one another to prevent "ghosting" artifacts.
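To make the mechanism concrete, here is a minimal, hypothetical sketch of the warp-and-blend step: the previous dream frame is warped into the current frame along a dense flow field, mixed with the raw frame, and replaced by the raw frame wherever the occlusion mask marks the flow as unreliable. This is not the project's actual code; the function and variable names are illustrative.

```python
import cv2
import numpy as np

def warp_with_flow(prev_dream, flow):
    """Warp the previous dream frame (H, W, 3) into the current frame
    using a dense backward flow field (H, W, 2)."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_dream, map_x, map_y, interpolation=cv2.INTER_LINEAR)

def blend_frame(raw_frame, warped_dream, occlusion_mask, blend=0.5):
    """Mix the raw frame with the warped previous dream (cf. -blend),
    falling back to the raw frame where the mask marks occlusions."""
    mask = occlusion_mask[..., None].astype(np.float32)  # 1 = flow valid, 0 = occluded
    raw = raw_frame.astype(np.float32)
    mixed = blend * raw + (1.0 - blend) * warped_dream.astype(np.float32)
    return (mask * mixed + (1.0 - mask) * raw).astype(np.uint8)
```

The blended result then becomes the starting point for DeepDream on the current frame, which is why a single iteration per frame is usually enough.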
Demo videos:
- mallard_demo.mp4
- highway_demo.mp4
- mallard_independent_demo.mp4
- highway_independent_demo.mp4
- mallard.mp4
- highway.mp4
This project requires the following key packages:
- PyTorch
- torchvision
- OpenCV
- NumPy
- Pillow
Install Dependencies:
```
pip install -r requirements.txt
```

Download Models: Run the download script to fetch the standard Inception/GoogLeNet models:

```
python models/download_models.py
```

To download all compatible models:

```
python models/download_models.py -models all-caffe-googlenet
```

To dream on a video, use the `video_dream.py` script. This wrapper accepts the video-specific arguments listed below as well as any argument accepted by the standard image dreamer (e.g., layers, octaves, iterations).
Basic Video Command:
```
python video_dream.py -content_video input.mp4 -output_video output.mp4 -num_iterations 1
```

Note: For video processing, we recommend `-num_iterations 1`. The temporal consistency from optical flow means each frame builds on the previous dream, so fewer iterations per frame are needed than for single images.
Video-Specific Arguments:
| Argument | Default | Description |
|---|---|---|
| `-content_video` | `input.mp4` | Path to the source video file. |
| `-output_video` | `output.mp4` | Path where the final video will be saved. |
| `-blend` | `0.5` | Mix ratio (0.0 to 1.0) between the raw video frame and the warped previous dream. Higher values (closer to 1.0) use more of the raw frame; lower values (closer to 0.0) preserve more of the previous hallucinations. |
| `-independent` | `False` | Flag: if set, disables temporal consistency (optical flow). Every frame is dreamed on independently (causes flickering). |
| `-update_interval` | `5` | Update the output video file on disk every N frames (lets you preview progress while the job runs). |
| `-temp_dir` | `temp` | Directory used to store extracted frames, flow data, and masks during processing. |
| `-keep_temp` | `False` | Flag: if set, the temporary directory is not deleted after processing finishes. |
| `-verbose` | `False` | Flag: enable detailed logs (prints DeepDream iteration logs for every frame). |
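For example, to generate a per-frame (flickering) comparison while keeping the intermediate frames, flow data, and masks for inspection, you might run something like this (file names are illustrative):

```
python video_dream.py -content_video input.mp4 -output_video independent.mp4 -independent -keep_temp -num_iterations 1
```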
All of the following arguments are from the single frame implementation, and you can mix and match any of these with the video-specific arguments above. Refer to neural-dream for more information on single frame parameters.
Example combining video and standard args:
```
python video_dream.py -content_video test.mp4 -dream_layers inception_4d -num_iterations 1 -octave_scale 0.7 -image_size 512
```

For single-image processing only:

```
python neural_dream.py -content_image <image.jpg> -dream_layers inception_4c -num_iterations 10
```

Note: Paths to images should not contain the `~` character; use relative or absolute paths.
- `-image_size`: Maximum side length (in pixels) of the generated image. Default is `512`.
- `-gpu`: Zero-indexed ID of the GPU to use; for CPU mode set `-gpu` to `c`; for MPS mode (Apple Silicon) set `-gpu` to `mps`.

- `-dream_weight`: How much to weight DeepDream. Default is `1e3`.
- `-tv_weight`: Weight of total-variation (TV) regularization; helps smooth the image. Default `0`.
- `-l2_weight`: Weight of latent state regularization. Default `0`.
- `-num_iterations`: Number of iterations. Default is `10`. For video, use `1` (temporal consistency reduces the need for multiple iterations per frame).
- `-init`: Initialization method: `image` (content image) or `random` (noise). Default `image`.
- `-jitter`: Apply jitter to the image. Default `32`.
- `-layer_sigma`: Apply Gaussian blur to the image. Default `0` (disabled).
- `-optimizer`: `lbfgs` or `adam`. Default `adam`.
- `-learning_rate`: Learning rate (step size). Default `1.5`.
- `-normalize_weights`: Divide dream weights by the number of channels.
- `-loss_mode`: Loss mode: `bce`, `mse`, `mean`, `norm`, or `l2`. Default `l2`.

- `-output_image`: Name of the output image. Default `out.png`.
- `-output_start_num`: Number to start output image names at. Default `1`.
- `-print_iter`: Print progress every N iterations.
- `-save_iter`: Save the image every N iterations.

- `-dream_layers`: Comma-separated list of layer names to use.
- `-channels`: Comma-separated list of channels to use.
- `-channel_mode`: Selection mode: `all`, `strong`, `avg`, `weak`, or `ignore`.
- `-channel_capture`: `once` or `octave_iter`.

- `-num_octaves`: Number of octaves per iteration. Default `4`.
- `-octave_scale`: Resize value. Default `0.6`.
- `-octave_iter`: Iterations (steps) per octave. Default `50`.
- `-octave_mode`: `normal`, `advanced`, `manual_max`, `manual_min`, or `manual`.

- `-lap_scale`: Number of layers in the Laplacian pyramid. Default `0` (disabled).
- `-sigma`: Strength of Gaussian blur used in the pyramid. Default `1`.

- `-zoom`: Amount to zoom in.
- `-zoom_mode`: `percentage` or `pixel`.
- `-tile_size`: Desired tile size. Default `0` (disabled).
- `-overlap_percent`: Percentage of overlap between tiles. Default `50`.

- `-original_colors`: Set to `1` to keep content image colors.
- `-model_file`: Path to a `.pth` model file. Default is VGG-19.
- `-model_type`: `caffe`, `pytorch`, `keras`, or `auto`.
- `-backend`: `nn`, `cudnn`, `openmp`, or `mkl`.
- `-cudnn_autotune`: Use the built-in cuDNN autotuner (slower start, faster run).
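For instance, a single-image run that targets specific channels of one layer could be assembled from the flags above; the layer and channel numbers here are purely illustrative:

```
python neural_dream.py -content_image input.jpg -dream_layers inception_4c -channels 25,119 -num_iterations 10 -octave_iter 25
```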
Problem: The program runs out of memory (OOM).
Solution:
- Reduce `-image_size` (e.g., to 512 or 256).
- If using a GPU, use `-backend cudnn`.
- For video: reduce the input video resolution before processing (see the example after this list).
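For example, assuming ffmpeg is installed, the input video can be downscaled before dreaming; the 640-pixel width is only an illustration:

```
ffmpeg -i input.mp4 -vf scale=640:-2 input_small.mp4
```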
Problem: Video processing is very slow.
Solution: Video DeepDreaming is computationally expensive; it runs the full DeepDream process on every frame, plus optical flow calculations. The following settings help (they are combined in the example after this list):
- Use `-num_iterations 1` (recommended for video; temporal consistency means fewer iterations are needed).
- Reduce `-octave_iter` (e.g., to 10 or 20).
- Use a smaller `-image_size`.
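A faster trial run might combine these settings; the exact values are illustrative:

```
python video_dream.py -content_video input.mp4 -output_video preview.mp4 -num_iterations 1 -octave_iter 10 -image_size 256
```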
By default, neural-dream uses the `nn` backend.
- Use cuDNN: `-backend cudnn` (GPU only; reduces memory usage).
- Reduce size: `-image_size 256` (halves memory usage).
With default settings, standard execution uses ~1.3 GB GPU memory.
You can use multiple devices with `-gpu` and `-multidevice_strategy`.
Example: `-gpu 0,1,2,3 -multidevice_strategy 3,6,12` splits layers across 4 GPUs. See ProGamerGov/neural-dream for details.