Command Line Options

The minimal demo showed how to use the command line interface to produce a visualization of a video file. While the default options in the command line tool often produce reasonable outputs, it can be useful to modify some of these values. This tutorial introduces some of the most common options and explains how to work with them. As with the minimal demo, the code here assumes that you have installed the dvt toolkit and have the video file video-clip.mp4 in your working directory.

To run the command line tool with the default settings on the video-clip.mp4 file, one can execute the following code in a terminal:

python3 -m dvt video-viz video-clip.mp4

Alternatively, all of the default values can be explicitly set using the following:

python3 -m dvt video-viz video-clip.mp4 \
    --dirout=dvt-output-data \
    --pipeline-level=2 \
    --diff-cutoff=10 \
    --cut-min-length=30 \
    --frequency=0

What does changing these options do? By default, the output data is placed in a directory called “dvt-output-data” inside the current working directory. You can change this in the call above to place the output anywhere on your machine. The pipeline level determines how much output is constructed. Setting this value to 0 produces only a JSON metadata file for the video clip; setting it to 1 also creates the output images (frames, annotated frames, and visualizations of the optical flow). Finally, setting it to 2 (the default) also creates all of the extra files needed to run a local website to visualize the results.
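
For instance, if you only want the JSON metadata and none of the images or website files, a call along these lines (keeping the default output directory) should suffice:

python3 -m dvt video-viz video-clip.mp4 \
    --dirout=dvt-output-data \
    --pipeline-level=0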

The last three options above control how cuts are detected in the video file. Lowering the “diff-cutoff” makes the algorithm more aggressive, producing more cuts; higher values produce fewer cuts. The minimum cut length determines the smallest number of frames that a detected cut can last. Finally, the frequency value provides a different way of determining which frames to annotate: when set to a positive integer, the command line tool skips cut detection entirely and simply extracts one out of every “frequency” frames. The reason for including so many options for cut detection is that, while the defaults work reasonably well for recent, high-definition films and scripted television, they can be quite unreliable with other sources. Adjusting the cut-off scores (or, if that fails, simply setting a frequency) lets users still make use of the command line interface.
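
For example, if cut detection proves unreliable for a particular source, a call like the following (the value 100 is purely illustrative) skips cut detection and annotates one out of every 100 frames:

python3 -m dvt video-viz video-clip.mp4 \
    --dirout=dvt-output-data \
    --frequency=100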

Three final options in the command line tool allow users to pass additional data to the pipeline: known faces, an audio track, and subtitles. Adding audio and/or subtitle data just requires passing an additional path to the interface. Assuming that we have a file video-clip.wav with audio data and a file video-clip.srt with subtitle data, the following runs the pipeline with just these inputs:

python3 -m dvt video-viz video-clip.mp4 \
    --dirout=dvt-output-data \
    --pipeline-level=2 \
    --diff-cutoff=10 \
    --cut-min-length=30 \
    --frequency=0 \
    --path-to-audio=video-clip.wav \
    --path-to-subtitle=video-clip.srt

Note that the pipeline currently only supports WAV files (audio) and SRT files (subtitles) as inputs. It is possible to specify only the audio file or only the subtitle file, but the pipeline always requires a video input file.
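
For instance, a run that supplies only the subtitle file, leaving every other option at its default, might look like this:

python3 -m dvt video-viz video-clip.mp4 \
    --dirout=dvt-output-data \
    --path-to-subtitle=video-clip.srt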

In order to include face recognition, create a directory on your machine with one image per person that you would like to detect. Name each file with the desired name of the person; for example, if you want to detect images of Oprah Winfrey, you may add an image titled “oprah-winfrey.png”. Then, assuming your images are in a directory in the working directory called “face-images”, you can include these in the pipeline as follows:

python3 -m dvt video-viz video-clip.mp4 \
    --dirout=dvt-output-data \
    --pipeline-level=2 \
    --diff-cutoff=10 \
    --cut-min-length=30 \
    --frequency=0 \
    --path-to-faces=face-images

Note that the annotation process will take slightly longer when detecting faces, but you will (potentially) have richer data included in the output. The face detection option can be combined with the audio and subtitle inputs when they are available.
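
As a sketch, a single call that combines face detection with the audio and subtitle inputs from the earlier examples could look like the following:

python3 -m dvt video-viz video-clip.mp4 \
    --dirout=dvt-output-data \
    --path-to-faces=face-images \
    --path-to-audio=video-clip.wav \
    --path-to-subtitle=video-clip.srt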