AusOcean RV Software Guide

Date Revised: 21/03/2020

Author(s):

Saxon Nelson-Milton <saxon@ausocean.org>

Copyright

Copyright © The Australian Ocean Laboratory Limited (AusOcean) 2020.

The information contained herein is licensed under a Creative Commons Attribution 3.0 Australia License.

1. Introduction

2. Revid Pipeline

2.1 Inputs

2.1.1 Raspivid

2.1.2 RTSP

2.1.3 File

2.1.4 Audio

2.1.5 V4L

2.2 Lexers/Extractors

2.3 Filter

2.3.1 NoOp

2.3.2 Basic

Supported variables

2.3.3 Diff

Supported variables

2.3.4 MOG

Supported variables

2.3.5 KNN

Supported variables

2.3.6 VFPS

Supported variables

2.4 Encoders/Packetisers

2.4.1 MPEG-TS

2.4.2 FLV

2.5 Output

2. Variables

3. Common AusOcean Configurations

3.1 Public Streaming

3.2 Private Streaming

3.3 Motion Triggered MJPEG Capture

3.4 Survey

1.0 Introduction

RV is AusOcean’s media handling software, which runs on underwater cameras and hydrophones. It couples the netsender and revid APIs to create a remotely configurable “muxer” that can take audio or video input from a number of devices, combine it with useful metadata in a packet format, and then output it to a destination of our choosing. RV is written in Go and aims to be lightweight and easily maintainable, with low-cost hardware such as the Raspberry Pi in mind.

This document gives an overview of the RV software’s pipeline, describes its configurable parameters, and shows how they are used in common configurations.

2.0 Revid Pipeline

The revid pipeline consists of the following stages: an input interface from which media is obtained, such as a camera or hydrophone; a lexer/extractor that obtains discrete frames or access units from the media stream; filter(s) that discard uninteresting data; encoders/packetisers that wrap the data in a container format and add metadata for sending; ring buffers that keep output robust when sends fail; and finally outputs that forward the data to a destination. Here we look at each of these components in more detail, along with the options currently supported by revid.

2.1 Inputs

AusOcean is interested in obtaining data from a range of sources, from low-cost, low-quality options to high-cost, high-quality options. There is also the desire for audio input. Here we explore the input options of revid: Raspivid, RTSP, File, Audio and finally V4L.

2.1.1 Raspivid

Revid can pipe bytestream data from the raspivid command line tool, which allows capture with the Raspberry Pi Camera Module. Raspivid can be configured to obtain data in a variety of codecs, including H.264 and MJPEG, with varying frame rate and picture settings such as auto white balance and exposure. To use this input mode, the Raspberry Pi must be connected to a Pi Camera Module. Take note of the revid logs; any failure to interface with the hardware will show up there as errors.
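As a rough illustration of how RV variables could map onto a raspivid invocation, the following Go sketch builds an argument list from a small config struct. The struct, the function name and the 1280x720 resolution are hypothetical and chosen for illustration only; the long option names are standard raspivid flags, but the exact flags revid passes are not stated in this guide.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// RaspividConfig is a hypothetical subset of the RV variables that apply to
// Raspivid input. It is not a revid type.
type RaspividConfig struct {
	Codec            string // "H264" or "MJPEG"
	Width, Height    int
	FrameRate        int
	AutoWhiteBalance string
	Exposure         string
	Brightness       int
	Saturation       int
	Rotation         int
	HorizontalFlip   bool
	VerticalFlip     bool
}

// raspividArgs builds a raspivid argument list that writes the encoded
// stream to stdout so that a parent process can read it through a pipe.
func raspividArgs(c RaspividConfig) []string {
	args := []string{
		"--output", "-", // write the encoded stream to stdout
		"--timeout", "0", // capture until the process is stopped
		"--codec", c.Codec,
		"--width", strconv.Itoa(c.Width),
		"--height", strconv.Itoa(c.Height),
		"--framerate", strconv.Itoa(c.FrameRate),
		"--awb", c.AutoWhiteBalance,
		"--exposure", c.Exposure,
		"--brightness", strconv.Itoa(c.Brightness),
		"--saturation", strconv.Itoa(c.Saturation),
		"--rotation", strconv.Itoa(c.Rotation),
	}
	if c.HorizontalFlip {
		args = append(args, "--hflip")
	}
	if c.VerticalFlip {
		args = append(args, "--vflip")
	}
	return args
}

func main() {
	// Width and Height are illustrative; the other values are the documented defaults.
	cfg := RaspividConfig{
		Codec: "H264", Width: 1280, Height: 720, FrameRate: 25,
		AutoWhiteBalance: "auto", Exposure: "auto", Brightness: 50,
	}
	fmt.Println("raspivid", strings.Join(raspividArgs(cfg), " "))
}
```

On a Raspberry Pi, the returned slice could be handed to exec.Command("raspivid", args...) and the stream read from the command's StdoutPipe; the sketch only prints the command so that it runs on any machine.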

2.1.2 RTSP

RTSP stands for Real Time Streaming Protocol. The protocol is designed for use in entertainment and communications systems to control streaming media servers. It is commonly adopted by Internet Protocol (IP) cameras for streaming recorded content, hence AusOcean’s interest; we are currently using GeoVision IP cameras in practice.

2.1.3 File

Revid can read raw H.264 or MJPEG byte stream files and perform the same processing as would be done with a live media source. This is useful for the broadcast or upload of video that has not been recorded using a device on which RV is running. It is also extremely useful in testing when a live source is not available (see Loop mode for repeated playback of small files).

2.1.4 Audio

Revid can read data from ALSA, which has wide compatibility with common audio cards/interfaces. Revid can then lex PCM/ADPCM and packetise it into MPEG-TS in the same way that it does with video data. This is generally only useful for HTTP output followed by database storage and analysis, as PCM and ADPCM are not space-efficient codecs.
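To give a feel for why raw PCM is not space efficient, the short Go sketch below works out the data rate for the default SampleRate (48000) and Channels (1). The bit depth of 16 is an assumption, since no default is listed for BitDepth, and the function name is hypothetical.

```go
package main

import "fmt"

// pcmBytesPerSecond returns the raw PCM data rate for the given settings.
func pcmBytesPerSecond(sampleRate, bitDepth, channels int) int {
	return sampleRate * bitDepth / 8 * channels
}

func main() {
	// 48000 Hz and 1 channel are the documented defaults; 16 bits is assumed.
	rate := pcmBytesPerSecond(48000, 16, 1)
	fmt.Printf("%d bytes/s, %.1f MB/hour\n", rate, float64(rate)*3600/1e6)
}
```

Under these assumptions a single mono channel produces 96000 bytes per second, or about 345.6 MB per hour before any container overhead.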

2.1.5 V4L

Revid can also use FFMPEG to source H.264 media from a webcam. This is useful mostly for testing, but it could feasibly also be used on small hardware platforms that provide a built-in camera.

2.2 Lexers/Extractors

The lexer/extractor component of the revid pipeline identifies discrete meaningful “access units” of media from a raw stream, and then passes them on to the packetisation stage. Revid supports the following lexers/extractors:

| Lexer/Extractor | Description |
| --- | --- |
| H.264 Lexer | Will lex H.264 NAL units from an H.264 byte stream. This is used with direct camera->hardware interfaces like Pi Cameras (Raspivid) and Webcams (FFMPEG). |
| MJPEG Lexer | Will lex JPEG images from an MJPEG stream. This is used with direct camera->hardware interfaces like Pi Cameras (Raspivid) and Webcams (FFMPEG). |
| H.264/RTP Extractor | Extracts H.264 NAL units from an RTP stream that has been established using RTSP, i.e. the conventional mode of sourcing data from an IP camera. |
| H.265/RTP Extractor | Extracts H.265 NAL units from an RTP stream that has been established using RTSP, i.e. the conventional mode of sourcing data from an IP camera. |
| JPEG/RTP Extractor | Extracts JPEG images from an RTP stream that has been established using RTSP, i.e. the conventional mode of sourcing data from an IP camera. |
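To make the idea of lexing concrete, here is a minimal Go sketch of how JPEG images might be cut out of an MJPEG byte stream by searching for the JPEG start-of-image (0xFFD8) and end-of-image (0xFFD9) markers. This is a naive illustration and not revid's lexer; the function name is hypothetical, and the sketch does not handle images that embed a thumbnail with its own markers.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	soi = []byte{0xFF, 0xD8} // JPEG start-of-image marker
	eoi = []byte{0xFF, 0xD9} // JPEG end-of-image marker
)

// lexJPEG splits buf into complete JPEG images. It returns the images found
// and any trailing bytes that may belong to an image that is not yet complete.
// Bytes that precede a start-of-image marker are discarded.
func lexJPEG(buf []byte) (frames [][]byte, rest []byte) {
	for {
		start := bytes.Index(buf, soi)
		if start < 0 {
			// Keep a trailing 0xFF, as it may be the first half of a marker.
			if n := len(buf); n > 0 && buf[n-1] == 0xFF {
				return frames, buf[n-1:]
			}
			return frames, nil
		}
		end := bytes.Index(buf[start+len(soi):], eoi)
		if end < 0 {
			return frames, buf[start:]
		}
		end += start + len(soi) + len(eoi)
		frames = append(frames, buf[start:end])
		buf = buf[end:]
	}
}

func main() {
	// One complete (fake) image followed by the start of a second one.
	stream := []byte{0x00, 0xFF, 0xD8, 0x01, 0x02, 0xFF, 0xD9, 0xFF, 0xD8, 0x03}
	frames, rest := lexJPEG(stream)
	fmt.Println(len(frames), "complete image(s),", len(rest), "byte(s) held back")
}
```

A caller reading from a live source would prepend the held-back bytes to the next chunk read before calling lexJPEG again.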

2.3 Filter

When using the MJPEG input codec, revid can use filters to identify “interesting” frames of video. In doing so, we can greatly reduce the amount of data we packetise and send. This is extremely valuable with a high bitrate codec like MJPEG, and it provides us with extremely high quality segments of video in the cloud that we can use for analysis or re-encoding. Here we explore the various types of filters available.

NOTE: these filters are very CPU intensive. They will not work on a Raspberry Pi Zero, and even struggle to work on a Raspberry Pi 3 unless downscaling or frame skip is used.

2.3.1 NoOp

The NoOp (no operation) filter leaves a video stream unaltered. It is used when there is no need for a motion detection filter.

2.3.2 Basic

The Basic filter is the only motion filter that can be used without GoCV (see 4.2 Revid without GOCV). It compares the RGB values of every pixel in the current frame with those of the same pixel in the previous frame; if the change exceeds a threshold (MotionThreshold), the pixel is counted as containing motion. If the total number of such pixels exceeds a second threshold (MotionPixels), the frame is considered to be a motion frame.

| Advantages | Disadvantages |
| --- | --- |
| It does not require GoCV. | It is bad at ignoring changes in lighting and small movements from seagrass. |
| | It is the slowest of all the filters. |

Supported variables

| Variable | Default Value |
| --- | --- |
| MotionThreshold | 45000 |
| MotionPixels | 1000 |
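The two-threshold logic of the Basic filter can be sketched in Go using only the standard image packages. How revid measures the per-pixel change is not stated in this guide; the sketch assumes the sum of squared differences of the 8-bit R, G and B values, chosen here only because the default MotionThreshold of 45000 then falls inside the possible range of 0 to 195075. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// hasMotion reports whether more than motionPixels pixels changed between
// prev and cur. A pixel counts as changed when the squared RGB distance
// (an assumed metric) exceeds motionThreshold. Both frames are assumed to
// have the same dimensions.
func hasMotion(prev, cur *image.RGBA, motionThreshold, motionPixels int) bool {
	b := cur.Bounds()
	changed := 0
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			p, c := prev.RGBAAt(x, y), cur.RGBAAt(x, y)
			dr := int(c.R) - int(p.R)
			dg := int(c.G) - int(p.G)
			db := int(c.B) - int(p.B)
			if dr*dr+dg*dg+db*db > motionThreshold {
				changed++
				if changed > motionPixels {
					return true
				}
			}
		}
	}
	return false
}

// newFrame returns a black frame of the given size.
func newFrame(w, h int) *image.RGBA {
	return image.NewRGBA(image.Rect(0, 0, w, h))
}

// paintSquare fills the top-left n by n pixels of img with white.
func paintSquare(img *image.RGBA, n int) {
	for y := 0; y < n; y++ {
		for x := 0; x < n; x++ {
			img.SetRGBA(x, y, color.RGBA{255, 255, 255, 255})
		}
	}
}

func main() {
	prev, cur := newFrame(64, 64), newFrame(64, 64)
	paintSquare(cur, 40) // 1600 pixels change from black to white
	fmt.Println("motion:", hasMotion(prev, cur, 45000, 1000))
}
```

Visiting every pixel of every frame in this way is consistent with the Basic filter being listed as the slowest of the filters.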


2.3.3 Diff

The Diff (difference) motion filter takes the absolute difference between the current frame and the previous frame. If the mean value of the resulting image is greater than the specified threshold, it will be considered to have motion.

| Advantages | Disadvantages |
| --- | --- |
| It uses the least CPU of all the motion filters. | It is bad at ignoring changes in lighting and small movements from seagrass. |

Supported variables

| Variable | Default Value |
| --- | --- |
| MotionThreshold | 3.0 |
| MotionDownscaling | 1 |
| MotionInterval | 5 frames |
| MotionPadding | 10 frames |
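The Diff test described above amounts to a mean absolute difference compared against MotionThreshold. The Go sketch below shows this on greyscale frames held as plain byte slices; this representation and the function name are assumptions made to keep the example free of dependencies, and revid's own implementation may differ.

```go
package main

import "fmt"

// meanAbsDiff returns the mean absolute difference between two greyscale
// frames of equal length. It returns 0 if the lengths differ or are zero.
func meanAbsDiff(prev, cur []uint8) float64 {
	if len(prev) != len(cur) || len(cur) == 0 {
		return 0
	}
	sum := 0
	for i := range cur {
		d := int(cur[i]) - int(prev[i])
		if d < 0 {
			d = -d
		}
		sum += d
	}
	return float64(sum) / float64(len(cur))
}

func main() {
	const motionThreshold = 3.0 // documented default for the Diff filter

	prev := make([]uint8, 100)
	cur := make([]uint8, 100)
	for i := 0; i < 10; i++ {
		cur[i] = 50 // ten pixels brighten by 50
	}
	mean := meanAbsDiff(prev, cur)
	fmt.Println("mean:", mean, "motion:", mean > motionThreshold)
}
```

Because the result is an average over the whole frame, a uniform change in lighting raises it just as a moving object does, which matches the disadvantage listed for this filter.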


2.3.4 MOG

The MOG filter uses the mixture of Gaussians algorithm. It models each pixel as a mixture (weighted sum) of several Gaussian distributions. If a pixel’s value lies outside the range predicted by its model, the filter considers there to be motion at that pixel.

| Advantages | Disadvantages |
| --- | --- |
| It is good at ignoring changes in lighting and small movements from seagrass. | It uses more CPU than the difference filter. |
| It works well for a wide range of variable values, which makes it easy to set up. | |

Supported variables

| Variable | Default Value |
| --- | --- |
| MotionMinArea | 25.0 pixels² |
| MotionThreshold | 20.0 |
| MotionHistory | 500 frames |
| MotionKernel | 3x3 pixels |
| MotionDownscaling | 1 |
| MotionInterval | 5 frames |
| MotionPadding | 10 frames |


2.3.5 KNN

The KNN filter uses the K-nearest neighbour algorithm.

| Advantages | Disadvantages |
| --- | --- |
| It is fairly good at ignoring changes in lighting and small movements from seagrass. | It uses more CPU than the difference filter. |

Supported variables

| Variable | Default Value |
| --- | --- |
| MotionMinArea | 25.0 pixels² |
| MotionThreshold | 300 |
| MotionHistory | 300 frames |
| MotionKernel | 4x4 pixels |
| MotionDownscaling | 1 |
| MotionInterval | 5 frames |
| MotionPadding | 10 frames |

2.3.6 VFPS

The VFPS (variable fps) filter sends frames at a lower frame rate when no motion is detected. It uses the MOG algorithm for motion detection.

Supported variables

| Variable | Default Value |
| --- | --- |
| MinFPS | 1 fps |
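The frame-dropping side of the VFPS filter can be sketched as a counter that forwards every frame while there is motion and otherwise forwards one frame per FrameRate/MinFPS frames. The Go type below is hypothetical; it takes the motion decision as an input rather than running MOG itself.

```go
package main

import "fmt"

// VFPS decides, frame by frame, whether a frame should be forwarded.
type VFPS struct {
	interval int // frames between forwarded frames while there is no motion
	idle     int // frames seen since the last forwarded frame
}

// NewVFPS returns a VFPS for a source running at frameRate that should fall
// back to minFPS when idle. Non-positive rates forward every frame.
func NewVFPS(frameRate, minFPS float64) *VFPS {
	if frameRate <= 0 || minFPS <= 0 {
		return &VFPS{interval: 1}
	}
	n := int(frameRate / minFPS)
	if n < 1 {
		n = 1
	}
	return &VFPS{interval: n}
}

// Keep reports whether the current frame should be forwarded.
func (v *VFPS) Keep(motion bool) bool {
	v.idle++
	if motion || v.idle >= v.interval {
		v.idle = 0
		return true
	}
	return false
}

func main() {
	v := NewVFPS(25, 1) // documented defaults for FrameRate and MinFPS
	kept := 0
	for i := 0; i < 50; i++ { // two seconds of video without motion
		if v.Keep(false) {
			kept++
		}
	}
	fmt.Println("frames forwarded in 2 s without motion:", kept)
}
```

With FrameRate at 25 and MinFPS at 0.5, as in the motion triggered configuration, the same logic forwards one frame in every 50 while the scene is still.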

2.4 Encoders/Packetisers

After revid has lexed/extracted discrete access units of the input media, it will perform packetisation to wrap the media in a container/packet format to provide more information about subsequent presentation, e.g. timing, as well as additional metadata. Revid currently supports MPEG-TS and FLV.

2.4.1 MPEG-TS

MPEG-TS stands for Moving Picture Experts Group Transport Stream. It is a container format for the transmission and storage of video, audio, and Program and System Information Protocol (PSIP) data. It provides features to help with error correction and synchronisation. AusOcean defaults to this container format for transmission of data to cloud storage for analysis, and it is the default container format for HTTP output. Metadata such as world time and GPS can be stored in Program Map Tables (PMTs). MPEG-TS also allows for multiple underlying streams, i.e. we could have multiple synchronised sources that could be video or audio.
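For readers unfamiliar with the format, an MPEG-TS stream is a sequence of 188-byte packets, each starting with a 4-byte header whose first byte is the sync byte 0x47 and which carries a 13-bit packet identifier (PID) naming the underlying stream or table. The Go sketch below decodes the main header fields; the type and function names are hypothetical and this is not revid's MPEG-TS code.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	tsPacketSize = 188  // every transport stream packet is 188 bytes
	tsSyncByte   = 0x47 // first byte of every packet
)

// TSHeader holds the main fields of a transport stream packet header.
type TSHeader struct {
	PayloadUnitStart  bool   // a new PES packet or PSI section starts here
	PID               uint16 // 13-bit packet identifier
	HasAdaptation     bool   // an adaptation field follows the header
	HasPayload        bool   // a payload is present
	ContinuityCounter uint8  // 4-bit counter, per PID
}

// parseTSHeader decodes the first four bytes of a transport stream packet.
func parseTSHeader(p []byte) (TSHeader, error) {
	if len(p) < 4 {
		return TSHeader{}, errors.New("need at least 4 bytes")
	}
	if p[0] != tsSyncByte {
		return TSHeader{}, errors.New("missing sync byte 0x47")
	}
	return TSHeader{
		PayloadUnitStart:  p[1]&0x40 != 0,
		PID:               uint16(p[1]&0x1F)<<8 | uint16(p[2]),
		HasAdaptation:     p[3]&0x20 != 0,
		HasPayload:        p[3]&0x10 != 0,
		ContinuityCounter: p[3] & 0x0F,
	}, nil
}

func main() {
	pkt := make([]byte, tsPacketSize)
	copy(pkt, []byte{0x47, 0x40, 0x00, 0x10}) // PID 0 carries the Program Association Table
	h, err := parseTSHeader(pkt)
	fmt.Printf("%+v %v\n", h, err)
}
```

The Program Association Table on PID 0 tells a receiver which PID carries each Program Map Table, which is how a receiver would locate the tables mentioned above.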

2.4.2 FLV

FLV stands for Flash Video. It is a container format for the transmission of digital video content over the internet using Adobe Flash Player. It is also the standard container format for the Real Time Messaging Protocol (RTMP), so this format is used when RTMP output is selected. RTMP/FLV is required for live YouTube streaming.
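As a small illustration of the container, an FLV file begins with a 9-byte header (the signature "FLV", a version byte, a flags byte and a 4-byte header length) followed by a 4-byte "previous tag size" of zero. The Go sketch below builds these bytes; the function name is hypothetical, and whether revid emits this header on its RTMP output is not stated in this guide.

```go
package main

import "fmt"

// flvHeader returns the 9-byte FLV file header followed by the 4-byte
// PreviousTagSize0 field, which is always zero.
func flvHeader(hasAudio, hasVideo bool) []byte {
	var flags byte
	if hasAudio {
		flags |= 0x04 // audio tags present
	}
	if hasVideo {
		flags |= 0x01 // video tags present
	}
	return []byte{
		'F', 'L', 'V', // signature
		0x01,       // version
		flags,      // type flags
		0, 0, 0, 9, // header length, big endian
		0, 0, 0, 0, // PreviousTagSize0
	}
}

func main() {
	fmt.Printf("% x\n", flvHeader(false, true))
}
```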

2.5 Output

The final stage of the revid pipeline is the output. For the most part, outputs are protocols for which a destination can be defined. Outputs have abstractions called “senders”, which provide buffering and discontinuity correction. Multiple outputs may be defined, i.e. we can send data to the cloud and also to YouTube.

| Output | Description |
| --- | --- |
| File | File output is useful for testing and surveying using a towable underwater sled. |
| HTTP | HTTP output is used when we wish to send MPEG-TS data to cloud storage. |
| RTMP | RTMP is used for YouTube streaming. |
| RTP | RTP is used for low latency streaming to a video player such as VLC configured to receive RTP. Also useful for towable sled surveying. |
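The buffering role of a sender can be pictured with the following Go sketch: chunks are queued up to a byte capacity (compare RBCapacity), and a failed send leaves them queued so a later attempt can retry. This is a deliberately simplified, hypothetical stand-in and not revid's ring buffer; in particular, it drops a new chunk immediately when full instead of waiting for a write timeout as RBWriteTimeout suggests.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

var errFull = errors.New("buffer full, chunk dropped")

// bufferedSender queues chunks up to capacity bytes and sends them in order.
type bufferedSender struct {
	dst      io.Writer
	capacity int // maximum number of queued bytes
	queued   int
	chunks   [][]byte
}

func newBufferedSender(dst io.Writer, capacity int) *bufferedSender {
	return &bufferedSender{dst: dst, capacity: capacity}
}

// Write queues a copy of p, or drops it if there is no room.
func (s *bufferedSender) Write(p []byte) (int, error) {
	if s.queued+len(p) > s.capacity {
		return 0, errFull
	}
	s.chunks = append(s.chunks, append([]byte(nil), p...))
	s.queued += len(p)
	return len(p), nil
}

// Flush sends queued chunks in order and stops at the first failure,
// leaving the unsent chunks queued for the next call.
func (s *bufferedSender) Flush() error {
	for len(s.chunks) > 0 {
		if _, err := s.dst.Write(s.chunks[0]); err != nil {
			return err
		}
		s.queued -= len(s.chunks[0])
		s.chunks = s.chunks[1:]
	}
	return nil
}

// flakyWriter stands in for a network destination that can be made to fail.
type flakyWriter struct {
	fail bool
	got  bytes.Buffer
}

func (w *flakyWriter) Write(p []byte) (int, error) {
	if w.fail {
		return 0, errors.New("send failed")
	}
	return w.got.Write(p)
}

func main() {
	dst := &flakyWriter{fail: true}
	s := newBufferedSender(dst, 1024)
	s.Write([]byte("packet-1"))
	fmt.Println("first flush:", s.Flush())
	dst.fail = false
	fmt.Println("second flush:", s.Flush(), "delivered:", dst.got.String())
}
```

Giving each output its own sender of this kind would let a slow or failing destination fall behind without stalling the others.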

2. Variables

Variables may be assigned values via the cloud to control RV behaviour. The following table lists variables with their description. Defaults for non-input-specific variables are listed. NOTE: variables pertinent to motion detection filters are not listed here; see Section 2.3 Filter.

| Variable | Description | Options | Default |
| --- | --- | --- | --- |
| AutoWhiteBalance | Defines the white balance mode for Raspivid input. | off, auto, sun, cloud, shade, tungsten, fluorescent, incandescent, flash, horizon | auto |
| BitDepth | Defines the audio sampling bit depth for Audio input. | | ? |
| Bitrate | Defines input source constant bitrate. Not applicable if set to VBR. | | |
| Brightness | Defines input source brightness for Raspivid. | | 50 |
| BurstPeriod | Defines the period in seconds for which revid records when in burst mode. | | 10 |
| CameraChan | Defines the channel to be used by GeoVision RTSP input. | | 2 |
| CameraIP | Defines the IP address of an IP camera. | | 192.168.1.50 |
| CBR | If true, forces constant bitrate. If false, variable bitrate is used: Raspivid uses the quantisation parameter (Quantization) and GeoVision uses quality (VBRQuality). | | true |
| Channels | Number of audio channels, i.e. 1 = mono, 2 = stereo. | | 1 |
| Exposure | Defines exposure mode. | auto, night, nightpreview, backlight, spotlight, sports, snow, beach, verylong, fixedfps, antishake, fireworks | auto |
| FileFPS | Rate at which to read file sources, i.e. number of frames (access units) to process per second. | | 25 |
| Filters | Defines the chain of filters to be used. | NoOp, Basic, Diff, MOG, KNN, VFPS | NoOp |
| FrameRate | Input source frame rate in frames per second. | | 25 |
| Height | Input source video height in pixels. | | |
| HorizontalFlip | Flips input video image horizontally for Raspivid input. | | false |
| HTTPAddress | Currently not supported. | | |
| Input | Media input source from which we collect data. | Raspivid, V4l, File, RTSP, Audio | Raspivid |
| InputCodec | The media codec we wish to receive from the input. | H264, H265, MJPEG, PCM, ADPCM | |
| InputPath | Used in the case of file input. This is the location of the file to source media from. | | “” |
| logging | Defines the logging level. | Debug, Info, Warning, Error, Fatal | Error |
| MinFPS | The minimum number of frames to be processed by filters per second, i.e. filters skip frames to increase efficiency, but media out will be at the defined FrameRate. | | 1 |
| MinFrames | Number of NAL units before a new key frame is inserted. | | 100 |
| OutputPath | Path of the output file in the case of File output. | | |
| Outputs | Defines revid outputs. Can define 1 or 2. | File, HTTP, RTMP, RTP | RTMP |
| PSITime | Defines the PSI interval in seconds, i.e. the period between insertions of PSI into the output MPEG-TS stream. Applicable to HTTP output. | | 2 |
| Quantization | Determines variable bitrate quality when using Raspivid. Between 10 and 40; higher values result in lower overall quality and lower values result in higher overall quality. | | 30 |
| RBCapacity | Number of bytes the ring buffer will occupy. | | 50000000 (50MB) |
| RBWriteTimeout | Ring buffer write timeout in seconds. Higher numbers reduce the risk of data dropping with flaky networks. | | 5 |
| RecPeriod | Number of seconds of audio to record at a time. | | |
| Rotation | Rotation angle of input video for Raspivid input. | | 0 |
| RTMPURL | The destination address for RTMP output. This is obtainable from the YouTube live streaming UI. | | |
| RTPAddress | The destination address for RTP output. | | localhost:6970 |
| SampleRate | Sample rate for audio input. | | 48000 |
| Saturation | Input video saturation level for Raspivid. | | 0 |
| Suppress | If true, turns on logger suppression, i.e. repeated logs within a period of time (see ausocean/utils/logger) are suppressed. | | true |
| VBRBitrate | Maximum variable bitrate, in kbit/s, used by GeoVision RTSP input. | | 400 |
| VBRQuality | Average quality of variable bitrate for GeoVision RTSP input. | standard, fair, good, great, excellent | standard |
| VerticalFlip | If true, video is flipped vertically for Raspivid input. | | false |
| Width | Input source video width in pixels. | | |
| WriteRate | Not currently supported. | | |

There is also a range of variables available for the currently implemented filters. NOTE: All defaults are provided in Section 2.3 Filter.

| Variable | Description | Applicable filters (as listed in Section 2.3) |
| --- | --- | --- |
| MotionMinArea | Defines the area, in pixels², that detected motion must exceed to be considered. | MOG, KNN |
| MotionThreshold | Specifies the intensity value that the algorithm considers motion. | Basic, Diff, MOG, KNN |
| MotionHistory | Defines the length of the filter’s history, used for determining the background for background separation. | MOG, KNN |
| MotionKernel | Defines the size of the kernel used for filling holes and removing noise. | MOG, KNN |
| MotionDownscaling | Specifies the downscaling factor of frame resolution used for motion detection. Note: this does not affect the resolution of the output video. | Diff, MOG, KNN |
| MotionInterval | Specifies how often (in frames) motion detection is performed on the video stream. | Diff, MOG, KNN |
| MotionPadding | Specifies the number of frames to keep before and after motion is detected. | Diff, MOG, KNN |
| MinFPS | Specifies the minimum frame rate to send when there is no motion detected. | VFPS |
| MotionPixels | Specifies the number of pixels with motion needed for a whole frame to be considered a motion frame. | Basic |
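The effect of MotionPadding, keeping a number of frames before and after detected motion, can be sketched in Go as a small buffer of recent idle frames plus a countdown. The type below is hypothetical and works on opaque byte slices; it illustrates the idea and is not revid's implementation.

```go
package main

import "fmt"

// padder forwards motion frames together with up to padding frames before
// and after them.
type padder struct {
	padding int
	before  [][]byte // most recent idle frames, at most padding of them
	after   int      // idle frames still to forward after the last motion frame
}

// Push takes the next frame and whether it contains motion, and returns the
// frames that should be forwarded as a result (possibly none).
func (p *padder) Push(frame []byte, motion bool) [][]byte {
	if motion {
		out := append(p.before, frame) // held-back frames first, then this one
		p.before = nil
		p.after = p.padding
		return out
	}
	if p.after > 0 {
		p.after--
		return [][]byte{frame}
	}
	// Idle: remember the frame in case motion starts soon.
	p.before = append(p.before, frame)
	if len(p.before) > p.padding {
		p.before = p.before[1:]
	}
	return nil
}

func main() {
	p := &padder{padding: 2}
	motion := []bool{false, false, false, true, false, false, false}
	for i, m := range motion {
		out := p.Push([]byte{byte(i)}, m)
		fmt.Printf("frame %d motion=%v forwarded=%d\n", i, m, len(out))
	}
}
```

With the default padding of 10 frames and a frame rate of 25, this would add 0.4 seconds of context on either side of each motion event.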

3. Common AusOcean Configurations

Here we explore 4 main configurations that AusOcean uses. Disclaimer: these configurations are also the most tested; there may not currently be support for other configurations.

3.1 Public Streaming

AusOcean adopts public streaming to YouTube for public engagement. Furthermore, storage of video after streaming is free and there is currently no limit to storage. The following table summarises the variables to be set and their corresponding values. NOTE: only constant bitrate will work for YouTube streaming.

| Variable | Setting |
| --- | --- |
| Input | Raspivid (or RTSP in the case of GeoVision) |
| Outputs | RTMP |
| RTMPURL | <rtmp url from youtube live streaming UI> |
| Bitrate | 1000000 (adjust based on network quality) |
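The constraints of this configuration can be expressed as a small check, sketched below in Go. The Config type and the function name are hypothetical, since variables are assigned via the cloud and not in code, and the RTMP URL shown is a placeholder and not a real endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config is a hypothetical in-memory view of RV variables, keyed by
// variable name.
type Config map[string]string

// checkPublicStreaming returns the first problem found with a configuration
// intended for YouTube streaming, or nil if none is found.
func checkPublicStreaming(c Config) error {
	if c["Outputs"] != "RTMP" {
		return errors.New("Outputs must be RTMP")
	}
	if c["RTMPURL"] == "" {
		return errors.New("RTMPURL must be set")
	}
	// CBR defaults to true, so only an explicit false is a problem.
	if c["CBR"] == "false" {
		return errors.New("CBR must be true: only constant bitrate works for YouTube")
	}
	if b, err := strconv.Atoi(c["Bitrate"]); err != nil || b <= 0 {
		return errors.New("Bitrate must be a positive integer")
	}
	return nil
}

func main() {
	cfg := Config{
		"Input":   "Raspivid",
		"Outputs": "RTMP",
		"RTMPURL": "rtmp://example.invalid/live/STREAM-KEY", // placeholder
		"Bitrate": "1000000",
	}
	fmt.Println("problem:", checkPublicStreaming(cfg))
}
```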

 

3.2 Private Streaming

AusOcean also performs streaming to the AusOcean Vidgrind cloud service, which runs on Google App Engine. Here we have a place where we can store and analyse media. We also have the means to perform live streaming of H.264/MPEG-TS and are currently developing MJPEG browser viewing. NOTE: either CBR or VBR can be used, but VBR is the preferred method; it is much more efficient in achieving quality per byte.

| Variable | Setting |
| --- | --- |
| Input | Raspivid (or RTSP in the case of GeoVision) |
| Outputs | HTTP |
| CBR | false (can also be true if CBR is desired, in which case the Bitrate variable must be defined) |
| Quantization | 30 (modify based on network quality) |

3.3 Motion Triggered MJPEG Capture

AusOcean uses additional settings to send video only when there is motion in the frame. The previously defined settings from 3.1 or 3.2 can be used, depending on the desired streaming destination for this data. AusOcean typically uses the MOG filter, as it is one of the best performing filters currently implemented in revid. The variables have reasonable default values, but may need to be adjusted based on the environment being recorded.

| Variable | Setting |
| --- | --- |
| Filters | MOG, VFPS |
| MinFPS | 0.5 (sends a ‘spare’ frame every 2 seconds) |

3.4 Survey

AusOcean uses two outputs for tasks such as sled surveying, where we need to both store video and view it with low latency. AusOcean uses an RTP output with a VLC destination, and an HTTP output with a local Vidgrind instance as the destination for storage.

| Variable | Setting |
| --- | --- |
| Input | Raspivid (or RTSP in the case of GeoVision) |
| Outputs | HTTP,RTP |
| RTPAddress | <ip of computer running VLC>:6970 |
| CBR | false (things should run smoother this way) |
| Quantization | 30 (modify based on RTP stream quality. If frame skipping occurs it’s likely the Raspberry Pi is struggling; raise this value) |

4.0 Running RV