|DRAFT: FADGI's Significant Properties for Digital Video|
|For context and more information, see||https://docs.google.com/document/d/1Y3MTBlEq_bj0BOodBFZGgDdPXSTXPdcCvy98IczoDdc/edit?usp=sharing|
|This work carries a CC0 1.0 Universal license for worldwide use and reuse|
|Class of Significant Property (from JISC report)||Property Name||Definition - in depth technical||Definition - summary / lay person||Notes||Reference for definition - see Resources + Ref worksheet||Typical values (not exhaustive list)||Impact of change on this property||Relevant Standards||Map to FADGI BDV High Level Recommended Practices||How is this data represented through metadata in commonly used open source tools?|
|mediainfo||ffprobe (need to run command with the -show_streams and -show_format flags)||exiftool (need to run command with -X, which gives better structured XML metadata)|
|Content||Duration||The length of time (in hours, minutes, seconds and frames) a video lasts when played/viewed. Follows the ISO 8601 format of hh:mm:ss, which represents hours, minutes, and seconds. |
Two common types of duration may exist in the same file: 1) file duration, which includes headers with color bars and slates that run before program content, etc.; and 2) program content only, from the first frame to the final frame of the fade out/ring out.
|The runtime of the video, which can be just the program content or can include elements beyond the program content such as color bars, titles, etc. Duration is usually represented in the standard ISO 8601 format of hh:mm:ss, which represents hours, minutes, and seconds.||Duration is closely linked to timecode (described separately below). For a more in-depth look at this relationship, see the DRAFT - in depth worksheet||Adapted from Tektronix Glossary of Video Terms and Acronyms||10s 444ms (MediaInfo = Duration_string)||Changing the duration would have to be done by either cutting out content or changing the playback speed. Software stripped out counter and replaced with its own.||ISO 8601 DateTime format |
ST 12-1:2014 - SMPTE Standard - Time and Control Code
|n/a||Duration||Duration||Essence Length, Duration|
|Content||Number of Moving Image/Video Channels or Tracks||Number of video channels or streams present in a single file. Every stream is indexed in a wrapper.||Number of video channels or streams present in a single file.||Sometimes the terms "tracks", "channels" or "streams" are used interchangeably but they have distinct meanings. See the DRAFT - in depth worksheet for more information.||Adapted from Tektronix Glossary of Video Terms and Acronyms||Represented by whole integer value ("1", "2", etc)||if lost or not ID, lose content. ffmpeg will only map 1st stream unless told otherwise||.mxf, .mkv, .mov, other wrappers that allow multiple video streams in a single file||n/a||Count of video streams||nb_streams||Not specifically expressed, but each stream is listed|
|Content||Number of Audio Channels or Audio Tracks||Number of audio channels or audio tracks present. An audio channel is one single instantiation of audio. There are multiple methods in which audio may be organized. An audio track is an instantiation of audio that is recorded onto physical media or in a file. Audio tracks may contain more than one channel of audio.||Number of audio channels or tracks present in a single file.||Sometimes the terms "tracks", "channels" or "streams" are used interchangeably but they have distinct meanings. See the DRAFT - in depth worksheet for more information.||Adapted from FADGI Glossary||Represented by whole integer value ("1", "2", etc)||if lost or not ID, lose content||To come||RP 3.13 Select formats that can contain and label complex audio configurations including multiple channels and sound fields beyond mono and stereo||Tracks = "Count of audio streams", Channels = "Channel(s)"||channels, channel layout|
|Context||Associated metadata||Additional technical metadata describing the files, their relation to each other, or the program. Ancillary metadata can either be data that describes the essence but is not required for playback, or it can be data required for the proper reproduction of the essence. It often includes organization specific descriptive (“cataloging”) or administrative metadata, or by specialized forms of process metadata. Ancillary metadata is often in XML.||Associated metadata is additional technical metadata documenting more details in how they were created or how they are constructed. Can be either stored in the files or as separate files that accompany the video and audio files. Some of this data may be essential to proper playback, for example, some cameras create packages of files with files that can inform how the files should be read and played back, containing either/both technical information about the video files themselves and information about how the files relate to each other. Associated metadata is in the form of separate media files that contain scripts or image files.||Ancillary metadata can be contained in the files or it can reside in sidecar files that accompany camera original media files off camera; this data can include non-essential camera or production information, or essential technical information about the audio and video files and how they relate to each other, such as disk information, clip information, or playlist information, or video stream frame rate, frames per second, and aspect ratio. 
Depending on the camera original format, sometimes these files must be retained in order to guarantee accurate playback of the files.||Adapted from SMPTE RDD 48 definition of Supplementary Metadata||Removing accompanying data files from camera original formats increases the risk that files will not be able to be played back as intended, or that they may not be able to be reconstructed at all.||RP 2.2 Document relationships between the video object and other files, such as closed captions, scripts, location notes and other supplemental material||n/a||various under "metadata" heading||Various under "system", "file", and format (in this case "mxf") elements.|
|Rendering/Appearance||Display Aspect Ratio (DAR)||The Display Aspect Ratio is where the Presentation Aspect Ratio and the size of the physical screen it is to be presented upon are different. This arose because in the early 1990s analog and digital SD programming could be recorded and transmitted using the same resolution (486 x 720 for the System M [525 line] countries and 576 x 720 for the 625 line countries) but with different aspect ratios. Analog broadcasting up to the late 1980s was always 4:3 aspect ratio. With the creation of PALplus in Europe, the same 576 x 720 resolution transmitted a 15:9 picture (this was never used in the 525 line countries). With the creation of HDTV in the 1980s, the 16:9 aspect ratio was created. All HD systems (1080 line and 720 line) are 16:9. However, when downconverted to SD for backwards compatibility with existing TVs and videotape recorders the 16:9 aspect ratio of the downconverted HD video was presented in the same 486 x 720 and 576 x 720 resolutions. Therefore you could have two Display Aspect Ratios for 525 line video (4:3 and 16:9), and three for 625 line video (4:3, 15:9 and 16:9). For nearly 15 years the only way to know how to display video was to look at it and make the best guess.|
Starting in 2001 the SMPTE created the Active Format Description (AFD) metadata standard to insert 4-bit codes into a digital video stream to allow recorders and video displays to automatically know how to display the video properly. You may expect to find these codes in the Video User Bits section of uncompressed and compressed video files and bitstreams as defined in the SMPTE ST 2016-1:2009 standard document.
|DAR is, ultimately, how the file players will resolve visual characteristics and display the height versus width ratio relationship in any playback. |
The proportional relationship between the horizontal and vertical dimensions of an image. The horizontal dimension is normally listed first and the two values are separated by a colon. There are several subtypes of aspect ratios. Typically for digital video the DAR or display aspect ratio is the key value. Video production and distribution organizations usually refer to this property as Active Format Description (AFD).
There is a logical relationship between the aspect ratio and the image size, meaning the number and aspect ratio of the pixels, the number of pixels per line, and the number of lines, govern the aspect ratio of the picture as a whole.
|A file can potentially display conflicting DARs. Some file containers, like AVI, may not clearly document aspect ratio, resulting in a file presentation that may stretch or adjust from the original intended image. In MXF, AFD values are stored in the MXF Picture Descriptor while changeable AFD values are stored in a SMPTE ST 436-1:2013-compliant VBI/ANC GC Data Element.||PAR, SAR, and DAR: Making Sense of Standard Definition (SD) video pixels by Katherine Frances Nagels/BAVC|
FADGI Glossary for Aspect Ratio
ST 377-1:2011 Material Exchange Format (MXF) File Format Specification
SMPTE RDD 48: MXF Archive and Preservation Format Registered Disclosure Document
|Standard definition: 4:3; High definition: 16:9||Useful to understand intentional and original video display aspect ratios per formats for digitized video to, if necessary, correct display aspect ratio in playback - resulting in a file presentation that may stretch or adjust from the original intended image.||BT.601 (SD) |
SMPTE ST 2016-1: 2009 - Format for Active Format Description and Bar Data, especially Table 1 for allowable codes
ST 377-1:2011 Material Exchange Format (MXF) File Format Specification
SMPTE RDD 48: MXF Archive and Preservation Format Registered Disclosure Document
|RP 1.5 Select larger picture sizes over smaller picture sizes (information related to DAR and SAR, but not explicitly the same)||Display aspect ratio||display_aspect_ratio (also one of the values listed in the generic "video" element)||PresentationAspectRatio|
|Rendering/Appearance||Image Size||The size of image measured in pixels for the horizontal dimension (width) and lines for the vertical dimension (height). The horizontal dimension is listed first and the two values are separated by an "x".||Often referred to as resolution, the image size is the basic measurement of how much information is on the screen. It is usually described as the number of pixels in the horizontal axis by the number of horizontal lines. The higher the numbers, the better the system’s resolution.||While the display and rendering of the video will depend on other factors as well, this value represents information that cannot be recouped once lost.||Adapted from Tektronix Glossary of Video Terms and Acronyms|
NTSC VHS – 240 x 485;
NTSC broadcast – 330 x 485;
NTSC laserdisc – 425 x 485;
ITU-R BT.601 (525/60) – 720 x 485;
Computer screen – 1280 x 1024
|Potential loss of information if reduced. If scaled up dramatically for display, may introduce visual artifacts and muddiness.||BT.601 (SD) |
BT.2020 (UHDTV), 3840x2160 4K UHD (defined by SMPTE ST 2036-1), 4096x2160 DCI 4K,
|RP 1.5 Select larger picture sizes over smaller picture sizes||Width, Height||Width, Height (also one of the values listed in the generic "video" element)||ImageWidth, ImageHeight|
|Rendering/Appearance||Audio bit depth||Bit depth determines the encoded dynamic range of an audio event or item. 24-bit audio provides a dynamic range that approaches the limits of the dynamic ranges of sound encountered in nature; in contrast, 16-bit audio, the CD standard, may be inadequate for many types of material especially where high level transients are encoded, such as the transfer of damaged discs. When reformatting, the International Association of Sound and Audiovisual Archives (IASA) recommends a bit depth of at least 24 in order to ensure that the transfer process captures the full dynamic range.||The number of bits used for each individual sample to encode digital audio. The higher the bit depth, the greater range between quietest and loudest value of the signal (dynamic range).||Adapted from FADGI Glossary||16, 24, 32 |
Native bit depth for audio in digital video is normally 16- or 24-bit; 24-bit is recommended
|Lower bit depths signify a lower signal-to-noise ratio and the potential for a less accurate digital representation of the signal.||IASA TC-04||RP 1.7 Select higher bit depths over lower bit depths|
RP 2.3 Identify the file characteristics at the most granular level possible, including the wrapper and video
Description talks about video characteristics only but the same can be said about audio
bit_rate (also one of the values listed in the generic "video" and "audio" elements)
|Rendering/Appearance||Audio sampling rate||Sampling rate (sometimes referred to as sampling frequency) defines the number of samples per second (or per other unit) taken from a continuous signal to make a discrete or digital signal. For time-domain signals like the waveforms for sound (and other audio-visual content types), frequencies are measured in hertz (Hz) or cycles per second. The Nyquist–Shannon sampling theorem (Nyquist principle) states that perfect reconstruction of a signal is possible when the sampling frequency is greater than twice the maximum frequency of the signal being sampled. For example, if an audio signal has an upper limit of 20,000 Hz (the approximate upper limit of human hearing), a sampling frequency greater than 40,000 Hz (40 kHz) will avoid aliasing and allow theoretically perfect reconstruction.||The rate at which audio signal is sampled; measured in hertz (Hz) or kilohertz (kHz). Usually expressed as samples per second (i.e. 44100 samples per second can be expressed as either 44100 Hz, or 44.1 kHz; capturing frequencies up to 22kHz). Can be an indicator of perceived quality in which the higher the sampling rate, the more information captured from source. Higher sampling rates encode audio outside of the human hearing range, but the overall effect of higher sampling may improve the audio quality.||Adapted from FADGI Glossary||Most common sampling rates: 44.1kHz, 48kHz, 96kHz, 192kHz. |
48kHz is common for audio in digital video, although it may be higher or lower depending on the original source material (TC-06).
|You would have to significantly decrease the sampling rate before you could hear the degradation in sound quality. For many, sample rate conversion downwards (i.e. 48kHz to 44.1kHz) diminishes the high frequencies that are out of range for human hearing. However, continual down sampling for derivatives may eventually result in loss of quality over time. |
Spoken word is less sensitive to reductions in sampling rate versus music or other content with more dynamic range.
It is recommended to maintain the sampling rate of your source material. If you need to down-sample to create your derivatives, be aware that it may affect your audio quality.
|IASA TC-04, TC-06||RP 1.7 Select higher bit depths over lower bit depths|
RP 2.3 Identify the file characteristics at the most granular level possible, including the wrapper and video
Description talks about video characteristics only but the same can be said about audio
|Rendering/Appearance||Video bit depth||The number of bits used to represent each color channel or component in a sample of video. As you increase bit depth, you increase the number of colors that may be represented. For example, 8-bit has a range of 0-255 (256 values) whereas 10-bit has a range of 0-1023 (1024 values). Bit depth is also referred to as color depth and bits per pixel (BPP).||The number of bits per color channel or component. The more bits, the greater the color range and detail possible in the video file. |
For more information on bit depth: https://www.bhphotovideo.com/explora/video/tips-and-solutions/8-bit-10-bit-what-does-it-all-mean-for-your-videos
|Adapted from FADGI Glossary||Typical values: 8-bit, 10-bit||If you have a 10-bit file and reduce it to an 8-bit file, you may lose smooth transitions and fine details in color gradients. Chroma subsampling also has a significant effect on this.||IASA TC-06, |
EBU R 103: Video Signal Tolerance in Digital Television Systems
|RP 1.7 Select higher bit depths over lower bit depths||Bit depth||bits_per_raw_sample,||ComponentDepth|
|Rendering/Appearance||Video Bit rate||The number of bits that are conveyed or processed per unit of time, most often (but not exclusively) employed when discussing time-based media like sound or video. Often expressed in terms of kilobits per second (kbit/s or kbps, 10 to the third power), megabits per second (Mbit/s or Mbps, 10 to the sixth power), or gigabits per second (Gbit/s or Gbps, 10 to the 9th power). Bit rate is one means used to define the amount of compression used on a video signal. For example Uncompressed D1 has a bit rate of 270 Mbps. MPEG-1 has a bit rate of 1.2 Mbps.|
There is a difference between compressed bitrates and uncompressed bitrates. In files that use compression (regardless of whether it is lossy or mathematically lossless), the video bitrate will always be less than the bitrate of the video when decoded for viewing. The bitrate after decoding for viewing is the uncompressed bitrate.
|The amount of data transported in a given amount of time. It is one way to define the amount of compression used. Generally the more data transported, the more data stored, the less compression used, the larger the file size, and the higher the quality of the file.||Adapted from FADGI Glossary and Tektronix Glossary||Standard definition: |
1080p: 8 Mbps
720p : 5 Mbps
480p: 2.5 Mbps
2160p (4k): 44-56 Mbps
1440p (2k): 20 Mbps
720p: 6.5 Mbps
|IASA TC-06,||RP 1.6 Select higher bit rates over lower bit rates|
|Rendering/Appearance||Video Bit rate mode (constant/variable)||Bit rate can be either constant (at the same rate throughout the compressed file) or variable (changeable during the decoding of a compressed bit stream).|
Constant bit rate (CBR) means the bit rate is constant or unchanging from start to finish of the compressed bit stream. CBR is useful for streaming multimedia content on limited capacity channels since it is the maximum bit rate that matters, not the average, so CBR would be used to take advantage of all of the capacity. CBR would not be the optimal choice for storage as it would not allocate enough data for complex sections (resulting in degraded quality) while wasting data on simple sections. Variable bit rate (VBR) files vary the amount of output data per time segment during the decoding of a compressed bit stream. VBR allows a higher bitrate (and therefore more storage space) to be allocated to the more complex segments of media files while less space is allocated to less complex segments. The average of these rates can be calculated to produce an average bitrate for the file.
|Bit rate can be either constant (at the same rate throughout the compressed file) or variable (changeable during the decoding of a compressed bit stream).||CBR: https://en.wikipedia.org/wiki/Constant_bitrate|
|To come||Bit rate mode||not expressed||not expressed|
|Rendering/Appearance||Frame rate (frames per second)||Frame rate (expressed in frames per second or fps) is the frequency (rate) at which consecutive images called frames appear on a display. The term applies equally to film and video cameras, computer graphics, and motion capture systems. Frame rate may also be called the frame frequency, and be expressed in hertz.||Whether digital or celluloid film, video is a series of still images that, when viewed in order at a certain speed, give the appearance of motion. Frame rate is the speed at which those images are shown. Each image represents a frame, so if a video is captured and played back at 24fps, that means each second of video shows 24 distinct still images.||One example of very low FR is security camera footage||Standard film: 24fps (frames per second); Television: 29.97fps (NTSC); PAL: 25fps||<24fps = begin to lose "persistence of vision"; 48fps (as in "The Hobbit") = "Soap Opera Effect"|
Here are some fun visual examples of frame rate changes: https://frames-per-second.appspot.com/
Additional info on frame rate: https://gizmodo.com/why-frame-rate-matters-1675153198 (includes a basic visual of how frame rates affect moving images)
|Standard film: 24fps (frames per second); Television: 29.97fps (NTSC); PAL: 25fps||RP 1.10 Stay within the range of common frame rates of 24-30 frames per second (fps)||Frame rate||r_frame_rate (also one of the values listed in the generic "video" element)||EditRate|
|Rendering/Appearance||Color gamut||The range of colors represented by a specified COLOR SPACE. It specifies the range of colors represented by a percentage of the 1931 CIE color chart, which defined the total range of colors the average human eye can see. Color gamut is the range of colors, from least saturated to most highly saturated, that can be seen by the human eye (represented as a percentage of the 1931 CIE color chart) that can be represented by a given color space definition (such as RGB or XYZ). Wide color gamut is where more colors seen by the human eye are represented than commonly used color gamut methods. Most commonly thought of in relation to Adobe's wide gamut RGB, it is any color system that can represent more colors than typically used color systems. The ability to record color gamut properly is defined by the number of BITS per COLOR CHANNEL used: 8 bits means visible contouring will be seen in images, whereas 10 bits are enough for the human eye to see smooth color shade contouring and 16 bits is enough for digital processing systems (hardware and/or software) to process images without visible artifacts resulting from that processing.||The color gamut describes a range of colors within the visible spectrum. In order to standardize the reproduction of colors across devices, the CIE (International Commission on Illumination) established a standard color chart which defined the total range of colors identifiable by the human eye. This range is known as the color gamut. |
The range of colors that can be generated by a specific output device (such as a monitor or printer), or can be interpreted by a color model. Often referred to as color gamut.
|Adapted from Tektronix Glossary and FADGI Glossary||To come||To come||To come||n/a||To come||To come||To come|
|Rendering/Appearance||Color channels||Color channels, sometimes known as 'color primaries', describe how an image is represented by a set of 3 or more color primaries. The most typical is RGB and its related color primary representations Y, R-Y, B-Y; YPbPr (a shorter way of writing Y, R-Y, B-Y); and YCbCr (the digital version of YPbPr) which are all directly related to RGB but take up less bandwidth.||A color channel stores the color information for one of the primary color components of a color model. For example, the RGB color model has three separate color channels; one for red, one for green and one for blue.||Adapted from FADGI Glossary||RGB|
Y, B-Y, R-Y
|To come||To come||n/a||To come||To come||To come|
|Rendering/Appearance||Color model||A color model is the method for specifying color primary channels numerically. For example, for any given color when using RGB color space, the R, G and B will have specific numerical values that together make up the specific color being represented: for black in RGB those numerical values would be 0,0,0 (0R, 0G, 0B). The numbers used to describe each color primary channel are defined by the BIT DEPTH being used: for 8 bits R, G and B can use values from 0 to 255 (256 discrete levels, or 2^8). For 10 bit: R, G and B use values from 0 to 1023 (1024 levels, or 2^10).||A color model is a way of specifying or describing a color numerically; common examples include RGB, HSV and CMYK. For example, in the 24-bit-deep RGB color model, the intensity of each of the red, green and blue components of the model (8 bits for each channel) is represented on a scale from 0 to 255, with 0 representing the lowest intensity of any color and 255 the maximum intensity.|
There are two main categories of color models, additive and subtractive. Additive color models (such as RGB) are based on transmitted light while subtractive color models (such as CMYK) are based on reflected light.
|Adapted from FADGI Glossary||8 bits R, G and B can use values from 0 to 255 (256 discrete levels, or 2^8) |
10 bit: R, G and B use values from 0 to 1023 (1024 levels, or 2^10)
|To come||To come||n/a||implied as part of the information conveyed by the "Standard" element? otherwise it might not be explicitly outlined. Also might be part of the "Color range" and "Color primaries" elements?||possibly expressed in color_range?||not expressed|
|Rendering/Appearance||Color space (specific organization of colours) [ex YUV etc]||The definition for how a specific grouping of color primary channels are used to represent color. They are typically represented by how they map to the 1931 CIE color chart, which defines all colors visible to the human eye. They can be defined by international standards (such as SMPTE RGB, P3 or XYZ), by vendor definitions (such as AdobeRGB), or by industry practice.||The definition for how a specific grouping of color primary channels are used to represent color. They are typically represented by how they map to the 1931 CIE color chart, which defines all colors visible to the human eye. They can be defined by international standards (such as SMPTE RGB, P3 or XYZ), by vendor definitions (such as AdobeRGB), or by industry practice.||Adapted from FADGI Glossary||SMPTE RGB, P3 or XYZ, by vendor definitions (such as AdobeRGB), or by industry practice.||To come||To come||n/a||Color space||color_space||not expressed|
|Rendering/Appearance||Chroma sampling||Chroma sampling is the number of samples per sampling point taken when converting an analog signal to digital. It is represented by each color channel separated by a colon: for example 4:4:4. |
In the case of component color difference signals (which are a method of representing RGB color primaries but with less data) 4:2:2 represents YCbCr. The term "YUV" has also been used interchangeably to represent YCbCr but is a misnomer: YUV is actually the definition of analog PAL and SECAM component color difference signals (U stands for "unvarying" and V for "varying"). The proper term is YCbCr.
|Chroma sampling is the number of samples per sampling point taken when converting an analog signal to digital. It is represented by each color channel separated by a colon: for example 4:4:4. In the case of component color difference signals (which are a method of representing RGB color primaries but with less data) 4:2:2 represents YCbCr. The term "YUV" has also been used interchangeably to represent YCbCr but is a misnomer: YUV is actually the definition of analog PAL and SECAM component color difference signals (U stands for "unvarying" and V for "varying"). The proper term is YCbCr.|
For more info on Chroma Sampling, see appendix B of http://www.digitizationguidelines.gov/guidelines/FADGI_VideoReFormatCompare_pt5_20141202.pdf and https://www.iasa-web.org/sites/default/files/publications/IASA-TC_06-B_20180518.pdf
|To come||To come||RP 1.8 Use higher chroma subsampling ratios rather than lower||Chroma subsampling||pix_fmt (also one of the values listed in the generic "video" element)||not expressed|
|Rendering/Appearance||File format (wrapper/container)||A wrapper or container format encapsulates its constituent video and audio bitstreams/encodings as well as embedded metadata and other components such as timecode, captions and more.||Files are often recognized at the wrapper/container level in that the file extension identifies the file format at a high level, e.g., .mov, .mkv||Adapted from FADGI Glossary and Sustainability of Digital Formats website (http://www.loc.gov/preservation/digital/formats/intro/format_eval_rel.shtml)||https://www.iasa-web.org/sites/default/files/publications/IASA-TC_06-B_20180518.pdf||Different wrappers have different capacity for the files' components, especially beyond the imagery data. Some include better support for internal metadata, multiple audio tracks, timecodes and more.||To come||RP 2.3 Identify the file characteristics at the most granular level possible, including the wrapper and video stream encoding||(in general stream) Format, Commercial name, Format version, Format profile.||format_name, format_long_name||FileType|
|Rendering/Appearance||File bitstream encoding||Contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes. Generally speaking, a bitstream cannot be transformed into a standalone file without the addition of file structure such as a wrapper or container.||Adapted from FADGI Glossary and Sustainability of Digital Formats website (http://www.loc.gov/preservation/digital/formats/intro/format_eval_rel.shtml)||https://www.iasa-web.org/sites/default/files/publications/IASA-TC_06-B_20180518.pdf||Changes in the encoding of the video data can have significant impacts on the visual appearance and technical fidelity of the information. Different encodings have different parameters they can support such as compression, color space, sampling rates and more.||To come||RP 2.3 Identify the file characteristics at the most granular level possible, including the wrapper and video stream encoding||(in video and audio streams) Format, Codec id, Commercial name. (in general "wrapper" stream) Video_Format_List, Audio_Format_List, Other_Format_List, Other_Codec_List.||codec_name, |
codec_long_name (also one of the values listed in the generic "video" element)
|Structure||Scan type (interlaced or progressive)||Video displays use one of two scanning methods to draw the picture on the screen: interlaced and progressive. In interlaced scan systems, the information for one picture is divided up into two fields. Each field contains one-half of the lines required to produce the entire picture. Adjacent lines in the picture are in alternate fields. In progressive scan systems, lines of a picture are transmitted consecutively, such as in the computer world.||Generally speaking, video frames can be structured in two different ways: as a single image (progressive) or as two alternating fields (interlaced), where each field contains half of the lines needed to represent the entire picture. One field will contain the even-numbered lines of the picture and the other will contain the odd-numbered.||Analog video is typically interlaced. Digital video may be either interlaced or progressive. All HDTVs are progressive-scan displays — so even if the signal being sent to the HDTV is interlaced, the HDTV will convert it to progressive scan for display on the screen.||Adapted from Tektronix Glossary and IASA-TC06||Interlaced (i), Progressive (p), Progressive segmented frame (PsF)|
You will also see scan type expressed with the resolution info: 480i, 720p, 1080i, 1080p, etc.
Or with the frame rate info: 59.94i, 24p, 24PsF, etc.
|For more about going from interlaced to progressive, see the DRAFT - in depth worksheet||RP 2.3 Identify the file characteristics at the most granular level possible, including the wrapper and video stream encoding|
RP 2.7 Select appropriate technical characteristics for the video encoding if transcoding, normalizing or otherwise changing the video stream to meet business needs
|(in video stream) Scan type, Scan order||doesn't seem to be expressed||not expressed|
|Structure||Timecode||Timecode (sometimes seen as time code) is an annotation of elapsed time along a track in which each frame or field is assigned a unique digital code number from an electronic clock. Timecode is expressed as hours:minutes:seconds:frames. A file may carry multiple timecodes which may be continuous or non-continuous.||Timecode is an electronic signal used to identify a precise location in a time-based media file or tape. Its primary use is synchronization of various data streams but it can also have important uses in search and discovery to mark or find a specific point in the video. A file can contain multiple timecodes, especially if the file is the result of digitization of an analog videotape.||Timecode may be documented as drop frame or non-drop frame. Drop frame drops 2 frame numbers every minute, except on every 10th minute, to account for the 29.97 fps frame rate so that the timecode corresponds to a real-time clock.||https://www.iasa-web.org/sites/default/files/publications/IASA-TC_06-B_20180518.pdf||ST 12-1:2014 - SMPTE Standard - Time and Control Code|
SMPTE ST 12-1:2014 Time and Control Code;
SMPTE ST 12-2:2008 Transmission of Time Code in the Ancillary Data Space;
SMPTE ST 12-3:2016 Time Code for High Frame Rate Signals and Formatting in the Ancillary Data Space;
EBU R 122 Material Exchange Format Timecode Implementation
|RP 1.9 Generate a high integrity and continuous master timecode||(in "Other" stream) Format, Type, (in general stream) Other_Format_List.||timecode (but didn't actually detect the timecode in the sample file, because they're separate streams and not part of the video stream)||ComponentDataDefinition|
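The drop-frame rule noted above (two frame numbers skipped each minute, except every 10th minute) can be worked through in a short Python sketch. This is an illustration for 29.97 fps material only, not a reference implementation of SMPTE ST 12-1; the function name is hypothetical, and the semicolon before the frames field follows the common convention for marking drop-frame timecode.

```python
def frames_to_drop_frame_timecode(frame_count):
    """Convert a zero-based frame index at 29.97 fps to drop-frame timecode."""
    frames_per_10_minutes = 17982  # 10 * 60 * 30 minus 9 minutes * 2 skipped numbers
    frames_per_minute = 1798       # 60 * 30 minus 2 skipped numbers

    ten_minute_blocks, remainder = divmod(frame_count, frames_per_10_minutes)

    # Add back the frame numbers skipped so far: 18 per full 10-minute block,
    # plus 2 for each completed minute inside the current block (the first
    # minute of every block skips nothing).
    skipped = 18 * ten_minute_blocks
    if remainder > 2:
        skipped += 2 * ((remainder - 2) // frames_per_minute)
    label = frame_count + skipped

    frames = label % 30
    seconds = (label // 30) % 60
    minutes = (label // 1800) % 60
    hours = (label // 108000) % 24
    return f"{hours:02d}:{minutes:02d}:{seconds:02d};{frames:02d}"


print(frames_to_drop_frame_timecode(1799))    # last frame of the first minute
print(frames_to_drop_frame_timecode(1800))    # numbers ;00 and ;01 are skipped here
print(frames_to_drop_frame_timecode(17982))   # 10th minute: nothing skipped
print(frames_to_drop_frame_timecode(107892))  # about one hour of real time
```

Because 107,892 frames at 29.97 fps last almost exactly one hour, the final call lands on 01:00:00;00, which is the purpose of drop-frame counting.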
|Structure||File size||The size of the file or track measured in bytes||The size of the file or track measured in bytes||File size||size||FileSize|
|Content||Captions/Subtitles||The terms Captions and Subtitles are often more or less interchangeable for text intended for display over a timeline, in synchronization with the image and sound. They are used to improve accessibility for hearing-impaired viewers as well as for language translation.|
When present, captions in NTSC video, governed by the Consumer Electronics Association standard ANSI/CTA-608-E (CEA-608), are generally encoded into line 21, which is considered to be part of the active picture area. ANSI/CTA-708-E (CEA-708) is the standard for closed captioning for ATSC digital television (DTV) streams in the United States and Canada. ANSI/CTA-708-E (CEA-708) captions consist of binary-format textual data, but this data is not carried on line 21 and will be pre-rendered by the receiver. For backwards compatibility with set-top boxes and receivers that output analog video signals, the CEA-708 captions contain the CEA-608 captions as a portion of the data, and only enhancements beyond basic text are carried in the CEA-708 extension data.
|In the analog NTSC standard, closed captions are carried in line 21. In uncompressed digital video, they are carried in the closed caption space defined by SMPTE ST 436 in the video frame header. In lossy compressed video, they are carried in the video user bits in the header of the compressed video frames, provided the compression scheme permits captions (some early compression types did not allow closed captions to be carried). Closed captions are always carried attached to video frames because the timing of the captions to specific frames of video is critical.|
In production environments, closed captions may also be carried outside of the video program file itself as a standalone closed caption file, typically referred to as a 'sidecar file'. These files are typically plain-text or XML formatted and have .SCC, .SCT or .TT filename extensions.
|NTSC: Consumer Electronics Association standard ANSI/CTA-608-E (CEA-608);|
ATSC (aka DTV): ANSI/CTA-708-E (CEA-708)
|RP 2.2 Document relationships between the video object and other files, such as closed captions, scripts,
location notes and other supplemental material
|(in "Other" stream) Format, Type, (in general stream) Other_Format_List.||codec_name, codec_type||(in the subtitle stream) TrackType, CodecID|
|Structure and/or Rendering/Appearance||Field order||Field order in uncompressed video is defined as ODD and EVEN fields. The ODD fields are those with the odd numbered lines and are FIELD ONE. The EVEN fields are those with even numbered lines and are FIELD TWO. This applies to all analog and digital standard video formats: 525 (System M, used in analog NTSC and PAL-M and digital SMPTE 125M), 625 (PAL & SECAM in analog, SMPTE 125M in digital); and the 1080i25 and 1080i29.97 HD formats. Note that no HD 720 format is interlaced.|
In some video compression systems the terms UPPER and LOWER are used. In some compression systems, some lines of active video may be truncated as part of the compression process, meaning that a frame that started with an odd-numbered line before compression might, after encoding, have what was originally an even-numbered line as its first line of video. When the video is decoded back to uncompressed for display on an interlaced screen, the original ODD/EVEN field order will be restored on displays that permit interlacing. When viewing on computer screens, which are by their nature progressive displays, the video will have been deinterlaced beforehand, and any reference to ODD/EVEN or UPPER/LOWER fields will have been lost in the deinterlacing for display.
|In interlaced video, this property specifies which field is displayed first: upper or lower, top or bottom, field 1 or field 2. |
This is not a factor in progressive video because all of the frame is displayed at one time and the image data is not split into two fields.
|SMPTE standard 125M-1995 references fields as field 1 (odd) and field 2 (even); video editing and transcoding software usually refers to field order in interlaced video as top or bottom field.||https://support.apple.com/kb/PH16641?locale=en_US |
|Upper field first, lower field first|
Top field first, bottom field first
Field 1, Field 2
|If field order is mixed up, motion can appear staggered or video may look jittery. See this post for more info: https://www.provideocoalition.com/field_order/ |
Brief mention of field order problems: "If the ordering of the fields is altered so that the images appear in the wrong order the effect can be substantial." See: https://www.tate.org.uk/about-us/projects/pericles/sustaining-consistent-video-presentation
Recommendation: Maintain field order of source video.
|IASA TC-06, SMPTE 125M||RP 2.3, 2.7, 2.10||Scan order||field_order||FieldDominance|
|Additional Potential Properties|
|Rendering/Appearance||Frame rate mode (constant/variable)||Frame rate can be either constant (at the same rate throughout the file) or variable (changeable throughout the file). Frame rate mode may not be explicitly declared in file headers but rather can be derived by examining the duration of a frame and the time stamp difference. See https://sourceforge.net/p/mediainfo/discussion/297610/thread/b2a2708d/ for more information.|
Variable frame rate (or VFR) is a term in video compression for a feature supported by some container formats which allows for the frame rate to change actively during video playback, or to drop the idea of frame rate completely and set an individual timecode for each frame. VFR is especially useful for creating videos of slideshow presentations, when the video contains large amounts of completely static frames (as a means of improving compression rate), or when the video contains a combination of 24/25/30/50/60 FPS footage and the creator or editor of the video wishes to avoid artifacts arising from frame rate conversion.
|VFR allows for the frame rate to change actively during video playback, which allows for certain artistic choices in composition to achieve a particular effect.||HandBrake has a decent definition/explanation of constant frame rate: "Constant Frame Rate (CFR) makes your new video exactly one frame rate throughout." https://handbrake.fr/docs/en/latest/technical/frame-rates.html||CFR VFR||strobing||Standard film: 24fps (frames per second); Television: 29.97fps (NTSC); PAL: 25fps||Frame rate mode||not expressed (but inferrable from avg_frame_rate?)||not expressed|
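Since frame rate mode may have to be derived from time stamp differences rather than read from a header, the idea can be sketched in a few lines of Python. This is only an illustration: it assumes the per-frame presentation timestamps (in seconds) have already been extracted by some other means, and both the function name and the tolerance value are hypothetical choices rather than the behavior of MediaInfo or ffprobe.

```python
def classify_frame_rate_mode(timestamps, tolerance=0.001):
    """Return "CFR" if all frame-to-frame intervals match, otherwise "VFR".

    timestamps: presentation times in seconds, one per frame, in display order.
    tolerance:  largest spread between intervals (seconds) still treated as
                constant; an assumed value to absorb rounding in timestamps.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps to compare intervals")
    intervals = [later - earlier
                 for earlier, later in zip(timestamps, timestamps[1:])]
    if max(intervals) - min(intervals) <= tolerance:
        return "CFR"
    return "VFR"


# Evenly spaced frames at 30000/1001 fps (29.97 fps).
ntsc = [i * 1001 / 30000 for i in range(10)]
print(classify_frame_rate_mode(ntsc))

# A slideshow-like stream where one frame is held for almost a second.
slideshow = [0.0, 0.04, 0.08, 1.0, 1.04]
print(classify_frame_rate_mode(slideshow))
```

A tool working on real files would also need to decide how to treat containers with coarse timestamp resolution, where constant-rate video can show small alternating interval differences.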
|Rendering/Appearance||Compression ratio||The ratio of a file's uncompressed size to its compressed size. A file compressed to one-tenth of its uncompressed size would be described as having ten-to-one compression, expressed as 10:1. Some formats such as JPEG and JPEG 2000 allow the user to specify the compression ratio.||Video that is compressed has a smaller file size than the original. If compressed too much, though, the video can suffer visible quality problems such as a loss of detail or resolution.||Lossless compression is preferred over lossy compression if compression needs to be used||FADGI|
|Bit rates can be variable||Lossless compression ratios are usually 4:1 or less (https://pdfs.semanticscholar.org/8f82/76d13ff4cb4ccc9a4e6473e0fb8f494ecb9f.pdf)||RP 3.3 Select uncompressed video encoding over compressed encoding||not expressed, but sort of explained in "Compression mode" element.||not expressed, but the video stream in the test file had "lossless" as a value in the generic video element. The generic audio element did not have "lossy" for the lossy audio||not expressed|
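Because none of the three tools reports compression ratio directly, it can be estimated from properties they do report. The Python sketch below assumes the uncompressed size is approximated as width x height x bits per pixel x frame count for the video essence only (no audio or wrapper overhead); the function names and the sample figures are hypothetical.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, frame_count):
    """Approximate size in bytes of the raw video essence."""
    return width * height * bits_per_pixel * frame_count // 8


def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Express uncompressed size over compressed size as an N:1 string."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    ratio = uncompressed_bytes / compressed_bytes
    return f"{ratio:.1f}:1"


# Example: 720x486 frames, 10-bit 4:2:2 sampling (20 bits per pixel on
# average), 300 frames, stored in a 26,244,000-byte compressed stream.
raw_size = uncompressed_video_bytes(720, 486, 20, 300)
print(raw_size)
print(compression_ratio(raw_size, 26_244_000))
```

The result is only as good as the inputs: using whole-file size instead of video stream size will understate the ratio when the file also carries audio, captions, or other streams.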