Audio & Video Primer (Adapted from various user manuals)
Video
Light & Color

Light
Light is only a small part of the large spectrum of electromagnetic radiation. It is a
form of energy which, in order to end up as a visual sensation, first has to be perceived
by the eye, then conducted and interpreted by the nerves and the brain.
Light does not need any medium in order to propagate; it travels perfectly well through
empty space. Nevertheless, the speed of light, or celerity, depends on the medium.
Every radiation is characterized by a wavelength and an oscillation frequency. In a
vacuum, light waves travel at a speed close to c = 300 000 km/s.
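
The link between wavelength and frequency can be sketched in a few lines of Python. This is only an illustration: it uses the rounded value of c quoted above, the usual relation frequency = speed / wavelength, and takes 400 nm and 700 nm as approximate limits of the visible range (these two figures are assumptions, not values from this primer).

```python
C_KM_PER_S = 300_000  # rounded speed of light in a vacuum, as quoted in the text


def frequency_hz(wavelength_nm):
    """Frequency (Hz) of light of the given wavelength (nm): f = c / wavelength."""
    speed_m_per_s = C_KM_PER_S * 1_000
    wavelength_m = wavelength_nm * 1e-9
    return speed_m_per_s / wavelength_m


# 400 nm and 700 nm are assumed, approximate limits of the visible range.
for name, nm in (("violet", 400), ("red", 700)):
    print(f"{name}: {nm} nm is about {frequency_hz(nm) / 1e12:.0f} THz")
```

The shorter the wavelength, the higher the frequency.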

Color
Every frequency or wavelength of light radiation corresponds to a pure
(monochromatic) color, the shortest visible wavelength corresponding to violet and the
longest to red. Most light sources produce a complex mix of various monochromatic
radiations; their light is therefore called polychromatic. The relative amplitudes of
these radiations determine the dominant color we observe. Light containing the
whole visible spectrum in equal proportions is called white or gray light, and seems to
have no color.

Absorption & Reflection
A luminous object sends out light, while a lighted object receives and reflects a certain
quantity of light. We can take pictures of objects thanks to the light they reflect.
Objects absorb and diffuse the various monochromatic components of white light in a
very selective way, and that is what gives them their color. In much the same way, a
filter absorbs certain parts of the spectrum and lets others pass.

Hue, Saturation & Luminance
Luminance describes the impression of brightness or intensity of a light source. Hue is
the color tone; we perceive as a color the particular mixture of a light source’s
dominant wavelengths. Saturation concerns the purity of the radiation. When a color
doesn’t contain any white light, it is at its maximum saturation. Conversely, the
more white or gray light it contains, the more desaturated it is. Saturated colors are
vivid; desaturated colors give a pastel or washed-out impression.
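
To get a feel for hue and saturation, here is a small sketch using Python's standard colorsys module. Note the assumption: colorsys works in HSV, where 'value' is only a rough stand-in for luminance, and the three sample colors are arbitrary choices made for this illustration.

```python
import colorsys


def describe(name, r, g, b):
    """Describe an RGB color (components from 0.0 to 1.0) in HSV terms."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return f"{name}: hue {h * 360:.0f} deg, saturation {s:.2f}, value {v:.2f}"


print(describe("pure red", 1.0, 0.0, 0.0))    # no white mixed in
print(describe("pastel red", 1.0, 0.5, 0.5))  # red diluted with white
print(describe("mid gray", 0.5, 0.5, 0.5))    # no dominant hue at all
```

Adding white to the red leaves the hue unchanged but lowers the saturation; the gray has no saturation at all.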

RGB (Red, Green, Blue)
Nearly every color we perceive can be obtained by a precise mixture of the three
primary colors red, green and blue, that is, by controlling the quantities of red,
green and blue it contains.

The Video Signal
Luminance & Chrominance
When color television standards were defined, the system had to remain compatible with
the already existing black & white (B&W) system.
Information about luminance (brightness) therefore had to be separated from
information about chrominance (color). The RGB signals are combined into
one luminance signal and two chrominance signals. The chrominance signals are
modulated onto a subcarrier situated in the upper part of the B&W signal's spectrum.
The result is the composite signal, compatible with B&W receivers (NTSC, PAL or SECAM).
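
The separation into one luminance and two chrominance signals can be sketched as follows. The weights are an assumption (the ones commonly quoted from ITU-R BT.601); this primer does not specify them, and real systems also scale the two color-difference signals.

```python
def rgb_to_luma_chroma(r, g, b):
    """Split RGB (each 0.0 to 1.0) into one luminance and two color-difference values."""
    # Assumed weights (ITU-R BT.601); the text does not give them.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, b - y, r - y


for name, rgb in (("white", (1.0, 1.0, 1.0)), ("red", (1.0, 0.0, 0.0))):
    y, b_y, r_y = rgb_to_luma_chroma(*rgb)
    print(f"{name}: Y={y:.3f}  B-Y={b_y:.3f}  R-Y={r_y:.3f}")
```

A B&W receiver only needs Y. For white or gray, both color-difference values are zero, so nothing is lost.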
Frames/Second
In order to perceive the illusion of a moving picture while in reality being
presented with a succession of still images, the eye needs an absolute minimum of about
15 images per second (the human retina retains an image for about 1/15 s). The mosaic of
elementary points that compose a video image is scanned along parallel horizontal
lines. Systems with 625 lines (Europe, for example) work with 25 frames per second, and
systems with 525 lines (USA, for example) with 30 frames per second (B&W video).
Note that the PAL video standard runs at a frame rate of 25 fps (24 fps is used
for film), while the NTSC color video standard runs at 29.97 frames
per second.
Nevertheless, 25 or 30 images per second are not sufficient to make the flickering in
bright zones of the screen disappear completely. With interlaced video, each frame is
displayed as two successive fields (the odd lines, then the even lines), which doubles
the refresh rate of the screen and better fools our
visual system.
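
Interlacing can be sketched as follows, assuming each frame is split into a field of odd lines and a field of even lines that are shown one after the other. The 6-line frame is a toy example, not a real video format.

```python
def split_fields(frame_lines):
    """Split a frame (list of scan lines) into its odd and even fields."""
    odd_field = frame_lines[0::2]   # lines 1, 3, 5, ... (counting from 1)
    even_field = frame_lines[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def field_rate(frame_rate):
    """Two fields per frame, so the screen is refreshed twice per frame."""
    return 2 * frame_rate


frame = [f"line {n}" for n in range(1, 7)]  # a toy 6-line frame
odd, even = split_fields(frame)
print(odd, even, field_rate(25), field_rate(29.97))
```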
The Digital Image
The quality of digitization depends on two factors, pixels and bits.

Pixel & Bit
Pixels quantify the number of points in a digital image; the pixel is
its smallest unit. Bits indicate the number of possible shades each pixel can take.
A 1-bit digitized document, for example, has only plain black and white and no gray
tones; an 8-bit document is monochrome with 256 gray levels; and at 24 bits
RGB (8 bits per pixel for each color) we have a color picture.
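
The number of shades that a given number of bits can code is simply 2 to the power of that number. A minimal sketch (the function name is ours):

```python
def shades(bits):
    """Number of distinct values that can be coded with the given number of bits."""
    return 2 ** bits


for bits, label in ((1, "black and white"), (8, "grayscale"), (24, "RGB color")):
    print(f"{bits:>2} bits ({label}): {shades(bits):,} possible values")
```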

Definition or Resolution
Resolution is expressed in dpi, dots per inch (or ppi, pixels per inch). A 4
x 5 in. image (10.16 cm x 12.7 cm) with a 150 dpi resolution, for example, measures 600
x 750 pixels (4 x 150 and 5 x 150), and consequently ‘weighs’ a total of 450,000 pixels.
The more pixels an image of a given size comprises, the higher its definition, and the
more memory it will take.
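
The arithmetic of this example can be checked with a short sketch. The memory figure assumes an uncompressed image at 24 bits per pixel, which is our assumption for the illustration.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of an image given its size in inches and its dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def image_bytes(width_px, height_px, bits_per_pixel=24):
    """Uncompressed size in bytes (24 bits per pixel is an assumed default)."""
    return width_px * height_px * bits_per_pixel // 8


w, h = pixel_dimensions(4, 5, 150)
print(w, h, w * h, image_bytes(w, h))
```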

The DV Format
Digital video is a series of lines forming a rectangular pixel matrix. For color video,
8 bits are needed for each RGB color, which means 24 bits per pixel, corresponding to
about 16.8 million colors. DV is the first digital video recording format designed for
the general public. This format records luminance and color information separately, thus
guaranteeing a larger bandwidth for the signals and suppressing all risk of interference
between them. The DV signal has a sampling structure of 4:1:1 in 525/60 systems (NTSC),
and 4:2:0 in 625/50 systems (PAL). For 4:1:1, the chrominance resolution is reduced to
a quarter horizontally; for 4:2:0, it is halved both horizontally and vertically.
Compression (MJPEG-type algorithm) at a rate
of 5:1 is sufficiently moderate not to produce any visible artifact in the digitized
image. The IEEE 1394 interface (Apple’s FireWire) allows you not only to connect DV
devices to each other, but also to connect a camera directly to the computer.
Montage imports the following standards:
• DV PAL (Europe): 720 x 576 pixels, 25 frames/s.
• DV NTSC (USA): 720 x 480 pixels, 29.97 frames/s.
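
To see what 5:1 compression means for these two standards, here is a rough estimate of the video data rate. It assumes 8 bits per sample and that, in both 4:1:1 and 4:2:0, each of the two chrominance signals carries a quarter as many samples as the luminance signal; audio and other overhead are ignored.

```python
BITS_PER_SAMPLE = 8  # assumed sample depth

# One luminance sample per pixel plus two chrominance signals, each carrying
# (by assumption) a quarter as many samples: 8 * (1 + 2 * 0.25) = 12 bits.
AVG_BITS_PER_PIXEL = BITS_PER_SAMPLE * (1 + 2 * 0.25)


def raw_bitrate(width, height, fps):
    """Uncompressed video data rate in bits per second."""
    return width * height * AVG_BITS_PER_PIXEL * fps


def after_compression(bitrate, ratio=5):
    """Data rate once a ratio:1 compression has been applied."""
    return bitrate / ratio


standards = {"DV PAL": (720, 576, 25), "DV NTSC": (720, 480, 29.97)}
for name, (w, h, fps) in standards.items():
    raw = raw_bitrate(w, h, fps)
    print(f"{name}: raw {raw / 1e6:.1f} Mbit/s, "
          f"after 5:1 about {after_compression(raw) / 1e6:.1f} Mbit/s")
```

Both standards end up near 25 Mbit/s of video data.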

MJPEG Compression
Motion JPEG compression applies the JPEG compression algorithm to each image of a video
sequence. MJPEG allows a bit-rate of 8 to 10 Mbps (megabits per
second) and, as each image is encoded separately, gives you random access to any frame.

Audio
Basics
The characteristics of any wave, and therefore any sound, can be roughly described by
using two simple variables, frequency and amplitude.

Frequency
Frequency is a measure of how often a wave cycle repeats. It is expressed in cycles per
second, or Hertz (Hz) for short. Frequency is related to pitch, our perception of
whether a sound is a low rumble or a high squeal. The frequency range of human
hearing is theoretically from 20 Hz at the lowest end to 20,000 Hz at the high end (your
actual mileage may vary). You'll often see high frequencies written in kilohertz
(kHz), which is metric-speak for ‘thousand Hertz’.

Amplitude
The second variable that describes a wave is amplitude, which is a measure of the
wave's energy level. Amplitude relates to our perception of volume or loudness. Big
waves with lots of energy are high-amplitude, and sound loud. Small waves with little
energy are low-amplitude, and sound soft or quiet. Amplitude is measured in decibels
(dB).
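
As an illustration of these two variables, the sketch below builds a pure sine wave from a frequency and an amplitude. The 8,000 samples per second and the 0.01 s duration are arbitrary values assumed for the example, and amplitude is expressed here as a plain linear number rather than in dB.

```python
import math


def sine_wave(frequency_hz, amplitude, sample_rate=8000, duration_s=0.01):
    """Return a list of samples of a sine wave (sample_rate is an assumed value)."""
    count = round(sample_rate * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
        for i in range(count)
    ]


low_and_soft = sine_wave(frequency_hz=100, amplitude=0.2)
high_and_loud = sine_wave(frequency_hz=2000, amplitude=0.9)
print(max(low_and_soft), max(high_and_loud))
```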

Decibels are used in a couple of different ways. When measuring sound
pressure levels (dB SPL), 0 dB is defined as the effective bottom limit of human hearing,
the threshold of silence. 120 dB is the effective top limit of human hearing, the veritable
threshold of pain, exemplified by the sound of a jet aircraft (or a Who concert). Another
common use of decibels is the full-scale measurement (dBFS). For example, the dynamic
range of a compact disc is 96 dB, with 0 dB representing maximum loudness and -96 dB
representing silence. We'll dispense with the dB SPL measurement; from here on, assume
that we're talking about dBFS (Full Scale).
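
Where does the 96 dB figure come from? A minimal sketch, assuming the usual convention of 20 x log10 for amplitude ratios and 16-bit samples for the compact disc:

```python
import math


def to_dbfs(amplitude, full_scale=1.0):
    """Level in dBFS of a non-zero amplitude relative to full scale."""
    return 20 * math.log10(abs(amplitude) / full_scale)


def dynamic_range_db(bits):
    """Ratio, in dB, between full scale and the smallest step of a linear code."""
    return 20 * math.log10(2 ** bits)


print(f"full scale: {to_dbfs(1.0):.1f} dBFS")
print(f"half scale: {to_dbfs(0.5):.1f} dBFS")
print(f"16 bits: about {dynamic_range_db(16):.0f} dB of dynamic range")
```

Each halving of the amplitude costs about 6 dB, and each additional bit adds about 6 dB of range.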

Pitch
As we have seen, pitch is the term for how high or low a sound is perceived by the
human ear. It is determined by the sound’s frequency. Middle C on the piano, for
example, vibrates at about 261.6 cycles per second, that is, 261.6 Hz.
The higher the frequency, the higher the pitch. But most sounds are a mixture of waves
at various frequencies, and musical tones nearly always contain many frequencies above
the fundamental, known as harmonics.
Here is a harmonic series (N, 2N, 3N, 4N, following Fourier’s theorem):
• 200 Hz: fundamental
• 400 Hz: second harmonic
• 600 Hz: third harmonic
• 800 Hz: fourth harmonic, etc.
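
The series above can be generated for any fundamental with a one-line function (the function name is ours):

```python
def harmonic_series(fundamental_hz, count):
    """First `count` harmonics: N, 2N, 3N, ... where N is the fundamental."""
    return [n * fundamental_hz for n in range(1, count + 1)]


print(harmonic_series(200, 4))
```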
The various frequencies that comprise a sound can be amplified or reduced with
equalization to change the sound’s overall tone and character.

Timbre
This notion is difficult to quantify. Timbre is defined as the tone, color, or texture of a
sound. It enables the brain to distinguish one type of instrument from another.

Effects
Sound waves reflect and disperse off the various surfaces in our environment, such as the
walls of a concert hall. We rarely ever hear the pure, direct vibration of a sound wave
before it is masked or altered by the coloration of thousands of small reflections.
Sounds are colored by the materials and substances they travel through. Changing the
environment changes the tone quality, equalization and timbre of a sound. By
using audio effects, you can create these changes yourself.

Equalization, Gates & Dynamics Processing
Equalization
Equalization, or EQ for short, is best known from the bass and treble knobs found on any
home stereo. In the most basic scenario, the range of frequencies across the audible
spectrum is divided into two bands by a filter; one band contains the low end (bass) and
the other contains the upper range (treble). You use the bass and treble controls to
boost or cut the energy of the signal within each band.

Graphic EQ
A more complicated type of equalizer you may have seen is the graphic EQ. The typical
graphic EQ filters the frequency spectrum into many bands, perhaps ten or twenty, so
you can make more precise adjustments to the sound by boosting or cutting the volume
level of narrow frequency ranges.

Gates & Dynamics Processing
A gate is a common audio circuit which lets you turn on, or off, the flow of a signal.
The gate continuously measures the signal which is being fed to it. If the input signal is
at a low amplitude (quiet) then the gate stays shut, allowing no signal to pass. If the
amplitude of the input signal rises above an arbitrary line (i.e., is ‘loud enough’) then
the gate opens and passes the signal to its output. This arbitrary ‘loud enough’ line,
which triggers the gate's opening and closing, is known as the threshold.
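
The principle of the gate can be sketched as follows. This is a deliberately naive version that decides sample by sample; a real gate usually follows the signal's level over time and opens and closes gradually. The threshold value is arbitrary.

```python
def gate(samples, threshold):
    """Pass samples whose absolute level exceeds the threshold; mute the others."""
    return [s if abs(s) > threshold else 0.0 for s in samples]


signal = [0.01, -0.5, 0.2, -0.05]
print(gate(signal, threshold=0.1))
```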

Downward Compression
A process closely related to gating is limiting. A limiter measures an input
signal; when the amplitude is below the threshold, the signal passes untouched. As the
input amplitude rises above the threshold, attenuation (cut) is applied to the signal, so
as to reduce unwanted peaks in the audio material. This is also commonly known as
downward compression.
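
Here is a sketch of downward compression working on levels expressed in dB. The 'ratio' parameter is our assumption (it is not introduced in this primer): above the threshold, every `ratio` dB of input produces only 1 dB of output, and a very high ratio behaves like a limiter.

```python
def downward_compress(level_db, threshold_db=-12.0, ratio=4.0):
    """Attenuate the part of the level that exceeds the threshold."""
    if level_db <= threshold_db:
        return level_db  # below the threshold: untouched
    return threshold_db + (level_db - threshold_db) / ratio


for level in (-20.0, -12.0, -4.0):
    print(level, "->", downward_compress(level))
```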

Upward Compression
This is a process similar to limiting, except in this case gain (volume boost) is applied to
signals which fall below the threshold. This increases the volume level of soft passages;
signal that exceeds the threshold is passed unamplified.
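
In the same spirit, upward compression can be sketched on dB levels. As before, the threshold and the 'ratio' parameter are values assumed for the illustration.

```python
def upward_compress(level_db, threshold_db=-30.0, ratio=2.0):
    """Boost levels below the threshold; pass the rest unamplified."""
    if level_db >= threshold_db:
        return level_db  # above the threshold: untouched
    return threshold_db + (level_db - threshold_db) / ratio


for level in (-50.0, -30.0, -10.0):
    print(level, "->", upward_compress(level))
```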

The Compressor/Limiter
A common studio tool is the compressor/limiter, which is typically a hardware device
that combines the two functions described above. Imagine that you're watching a
volume meter and you have your hand on a volume knob; when the signal is low you
crank it up, when the signal is too hot, you turn it down. Thus, soft program material is
boosted in volume, loud program material is dropped in volume, and the dynamic range
(the difference between softest and loudest) of the signal is reduced.
Compressor/limiters are very useful for smoothing out uneven volume levels in
recordings.

The Expander
An expander is essentially the opposite of a compressor/limiter; it expands the dynamic
range by exaggerating the differences between soft and loud passages. Expanders
attenuate (cut) the volume of low-amplitude signals and/or add gain to (boost) the
volume of high-amplitude signals. The process of attenuating low-amplitude signals is
called downward expansion; the corresponding process of adding more gain to signal
peaks is called upward expansion. Downward expansion is helpful for noise reduction.
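
Downward expansion can be sketched as the mirror image of compression: below the threshold, the distance to the threshold is multiplied instead of divided. The threshold and 'ratio' values are assumptions chosen for the example.

```python
def downward_expand(level_db, threshold_db=-40.0, ratio=2.0):
    """Push levels below the threshold further down; pass the rest unchanged."""
    if level_db >= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio


for level in (-50.0, -40.0, -20.0):
    print(level, "->", downward_expand(level))
```

Low-level background noise ends up even quieter, which is why this process helps with noise reduction.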

Digital Audio, Sample Rate and Resolution
From Analog to Digital
As you know, a sound wave is a series of periodic vibrations. A microphone, for
example, ‘translates’, or converts, these acoustic waves into electrical ones. In the
analog domain, every new conversion or copy degrades the sound a little more. Even the
smallest amplitude variation distorts the signal; every copy brings
with it a flattening, a loss of dynamics, more background noise, etc.
With digital sound, by contrast, making a copy amounts to copying a list of numbers, a
trifle for the computer. The most common format for the digital representation of an
audio signal is PCM (Pulse Code Modulation), in which sound waves are translated into a
series of numbers.
When we use a mic to convert sound into an electrical signal, the latter is then
translated into numeric values by an ADC (Analog-to-Digital Converter). And, as
it is impossible in the digitizing process to record the infinite number of values that
characterize a sound wave, samples are taken at regular intervals, like ‘snapshots’ of
the sound, with the sample rate corresponding to the number of samples per second. The
digital signal is therefore discontinuous. It is defined neither at every moment nor for
every amplitude; the computer has to reconstruct the waveform by stringing the
samples back together or, more precisely, by calculating the most likely curve between
successive samples.
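
The 'snapshot' idea can be sketched in a few lines. The example below takes samples of a 1 kHz tone at regular intervals and rounds each one to a 16-bit integer, as a PCM encoder would. The tone, the 8 kHz sample rate and the 1 ms duration are arbitrary values assumed for the illustration.

```python
import math


def pcm_encode(signal, sample_rate, duration_s, bits=16):
    """Sample signal(t) at regular intervals and round each value to an integer code."""
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16 bits
    count = round(sample_rate * duration_s)
    codes = []
    for i in range(count):
        t = i / sample_rate                   # time of this 'snapshot'
        value = max(-1.0, min(1.0, signal(t)))  # keep within -1.0 .. 1.0
        codes.append(round(value * full_scale))
    return codes


def tone(t):
    """A 1 kHz sine wave with an amplitude of 1.0."""
    return math.sin(2 * math.pi * 1000 * t)


codes = pcm_encode(tone, sample_rate=8000, duration_s=0.001)
print(codes)
```

One cycle of the tone is reduced to eight numbers; everything between two snapshots has to be reconstructed on playback.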

Sample Rate
Sample rate has a direct bearing on two things: audio quality and file size. So, when
sound is being converted into digital information, the number of samples has to be
considered. And that’s where the Nyquist-Shannon theorem comes in: the sample rate must
be greater than or equal to twice the maximum frequency of a given signal. Why? Because
the sample rate defines an audio file’s upper frequency limit. As we have seen, the human
ear perceives sounds up to about 20,000 Hz. This means that the sample rate should be at
least 40,000 Hz. Luckily, many applications can handle relatively low sample rates. The
human voice, for example, contains frequencies up to around 10 kHz; it theoretically
needs a sample rate of 20 kHz. Nevertheless, limited to 4 kHz, which means a sample rate
of 8 kHz, the human voice is still intelligible, and this is what the telephone uses for
long-distance transmissions. Sometimes, though, the results are surprising, so you had
better always run some tests and systematically listen to the results. When the sample
rate is too low with regard to the frequencies of the original audio signal, you get
aliasing, a special sort of background noise or distortion.
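
The rule above, and what happens when it is broken, can be sketched as follows. The folding formula used for the alias is the standard one for a pure tone; the function names and test frequencies are ours.

```python
def min_sample_rate(max_frequency_hz):
    """Lowest sample rate that satisfies the Nyquist-Shannon rule."""
    return 2 * max_frequency_hz


def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up once it has been sampled."""
    nearest_multiple = round(frequency_hz / sample_rate_hz) * sample_rate_hz
    return abs(frequency_hz - nearest_multiple)


print(min_sample_rate(20_000))       # full range of human hearing
print(alias_frequency(3_000, 8_000)) # below half the sample rate: unchanged
print(alias_frequency(5_000, 8_000)) # above half the sample rate: folded back
```

At a sample rate of 8 kHz, a 5 kHz tone cannot be told apart from a 3 kHz tone: that is aliasing.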

Some standard sample rates:
• 32 kHz: digital FM radio (bandwidth limit 15 kHz)
• 44.1 kHz: professional audio and audio CD
• 48 kHz: recording standard for MiniDisc and some DATs, as well as some
professional digital multitrack recorders
• 96 kHz, and up to 192 kHz (2 x 96): DVD

Bit-rate Resolution
Bit-rate resolution is another key factor in defining digital audio quality. It’s the
number of bits used to represent each sample, and it determines how precisely a
sound’s dynamic range is represented. To understand this notion, a little detour on the
binary system’s wild side… The binary system is based on two values only, 0 and 1.
Binary coding produces a digital signal composed of a series of digits called bits
(short for binary digits), organized in a very specific way. An 8-bit series is called a
byte; it has 2^8 (or 256) possible combinations, from 0 to 255 (from ‘00000000’ to
‘11111111’). 16 bits have 2^16 (65,536) combinations, and 24 bits have 2^24 (16,777,216)
combinations, 256 times more than 16 bits! That’s why resolution is essential for sound
quality. Remember, an audio CD uses 16 bits. Practically speaking, 16-bit files have a
better signal-to-noise ratio than 8-bit files, which means they have much less audible
noise. But the lower the sample rate and the resolution, the smaller the audio file
in terms of memory or binary volume. And this is where the dilemma begins… the
choice of sample rate and bit-rate resolution largely determines sound quality.
Keep in mind:
• Sample rate, expressed in kHz, corresponds to the number of samples per
second. It has to be greater than or equal to twice the signal’s maximum
frequency.
• Digital resolution, expressed in bits, is the number of bits used to
represent each sample.
• The quality of digital audio is defined by both sample rate and bit-rate
resolution.
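
The trade-off between quality and file size can be put into numbers. The sketch below computes the data rate and size of uncompressed audio; the number of channels (2 for stereo) is an assumption not discussed above.

```python
def pcm_bitrate(sample_rate_hz, bits, channels=2):
    """Bits per second of uncompressed audio (channels=2 assumes stereo)."""
    return sample_rate_hz * bits * channels


def pcm_bytes(sample_rate_hz, bits, seconds, channels=2):
    """Size in bytes of `seconds` of uncompressed audio."""
    return pcm_bitrate(sample_rate_hz, bits, channels) * seconds // 8


print(pcm_bitrate(44_100, 16))             # audio CD, stereo
print(pcm_bytes(44_100, 16, seconds=60))   # one minute of CD audio
print(pcm_bitrate(8_000, 8, channels=1))   # 8 kHz, 8 bits, mono
```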

Digital Audio Compression
A codec (coder/decoder) is a set of compression and
decompression algorithms. There is a surprising number of codecs; it would be difficult
to list them all. The compression bitrate corresponds to the number of bits
that one second of data occupies in the compressed file. You can also talk about
compression ratio or compression rate, and express it like this: 10:1, 12:1, etc. Digital
compression is close to the methods used for A/D conversion.
There are two types of digital compression:
• Destructive, or lossy, compression, which discards data, sometimes
without audible loss of quality
• Non-destructive, or lossless, compression, with no data loss, corresponding to a set of
algorithms which preserve the original data through the compression /
decompression process.
Destructive compression is based on the fact that humans almost never hear frequencies
above 20 kHz; that’s why it’s also called perceptual encoding. It also takes advantage of
the fact that certain frequencies are masked by others. For comparison, the JPEG format
used for images is based on destructive compression.
Compression technologies can be standard or proprietary. The Moving Picture Experts
Group (MPEG) works under the joint direction of the ISO (International Organization
for Standardization) and the IEC (International Electrotechnical Commission) in order to
establish video and audio compression standards. MPEG audio
compression is based on the perceptual encoding techniques mentioned above. In the
audio field, the most popular format (and one of the most powerful within the MPEG
family) is MPEG Audio Layer 3, MP3 for short, whose development began in 1987 at the
Fraunhofer Institute. Layer 3 allows a reduction down to 1/12th the size of the original
signal without sacrificing much quality. But careful, once again everything depends on
the original signal.
An example: MP3 is excellent for electronic music, but much less so for jazz, classical,
and other acoustic music, which generate many harmonics that, as you know,
determine the timbre and tone color of an instrument. MP3 copes poorly with dense
harmonic series and may turn them into a sort of ‘mash’ in songs with
many acoustic instruments.
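
To relate a compression ratio to a bitrate, here is a small sketch. It starts from CD-quality audio (44.1 kHz, 16 bits, and two channels, the stereo part being our assumption) and applies the 12:1 ratio mentioned above.

```python
def bitrate_after_compression(original_bps, ratio):
    """Bitrate left once a ratio:1 compression has been applied."""
    return original_bps / ratio


cd_bps = 44_100 * 16 * 2  # sample rate x bits x channels (stereo assumed)
mp3_bps = bitrate_after_compression(cd_bps, 12)
print(f"CD: {cd_bps / 1000:.1f} kbit/s, at 12:1: {mp3_bps / 1000:.1f} kbit/s")
```

The result is in the neighborhood of the 128 kbit/s setting commonly offered by MP3 encoders.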



All information supplied is correct to the best of our knowledge. 
Carolina Sound Services is not responsible for errors in the data supplied.