mirror of
https://github.com/DrBeef/QuestZDoom.git
synced 2025-03-06 17:32:00 +00:00
3258 lines
175 KiB
Text
SoX(1) Sound eXchange SoX(1)

NAME
SoX - Sound eXchange, the Swiss Army knife of audio manipulation

SYNOPSIS
    sox [global-options] [format-options] infile1
        [[format-options] infile2] ... [format-options] outfile
        [effect [effect-options]] ...

    play [global-options] [format-options] infile1
        [[format-options] infile2] ... [format-options]
        [effect [effect-options]] ...

    rec [global-options] [format-options] outfile
        [effect [effect-options]] ...

DESCRIPTION
Introduction
SoX reads and writes audio files in most popular formats and can
optionally apply effects to them. It can combine multiple input
sources, synthesise audio, and, on many systems, act as a general
purpose audio player or a multi-track audio recorder. It also has
limited ability to split the input into multiple output files.

All SoX functionality is available using just the sox command. To
simplify playing and recording audio, if SoX is invoked as play, the
output file is automatically set to be the default sound device, and
if invoked as rec, the default sound device is used as an input
source. Additionally, the soxi(1) command provides a convenient way to
just query audio file header information.

The heart of SoX is a library called libSoX. Those interested in
extending SoX or using it in other programs should refer to the libSoX
manual page: libsox(3).

SoX is a command-line audio processing tool, particularly suited to
making quick, simple edits and to batch processing. If you need an
interactive, graphical audio editor, use audacity(1).

* * *

The overall SoX processing chain can be summarised as follows:

    Input(s) → Combiner → Effects → Output(s)

Note, however, that on the SoX command line, the positions of the
Output(s) and the Effects are swapped w.r.t. the logical flow just
shown. Note also that whilst options pertaining to files are placed
before their respective file name, the opposite is true for effects.
To show how this works in practice, here is a selection of examples of
how SoX might be used. The simple

    sox recital.au recital.wav

translates an audio file in Sun AU format to a Microsoft WAV file,
whilst

    sox recital.au -b 16 recital.wav channels 1 rate 16k fade 3 norm

performs the same format translation, but also applies four effects
(down-mix to one channel, sample rate change, fade-in, normalise), and
stores the result at a bit-depth of 16.

    sox -r 16k -e signed -b 8 -c 1 voice-memo.raw voice-memo.wav

converts `raw' (a.k.a. `headerless') audio to a self-describing file
format,

    sox slow.aiff fixed.aiff speed 1.027

adjusts audio speed,

    sox short.wav long.wav longer.wav

concatenates two audio files, and

    sox -m music.mp3 voice.wav mixed.flac

mixes together two audio files.

    play "The Moonbeams/Greatest/*.ogg" bass +3

plays a collection of audio files whilst applying a bass boosting
effect,

    play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1

plays a synthesised `A minor seventh' chord with a pipe-organ sound,

    rec -c 2 radio.aiff trim 0 30:00

records half an hour of stereo audio, and

    play -q take1.aiff & rec -M take1.aiff take1-dub.aiff

(with POSIX shell and where supported by hardware) records a new track
in a multi-track recording. Finally,

    rec -r 44100 -b 16 -e signed-integer -p \
        silence 1 0.50 0.1% 1 10:00 0.1% | \
    sox -p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \
        newfile : restart

records a stream of audio such as LP/cassette and splits it into
multiple audio files at points with 2 seconds of silence. Also, it
does not start recording until it detects that audio is playing, and
it stops after it sees 10 minutes of silence.

N.B. The above is just an overview of SoX's capabilities; detailed
explanations of how to use all SoX parameters, file formats, and
effects can be found below in this manual, in soxformat(7), and in
soxi(1).

File Format Types
SoX can work with `self-describing' and `raw' audio files.
`Self-describing' formats (e.g. WAV, FLAC, MP3) have a header that
completely describes the signal and encoding attributes of the audio
data that follows. `Raw' or `headerless' formats do not contain this
information, so the audio characteristics of these must be described
on the SoX command line or inferred from those of the input file.

The following four characteristics are used to describe the format of
audio data such that it can be processed with SoX:

sample rate
The sample rate in samples per second (`Hertz' or `Hz'). Digital
telephony traditionally uses a sample rate of 8000 Hz (8 kHz), though
these days, 16 and even 32 kHz are becoming more common. Audio Compact
Discs use 44100 Hz (44.1 kHz). Digital Audio Tape and many computer
systems use 48 kHz. Professional audio systems often use 96 kHz.

sample size
The number of bits used to store each sample. Today, 16-bit is
commonly used. 8-bit was popular in the early days of computer audio.
24-bit is used in the professional audio arena. Other sizes are also
used.

data encoding
The way in which each audio sample is represented (or `encoded'). Some
encodings have variants with different byte-orderings or
bit-orderings. Some compress the audio data so that the stored audio
data takes up less space (i.e. disk space or transmission bandwidth)
than the other format parameters and the number of samples would
imply. Commonly-used encoding types include floating-point, μ-law,
ADPCM, signed-integer PCM, MP3, and FLAC.

channels
The number of audio channels contained in the file. One (`mono') and
two (`stereo') are widely used. `Surround sound' audio typically
contains six or more channels.

The term `bit-rate' is a measure of the amount of storage occupied by
an encoded audio signal over a unit of time. It can depend on all of
the above and is typically denoted as a number of kilo-bits per second
(kbps). An A-law telephony signal has a bit-rate of 64 kbps.
MP3-encoded stereo music typically has a bit-rate of 128-196 kbps.
FLAC-encoded stereo music typically has a bit-rate of 550-760 kbps.

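Here is a worked example of the A-law figure. For an encoding that stores a fixed number of bits per sample with no further compression, the bit-rate is simply sample rate × sample size × channels. The Python sketch below checks the 64 kbps figure and, for comparison, works out the rate for uncompressed CD audio. The function name pcm_bit_rate_kbps is hypothetical and not part of SoX. The formula does not apply to MP3 or FLAC, whose bit-rates depend on the encoder and the material.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit-rate in kbps for an encoding with a fixed number of bits per
    sample and no further compression (e.g. A-law, linear PCM)."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second / 1000  # 1 kbps = 1000 bits per second


# A-law telephony: 8 kHz, 8 bits per sample, mono.
print(pcm_bit_rate_kbps(8000, 8, 1))    # 64.0
# Audio CD: 44.1 kHz, 16 bits per sample, stereo, uncompressed.
print(pcm_bit_rate_kbps(44100, 16, 2))  # 1411.2
```

The CD result of about 1411 kbps shows why the 550-760 kbps quoted for FLAC is a substantial saving, even though FLAC is lossless.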
Most self-describing formats also allow textual `comments' to be
embedded in the file that can be used to describe the audio in some
way, e.g. for music, the title, the author, etc.

One important use of audio file comments is to convey `Replay Gain'
information. SoX supports applying Replay Gain information (for
certain input file formats only; currently, at least FLAC and Ogg
Vorbis), but not generating it. Note that by default, SoX copies input
file comments to output files that support comments, so output files
may contain Replay Gain information if some was present in the input
file. In this case, if anything other than a simple format conversion
was performed then the output file Replay Gain information is likely
to be incorrect and so should be recalculated using a tool that
supports this (not SoX).

The soxi(1) command can be used to display information from audio file
headers.

Determining & Setting The File Format
There are several mechanisms available for SoX to use to determine or
set the format characteristics of an audio file. Depending on the
circumstances, individual characteristics may be determined or set
using different mechanisms.

To determine the format of an input file, SoX will use, in order of
precedence and as given or available:

1. Command-line format options.
2. The contents of the file header.
3. The filename extension.

To set the output file format, SoX will use, in order of precedence
and as given or available:

1. Command-line format options.
2. The filename extension.
3. The input file format characteristics, or the closest that is
   supported by the output file type.

For all files, SoX will exit with an error if the file type cannot be
determined. Command-line format options may need to be added or
changed to resolve the problem.

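The two precedence lists can be modelled as a per-characteristic lookup, since each characteristic may come from a different mechanism. The Python sketch below is a simplified model only. The dictionaries stand in for what SoX learns from each mechanism, and all names are hypothetical. The real output handling also falls back to `the closest that is supported by the output file type', which is not modelled here.

```python
from typing import Dict, Optional

CHARACTERISTICS = ("rate", "bits", "encoding", "channels")


def resolve(*sources: Dict[str, Optional[object]]) -> Dict[str, object]:
    """Resolve each characteristic independently: take the value from the
    first source (highest precedence first) that provides one.  A
    characteristic that no source provides is simply left out."""
    result: Dict[str, object] = {}
    for key in CHARACTERISTICS:
        for source in sources:
            if source.get(key) is not None:
                result[key] = source[key]
                break
    return result


# Input file: command-line options, then header, then filename extension.
cli_in = {"rate": 16000}  # e.g. -r 16k given before the input filename
header = {"rate": 8000, "bits": 8, "encoding": "u-law", "channels": 1}
ext_in = {}               # assume the extension implies nothing further
infmt = resolve(cli_in, header, ext_in)
print(infmt)  # {'rate': 16000, 'bits': 8, 'encoding': 'u-law', 'channels': 1}

# Output file: command-line options, then extension, then the input format.
cli_out = {"bits": 16, "encoding": "signed-integer"}
ext_out = {}
print(resolve(cli_out, ext_out, infmt))
# {'rate': 16000, 'bits': 16, 'encoding': 'signed-integer', 'channels': 1}
```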
Playing & Recording Audio
The play and rec commands are provided so that basic playing and
recording is as simple as

    play existing-file.wav

and

    rec new-file.wav

These two commands are functionally equivalent to

    sox existing-file.wav -d

and

    sox -d new-file.wav

Of course, further options and effects (as described below) can be
added to the commands in either form.

* * *

Some systems provide more than one type of (SoX-compatible) audio
driver, e.g. ALSA & OSS, or SUNAU & AO. Systems can also have more
than one audio device (a.k.a. `sound card'). If more than one audio
driver has been built into SoX, and the default selected by SoX when
recording or playing is not the one that is wanted, then the
AUDIODRIVER environment variable can be used to override the default.
For example (on many systems):

    set AUDIODRIVER=oss
    play ...

The AUDIODEV environment variable can be used to override the default
audio device, e.g.

    set AUDIODEV=/dev/dsp2
    play ...
    sox ... -t oss

or

    set AUDIODEV=hw:soundwave,1,2
    play ...
    sox ... -t alsa

Note that the way of setting environment variables varies from system
to system - for some specific examples, see `SOX_OPTS' below.

When playing a file with a sample rate that is not supported by the
audio output device, SoX will automatically invoke the rate effect to
perform the necessary sample rate conversion. For compatibility with
old hardware, the default rate quality level is set to `low'. This can
be changed by explicitly specifying the rate effect with a different
quality level, e.g.

    play ... rate -m

or by using the --play-rate-arg option (see below).

* * *

On some systems, SoX allows audio playback volume to be adjusted
whilst using play. Where supported, this is achieved by tapping the
`v' & `V' keys during playback.

To help with setting a suitable recording level, SoX includes a
peak-level meter which can be invoked (before making the actual
recording) as follows:

    rec -n

The recording level should be adjusted (using the system-provided
mixer program, not SoX) so that the meter is at most occasionally full
scale, and never `in the red' (an exclamation mark is shown). See also
-S below.

Accuracy
Many file formats that compress audio discard some of the audio signal
information whilst doing so. Converting to such a format and then
converting back again will not produce an exact copy of the original
audio. This is the case for many formats used in telephony (e.g.
A-law, GSM) where low signal bandwidth is more important than high
audio fidelity, and for many formats used in portable music players
(e.g. MP3, Vorbis) where adequate fidelity can be retained even with
the large compression ratios that are needed to make portable players
practical.

Formats that discard audio signal information are called `lossy'.
Formats that do not are called `lossless'. The term `quality' is used
as a measure of how closely the original audio signal can be
reproduced when using a lossy format.

Audio file conversion with SoX is lossless when it can be, i.e. when
not using lossy compression, when not reducing the sampling rate or
number of channels, and when the number of bits used in the
destination format is not less than in the source format. E.g.
converting from an 8-bit PCM format to a 16-bit PCM format is lossless
but converting from an 8-bit PCM format to (8-bit) A-law isn't.

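The rule of thumb above can be written as a single predicate. The Python sketch below encodes exactly the four stated conditions and nothing more. The AudioFormat type and conversion_is_lossless are hypothetical names, and treating A-law as lossy follows the telephony examples given under Accuracy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    rate: int     # samples per second
    channels: int
    bits: int     # bits per sample in the stored encoding
    lossy: bool   # True for lossy encodings, e.g. MP3, GSM, A-law


def conversion_is_lossless(src: AudioFormat, dst: AudioFormat) -> bool:
    """Apply the four conditions stated above; not an exhaustive model."""
    return (
        not dst.lossy                      # no lossy compression on output
        and dst.rate >= src.rate           # sampling rate not reduced
        and dst.channels >= src.channels   # channel count not reduced
        and dst.bits >= src.bits           # no fewer bits per sample
    )


pcm8 = AudioFormat(rate=8000, channels=1, bits=8, lossy=False)
pcm16 = AudioFormat(rate=8000, channels=1, bits=16, lossy=False)
alaw8 = AudioFormat(rate=8000, channels=1, bits=8, lossy=True)

print(conversion_is_lossless(pcm8, pcm16))  # True
print(conversion_is_lossless(pcm8, alaw8))  # False
```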
N.B. SoX converts all audio files to an internal uncompressed format
before performing any audio processing. This means that manipulating a
file that is stored in a lossy format can cause further losses in
audio fidelity. E.g. with

    sox long.mp3 short.mp3 trim 10

SoX first decompresses the input MP3 file, then applies the trim
effect, and finally creates the output MP3 file by re-compressing the
audio - with a possible reduction in fidelity above that which
occurred when the input file was created. Hence, if what is ultimately
desired is lossily compressed audio, it is highly recommended to
perform all audio processing using lossless file formats and then
convert to the lossy format only at the final stage.

N.B. Applying multiple effects with a single SoX invocation will, in
general, produce more accurate results than those produced using
multiple SoX invocations.

Dithering
Dithering is a technique used to maximise the dynamic range of audio
stored at a particular bit-depth. Any distortion introduced by
quantisation is decorrelated by adding a small amount of white noise
to the signal. In most cases, SoX can determine whether the selected
processing requires dither and will add it during output formatting if
appropriate.

Specifically, by default, SoX automatically adds TPDF dither when the
output bit-depth is less than 24 and any of the following are true:

·  bit-depth reduction has been specified explicitly using a
   command-line option

·  the output file format supports only bit-depths lower than that of
   the input file format

·  an effect has increased effective bit-depth within the internal
   processing chain

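TPDF (triangular probability density function) dither adds noise whose amplitude has a triangular distribution spanning about ±1 LSB (least significant bit) of the target bit-depth. The noise is added before rounding. The Python sketch below is a generic textbook version of that technique and is not SoX's own code. The function name tpdf_requantise and the float-in, integer-out interface are assumptions made for illustration.

```python
import random


def tpdf_requantise(samples, bits, rng=None):
    """Requantise floats in [-1, +1) to signed `bits`-bit integers with TPDF
    dither.  A generic textbook sketch, not SoX's implementation."""
    rng = rng or random.Random()
    full_scale = 2 ** (bits - 1)          # e.g. 128 for 8-bit
    lo, hi = -full_scale, full_scale - 1  # representable integer range
    out = []
    for x in samples:
        # The sum of two independent uniform values in [-0.5, 0.5) LSB has a
        # triangular probability density spanning [-1, +1) LSB.
        noise = (rng.random() - 0.5) + (rng.random() - 0.5)
        q = round(x * full_scale + noise)
        out.append(min(hi, max(lo, q)))   # clamp to the integer range
    return out


# The exact output depends on the random seed.
print(tpdf_requantise([0.0, 0.001, -0.25, 0.5], bits=8, rng=random.Random(1)))
```

Summing two uniform variables is a common, cheap way to obtain the triangular distribution. Noise shaping, as offered by the dither effect, is not attempted here.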
For example, adjusting volume with vol 0.25 requires two additional
bits in which to losslessly store its results (since 0.25 decimal
equals 0.01 binary). So if the input file bit-depth is 16, then SoX's
internal representation will utilise 18 bits after processing this
volume change. In order to store the output at the same depth as the
input, dithering is used to remove the additional bits.

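The arithmetic behind `two additional bits' can be checked with exact rational numbers. A gain whose exact value is an integer divided by 2^k needs k extra binary fractional bits. A gain such as 0.1 has no finite binary expansion at all. The Python sketch below uses fractions.Fraction so that float rounding does not get in the way. The function name extra_fractional_bits is hypothetical.

```python
from fractions import Fraction
from typing import Optional


def extra_fractional_bits(gain: str) -> Optional[int]:
    """Extra binary fractional bits needed to hold `sample * gain` exactly,
    for integer samples and a gain given as a decimal string.

    Returns None if the gain has no finite binary expansion (e.g. 0.1), in
    which case no finite number of extra bits is enough.  Gains above 1 also
    need extra integer bits (headroom), which this sketch ignores.
    """
    denominator = Fraction(gain).denominator  # "0.25" -> 1/4 -> 4
    if denominator & (denominator - 1):       # not a power of two
        return None
    return denominator.bit_length() - 1       # 4 == 2**2 -> 2 bits


print(extra_fractional_bits("0.25"))       # 2
print(16 + extra_fractional_bits("0.25"))  # 18
print(extra_fractional_bits("0.1"))        # None
```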
Use the -V option to see what processing SoX has automatically added.
The -D option may be given to override automatic dithering. To invoke
dithering manually (e.g. to select a noise-shaping curve), see the
dither effect.

Clipping
Clipping is distortion that occurs when an audio signal level (or
`volume') exceeds the range of the chosen representation. In most
cases, clipping is undesirable and so should be corrected by adjusting
the level prior to the point (in the processing chain) at which it
occurs.

In SoX, clipping could occur, as you might expect, when using the vol
or gain effects to increase the audio volume. Clipping could also
occur with many other effects, when converting one format to another,
and even when simply playing the audio.

Playing an audio file often involves resampling, and processing by
analogue components can introduce a small DC offset and/or
amplification, all of which can produce distortion if the audio signal
level was initially too close to the clipping point.

For these reasons, it is usual to make sure that an audio file's
signal level has some `headroom', i.e. it does not exceed a particular
level below the maximum possible level for the given representation.
Some standards bodies recommend as much as 9dB headroom, but in most
cases, 3dB (≈ 70% linear) is enough. Note that this wisdom seems to
have been lost in modern music production; in fact, many CDs, MP3s,
etc. are now mastered at levels above 0dBFS, i.e. the audio is clipped
as delivered.

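The `3dB (≈ 70% linear)' figure follows from the usual amplitude convention, in which a ratio r corresponds to 20·log10(r) dB. The short Python sketch below shows the conversion and the peak level implied by 3, 6 and 9 dB of headroom. The function names are illustrative only.

```python
import math


def db_to_linear(db: float) -> float:
    """Convert a level in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20)


def linear_to_db(ratio: float) -> float:
    return 20 * math.log10(ratio)


# Highest peak level (relative to full scale, 1.0) that leaves the headroom.
for headroom_db in (3, 6, 9):
    print(headroom_db, round(db_to_linear(-headroom_db), 3))
# 3 0.708
# 6 0.501
# 9 0.355
```

By the same conversion, gain -6 in the example that follows roughly halves the amplitude.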
SoX's stat and stats effects can assist in determining the signal
level in an audio file. The gain or vol effect can be used to prevent
clipping, e.g.

    sox dull.wav bright.wav gain -6 treble +6

guarantees that the treble boost will not clip.

If clipping occurs at any point during processing, SoX will display a
warning message to that effect.

See also -G and the gain and norm effects.

Input File Combining
SoX's input combiner can be configured (see OPTIONS below) to combine
multiple files using any of the following methods: `concatenate',
`sequence', `mix', `mix-power', `merge', or `multiply'. The default
method is `sequence' for play, and `concatenate' for rec and sox.

For all methods other than `sequence', multiple input files must have
the same sampling rate. If necessary, separate SoX invocations can be
used to make sampling rate adjustments prior to combining.

If the `concatenate' combining method is selected (usually, this will
be by default) then the input files must also have the same number of
channels. The audio from each input will be concatenated in the order
given to form the output file.

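The input requirements just described can be expressed as a small check. All methods except `sequence' need one common sampling rate, and `concatenate' also needs one common channel count. The Python sketch below models an input as a rate, a channel count and a list of frames. That data model and the names check_inputs and concatenate are hypothetical and do not describe SoX internals.

```python
from typing import List, NamedTuple, Tuple


class Input(NamedTuple):
    rate: int                        # sampling rate in Hz
    channels: int
    frames: List[Tuple[float, ...]]  # one tuple of channel samples per frame


def check_inputs(method: str, inputs: List[Input]) -> None:
    """Raise ValueError if `inputs` break the requirements described above."""
    if method != "sequence" and len({i.rate for i in inputs}) > 1:
        raise ValueError(f"{method}: all inputs must have the same sampling rate")
    if method == "concatenate" and len({i.channels for i in inputs}) > 1:
        raise ValueError("concatenate: all inputs must have the same number of channels")


def concatenate(inputs: List[Input]) -> Input:
    check_inputs("concatenate", inputs)
    # Audio from each input follows the previous one, in the order given.
    frames = [frame for inp in inputs for frame in inp.frames]
    return Input(inputs[0].rate, inputs[0].channels, frames)


a = Input(44100, 2, [(0.1, 0.2)])
b = Input(44100, 2, [(0.3, 0.4), (0.5, 0.6)])
print(len(concatenate([a, b]).frames))  # 3
```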
The `sequence' combining method is selected automatically for play. It
is similar to `concatenate' in that the audio from each input file is
sent serially to the output file. However, here the output file may be
closed and reopened at the corresponding transition between input
files. This may be just what is needed when sending different types of
audio to an output device, but is not generally useful when the output
is a normal file.

If either the `mix' or `mix-power' combining method is selected then
two or more input files must be given and will be mixed together to
form the output file. The number of channels in each input file need
not be the same, but SoX will issue a warning if they are not and some
channels in the output file will not contain audio from every input
file. A mixed audio file cannot be un-mixed without reference to the
original input files.

If the `merge' combining method is selected then two or more input
files must be given and will be merged together to form the output
file. The number of channels in each input file need not be the same.
A merged audio file comprises all of the channels from all of the
input files. Un-merging is possible using multiple invocations of SoX
with the remix effect. For example, two mono files could be merged to
form one stereo file. The first and second mono files would become the
left and right channels of the stereo file.

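Merging can be pictured frame by frame. Each output frame is the channels of every input placed side by side, in the order the inputs are given, so two mono inputs yield left and right. The Python sketch below illustrates that idea. The function name is hypothetical. Padding a shorter input with silence is an assumption made for the sketch; this section does not state how SoX handles inputs of unequal length.

```python
from itertools import zip_longest


def merge(*inputs):
    """Merge inputs frame by frame: each output frame is the concatenation
    of the channels of every input, in the order the inputs are given.

    Each input is a list of frames; a frame is a tuple of channel samples.
    ASSUMPTION: a shorter input is padded with silence (zeros) here.
    """
    widths = [len(inp[0]) if inp else 0 for inp in inputs]
    out = []
    for frames in zip_longest(*inputs):  # missing frames come back as None
        merged = ()
        for frame, width in zip(frames, widths):
            merged += frame if frame is not None else (0.0,) * width
        out.append(merged)
    return out


left = [(0.1,), (0.2,), (0.3,)]
right = [(-0.1,), (-0.2,)]
print(merge(left, right))  # [(0.1, -0.1), (0.2, -0.2), (0.3, 0.0)]
```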
The `multiply' combining method multiplies the sample values of
corresponding channels (treated as numbers in the interval -1 to +1).
If the number of channels in the input files is not the same, the
missing channels are considered to contain all zeros.

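For a single frame, the `multiply' rule reduces to a product per channel, with any channel missing from an input counted as zero. Because every factor lies in the interval -1 to +1, the product does too. The Python sketch below (function name hypothetical, Python 3.8+ for math.prod) shows the rule for one frame from each input.

```python
import math


def multiply_frame(*frames):
    """Multiply corresponding channels across one frame from each input.

    Each frame is a tuple of samples in [-1, +1].  A channel missing from
    an input is treated as all-zero, so that output channel becomes zero.
    """
    n_channels = max(len(frame) for frame in frames)
    return tuple(
        math.prod(frame[ch] if ch < len(frame) else 0.0 for frame in frames)
        for ch in range(n_channels)
    )


print(multiply_frame((0.5, 0.5), (0.5, 0.8)))  # (0.25, 0.4)
print(multiply_frame((0.5, 0.5), (0.5,)))      # (0.25, 0.0)
```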
When combining input files, SoX applies any specified effects
(including, for example, the vol volume adjustment effect) after the
audio has been combined. However, it is often useful to be able to set
the volume of (i.e. `balance') the inputs individually, before
combining takes place.

For all combining methods, input file volume adjustments can be made
manually using the -v option (below) which can be given for one or
more input files. If it is given for only some of the input files then
the others receive no volume adjustment. In some circumstances,
automatic volume adjustments may be applied (see below).

The -V option (below) can be used to show the input file volume
adjustments that have been selected (either manually or
automatically).

There are some special considerations that need to be made when mixing
input files:

Unlike the other methods, `mix' combining has the potential to cause
clipping in the combiner if no balancing is performed. In this case,
if manual volume adjustments are not given, SoX will try to ensure
that clipping does not occur by automatically adjusting the volume
(amplitude) of each input signal by a factor of ¹/n, where n is the
number of input files. If this results in audio that is too quiet or
otherwise unbalanced then the input file volumes can be set manually
as described above. Using the norm effect on the mix is another
alternative.

If mixed audio seems loud enough at some points but too quiet in
others then dynamic range compression should be applied to correct
this - see the compand effect.

With the `mix-power' combine method, the mixed volume is approximately
equal to that of one of the input signals. This is achieved by
balancing using a factor of ¹/√n instead of ¹/n. Note that this
balancing factor does not guarantee that clipping will not occur, but
the number of clips will usually be low and the resultant distortion
is generally imperceptible.

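The difference between the two balancing factors can be seen on a tiny example. With ¹/n, the sum of n samples (each within -1 to +1) is scaled back into range, so it cannot clip. With ¹/√n, the power of uncorrelated inputs is preserved, but coincident peaks can exceed full scale. The Python sketch below (names hypothetical) mixes equal-length mono inputs and counts the samples that would clip. It models only the balancing arithmetic, not SoX's combiner.

```python
import math


def mix(inputs, method="mix"):
    """Mix equal-length mono inputs (lists of floats in [-1, +1]).

    Returns the mixed samples and the number of samples outside [-1, +1],
    i.e. the samples that would clip.
    """
    n = len(inputs)
    # 'mix' balances by 1/n; 'mix-power' balances by 1/sqrt(n).
    factor = 1 / n if method == "mix" else 1 / math.sqrt(n)
    mixed = [factor * sum(samples) for samples in zip(*inputs)]
    clips = sum(1 for s in mixed if abs(s) > 1.0)
    return mixed, clips


a = [1.0, 0.5, -1.0]
b = [1.0, -0.5, 0.5]
print(mix([a, b], "mix"))           # ([1.0, 0.0, -0.25], 0)
print(mix([a, b], "mix-power")[1])  # 1: the first sample becomes about 1.414
```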
Output Files
SoX's default behaviour is to take one or more input files and write
them to a single output file.

This behaviour can be changed by specifying the pseudo-effect
`newfile' within the effects list. SoX will then enter multiple output
mode.

In multiple output mode, a new file is created when the effects prior
to the `newfile' indicate they are done. The effects chain listed
after `newfile' is then started up and its output is saved to the new
file.

In multiple output mode, a unique number will automatically be
appended to the end of all filenames. If the filename has an extension
then the number is inserted before the extension. This behaviour can
be customized by placing a %n anywhere in the filename where the
number should be substituted. An optional number can be placed after
the % to indicate a minimum fixed width for the number.

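The naming rule can be illustrated with a small function. The marker %n is replaced by the file number, %Dn pads it to at least D digits, and with no marker the number goes before the extension. The Python sketch below is an illustration of the documented rule, not SoX's code. The parameter default_width is a hypothetical stand-in, because this section does not say how much padding SoX applies when no width is given.

```python
import os
import re


def numbered_filename(template: str, count: int, default_width: int = 1) -> str:
    """Illustrate the numbering rule described above (not SoX's code).

    '%n' is replaced by `count`; '%Dn' (D = 1-9) zero-pads it to at least
    D digits.  With no %n marker, the number goes before the extension,
    or at the end if there is no extension.
    ASSUMPTION: `default_width` (hypothetical) stands in for whatever
    padding SoX applies when no width is given.
    """
    def substitute(match):
        width = int(match.group(1)) if match.group(1) else default_width
        return str(count).zfill(width)

    result, n_markers = re.subn(r"%([1-9])?n", substitute, template)
    if n_markers:
        return result
    stem, ext = os.path.splitext(template)
    return f"{stem}{str(count).zfill(default_width)}{ext}"


print(numbered_filename("ringtone%1n.wav", 2))  # ringtone2.wav
print(numbered_filename("part%3n.flac", 7))     # part007.flac
print(numbered_filename("take.aiff", 3))        # take3.aiff
```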
Multiple output mode is not very useful unless an effect that will
stop the effects chain early is specified before the `newfile'. If end
of file is reached before the effects chain stops itself then no new
file will be created as it would be empty.

The following is an example of splitting the first 60 seconds of an
input file into two 30 second files and ignoring the rest.

    sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30

Stopping SoX
Usually SoX will complete its processing and exit automatically once
it has read all available audio data from the input files.

If desired, it can be terminated earlier by sending an interrupt
signal to the process (usually by pressing the keyboard interrupt key
which is normally Ctrl-C). This is a natural requirement in some
circumstances, e.g. when using SoX to make a recording. Note that when
using SoX to play multiple files, Ctrl-C behaves slightly differently:
pressing it once causes SoX to skip to the next file; pressing it
twice in quick succession causes SoX to exit.

Another option to stop processing early is to use an effect that has
a time period or sample count to determine the stopping point. The
trim effect is an example of this. Once all effects chains have
stopped then SoX will also stop.

FILENAMES
Filenames can be simple file names, absolute or relative path names,
or URLs (input files only). Note that URL support requires that
wget(1) is available.

Note: Giving SoX an input or output filename that is the same as a SoX
effect-name will not work since SoX will treat it as an effect
specification. The only work-around to this is to avoid such
filenames. This is generally not difficult since most audio filenames
have a filename `extension', whilst effect-names do not.

Special Filenames
The following special filenames may be used in certain circumstances
in place of a normal filename on the command line:

-
SoX can be used in simple pipeline operations by using the special
filename `-' which, if used as an input filename, will cause SoX to
read audio data from `standard input' (stdin), and which, if used as
the output filename, will cause SoX to send audio data to `standard
output' (stdout). Note that when using this option for the output
file, and sometimes when using it for an input file, the file-type
(see -t below) must also be given.

"|program [options] ..."
This can be used in place of an input filename to specify that the
given program's standard output (stdout) be used as an input file.
Unlike - (above), this can be used for several inputs to one SoX
command. For example, if `genw' generates mono WAV formatted signals
to its standard output, then the following command makes a stereo file
from two generated signals:

    sox -M "|genw --imd -" "|genw --thd -" out.wav

For headerless (raw) audio, -t (and perhaps other format options) will
need to be given, preceding the input command.

"wildcard-filename"
Specifies that filename `globbing' (wild-card matching) should be
performed by SoX instead of by the shell. This allows a single set of
file options to be applied to a group of files. For example, if the
current directory contains three `vox' files, file1.vox, file2.vox,
and file3.vox, then

    play --rate 6k *.vox

will be expanded by the `shell' (in most environments) to

    play --rate 6k file1.vox file2.vox file3.vox

which will treat only the first vox file as having a sample rate of
6k. With

    play --rate 6k "*.vox"

the given sample rate option will be applied to all three vox files.

-p, --sox-pipe
This can be used in place of an output filename to specify that the
SoX command should be used as an input pipe to another SoX command.
For example, the command:

    play "|sox -n -p synth 2" "|sox -n -p synth 2 tremolo 10" stat

plays two `files' in succession, each with different effects.

-p is in fact an alias for `-t sox -'.

-d, --default-device
This can be used in place of an input or output filename to specify
that the default audio device (if one has been built into SoX) is to
be used. This is akin to invoking rec or play (as described above).

-n, --null
This can be used in place of an input or output filename to specify
that a `null file' is to be used. Note that here, `null file' refers
to a SoX-specific mechanism and is not related to any operating-system
mechanism with a similar name.

Using a null file to input audio is equivalent to using a normal audio
file that contains an infinite amount of silence, and as such is not
generally useful unless used with an effect that specifies a finite
time length (such as trim or synth).

Using a null file to output audio amounts to discarding the audio and
is useful mainly with effects that produce information about the audio
instead of affecting it (such as noiseprof or stat).

The sampling rate associated with a null file is by default 48 kHz,
but, as with a normal file, this can be overridden if desired using
command-line format options (see below).

Supported File & Audio Device Types
See soxformat(7) for a list and description of the supported file
formats and audio device drivers.

OPTIONS
Global Options
These options can be specified on the command line at any point
before the first effect name.

The SOX_OPTS environment variable can be used to provide alternative
default values for SoX's global options. For example:

    SOX_OPTS="--buffer 20000 --play-rate-arg -hs --temp /mnt/temp"

Note that setting SOX_OPTS can potentially create unwanted changes in
the behaviour of scripts or other programs that invoke SoX. SOX_OPTS
might best be used for things (such as in the given example) that
reflect the environment in which SoX is being run. Enabling options
such as --no-clobber as default might be handled better using a shell
alias since a shell alias will not affect operation in scripts etc.

One way to ensure that a script cannot be affected by SOX_OPTS is to
clear SOX_OPTS at the start of the script, but this of course loses
the benefit of SOX_OPTS carrying some system-wide default options. An
alternative approach is to explicitly invoke SoX with default option
values, e.g.

    SOX_OPTS="-V --no-clobber"
    ...
    sox -V2 --clobber $input $output ...

Note that the way to set environment variables varies from system to
system. Here are some examples:

Unix bash:

    export SOX_OPTS="-V --no-clobber"

Unix csh:

    setenv SOX_OPTS "-V --no-clobber"

MS-DOS/MS-Windows:

    set SOX_OPTS=-V --no-clobber

MS-Windows GUI: via Control Panel : System : Advanced : Environment
Variables

Mac OS X GUI: Refer to Apple's Technical Q&A QA1067 document.

--buffer BYTES, --input-buffer BYTES
Set the size in bytes of the buffers used for processing audio
(default 8192). --buffer applies to input, effects, and output
processing; --input-buffer applies only to input processing (for
which it overrides --buffer if both are given).

Be aware that large values for --buffer will cause SoX to become slow
to respond to requests to terminate or to skip the current input file.

--clobber
Don't prompt before overwriting an existing file with the same name as
that given for the output file. This is the default behaviour.

--combine concatenate|merge|mix|mix-power|multiply|sequence
Select the input file combining method; for some of these, short
options are available: -m selects `mix', -M selects `merge', and -T
selects `multiply'.

See Input File Combining above for a description of the different
combining methods.

-D, --no-dither
Disable automatic dither - see `Dithering' above. An example of why
this might occasionally be useful: if a file has been converted from
16 to 24 bit with the intention of doing some processing on it, but in
fact no processing is needed after all and the original 16 bit file
has been lost, then, strictly speaking, no dither is needed if
converting the file back to 16 bit. See also the stats effect for how
to determine the actual bit depth of the audio within a file.

--effects-file FILENAME
Use FILENAME to obtain all effects and their arguments. The file is
parsed as if the values were specified on the command line. A new line
can be used in place of the special : marker to separate effect
chains. For convenience, such markers at the end of the file are
normally ignored; if you want to specify an empty last effects chain,
use an explicit : by itself on the last line of the file. This option
causes any effects specified on the command line to be discarded.

-G, --guard
|
||
Automatically invoke the gain effect to guard against clipping.
|
||
E.g.
|
||
sox -G infile -b 16 outfile rate 44100 dither -s
|
||
is shorthand for
|
||
sox infile -b 16 outfile gain -h rate 44100 gain -rh dither -s
|
||
See also -V, --norm, and the gain effect.

-h, --help
Show version number and usage information.

--help-effect NAME
Show usage information on the specified effect. The name all
can be used to show usage on all effects.

--help-format NAME
Show information about the specified file format. The name all
can be used to show information on all formats.

-i, --info
Only if given as the first parameter to sox, behave as soxi(1).

-m|-M Equivalent to --combine mix and --combine merge, respectively.

--magic
If SoX has been built with the optional `libmagic' library then
this option can be given to enable its use in helping to detect
audio file types.

--multi-threaded | --single-threaded
By default, SoX is `single threaded'. If the --multi-threaded
option is given, however, then SoX will process audio channels for
most multi-channel effects in parallel on hyper-threading/multi-core
architectures. This may reduce processing time, though
sometimes it may be necessary to use this option in conjunction
with a larger buffer size than the default to gain any benefit
from multi-threaded processing (e.g. 131072; see --buffer
above).

--no-clobber
Prompt before overwriting an existing file with the same name as
that given for the output file.

N.B. Unintentionally overwriting a file is easier than you
might think. For example, if you accidentally enter
sox file1 file2 effect1 effect2 ...
when what you really meant was
play file1 file2 effect1 effect2 ...
then, without this option, file2 will be overwritten. Hence,
using this option is recommended. SOX_OPTS (above), a `shell'
alias, script, or batch file may be an appropriate way of
permanently enabling it.

--norm[=dB-level]
Automatically invoke the gain effect to guard against clipping
and to normalise the audio. E.g.
sox --norm infile -b 16 outfile rate 44100 dither -s
is shorthand for
sox infile -b 16 outfile gain -h rate 44100 gain -nh dither -s
Optionally, the audio can be normalised to a given level (usually)
below 0 dBFS:
sox --norm=-3 infile outfile

See also -V, -G, and the gain effect.
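
The arithmetic behind normalising to a given level can be sketched in Python: the gain needed is the target level minus the current peak level, both in dBFS. This is a minimal illustration that assumes floating-point samples with full scale = 1.0; it does not reproduce the gain effect's internals (e.g. the headroom handling selected by -h), and both helper names are hypothetical.

```python
import math


def norm_gain_db(samples: list[float], target_db: float = 0.0) -> float:
    """Gain in dB that moves the peak of `samples` (full scale = 1.0) to target_db dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        raise ValueError("cannot normalise digital silence")
    return target_db - 20 * math.log10(peak)


def apply_gain_db(samples: list[float], gain_db: float) -> list[float]:
    factor = 10 ** (gain_db / 20)  # dB -> linear amplitude factor
    return [s * factor for s in samples]


audio = [0.5, -0.25, 0.1]
g = norm_gain_db(audio, target_db=-3)  # comparable to --norm=-3
print(round(g, 2))  # 3.02
print(round(max(abs(s) for s in apply_gain_db(audio, g)), 4))  # 0.7079
```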

--play-rate-arg ARG
Selects a quality option to be used when the `rate' effect is
automatically invoked whilst playing audio. This option is
typically set via the SOX_OPTS environment variable (see above).

--plot gnuplot|octave|off
If not set to off (the default if --plot is not given), run in a
mode that can be used, in conjunction with the gnuplot program
or the GNU Octave program, to assist with the selection and
configuration of many of the transfer-function based effects. For
the first given effect that supports the selected plotting
program, SoX will output commands to plot the effect's transfer
function, and then exit without actually processing any audio.
E.g.
sox --plot octave input-file -n highpass 1320 > highpass.plt
octave highpass.plt

-q, --no-show-progress
Run in quiet mode when SoX wouldn't otherwise do so. This is
the opposite of the -S option.

-R Run in `repeatable' mode. When this option is given, where
applicable, SoX will embed a fixed time-stamp in the output file
(e.g. AIFF) and will `seed' pseudo random number generators
(e.g. dither) with a fixed number, thus ensuring that successive
SoX invocations with the same inputs and the same parameters
yield the same output.
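
Why a fixed seed makes dithered output repeatable can be shown with a small Python analogy. The sketch quantises to 16 bits with triangular-PDF (TPDF) dither drawn from Python's random.Random; the dither type, the generator, and the helper name are assumptions chosen for illustration, not SoX's actual dither code or seed value.

```python
import random
from typing import Optional


def tpdf_dither_to_int16(samples: list[float], seed: Optional[int] = None) -> list[int]:
    """Quantise floats (full scale = 1.0) to 16-bit integers with TPDF dither.

    A fixed seed makes the dither noise, and hence the output, repeatable.
    """
    rng = random.Random(seed)
    out = []
    for x in samples:
        # Difference of two uniform variates: triangular noise spanning +/-1 LSB.
        noise = rng.random() - rng.random()
        value = round(x * 32768 + noise)
        out.append(max(-32768, min(32767, value)))  # clip to the int16 range
    return out


audio = [0.0, 0.1, -0.5, 0.99999]
run1 = tpdf_dither_to_int16(audio, seed=1)
run2 = tpdf_dither_to_int16(audio, seed=1)
print(run1 == run2)  # True: same seed, same output
```

Without a seed, each run draws different noise, so the least significant bits of the output differ between otherwise identical invocations.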

--replay-gain track|album|off
Select whether or not to apply replay-gain adjustment to input
files. The default is off for sox and rec; for play, it is album
where (at least) the first two input files are tagged with the
same Artist and Album names, and track otherwise.
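
The default-selection rule can be written out as a small decision function. default_replay_gain is hypothetical, and the per-file tag dictionaries with "artist" and "album" keys are an assumed representation; how SoX reads the tags, and whether its comparison is case-sensitive, is not described here.

```python
def default_replay_gain(program: str, tags: list[dict]) -> str:
    """Default --replay-gain mode as described above (hypothetical helper).

    `tags` holds one dict per input file, e.g. {"artist": ..., "album": ...}.
    """
    if program in ("sox", "rec"):
        return "off"
    if program != "play":
        raise ValueError(f"unknown program: {program}")
    if len(tags) >= 2:
        first, second = tags[0], tags[1]
        same = all(
            first.get(key) is not None and first.get(key) == second.get(key)
            for key in ("artist", "album")
        )
        if same:
            return "album"
    return "track"


print(default_replay_gain("play", [{"artist": "A", "album": "X"}] * 2))  # album
```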

-S, --show-progress
Display input file format/header information, and processing
progress as input file(s) percentage complete, elapsed time, and
remaining time (if known; shown in brackets), and the number of
samples written to the output file. Also shown is a peak-level
meter, and an indication if clipping has occurred. The peak-level
meter shows up to two channels and is calibrated for digital
audio as follows (right channel shown):

dB FSD   Display       dB FSD   Display
 -25     -              -11     ====
 -23     =               -9     ====-
 -21     =-              -7     =====
 -19     ==              -5     =====-
 -17     ==-             -3     ======
 -15     ===             -1     =====!
 -13     ===-

A three-second peak-held value of headroom in dBs will be shown
to the right of the meter if this is below 6dB.

This option is enabled by default when using SoX to play or
record audio.

-T Equivalent to --combine multiply.

--temp DIRECTORY
Specify that any temporary files should be created in the given
DIRECTORY. This can be useful if there are permission or
free-space problems with the default location. In this case,
using `--temp .' (to use the current directory) is often a good
solution.

--version
Show SoX's version number and exit.

-V[level]
Set verbosity. This is particularly useful for seeing how any
automatic effects have been invoked by SoX.

SoX displays messages on the console (stderr) according to the
following verbosity levels:

0 No messages are shown at all; use the exit status to
determine if an error has occurred.

1 Only error messages are shown. These are generated if
SoX cannot complete the requested commands.

2 Warning messages are also shown. These are generated if
SoX can complete the requested commands, but not exactly
according to the requested command parameters, or if
clipping occurs.

3 Descriptions of SoX's processing phases are also shown.
Useful for seeing exactly how SoX is processing your
audio.

4 and above
Messages to help with debugging SoX are also shown.

By default, the verbosity level is set to 2 (shows errors and
warnings). Each occurrence of the -V option increases the
verbosity level by 1. Alternatively, the verbosity level can be
set to an absolute number by specifying it immediately after the
-V, e.g. -V0 sets it to 0.
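
The counting rule for -V can be modelled in a few lines of Python. verbosity is a hypothetical helper; processing the options strictly left to right, so that a later -V<n> overrides earlier increments, is an assumption about ordering that the text does not spell out.

```python
def verbosity(args: list[str], default: int = 2) -> int:
    """Model of the -V rules: a bare -V adds 1, -V<n> sets the level to n."""
    level = default
    for arg in args:
        if arg == "-V":
            level += 1
        elif arg.startswith("-V") and arg[2:].isdigit():
            level = int(arg[2:])
    return level


print(verbosity([]))             # 2
print(verbosity(["-V", "-V"]))   # 4
print(verbosity(["-V", "-V0"]))  # 0
```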

Input File Options
These options apply only to input files and may precede only input
filenames on the command line.

--ignore-length
Override an (incorrect) audio length given in an audio file's
header. If this option is given then SoX will keep reading audio
until it reaches the end of the input file.

-v, --volume FACTOR
Intended for use when combining multiple input files, this
option adjusts the volume of the file that follows it on the
command line by a factor of FACTOR. This allows it to be
`balanced' w.r.t. the other input files. This is a linear
(amplitude) adjustment, so a number less than 1 decreases the
volume and a number greater than 1 increases it. If a negative
number is given then in addition to the volume adjustment, the
audio signal will be inverted.

See also the norm, vol, and gain effects, and see Input File
Balancing above.
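
Because FACTOR is a linear amplitude multiplier, its effect in decibels is 20·log10(|FACTOR|), and a negative sign only flips the waveform's polarity. The Python helpers below are a hypothetical illustration of that arithmetic, not SoX code.

```python
import math


def factor_to_db(factor: float) -> float:
    """Level change in dB of a linear (amplitude) -v FACTOR; the sign only inverts."""
    return 20 * math.log10(abs(factor))


def db_to_factor(db: float) -> float:
    return 10 ** (db / 20)


def apply_volume(samples: list[float], factor: float) -> list[float]:
    return [s * factor for s in samples]


print(round(factor_to_db(0.5), 2))     # -6.02
print(round(factor_to_db(-2.0), 2))    # 6.02
print(apply_volume([0.25, -0.5], -1))  # [-0.25, 0.5]
```

So halving the amplitude (-v 0.5) lowers the level by about 6 dB, and -v -1 leaves the level unchanged while inverting the signal.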

Input & Output File Format Options
These options apply to the input or output file whose name they
immediately precede on the command line and are used mainly when
working with headerless file formats or when specifying a format
for the output file that is different to that of the input file.

-b BITS, --bits BITS
The number of bits (a.k.a. bit-depth or sometimes word-length)
in each encoded sample. Not applicable to complex encodings
such as MP3 or GSM. Not necessary with encodings that have a
fixed number of bits, e.g. A/μ-law, ADPCM.

For an input file, the most common use for this option is to
inform SoX of the number of bits per sample in a `raw'
(`headerless') audio file. For example
sox -r 16k -e signed -b 8 input.raw output.wav
converts a particular `raw' file to a self-describing `WAV'
file.

For an output file, this option can be used (perhaps along with
-e) to set the output encoding size. By default (i.e. if this
option is not given), the output encoding size will (providing
it is supported by the output file type) be set to the input
encoding size. For example
sox input.cdda -b 24 output.wav
converts raw CD digital audio (16-bit, signed-integer) to a
24-bit (signed-integer) `WAV' file.

-c CHANNELS, --channels CHANNELS
The number of audio channels in the audio file. This can be any
number greater than zero.

For an input file, the most common use for this option is to
inform SoX of the number of channels in a `raw' (`headerless')
audio file. Occasionally, it may be useful to use this option
with a `headered' file, in order to override the (presumably
incorrect) value in the header; note that this is only
supported with certain file types. Examples:
sox -r 48k -e float -b 32 -c 2 input.raw output.wav
converts a particular `raw' file to a self-describing `WAV'
file.
play -c 1 music.wav
interprets the file data as belonging to a single channel
regardless of what is indicated in the file header. Note that
if the file does in fact have two channels, this will result in
the file playing at half speed.

For an output file, this option provides a shorthand for
specifying that the channels effect should be invoked in order to
change (if necessary) the number of channels in the audio signal
to the number given. For example, the following two commands
are equivalent:
sox input.wav -c 1 output.wav bass -b 24
sox input.wav output.wav bass -b 24 channels 1
though the second form is more flexible as it allows the effects
to be ordered arbitrarily.

-e ENCODING, --encoding ENCODING
The audio encoding type. Sometimes needed with file-types that
support more than one encoding type, for example raw, WAV,
or AU (but not, for example, MP3 or FLAC). The available
encoding types are as follows:

signed-integer
PCM data stored as signed (`two's complement') integers.
Commonly used with a 16- or 24-bit encoding size. A
value of 0 represents minimum signal power.

unsigned-integer
PCM data stored as unsigned integers. Commonly used with
an 8-bit encoding size. A value of 0 represents maximum
signal power.

floating-point
PCM data stored as IEEE 754 single precision (32-bit) or
double precision (64-bit) floating-point (`real')
numbers. A value of 0 represents minimum signal power.

a-law International telephony standard for logarithmic encoding
to 8 bits per sample. It has a precision equivalent to
roughly 13-bit PCM and is sometimes encoded with reversed
bit-ordering (see the -X option).

u-law, mu-law
North American telephony standard for logarithmic
encoding to 8 bits per sample. A.k.a. μ-law. It has a
precision equivalent to roughly 14-bit PCM and is sometimes
encoded with reversed bit-ordering (see the -X option).

oki-adpcm
OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM; it has
a precision equivalent to roughly 12-bit PCM. ADPCM is a
form of audio compression that offers a good compromise
between audio quality and encoding/decoding speed.

ima-adpcm
IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision
equivalent to roughly 13-bit PCM.

ms-adpcm
Microsoft 4-bit ADPCM; it has a precision equivalent to
roughly 14-bit PCM.

gsm-full-rate
GSM is currently used for the vast majority of the
world's digital wireless telephone calls. It utilises
several audio formats with different bit-rates and
associated speech quality. SoX has support for GSM's
original 13kbps `Full Rate' audio format. It is usually
CPU-intensive to work with GSM audio.
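
The `logarithmic encoding' used by the telephony standards above can be illustrated with the continuous μ-law curve, y = sgn(x)·ln(1 + μ|x|)/ln(1 + μ) with μ = 255. This Python sketch shows only that idealised curve: the real 8-bit μ-law format uses a segmented approximation of it and its own code layout, which are not reproduced here, and the helper names are hypothetical.

```python
import math

MU = 255  # the mu value of the North American mu-law curve


def mu_law_compress(x: float) -> float:
    """Continuous mu-law curve: maps x in [-1, 1] to y in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


# A quiet signal at 1% of full scale already uses roughly 23% of the output range,
# which is how 8 logarithmic bits achieve a precision comparable to ~14-bit PCM.
print(round(mu_law_compress(0.01), 3))
print(round(mu_law_expand(mu_law_compress(-0.3)), 6))  # -0.3
```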

Encoding names can be abbreviated where this would not be
ambiguous; e.g. `unsigned-integer' can be given as `un', but not
`u' (ambiguous with `u-law').
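
The abbreviation rule amounts to unique-prefix matching against the encoding names listed above. resolve_encoding is a hypothetical Python helper that mirrors the rule; it is not SoX's lookup code, and exact-name handling is an assumption.

```python
ENCODINGS = [
    "signed-integer", "unsigned-integer", "floating-point", "a-law",
    "u-law", "mu-law", "oki-adpcm", "ima-adpcm", "ms-adpcm", "gsm-full-rate",
]


def resolve_encoding(name: str) -> str:
    """Hypothetical unique-prefix lookup mirroring the abbreviation rule."""
    if name in ENCODINGS:
        return name
    matches = [enc for enc in ENCODINGS if enc.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown encoding: {name!r}")
    raise ValueError(f"ambiguous encoding {name!r}: {', '.join(matches)}")


print(resolve_encoding("un"))      # unsigned-integer
print(resolve_encoding("signed"))  # signed-integer
print(resolve_encoding("float"))   # floating-point
```

The short forms `signed' and `float' used in the -b and -c examples resolve the same way, while `u' matches both `unsigned-integer' and `u-law' and is rejected.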

For an input file, the most common use for this option is to
inform SoX of the encoding of a `raw' (`headerless') audio file
(see the examples in -b and -c above).

For an output file, this option can be used (perhaps along with
-b) to set the output encoding type. For example
sox input.cdda -e float output1.wav

sox input.cdda -b 64 -e float output2.wav
convert raw CD digital audio (16-bit, signed-integer) to
floating-point `WAV' files (single & double precision respectively).

By default (i.e. if this option is not given), the output
encoding type will (providing it is supported by the output file
type) be set to the input encoding type.

--no-glob
Specifies that filename `globbing' (wild-card matching) should
not be performed by SoX on the following filename. For example,
if the current directory contains the two files
`five-seconds.wav' and `five*.wav', then
play --no-glob "five*.wav"
can be used to play just the single file `five*.wav'.
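
Python's fnmatch module can stand in for filename globbing to show why the example needs --no-glob. This is an analogy: fnmatch and glob.escape are Python's matching rules, not SoX's, but the outcome for this pair of names is the same.

```python
import fnmatch
import glob

files = ["five-seconds.wav", "five*.wav"]

# With globbing, '*' matches any run of characters, so both files match.
print(fnmatch.filter(files, "five*.wav"))
# ['five-seconds.wav', 'five*.wav']

# Treating '*' literally (what --no-glob requests from SoX) matches one file.
print(fnmatch.filter(files, glob.escape("five*.wav")))
# ['five*.wav']
```

The double quotes in the play command only stop the shell from expanding the pattern; without --no-glob, SoX would still perform its own wild-card matching.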

-r, --rate RATE[k]
Gives the sample rate in Hz (or kHz if appended with `k') of the
file.

For an input file, the most common use for this option is to
inform SoX of the sample rate of a `raw' (`headerless') audio
file (see the examples in -b and -c above). Occasionally it may
be useful to use this option with a `headered' file, in order to
override the (presumably incorrect) value in the header; note
that this is only supported with certain file types. For
example, if audio was recorded with a sample-rate of, say, 48k
from a source that played back a little (say 1.5%) too slowly,
then
sox -r 48720 input.wav output.wav
effectively corrects the speed by changing only the file header
(but see also the speed effect for the more usual solution to
this problem).
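
The 48720 figure in that example follows from simple arithmetic. If the source ran 1.5% slow, each second of the original material was captured as 1.015 × 48000 samples, so labelling the file with that higher rate restores the original speed. corrected_header_rate is a hypothetical helper for this calculation.

```python
def corrected_header_rate(recorded_rate: float, slow_fraction: float) -> int:
    """Header rate compensating for a source that played back too slowly.

    A source running slow by `slow_fraction` (0.015 for 1.5%) stretches each
    second of material over 1 + slow_fraction seconds of recording.
    """
    return round(recorded_rate * (1 + slow_fraction))


rate = corrected_header_rate(48000, 0.015)
print(rate)          # 48720
print(rate / 48000)  # 1.015 (the speed-up heard on playback)
```

Because only the header changes, pitch rises together with speed, which is the same trade-off the speed effect makes.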

For an output file, this option provides a shorthand for
specifying that the rate effect should be invoked in order to change
(if necessary) the sample rate of the audio signal to the given
value. For example, the following two commands are equivalent:
sox input.wav -r 48k output.wav bass -b 24
sox input.wav output.wav bass -b 24 rate 48k
though the second form is more flexible as it allows rate
options to be given, and allows the effects to be ordered
arbitrarily.

-t, --type FILE-TYPE
Gives the type of the audio file. For both input and output
files, this option is commonly used to inform SoX of the type of a
`headerless' audio file (e.g. raw, mp3) where the actual/desired
type cannot be determined from a given filename extension. For
example:
another-command | sox -t mp3 - output.wav

sox input.wav -t raw output.bin
It can also be used to override the type implied by an input
filename extension, but if overriding with a type that has a
header, SoX will exit with an appropriate error message if such
a header is not actually present.

See soxformat(7) for a list of supported file types.

-L, --endian little
-B, --endian big
-x, --endian swap
These options specify whether the byte-order of the audio data
is, respectively, `little endian', `big endian', or the opposite
to that of the system on which SoX is being used. Endianness
applies only to data encoded as floating-point, or as signed or
unsigned integers of 16 or more bits. It is often necessary to
specify one of these options for headerless files, and sometimes
necessary for (otherwise) self-describing files. A given
endian-setting option may be ignored for an input file whose
header contains a specific endianness identifier, or for an
output file that is actually an audio device.

N.B. Unlike other format characteristics, the endianness (byte,
nibble, & bit ordering) of the input file is not automatically
used for the output file; so, for example, when the following is
run on a little-endian system:
sox -B audio.s16 trimmed.s16 trim 2
trimmed.s16 will be created as little-endian;
sox -B audio.s16 -B trimmed.s16 trim 2
must be used to preserve big-endianness in the output file.

The -V option can be used to check the selected orderings.
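
Python's struct module shows what the three byte-order choices mean for a 16-bit signed sample, and why data read with the wrong setting comes out garbled. pack_s16 is a hypothetical helper whose `swap' option mimics the -x rule of `opposite to this system'.

```python
import struct
import sys


def pack_s16(values: list[int], endian: str) -> bytes:
    """Pack signed 16-bit samples as 'little', 'big' or 'swap' (opposite of this host)."""
    if endian == "swap":
        endian = "big" if sys.byteorder == "little" else "little"
    prefix = {"little": "<", "big": ">"}[endian]
    return struct.pack(f"{prefix}{len(values)}h", *values)


sample = 0x1234
print(pack_s16([sample], "little").hex())  # 3412
print(pack_s16([sample], "big").hex())     # 1234

# Reading big-endian data as little-endian silently yields a different value:
wrong = struct.unpack("<h", pack_s16([sample], "big"))[0]
print(hex(wrong))  # 0x3412
```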

-N, --reverse-nibbles
Specifies that the nibble ordering (i.e. the 2 halves of a byte)
of the samples should be reversed; sometimes useful with
ADPCM-based formats.

N.B. See also the N.B. in the section on -x above.

-X, --reverse-bits
Specifies that the bit ordering of the samples should be
reversed; sometimes useful with a few (mostly headerless)
formats.

N.B. See also the N.B. in the section on -x above.
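
The two reorderings can be illustrated on single bytes. The Python helpers below are hypothetical, and applying them byte by byte is an assumption made for illustration; the text above does not say exactly which units SoX reorders for multi-byte samples.

```python
def reverse_nibbles(byte: int) -> int:
    """Swap the two 4-bit halves of a byte, as -N describes."""
    return ((byte & 0x0F) << 4) | (byte >> 4)


def reverse_bits(byte: int) -> int:
    """Reverse the order of the 8 bits in a byte, as -X describes."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)  # move the lowest bit in at the bottom
        byte >>= 1
    return result


data = bytes([0x1A, 0x01, 0xF0])
print(bytes(reverse_nibbles(b) for b in data).hex())  # a1100f
print(bytes(reverse_bits(b) for b in data).hex())     # 58800f
```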

Output File Format Options
These options apply only to the output file and may precede only the
output filename on the command line.

--add-comment TEXT
Append a comment in the output file header (where applicable).

--comment TEXT
Specify the comment text to store in the output file header
(where applicable).

SoX will provide a default comment if this option (or
--comment-file) is not given. To specify that no comment should be
stored in the output file, use --comment "" .

--comment-file FILENAME
Specify a file containing the comment text to store in the
output file header (where applicable).

-C, --compression FACTOR
The compression factor for variably compressing output file
formats. If this option is not given then a default compression
factor will apply. The compression factor is interpreted
differently for different compressing file formats. See the
description of the file formats that use this option in
soxformat(7) for more information.

EFFECTS
In addition to converting, playing and recording audio files, SoX can
be used to invoke a number of audio `effects'. Multiple effects may be
applied by specifying them one after another at the end of the SoX
command line, forming an `effects chain'. Note that applying multiple
effects in real-time (i.e. when playing audio) is likely to require a
high performance computer. Stopping other applications may alleviate
performance issues should they occur.

Some of the SoX effects are primarily intended to be applied to a
single instrument or `voice'. To facilitate this, the remix effect and
the global SoX option -M can be used to isolate then recombine tracks
from a multi-track recording.

Multiple Effects Chains
A single effects chain is made up of one or more effects. Audio from
the input runs through the chain until either the end of the input file
is reached or an effect in the chain requests to terminate the chain.

SoX supports running multiple effects chains over the input audio. In
this case, when one chain indicates it is done processing audio, the
audio data is then sent through the next effects chain. This continues
until either no more effects chains exist or the input has reached the
end of the file.

An effects chain is terminated by placing a : (colon) after an effect.
Any following effects are a part of a new effects chain.

It is important to place the effect that will stop the chain as the
first effect in the chain. This is because any samples that are
buffered by effects to the left of the terminating effect will be
discarded. The number of samples discarded is related to the --buffer
option, and the buffer should be kept small, relative to the sample
rate, if the terminating effect cannot be first. Further information
on stopping effects can be found in the Stopping SoX section.

There are a few pseudo-effects that aid using multiple effects chains.
These include newfile, which will start writing to a new output file
before moving to the next effects chain, and restart, which will move
back to the first effects chain. Pseudo-effects must be specified as
the first effect in a chain and as the only effect in a chain (they
must have a : before and after they are specified).

The following is an example of multiple effects chains. It will split
the input file into multiple files of 30 seconds in length. Each
output filename will have a unique number in its name as documented in
the Output Files section.
sox infile.wav output.wav trim 0 30 : newfile : restart
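
The sample ranges this split produces can be computed ahead of time. chain_segments is a hypothetical Python helper; it assumes that each pass through the first chain consumes the next 30 seconds of input and that a final, shorter remainder still becomes a file of its own.

```python
def chain_segments(total_samples: int, rate: int, seconds: float = 30) -> list[tuple[int, int]]:
    """(start, end) sample ranges for `trim 0 30 : newfile : restart`.

    Assumes each pass consumes the next `seconds` of input and that the
    final, shorter remainder still becomes its own output file.
    """
    step = int(seconds * rate)
    return [(start, min(start + step, total_samples))
            for start in range(0, total_samples, step)]


# 75 seconds of audio at 8 kHz: three files of 30 s, 30 s and 15 s.
for start, end in chain_segments(75 * 8000, 8000):
    print(start / 8000, end / 8000)
```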

Common Notation And Parameters
In the descriptions that follow, brackets [ ] are used to denote
parameters that are optional, braces { } to denote those that are both
optional and repeatable, and angle brackets < > to denote those that
are repeatable but not optional. Where applicable, default values for
optional parameters are shown in parentheses ( ).

The following parameters are used with, and have the same meaning for,
several effects:

center[k]
See frequency.

frequency[k]
A frequency in Hz, or, if appended with `k', kHz.

gain A power gain in dB. Zero gives no gain; less than zero gives an
attenuation.

position
A position within the audio stream; the syntax is
[=|+|-]timespec, where timespec is a time specification (see
below). The optional first character indicates whether the
timespec is to be interpreted relative to the start (=) or end (-)
of audio, or to the previous position if the effect accepts
multiple position arguments (+). The audio length must be known
for end-relative locations to work; some effects do accept -0 for
end-of-audio, though, even if the length is unknown. Which of =,
+, - is the default depends on the effect and is shown in its
syntax as, e.g., position(+).

Examples: =2:00 (two minutes into the audio stream), -100s (one
hundred samples before the end of audio), +0:12+10s (twelve
seconds and ten samples after the previous position), -0.5+1s (one
sample less than half a second before the end of audio).

width[h|k|o|q]
Used to specify the band-width of a filter. A number of different
methods to specify the width are available (though not all
for every effect). One of the characters shown may be appended
to select the desired method as follows:

Method   Notes
h        Hz
k        kHz
o        Octaves
q        Q-factor   See [2]

For each effect that uses this parameter, the default method
(i.e. if no character is appended) is the one that is listed
first in the first line of the effect's description.
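
The width methods are interchangeable ways of stating the same band-width. The Python sketch converts each to a Q-factor using the textbook analogue relations Q = f0/BW (for a width in Hz or kHz) and Q = sqrt(2^N)/(2^N − 1) (for a width of N octaves). width_to_q is hypothetical, and SoX may use a frequency-warped variant of these relations internally, so treat the results as approximate.

```python
import math


def width_to_q(width: float, method: str, center_hz: float) -> float:
    """Convert a filter width given as h, k, o or q into a Q-factor (analogue relations)."""
    if method == "q":
        return width
    if method == "h":
        return center_hz / width
    if method == "k":
        return center_hz / (width * 1000)
    if method == "o":
        return math.sqrt(2 ** width) / (2 ** width - 1)
    raise ValueError(f"unknown width method: {method!r}")


print(round(width_to_q(1, "o", 1000), 4))  # 1.4142 (one octave)
print(width_to_q(100, "h", 1000))          # 10.0
```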

Most effects that expect an audio position or duration in a parameter,
i.e. a time specification, accept either of the following two forms:

[[hours:]minutes:]seconds[.frac][t]
A specification of `1:30.5' corresponds to one minute, thirty
and ½ seconds. The t suffix is entirely optional (however, see
the silence effect for an exception). Note that the component
values do not have to be normalized; e.g., `1:23:45', `83:45',
`79:0285', `1:0:1425', `1::1425' and `5025' all are legal and
equivalent to each other.

samples s
Specifies the number of samples directly, followed by the letter
s, as in `8000s'. For large sample counts, e notation is
supported: `1.7e6s' is the same as `1700000s'.

Time specifications can also be chained with + or - into a new time
specification where the right part is added to or subtracted from the
left, respectively: `3:00-200s' means two hundred samples less than
three minutes.
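
The time-specification and position rules can be combined into a small Python parser that converts everything to a sample count. parse_timespec and resolve_position are hypothetical helpers. Rounding to the nearest sample is an assumption, and exponents that carry a sign (e.g. `1e-3s') are not handled, so SoX's own parser may differ in such edge cases.

```python
import re
from typing import Optional


def _term_to_samples(term: str, rate: float) -> int:
    """One un-chained term: either `Ns' (samples) or [[h:]m:]s[.frac][t]."""
    if term.endswith("s"):
        return int(float(term[:-1]))        # e.g. '8000s', '1.7e6s'
    if term.endswith("t"):
        term = term[:-1]                    # the optional 't' suffix
    fields = term.split(":")
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"bad time specification: {term!r}")
    seconds = 0.0
    for field in fields:                    # empty fields ('1::1425') count as 0
        seconds = seconds * 60 + (float(field) if field else 0.0)
    return int(seconds * rate + 0.5)        # round to the nearest sample


def parse_timespec(spec: str, rate: float) -> int:
    """Number of samples for a (possibly +/- chained) time specification."""
    parts = re.split(r"([+-])", spec)       # '3:00-200s' -> ['3:00', '-', '200s']
    total = _term_to_samples(parts[0], rate)
    for op, term in zip(parts[1::2], parts[2::2]):
        n = _term_to_samples(term, rate)
        total = total + n if op == "+" else total - n
    return total


def resolve_position(pos: str, rate: float, default: str = "=",
                     previous: int = 0, end: Optional[int] = None) -> int:
    """Absolute sample index for a [=|+|-]timespec position."""
    has_anchor = pos[:1] in ("=", "+", "-")
    anchor = pos[0] if has_anchor else default
    offset = parse_timespec(pos[1:] if has_anchor else pos, rate)
    if anchor == "=":
        return offset
    if anchor == "+":
        return previous + offset
    if end is None:
        raise ValueError("end-relative positions need a known audio length")
    return end - offset


same = ["1:23:45", "83:45", "79:0285", "1:0:1425", "1::1425", "5025"]
print({parse_timespec(s, 1) for s in same})                  # {5025}
print(parse_timespec("3:00-200s", 8000))                     # 1439800
print(resolve_position("+0:12+10s", 8000, previous=1000))    # 97010
```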

To see if SoX has support for an optional effect, enter sox -h and look
for its name under the list: `EFFECTS'.

Supported Effects
Note: a categorised list of the effects can be found in the
accompanying `README' file.

allpass frequency[k] width[h|k|o|q]
Apply a two-pole all-pass filter with central frequency (in Hz)
frequency, and filter-width width. An all-pass filter changes
the audio's frequency to phase relationship without changing its
frequency to amplitude relationship. The filter is described in
detail in [1].

This effect supports the --plot global option.

band [-n] center[k] [width[h|k|o|q]]
Apply a band-pass filter. The frequency response drops
logarithmically around the center frequency. The width parameter
gives the slope of the drop. The frequencies at center + width
and center - width will be half of their original amplitudes.
band defaults to a mode oriented to pitched audio, i.e. voice,
singing, or instrumental music. The -n (for noise) option uses
the alternate mode for un-pitched audio (e.g. percussion).
Warning: -n introduces a power-gain of about 11dB in the filter,
so beware of output clipping. band introduces noise in the
shape of the filter, i.e. peaking at the center frequency and
settling around it.

This effect supports the --plot global option.

See also sinc for a bandpass filter with steeper shoulders.

bandpass|bandreject [-c] frequency[k] width[h|k|o|q]
Apply a two-pole Butterworth band-pass or band-reject filter
with central frequency frequency, and (3dB-point) band-width
width. The -c option applies only to bandpass and selects a
constant skirt gain (peak gain = Q) instead of the default:
constant 0dB peak gain. The filters roll off at 6dB per octave
(20dB per decade) and are described in detail in [1].

These effects support the --plot global option.

See also sinc for a bandpass filter with steeper shoulders.
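
The sketch below shows how such two-pole band-pass and band-reject coefficients are commonly derived. It assumes that reference [1], whose entry lies outside this section, is Robert Bristow-Johnson's Audio EQ Cookbook, and it uses that cookbook's formulas; SoX's exact derivation may differ. The magnitude check confirms the properties stated above: 0 dB peak gain by default, a peak gain of Q with -c, and a null at the centre for band-reject.

```python
import cmath
import math


def bandpass_coeffs(f0, q, fs, constant_skirt=False):
    """Cookbook band-pass biquad; returns (b, a) with a[0] not yet normalised."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    gain = q if constant_skirt else 1.0  # -c: constant skirt gain, peak gain = Q
    b = (gain * alpha, 0.0, -gain * alpha)
    a = (1 + alpha, -2 * math.cos(w0), 1 - alpha)
    return b, a


def bandreject_coeffs(f0, q, fs):
    """Cookbook notch (band-reject) biquad."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1.0, -2 * math.cos(w0), 1.0)
    a = (1 + alpha, -2 * math.cos(w0), 1 - alpha)
    return b, a


def magnitude(b, a, f, fs):
    """|H| of a biquad at frequency f (Hz)."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


fs, f0, q = 48000, 1000, 2.0
print(round(magnitude(*bandpass_coeffs(f0, q, fs), f0, fs), 6))        # 1.0
print(round(magnitude(*bandpass_coeffs(f0, q, fs, True), f0, fs), 6))  # 2.0
```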

bandreject frequency[k] width[h|k|o|q]
Apply a band-reject filter. See the description of the bandpass
effect for details.

bass|treble gain [frequency[k] [width[s|h|k|o|q]]]
Boost or cut the bass (lower) or treble (upper) frequencies of
the audio using a two-pole shelving filter with a response
similar to that of a standard hi-fi's tone-controls. This is also
known as shelving equalisation (EQ).

gain gives the gain at 0 Hz (for bass), or whichever is the
lower of ∼22 kHz and the Nyquist frequency (for treble). Its
useful range is about -20 (for a large cut) to +20 (for a large
boost). Beware of Clipping when using a positive gain.

If desired, the filter can be fine-tuned using the following
optional parameters:

frequency sets the filter's central frequency and so can be used
to extend or reduce the frequency range to be boosted or cut.
The default value is 100 Hz (for bass) or 3 kHz (for treble).

width determines how steep the filter's shelf transition is. In
addition to the common width specification methods described
above, `slope' (the default, or if appended with `s') may be
used. The useful range of `slope' is about 0.3, for a gentle
slope, to 1 (the maximum), for a steep slope; the default value
is 0.5.

The filters are described in detail in [1].

These effects support the --plot global option.

See also equalizer for a peaking equalisation effect.

bend [-f frame-rate(25)] [-o over-sample(16)]
{ start-position(+),cents,end-position(+) }
Changes pitch by specified amounts at specified times. Each
given triple: start-position,cents,end-position specifies one
bend. cents is the number of cents (100 cents = 1 semitone) by
which to bend the pitch. The other values specify the points in
time at which to start and end bending the pitch, respectively.

The pitch-bending algorithm utilises the Discrete Fourier
Transform (DFT) at a particular frame rate and over-sampling rate.
The -f and -o parameters may be used to adjust these parameters
and thus control the smoothness of the changes in pitch.

For example, an initial tone is generated, then bent three
times, yielding four different notes in total:
play -n synth 2.5 sin 667 gain 1 \
bend .35,180,.25 .15,740,.53 0,-520,.3
Here, the first bend runs from 0.35 to 0.6, and the second one
from 0.75 to 1.28 seconds. Note that the clipping that is
produced in this example is deliberate; to remove it, use gain -5
in place of gain 1.

See also pitch.
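
The timings and pitches in the bend example can be checked with a little arithmetic. Each position is taken relative to the previous one (the (+) defaults), and a shift of c cents multiplies frequency by 2^(c/1200). bend_schedule is a hypothetical helper; treating the bends as cumulative is an assumption consistent with `four different notes', and the DFT-based pitch shifting itself is not modelled.

```python
def bend_schedule(triples, base_hz):
    """Absolute start/end times (s) and the resulting frequency after each bend."""
    previous = 0.0
    total_cents = 0.0
    schedule = []
    for start, cents, end in triples:
        t0 = previous + start  # (+): relative to the previous position
        t1 = t0 + end          # (+): relative to this bend's start
        total_cents += cents
        freq = base_hz * 2 ** (total_cents / 1200)  # 1200 cents per octave
        schedule.append((round(t0, 2), round(t1, 2), round(freq, 1)))
        previous = t1
    return schedule


for row in bend_schedule([(0.35, 180, 0.25), (0.15, 740, 0.53), (0, -520, 0.3)], 667):
    print(row)
```

The first two rows reproduce the 0.35 to 0.6 and 0.75 to 1.28 second ranges quoted in the example.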

biquad b0 b1 b2 a0 a1 a2
Apply a biquad IIR filter with the given coefficients, where b*
and a* are the numerator and denominator coefficients
respectively.

See http://en.wikipedia.org/wiki/Digital_biquad_filter (where
a0 = 1).

This effect supports the --plot global option.
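
What the six coefficients mean can be seen in a direct-form-I biquad written in plain Python. The sketch first divides by a0 so the recurrence matches the a0 = 1 convention of the page cited above; the direct-form-I structure is chosen for clarity and is not necessarily how SoX implements the effect.

```python
def biquad(x, b0, b1, b2, a0, a1, a2):
    """Direct form I: a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0  # normalise a0 to 1
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad(impulse, 1, 0, 0, 1, -0.5, 0))  # [1.0, 0.5, 0.25, 0.125]
```

With b = (1, 0, 0) and a = (1, -0.5, 0), each output feeds half of itself into the next sample, giving the decaying impulse response shown.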
|
||
|
||
channels CHANNELS
|
||
Invoke a simple algorithm to change the number of channels in
|
||
the audio signal to the given number CHANNELS: mixing if
|
||
decreasing the number of channels or duplicating if increasing
|
||
the number of channels.
|
||
|
||
The channels effect is invoked automatically if SoX's -c option
|
||
specifies a number of channels that is different to that of the
|
||
input file(s). Alternatively, if this effect is given explic‐
|
||
itly, then SoX's -c option need not be given. For example, the
|
||
following two commands are equivalent:
|
||
sox input.wav -c 1 output.wav bass -b 24
|
||
sox input.wav output.wav bass -b 24 channels 1
|
||
though the second form is more flexible as it allows the effects
|
||
to be ordered arbitrarily.
|
||
|
||
See also remix for an effect that allows channels to be
|
||
mixed/selected arbitrarily.

chorus gain-in gain-out <delay decay speed depth -s|-t>
Add a chorus effect to the audio. This can make a single vocal
sound like a chorus, but can also be applied to instrumentation.

Chorus resembles an echo effect with a short delay, but whereas
with echo the delay is constant, with chorus, it is varied using
sinusoidal or triangular modulation. The modulation depth
defines how far before or after the base delay the modulated
delay can move. Hence the delayed sound will sound slower or
faster, that is, the delayed sound is tuned around the original
one, as in a chorus where some voices are slightly off key. See
[3] for more discussion of the chorus effect.

Each four-tuple parameter delay/decay/speed/depth gives the
delay in milliseconds and the decay (relative to gain-in) with a
modulation speed in Hz using depth in milliseconds. The modula‐
tion is either sinusoidal (-s) or triangular (-t). Gain-out is
the volume of the output.

A typical delay is around 40ms to 60ms; the modulation speed is
best near 0.25Hz and the modulation depth around 2ms. For exam‐
ple, a single delay:
play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 -t
Two delays of the original samples:
play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 -t \
60 0.32 0.4 1.3 -s
A fuller-sounding chorus (with three additional delays):
play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 -t \
60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s
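As a rough illustration of the modulated delay described above, the following Python sketch computes the instantaneous delay of one chorus voice. It is not SoX's implementation. The function name chorus_delay_ms is hypothetical. The sketch assumes that the delay swings symmetrically by ±depth around the base delay and is driven by a sine (-s) or triangle (-t) LFO running at speed Hz.

```python
import math

def chorus_delay_ms(t, delay_ms, depth_ms, speed_hz, shape="s"):
    """Instantaneous delay (ms) of one chorus voice at time t (seconds).

    Assumption: the delay moves between delay_ms - depth_ms and
    delay_ms + depth_ms, following a sine ("s") or triangle ("t") LFO.
    """
    phase = (t * speed_hz) % 1.0               # position within one LFO cycle, 0..1
    if shape == "s":
        lfo = math.sin(2 * math.pi * phase)    # -1..1
    else:
        # Triangle: rises from -1 to 1 over the first half-cycle, then falls back.
        lfo = 4 * phase - 1 if phase < 0.5 else 3 - 4 * phase
    return delay_ms + depth_ms * lfo
```

Under this assumption, the single-delay example above (delay 55, depth 2, speed 0.25, -t) would sweep the voice's delay between 53ms and 57ms once every four seconds.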

compand attack1,decay1{,attack2,decay2}
[soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
[gain [initial-volume-dB [delay]]]

Compand (compress or expand) the dynamic range of the audio.

The attack and decay parameters (in seconds) determine the time
over which the instantaneous level of the input signal is aver‐
aged to determine its volume; attacks refer to increases in vol‐
ume and decays refer to decreases. For most situations, the
attack time (response to the music getting louder) should be
shorter than the decay time because the human ear is more sensi‐
tive to sudden loud music than sudden soft music. Where more
than one pair of attack/decay parameters is specified, each
input channel is companded separately and the number of pairs
must agree with the number of input channels. Typical values
are 0.3,0.8 seconds.
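The attack/decay averaging can be pictured as a one-pole envelope follower. It uses a faster coefficient while the level is rising than while it is falling. The Python sketch below is an assumption about this general technique, not a transcription of SoX's code. The name envelope and the exponential time-constant mapping are illustrative.

```python
import math

def envelope(samples, rate, attack, decay):
    """Follow the level of |x| with separate attack and decay times (seconds)."""
    # Per-sample smoothing coefficients for exponential time constants.
    a = 1 - math.exp(-1 / (rate * attack))
    d = 1 - math.exp(-1 / (rate * decay))
    level, out = 0.0, []
    for x in samples:
        target = abs(x)
        coeff = a if target > level else d   # rising: attack; falling: decay
        level += (target - level) * coeff
        out.append(level)
    return out
```

With 0.3,0.8 at 1000 samples/s, a sudden full-scale signal brings the tracked level to about 96% of full scale after one second. When the signal stops, the level falls back more slowly.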

The second parameter is a list of points on the compander's
transfer function specified in dB relative to the maximum possi‐
ble signal amplitude. The input values must be in strictly
increasing order but the transfer function does not have to be
monotonically rising. If omitted, the value of out-dB1 defaults
to the same value as in-dB1; levels below in-dB1 are not com‐
panded (but may have gain applied to them). The point 0,0 is
assumed but may be overridden (by 0,out-dBn). If the list is
preceded by a soft-knee-dB value, then the points where adjacent
line segments on the transfer function meet will be rounded by
the amount given. Typical values for the transfer function
are 6:-70,-60,-20.
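The sketch below shows one way to read the transfer-function list and evaluate the resulting static curve by straight-line interpolation in dB. It deliberately ignores the soft knee and the attack/decay smoothing. The names parse_transfer and transfer are hypothetical. Two details are readings of this page, not confirmed details of SoX's parser:

- an odd number of values means that out-dB1 was omitted;
- the curve has a 1:1 slope below in-dB1.

```python
def parse_transfer(spec):
    """Split '[knee:]in1[,out1]{,in,out}' into (knee_dB, [(in_dB, out_dB), ...])."""
    knee = 0.0
    if ":" in spec:
        knee_str, spec = spec.split(":", 1)
        knee = float(knee_str)
    vals = [float(v) for v in spec.split(",")]   # float() also accepts "-inf"
    if len(vals) % 2 == 1:        # odd count: out-dB1 was omitted ...
        vals.insert(1, vals[0])   # ... so it defaults to in-dB1
    points = list(zip(vals[0::2], vals[1::2]))
    if points[-1][0] < 0:         # the point 0,0 is assumed unless overridden
        points.append((0.0, 0.0))
    return knee, points


def transfer(in_db, points, gain_db=0.0):
    """Static output level in dB for finite points (no soft knee, no smoothing)."""
    x0, y0 = points[0]
    if in_db <= x0:
        return y0 + (in_db - x0) + gain_db      # assumed 1:1 below in-dB1
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if in_db <= x1:
            return y0 + (in_db - x0) * (y1 - y0) / (x1 - x0) + gain_db
    x0, y0 = points[-1]
    return y0 + (in_db - x0) + gain_db          # assumed 1:1 above the last point
```

Take the vehicle example given below this parameter description (6:-70,-60,-20 with a -5 dB gain). In this sketch, an input at -30dB lies on the -60..0 segment, maps to -10dB, and becomes -15dB after the gain. That point is far enough from the corners that the 6dB knee would not affect it.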

The third (optional) parameter is an additional gain in dB to be
applied at all points on the transfer function and allows easy
adjustment of the overall gain.

The fourth (optional) parameter is an initial level to be
assumed for each channel when companding starts. This permits
the user to supply a nominal level initially, so that, for exam‐
ple, a very large gain is not applied to initial signal levels
before the companding action has begun to operate: it is quite
probable that in such an event, the output would be severely
clipped while the compander gain properly adjusts itself. A
typical value (for audio which is initially quiet) is -90 dB.

The fifth (optional) parameter is a delay in seconds. The input
signal is analysed immediately to control the compander, but it
is delayed before being fed to the volume adjuster. Specifying
a delay approximately equal to the attack/decay times allows the
compander to effectively operate in a `predictive' rather than a
reactive mode. A typical value is 0.2 seconds.

* * *

The following example might be used to make a piece of music
with both quiet and loud passages suitable for listening to in a
noisy environment such as a moving vehicle:
sox asz.wav asz-car.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2
The transfer function (`6:-70,...') says that very soft sounds
(below -70dB) will remain unchanged. This will stop the compan‐
der from boosting the volume on `silent' passages such as
between movements. However, sounds in the range -60dB to 0dB
(maximum volume) will be boosted so that the 60dB dynamic range
of the original music will be compressed 3-to-1 into a 20dB
range, which is wide enough to enjoy the music but narrow enough
to get around the road noise. The `6:' selects 6dB soft-knee
companding. The -5 (dB) output gain is needed to avoid clipping
(the number is inexact, and was derived by experimentation).
The -90 (dB) for the initial volume will work fine for a clip
that starts with near silence, and the delay of 0.2 (seconds)
has the effect of causing the compander to react a bit more
quickly to sudden volume changes.

In the next example, compand is being used as a noise-gate for
when the noise is at a lower level than the signal:
play infile compand .1,.2 -inf,-50.1,-inf,-50,-50 0 -90 .1
Here is another noise-gate, this time for when the noise is at a
higher level than the signal (making it, in some ways, similar
to squelch):
play infile compand .1,.1 -45.1,-45,-inf,0,-inf 45 -90 .1
This effect supports the --plot global option (for the transfer
function).

See also mcompand for a multiple-band companding effect.

contrast [enhancement-amount(75)]
Comparable with compression, this effect modifies an audio sig‐
nal to make it sound louder. enhancement-amount controls the
amount of the enhancement and is a number in the range 0-100.
Note that enhancement-amount = 0 still gives a significant con‐
trast enhancement.

See also the compand and mcompand effects.

dcshift shift [limitergain]
Apply a DC shift to the audio. This can be useful to remove a
DC offset (caused perhaps by a hardware problem in the recording
chain) from the audio. The effect of a DC offset is reduced
headroom and hence volume. The stat or stats effect can be used
to determine if a signal has a DC offset.

The given shift value is a floating-point number in the range
of ±2 that indicates the amount to shift the audio (which is in
the range of ±1).

An optional limitergain can be specified as well. It should
have a value much less than 1 (e.g. 0.05 or 0.02) and is used
only on peaks to prevent clipping.
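A minimal Python sketch of the idea: estimate the offset as the mean sample value (the usual definition of a DC offset), then shift by its negative. The helper names are hypothetical. Unlike SoX, the sketch simply hard-clips to ±1 instead of applying a limitergain-controlled limiter.

```python
def dc_offset(samples):
    """Estimate the DC offset as the mean sample value."""
    return sum(samples) / len(samples)


def dcshift(samples, shift):
    """Add a constant shift, hard-clipping the result to the +/-1 range."""
    return [max(-1.0, min(1.0, x + shift)) for x in samples]
```

For example, dcshift(s, -dc_offset(s)) recentres a signal whose samples sit around +0.5.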

* * *

An alternative approach to removing a DC offset (albeit with a
short delay) is to use the highpass filter effect at a frequency
of, say, 10Hz, as illustrated in the following example:
sox -n dc.wav synth 5 sin %0 50
sox dc.wav fixed.wav highpass 10

deemph Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation
shelving filter).

Pre-emphasis was applied in the mastering of some CDs issued in
the early 1980s. These included many classical music albums, as
well as now sought-after issues of albums by The Beatles, Pink
Floyd and others. Pre-emphasis should be removed at playback
time by a de-emphasis filter in the playback device. However,
not all modern CD players have this filter, and very few PC CD
drives have it; playing pre-emphasised audio without the correct
de-emphasis filter results in audio that sounds harsh and is far
from what its creators intended.

With the deemph effect, it is possible to apply the necessary
de-emphasis to audio that has been extracted from a pre-empha‐
sised CD, and then either burn the de-emphasised audio to a new
CD (which will then play correctly on any CD player), or simply
play the correctly de-emphasised audio files on the PC. For
example:
sox track1.wav track1-deemph.wav deemph
and then burn track1-deemph.wav to CD, or
play track1-deemph.wav
or simply
play track1.wav deemph
The de-emphasis filter is implemented as a biquad and requires
the input audio sample rate to be either 44.1kHz or 48kHz. Max‐
imum deviation from the ideal response is only 0.06dB (up to
20kHz).
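For reference, the ideal CD de-emphasis response that the biquad approximates is usually given as a first-order shelf with 50µs and 15µs time constants. The Python sketch below evaluates that analogue target curve only. It does not give SoX's biquad coefficients, and the function name is hypothetical.

```python
import math

T1, T2 = 50e-6, 15e-6   # CD emphasis time constants in seconds

def deemph_db(f):
    """Ideal de-emphasis gain in dB at frequency f (Hz): (1 + jwT2) / (1 + jwT1)."""
    w = 2 * math.pi * f
    return 20 * math.log10(abs(complex(1, w * T2)) / abs(complex(1, w * T1)))
```

By this formula, the attenuation is roughly 9.5dB at 20kHz. It approaches 20·log10(15/50), about -10.5dB, at very high frequencies.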

This effect supports the --plot global option.

See also the bass and treble shelving equalisation effects.

delay {position(=)}
Delay one or more audio channels such that they start at the
given position. For example, delay 1.5 +1 3000s delays the
first channel by 1.5 seconds, the second channel by 2.5 seconds
(one second more than the previous channel), the third channel
by 3000 samples, and leaves any other channels that may be
present un-delayed. The following (one long) command plays a
chime sound:
play -n synth -j 3 sin %3 sin %-2 sin %-5 sin %-9 \
sin %-14 sin %-21 fade h .01 2 1.5 delay \
1.3 1 .76 .54 .27 remix - fade h 0 2.7 2.5 norm -1
and this plays a guitar chord:
play -n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \
delay 0 .05 .1 .15 .2 .25 remix - fade 0 4 .1 norm -1
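To make the position arithmetic concrete, here is a Python sketch that converts a list of delay positions into per-channel sample offsets. It handles only three forms:

- plain seconds;
- an `s' sample-count suffix;
- a leading `+', meaning relative to the previous channel's position.

SoX's full time-specification syntax (e.g. hh:mm:ss) is not covered, and delay_samples is a hypothetical name.

```python
def delay_samples(positions, rate):
    """Convert delay positions to per-channel offsets in samples.

    Handles only "<seconds>", "<count>s", and a leading "+" meaning
    "relative to the previous channel's position".
    """
    offsets, prev = [], 0
    for pos in positions:
        relative = pos.startswith("+")
        if relative:
            pos = pos[1:]
        if pos.endswith("s"):
            n = int(pos[:-1])               # explicit sample count
        else:
            n = round(float(pos) * rate)    # seconds -> samples
        if relative:
            n += prev
        offsets.append(n)
        prev = n
    return offsets
```

At 48kHz, delay_samples(["1.5", "+1", "3000s"], 48000) reproduces the example in the text: 1.5s, 2.5s and 3000 samples.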

dither [-S|-s|-f filter] [-a] [-p precision]
Apply dithering to the audio. Dithering deliberately adds a
small amount of noise to the signal in order to mask audible
quantization effects that can occur if the output sample size is
less than 24 bits. With no options, this effect will add trian‐
gular (TPDF) white noise. Noise-shaping (only for certain sam‐
ple rates) can be selected with -s. With the -f option, it is
possible to select a particular noise-shaping filter from the
following list: lipshitz, f-weighted, modified-e-weighted,
improved-e-weighted, gesemann, shibata, low-shibata, high-shi‐
bata. Note that most filter types are available only with a
44100Hz sample rate. The filter types are distinguished by the
following properties: audibility of noise, level of (inaudible,
but in some circumstances, otherwise problematic) shaped high
frequency noise, and processing speed.
See http://sox.sourceforge.net/SoX/NoiseShaping for graphs of
the different noise-shaping curves.
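For illustration, plain TPDF dithering (no noise shaping) can be sketched in Python as below. The noise is the difference of two uniform random values, which gives a triangular distribution spanning ±1 LSB. It is added before rounding to the target bit depth. This is a generic textbook form, not SoX's code. The name tpdf_dither is hypothetical, and so is its seed argument, which loosely mirrors the effect of the -R option.

```python
import random

def tpdf_dither(samples, bits, seed=None):
    """Quantise floats in [-1, 1) to signed `bits`-bit integers with TPDF dither."""
    rng = random.Random(seed)          # seed=None: different noise on each run
    scale = 2 ** (bits - 1)            # e.g. 32768 for 16-bit output
    lo, hi = -scale, scale - 1
    out = []
    for x in samples:
        # The difference of two uniform values has a triangular PDF spanning +/-1 LSB.
        noise = rng.random() - rng.random()
        q = round(x * scale + noise)
        out.append(max(lo, min(hi, q)))    # clip to the integer range
    return out
```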

The -S option selects a slightly `sloped' TPDF, biased towards
higher frequencies. It can be used at any sampling rate but
below ≈22k, plain TPDF is probably better, and above ≈37k,
noise-shaping (if available) is probably better.

The -a option enables a mode where dithering (and noise-shaping
if applicable) is automatically enabled only when needed. The
most likely use for this is when applying fade in or out to an
already dithered file, so that the redithering applies only to
the faded portions. However, auto dithering is not fool-proof,
so the fades should be carefully checked for any noise modula‐
tion; if this occurs, then either re-dither the whole file, or
use trim, fade, and concatenate.

The -p option allows overriding the target precision.

If the SoX global option -R is not given, then the
pseudo-random number generator used to generate the white noise
will be `reseeded', i.e. the generated noise will be different
between invocations.

This effect should not be followed by any other effect that
affects the audio.

See also the `Dithering' section above.

downsample [factor(2)]
Downsample the signal by an integer factor: only the first out
of each factor samples is retained; the others are discarded.

No decimation filter is applied. If the input is not a properly
bandlimited baseband signal, aliasing will occur. This may be
desirable, e.g., for frequency translation.

For a general resampling effect with anti-aliasing, see rate.
See also upsample.
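Python's slice step reproduces the sample-dropping behaviour directly. The sketch below also illustrates the aliasing warning. A 3kHz tone sampled at 8kHz and downsampled by 2 becomes (apart from a sign flip) a 1kHz tone, because 3kHz lies above the new 2kHz Nyquist limit. The names are illustrative only.

```python
import math

def downsample(samples, factor=2):
    """Keep the first of every `factor` samples; no decimation filter."""
    return samples[::factor]


rate = 8000
tone = [math.sin(2 * math.pi * 3000 * n / rate) for n in range(64)]   # 3 kHz tone
down = downsample(tone, 2)        # new rate 4 kHz, Nyquist 2 kHz
# The 3 kHz tone is now indistinguishable from an inverted 1 kHz tone.
alias = [-math.sin(2 * math.pi * 1000 * n / (rate / 2)) for n in range(len(down))]
```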

earwax Makes audio easier to listen to on headphones. Adds `cues' to
44.1kHz stereo (i.e. audio CD format) audio so that when lis‐
tened to on headphones the stereo image is moved from inside
the head (standard for headphones) to outside and in front of
the listener (standard for speakers).

echo gain-in gain-out <delay decay>
Add echoing to the audio. Echoes are reflected sound and can
occur naturally amongst mountains (and sometimes large build‐
ings) when talking or shouting; digital echo effects emulate
this behaviour and are often used to help fill out the sound of
a single instrument or vocal. The time difference between the
original signal and the reflection is the `delay' (time), and
the loudness of the reflected signal is the `decay'. Multiple
echoes can have different delays and decays.

Each given delay decay pair gives the delay in milliseconds and
the decay (relative to gain-in) of that echo. Gain-out is the
volume of the output. For example, this will make it sound as
if there are twice as many instruments as are actually playing:
play lead.aiff echo 0.8 0.88 60 0.4
If the delay is very short, then it sounds like a (metallic) ro‐
bot playing music:
play lead.aiff echo 0.8 0.88 6 0.4
A longer delay will sound like an open-air concert in the moun‐
tains:
play lead.aiff echo 0.8 0.9 1000 0.3
With one more mountain:
play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25

echos gain-in gain-out <delay decay>
Add a sequence of echoes to the audio. Each delay decay pair
gives the delay in milliseconds and the decay (relative to gain-
in) of that echo. Gain-out is the volume of the output.

The name echos stands for `ECHO in Sequel': the first echo takes
the input, the second takes the input and the first echo, the
third takes the input and the first and second echoes, and so
on. Care should be taken when using many echoes; with a single
delay decay pair, echos has the same effect as echo.

The sample will be bounced twice in symmetric echos:
play lead.aiff echos 0.8 0.7 700 0.25 700 0.3
The sample will be bounced twice in asymmetric echos:
play lead.aiff echos 0.8 0.7 700 0.25 900 0.3
The sample will sound as if played in a garage:
play lead.aiff echos 0.8 0.7 40 0.25 63 0.3
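The difference between echo (parallel echoes of the input) and echos (each echo also echoing the earlier ones) can be sketched in Python as below. This sketch reads "decay (relative to gain-in)" as meaning that each echo is decay times the gain-in-scaled signal. Whether SoX applies gain-in to the delayed signal in exactly this way is an assumption. The function names mirror the effect names but are not SoX code.

```python
def echo(x, rate, gain_in, gain_out, taps):
    """Parallel echoes: each (delay_ms, decay) tap reads the delayed input."""
    y = []
    for n in range(len(x)):
        acc = gain_in * x[n]
        for delay_ms, decay in taps:
            d = round(delay_ms * rate / 1000)
            if n >= d:
                acc += decay * gain_in * x[n - d]
        y.append(gain_out * acc)
    return y


def echos(x, rate, gain_in, gain_out, taps):
    """Sequential echoes: each stage echoes the input plus all earlier echoes."""
    s = [gain_in * v for v in x]
    for delay_ms, decay in taps:
        d = round(delay_ms * rate / 1000)
        s = [s[n] + (decay * s[n - d] if n >= d else 0.0) for n in range(len(s))]
    return [gain_out * v for v in s]
```

Fed a single impulse with two taps, echos also produces an echo of the first echo (the extra 0.125 in the second result below), while echo does not. With only one tap the two functions agree, matching the remark above.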

equalizer frequency[k] width[q|o|h|k] gain
Apply a two-pole peaking equalisation (EQ) filter. With this
filter, the signal level at and around a selected frequency can
be increased or decreased, whilst (unlike band-pass and band-
reject filters) that at all other frequencies is unchanged.

frequency gives the filter's central frequency in Hz, width, the
band-width, and gain the required gain or attenuation in dB.
Beware of Clipping when using a positive gain.

In order to produce complex equalisation curves, this effect can
be given several times, each with a different central frequency.

The filter is described in detail in [1].
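A common way to realise a two-pole peaking filter is with the biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The Python sketch below uses those formulas. Whether they match reference [1] exactly is an assumption. Only the Q form of width is handled, and the function names are hypothetical.

```python
import cmath
import math

def peaking_eq(fs, f0, q, gain_db):
    """Normalised biquad coefficients (b, a) for a peaking EQ (cookbook form)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [v / a[0] for v in b], [v / a[0] for v in a]


def response_db(b, a, fs, f):
    """Magnitude response in dB of a biquad at frequency f."""
    z = cmath.exp(-1j * 2 * math.pi * f / fs)    # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))
```

The response equals gain at the central frequency and returns to 0dB at DC, so the other frequencies are left (approximately) unchanged. Cascading several such sections corresponds to giving the effect several times.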

This effect supports the --plot global option.

See also bass and treble for shelving equalisation effects.

fade [type] fade-in-length [stop-position(=) [fade-out-length]]
Apply a fade effect to the beginning, end, or both of the audio.

An optional type can be specified to select the shape of the
fade curve: q for a quarter of a sine wave, h for half a sine
wave, t for linear (`triangular') slope, l for logarithmic, and
p for inverted parabola. The default is logarithmic.

A fade-in starts from the first sample and ramps the signal
level from 0 to full volume over the time given as fade-in-
length. Specify 0 if no fade-in is wanted.

For fade-outs, the audio will be truncated at stop-position and
the signal level will be ramped from full volume down to 0 over
an interval of fade-out-length before the stop-position. If
fade-out-length is not specified, it defaults to the same value
as fade-in-length. No fade-out is performed if stop-position is
not specified. If the audio length can be determined from the
input file header and any previous effects, then -0 (or, for
historical reasons, 0) may be specified for stop-position to
indicate the usual case of a fade-out that ends at the end of
the input audio stream.

Any time specification may be used for fade-in-length and fade-
out-length.

See also the splice effect.
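The shapes named above can be sketched as gain curves over a normalised position x, from 0 (silence) to 1 (full volume). These formulas are the obvious readings of the names (quarter sine, half sine, linear, inverted parabola); they are assumptions, not SoX's source. The default logarithmic curve is left out because this page does not say what level it starts from. apply_fade is a hypothetical helper that works in samples rather than time specifications.

```python
import math

def fade_gain(shape, x):
    """Gain for fade position x in [0, 1] (0 = silent end, 1 = full volume)."""
    if shape == "t":                         # linear (`triangular')
        return x
    if shape == "q":                         # quarter of a sine wave
        return math.sin(x * math.pi / 2)
    if shape == "h":                         # half a sine wave
        return (1 - math.cos(x * math.pi)) / 2
    if shape == "p":                         # inverted parabola
        return 1 - (1 - x) ** 2
    raise ValueError("shape not covered by this sketch: " + shape)


def apply_fade(samples, fade_in, stop, fade_out, shape="t"):
    """Fade in over fade_in samples, truncate at stop, fade out over the last fade_out."""
    out = list(samples[:stop])
    for n in range(len(out)):
        g = 1.0
        if n < fade_in:
            g *= fade_gain(shape, n / fade_in)
        if n >= stop - fade_out:
            g *= fade_gain(shape, (stop - 1 - n) / fade_out)
        out[n] *= g
    return out
```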

fir [coefs-file|coefs]
Use SoX's FFT convolution engine with the given FIR filter coef‐
ficients. If a single argument is given then this is treated as
the name of a file containing the filter coefficients (white-
space separated; may contain `#' comments). If the given file‐
name is `-', or if no argument is given, then the coefficients
are read from the `standard input' (stdin); otherwise, coeffi‐
cients may be given on the command line. Examples:
sox infile outfile fir 0.0195 -0.082 0.234 0.891 -0.145 0.043
sox infile outfile fir coefs.txt
with coefs.txt containing
# HP filter
# freq=10000
1.2311233052619888e-01
-4.4777096106211783e-01
5.1031563346705155e-01
-6.6502926320995331e-02
...

This effect supports the --plot global option.
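SoX performs this convolution with FFTs for speed. The Python sketch below instead uses direct (time-domain) convolution, which gives the same result up to rounding and is easier to follow. It also shows a simple way to read a coefficient file in the format described above. Both function names are hypothetical.

```python
def parse_coefs(text):
    """Read whitespace-separated coefficients, ignoring `#' comments."""
    coefs = []
    for line in text.splitlines():
        coefs.extend(float(tok) for tok in line.split("#", 1)[0].split())
    return coefs


def fir(samples, coefs):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k] (same length as the input)."""
    return [
        sum(h * samples[n - k] for k, h in enumerate(coefs) if n - k >= 0)
        for n in range(len(samples))
    ]
```

Feeding a unit impulse through fir returns the coefficients themselves, which is a quick check that a coefficient file was read as intended.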

flanger [delay depth regen width speed shape phase interp]
Apply a flanging effect to the audio. See [3] for a detailed
description of flanging.

All parameters are optional (right to left).

          Range      Default  Description
delay     0 - 30     0        Base delay in milliseconds.
depth     0 - 10     2        Added swept delay in milliseconds.
regen     -95 - 95   0        Percentage regeneration (delayed
                              signal feedback).
width     0 - 100    71       Percentage of delayed signal mixed
                              with original.
speed     0.1 - 10   0.5      Sweeps per second (Hz).
shape                sin      Swept wave shape: sine|triangle.
phase     0 - 100    25       Swept wave percentage phase-shift
                              for multi-channel (e.g. stereo)
                              flange; 0 = 100 = same phase on
                              each channel.
interp               lin      Digital delay-line interpolation:
                              linear|quadratic.

gain [-e|-B|-b|-r] [-n] [-l|-h] [gain-dB]
Apply amplification or attenuation to the audio signal, or, in
some cases, to some of its channels. Note that use of any of
-e, -B, -b, -r, or -n requires temporary file space to store the
audio to be processed, so it may be unsuitable for use with
`streamed' audio.

Without other options, gain-dB is used to adjust the signal
power level by the given number of dB: positive amplifies
(beware of Clipping), negative attenuates. With other options,
the gain-dB amplification or attenuation is (logically) applied
after the processing due to those options.

Given the -e option, the levels of the audio channels of a
multi-channel file are `equalised', i.e. gain is applied to all
channels other than that with the highest peak level, such that
all channels attain the same peak level (but, without also giv‐
ing -n, the audio is not `normalised').

The -B (balance) option is similar to -e, but with -B, the RMS
level is used instead of the peak level. -B might be used to
correct stereo imbalance caused by an imperfect record turntable
cartridge. Note that unlike -e, -B might cause some clipping.

-b is similar to -B but has clipping protection, i.e. if neces‐
sary to prevent clipping whilst balancing, attenuation is
applied to all channels. Note, however, that in conjunction
with -n, -B and -b are synonymous.

The -r option is used in conjunction with a prior invocation of
gain with the -h option; see below for details.

The -n option normalises the audio to 0dB FSD; it is often used
in conjunction with a negative gain-dB so that the audio is
normalised to a given level below 0dB. For example,
sox infile outfile gain -n
normalises to 0dB, and
sox infile outfile gain -n -3
normalises to -3dB.
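The -n and -e options amount to choosing linear gain factors from peak levels. Below is a minimal Python sketch of that arithmetic. It assumes floating-point samples with full scale at ±1 and no channel that is entirely silent. The function names are hypothetical. The sketch ignores the temporary-file buffering that SoX needs to find peaks before it can apply gain.

```python
import math

def peak_db(samples):
    """Peak level in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(x) for x in samples))


def normalise(samples, level_db=0.0):
    """Scale so that the peak sits at level_db (like gain -n [level])."""
    g = 10 ** ((level_db - peak_db(samples)) / 20)
    return [g * x for x in samples]


def equalise(channels):
    """Raise every channel to the highest channel's peak (like gain -e)."""
    target = max(max(abs(x) for x in ch) for ch in channels)
    out = []
    for ch in channels:
        g = target / max(abs(v) for v in ch)   # 1.0 for the loudest channel
        out.append([g * x for x in ch])
    return out
```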

The -l option invokes a simple limiter, e.g.
sox infile outfile gain -l 6
will apply 6dB of gain but never clip. Note that limiting by
more than a few dB more than occasionally (in a piece of audio)
is not recommended as it can cause audible distortion. See the
compand effect for a more capable limiter.

The -h option is used to apply gain to provide headroom for
subsequent processing. For example, with
sox infile outfile gain -h bass +6
6dB of attenuation will be applied prior to the bass boosting
effect, thus ensuring that it will not clip. Of course, with
bass, it is obvious how much headroom will be needed, but with
other effects (e.g. rate, dither) it is not always as clear.
Another advantage of using gain -h rather than an explicit
attenuation is that if the headroom is not used by subsequent
effects, it can be reclaimed with gain -r, for example:
sox infile outfile gain -h bass +6 rate 44100 gain -r
The above effects chain is guaranteed neither to clip nor to
amplify; it attenuates if necessary to prevent clipping, but by
only as much as is needed to do so.

Output formatting (dithering and bit-depth reduction) also
requires headroom (which cannot be `reclaimed'), e.g.
sox infile outfile gain -h bass +6 rate 44100 gain -rh dither
Here, the second gain invocation reclaims as much of the head‐
room as it can from the preceding effects, but retains as much
headroom as is needed for subsequent processing. The SoX global
option -G can be given to automatically invoke gain -h and gain
-r.

See also the norm and vol effects.

highpass|lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
Apply a high-pass or low-pass filter with its 3dB point at
frequency. The filter can be either single-pole (with -1) or
double-pole (the default, or with -2). width applies only to
double-pole filters; the default is Q = 0.707 and gives a
Butterworth response. The filters roll off at 6dB per pole per
octave (20dB per pole per decade). The double-pole filters are
described in detail in [1].
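Double-pole filters like these are commonly built as biquads using Robert Bristow-Johnson's Audio EQ Cookbook formulas. The Python sketch below uses those formulas. That they match [1] is an assumption this page does not confirm. two_pole and response_db are hypothetical names, and only the Q form of width is handled. The test checks two of the claims above: the response is 3dB down at frequency (for Q = 0.707), and it falls at roughly 40dB per decade (two poles) away from it.

```python
import cmath
import math

def two_pole(kind, fs, f0, q=0.707):
    """Normalised biquad (b, a) for a double-pole 'highpass' or 'lowpass' filter."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    c = math.cos(w0)
    if kind == "lowpass":
        b = [(1 - c) / 2, 1 - c, (1 - c) / 2]
    else:
        b = [(1 + c) / 2, -(1 + c), (1 + c) / 2]
    a = [1 + alpha, -2 * c, 1 - alpha]
    return [v / a[0] for v in b], [v / a[0] for v in a]


def response_db(b, a, fs, f):
    """Magnitude response in dB of a biquad at frequency f."""
    z = cmath.exp(-1j * 2 * math.pi * f / fs)    # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))
```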

These effects support the --plot global option.

See also sinc for filters with a steeper roll-off.

hilbert [-n taps]
Apply an odd-tap Hilbert transform filter, phase-shifting the
signal by 90 degrees.

This is used in many matrix coding schemes and for analytic sig‐
nal generation. The process is often written as a multiplica‐
tion by i (or j), the imaginary unit.

An odd-tap Hilbert transform filter has a bandpass characteris‐
tic, attenuating the lowest and highest frequencies. Its band‐
width can be controlled by the number of filter taps, which can
be specified with -n. By default, the number of taps is chosen
for a cutoff frequency of about 75 Hz.
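A common way to design such a filter is to window the ideal Hilbert impulse response. That response is 2/(πk) at odd offsets k from the centre tap and zero at even offsets. The Python sketch below does this with a Hann window. The window choice and the function name are assumptions, not a description of how SoX computes its taps.

```python
import math

def hilbert_taps(n):
    """Windowed odd-length Hilbert transformer taps (Hann window assumed)."""
    if n % 2 == 0:
        raise ValueError("number of taps must be odd")
    c = n // 2
    taps = []
    for i in range(n):
        k = i - c                                      # offset from the centre tap
        ideal = 2 / (math.pi * k) if k % 2 else 0.0    # zero for even k (incl. 0)
        window = 0.5 + 0.5 * math.cos(math.pi * k / (c + 1))
        taps.append(ideal * window)
    return taps
```

The resulting taps are antisymmetric about the centre. Using more taps lets the passband extend to lower frequencies, which matches the role of -n described above.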

This effect supports the --plot global option.

ladspa [-l|-r] module [plugin] [argument ...]
Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API)
plugin. Despite the name, LADSPA is not Linux-specific, and a
wide range of effects is available as LADSPA plugins, such as
cmt [6] (the Computer Music Toolkit) and Steve Harris's plugin
collection [7]. The first argument is the plugin module, the
second the name of the plugin (a module can contain more than
one plugin), and any other arguments are for the control ports
of the plugin. Missing arguments are supplied by default values
if possible.

Normally, the number of input ports of the plugin must match the
number of input channels, and the number of output ports deter‐
mines the output channel count. However, the -r (replicate)
option allows cloning a mono plugin to handle multi-channel
input.

Some plugins introduce latency which SoX may optionally compen‐
sate for. The -l (latency compensation) option automatically
compensates for latency as reported by the plugin via an output
control port named "latency".

If found, the environment variable LADSPA_PATH will be used as
the search path for plugins.

loudness [gain [reference]]
Loudness control: similar to the gain effect, but provides
equalisation for the human auditory system. See
http://en.wikipedia.org/wiki/Loudness for a detailed description
of loudness. The gain is adjusted by the given gain parameter
(usually negative) and the signal equalised according to ISO 226
with respect to a reference level of 65dB, though an alternative
reference level may be given if the original audio has been
equalised for some other optimal level. A default gain of -10dB
is used if a gain value is not given.

See also the gain effect.

lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
Apply a low-pass filter. See the description of the highpass
effect for details.

mcompand "attack1,decay1{,attack2,decay2}
[soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
[gain [initial-volume-dB [delay]]]" {crossover-freq[k]
"attack1,..."}

The multi-band compander is similar to the single-band compander
but the audio is first divided into bands using Linkwitz-Riley
crossover filters and a separately specifiable compander is run
on each band. See the compand effect for the definition of its
parameters. Compand parameters are specified between double
quotes and the crossover frequency for that band is given by
crossover-freq; these can be repeated to create multiple bands.
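A key property of Linkwitz-Riley crossovers is that the low and high outputs sum back to a flat magnitude response. The sketch below checks this for the analogue prototype of a 4th-order (LR4) crossover, built as a squared 2nd-order Butterworth section. This page does not state which order SoX actually uses, so LR4 is an assumption, as is the function name.

```python
import math

def lr4_bands(f, fc):
    """Analogue-prototype low/high responses of an LR4 crossover at frequency f."""
    s = 1j * f / fc                        # normalised complex frequency
    bw2 = s * s + math.sqrt(2) * s + 1     # 2nd-order Butterworth denominator
    low = (1 / bw2) ** 2                   # LR4 low-pass = Butterworth LP squared
    high = (s * s / bw2) ** 2              # LR4 high-pass = Butterworth HP squared
    return low, high
```

In this model, each band is 6dB down at its crossover frequency. The bands recombine without a bump or dip, so the companders alone determine how the overall response changes.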

For example, the following (one long) command shows how multi-
band companding is typically used in FM radio:
play track1.wav gain -3 sinc 8000- 29 100 mcompand \
"0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
"0.003,0.05 -47,-40,-34,-34,-17,-33" 400 \
"0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
"0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
"0,0.025 -38,-31,-28,-28,-0,-25" \
gain 15 highpass 22 highpass 22 sinc -n 255 -b 16 -17500 \
gain 9 lowpass -1 17801
The audio file is played with a simulated FM radio sound (or
broadcast signal condition if the lowpass filter at the end is
skipped). Note that the pipeline is set up with US-style 75µs
pre-emphasis.

See also compand for a single-band companding effect.

noiseprof [profile-file]
Calculate a profile of the audio for use in noise reduction.
See the description of the noisered effect for details.

noisered [profile-file [amount]]
Reduce noise in the audio signal by profiling and filtering.
This effect is moderately effective at removing consistent back‐
ground noise such as hiss or hum. To use it, first run SoX with
the noiseprof effect on a section of audio that ideally would
contain silence but in fact contains noise; such sections are
typically found at the beginning or the end of a recording.
noiseprof will write out a noise profile to profile-file, or to
stdout if no profile-file or if `-' is given. E.g.
sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile
To actually remove the noise, run SoX again, this time with the
noisered effect; noisered will reduce noise according to a noise
profile (previously generated by noiseprof) read from profile-
file, or from stdin if no profile-file or if `-' is given. E.g.
sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
How much noise should be removed is specified by amount, a number
between 0 and 1 with a default of 0.5. Higher numbers will
remove more noise but present a greater likelihood of removing
wanted components of the audio signal. Before replacing an
original recording with a noise-reduced version, experiment with
different amount values to find the optimal one for your audio;
use headphones to check that you are happy with the results,
paying particular attention to quieter sections of the audio.

On most systems, the two stages (profiling and reduction) can
be combined using a pipe, e.g.
sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered

norm [dB-level]
Normalise the audio. norm is just an alias for gain -n; see the
gain effect for details.

oops Out Of Phase Stereo effect. Mixes stereo to twin-mono where
each mono channel contains the difference between the left and
right stereo channels. This is sometimes known as the `karaoke'
effect as it often removes most or all of the vocals from a
recording. It is equivalent to remix 1,2i 1,2i.
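This removes vocals because material mixed equally into both channels (typically the lead vocal) cancels when one channel is subtracted from the other. Below is a Python sketch over (left, right) sample pairs. The 0.5 scale keeps the difference within ±1. It is an assumption about the mixing gain, not a statement of what remix 1,2i 1,2i applies. The function name is illustrative.

```python
def oops(frames, scale=0.5):
    """Out-of-phase stereo: both outputs carry (left - right) * scale."""
    return [((l - r) * scale, (l - r) * scale) for l, r in frames]
```

A centre-panned sample such as (0.5, 0.5) disappears entirely, while a sample in only one channel survives in both outputs.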

overdrive [gain(20) [colour(20)]]
Non-linear distortion. The colour parameter controls the amount
of even harmonic content in the over-driven output.

pad { length[@position(=)] }
Pad the audio with silence, at the beginning, the end, or any
specified points through the audio. length is the amount of
silence to insert and position the position in the input audio
stream at which to insert it. Any number of lengths and posi‐
tions may be specified, provided that a specified position is
not less than the previous one, and any time specification may
be used for them. position is optional for the first and last
lengths specified and, if omitted, corresponds to the beginning
and the end of the audio respectively. For example, pad 1.5 1.5
adds 1.5 seconds of silence padding at each end of the audio,
whilst pad 4000s@3:00 inserts 4000 samples of silence 3 minutes
into the audio. If silence is wanted only at the end of the
audio, specify either the end position or specify a zero-length
pad at the start.
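The insertion rules above can be sketched in Python, working in samples rather than time specifications. pad here is a hypothetical helper, not SoX's code. Positions refer to the input audio, and a missing first or last position means the start or end respectively.

```python
def pad(samples, specs):
    """Insert silence; specs is a list of (length, position) pairs in samples.

    position may be None for the first spec (meaning the start) or the
    last spec (meaning the end); positions must not decrease.
    """
    out, cursor = [], 0
    for i, (length, pos) in enumerate(specs):
        if pos is None:
            if i == 0:
                pos = 0
            elif i == len(specs) - 1:
                pos = len(samples)
            else:
                raise ValueError("only the first and last positions may be omitted")
        if pos < cursor:
            raise ValueError("positions must not decrease")
        out.extend(samples[cursor:pos])   # copy audio up to the insertion point
        out.extend([0.0] * length)        # then the silence
        cursor = pos
    out.extend(samples[cursor:])
    return out
```

As in the text, a leading zero-length pad makes the second, position-less length apply at the end.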

See also delay for an effect that can add silence at the begin‐
ning of the audio on a channel-by-channel basis.

phaser gain-in gain-out delay decay speed [-s|-t]
Add a phasing effect to the audio. See [3] for a detailed
description of phasing.

delay/decay/speed gives the delay in milliseconds and the decay
(relative to gain-in) with a modulation speed in Hz. The modu‐
lation is either sinusoidal (-s), which is preferable for multi‐
ple instruments, or triangular (-t), which gives single instru‐
ments a sharper phasing effect. The decay should be less than
0.5 to avoid feedback, and usually no less than 0.1. Gain-out
is the volume of the output.

For example:
play snare.flac phaser 0.8 0.74 3 0.4 0.5 -t
Gentler:
play snare.flac phaser 0.9 0.85 4 0.23 1.3 -s
A popular sound:
play snare.flac phaser 0.89 0.85 1 0.24 2 -t
More severe:
play snare.flac phaser 0.6 0.66 3 0.6 2 -t

pitch [-q] shift [segment [search [overlap]]]
Change the audio pitch (but not tempo).

shift gives the pitch shift as positive or negative `cents'
(i.e. 100ths of a semitone). See the tempo effect for a
description of the other parameters.
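Because an octave is 1200 cents, a shift of c cents multiplies every frequency by 2^(c/1200). A one-line Python sketch of the conversion (the function name is hypothetical):

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch shift in cents (100 cents = 1 semitone)."""
    return 2 ** (cents / 1200)
```

For example, shift 700 (a perfect fifth) corresponds to a frequency ratio of about 1.498.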

See also the bend, speed, and tempo effects.

rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
Change the audio sampling rate (i.e. resample the audio) to any
given RATE (even non-integer if this is supported by the output
file format) using a quality level defined as follows:

     Quality     Band-   Rej dB   Typical Use
                 width
-q   quick       n/a     ≈30 @    playback on
                         Fs/4     ancient hardware
-l   low         80%     100      playback on old
                                  hardware
-m   medium      95%     100      audio playback
-h   high        95%     125      16-bit mastering
                                  (use with dither)
-v   very high   95%     175      24-bit mastering

where Band-width is the percentage of the audio frequency band
that is preserved and Rej dB is the level of noise rejection.
Increasing levels of resampling quality come at the expense of
increasing amounts of time to process the audio. If no quality
option is given, the quality level used is `high' (but see
`Playing & Recording Audio' above regarding playback).

The `quick' algorithm uses cubic interpolation; all others use
band-limited interpolation. By default, all algorithms have a
`linear' phase response; for `medium', `high' and `very high',
the phase response is configurable (see below).

The rate effect is invoked automatically if SoX's -r option
specifies a rate that is different to that of the input file(s).
Alternatively, if this effect is given explicitly, then SoX's -r
option need not be given. For example, the following two com‐
mands are equivalent:
sox input.wav -r 48k output.wav bass -b 24
sox input.wav output.wav bass -b 24 rate 48k
though the second command is more flexible as it allows rate
options to be given, and allows the effects to be ordered arbi‐
trarily.
|
||
|
||
* * *

Warning: technically detailed discussion follows.

The simple quality selection described above provides settings
that satisfy the needs of the vast majority of resampling tasks.
Occasionally, however, it may be desirable to fine-tune the
resampler's filter response; this can be achieved using override
options, as detailed in the following table:

-M/-I/-L     Phase response = minimum/intermediate/linear
-s           Steep filter (band-width = 99%)
-a           Allow aliasing/imaging above the pass-band
-b 74-99.7   Any band-width %
-p 0-100     Any phase response (0 = minimum, 25 = intermediate,
             50 = linear, 100 = maximum)

N.B. Override options cannot be used with the `quick' or `low'
quality algorithms.

All resamplers use filters that can sometimes create `echo'
(a.k.a. `ringing') artefacts with transient signals such as
those that occur with `finger snaps' or other highly percussive
sounds. Such artefacts are much more noticeable to the human
ear if they occur before the transient (`pre-echo') than if they
occur after it (`post-echo'). Note that the frequency of any such
artefacts is related to the smaller of the original and new
sampling rates, but that if this is at least 44.1kHz, then the
artefacts will lie outside the range of human hearing.

A phase response setting may be used to control the distribution
of any transient echo between `pre' and `post': with minimum
phase, there is no pre-echo but the longest post-echo; with
linear phase, pre and post echo are in equal amounts (in signal
terms, but not audibility terms); the intermediate phase setting
attempts to find the best compromise by selecting a small length
(and level) of pre-echo and a medium-length post-echo.

Minimum, intermediate, or linear phase response is selected
using the -M, -I, or -L option; a custom phase response can be
created with the -p option. Note that phase responses between
`linear' and `maximum' (greater than 50) are rarely useful.

A resampler's band-width setting determines how much of the
frequency content of the original signal (w.r.t. the original
sample rate when up-sampling, or the new sample rate when
down-sampling) is preserved during conversion. The term
`pass-band' is used to refer to all frequencies up to the
band-width point (e.g. for 44.1kHz sampling rate, and a
resampling band-width of 95%, the pass-band represents
frequencies from 0Hz (D.C.) to circa 21kHz). Increasing the
resampler's band-width results in a slower conversion and can
increase transient echo artefacts (and vice versa).

The -s `steep filter' option changes resampling band-width from
the default 95% (based on the 3dB point) to 99%. The -b option
allows the band-width to be set to any value in the range
74-99.7%, but note that band-width values greater than 99% are
not recommended for normal use as they can cause excessive
transient echo.

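The pass-band arithmetic in the example above (44.1kHz and 95% giving circa 21kHz) amounts to the band-width percentage of half the lower of the two sample rates. A minimal shell sketch, assuming that reading of the text; passband_edge is a hypothetical helper that only does the arithmetic and performs no resampling.

```shell
#!/bin/sh
# passband_edge RATE_IN RATE_OUT BANDWIDTH_PERCENT
# Print the upper edge of the pass-band in Hz: BANDWIDTH_PERCENT of
# half the lower of the two rates (the original rate when
# up-sampling, the new rate when down-sampling).
# Hypothetical helper; it only does the arithmetic.
passband_edge() {
  awk -v a="$1" -v b="$2" -v bw="$3" 'BEGIN {
    fs = (a + 0 < b + 0) ? a : b      # the lower sample rate applies
    printf "%.1f\n", fs / 2 * bw / 100
  }'
}

passband_edge 48000 44100 95   # default band-width: 20947.5 Hz
passband_edge 48000 44100 99   # with -s (steep filter): 21829.5 Hz
passband_edge 44100 96000 95   # up-sampling: 20947.5 Hz again
```

The extra 882Hz gained with -s is the kind of trade-off the text describes: a wider pass-band at the cost of slower conversion and potentially more transient echo.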
If the -a option is given, then aliasing/imaging above the
pass-band is allowed. For example, with 44.1kHz sampling rate,
and a resampling band-width of 95%, this means that frequency
content above 21kHz can be distorted; however, since this is
above the pass-band (i.e. above the highest frequency of
interest/audibility), this may not be a problem. The benefits of
allowing aliasing/imaging are reduced processing time, and
reduced (by almost half) transient echo artefacts. Note that if
this option is given, then the minimum band-width allowable with
-b increases to 85%.

Examples:
sox input.wav -b 16 output.wav rate -s -a 44100 dither -s
default (high) quality resampling; overrides: steep filter,
allow aliasing; to 44.1kHz sample rate; noise-shaped dither to
16-bit WAV file.
sox input.wav -b 24 output.aiff rate -v -I -b 90 48k
very high quality resampling; overrides: intermediate phase,
band-width 90%; to 48kHz sample rate; store output to 24-bit
AIFF file.

* * *

The pitch and speed effects use the rate effect at their core.

remix [-a|-m|-p] <out-spec>
out-spec = in-spec{,in-spec} | 0
in-spec = [in-chan][-[in-chan2]][vol-spec]
vol-spec = p|i|v[volume]

Select and mix input audio channels into output audio channels.
Each output channel is specified, in turn, by a given out-spec:
a list of contributing input channels and volume specifications.

Note that this effect operates on the audio channels within the
SoX effects processing chain; it should not be confused with the
-m global option (where multiple files are mix-combined before
entering the effects chain).

An out-spec contains comma-separated input channel-numbers and
hyphen-delimited channel-number ranges; alternatively, 0 may be
given to create a silent output channel. For example,
sox input.wav output.wav remix 6 7 8 0
creates an output file with four channels, where channels 1, 2,
and 3 are copies of channels 6, 7, and 8 in the input file, and
channel 4 is silent. By contrast,
sox input.wav output.wav remix 1-3,7 3
creates a (somewhat bizarre) stereo output file where the left
channel is a mix-down of input channels 1, 2, 3, and 7, and the
right channel is a copy of input channel 3.

Where a range of channels is specified, the channel numbers to
the left and right of the hyphen are optional and default to 1
and to the number of input channels respectively. Thus
sox input.wav output.wav remix -
performs a mix-down of all input channels to mono.

By default, where an output channel is mixed from multiple (n)
input channels, each input channel will be scaled by a factor of
¹/n. Custom mixing volumes can be set by following a given
input channel or range of input channels with a vol-spec (volume
specification). This is one of the letters p, i, or v, followed
by a volume number, the meaning of which depends on the given
letter and is defined as follows:

Letter  Volume number       Notes
p       power adjust in dB  0 = no change
i       power adjust in dB  As `p', but invert the audio
v       voltage multiplier  1 = no change, 0.5 ≈ 6dB attenuation,
                            2 ≈ 6dB gain, -1 = invert

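The p and v scales in the table are related by the usual amplitude-to-decibel conversion, dB = 20 × log10(multiplier), which is why 0.5 and 2 are listed as roughly 6dB of attenuation and gain. A small shell sketch of the conversion in both directions; db_to_mult and mult_to_db are hypothetical helpers, and a negative v (inversion) has no dB equivalent, which is the case the i letter covers.

```shell
#!/bin/sh
# Convert between the two remix vol-spec scales (hypothetical helpers):
#   db_to_mult DB   p/i power adjustment in dB -> v voltage multiplier
#   mult_to_db M    v voltage multiplier -> dB (M must be positive)
# using multiplier = 10^(dB/20).
db_to_mult() {
  awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 20) }'
}
mult_to_db() {
  awk -v m="$1" 'BEGIN {
    if (m <= 0) { print "undefined"; exit 1 }  # inversion/silence: no dB value
    printf "%.2f\n", 20 * log(m) / log(10)
  }'
}

mult_to_db 0.5   # v0.5 -> -6.02 (about 6dB attenuation)
mult_to_db 2     # v2   -> 6.02 (about 6dB gain)
db_to_mult -6    # p-6  -> 0.5012
```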
If an out-spec includes at least one vol-spec then, by default,
¹/n scaling is not applied to any other channels in the same
out-spec (though it may be in other out-specs). The -a
(automatic) option, however, can be given to retain the
automatic scaling in this case. For example,
sox input.wav output.wav remix 1,2 3,4v0.8
results in channel level multipliers of 0.5,0.5 1,0.8, whereas
sox input.wav output.wav remix -a 1,2 3,4v0.8
results in channel level multipliers of 0.5,0.5 0.5,0.8.

The -m (manual) option disables all automatic volume
adjustments, so
sox input.wav output.wav remix -m 1,2 3,4v0.8
results in channel level multipliers of 1,1 1,0.8.

The volume number is optional and omitting it corresponds to no
volume change; however, the only case in which this is useful is
in conjunction with i. For example, if input.wav is stereo,
then
sox input.wav output.wav remix 1,2i
is a mono equivalent of the oops effect.

If the -p option is given, then any automatic ¹/n scaling is
replaced by ¹/√n (`power') scaling; this gives a louder mix but
one that might occasionally clip.

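For an output channel fed by n input channels with no vol-spec, the automatic scaling rules above reduce to a per-channel factor of 1/n by default, 1/√n with -p, or 1 with -m. A shell sketch of just that rule; auto_scale is a hypothetical helper and, as a simplifying assumption, it does not model vol-specs or the -a option.

```shell
#!/bin/sh
# auto_scale N [MODE]: print the level multiplier that remix applies
# automatically to each of N input channels mixed into one output
# channel when that out-spec has no vol-spec:
#   (no MODE)  1/n        the default
#   -p         1/sqrt(n)  `power' scaling: louder, may clip
#   -m         1          manual: no automatic adjustment
# Hypothetical helper; vol-specs and -a are deliberately not modelled.
auto_scale() {
  awk -v n="$1" -v mode="${2:-}" 'BEGIN {
    if (mode == "-m")      f = 1
    else if (mode == "-p") f = 1 / sqrt(n)
    else                   f = 1 / n
    printf "%.4f\n", f
  }'
}

auto_scale 2      # remix 1,2     -> 0.5000 per input channel
auto_scale 2 -p   # remix -p 1,2  -> 0.7071 per input channel
auto_scale 2 -m   # remix -m 1,2  -> 1.0000 per input channel
```

With two fully correlated input channels at full scale, 2 × 0.7071 ≈ 1.41 exceeds 1, which is why -p can occasionally clip.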
* * *

One use of the remix effect is to split an audio file into a set
of files, each containing one of the constituent channels (in
order to perform subsequent processing on individual audio
channels). Where more than a few channels are involved, a script
such as the following (Bourne shell script) is useful:
#!/bin/sh
# Split "$1" into one file per channel, e.g. input-01.wav.
chans=`soxi -c "$1"`              # number of channels in the input
while [ "$chans" -ge 1 ]; do
  chans0=`printf %02i "$chans"`   # 2 digits hence up to 99 chans
  out=`echo "$1"|sed "s/\(.*\)\.\(.*\)/\1-$chans0.\2/"`
  sox "$1" "$out" remix "$chans"  # keep only channel $chans
  chans=`expr "$chans" - 1`
done
If a file input.wav containing six audio channels were given,
the script would produce six output files: input-01.wav,
input-02.wav, ..., input-06.wav.

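The output names in the script come from its sed expression, which inserts the zero-padded channel number before the last `.' of the input name. That naming step can be tried on its own, without sox, using the shell sketch below; chan_file is a hypothetical helper that mirrors the script's naming logic.

```shell
#!/bin/sh
# chan_file FILE CHAN: build a per-channel output name by inserting a
# two-digit channel number before the last `.' in FILE (so names sort
# correctly for up to 99 channels). Hypothetical helper.
chan_file() {
  chan=$(printf %02d "$2")
  printf '%s\n' "$1" | sed "s/\(.*\)\.\(.*\)/\1-$chan.\2/"
}

chan_file input.wav 6          # -> input-06.wav
chan_file my.take.2.flac 12    # -> my.take.2-12.flac (last `.' only)
```

Because the first `\(.*\)` is greedy, only the final extension is split off, so names that contain several dots are handled as expected.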
See also the swap effect.

repeat [count(1)|-]
Repeat the entire audio count times, or once if count is not
given. The special value - requests infinite repetition.
Requires temporary file space to store the audio to be repeated.
Note that repeating once yields two copies: the original audio
and the repeated audio.

reverb [-w|--wet-only] [reverberance (50%) [HF-damping (50%)
[room-scale (100%) [stereo-depth (100%)
[pre-delay (0ms) [wet-gain (0dB)]]]]]]

Add reverberation to the audio using the `freeverb' algorithm.
A reverberation effect is sometimes desirable for concert halls
that are too small or contain so many people that the hall's
natural reverberance is diminished. Applying a small amount of
stereo reverb to a (dry) mono signal will usually make it sound
more natural. See [3] for a detailed description of
reverberation.

Note that this effect increases both the volume and the length
of the audio, so to prevent clipping in these domains, a typical
invocation might be:
play dry.wav gain -3 pad 0 3 reverb
The -w option can be given to select only the `wet' signal, thus
allowing it to be processed further, independently of the `dry'
signal. E.g.
play -m voice.wav "|sox voice.wav -p reverse reverb -w reverse"
for a reverse reverb effect.

reverse
Reverse the audio completely. Requires temporary file space to
store the audio to be reversed.

riaa Apply RIAA vinyl playback equalisation. The sampling rate must
be one of: 44.1, 48, 88.2, 96 kHz.

This effect supports the --plot global option.

silence [-l] above-periods [duration threshold[d|%]
[below-periods duration threshold[d|%]]]

Removes silence from the beginning, middle, or end of the audio.
`Silence' is determined by a specified threshold.

The above-periods value is used to indicate if audio should be
trimmed at the beginning of the audio. A value of zero indicates
no silence should be trimmed from the beginning. When specifying
a non-zero above-periods, it trims audio up until it finds
non-silence. Normally, when trimming silence from the beginning
of audio, above-periods will be 1, but it can be increased to
higher values to trim all audio up to a specific count of
non-silence periods. For example, if you had an audio file with
two songs that each contained 2 seconds of silence before the
song, you could specify an above-periods of 2 to strip out both
silence periods and the first song.

When above-periods is non-zero, you must also specify a duration
and threshold. duration indicates the amount of time that
non-silence must be detected before it stops trimming audio. By
increasing the duration, bursts of noise can be treated as
silence and trimmed off.

threshold is used to indicate what sample value you should treat
as silence. For digital audio, a value of 0 may be fine but for
audio recorded from analog, you may wish to increase the value
to account for background noise.

When optionally trimming silence from the end of the audio, you
specify a below-periods count. In this case, below-periods means
to remove all audio after silence is detected. Normally, this
will be a value of 1, but it can be increased to skip over
periods of silence that are wanted. For example, if you have a
song with 2 seconds of silence in the middle and 2 seconds at the
end, you could set below-periods to a value of 2 to skip over the
silence in the middle of the audio.

For below-periods, duration specifies a period of silence that
must exist before audio is not copied any more. By specifying a
higher duration, silence that is wanted can be left in the
audio. For example, if you have a song with an expected 1
second of silence in the middle and 2 seconds of silence at the
end, a duration of 2 seconds could be used to skip over the
middle silence.

Unfortunately, you must know the length of the silence at the
end of your audio file to trim off silence reliably. A
workaround is to use the silence effect in combination with the
reverse effect. By first reversing the audio, you can use the
above-periods to reliably trim all audio from what looks like
the front of the file. Then reverse the file again to get back
to normal.

To remove silence from the middle of a file, specify a
below-periods that is negative. This value is then treated as a
positive value and is also used to indicate that the effect
should restart processing as specified by the above-periods,
making it suitable for removing periods of silence in the middle
of the audio.

The option -l indicates that below-periods duration length of
audio should be left intact at the beginning of each period of
silence. This is useful, for example, if you want to remove long
pauses between words but do not want to remove the pauses
completely.

duration is a time specification with the peculiarity that a
bare number is interpreted as a sample count, not as a number of
seconds. For specifying seconds, either use the t suffix (as in
`2t') or specify minutes, too (as in `0:02').

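Because a bare silence duration counts samples, the same number stands for a different length of time at each sample rate; the 5 in the `silence 1 5 2%' example below, for instance, is just 5 samples. The conversions can be sketched in shell as follows; secs_to_samples and samples_to_ms are hypothetical helpers, and using the t suffix or a minutes:seconds form avoids the issue entirely.

```shell
#!/bin/sh
# Hypothetical helpers relating a bare silence duration (a sample
# count) to time, at sample rate RATE in Hz.
# secs_to_samples RATE SECONDS: samples spanning SECONDS (rounded).
secs_to_samples() {
  awk -v r="$1" -v s="$2" 'BEGIN { printf "%d\n", r * s + 0.5 }'
}
# samples_to_ms RATE SAMPLES: duration of SAMPLES in milliseconds.
samples_to_ms() {
  awk -v r="$1" -v n="$2" 'BEGIN { printf "%.3f\n", n * 1000 / r }'
}

secs_to_samples 44100 0.1   # 4410 samples = 0.1 s at 44.1kHz
samples_to_ms 44100 5       # a bare `5' is only about 0.113 ms
```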
threshold numbers may be suffixed with d to indicate the value
is in decibels, or % to indicate a percentage of the maximum
sample value (0% specifies pure digital silence).

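A percentage threshold can be compared with a d (decibel) one by assuming the usual amplitude relation dB = 20 × log10(percent ÷ 100), with 100% as full scale; on that assumption, the 2% used in the example below is about -34dB. A shell sketch of the conversion; pct_to_db is a hypothetical helper.

```shell
#!/bin/sh
# pct_to_db PERCENT: express a threshold given as a percentage of the
# maximum sample value in dB relative to full scale, assuming the
# amplitude relation dB = 20*log10(PERCENT/100). Hypothetical helper.
pct_to_db() {
  awk -v p="$1" 'BEGIN {
    if (p <= 0) { print "-inf"; exit }   # 0% is pure digital silence
    printf "%.2f\n", 20 * log(p / 100) / log(10)
  }'
}

pct_to_db 2   # -33.98
pct_to_db 1   # -40.00
pct_to_db 0   # -inf
```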
The following example shows how this effect can be used to start
a recording that does not contain the delay at the start which
usually occurs between `pressing the record button' and the
start of the performance:
rec parameters filename other-effects silence 1 5 2%

sinc [-a att|-b beta] [-p phase|-M|-I|-L] [-t tbw|-n taps] [freqHP]
[-freqLP [-t tbw|-n taps]]
Apply a sinc kaiser-windowed low-pass, high-pass, band-pass, or
band-reject filter to the signal. The freqHP and freqLP
parameters give the frequencies of the 6dB points of a high-pass
and low-pass filter that may be invoked individually, or
together. If both are given, then freqHP less than freqLP
creates a band-pass filter, and freqHP greater than freqLP
creates a band-reject filter. For example, the invocations
sinc 3k
sinc -4k
sinc 3k-4k
sinc 4k-3k
create a high-pass, low-pass, band-pass, and band-reject filter
respectively.

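The rules for interpreting the frequency arguments fit in a few lines of shell. This is a sketch only: sinc_kind is a hypothetical helper that understands plain numbers and a trailing k, and does not parse any of the other sinc options.

```shell
#!/bin/sh
# sinc_kind SPEC: report which filter a sinc frequency argument creates:
# `HP' alone is high-pass, `-LP' is low-pass, and `HP-LP' is band-pass
# if HP < LP, otherwise band-reject. Hypothetical helper; only a
# trailing `k' (x1000) multiplier is understood.
sinc_kind() {
  awk -v spec="$1" '
    function hz(s) {   # "3k" -> 3000, "500" -> 500
      return (s ~ /k$/) ? substr(s, 1, length(s) - 1) * 1000 : s + 0
    }
    BEGIN {
      n = split(spec, f, "-")
      if (n == 1 || f[2] == "")     print "high-pass"
      else if (f[1] == "")          print "low-pass"
      else if (hz(f[1]) < hz(f[2])) print "band-pass"
      else                          print "band-reject"
    }'
}

for spec in 3k -4k 3k-4k 4k-3k; do
  printf '%-6s %s\n' "$spec" "$(sinc_kind "$spec")"
done
```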
The default stop-band attenuation of 120dB can be overridden
with -a; alternatively, the kaiser-window `beta' parameter can
be given directly with -b.

The default transition band-width of 5% of the total band can be
overridden with -t (and tbw in Hertz); alternatively, the number
of filter taps can be given directly with -n.

If both freqHP and freqLP are given, then a -t or -n option
given to the left of the frequencies applies to both
frequencies; one of these options given to the right of the
frequencies applies only to freqLP.

The -p, -M, -I, and -L options control the filter's phase
response; see the rate effect for details.

This effect supports the --plot global option.

spectrogram [options]
Create a spectrogram of the audio; the audio is passed
unmodified through the SoX processing chain. This effect is
optional - type sox --help and check the list of supported
effects to see if it has been included.

The spectrogram is rendered in a Portable Network Graphics (PNG)
file, and shows time in the X-axis, frequency in the Y-axis, and
audio signal magnitude in the Z-axis. Z-axis values are
represented by the colour (or optionally the intensity) of the
pixels in the X-Y plane. If the audio signal contains multiple
channels then these are shown from top to bottom starting from
channel 1 (which is the left channel for stereo audio).

For example, if `my.wav' is a stereo file, then with
sox my.wav -n spectrogram
a spectrogram of the entire file will be created in the file
`spectrogram.png'. More often though, analysis of a smaller
portion of the audio is required; e.g. with
sox my.wav -n remix 2 trim 20 30 spectrogram
the spectrogram shows information only from the second (right)
channel, and of thirty seconds of audio starting from twenty
seconds in. To analyse a small portion of the frequency domain,
the rate effect may be used, e.g.
sox my.wav -n rate 6k spectrogram
allows detailed analysis of frequencies up to 3kHz (half the
sampling rate), i.e. where the human auditory system is most
sensitive. With
sox my.wav -n trim 0 10 spectrogram -x 600 -y 200 -z 100
the given options control the size of the spectrogram's X, Y & Z
axes (in this case, the spectrogram area of the produced image
will be 600 by 200 pixels in size and the Z-axis range will be
100 dB). Note that the produced image includes axes legends
etc. and so will be a little larger than the specified
spectrogram size. In this example:
sox -n -n synth 6 tri 10k:14k spectrogram -z 100 -w kaiser
an analysis `window' with high dynamic range is selected to best
display the spectrogram of a swept triangular wave. For a
similar example, append the following to the `chime' command in
the description of the delay effect (above):
rate 2k spectrogram -X 200 -Z -10 -w kaiser
Options are also available to control the appearance
(colour-set, brightness, contrast, etc.) and filename of the
spectrogram; e.g. with
sox my.wav -n spectrogram -m -l -o print.png
a spectrogram is created suitable for printing on a `black and
white' printer.

Options:

-x num Change the (maximum) width (X-axis) of the spectrogram
from its default value of 800 pixels to a given number
between 100 and 200000. See also -X and -d.

-X num X-axis pixels/second; the default is auto-calculated to
fit the given or known audio duration to the X-axis size,
or 100 otherwise. If given in conjunction with -d, this
option affects the width of the spectrogram; otherwise,
it affects the duration of the spectrogram. num can be
from 1 (low time resolution) to 5000 (high time
resolution) and need not be an integer. SoX may make a
slight adjustment to the given number for processing
quantisation reasons; if so, SoX will report the actual
number used (viewable when the SoX global option -V is in
effect). See also -x and -d.

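When -X is not given, the pixels-per-second value is essentially the X-axis width divided by the audio duration (as given with -d, or as known from the input). A shell sketch of that estimate, assuming the default 800-pixel width; x_pps is a hypothetical helper, and as the -X description notes, SoX may adjust the real value slightly.

```shell
#!/bin/sh
# x_pps WIDTH SECONDS: pixels per second needed to fit SECONDS of audio
# into WIDTH pixels, i.e. roughly what -X defaults to (or what -d
# implies). SoX may adjust the real value slightly. Hypothetical helper.
x_pps() {
  awk -v x="$1" -v d="$2" 'BEGIN { printf "%.2f\n", x / d }'
}

x_pps 800 10   # default width, 10 s of audio (e.g. after trim 0 10)
x_pps 800 60   # default width with -d 1:00
```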
-y num Sets the Y-axis size in pixels (per channel); this is the
number of frequency `bins' used in the Fourier analysis
that produces the spectrogram. N.B. it can be slow to
produce the spectrogram if this number is not one more
than a power of two (e.g. 129). By default the Y-axis
size is chosen automatically (depending on the number of
channels). See -Y for an alternative way of setting
spectrogram height.

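Since -y is fastest when its value is one more than a power of two, it can help to check a candidate size before use; the -y 200 in the earlier example, for instance, is not such a size. The shell sketch below (fast_y is a hypothetical helper) succeeds for such sizes, and otherwise prints the next larger one.

```shell
#!/bin/sh
# fast_y N: exit successfully if N is one more than a power of two
# (e.g. 129, 257, 513); otherwise print the next larger such size and
# exit with status 1. Hypothetical helper for choosing a -y value.
fast_y() {
  awk -v n="$1" 'BEGIN {
    p = 1
    while (p + 1 < n) p *= 2      # smallest power of two with p+1 >= n
    if (p + 1 == n) exit 0
    print p + 1
    exit 1
  }'
}

fast_y 129 && echo "-y 129 is a fast size"
fast_y 200 || true                # prints 257
```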
-Y num Sets the target total height of the spectrogram(s). The
default value is 550 pixels. Using this option (and by
default), SoX will choose a height for individual
spectrogram channels that is one more than a power of two,
so the actual total height may fall short of the given
number. However, there is also a minimum height per channel
so if there are many channels, the number may be
exceeded. See -y for an alternative way of setting
spectrogram height.

-z num Z-axis (colour) range in dB, default 120. This sets the
dynamic-range of the spectrogram to be -num dBFS to
0 dBFS. num may range from 20 to 180. Decreasing
dynamic-range effectively increases the `contrast' of the
spectrogram display, and vice versa.

-Z num Sets the upper limit of the Z-axis in dBFS. A negative
num effectively increases the `brightness' of the
spectrogram display, and vice versa.

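Taken together, -z and -Z determine which levels are mapped to colours. Assuming -Z gives the top of that span and -z its height (so the defaults give -120 to 0 dBFS, consistent with the -z description), the span can be sketched as follows; z_range is a hypothetical helper.

```shell
#!/bin/sh
# z_range [Z_RANGE [Z_UPPER]]: print the span of levels (dBFS) that the
# spectrogram maps to colours, assuming -Z gives the top of the span
# and -z its height (defaults 120 and 0). Hypothetical helper.
z_range() {
  awk -v z="${1:-120}" -v top="${2:-0}" 'BEGIN {
    printf "%g to %g dBFS\n", top - z, top
  }'
}

z_range            # defaults: -120 to 0 dBFS
z_range 100        # -z 100:   -100 to 0 dBFS
z_range 120 -10    # -Z -10:   -130 to -10 dBFS
```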
-q num Sets the Z-axis quantisation, i.e. the number of
different colours (or intensities) in which to render Z-axis
values. A small number (e.g. 4) will give a
`poster'-like effect making it easier to discern
magnitude bands of similar level. Small numbers also usually
result in small PNG files. The number given specifies
the number of colours to use inside the Z-axis range; two
colours are reserved to represent out-of-range values.

-w name
Window: Hann (default), Hamming, Bartlett, Rectangular,
Kaiser or Dolph. The spectrogram is produced using the
Discrete Fourier Transform (DFT) algorithm. A significant
parameter to this algorithm is the choice of `window
function'. By default, SoX uses the Hann window which
has good all-round frequency-resolution and dynamic-range
properties. For better frequency resolution (but lower
dynamic-range), select a Hamming window; for higher
dynamic-range (but poorer frequency-resolution), select a
Dolph window. Kaiser, Bartlett and Rectangular windows
are also available.

-W num Window adjustment parameter. This can be used to make
small adjustments to the Kaiser or Dolph window shape. A
positive number (up to ten) increases its dynamic range,
a negative number decreases it.

-s Allow slack overlapping of DFT windows. This can, in
some cases, increase image sharpness and give greater
adherence to the -x value, but at the expense of a little
spectral loss.

-m Creates a monochrome spectrogram (the default is colour).

-h Selects a high-colour palette - less visually pleasing
than the default colour palette, but it may make it
easier to differentiate different levels. If this option is
used in conjunction with -m, the result will be a hybrid
monochrome/colour palette.

-p num Permute the colours in a colour or hybrid palette. The
num parameter, from 1 (the default) to 6, selects the
permutation.

-l Creates a `printer friendly' spectrogram with a light
background (the default has a dark background).

-a Suppress the display of the axis lines. This is
sometimes useful in helping to discern artefacts at the
spectrogram edges.

-r Raw spectrogram: suppress the display of axes and
legends.

-A Selects an alternative, fixed colour-set. This is
provided only for compatibility with spectrograms produced
by another package. It should not normally be used as it
has some problems, not least, a lack of differentiation
at the bottom end which results in masking of low-level
artefacts.

-t text
Set the image title - text to display above the
spectrogram.

-c text
Set (or clear) the image comment - text to display below
and to the left of the spectrogram.

-o file
Name of the spectrogram output PNG file, default
`spectrogram.png'. If `-' is given, the spectrogram will be
sent to standard output (stdout).

Advanced Options:
In order to process a smaller section of audio without affecting
other effects or the output signal (unlike when the trim effect
is used), the following options may be used.

-d duration
This option sets the X-axis resolution such that audio
with the given duration (a time specification) fits the
selected (or default) X-axis width. For example,
sox input.mp3 output.wav -n spectrogram -d 1:00 stats
creates a spectrogram showing the first minute of the
audio, whilst the stats effect is applied to the entire
audio signal.

See also -X for an alternative way of setting the X-axis
resolution.

-S position(=)
Start the spectrogram at the given point in the audio
stream. For example
sox input.aiff output.wav spectrogram -S 1:00
creates a spectrogram showing all but the first minute of
the audio (the output file, however, receives the entire
audio stream).

For the ability to perform off-line processing of spectral data,
see the stat effect.

speed factor[c]
Adjust the audio speed (pitch and tempo together). factor is
either the ratio of the new speed to the old speed: greater than
1 speeds up, less than 1 slows down, or, if appended with the
letter `c', the number of cents (i.e. 100ths of a semitone) by
which the pitch (and tempo) should be adjusted: greater than 0
increases, less than 0 decreases.

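Because speed changes pitch and tempo together, the length of the audio changes by the inverse of the speed factor, and a value in cents corresponds to a factor of 2^(cents/1200). A shell sketch of both calculations; speed_factor and new_length are hypothetical helpers, not SoX commands.

```shell
#!/bin/sh
# speed_factor ARG: turn a speed effect argument into the ratio of new
# to old speed; a trailing `c' means cents (100ths of a semitone).
# new_length SECONDS ARG: length of the audio after that speed change.
# Hypothetical helpers for illustration.
speed_factor() {
  awk -v a="$1" 'BEGIN {
    if (a ~ /c$/) f = 2 ^ (substr(a, 1, length(a) - 1) / 1200)
    else          f = a + 0
    printf "%.6f\n", f
  }'
}
new_length() {
  awk -v s="$1" -v f="$(speed_factor "$2")" 'BEGIN { printf "%.3f\n", s / f }'
}

speed_factor 1.5       # 50% faster: 1.500000
speed_factor -1200c    # an octave lower, half speed: 0.500000
new_length 60 2        # a 60 s file at double speed: 30.000
new_length 60 -1200c   # ... and at half speed: 120.000
```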
Technically, the speed effect only changes the sample rate
information, leaving the samples themselves untouched. The rate
effect is invoked automatically to resample to the output sample
rate, using its default quality/speed. For higher quality or
higher speed resampling, in addition to the speed effect,
specify the rate effect with the desired quality option.

See also the bend, pitch, and tempo effects.

splice [-h|-t|-q] { position(=)[,excess[,leeway]] }
Splice together audio sections. This effect provides two things
over simple audio concatenation: a (usually short) cross-fade is
applied at the join, and a wave similarity comparison is made to
help determine the best place at which to make the join.

One of the options -h, -t, or -q may be given to select the fade
envelope as half-cosine wave (the default), triangular (a.k.a.
linear), or quarter-cosine wave respectively.

Type  Audio         Fade level      Transitions
t     correlated    constant gain   abrupt
h     correlated    constant gain   smooth
q     uncorrelated  constant power  smooth

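The properties in the table can be checked numerically with textbook fade curves: linear (t), half-cosine (h) and quarter-cosine (q). This shell sketch illustrates those curves under that assumption and is not code taken from SoX; fade_gains is a hypothetical helper that prints the fade-out and fade-in gains at a point t (0 to 1) in the cross-fade, together with their sum (constant for `constant gain') and their sum of squares (constant for `constant power').

```shell
#!/bin/sh
# fade_gains TYPE T: fade-out and fade-in gains at position T (0 = start
# of the cross-fade, 1 = end) for TYPE t (linear), h (half-cosine) or
# q (quarter-cosine), plus their sum and their sum of squares.
# Textbook curves chosen to match the table; not taken from SoX.
fade_gains() {
  awk -v type="$1" -v t="$2" 'BEGIN {
    pi = atan2(0, -1)
    if (type == "t") {
      fade_in = t; fade_out = 1 - t
    } else if (type == "h") {
      fade_in = (1 - cos(pi * t)) / 2; fade_out = 1 - fade_in
    } else {
      fade_in = sin(pi * t / 2); fade_out = cos(pi * t / 2)
    }
    printf "out=%.4f in=%.4f gain=%.4f power=%.4f\n", fade_out, fade_in,
      fade_out + fade_in, fade_out ^ 2 + fade_in ^ 2
  }'
}

for type in t h q; do fade_gains "$type" 0.5; done
fade_gains h 0.25
```

At the midpoint, t and h each give gains of 0.5 (sum 1), while q gives about 0.7071 each, whose squares sum to 1; this is consistent with the table's choice of q for uncorrelated audio.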
To perform a splice, first use the trim effect to select the
audio sections to be joined together. As when performing a tape
splice, the end of the section to be spliced onto should be
trimmed with a small excess (default 0.005 seconds) of audio
after the ideal joining point. The beginning of the audio
section to splice on should be trimmed with the same excess
(before the ideal joining point), plus an additional leeway
(default 0.005 seconds). Any time specification may be used for
these parameters. SoX should then be invoked with the two audio
sections as input files and the splice effect given with the
position at which to perform the splice - this is the length of
the first audio section (including the excess).

The following diagram uses the tape analogy to illustrate the
splice operation. The effect simulates the diagonal cuts and
joins the two pieces:

length1 excess
-----------><--->
_________ : : _________________
\ : : :\ `
\ : : : \ `
\: : : \ `
* : : * - - *
\ : : :\ `
\ : : : \ `
_______________\: : : \_____`____
: : : :
<---> <----->
excess leeway

where * indicates the joining points.

For example, a long song begins with two verses which start (as
determined e.g. by using the play command with the trim (start)
effect) at times 0:30.125 and 1:03.432. The following commands
cut out the first verse:
sox too-long.wav part1.wav trim 0 30.130
(5 ms excess, after the first verse starts)
sox too-long.wav part2.wav trim 1:03.422
(5 ms excess plus 5 ms leeway, before the second verse starts)
sox part1.wav part2.wav just-right.wav splice 30.130

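The trim and splice values in this example follow directly from the two verse start times: part1 ends one excess after the first verse starts, part2 begins one excess plus one leeway before the second verse starts, and the splice position is the length of part1. A shell sketch of that arithmetic, assuming the default 0.005s excess and leeway; splice_points is a hypothetical helper that accepts times in seconds or m:ss.sss and prints its results in seconds.

```shell
#!/bin/sh
# splice_points VERSE1 VERSE2 [EXCESS [LEEWAY]]
# Given the start times of the verse to cut out (VERSE1) and of the
# verse that should follow it (VERSE2), in seconds or m:ss.sss, print
# the part1 trim end, the part2 trim start (in seconds) and the splice
# position. Hypothetical helper; defaults are splice's 0.005 s values.
splice_points() {
  awk -v a="$1" -v b="$2" -v e="${3:-0.005}" -v l="${4:-0.005}" '
    function secs(t,    p) {       # "1:03.432" -> 63.432
      return (split(t, p, ":") == 2) ? p[1] * 60 + p[2] : t + 0
    }
    BEGIN {
      printf "part1: trim 0 %.3f\n", secs(a) + e
      printf "part2: trim %.3f\n", secs(b) - e - l
      printf "splice %.3f\n", secs(a) + e
    }'
}

splice_points 0:30.125 1:03.432
```

The printed part2 start of 63.422 seconds is the 1:03.422 used in the trim command of the example.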
For another example, the SoX command
play "|sox -n -p synth 1 sin %1" "|sox -n -p synth 1 sin %3"
generates and plays two notes, but there is a nasty click at the
transition; the click can be removed by splicing instead of
concatenating the audio, i.e. by appending splice 1 to the
command. (Clicks at the beginning and end of the audio can be
removed by preceding the splice effect with fade q .01 2 .01).

Provided your arithmetic is good enough, multiple splices can be
performed with a single splice invocation. For example:
#!/bin/sh
# Audio Copy and Paste Over
# acpo infile copy-start copy-stop paste-over-start outfile
# No chained time specifications allowed for the parameters
# (i.e. ones that contain + or -).
e=0.005 # Using default excess
l=$e # and leeway.
# The section to copy, with excess+leeway before and excess after:
sox "$1" piece.wav trim $2-$e-$l =$3+$e
# The audio up to the paste point, plus excess:
sox "$1" part1.wav trim 0 $4+$e
# The audio after the pasted-over region, with excess+leeway before:
sox "$1" part2.wav trim $4+$3-$2-$e-$l
# Join the three parts with two splices:
sox part1.wav piece.wav part2.wav "$5" \
  splice $4+$e +$3-$2+$e+$l+$e
In the above Bourne shell script, two splices are used to `copy
and paste' audio.

* * *

It is also possible to use this effect to perform general
cross-fades, e.g. to join two songs. In this case, excess would
typically be a number of seconds, the -q option would typically
be given (to select an `equal power' cross-fade), and leeway
should be zero (which is the default if -q is given). For
example, if f1.wav and f2.wav are audio files to be cross-faded,
then
sox f1.wav f2.wav out.wav splice -q $(soxi -D f1.wav),3
cross-fades the files where the point of equal loudness is 3
seconds before the end of f1.wav, i.e. the total length of the
cross-fade is 2 × 3 = 6 seconds (Note: the $(...) notation is
POSIX shell).

stat [-s scale] [-rms] [-freq] [-v] [-d]
Display time and frequency domain statistical information about
the audio. Audio is passed unmodified through the SoX processing
chain.

The information is output to the `standard error' (stderr)
stream and is calculated, where n is the duration of the audio
in samples, c is the number of audio channels, r is the audio
sample rate, and x_k represents the PCM value (in the range -1 to
+1 by default) of each successive sample in the audio, as
follows:

Samples read        n×c
Length (seconds)    n÷r
Scaled by           See -s below.
Maximum amplitude   max(x_k)              The maximum sample value in
                                          the audio; usually this will
                                          be a positive number.
Minimum amplitude   min(x_k)              The minimum sample value in
                                          the audio; usually this will
                                          be a negative number.
Midline amplitude   ½min(x_k)+½max(x_k)
Mean norm           (1/n)Σ|x_k|           The average of the absolute
                                          value of each sample in the
                                          audio.
Mean amplitude      (1/n)Σx_k             The average of each sample in
                                          the audio. If this figure is
                                          non-zero, then it indicates
                                          the presence of a D.C. offset
                                          (which could be removed using
                                          the dcshift effect).
RMS amplitude       √((1/n)Σx_k²)         The level of a D.C. signal
                                          that would have the same power
                                          as the audio's average power.
Maximum delta       max(|x_k-x_(k-1)|)
Minimum delta       min(|x_k-x_(k-1)|)
Mean delta          (1/(n-1))Σ|x_k-x_(k-1)|
RMS delta           √((1/(n-1))Σ(x_k-x_(k-1))²)
Rough frequency     In Hz.
Volume Adjustment                         The parameter to the vol
                                          effect which would make the
                                          audio as loud as possible
                                          without clipping. Note: see
                                          the discussion on Clipping
                                          above for reasons why it is
                                          rarely a good idea actually to
                                          do this.

Note that the delta measurements are not applicable for multi-channel
audio.

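To make the formulas in the table above concrete, the following POSIX shell function applies them to a short list of single-channel sample values. It is a sketch for illustration, not SoX's own implementation: the name stat_sketch is hypothetical, the input is assumed to be one sample value per line in the range -1 to +1, Rough frequency is omitted, and Volume Adjustment is taken to be 1 divided by the larger of |Minimum amplitude| and Maximum amplitude, which is one reading of the description above.

```shell
# stat_sketch: read one sample value (in the range -1 to +1) per line on
# standard input and print some of the `stat' figures for mono audio,
# using the formulas from the table above (n = number of samples).
# Hypothetical helper for illustration only.
stat_sketch() {
    awk '
        function abs(v) { return v < 0 ? -v : v }
        {
            x = $1 + 0
            if (NR == 1 || x > max) max = x
            if (NR == 1 || x < min) min = x
            sum += x; sumabs += abs(x); sumsq += x * x
            if (NR > 1) {
                d = x - prev                      # x_k - x_(k-1)
                if (NR == 2 || abs(d) > dmax) dmax = abs(d)
                if (NR == 2 || abs(d) < dmin) dmin = abs(d)
                dsum += abs(d); dsumsq += d * d
            }
            prev = x
        }
        END {
            n = NR
            if (n == 0) exit 1
            printf "Maximum amplitude %.6f\n", max
            printf "Minimum amplitude %.6f\n", min
            printf "Midline amplitude %.6f\n", min / 2 + max / 2
            printf "Mean norm %.6f\n", sumabs / n
            printf "Mean amplitude %.6f\n", sum / n
            printf "RMS amplitude %.6f\n", sqrt(sumsq / n)
            if (n > 1) {
                printf "Maximum delta %.6f\n", dmax
                printf "Minimum delta %.6f\n", dmin
                printf "Mean delta %.6f\n", dsum / (n - 1)
                printf "RMS delta %.6f\n", sqrt(dsumsq / (n - 1))
            }
            peak = (-min > max) ? -min : max      # largest magnitude
            if (peak > 0) printf "Volume Adjustment %.3f\n", 1 / peak
        }'
}
```

For example, printf '0.5\n-0.5\n0.25\n-0.25\n' | stat_sketch reports a Mean norm of 0.375000 and a Volume Adjustment of 2.000.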
The -s option can be used to scale the input data by a given factor.
The default value of scale is 2147483647 (i.e. the maximum value of a
32-bit signed integer). Internal effects always work with signed long
(32-bit) PCM data, so the value should be chosen with this in mind.

The -rms option will convert all output average values to `root mean
square' format.

The -v option displays only the `Volume Adjustment' value.

The -freq option calculates the input's power spectrum (4096-point
DFT) instead of the statistics listed above. This should only be used
with a single-channel audio file.

The -d option displays a hex dump of the 32-bit signed PCM audio data
in SoX's internal buffer. This is mainly used to help track down
endian problems that sometimes occur in cross-platform versions of
SoX.

See also the stats effect.

stats [-b bits|-x bits|-s scale] [-w window-time]
Display time domain statistical information about the audio channels;
audio is passed unmodified through the SoX processing chain.
Statistics are calculated and displayed for each audio channel and,
where applicable, an overall figure is also given.

For example, for a typical well-mastered stereo music file:

                  Overall       Left      Right
DC offset        0.000803  -0.000391   0.000803
Min level       -0.750977  -0.750977  -0.653412
Max level        0.708801   0.708801   0.653534
Pk lev dB           -2.49      -2.49      -3.69
RMS lev dB         -19.41     -19.13     -19.71
RMS Pk dB          -13.82     -13.82     -14.38
RMS Tr dB          -85.25     -85.25     -82.66
Crest factor            -       6.79       6.32
Flat factor          0.00       0.00       0.00
Pk count                2          2          2
Bit-depth           16/16      16/16      16/16
Num samples         7.72M
Length s          174.973
Scale max        1.000000
Window s            0.050

DC offset, Min level, and Max level are shown, by default, in the
range ±1. If the -b (bits) option is given, then these three
measurements will be scaled to a signed integer with the given number
of bits; for example, for 16 bits, the scale would be -32768 to
+32767. The -x option behaves the same way as -b except that the
signed integer values are displayed in hexadecimal. The -s option
scales the three measurements by a given floating-point number.

Pk lev dB and RMS lev dB are standard peak and RMS level measured in
dBFS. RMS Pk dB and RMS Tr dB are peak and trough values for RMS
level measured over a short window (default 50ms).

Crest factor is the standard ratio of peak to RMS level (note: not in
dB).

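The level figures in the example table can be related to one another with a little arithmetic. The sketch below assumes the usual definitions (peak level in dBFS = 20·log10 of the larger of |Min level| and Max level, and crest factor = 10^((Pk lev dB - RMS lev dB)/20), i.e. peak divided by RMS as a plain ratio); the function names are hypothetical and this is not the stats effect's own code. Applied to the Left and Right columns of the example, it reproduces the Pk lev dB and Crest factor figures shown.

```shell
# peak_db MIN MAX: peak level in dBFS, i.e. 20*log10(max(|MIN|, |MAX|)),
# with MIN and MAX given in the default range of +-1.
peak_db() {
    awk -v lo="$1" -v hi="$2" 'BEGIN {
        lo = -(lo + 0); hi = hi + 0              # max(-MIN, MAX) is the
        pk = (lo > hi) ? lo : hi                 # largest magnitude
        if (pk <= 0) exit 1                      # silence: no finite dB value
        printf "%.2f\n", 20 * log(pk) / log(10)  # awk log() is natural log
    }'
}

# crest_factor PK_DB RMS_DB: ratio of peak to RMS level (not in dB),
# recovered from the two dB figures that stats reports.
crest_factor() {
    awk -v pk="$1" -v rms="$2" 'BEGIN {
        printf "%.2f\n", 10 ^ ((pk - rms) / 20)
    }'
}
```

For example, peak_db -0.750977 0.708801 prints -2.49 and crest_factor -2.49 -19.13 prints 6.79, matching the Left column.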
Flat factor is a measure of the flatness (i.e. consecutive samples
with the same value) of the signal at its peak levels (i.e. either
Min level or Max level). Pk count is the number of occasions (not
the number of samples) that the signal attained either Min level or
Max level.

The right-hand Bit-depth figure is the standard definition of
bit-depth, i.e. bits less significant than the given number are
fixed at zero. The left-hand figure is the number of most
significant bits that are fixed at zero (or one for negative
numbers) subtracted from the right-hand figure (the number
subtracted is directly related to Pk lev dB).

For multi-channel audio, an overall figure for each of the above
measurements is given and derived from the channel figures as
follows: DC offset: maximum magnitude; Max level, Pk lev dB,
RMS Pk dB, Bit-depth: maximum; Min level, RMS Tr dB: minimum;
RMS lev dB, Flat factor, Pk count: average; Crest factor: not
applicable.

Length s is the duration in seconds of the audio, and Num samples
is equal to the sample-rate multiplied by Length. Scale max is the
scaling applied to the first three measurements; specifically, it
is the maximum value that could apply to Max level. Window s is
the length of the window used for the peak and trough RMS
measurements.

See also the stat effect.

swap Swap stereo channels. If the input is not stereo, pairs of
channels are swapped, and any odd last channel is passed through.
E.g., for seven channels, the output order will be
2, 1, 4, 3, 6, 5, 7.

See also remix for an effect that allows arbitrary channel selection
and ordering (and mixing).

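The channel order described above can be generated for any channel count with a few lines of POSIX shell; this is a sketch for illustration, and swap_order is a hypothetical name. Because remix takes a list of input channel numbers, one per output channel, passing it the generated list should select the same ordering as swap.

```shell
# swap_order N: print the output channel order produced by the swap
# effect for N input channels: adjacent pairs are exchanged and an odd
# last channel is passed through unchanged.
swap_order() {
    n=$1 i=1 order=
    while [ "$i" -le "$n" ]; do
        if [ $((i + 1)) -le "$n" ]; then
            order="$order $((i + 1)) $i"     # swap the pair (i, i+1)
        else
            order="$order $i"                # odd channel out: keep it
        fi
        i=$((i + 2))
    done
    echo "${order# }"                        # drop the leading space
}
```

For example, swap_order 7 prints 2 1 4 3 6 5 7, so sox in.wav out.wav remix $(swap_order 7) should give a seven-channel file the same order as swap.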
stretch factor [window fade shift fading]
Change the audio duration (but not its pitch). This effect is broadly
equivalent to the tempo effect with (factor inverted and) search set
to zero, so in general, its results are comparatively poor; it is
retained as it can sometimes out-perform tempo for small factors.

factor is the stretch factor: greater than 1 lengthens, and less than
1 shortens, the duration. window is the window size in ms (default
20ms). fade is the fade shape, which can be `lin'. shift is the shift
ratio, in [0, 1]; its default depends on the stretch factor: 1 when
shortening, 0.8 when lengthening. fading is the fading ratio (the
amount of fade), in [0, 0.5]; its default depends on factor and
shift.

See also the tempo effect.

synth [-j KEY] [-n] [len [off [ph [p1 [p2 [p3]]]]]] {[type] [combine]
[[%]freq[k][:|+|/|-[%]freq2[k]]] [off [ph [p1 [p2 [p3]]]]]}
This effect can be used to generate fixed or swept frequency audio
tones with various wave shapes, or to generate wide-band noise of
various `colours'. Multiple synth effects can be cascaded to produce
more complex waveforms; at each stage it is possible to choose whether
the generated waveform will be mixed with, or modulated onto, the
output from the previous stage. Audio for each channel in a
multi-channel audio file can be synthesised independently.

Though this effect is used to generate audio, an input file must still
be given, the characteristics of which will be used to set the
synthesised audio length, the number of channels, and the sampling
rate; however, since the input file's audio is not normally needed, a
`null file' (with the special name -n) is often given instead (and the
length specified as a parameter to synth or by another given effect
that has an associated length).

For example, the following produces a 3 second, 48kHz, audio file
containing a sine-wave swept from 300 to 3300 Hz:
sox -n output.wav synth 3 sine 300-3300
and this produces an 8 kHz version:
sox -r 8000 -n output.wav synth 3 sine 300-3300
Multiple channels can be synthesised by specifying the set of
parameters shown between braces multiple times; the following puts the
swept tone in the left channel and adds `brown' noise in the right:
sox -n output.wav synth 3 sine 300-3300 brownnoise
The following example shows how two synth effects can be cascaded to
create a more complex waveform:
play -n synth 0.5 sine 200-500 synth 0.5 sine fmod 700-100
Frequencies can also be given in `scientific' note notation, or, by
prefixing a `%' character, as a number of semitones relative to
`middle A' (440 Hz). For example, the following could be used to help
tune a guitar's low `E' string:
play -n synth 4 pluck %-29
or with a (Bourne shell) loop, the whole guitar:
for n in E2 A2 D3 G3 B3 E4; do
    play -n synth 4 pluck $n repeat 2; done
See the delay effect (above) and the reference to `SoX scripting
examples' (below) for more synth examples.

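The `%' and scientific note notations map to frequencies by ordinary equal-temperament arithmetic: a note n semitones from `middle A' has frequency 440 × 2^(n/12) Hz, so %-29 (E2, the guitar's low string) is about 82.41 Hz. The two functions below are a sketch of that arithmetic only; the names are hypothetical, and -j just intonation and the k suffix are not covered. They can be used to check what a given synth frequency argument means.

```shell
# semitones_to_hz N: frequency in Hz of the note N equal-tempered
# semitones from A4 (440 Hz), as selected by synth's `%N' notation.
semitones_to_hz() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * 2 ^ (n / 12) }'
}

# note_to_semitones NOTE: convert a note in scientific notation (e.g.
# E2, A4, C#3, Bb3) to semitones relative to A4. Hypothetical helper;
# only a letter, an optional `#' or `b', and an octave number are handled.
note_to_semitones() {
    rest=${1#?}                   # everything after the note letter
    letter=${1%"$rest"}           # the note letter itself
    case $letter in
        C) s=-9 ;; D) s=-7 ;; E) s=-5 ;; F) s=-4 ;;
        G) s=-2 ;; A) s=0 ;;  B) s=2 ;;
        *) return 1 ;;
    esac
    case $rest in
        \#*) s=$((s + 1)); rest=${rest#?} ;;   # sharp
        b*)  s=$((s - 1)); rest=${rest#?} ;;   # flat
    esac
    echo $((s + 12 * (rest - 4)))             # octave 4 contains A4
}
```

For example, note_to_semitones E2 prints -29, matching the %-29 used above, and semitones_to_hz -29 prints 82.41.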
N.B. This effect generates audio at maximum volume (0dBFS), which
means that there is a high chance of clipping when using the audio
subsequently, so in many cases, you will want to follow this effect
with the gain effect to prevent this from happening. (See also
Clipping above.) Note that, by default, the synth effect incorporates
the functionality of gain -h (see the gain effect for details);
synth's -n option may be given to disable this behaviour.

A detailed description of each synth parameter follows:

len is the length of audio to synthesise (any time specification); a
value of 0 indicates that the input length should be used, which is
also the default.

type is one of sine, square, triangle, sawtooth, trapezium, exp,
[white]noise, tpdfnoise, pinknoise, brownnoise, pluck; default=sine.

combine is one of create, mix, amod (amplitude modulation), fmod
(frequency modulation); default=create.

freq/freq2 are the frequencies at the beginning/end of synthesis in Hz
or, if preceded with `%', semitones relative to A (440 Hz);
alternatively, `scientific' note notation (e.g. E2) may be used. The
default frequency is 440Hz. By default, the tuning used with the note
notations is `equal temperament'; the -j KEY option selects `just
intonation', where KEY is an integer number of semitones relative to A
(so for example, -9 or 3 selects the key of C), or a note in
scientific notation.

If freq2 is given, then len must also have been given and the
generated tone will be swept between the given frequencies. The two
given frequencies must be separated by one of the characters `:', `+',
`/', or `-'. This character is used to specify the sweep function as
follows:

: Linear: the tone will change by a fixed number of hertz per
  second.

+ Square: a second-order function is used to change the tone.

/ Exponential: the tone will change by a fixed number of semitones
  per second.

- Exponential: as `/', but initial phase always zero, and stepped
  (less smooth) frequency changes.

Not used for noise.

off is the bias (DC-offset) of the signal in percent; default=0.

ph is the phase shift in percentage of 1 cycle; default=0. Not used
for noise.

p1 is the percentage of each cycle that is `on' (square), or `rising'
(triangle, exp, trapezium); default=50 (square, triangle, exp),
default=10 (trapezium). For pluck, p1 is the sustain; default=40.

p2 (trapezium): the percentage through each cycle at which `falling'
begins; default=50. For exp, p2 is the amplitude in multiples of 2dB;
default=50. For pluck, p2 is tone-1; default=20.

p3 (trapezium): the percentage through each cycle at which `falling'
ends; default=60. For pluck, p3 is tone-2; default=90.

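For the two sweep functions whose behaviour is fully described above, the instantaneous frequency at time t of a sweep from freq to freq2 over len seconds follows directly: `:' moves linearly, freq + (freq2 - freq)·t/len, while `/' moves by equal ratios, freq·(freq2/freq)^(t/len), which is a fixed number of semitones per second. The function below is a sketch of those two formulas for checking a sweep's progress; sweep_freq is a hypothetical name, it is not taken from SoX's source, and `+' and `-' are deliberately left out.

```shell
# sweep_freq TYPE F1 F2 LEN T: instantaneous frequency in Hz, T seconds
# into a sweep from F1 to F2 Hz lasting LEN seconds. TYPE is `:' for a
# linear sweep or `/' for an exponential one; others are rejected.
sweep_freq() {
    awk -v type="$1" -v f1="$2" -v f2="$3" -v len="$4" -v t="$5" 'BEGIN {
        p = t / len                              # fraction of sweep done
        if (type == ":")      f = f1 + (f2 - f1) * p
        else if (type == "/") f = f1 * (f2 / f1) ^ p
        else exit 1
        printf "%.2f\n", f
    }'
}
```

For example, sweep_freq : 300 3300 3 1.5 prints 1800.00, the half-way frequency of the linear 300-3300 Hz sweep from the examples above.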
tempo [-q] [-m|-s|-l] factor [segment [search [overlap]]]
Change the audio playback speed but not its pitch. This effect uses
the WSOLA algorithm. The audio is chopped up into segments which are
then shifted in the time domain and overlapped (cross-faded) at points
where their waveforms are most similar as determined by measurement of
`least squares'.

By default, linear searches are used to find the best overlapping
points. If the optional -q parameter is given, tree searches are used
instead. This makes the effect work more quickly, but the result may
not sound as good. However, if you must improve the processing speed,
using -q generally reduces the sound quality less than reducing the
search or overlap values does.

The -m option is used to optimize default values of segment, search
and overlap for music processing.

The -s option is used to optimize default values of segment, search
and overlap for speech processing.

The -l option is used to optimize default values of segment, search
and overlap for `linear' processing that tends to cause more
noticeable distortion but may be useful when factor is close to 1.

If -m, -s, or -l is specified, the default value of segment will be
calculated based on factor, while default search and overlap values
are based on segment. Any values you provide still override these
default values.

factor gives the ratio of new tempo to the old tempo, so e.g. 1.1
speeds up the tempo by 10%, and 0.9 slows it down by 10%.

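Since factor is a ratio of tempos, the duration of the output is the input duration divided by factor; for example, 1.25 turns 60 seconds of audio into 48 seconds. The stretch effect takes the inverted factor, so stretch 0.8 aims at the same change in length as tempo 1.25. A minimal sketch of that arithmetic follows; both function names are hypothetical.

```shell
# tempo_length SECONDS FACTOR: expected output duration of `tempo FACTOR',
# which changes speed (not pitch) by FACTOR, so duration scales by 1/FACTOR.
tempo_length() {
    awk -v s="$1" -v f="$2" 'BEGIN { printf "%.3f\n", s / f }'
}

# stretch_factor_for TEMPO_FACTOR: the (inverted) stretch factor that
# aims at the same change in duration.
stretch_factor_for() {
    awk -v f="$1" 'BEGIN { printf "%.3f\n", 1 / f }'
}
```

For example, tempo_length 60 1.25 prints 48.000 and stretch_factor_for 1.25 prints 0.800.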
The optional segment parameter selects the algorithm's segment size in
milliseconds. If no other flags are specified, the default value is
82 and is typically suited to making small changes to the tempo of
music. For larger changes (e.g. a factor of 2), 41 ms may give a
better result. The -m, -s, and -l flags will cause the segment default
to be automatically adjusted based on factor. For example, using -s
(for speech) with a tempo of 1.25 will calculate a default segment
value of 32.

The optional search parameter gives the audio length in milliseconds
over which the algorithm will search for overlapping points. If no
other flags are specified, the default value is 14.68. Larger values
use more processing time and may or may not produce better results. A
practical maximum is half the value of segment. Search can be reduced
to cut processing time at the risk of degrading output quality. The
-m, -s, and -l flags will cause the search default to be automatically
adjusted based on segment.

The optional overlap parameter gives the segment overlap length in
milliseconds. The default value is 12, but the -m, -s, or -l flags
automatically adjust overlap based on segment size. Increasing
overlap increases processing time and may increase quality. A
practical maximum for overlap is the value of search, with overlap
typically being (at least) a little smaller than search.

See also speed for an effect that changes tempo and pitch together,
pitch and bend for effects that change pitch only, and stretch for an
effect that changes tempo using a different algorithm.

treble gain [frequency[k] [width[s|h|k|o|q]]]
Apply a treble tone-control effect. See the description of the bass
effect for details.

tremolo speed [depth]
Apply a tremolo (low frequency amplitude modulation) effect to the
audio. The tremolo frequency in Hz is given by speed, and the depth as
a percentage by depth (default 40).

trim {position(+)}
Cuts portions out of the audio. Any number of positions may be given;
audio is not sent to the output until the first position is reached.
The effect then alternates between copying and discarding audio at
each position. Using a value of 0 for the first position parameter
allows copying from the beginning of the audio.

For example,
sox infile outfile trim 0 10
will copy the first ten seconds, while
play infile trim 12:34 =15:00 -2:00
and
play infile trim 12:34 2:26 -2:00
will both play from 12 minutes 34 seconds into the audio up to 15
minutes into the audio (i.e. 2 minutes and 26 seconds long), then
resume playing two minutes before the end of audio.

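The two play commands above are equivalent because a position prefixed with `=' is measured from the start of the audio, while the bare 2:26 is measured from the previous position: 15:00 - 12:34 = 2:26. The following sketch converts the [[hh:]mm:]ss form used in these examples to seconds so that such lengths can be checked; to_seconds is a hypothetical helper and handles only this subset of SoX time specifications (sample counts, for instance, are not handled).

```shell
# to_seconds TIME: convert a time of the form [[hh:]mm:]ss[.frac] to
# seconds. Hypothetical helper covering only this subset of SoX time
# specifications.
to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i   # base-60 fields
        print s
    }'
}
```

For example, $(( $(to_seconds 15:00) - $(to_seconds 12:34) )) evaluates to 146, the same as $(to_seconds 2:26).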
upsample [factor]
Upsample the signal by an integer factor: factor-1 zero-value samples
are inserted between each pair of input samples. As a result, the
original spectrum is replicated into the new frequency space (imaging)
and attenuated. This attenuation can be compensated for by adding vol
factor after any further processing. The upsample effect is typically
used in combination with filtering effects.

For a general resampling effect with anti-imaging, see rate. See also
downsample.

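The zero-insertion step can be illustrated on a plain list of sample values. The awk sketch below (upsample_seq is a hypothetical name) writes each input sample followed by factor-1 zeros, so the output has exactly factor times as many samples; it assumes that is how the final input sample is treated too, since the rate rises by the same factor. Like the effect itself, it applies no anti-imaging filter and no gain compensation, which is why the description above suggests following it with filtering and vol factor.

```shell
# upsample_seq FACTOR: read one sample value per line and write it
# followed by FACTOR-1 zero samples, so the output has exactly FACTOR
# times as many samples (and, at FACTOR times the rate, the same length).
# No anti-imaging filter or gain compensation is applied.
upsample_seq() {
    awk -v f="$1" '{
        print $1
        for (i = 1; i < f; i++) print 0
    }'
}
```

For example, printf '1\n2\n' | upsample_seq 3 writes the six samples 1 0 0 2 0 0, one per line.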
vad [options]
Voice Activity Detector. Attempts to trim silence and quiet background
sounds from the ends of (fairly high resolution, i.e. 16-bit,
44-48kHz) recordings of speech. The algorithm currently uses a simple
cepstral power measurement to detect voice, so it may be fooled by
other things, especially music. The effect can trim only from the
front of the audio, so in order to trim from the back, the reverse
effect must also be used. E.g.
play speech.wav norm vad
to trim from the front,
play speech.wav norm reverse vad reverse
to trim from the back, and
play speech.wav norm vad reverse vad reverse
to trim from both ends. The use of the norm effect is recommended, but
remember that neither reverse nor norm is suitable for use with
streamed audio.

Options:
Default values are shown in parentheses.

-t num (7)
The measurement level used to trigger activity detection. This might
need to be changed depending on the noise level, signal level and
other characteristics of the input audio.

-T num (0.25)
The time constant (in seconds) used to help ignore short bursts of
sound.

-s num (1)
The amount of audio (in seconds) to search for quieter/shorter bursts
of audio to include prior to the detected trigger point.

-g num (0.25)
Allowed gap (in seconds) between quieter/shorter bursts of audio to
include prior to the detected trigger point.

-p num (0)
The amount of audio (in seconds) to preserve before the trigger point
and any found quieter/shorter bursts.

Advanced Options:
These allow fine tuning of the algorithm's internal parameters.

-b num The algorithm (internally) uses adaptive noise
       estimation/reduction in order to detect the start of the wanted
       audio. This option sets the time for the initial noise estimate.

-N num Time constant used by the adaptive noise estimator for when the
       noise level is increasing.

-n num Time constant used by the adaptive noise estimator for when the
       noise level is decreasing.

-r num Amount of noise reduction to use in the detection algorithm
       (e.g. 0, 0.5, ...).

-f num Frequency of the algorithm's processing/measurements.

-m num Measurement duration; by default, twice the measurement period;
       i.e. with overlap.

-M num Time constant used to smooth spectral measurements.

-h num `Brick-wall' frequency of high-pass filter applied at the input
       to the detector algorithm.

-l num `Brick-wall' frequency of low-pass filter applied at the input
       to the detector algorithm.

-H num `Brick-wall' frequency of high-pass lifter used in the detector
       algorithm.

-L num `Brick-wall' frequency of low-pass lifter used in the detector
       algorithm.

See also the silence effect.

vol gain [type [limitergain]]
Apply an amplification or an attenuation to the audio signal. Unlike
the -v option (which is used for balancing multiple input files as
they enter the SoX effects processing chain), vol is an effect like
any other so can be applied anywhere, and several times if necessary,
during the processing chain.

The amount to change the volume is given by gain which is interpreted,
according to the given type, as follows: if type is amplitude (or is
omitted), then gain is an amplitude (i.e. voltage or linear) ratio; if
power, then a power (i.e. wattage or voltage-squared) ratio; and if
dB, then a power change in dB.

When type is amplitude or power, a gain of 1 leaves the volume
unchanged, less than 1 decreases it, and greater than 1 increases it;
a negative gain inverts the audio signal in addition to adjusting its
volume.

When type is dB, a gain of 0 leaves the volume unchanged, less than 0
decreases it, and greater than 0 increases it.

See [4] for a detailed discussion on electrical (and hence audio
signal) voltage and power ratios.

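The three gain types are related by the usual decibel conventions (see [4]): an amplitude ratio g corresponds to 20·log10(g) dB and a power ratio p to 10·log10(p) dB, so vol 2 (amplitude) is about +6.02 dB and vol 2 power is about +3.01 dB. The sketch below (vol_to_db is a hypothetical helper, not part of SoX) performs that conversion, treating a negative amplitude or power gain as its magnitude, since the sign only inverts the signal.

```shell
# vol_to_db GAIN [TYPE]: express a vol gain as a change in dB. TYPE is
# amplitude (the default), power, or dB, as for the vol effect.
vol_to_db() {
    awk -v g="$1" -v type="${2:-amplitude}" 'BEGIN {
        g = g + 0
        if (type == "dB") { printf "%.2f\n", g; exit }
        if (type != "amplitude" && type != "power") exit 1
        if (g < 0) g = -g             # the sign only inverts the audio
        if (g == 0) exit 1            # silence: no finite dB value
        k = (type == "power") ? 10 : 20
        printf "%.2f\n", k * log(g) / log(10)
    }'
}
```

For example, vol_to_db 2 prints 6.02 and vol_to_db 2 power prints 3.01.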
Beware of Clipping when increasing the volume.

The gain and the type parameters can be concatenated if desired, e.g.
vol 10dB.

An optional limitergain value can be specified; it should be a value
much less than 1 (e.g. 0.05 or 0.02) and is used only on peaks to
prevent clipping. If this parameter is not given, no limiter is used.
In verbose mode, this effect will display the percentage of the audio
that needed to be limited.

See also gain for a volume-changing effect with different
capabilities, and compand for a dynamic-range
compression/expansion/limiting effect.

DIAGNOSTICS
Exit status is 0 for no error, 1 if there is a problem with the
command-line parameters, or 2 if an error occurs during file
processing.

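In scripts, these exit statuses can be used to tell a usage mistake from a processing failure. The wrapper below is a minimal sketch (run_sox is a hypothetical name): it passes its arguments straight to sox, reports a non-zero status in words on stderr, and returns the same status to the caller.

```shell
# run_sox ARGS...: run sox with ARGS and describe a failure on stderr
# using the exit statuses listed above; sox's status is returned.
run_sox() {
    sox "$@"
    status=$?
    case $status in
        0) ;;
        1) echo "sox: problem with the command-line parameters" >&2 ;;
        2) echo "sox: error during file processing" >&2 ;;
        *) echo "sox: unexpected exit status $status" >&2 ;;
    esac
    return "$status"
}
```

For example, run_sox in.wav out.wav trim 0 10 || exit behaves like the plain sox command but explains why it failed.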
BUGS
Please report any bugs found in this version of SoX to the mailing list
(sox-users@lists.sourceforge.net).

SEE ALSO
soxi(1), soxformat(7), libsox(3)
audacity(1), gnuplot(1), octave(1), wget(1)
The SoX web site at http://sox.sourceforge.net
SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts

References
[1] R. Bristow-Johnson, Cookbook formulae for audio EQ biquad filter
coefficients, http://musicdsp.org/files/Audio-EQ-Cookbook.txt

[2] Wikipedia, Q-factor, http://en.wikipedia.org/wiki/Q_factor

[3] Scott Lehman, Effects Explained,
http://harmony-central.com/Effects/effects-explained.html

[4] Wikipedia, Decibel, http://en.wikipedia.org/wiki/Decibel

[5] Richard Furse, Linux Audio Developer's Simple Plugin API,
http://www.ladspa.org

[6] Richard Furse, Computer Music Toolkit, http://www.ladspa.org/cmt

[7] Steve Harris, LADSPA plugins, http://plugin.org.uk

LICENSE
Copyright 1998-2013 Chris Bagwell and SoX Contributors.
Copyright 1991 Lance Norskog and Sundry Contributors.

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

AUTHORS
Chris Bagwell (cbagwell@users.sourceforge.net). Other authors and
contributors are listed in the ChangeLog file that is distributed with
the source code.

sox December 31, 2014 SoX(1)