'\" t
'\" The line above instructs most `man' programs to invoke tbl
'\"
'\" Separate paragraphs; not the same as PP which resets indent level.
.de SP
.if t .sp .5
.if n .sp
..
'\"
'\" Replacement em-dash for nroff (default is too short).
.ie n .ds m " -
.el .ds m \(em
'\"
'\" Placeholder macro for if longer nroff arrow is needed.
.ds RA \(->
'\"
'\" Decimal point set slightly raised
.if t .ds d \v'-.15m'.\v'+.15m'
.if n .ds d .
'\"
'\" Enclosure macro for examples
.de EX
.SP
.nf
.ft CW
..
.de EE
.ft R
.SP
.fi
..
.TH SoX 1 "December 31, 2014" "sox" "Sound eXchange"
.SH NAME
SoX \- Sound eXchange, the Swiss Army knife of audio manipulation
.SH SYNOPSIS
.nf
\fBsox\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
[[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR] \fIoutfile\fR
[\fIeffect\fR [\fIeffect-options\fR]] ...
.SP
\fBplay\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
[[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR]
[\fIeffect\fR [\fIeffect-options\fR]] ...
.SP
\fBrec\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIoutfile\fR
[\fIeffect\fR [\fIeffect-options\fR]] ...
.fi
.SH DESCRIPTION
.SS Introduction
SoX reads and writes audio files in most popular formats and can
optionally apply effects to them. It can combine multiple input
sources, synthesise audio, and, on many systems, act as a general
purpose audio player or a multi-track audio recorder. It also has
limited ability to split the input into multiple output files.
.SP
All SoX functionality is available using just the \fBsox\fR command.
To simplify playing and recording audio, if SoX is invoked as
\fBplay\fR, the output file is automatically set to be the default sound
device, and if invoked as \fBrec\fR, the default sound device is used as an
input source.
Additionally, the
.BR soxi (1)
command provides a convenient way to just query audio file header information.
.SP
The heart of SoX is a library called libSoX. Those interested in
extending SoX or using it in other programs should refer to the libSoX
manual page:
.BR libsox (3).
.SP
SoX is a command-line audio processing tool, particularly suited to making
quick, simple edits and to batch processing.
If you need an interactive, graphical audio editor, use
.BR audacity (1).
.TS
center;
c8 c8 c.
* * *
.TE
.DT
.SP
The overall SoX processing chain can be summarised as follows:
.TS
center;
l.
Input(s) \*(RA Combiner \*(RA Effects \*(RA Output(s)
.TE
.DT
.SP
Note however, that on the SoX command line, the positions of the
Output(s) and the Effects are swapped w.r.t. the logical flow just
shown. Note also that whilst options pertaining to files are placed
before their respective file name, the opposite is true for effects.
To show how this works in practice, here is a selection of examples of
how SoX might be used. The simple
.EX
sox recital.au recital.wav
.EE
translates an audio file in Sun AU format to a Microsoft WAV file, whilst
.EX
sox recital.au \-b 16 recital.wav channels 1 rate 16k fade 3 norm
.EE
performs the same format translation, but also applies four effects
(down-mix to one channel, sample rate change, fade-in, normalize),
and stores the result at a bit-depth of 16.
.EX
sox \-r 16k \-e signed \-b 8 \-c 1 voice-memo.raw voice-memo.wav
.EE
converts `raw' (a.k.a. `headerless') audio to a self-describing file format,
.EX
sox slow.aiff fixed.aiff speed 1.027
.EE
adjusts audio speed,
.EX
sox short.wav long.wav longer.wav
.EE
concatenates two audio files, and
.EX
sox \-m music.mp3 voice.wav mixed.flac
.EE
mixes together two audio files.
.EX
play \(dqThe Moonbeams/Greatest/*.ogg\(dq bass +3
.EE
plays a collection of audio files whilst applying a bass boosting effect,
.EX
play \-n \-c1 synth sin %\-12 sin %\-9 sin %\-5 sin %\-2 fade h 0.1 1 0.1
.EE
plays a synthesised `A minor seventh' chord with a pipe-organ sound,
.EX
rec \-c 2 radio.aiff trim 0 30:00
.EE
records half an hour of stereo audio, and
.EX
play \-q take1.aiff & rec \-M take1.aiff take1\-dub.aiff
.EE
(with POSIX shell and where supported by hardware)
records a new track in a multi-track recording. Finally,
.EX
.ne 3
rec \-r 44100 \-b 16 \-e signed-integer \-p \\
silence 1 0.50 0.1% 1 10:00 0.1% | \\
sox \-p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \\
newfile : restart
.EE
records a stream of audio such as LP/cassette and splits it into multiple
audio files at points with 2 seconds of silence. Also, it does not start
recording until it detects audio is playing and stops after it sees
10 minutes of silence.
.SP
N.B. The above is just an overview of SoX's capabilities; detailed
explanations of how to use \fIall\fR SoX parameters, file formats, and
effects can be found below in this manual, in
.BR soxformat (7),
and in
.BR soxi (1).
.SS File Format Types
SoX can work with `self-describing' and `raw' audio files.
`self-describing' formats (e.g. WAV, FLAC, MP3) have a header that
completely describes the signal and encoding attributes of the audio
data that follows. `raw' or `headerless' formats do not contain this
information, so the audio characteristics of these must be described
on the SoX command line or inferred from those of the input file.
.SP
The following four characteristics are used to describe the format of
audio data such that it can be processed with SoX:
.TP
sample rate
The sample rate in samples per second (`Hertz' or `Hz').
Digital telephony traditionally uses a sample rate of 8000\ Hz (8\ kHz),
though these days, 16 and even 32\ kHz are becoming more common. Audio
Compact Discs use 44100\ Hz (44\*d1\ kHz). Digital Audio Tape and many
computer systems use 48\ kHz. Professional audio systems often use 96
kHz.
.TP
sample size
The number of bits used to store each sample. Today, 16-bit is
commonly used. 8-bit was popular in the early days of computer
audio. 24-bit is used in the professional audio arena. Other sizes are
also used.
.TP
data encoding
The way in which each audio sample is represented (or `encoded'). Some
encodings have variants with different byte-orderings or bit-orderings.
Some compress the audio data so that the stored audio data takes up less
space (i.e. disk space or transmission bandwidth) than the other format
parameters and the number of samples would imply. Commonly-used
encoding types include floating-point, \(*m-law, ADPCM, signed-integer
PCM, MP3, and FLAC.
.TP
channels
The number of audio channels contained in the file. One (`mono') and
two (`stereo') are widely used. `Surround sound' audio typically
contains six or more channels.
.PP
The term `bit-rate' is a measure of the amount of storage occupied by an
encoded audio signal over a unit of time. It can depend on all of the
above and is typically denoted as a number of kilo-bits per second
(kbps). An A-law telephony signal has a bit-rate of 64 kbps. MP3-encoded
stereo music typically has a bit-rate of 128\-196 kbps. FLAC-encoded
stereo music typically has a bit-rate of 550\-760 kbps.
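.SP
For encodings that store a fixed number of bits per sample, the bit-rate
is simply the product of the sample rate, the sample size, and the number
of channels. The following POSIX shell sketch illustrates the arithmetic;
the function name `bitrate_kbps' is hypothetical and not part of SoX, and
the integer division discards any fraction of a kilo-bit.

```shell
# Hypothetical helper, not part of SoX: bit-rate in kbps of audio stored
# with a fixed number of bits per sample.
# Usage: bitrate_kbps RATE_HZ BITS_PER_SAMPLE CHANNELS
bitrate_kbps() {
    # bits per second = sample rate x sample size x channels
    echo $(( $1 * $2 * $3 / 1000 ))
}

bitrate_kbps 8000 8 1     # 8-bit mono telephony at 8 kHz: prints 64
bitrate_kbps 44100 16 2   # 16-bit stereo at the Compact Disc rate: prints 1411
```

Comparing the second figure with the MP3 and FLAC figures quoted above
gives a rough idea of the compression achieved by those encodings.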
.SP
Most self-describing formats also allow textual `comments' to be
embedded in the file that can be used to describe the audio in some way,
e.g. for music, the title, the author, etc.
.SP
One important use of audio file comments is to convey `Replay Gain'
information. SoX supports applying Replay Gain information (for certain
input file formats only; currently, at least FLAC and Ogg Vorbis), but not
generating it. Note that by default, SoX copies input file comments
to output files that support comments, so output files may contain
Replay Gain information if some was present in the input file. In this
case, if anything other than a simple format conversion was performed
then the output file Replay Gain information is likely to be incorrect
and so should be recalculated using a tool that supports this (not SoX).
.SP
The
.BR soxi (1)
command can be used to display information from audio file headers.
.SS Determining & Setting The File Format
There are several mechanisms available for SoX to use to determine or set the
format characteristics of an audio file. Depending on the circumstances,
individual characteristics may be determined or set using different mechanisms.
.SP
To determine the format of an input file, SoX will use, in order of
precedence and as given or available:
.IP 1. 4
Command-line format options.
.IP 2. 4
The contents of the file header.
.IP 3. 4
The filename extension.
.PP
To set the output file format, SoX will use, in order of
precedence and as given or available:
.IP 1. 4
Command-line format options.
.IP 2. 4
The filename extension.
.IP 3. 4
The input file format characteristics, or the closest
that is supported by the output file type.
.PP
For all files, SoX will exit with an error
if the file type cannot be determined. Command-line format options may
need to be added or changed to resolve the problem.
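.SP
The precedence rules above amount to `use the first source that yields a
value, and fail if none does'. A minimal POSIX shell sketch of that idea
follows. The function `first_available' is hypothetical, is not part of
SoX, and says nothing about how SoX is actually implemented; the file
types passed to it are placeholders.

```shell
# Hypothetical helper, not part of SoX: print the first non-empty argument,
# or fail if every argument is empty.
first_available() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "file type cannot be determined" >&2
    return 1
}

# Input file: command-line option, then file header, then filename extension.
first_available "" "wav" "dat"    # no option given, header found: prints wav
first_available "raw" "" "dat"    # option given, so it wins: prints raw
```

For an output file, the same helper would be called with the option, the
extension, and the input file's characteristic, in that order.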
.SS Playing & Recording Audio
The
.B play
and
.B rec
commands are provided so that basic playing and
recording is as simple as
.EX
play existing-file.wav
.EE
and
.EX
rec new-file.wav
.EE
These two commands are functionally equivalent to
.EX
sox existing-file.wav \-d
.EE
and
.EX
sox \-d new-file.wav
.EE
Of course, further options and effects (as described below) can be
added to the commands in either form.
.TS
center;
c8 c8 c.
* * *
.TE
.DT
.SP
Some systems provide more than one type of (SoX-compatible) audio
driver, e.g. ALSA & OSS, or SUNAU & AO.
Systems can also have more than one audio device (a.k.a. `sound card').
If more than one audio driver has been
built-in to SoX, and the default selected by SoX when recording or playing
is not the one that is wanted, then the
.B AUDIODRIVER
environment variable can be used to override the default. For example
(on many systems):
.EX
set AUDIODRIVER=oss
play ...
.EE
The
.B AUDIODEV
environment variable can be used to override the default audio device,
e.g.
.EX
set AUDIODEV=/dev/dsp2
play ...
sox ... \-t oss
.EE
or
.EX
set AUDIODEV=hw:soundwave,1,2
play ...
sox ... \-t alsa
.EE
Note that the way of setting environment variables varies from system
to system\*mfor some specific examples, see `SOX_OPTS' below.
.SP
When playing a file with a sample rate that is not supported by the
audio output device, SoX will automatically invoke the \fBrate\fR effect
to perform the necessary sample rate conversion. For
compatibility with old hardware, the
default \fBrate\fR quality level is set to `low'. This
can be changed by explicitly specifying the \fBrate\fR
effect with a different quality level, e.g.
.EX
play ... rate \-m
.EE
or by using the
.B \-\-play\-rate\-arg
option (see below).
.TS
center;
c8 c8 c.
* * *
.TE
.DT
.SP
On some systems, SoX allows audio playback volume to be adjusted whilst
using
.BR play .
Where supported, this is achieved by tapping the `v' & `V' keys during
playback.
.SP
To help with setting a suitable recording level, SoX includes a peak-level
meter which can be invoked (before making the actual recording) as follows:
.EX
rec \-n
.EE
The recording level should be adjusted (using the system-provided mixer
program, not SoX) so that the meter is \fIat most occasionally\fR full
scale, and never `in the red' (an exclamation mark is shown).
See also \fB\-S\fR below.
.SS Accuracy
Many file formats that compress audio discard some of the audio signal
information whilst doing so. Converting to such a format and then converting
back again will not produce an exact copy of the original audio. This
is the case for many formats used in telephony (e.g. A-law, GSM) where
low signal bandwidth is more important than high audio fidelity, and for
many formats used in portable music players (e.g. MP3, Vorbis) where
adequate fidelity can be retained even with the large compression ratios
that are needed to make portable players practical.
.SP
Formats that discard audio signal information are called `lossy'.
Formats that do not are called `lossless'. The term `quality' is used as a
measure of how closely the original audio signal can be reproduced when
using a lossy format.
.SP
Audio file conversion with SoX is lossless when it can be, i.e. when not
using lossy compression, when not reducing the sampling rate or number
of channels, and when the number of bits used in the destination format
is not less than in the source format. E.g. converting from an 8-bit
PCM format to a 16-bit PCM format is lossless but converting from an
8-bit PCM format to (8-bit) A-law isn't.
.SP
.B N.B.
SoX converts all audio files to an internal uncompressed
format before performing any audio processing. This means that
manipulating a file that is stored in a lossy format can cause further
losses in audio fidelity. E.g. with
.EX
sox long.mp3 short.mp3 trim 10
.EE
SoX first decompresses the input MP3 file, then applies the
.B trim
effect, and finally creates the output MP3 file by re-compressing the
audio\*mwith a possible reduction in fidelity above that which
occurred when the input file was created.
Hence, if what is ultimately desired is lossily compressed audio, it is
highly recommended to perform all audio processing using lossless file
formats and then convert to the lossy format only at the final stage.
.SP
.B N.B.
Applying multiple effects with a single SoX invocation will,
in general, produce more accurate results than those produced using
multiple SoX invocations.
.SS Dithering
Dithering is a technique used to maximise the dynamic range of audio
stored at a particular bit-depth. Any distortion introduced by
quantisation is decorrelated by adding a small amount of white noise
to the signal. In most cases, SoX can determine whether the selected
processing requires dither and will add it during output formatting if
appropriate.
.SP
Specifically, by default, SoX automatically adds TPDF dither
when the output bit-depth is less than 24 and any
of the following are true:
.IP \(bu 4
bit-depth reduction has been specified explicitly using a command-line
option
.IP \(bu 4
the output file format supports only bit-depths lower than that of the
input file format
.IP \(bu 4
an effect has increased effective bit-depth within the internal
processing chain
.PP
For example, adjusting volume with
.B vol 0.25
requires two additional bits in which to losslessly store its results
(since 0\*d25 decimal equals 0\*d01 binary). So if the input file
bit-depth is 16, then SoX's internal representation will utilise 18
bits after processing this volume change. In order to store the
output at the same depth as the input, dithering is used to remove the
additional bits.
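.SP
The arithmetic in this example can be checked with a short POSIX shell
sketch. The helper `extra_bits' is hypothetical and not part of SoX, and it
assumes that the volume factor is a negative power of two (0.5, 0.25,
0.125, and so on); other factors are outside its scope.

```shell
# Hypothetical helper, not part of SoX: extra bits needed to store the result
# of scaling by FACTOR exactly. Only meaningful when FACTOR is a negative
# power of two such as 0.5, 0.25 or 0.125.
extra_bits() {
    # log2(1 / factor), rounded to the nearest whole number of bits
    awk -v f="$1" 'BEGIN { printf "%.0f\n", log(1 / f) / log(2) }'
}

extra_bits 0.25                       # prints 2
echo $(( 16 + $(extra_bits 0.25) ))   # 16-bit input after vol 0.25: prints 18
```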
.SP
Use the
.B \-V
option to see what processing SoX has automatically added. The
.B \-D
option may be given to override automatic dithering. To invoke
dithering manually (e.g. to select a noise-shaping curve), see the
.B dither
effect.
.SS Clipping
Clipping is distortion that occurs when an audio signal level (or
`volume') exceeds the range of the chosen representation. In most
cases, clipping is undesirable and so should be corrected by adjusting
the level prior to the point (in the processing chain) at which it
occurs.
.SP
In SoX, clipping could occur, as you might expect, when using the
.B vol
or
.B gain
effects to increase the audio volume. Clipping could also occur with many
other effects, when converting one format to another, and even when
simply playing the audio.
.SP
Playing an audio file often involves resampling, and processing by
analogue components can introduce a small DC offset and/or
amplification, all of which can produce distortion if the audio signal
level was initially too close to the clipping point.
.SP
For these reasons, it is usual to make sure that an audio
file's signal level has some `headroom', i.e. it does not exceed a particular
level below the maximum possible level for the given representation.
Some standards bodies recommend as much as 9dB headroom, but in most cases,
3dB (\(~~ 70% linear) is enough. Note that this wisdom
seems to have been lost in modern music production; in fact, many CDs,
MP3s, etc. are now mastered at levels \fIabove\fR 0dBFS i.e. the
audio is clipped as delivered.
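.SP
A headroom of a given number of dB corresponds to a linear amplitude ratio
of 10^(\-dB/20). The POSIX shell sketch below evaluates that formula for
the two figures mentioned above; the helper `headroom_linear' is
hypothetical and not part of SoX.

```shell
# Hypothetical helper, not part of SoX: linear amplitude ratio that
# corresponds to a headroom of DB decibels below full scale.
headroom_linear() {
    # ratio = 10^(-dB/20), computed as exp(-dB/20 * ln 10)
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(-db / 20 * log(10)) }'
}

headroom_linear 3   # prints 0.708, i.e. roughly 70% of full scale
headroom_linear 9   # prints 0.355
```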
.SP
SoX's
.B stat
and
.B stats
effects can assist in determining the signal level in an audio file. The
.B gain
or
.B vol
effect can be used to prevent clipping, e.g.
.EX
sox dull.wav bright.wav gain \-6 treble +6
.EE
guarantees that the treble boost will not clip.
.SP
If clipping occurs at any point during processing,
SoX will display a warning message to that effect.
.SP
See also
.B \-G
and the
.B gain
and
.B norm
effects.
.SS Input File Combining
SoX's input combiner can be configured (see OPTIONS below) to
combine multiple files using any of the
following methods: `concatenate', `sequence', `mix', `mix-power',
`merge', or `multiply'.
The default method is `sequence' for
.BR play ,
and `concatenate' for
.B rec
and
.BR sox .
.SP
For all methods other than `sequence', multiple input files must have
the same sampling rate. If necessary, separate SoX invocations can be
used to make sampling rate adjustments prior to combining.
.SP
If the `concatenate' combining method is selected (usually, this will be
by default) then the input files must also have the same number of
channels. The audio from each input will be concatenated in the order
given to form the output file.
.SP
The `sequence' combining method is selected automatically for
.BR play .
It is similar to `concatenate' in that the audio from each input file is
sent serially to the output file. However, here the output file may be
closed and reopened at the corresponding transition between input
files. This may be just what is needed when sending different types of
audio to an output device, but is not generally useful when the output is a
normal file.
.SP
If either the `mix' or `mix-power' combining method is selected then two or
more input files must be given and will be mixed together to form the
output file. The number of channels in each input file need not be the
same, but SoX will issue a warning if they are not and some
channels in the output file will not contain audio from every input
file. A mixed audio file cannot be un-mixed without reference to the
original input files.
.SP
If the `merge' combining method is selected then two or
more input files must be given and will be merged together to form the
output file. The number of channels in each input file need not be the
same. A merged audio file comprises all of the channels from all of the
input files. Un-merging is possible using multiple
invocations of SoX with the
.B remix
effect.
For example, two mono files could be merged to form one stereo file. The
first and second mono files would become the left and right channels of
the stereo file.
.SP
The `multiply' combining method multiplies the sample values of
corresponding channels (treated as numbers in the interval \-1 to +1).
If the number of channels in the input files is not the same, the
missing channels are considered to contain all zero.
.SP
When combining input files, SoX applies any specified effects
(including, for example, the
.B vol
volume adjustment effect) after the audio has been combined. However, it
is often useful to be able to set the volume of (i.e. `balance') the
inputs individually, before combining takes place.
.SP
For all combining methods, input
file volume adjustments can be made manually using the
.B \-v
option (below) which can be given for one or more input files. If it is
given for only some of the input files then the others receive no volume
adjustment. In some circumstances, automatic volume
adjustments may be applied (see below).
.SP
The \fB\-V\fR option (below) can be used to show the input file volume
adjustments that have been selected (either manually or automatically).
.SP
There are some special considerations that need to be made when mixing
input files:
.SP
Unlike the other methods, `mix' combining has the
potential to cause clipping in the combiner if no balancing is
performed. In this case, if manual volume adjustments are not given,
SoX will try to ensure that clipping does not occur by automatically
adjusting the
volume (amplitude) of each input signal by a factor of \(S1/\s-2n\s+2,
where n is the number of input files. If this results in audio that is
too quiet or otherwise unbalanced then the input file volumes can be
set manually as described above. Using the
.B norm
effect on the mix is another alternative.
.SP
If mixed audio seems loud enough at some points but
too quiet in others then dynamic range compression should be applied to
correct this\*msee the
.B compand
effect.
.SP
With the `mix-power' combine method, the
mixed volume is approximately equal to that of one of the input signals.
This is achieved by balancing using a factor of
\(S1/\s-2\(srn\s+2 instead of \(S1/\s-2n\s+2.
Note that this balancing factor does not guarantee that clipping will not occur,
but the number of clips will usually be low and the resultant
distortion is generally imperceptible.
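.SP
The two balancing factors described above can be compared numerically.
The POSIX shell sketch below is illustrative only; the helpers
`mix_factor' and `mix_power_factor' are hypothetical and not part of SoX.

```shell
# Hypothetical helpers, not part of SoX: per-input volume factor for N inputs.
mix_factor() {
    # `mix' balancing: 1/n
    awk -v n="$1" 'BEGIN { printf "%.4f\n", 1 / n }'
}
mix_power_factor() {
    # `mix-power' balancing: 1/sqrt(n)
    awk -v n="$1" 'BEGIN { printf "%.4f\n", 1 / sqrt(n) }'
}

mix_factor 4         # prints 0.2500
mix_power_factor 4   # prints 0.5000
mix_power_factor 2   # prints 0.7071
```

A factor computed this way can also be applied by hand with
.BR \-v ,
e.g. \fBsox \-m \-v 0.5 a.wav \-v 0.5 b.wav mix.wav\fR, where the file
names are placeholders.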
.SS Output Files
SoX's default behaviour is to take one or more input files and
write them to a single output file.
.SP
This behaviour can be changed by specifying the pseudo-effect `newfile'
within the effects list. SoX will then enter multiple output mode.
.SP
In multiple output mode, a new file is created when the effects
prior to the `newfile' indicate they are done.
The effects chain listed after `newfile'
is then started up and its output is saved to the new file.
.SP
In multiple output mode, a unique number will automatically be appended
to the end of all filenames. If the filename has an extension
then the number is inserted before the extension. This behaviour can
be customized by placing a %n anywhere in the filename where the
number should be substituted. An optional number can be placed after
the % to indicate a minimum fixed width for the number.
.SP
Multiple output mode is not very useful unless an effect that will
stop the effects chain early is
specified before the `newfile'. If end of file is
reached before the effects chain stops itself then no new file
will be created as it would be empty.
.SP
The following is an example of splitting the first 60 seconds of an input
file into two 30 second files and ignoring the rest.
.EX
sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
.EE
.SS Stopping SoX
Usually SoX will complete its processing and exit automatically once
it has read all available audio data from the input files.
.SP
If desired, it can be terminated earlier by sending an
interrupt signal to the process (usually by pressing the
keyboard interrupt key which is normally Ctrl-C). This is a natural requirement
in some circumstances, e.g. when using SoX to make a recording. Note
that when using SoX to play multiple files, Ctrl-C behaves slightly
differently: pressing it once causes SoX to skip to the next file;
pressing it twice in quick succession causes SoX to exit.
.SP
Another option to stop processing early is to use an effect that
has a time period or sample count to determine the stopping
point. The trim effect is an example of this. Once all
effects chains have stopped then SoX will also stop.
.SH FILENAMES
Filenames can be simple file names, absolute or relative path names,
or URLs (input files only). Note that URL support requires that
.BR wget (1)
is available.
.SP
Note:
Giving SoX an input or output filename that is the same as a SoX
effect-name will not work since SoX will treat it as an effect
specification. The only work-around to this is to avoid such
filenames. This is generally not difficult since most audio
filenames have a filename `extension', whilst effect-names do not.
.SS Special Filenames
The following special filenames may be used in certain circumstances
in place of a normal filename on the command line:
.TP
\fB\-\fR
SoX can be used in simple pipeline operations by using the special
filename `\-' which,
if used as an input filename, will cause
SoX to read audio data from `standard input' (stdin),
and which,
if used as the output filename, will cause
SoX to send audio data to `standard output' (stdout).
Note that when using this option for the output file, and sometimes
when using it for an input file, the file-type (see
.B \-t
below) must also be given.
.TP
\fB\(dq\^|\^\fIprogram \fR[\fIoptions\fR] ...\fB\(dq\fR
This can be used in place of an input filename to specify that
the given program's standard output (stdout) be used as an input file.
Unlike
.B \-
(above), this can be used for several inputs to one SoX command. For
example, if `genw' generates mono WAV formatted signals to its
standard output, then the following command makes a stereo file
from two generated signals:
.EX
sox \-M "|genw \-\-imd \-" "|genw \-\-thd \-" out.wav
.EE
For headerless (raw) audio,
.B \-t
(and perhaps other format options) will need to be given, preceding the input
command.
.TP
\fB\(dq\fIwildcard-filename\fB\(dq\fR
Specifies that filename `globbing' (wild-card matching) should be performed
by SoX instead of by the shell. This allows a single set of file options to be
applied to a group of files. For example, if the current directory contains
three `vox' files, file1.vox, file2.vox, and file3.vox, then
.EX
play \-\-rate 6k *.vox
.EE
will be expanded by the `shell' (in most environments) to
.EX
play \-\-rate 6k file1.vox file2.vox file3.vox
.EE
which will treat only the first vox file as having a sample rate of 6k.
With
.EX
play \-\-rate 6k "*.vox"
.EE
the given sample rate option will be applied to all three vox files.
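.SP
The difference between the quoted and unquoted forms can be observed
without SoX. The shell sketch below counts the arguments that a command
would receive in each case; the helper `count_args' and the scratch
directory are hypothetical, and
.BR mktemp (1)
is assumed to be available.

```shell
# Hypothetical helper: report how many arguments a command would receive.
count_args() {
    echo "$#"
}

# Scratch directory holding three empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/file1.vox" "$demo_dir/file2.vox" "$demo_dir/file3.vox"

( cd "$demo_dir" && count_args *.vox )     # shell expands the pattern: prints 3
( cd "$demo_dir" && count_args "*.vox" )   # pattern is passed intact: prints 1

rm -r "$demo_dir"
```

In the quoted case the single pattern argument reaches the program
unchanged, which is what allows SoX to expand it itself.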
.TP
\fB\-p\fR, \fB\-\-sox\-pipe\fR
This can be used in place of an output filename to specify that
the SoX command should be used as an input pipe to another SoX command.
For example, the command:
.EX
play "|sox \-n \-p synth 2" "|sox \-n \-p synth 2 tremolo 10" stat
.EE
plays two `files' in succession, each with different effects.
.SP
.B \-p
is in fact an alias for `\fB\-t sox \-\fR'.
.TP
\fB\-d\fR, \fB\-\-default\-device\fR
This can be used in place of an input or output filename to specify that
the default audio device (if one has been built into SoX) is to be used.
This is akin to invoking
.B rec
or
.B play
(as described above).
.TP
\fB\-n\fR, \fB\-\-null\fR
This can be used in place of an input or output filename to specify that
a `null file' is to be used. Note that here, `null file' refers to a
SoX-specific mechanism and is not related to any operating-system
mechanism with a similar name.
.SP
Using a null file to input audio is equivalent to
using a normal audio file that contains an infinite amount
of silence, and as such is not generally useful unless used
with an effect that specifies a finite time length
(such as \fBtrim\fR or \fBsynth\fR).
.SP
Using a null file to output audio amounts to discarding the audio
and is useful mainly with effects that produce information about the
audio instead of affecting it (such as \fBnoiseprof\fR or \fBstat\fR).
.SP
The sampling rate associated with a null file
is by default 48\ kHz, but, as with a normal
file, this can be overridden if desired using command-line format
options (see below).
.SS Supported File & Audio Device Types
See
.BR soxformat (7)
for a list and description of the supported file formats and audio device
drivers.
.SH OPTIONS
.SS Global Options
These options can be specified on the command line at any point
before the first effect name.
.SP
The
.B SOX_OPTS
environment variable can be used to provide alternative default values for
SoX's global options.
For example:
.EX
SOX_OPTS="\-\-buffer 20000 \-\-play\-rate\-arg \-hs \-\-temp /mnt/temp"
.EE
Note that setting SOX_OPTS can potentially create unwanted changes in
the behaviour of scripts or other programs that invoke SoX. SOX_OPTS
might best be used for things (such as in the given example) that reflect the
environment in which SoX is being run. Enabling options such as
.B \-\-no\-clobber
as default might be handled better using a shell alias
since a shell alias will not affect operation in scripts etc.
.SP
One way to ensure that a script cannot be affected by SOX_OPTS is to
clear SOX_OPTS at the start of the script, but this of course loses
the benefit of SOX_OPTS carrying some system-wide default options. An
alternative approach is to explicitly invoke SoX with default
option values, e.g.
.EX
SOX_OPTS="\-V \-\-no-clobber"
...
sox \-V2 \-\-clobber $input $output ...
.EE
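The first approach, clearing SOX_OPTS, can also be limited to a single
command so that the rest of the script keeps the system-wide value. The
POSIX shell sketch below shows one way to do this; the helper
`without_sox_opts' is hypothetical, and \fBsh \-c\fR merely stands in for
a command that would invoke SoX.

```shell
# Simulate a system-wide default, as a login profile might set it.
SOX_OPTS="-V --no-clobber"
export SOX_OPTS

# Hypothetical helper: run a command with SOX_OPTS removed from its
# environment. The subshell keeps the caller's value untouched.
without_sox_opts() {
    ( unset SOX_OPTS; "$@" )
}

without_sox_opts sh -c 'echo "inside: ${SOX_OPTS:-<unset>}"'   # inside: <unset>
echo "outside: $SOX_OPTS"                                      # outside: -V --no-clobber
```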
Note that the way to set environment variables varies from system
to system. Here are some examples:
.SP
Unix bash:
.EX
export SOX_OPTS="\-V \-\-no-clobber"
.EE
Unix csh:
.EX
setenv SOX_OPTS "\-V \-\-no-clobber"
.EE
MS-DOS/MS-Windows:
.EX
set SOX_OPTS=\-V \-\-no-clobber
.EE
MS-Windows GUI: via Control Panel : System : Advanced : Environment
Variables
.SP
Mac OS X GUI: Refer to Apple's Technical Q&A QA1067 document.
.TP
\fB\-\-buffer\fR \fBBYTES\fR, \fB\-\-input\-buffer\fR \fBBYTES\fR
Set the size in bytes of the buffers used for processing audio (default 8192).
.B \-\-buffer
applies to input, effects, and output processing;
.B \-\-input\-buffer
applies only to input processing (for which it overrides
.B \-\-buffer
if both are given).
.SP
Be aware that large values for
|
|
.B \-\-buffer
|
|
will cause SoX to become slow to respond to requests to terminate or to skip
|
|
the current input file.
|
|
.TP
|
|
\fB\-\-clobber\fR
|
|
Don't prompt before overwriting an existing file with the same name as that
|
|
given for the output file. This is the default behaviour.
|
|
.TP
|
|
\fB\-\-combine concatenate\fR\^|\^\fBmerge\fR\^|\^\fBmix\fR\^|\^\fBmix\-power\fR\^|\^\fBmultiply\fR\^|\^\fBsequence\fR
|
|
Select the input file combining method;
|
|
for some of these, short options are available:
|
|
.B \-m
|
|
selects `mix',
|
|
.B \-M
|
|
selects `merge', and
|
|
.B \-T
|
|
selects `multiply'.
|
|
.SP
|
|
See \fBInput File Combining\fR above for a description of the different
|
|
combining methods.
|
|
.TP
|
|
\fB\-D\fR, \fB\-\-no\-dither\fR
|
|
Disable automatic dither\*msee `Dithering' above. An example of why this
|
|
might occasionally be useful: a file has been converted from 16 to 24 bit
with the intention of processing it, but no processing turned out to be
needed and the original 16-bit file has been lost; strictly speaking, no
dither is then needed when converting the file back to 16 bit. See also the
|
|
.B stats
|
|
effect for how to determine the actual bit depth of the audio within a
|
|
file.
|
|
.TP
|
|
\fB\-\-effects\-file \fIFILENAME\fR
|
|
Use FILENAME to obtain all effects and their arguments.
|
|
The file is parsed as if the values were specified on the
|
|
command line. A new line can be used in place of the special \fB:\fR
|
|
marker to separate effect chains. For convenience, such markers at the
|
|
end of the file are normally ignored; if you want to specify an empty
|
|
last effects chain, use an explicit \fB:\fR by itself on the last line
|
|
of the file. This option causes any effects specified on the command
|
|
line to be discarded.
|
|
.TP
|
|
\fB\-G\fR, \fB\-\-guard\fR
|
|
Automatically invoke the
|
|
.B gain
|
|
effect to guard against clipping. E.g.
|
|
.EX
|
|
sox \-G infile \-b 16 outfile rate 44100 dither \-s
|
|
.EE
|
|
is shorthand for
|
|
.EX
|
|
sox infile \-b 16 outfile gain \-h rate 44100 gain \-rh dither \-s
|
|
.EE
|
|
See also
|
|
.BR \-V,
|
|
.BR \-\-norm,
|
|
and the
|
|
.B gain
|
|
effect.
|
|
.TP
|
|
\fB\-h\fR, \fB\-\-help\fR
|
|
Show version number and usage information.
|
|
.TP
|
|
\fB\-\-help\-effect \fINAME\fR
|
|
Show usage information on the specified effect. The name
|
|
\fBall\fR can be used to show usage on all effects.
|
|
.TP
|
|
\fB\-\-help\-format \fINAME\fR
|
|
Show information about the specified file format. The name
|
|
\fBall\fR can be used to show information on all formats.
|
|
.TP
|
|
\fB\-\-i\fR, \fB\-\-info\fR
|
|
Only if given as the first parameter to
|
|
.BR sox ,
|
|
behave as
|
|
.BR soxi (1).
|
|
.TP
|
|
\fB\-m\fR\^|\^\fB\-M\fR
|
|
Equivalent to \fB\-\-combine mix\fR and \fB\-\-combine merge\fR, respectively.
|
|
.TP
|
|
.B \-\-magic
|
|
If SoX has been built with the optional `libmagic' library then this
|
|
option can be given to enable its use in helping to detect audio file types.
|
|
.TP
|
|
\fB\-\-multi\-threaded\fR | \fB\-\-single\-threaded\fR
|
|
By default, SoX is `single threaded'.
|
|
If the \fB\-\-multi\-threaded\fR option is given, however, SoX
|
|
will process audio channels for most multi-channel
|
|
effects in parallel on hyper-threading/multi-core architectures. This
|
|
may reduce processing time, though sometimes it may be necessary to use
|
|
this option in conjunction with a larger buffer size than is the default
|
|
to gain any benefit from multi-threaded processing
|
|
(e.g. 131072; see \fB\-\-buffer\fR above).
|
|
.TP
|
|
\fB\-\-no\-clobber\fR
|
|
Prompt before overwriting an existing file with the same name as that
|
|
given for the output file.
|
|
.SP
|
|
.B N.B.
|
|
Unintentionally overwriting a file is easier than you might think; for
|
|
example, if you accidentally enter
|
|
.EX
|
|
sox file1 file2 effect1 effect2 ...
|
|
.EE
|
|
when what you really meant was
|
|
.EX
|
|
play file1 file2 effect1 effect2 ...
|
|
.EE
|
|
then, without this option, file2 will be overwritten. Hence, using
|
|
this option is recommended. SOX_OPTS (above), a `shell'
|
|
alias, script, or batch file may be an appropriate way of permanently
|
|
enabling it.
|
|
.TP
|
|
\fB\-\-norm\fR[\fB=\fIdB-level\fR]
|
|
Automatically invoke the
|
|
.B gain
|
|
effect to guard against clipping and to normalise the audio. E.g.
|
|
.EX
|
|
sox \-\-norm infile \-b 16 outfile rate 44100 dither \-s
|
|
.EE
|
|
is shorthand for
|
|
.EX
|
|
sox infile \-b 16 outfile gain \-h rate 44100 gain \-nh dither \-s
|
|
.EE
|
|
Optionally, the audio can be normalized to a given level (usually)
|
|
below 0 dBFS:
|
|
.EX
|
|
sox \-\-norm=\-3 infile outfile
|
|
.EE
|
|
.SP
|
|
See also
|
|
.BR \-V,
|
|
.BR \-G,
|
|
and the
|
|
.B gain
|
|
effect.
|
|
.TP
|
|
\fB\-\-play\-rate\-arg ARG\fR
|
|
Selects a quality option to be used when the `rate' effect is automatically
|
|
invoked whilst playing audio. This option is typically set via the
|
|
.B SOX_OPTS
|
|
environment variable (see above).
|
|
.TP
|
|
\fB\-\-plot gnuplot\fR\^|\^\fBoctave\fR\^|\^\fBoff\fR
|
|
If not set to
|
|
.B off
|
|
(the default if
|
|
.B \-\-plot
|
|
is not given), run in a mode that can be used, in conjunction with the
|
|
gnuplot program or the GNU Octave program, to assist with the selection
|
|
and configuration of many of the transfer-function based effects.
|
|
For the first given effect that supports the selected plotting program,
|
|
SoX will output commands to plot the effect's transfer function, and
|
|
then exit without actually processing any audio. E.g.
|
|
.EX
|
|
sox \-\-plot octave input-file \-n highpass 1320 > highpass.plt
|
|
octave highpass.plt
|
|
.EE
|
|
.TP
|
|
\fB\-q\fR, \fB\-\-no\-show\-progress\fR
|
|
Run in quiet mode when SoX wouldn't otherwise do so.
|
|
This is the opposite of the \fB\-S\fR option.
|
|
.TP
|
|
\fB\-R\fR
|
|
Run in `repeatable' mode. When this option is given, where
|
|
applicable, SoX will embed a fixed time-stamp in the output file (e.g.
|
|
\fBAIFF\fR) and will `seed' pseudo random number generators (e.g.
|
|
\fBdither\fR) with a fixed number, thus ensuring that successive SoX
|
|
invocations with the same inputs and the same parameters yield the
|
|
same output.
|
|
.TP
|
|
\fB\-\-replay\-gain track\fR\^|\^\fBalbum\fR\^|\^\fBoff\fR
|
|
Select whether or not to apply replay-gain adjustment to input files.
|
|
The default is
|
|
.B off
|
|
for
|
|
.B sox
|
|
and
|
|
.BR rec ,
|
|
.B album
|
|
for
|
|
.B play
|
|
where (at least) the first two input files are tagged with the same Artist and
|
|
Album names, and
|
|
.B track
|
|
for
|
|
.B play
|
|
otherwise.
|
|
.TP
|
|
\fB\-S\fR, \fB\-\-show\-progress\fR
|
|
Display input file format/header information, and processing progress as
|
|
input file(s) percentage complete, elapsed time, and remaining time (if
|
|
known; shown in brackets), and the number of samples written to the
|
|
output file. Also shown is a peak-level meter, and an indication if
|
|
clipping has occurred. The peak-level meter shows up to two channels
|
|
and is calibrated for digital audio as follows (right channel shown):
|
|
.ne 8
|
|
.TS
|
|
center;
|
|
cI lI cI lI
|
|
c l c l.
|
|
dB FSD Display dB FSD Display
|
|
\-25 \- \-11 ====
|
|
\-23 T{
|
|
=
|
|
T} \-9 ====\-
|
|
\-21 =\- \-7 =====
|
|
\-19 == \-5 =====\-
|
|
\-17 ==\- \-3 ======
|
|
\-15 === \-1 =====!
|
|
\-13 ===\-
|
|
.TE
|
|
.DT
|
|
.SP
|
|
A three-second peak-held value of headroom in dBs will be shown to the right
|
|
of the meter if this is below 6dB.
|
|
.SP
|
|
This option is enabled by default when using
|
|
SoX to play or record audio.
|
|
.TP
|
|
\fB\-T\fR
|
|
Equivalent to \fB\-\-combine multiply\fR.
|
|
.TP
|
|
\fB\-\-temp\fI DIRECTORY\fR
|
|
Specify that any temporary files should be created in the given
|
|
.IR DIRECTORY .
|
|
This can be useful if there are permission or free-space problems with the
|
|
default location. In this case, using `\fB\-\-temp .\fR' (to use the
|
|
current directory) is often a good solution.
|
|
.TP
|
|
\fB\-\-version\fR
|
|
Show SoX's version number and exit.
|
|
.IP \fB\-V\fR[\fIlevel\fR]
|
|
Set verbosity. This is particularly useful for seeing how any automatic
|
|
effects have been invoked by SoX.
|
|
.SP
|
|
SoX displays messages on the console (stderr) according to the following
|
|
verbosity levels:
|
|
.IP
|
|
.RS
|
|
.IP 0
|
|
No messages are shown at all; use the exit status to determine
|
|
if an error has occurred.
|
|
.IP 1
|
|
Only error messages are shown. These are generated if
|
|
SoX cannot complete the requested commands.
|
|
.IP 2
|
|
Warning messages are also shown. These are generated if
|
|
SoX can complete the requested commands,
|
|
but not exactly according to the requested command parameters,
|
|
or if clipping occurs.
|
|
.IP 3
|
|
Descriptions of
|
|
SoX's processing phases are also shown.
|
|
Useful for seeing exactly how
|
|
SoX is processing your audio.
|
|
.IP "4 and above"
|
|
Messages to help with debugging
|
|
SoX are also shown.
|
|
.RE
|
|
.IP
|
|
By default, the verbosity level is set to 2 (shows errors and
|
|
warnings). Each occurrence of the \fB\-V\fR option increases the
|
|
verbosity level by 1. Alternatively, the verbosity level can be set
|
|
to an absolute number by specifying it immediately after the
|
|
.BR \-V ,
|
|
e.g.
|
|
.B \-V0
|
|
sets it to 0.
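.SP
The counting rule above can be modelled with a small shell function. This
is an illustrative sketch only: `verbosity_level' is a hypothetical helper,
not part of SoX, and it assumes that a bare \fB\-V\fR given after a numbered
form simply adds 1 to the number just set.
.EX
```shell
# verbosity_level: hypothetical helper, not part of SoX.
# Prints the level implied by the given arguments (default 2).
verbosity_level() {
    level=2
    for arg in "$@"; do
        case $arg in
            \-V)  level=$(( level + 1 )) ;;   # bare option: one step up
            \-V*) level=${arg#\-V} ;;         # numbered form: absolute
        esac
    done
    echo "$level"
}
```
.EE
Under these assumptions, giving \fB\-V\fR twice yields level 4, and
\fB\-V0\fR yields level 0.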
|
|
.IP
|
|
.SS Input File Options
|
|
These options apply only to input files and may precede only input
|
|
filenames on the command line.
|
|
.TP
|
|
\fB\-\-ignore\-length\fR
|
|
Override an (incorrect) audio length given in an audio file's header. If
|
|
this option is given then SoX will keep reading audio until it reaches
|
|
the end of the input file.
|
|
.TP
|
|
\fB\-v\fR, \fB\-\-volume\fR \fIFACTOR\fR
|
|
Intended for use when combining multiple input files, this option
|
|
adjusts the volume of the file that follows it on the command line by a
|
|
factor of \fIFACTOR\fR. This allows it to be `balanced' w.r.t. the other
|
|
input files. This is a linear (amplitude) adjustment, so a number less
|
|
than 1 decreases the volume and a number greater than 1 increases it. If a
|
|
negative number is given then in addition to the volume adjustment,
|
|
the audio signal will be inverted.
|
|
.SP
|
|
See also the
|
|
.BR norm ,
|
|
.BR vol ,
|
|
and
|
|
.B gain
|
|
effects, and see \fBInput File Balancing\fR above.
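.SP
Because \fIFACTOR\fR is a linear amplitude multiplier, it may help to see
what a given value means in decibels (20 times the base-10 logarithm of
its magnitude). The following is a rough sketch only: `factor_to_db' is a
hypothetical helper, not part of SoX, and it assumes a POSIX
.BR awk (1)
is available. A zero factor is not handled.
.EX
```shell
# factor_to_db: hypothetical helper, not part of SoX.
# Prints the level change in dB for a linear amplitude factor.
# Squaring the factor discards its sign (a negative factor only
# inverts the signal), and 10*log10(f*f) equals 20*log10(|f|).
factor_to_db() {
    awk \-v f="$1" 'BEGIN { printf "%.2f", 10 * log(f * f) / log(10) }'
}
```
.EE
With this helper, a factor of 0\*d5 corresponds to about \-6\*d02 dB and a
factor of 2 (or \-2) to about +6\*d02 dB.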
|
|
.SS Input & Output File Format Options
|
|
These options apply to the input or output file whose name they
|
|
immediately precede on the command line and are used mainly when
|
|
working with headerless file formats or when specifying a format
|
|
for the output file that is different to that of the input file.
|
|
.TP
|
|
\fB\-b\fR \fIBITS\fR, \fB\-\-bits\fR \fIBITS\fR
|
|
The number of bits (a.k.a. bit-depth or sometimes word-length) in each
|
|
encoded sample. Not applicable to complex encodings such as MP3 or GSM.
|
|
Not necessary with encodings that have a fixed number of bits, e.g.
|
|
A/\(*m-law, ADPCM.
|
|
.SP
|
|
For an input file, the most common use for this option is to inform
|
|
SoX of the number of bits per sample in a `raw' (`headerless') audio
|
|
file. For example
|
|
.EX
|
|
sox \-r 16k \-e signed \-b 8 input.raw output.wav
|
|
.EE
|
|
converts a particular `raw' file to a self-describing `WAV' file.
|
|
.SP
|
|
For an output file, this option can be used (perhaps along with
|
|
.BR \-e )
|
|
to set the output encoding size. By default (i.e. if this option is
|
|
not given), the output encoding size will (providing it is supported
|
|
by the output file type) be set to the input encoding size. For
|
|
example
|
|
.EX
|
|
sox input.cdda \-b 24 output.wav
|
|
.EE
|
|
converts raw CD digital audio (16-bit, signed-integer) to a
|
|
24-bit (signed-integer) `WAV' file.
|
|
.TP
|
|
\fB\-c\fR \fICHANNELS\fR, \fB\-\-channels\fR \fICHANNELS\fR
|
|
The number of audio channels in the audio file. This can be any number
|
|
greater than zero.
|
|
.SP
|
|
For an input file, the most common use for this option is to inform
|
|
SoX of the number of channels in a `raw' (`headerless') audio file.
|
|
Occasionally, it may be useful to use this option with a `headered'
|
|
file, in order to override the (presumably incorrect) value in the
|
|
header\*mnote that this is only supported with certain file types.
|
|
Examples:
|
|
.EX
|
|
sox \-r 48k \-e float \-b 32 \-c 2 input.raw output.wav
|
|
.EE
|
|
converts a particular `raw' file to a self-describing `WAV' file.
|
|
.EX
|
|
play \-c 1 music.wav
|
|
.EE
|
|
interprets the file data as belonging to a single channel regardless
|
|
of what is indicated in the file header. Note that if the file does
|
|
in fact have two channels, this will result in the file playing at
|
|
half speed.
|
|
.SP
|
|
For an output file, this option provides a shorthand for specifying
|
|
that the
|
|
.B channels
|
|
effect should be invoked in order to change (if necessary) the number
|
|
of channels in the audio signal to the number given. For
|
|
example, the following two commands are equivalent:
|
|
.EX
|
|
.ne 2
|
|
sox input.wav \-c 1 output.wav bass \-b 24
|
|
sox input.wav output.wav bass \-b 24 channels 1
|
|
.EE
|
|
though the second form is more flexible as it allows the effects to
|
|
be ordered arbitrarily.
|
|
.TP
|
|
\fB\-e \fIENCODING\fR, \fB\-\-encoding\fR \fIENCODING\fR
|
|
The audio encoding type. Sometimes needed with file-types that
|
|
support more than one encoding type. For example, with raw, WAV, or
|
|
AU (but not, for example, with MP3 or FLAC).
|
|
The available encoding types are as follows:
|
|
.RS
|
|
.IP \fBsigned-integer\fR
|
|
PCM data stored as signed (`two's complement') integers. Commonly used
|
|
with a 16- or 24-bit encoding size.
|
|
A value of 0 represents minimum signal power.
|
|
.IP \fBunsigned-integer\fR
|
|
PCM data stored as unsigned integers. Commonly used
|
|
with an 8-bit encoding size. A value of 0 represents maximum signal
|
|
power.
|
|
.IP \fBfloating-point\fR
|
|
PCM data stored as IEEE 754 single precision (32-bit) or double
|
|
precision (64-bit) floating-point (`real') numbers.
|
|
A value of 0 represents minimum signal power.
|
|
.IP \fBa-law\fR
|
|
International telephony standard for logarithmic encoding to 8 bits per
|
|
sample. It has a precision equivalent to roughly 13-bit PCM and is
|
|
sometimes encoded with reversed bit-ordering (see the
|
|
.B \-X
|
|
option).
|
|
.IP \fBu-law,\ mu-law\fR
|
|
North American telephony standard for logarithmic encoding to 8 bits per
|
|
sample. A.k.a. \(*m-law. It has a precision equivalent to roughly
|
|
14-bit PCM and is
|
|
sometimes encoded with reversed bit-ordering (see the
|
|
.B \-X
|
|
option).
|
|
.IP \fBoki-adpcm\fR
|
|
OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM;
|
|
it has a precision equivalent to roughly 12-bit PCM.
|
|
ADPCM is a form of audio compression that has a good
|
|
compromise between audio quality and encoding/decoding speed.
|
|
.IP \fBima-adpcm\fR
|
|
IMA (a.k.a. DVI) 4-bit ADPCM;
|
|
it has a precision equivalent to roughly 13-bit PCM.
|
|
.IP \fBms-adpcm\fR
|
|
Microsoft 4-bit ADPCM; it has a precision equivalent to roughly 14-bit
|
|
PCM.
|
|
.IP \fBgsm-full-rate\fR
|
|
GSM is currently used for the vast majority of the world's digital
|
|
wireless telephone calls. It utilises several audio
|
|
formats with different bit-rates and associated speech quality.
|
|
SoX has support for GSM's original 13kbps `Full Rate' audio format.
|
|
It is usually CPU-intensive to work with GSM audio.
|
|
.RE
|
|
.TP
|
|
\
|
|
Encoding names can be abbreviated where this would not be ambiguous;
|
|
e.g. `unsigned-integer' can be given as `un', but not `u' (ambiguous
|
|
with `u-law').
|
|
.SP
|
|
For an input file, the most common use for this option is to inform
|
|
SoX of the encoding of a `raw' (`headerless') audio
|
|
file (see the examples in
|
|
.B \-b
|
|
and
|
|
.B \-c
|
|
above).
|
|
.SP
|
|
For an output file, this option can be used (perhaps along with
|
|
.BR \-b )
|
|
to set the output encoding type. For example
|
|
.EX
|
|
sox input.cdda \-e float output1.wav
|
|
|
|
sox input.cdda \-b 64 \-e float output2.wav
|
|
.EE
|
|
convert raw CD digital audio (16-bit, signed-integer) to
|
|
floating-point `WAV' files (single & double precision respectively).
|
|
.SP
|
|
By default (i.e. if this option is not given), the output encoding
|
|
type will (providing it is supported by the output file type) be set
|
|
to the input encoding type.
|
|
.TP
|
|
\fB\-\-no\-glob\fR
|
|
Specifies that filename `globbing' (wild-card matching) should not be
|
|
performed by SoX on the following filename. For example, if the current
|
|
directory contains the two files `five-seconds.wav' and `five*.wav', then
|
|
.EX
|
|
play \-\-no\-glob "five*.wav"
|
|
.EE
|
|
can be used to play just the single file `five*.wav'.
|
|
.TP
|
|
\fB\-r\fR, \fB\-\-rate\fR \fIRATE\fR[\fBk\fR]
|
|
Gives the sample rate in Hz (or kHz if appended with `k') of the file.
|
|
.SP
|
|
For an input file, the most common use for this option is to inform
|
|
SoX of the sample rate of a `raw' (`headerless') audio file (see the
|
|
examples in
|
|
.B \-b
|
|
and
|
|
.B \-c
|
|
above).
|
|
Occasionally it may be useful to use this option with a `headered'
|
|
file, in order to override the (presumably incorrect) value in the
|
|
header\*mnote that this is only supported with certain file types.
|
|
For example, if audio was recorded with a sample-rate of say 48k from
|
|
a source that played back a little, say 1\*d5%, too slowly, then
|
|
.EX
|
|
sox \-r 48720 input.wav output.wav
|
|
.EE
|
|
effectively corrects the speed by changing only the file header (but see
|
|
also the
|
|
.B speed
|
|
effect for the more usual solution to this problem).
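.SP
The value 48720 used above is the nominal rate scaled up by the speed
error: 48000 * 1\*d015 = 48720. The same arithmetic can be sketched as a
shell function; `corrected_rate' is a hypothetical helper, not part of SoX,
and it assumes a POSIX
.BR awk (1)
is available.
.EX
```shell
# corrected_rate: hypothetical helper, not part of SoX.
# Usage: corrected_rate NOMINAL_HZ SLOW_BY_PERCENT
# Prints the header rate, rounded to the nearest whole Hz.
corrected_rate() {
    awk \-v rate="$1" \-v pct="$2" \
        'BEGIN { printf "%d", rate * (100 + pct) / 100 + 0.5 }'
}
```
.EE
Here `corrected_rate 48000 1.5' prints 48720, the value passed to
\fB\-r\fR in the example.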
|
|
.SP
|
|
For an output file, this option provides a shorthand for specifying
|
|
that the
|
|
.B rate
|
|
effect should be invoked in order to change (if necessary) the sample
|
|
rate of the audio signal to the given value. For example, the
|
|
following two commands are equivalent:
|
|
.EX
|
|
.ne 2
|
|
sox input.wav \-r 48k output.wav bass \-b 24
|
|
sox input.wav output.wav bass \-b 24 rate 48k
|
|
.EE
|
|
though the second form is more flexible as it allows
|
|
.B rate
|
|
options to be given, and allows the effects to be ordered arbitrarily.
|
|
.TP
|
|
\fB\-t\fR, \fB\-\-type\fR \fIFILE-TYPE\fR
|
|
Gives the type of the audio file. For both input and output files,
|
|
this option is commonly used to inform SoX of the type of a `headerless'
|
|
audio file (e.g. raw, mp3) where the actual/desired type cannot be
|
|
determined from a given filename extension. For example:
|
|
.EX
|
|
another-command | sox \-t mp3 \- output.wav
|
|
|
|
sox input.wav \-t raw output.bin
|
|
.EE
|
|
It can also be used to override the type implied by an input filename
|
|
extension, but if overriding with a type that has a header, SoX will
|
|
exit with an appropriate error message if such a header is not
|
|
actually present.
|
|
.SP
|
|
See
|
|
.BR soxformat (7)
|
|
for a list of supported file types.
|
|
.PP
|
|
\fB\-L\fR, \fB\-\-endian little\fR
|
|
.br
|
|
\fB\-B\fR, \fB\-\-endian big\fR
|
|
.br
|
|
\fB\-x\fR, \fB\-\-endian swap\fR
|
|
.if t .sp -.5
|
|
.if n .sp -1
|
|
.TP
|
|
\
|
|
These options specify whether the byte-order of the audio data is,
|
|
respectively, `little endian', `big endian', or the opposite to that of
|
|
the system on which SoX is being used. Endianness applies only to data
|
|
encoded as floating-point, or as signed or unsigned integers of 16 or
|
|
more bits. It is often necessary to specify one of these options for
|
|
headerless files, and sometimes necessary for (otherwise)
|
|
self-describing files. A given endian-setting option may be ignored
|
|
for an input file whose header contains a specific endianness
|
|
identifier, or for an output file that is actually an audio device.
|
|
.SP
|
|
.B N.B.
|
|
Unlike other format characteristics, the endianness (byte, nibble, &
|
|
bit ordering) of the input file is not automatically used for the output
|
|
file; so, for example, when the following is run on a little-endian system:
|
|
.EX
|
|
sox \-B audio.s16 trimmed.s16 trim 2
|
|
.EE
|
|
trimmed.s16 will be created as little-endian;
|
|
.EX
|
|
sox \-B audio.s16 \-B trimmed.s16 trim 2
|
|
.EE
|
|
must be used to preserve big-endianness in the output file.
|
|
.SP
|
|
The
|
|
.B \-V
|
|
option can be used to check the selected orderings.
|
|
.TP
|
|
\fB\-N\fR, \fB\-\-reverse\-nibbles\fR
|
|
Specifies that the nibble ordering (i.e. the 2 halves of a byte) of the samples should be reversed;
|
|
sometimes useful with ADPCM-based formats.
|
|
.SP
|
|
.B N.B.
|
|
See also N.B. in section on
|
|
.B \-x
|
|
above.
|
|
.TP
|
|
\fB\-X\fR, \fB\-\-reverse\-bits\fR
|
|
Specifies that the bit ordering of the samples should be reversed;
|
|
sometimes useful with a few (mostly headerless) formats.
|
|
.SP
|
|
.B N.B.
|
|
See also N.B. in section on
|
|
.B \-x
|
|
above.
|
|
.SS Output File Format Options
|
|
These options apply only to the output file and may precede only the output
|
|
filename on the command line.
|
|
.TP
|
|
\fB\-\-add\-comment \fITEXT\fR
|
|
Append a comment in the output file header (where applicable).
|
|
.TP
|
|
\fB\-\-comment \fITEXT\fR
|
|
Specify the comment text to store in the output file header (where
|
|
applicable).
|
|
.SP
|
|
SoX will provide a default comment if this option (or
|
|
.BR \-\-comment\-file )
|
|
is not given. To specify that no comment should be stored in the output file,
|
|
use
|
|
.B "\-\-comment \(dq\(dq" .
|
|
.TP
|
|
\fB\-\-comment\-file \fIFILENAME\fR
|
|
Specify a file containing the comment text to store in the output
|
|
file header (where applicable).
|
|
.TP
|
|
\fB\-C\fR, \fB\-\-compression\fR \fIFACTOR\fR
|
|
The compression factor for variably compressing output file formats. If
|
|
this option is not given then a default compression factor will apply.
|
|
The compression factor is interpreted differently for different
|
|
compressing file formats. See the description of the file formats that
|
|
use this option in
|
|
.BR soxformat (7)
|
|
for more information.
|
|
.SH EFFECTS
|
|
In addition to converting, playing and recording audio files, SoX can
|
|
be used to invoke a number of audio `effects'. Multiple effects may
|
|
be applied by specifying them one after another at the end of the SoX
|
|
command line, forming an `effects chain'.
|
|
Note that applying multiple effects in real-time (i.e. when playing audio)
|
|
is likely to require a high performance computer. Stopping other applications
|
|
may alleviate performance issues should they occur.
|
|
.SP
|
|
Some of the SoX effects are primarily intended to be applied to a single
|
|
instrument or `voice'. To facilitate this, the \fBremix\fR effect and
|
|
the global SoX option \fB\-M\fR can be used to isolate then recombine
|
|
tracks from a multi-track recording.
|
|
.SS Multiple Effects Chains
|
|
A single effects chain is made up of one or more effects. Audio from
|
|
the input runs through the chain until either the end of the input file
|
|
is reached or an effect in the chain requests to terminate the chain.
|
|
.SP
|
|
SoX supports running multiple effects chains over the input audio.
|
|
In this case, when one chain indicates it is done processing audio,
|
|
the audio data is then sent through the next effects chain. This
|
|
continues until either no more effects chains exist or the input has
|
|
reached the end of the file.
|
|
.SP
|
|
An effects chain is terminated by placing a
|
|
.B :
|
|
(colon) after an effect. Any following effects are a part of a new effects chain.
|
|
.SP
|
|
It is important to place the effect that will stop the chain
|
|
as the first effect in the chain. This is because any samples
|
|
that are buffered by effects to the left of the terminating effect
|
|
will be discarded. The number of samples discarded is related to the
|
|
.B \-\-buffer
|
|
option and it should be kept small, relative to the sample rate, if
|
|
the terminating effect cannot be first. Further information on
|
|
stopping effects can be found in the
|
|
.B Stopping SoX
|
|
section.
|
|
.SP
|
|
There are a few pseudo-effects that aid using multiple effects chains.
|
|
These include
|
|
.B newfile
|
|
which will start writing to a new output file before moving to the
|
|
next effects chain and
|
|
.B restart
|
|
which will move back to the first effects chain. Pseudo-effects
|
|
must be specified as the first effect in a chain and as the only
|
|
effect in a chain (they must have a
|
|
.B :
|
|
before and after they are specified).
|
|
.SP
|
|
The following is an example of multiple effects chains. It will split the
|
|
input file into multiple files of 30 seconds in length. Each output filename
|
|
will have a unique number in its name as documented in the
|
|
.B Output Files
|
|
section.
|
|
.EX
|
|
sox infile.wav output.wav trim 0 30 : newfile : restart
|
|
.EE
|
|
.SS Common Notation And Parameters
|
|
In the descriptions that follow,
|
|
brackets [ ] are used to denote parameters that are optional, braces
|
|
{ } to denote those that are both optional and repeatable,
|
|
and angle brackets < > to denote those that are repeatable but not
|
|
optional.
|
|
Where applicable, default values for optional parameters are shown in parentheses ( ).
|
|
.SP
|
|
The following parameters are used with, and have the same meaning for,
|
|
several effects:
|
|
.TP
|
|
\fIcenter\fR[\fBk\fR]
|
|
See
|
|
.IR frequency .
|
|
.TP
|
|
\fIfrequency\fR[\fBk\fR]
|
|
A frequency in Hz, or, if appended with `k', kHz.
|
|
.TP
|
|
\fIgain\fR
|
|
A power gain in dB.
|
|
Zero gives no gain; less than zero gives an attenuation.
|
|
.TP
|
|
\fIposition\fR
|
|
A position within the audio stream; the syntax is
|
|
[\fB=\fR\^|\^\fB+\fR\^|\^\fB\-\fR]\fItimespec\fR, where \fItimespec\fR is a
|
|
time specification (see below). The optional first character indicates
|
|
whether the \fItimespec\fR is to be interpreted relative to the start
|
|
(\fB=\fR) or end (\fB\-\fR) of audio, or to the previous \fIposition\fR if
|
|
the effect accepts multiple position arguments (\fB+\fR). The audio length
|
|
must be known for end-relative locations to work; some effects do accept
|
|
\fB\-0\fR for end-of-audio, though, even if the length is unknown. Which of
|
|
\fB=\fR, \fB+\fR, \fB\-\fR is the default depends on the effect and is shown
|
|
in its syntax as, e.g., \fIposition(+)\fR.
|
|
.SP
|
|
Examples: \fB=2:00\fR (two minutes into the audio stream), \fB\-100s\fR (one
|
|
hundred samples before the end of audio), \fB+0:12+10s\fR (twelve seconds
|
|
and ten samples after the previous position), \fB\-0.5+1s\fR (one sample less
|
|
than half a second before the end of audio).
|
|
.TP
|
|
\fIwidth\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
|
|
Used to specify the band-width of a filter. A number of different
|
|
methods to specify the width are available (though not all for every effect).
|
|
One of the characters shown may be appended to select the desired method
|
|
as follows:
|
|
.ne 5
|
|
.TS
|
|
center;
|
|
cI cI lI
|
|
cB c l.
|
|
\ Method Notes
|
|
h Hz \
|
|
k kHz \
|
|
o Octaves \
|
|
q Q-factor See [2]
|
|
.TE
|
|
.DT
|
|
.SP
|
|
For each effect that uses this parameter, the default method (i.e. if no
|
|
character is appended) is the one that is listed first in the first line of
|
|
the effect's description.
|
|
.PP
|
|
Most effects that expect an audio position or duration in a parameter,
|
|
i.e. a \fBtime specification\fR, accept either of the following two forms:
|
|
.TP
|
|
[[\fIhours\fB:\fR]\fIminutes\fB:\fR]\fIseconds\fR[\fB.\fIfrac\fR][\fBt\fR]
|
|
A specification of `1:30\*d5' corresponds to one minute, thirty and
|
|
\(12 seconds. The \fBt\fR suffix is entirely optional (however, see the
|
|
\fBsilence\fR effect for an exception).
|
|
Note that the component values do not have to be normalized; e.g.,
|
|
`1:23:45', `83:45', `79:0285', `1:0:1425', `1::1425' and `5025' all are
|
|
legal and equivalent to each other.
|
|
.TP
|
|
\fIsamples\fBs\fR
|
|
Specifies the number of samples directly, as in `8000s'. For large sample
|
|
counts, \fIe notation\fR is supported: `1.7e6s' is the same as `1700000s'.
|
|
.PP
|
|
Time specifications can also be chained with \fB+\fR or \fB\-\fR into a new
|
|
time specification where the right part is added to or subtracted from the
|
|
left, respectively: `3:00\-200s' means two hundred samples less than three
|
|
minutes.
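.SP
To illustrate why un-normalized component values are equivalent, the first
(clock) form of time specification described above can be reduced to
seconds by treating each colon as `multiply by 60 and add'. The following
is a sketch only: `clock_to_seconds' is a hypothetical helper, not part of
SoX; it assumes a POSIX
.BR awk (1)
and does not handle the \fIsamples\fBs\fR form or chained specifications.
.EX
```shell
# clock_to_seconds: hypothetical helper, not part of SoX.
# Converts [[hours:]minutes:]seconds[.frac][t] to seconds.
# Empty components (as in 1::1425) count as zero.
clock_to_seconds() {
    echo "${1%t}" |
        awk \-F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}
```
.EE
With this helper, `1:23:45', `83:45', `79:0285', `1:0:1425', `1::1425' and
`5025' all give 5025, and `1:30\*d5' gives 90\*d5.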
|
|
.SP
|
|
To see if SoX has support for an optional effect, enter
|
|
.B sox \-h
|
|
and look for its name under the list: `EFFECTS'.
|
|
.SS Supported Effects
|
|
Note: a categorised list of the effects can be found in the
|
|
accompanying `README' file.
|
|
.TP
|
|
\fBallpass\fR \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
|
|
Apply a two-pole all-pass filter with central frequency (in Hz)
|
|
\fIfrequency\fR, and filter-width \fIwidth\fR.
|
|
An all-pass filter changes the
|
|
audio's frequency to phase relationship without changing its frequency
|
|
to amplitude relationship. The filter is described in detail in [1].
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.TP
|
|
\fBband\fR [\fB\-n\fR] \fIcenter\fR[\fBk\fR]\fR [\fIwidth\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]
|
|
Apply a band-pass filter.
|
|
The frequency response drops logarithmically
|
|
around the
|
|
.I center
|
|
frequency.
|
|
The
|
|
.I width
|
|
parameter gives the slope of the drop.
|
|
The frequencies at
|
|
.I center
|
|
+
|
|
.I width
|
|
and
|
|
.I center
|
|
\-
|
|
.I width
|
|
will be half of their original amplitudes.
|
|
.B band
|
|
defaults to a mode oriented to pitched audio,
|
|
i.e. voice, singing, or instrumental music.
|
|
The \fB\-n\fR (for noise) option uses the alternate mode
|
|
for un-pitched audio (e.g. percussion).
|
|
.B Warning:
|
|
\fB\-n\fR introduces a power-gain of about 11dB in the filter, so beware
|
|
of output clipping.
|
|
.B band
|
|
introduces noise in the shape of the filter,
|
|
i.e. peaking at the
|
|
.I center
|
|
frequency and settling around it.
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.SP
|
|
See also \fBsinc\fR for a bandpass filter with steeper shoulders.
|
|
.TP
|
|
\fBbandpass\fR\^|\^\fBbandreject\fR [\fB\-c\fR] \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
|
|
Apply a two-pole Butterworth band-pass or band-reject filter with
|
|
central frequency \fIfrequency\fR, and (3dB-point) band-width
|
|
\fIwidth\fR. The
|
|
.B \-c
|
|
option applies only to
|
|
.B bandpass
|
|
and selects a constant skirt gain (peak gain = Q) instead of the
|
|
default: constant 0dB peak gain.
|
|
The filters roll off at 6dB per octave (20dB per decade)
|
|
and are described in detail in [1].
|
|
.SP
|
|
These effects support the \fB\-\-plot\fR global option.
|
|
.SP
|
|
See also \fBsinc\fR for a bandpass filter with steeper shoulders.
|
|
.TP
|
|
\fBbandreject \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
|
|
Apply a band-reject filter.
|
|
See the description of the \fBbandpass\fR effect for details.
|
|
.TP
|
|
\fBbass\fR\^|\^\fBtreble \fIgain\fR [\fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBs\fR\^|\^\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]]
|
|
Boost or cut the bass (lower) or treble (upper) frequencies of the audio
|
|
using a two-pole shelving filter with a response similar to that
|
|
of a standard hi-fi's tone-controls. This is also
|
|
known as shelving equalisation (EQ).
|
|
.SP
|
|
\fIgain\fR gives the gain at 0\ Hz (for \fBbass\fR), or whichever is
|
|
the lower of \(ap22\ kHz and the Nyquist frequency (for \fBtreble\fR). Its
|
|
useful range is about \-20 (for a large cut) to +20 (for a large
|
|
boost).
|
|
Beware of
|
|
.B Clipping
|
|
when using a positive \fIgain\fR.
|
|
.SP
|
|
If desired, the filter can be fine-tuned using the following
|
|
optional parameters:
|
|
.SP
|
|
\fIfrequency\fR sets the filter's central frequency and so can be
|
|
used to extend or reduce the frequency range to be boosted or
|
|
cut. The default value is 100\ Hz (for \fBbass\fR) or 3\ kHz (for
|
|
\fBtreble\fR).
|
|
.SP
|
|
\fIwidth\fR
|
|
determines how steep the filter's shelf transition is. In addition to the common
|
|
width specification methods described above,
|
|
`slope' (the default, or if appended with `\fBs\fR') may be used.
|
|
The useful range of `slope' is
|
|
about 0\*d3, for a gentle slope, to 1 (the maximum), for a steep slope; the
|
|
default value is 0\*d5.
|
|
.SP
|
|
The filters are described in detail in [1].
|
|
.SP
|
|
These effects support the \fB\-\-plot\fR global option.
|
|
.SP
|
|
See also \fBequalizer\fR for a peaking equalisation effect.
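.SP
As a minimal sketch (the file names and values are illustrative only),
the following boosts the bass by 3dB using the defaults and cuts the
treble by 2dB with the shelf centred on 4kHz and a slope of 0\*d7:
.EX
sox input.wav output.wav bass +3 treble \-2 4k 0.7s
.EE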
|
|
.TP
|
|
\fBbend\fR [\fB\-f \fIframe-rate\fR(25)] [\fB\-o \fIover-sample\fR(16)] { \fIstart-position(+)\fB,\fIcents\fB,\fIend-position(+)\fR }
|
|
Changes pitch by specified amounts at specified times.
|
|
Each given triple: \fIstart-position\fB,\fIcents\fB,\fIend-position\fR
|
|
specifies one bend.
|
|
\fIcents\fR is the number of cents (100 cents = 1 semitone) by which to
|
|
bend the pitch. The other values specify the points in time at which to start
|
|
and end bending the pitch, respectively.
|
|
.SP
|
|
The pitch-bending algorithm utilises the Discrete Fourier Transform (DFT)
|
|
at a particular frame rate and over-sampling rate.
|
|
The
|
|
.B \-f
|
|
and
|
|
.B \-o
|
|
parameters may be used to adjust these parameters and thus control the
|
|
smoothness of the changes in pitch.
|
|
.SP
|
|
For example, an initial tone is generated, then bent three times, yielding
|
|
four different notes in total:
|
|
.EX
|
|
.ne 2
|
|
play \-n synth 2.5 sin 667 gain 1 \\
|
|
bend .35,180,.25 .15,740,.53 0,\-520,.3
|
|
.EE
|
|
Here, the first bend runs from 0.35 to 0.6, and the second one from 0.75
|
|
to 1.28 seconds.
|
|
Note that the clipping that is produced in this example is deliberate;
|
|
to remove it, use
|
|
.B gain\ \-5
|
|
in place of
|
|
.BR gain\ 1 .
|
|
.SP
|
|
See also \fBpitch\fR.
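.SP
As a further sketch (the tone frequency and timings are arbitrary choices
for illustration), since 1200 cents is one octave, the following bends a
440Hz tone up to 880Hz between 1 and 2 seconds:
.EX
play \-n synth 3 sin 440 gain \-5 bend 1,1200,1
.EE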
|
|
.TP
|
|
\fBbiquad \fIb0 b1 b2 a0 a1 a2\fR
|
|
Apply a biquad IIR filter with the given coefficients, where b* and a* are
|
|
the numerator and denominator coefficients respectively.
|
|
.SP
|
|
See http://en.wikipedia.org/wiki/Digital_biquad_filter (where a0 = 1).
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
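.SP
As a minimal illustration (the coefficient values are chosen here only
for simplicity, assuming the usual difference equation with a0 = 1),
the first command passes the audio through unchanged and the second
averages each sample with its predecessor, giving a gentle low-pass:
.EX
sox infile outfile biquad 1 0 0 1 0 0
sox infile outfile biquad 0.5 0.5 0 1 0 0
.EE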
|
|
.TP
|
|
\fBchannels \fICHANNELS\fR
|
|
Invoke a simple algorithm to change the number of channels in
|
|
the audio signal to the given number
|
|
.IR CHANNELS :
|
|
mixing if decreasing the number of channels or duplicating if
|
|
increasing the number of channels.
|
|
.SP
|
|
The
|
|
.B channels
|
|
effect is invoked automatically if SoX's \fB\-c\fR option specifies a
|
|
number of channels that is different to that of the input file(s).
|
|
Alternatively, if this effect is given explicitly, then SoX's
|
|
.B \-c
|
|
option need not be given. For example, the following two commands are
|
|
equivalent:
|
|
.EX
|
|
.ne 2
|
|
sox input.wav \-c 1 output.wav bass \-b 24
|
|
sox input.wav output.wav bass \-b 24 channels 1
|
|
.EE
|
|
though the second form is more flexible as it allows the effects to
|
|
be ordered arbitrarily.
|
|
.SP
|
|
See also
|
|
.B remix
|
|
for an effect that allows channels to be mixed/selected arbitrarily.
|
|
.TP
|
|
\fBchorus \fIgain-in gain-out\fR <\fIdelay decay speed depth \fB\-s\fR\^|\^\fB\-t\fR>
|
|
Add a chorus effect to the audio. This can make a single vocal sound
|
|
like a chorus, but can also be applied to instrumentation.
|
|
.SP
|
|
Chorus resembles an echo effect with a short delay, but
|
|
whereas with echo the delay is constant, with chorus, it
|
|
is varied using sinusoidal or triangular modulation. The modulation
|
|
depth defines how far the modulated delay swings either side of the nominal
delay. Hence the delayed sound plays slightly slower or faster, that is, it is
detuned around the original, as in a chorus where some voices are
slightly off key.
|
|
See [3] for more discussion of the chorus effect.
|
|
.SP
|
|
Each four-tuple parameter
|
|
delay/decay/speed/depth gives the delay in milliseconds
|
|
and the decay (relative to gain-in) with a modulation
|
|
speed in Hz using depth in milliseconds.
|
|
The modulation is either sinusoidal (\fB\-s\fR) or triangular
|
|
(\fB\-t\fR). Gain-out is the volume of the output.
|
|
.SP
|
|
A typical delay is around 40ms to 60ms; the modulation speed is best
|
|
near 0\*d25Hz and the modulation depth around 2ms.
|
|
For example, a single delay:
|
|
.EX
|
|
play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 \-t
|
|
.EE
|
|
Two delays of the original samples:
|
|
.EX
|
|
.ne 2
|
|
play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 \-t \\
|
|
60 0.32 0.4 1.3 \-s
|
|
.EE
|
|
A fuller sounding chorus (with three delays):
|
|
.EX
|
|
.ne 2
|
|
play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 \-t \\
|
|
60 0.32 0.4 2.3 \-t 40 0.3 0.3 1.3 \-s
|
|
.EE
|
|
.TP
|
|
\fBcompand \fIattack1\fB,\fIdecay1\fR{\fB,\fIattack2\fB,\fIdecay2\fR}
|
|
[\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
|
|
.br
|
|
[\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]
|
|
.SP
|
|
Compand (compress or expand) the dynamic range of the audio.
|
|
.SP
|
|
The
|
|
.I attack
|
|
and
|
|
.I decay
|
|
parameters (in seconds) determine the time over which the
|
|
instantaneous level of the input signal is averaged to determine its
|
|
volume; attacks refer to increases in volume and decays refer to
|
|
decreases.
|
|
For most situations, the attack time (response to the music getting
|
|
louder) should be shorter than the decay time because the human ear is more
|
|
sensitive to sudden loud music than sudden soft music.
|
|
Where more than one pair of attack/decay parameters is
|
|
specified, each input channel is companded separately and the number of
|
|
pairs must agree with the number of input channels.
|
|
Typical values are
|
|
.B 0\*d3,0\*d8
|
|
seconds.
|
|
.SP
|
|
The second parameter is a list of points on the compander's transfer
|
|
function specified in dB relative to the maximum possible signal
|
|
amplitude. The input values must be in a strictly increasing order but
|
|
the transfer function does not have to be monotonically rising. If
|
|
omitted, the value of
|
|
.I out-dB1
|
|
defaults to the same value as
|
|
.IR in-dB1 ;
|
|
levels below
|
|
.I in-dB1
|
|
are not companded (but may have gain applied to them).
|
|
The point \fB0,0\fR is assumed but may be overridden (by
|
|
\fB0,\fIout-dBn\fR).
|
|
If the list is preceded by a
|
|
.I soft-knee-dB
|
|
value, then the points at which adjacent line segments on the
|
|
transfer function meet will be rounded by the amount given.
|
|
Typical values for the transfer function are
|
|
.BR 6:\-70,\-60,\-20 .
|
|
.SP
|
|
The third (optional) parameter is an additional gain in dB to be applied
|
|
at all points on the transfer function and allows easy adjustment
|
|
of the overall gain.
|
|
.SP
|
|
The fourth (optional) parameter is an initial level to be assumed for
|
|
each channel when companding starts. This permits the user to supply a
|
|
nominal level initially, so that, for example, a very large gain is not
|
|
applied to initial signal levels before the companding action has begun
|
|
to operate: it is quite probable that in such an event, the output would
|
|
be severely clipped while the compander gain properly adjusts itself.
|
|
A typical value (for audio which is initially quiet) is
|
|
.B \-90
|
|
dB.
|
|
.SP
|
|
The fifth (optional) parameter is a delay in seconds. The input signal
|
|
is analysed immediately to control the compander, but it is delayed
|
|
before being fed to the volume adjuster. Specifying a delay
|
|
approximately equal to the attack/decay times allows the compander to
|
|
effectively operate in a `predictive' rather than a reactive mode.
|
|
A typical value is
|
|
.B 0\*d2
|
|
seconds.
|
|
.TS
|
|
center;
|
|
c8 c8 c.
|
|
* * *
|
|
.TE
|
|
.DT
|
|
.SP
|
|
The following example might be used to make a piece of music with both
|
|
quiet and loud passages suitable for listening to in a noisy environment
|
|
such as a moving vehicle:
|
|
.EX
|
|
sox asz.wav asz-car.wav compand 0.3,1 6:\-70,\-60,\-20 \-5 \-90 0.2
|
|
.EE
|
|
The transfer function (`6:\-70,...') says that very soft sounds (below
|
|
\-70dB) will remain unchanged. This will stop the compander from
|
|
boosting the volume on `silent' passages such as between movements.
|
|
However, sounds in the range \-60dB to 0dB (maximum
|
|
volume) will be boosted so that the 60dB dynamic range of the
|
|
original music will be compressed 3-to-1 into a 20dB range, which is
|
|
wide enough to enjoy the music but narrow enough to get around the
|
|
road noise. The `6:' selects 6dB soft-knee companding.
|
|
The \-5 (dB) output gain is needed to avoid clipping (the number is
|
|
inexact, and was derived by experimentation).
|
|
The \-90 (dB) for the initial volume will work fine for a clip that starts
|
|
with near silence, and the delay of 0\*d2 (seconds) has the effect of causing
|
|
the compander to react a bit more quickly to sudden volume changes.
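.SP
As a rough worked example of the transfer function above (ignoring the
soft knee and the attack/decay averaging, which this sketch assumes away),
the segment from the point \-60,\-20 to the assumed point 0,0 has a slope
of 20/60, so an input level of \-30dB maps as follows:
.EX
\-20 + (\-30 \- \-60) * 20/60 = \-10dB
\-10 + \-5 (gain)            = \-15dB
.EE
Inputs between \-70dB and \-60dB lie on the much steeper segment joining
\-70,\-70 to \-60,\-20.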
|
|
.SP
|
|
In the next example, compand is being used as a noise-gate for when the
|
|
noise is at a lower level than the signal:
|
|
.EX
|
|
play infile compand .1,.2 \-inf,\-50.1,\-inf,\-50,\-50 0 \-90 .1
|
|
.EE
|
|
Here is another noise-gate, this time for when the
|
|
noise is at a higher level than the signal (making it, in some ways,
|
|
similar to squelch):
|
|
.EX
|
|
play infile compand .1,.1 \-45.1,\-45,\-inf,0,\-inf 45 \-90 .1
|
|
.EE
|
|
This effect supports the \fB\-\-plot\fR global option (for the transfer function).
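.SP
For instance, assuming gnuplot is installed (the output file name
compand.plt is arbitrary), the transfer function of the first example
above might be plotted with:
.EX
sox \-\-plot gnuplot asz.wav \-n compand 0.3,1 6:\-70,\-60,\-20 \-5 \-90 0.2 > compand.plt
gnuplot compand.plt
.EE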
|
|
.SP
|
|
See also
|
|
.B mcompand
|
|
for a multiple-band companding effect.
|
|
.TP
|
|
\fBcontrast \fR[\fIenhancement-amount\fR(75)]
|
|
Comparable with compression, this effect modifies an audio signal to
|
|
make it sound louder.
|
|
.I enhancement-amount
|
|
controls the amount of the enhancement and is a number in the range 0\-100.
|
|
Note that
|
|
.I enhancement-amount
|
|
= 0 still gives a significant contrast enhancement.
|
|
.SP
|
|
See also the
|
|
.B compand
|
|
and
|
|
.B mcompand
|
|
effects.
|
|
.TP
|
|
\fBdcshift \fIshift\fR [\fIlimitergain\fR]
|
|
Apply a DC shift to the audio. This can be useful to remove a DC
|
|
offset (caused perhaps by a hardware problem in the recording chain)
|
|
from the audio. The effect of a DC offset is reduced headroom and
|
|
hence volume.
|
|
The
|
|
.B stat
|
|
or
|
|
.B stats
|
|
effect can be used to determine if a signal has a DC offset.
|
|
.SP
|
|
The given \fIshift\fR value is a floating point number in the range
|
|
of \(+-2 that indicates the amount to shift the audio (which is in the
|
|
range of \(+-1).
|
|
.SP
|
|
An optional
|
|
.I limitergain
|
|
can be specified as well. It should have a value much less than 1
|
|
(e.g. 0\*d05 or 0\*d02) and is used only on peaks to prevent clipping.
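.SP
As an illustrative sketch (the file names and the offset of 0\*d1 are
assumptions), the offset can first be measured and then cancelled by
shifting in the opposite direction:
.EX
sox input.wav \-n stats
sox input.wav output.wav dcshift \-0.1
.EE
Here, \-0\*d1 would suit a file for which \fBstats\fR reports a DC offset
of about 0\*d1.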
|
|
.TS
|
|
center;
|
|
c8 c8 c.
|
|
* * *
|
|
.TE
|
|
.DT
|
|
.SP
|
|
An alternative approach to removing a DC offset (albeit with a short delay)
|
|
is to use the
|
|
.B highpass
|
|
filter effect at a frequency of say 10Hz, as illustrated in the following
|
|
example:
|
|
.EX
|
|
sox \-n dc.wav synth 5 sin %0 50
|
|
sox dc.wav fixed.wav highpass 10
|
|
.EE
|
|
.TP
|
|
\fBdeemph\fR
|
|
Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving
|
|
filter).
|
|
.SP
|
|
Pre-emphasis was applied in the mastering of some CDs issued in the early
|
|
1980s. These included many classical music albums, as well as now
|
|
sought-after issues of albums by The Beatles, Pink Floyd and others.
|
|
Pre-emphasis should be removed at playback time by a de-emphasis
|
|
filter in the playback device. However, not all modern CD players have
|
|
this filter, and very few PC CD drives have it; playing pre-emphasised
|
|
audio without the correct de-emphasis filter results in audio that sounds harsh
|
|
and is far from what its creators intended.
|
|
.SP
|
|
With the
|
|
.B deemph
|
|
effect, it is possible to apply the necessary de-emphasis to audio that
|
|
has been extracted from a pre-emphasised CD, and then either burn the
|
|
de-emphasised audio to a new CD (which will then play correctly on any
|
|
CD player), or simply play the correctly de-emphasised audio files on the
|
|
PC. For example:
|
|
.EX
|
|
sox track1.wav track1\-deemph.wav deemph
|
|
.EE
|
|
and then burn track1\-deemph.wav to CD, or
|
|
.EX
|
|
play track1\-deemph.wav
|
|
.EE
|
|
or simply
|
|
.EX
|
|
play track1.wav deemph
|
|
.EE
|
|
The de-emphasis filter is implemented as a biquad and requires the input
|
|
audio sample rate to be either 44\*d1kHz or 48kHz. Maximum deviation
|
|
from the ideal response is only 0\*d06dB (up to 20kHz).
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.SP
|
|
See also the \fBbass\fR and \fBtreble\fR shelving equalisation effects.
|
|
.TP
|
|
\fBdelay\fR {\fIposition(=)\fR}
|
|
Delay one or more audio channels such that they start at the given
|
|
\fIposition\fR.
|
|
For example,
|
|
.B delay 1\*d5 +1 3000s
|
|
delays the first channel by 1\*d5 seconds, the second channel by 2\*d5
|
|
seconds (one second more than the previous channel), the third channel
|
|
by 3000 samples, and leaves any other channels that may be
|
|
present un-delayed.
|
|
The following (one long) command plays a chime sound:
|
|
.EX
|
|
.ne 3
|
|
play \-n synth \-j 3 sin %3 sin %\-2 sin %\-5 sin %\-9 \\
|
|
sin %\-14 sin %\-21 fade h .01 2 1.5 delay \\
|
|
1.3 1 .76 .54 .27 remix \- fade h 0 2.7 2.5 norm \-1
|
|
.EE
|
|
and this plays a guitar chord:
|
|
.EX
|
|
.ne 2
|
|
play \-n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \\
|
|
delay 0 .05 .1 .15 .2 .25 remix \- fade 0 4 .1 norm \-1
|
|
.EE
|
|
.TP
|
|
\fBdither\fR [\fB\-S\fR\^|\^\fB\-s\fR\^|\^\fB\-f \fIfilter\fR] [\fB\-a\fR] [\fB\-p \fIprecision\fR]
|
|
Apply dithering to the audio.
|
|
Dithering deliberately adds a small amount of noise to the signal in
|
|
order to mask audible quantization effects that can occur if the output
|
|
sample size is less than 24 bits. With no options, this effect will
|
|
add triangular (TPDF) white noise. Noise-shaping (only for certain
|
|
sample rates) can be selected with
|
|
.BR \-s .
|
|
With the
|
|
.B \-f
|
|
option, it is possible to select a particular noise-shaping filter from
|
|
the following list: lipshitz, f-weighted, modified-e-weighted,
|
|
improved-e-weighted, gesemann, shibata, low-shibata, high-shibata. Note
|
|
that most filter types are available only with 44100Hz sample rate. The
|
|
filter types are distinguished by the following properties: audibility
|
|
of noise, level of (inaudible, but in some circumstances, otherwise
|
|
problematic) shaped high frequency noise, and processing speed.
|
|
.br
|
|
See http://sox.sourceforge.net/SoX/NoiseShaping for graphs of the different
|
|
noise-shaping curves.
|
|
.SP
|
|
The
|
|
.B \-S
|
|
option selects a slightly `sloped' TPDF, biased towards higher
|
|
frequencies. It can be used at any sampling rate but below \(~~22k,
|
|
plain TPDF is probably better, and above \(~~37k, noise-shaping
|
|
(if available) is probably better.
|
|
.SP
|
|
The
|
|
.B \-a
|
|
option enables a mode where dithering (and noise-shaping if applicable)
|
|
are automatically enabled only when needed. The most likely use for
|
|
this is when applying fade in or out to an already dithered file, so
|
|
that the redithering applies only to the faded portions. However, auto
|
|
dithering is not fool-proof, so the fades should be carefully checked
|
|
for any noise modulation; if this occurs, then either re-dither the whole
|
|
file, or use
|
|
.BR trim ,
|
|
.BR fade ,
|
|
and concatenate.
|
|
.SP
|
|
The
|
|
.B \-p
|
|
option allows overriding the target precision.
|
|
.SP
|
|
If the SoX global option
|
|
.B \-R
|
|
is not given, then the pseudo-random number generator used to
|
|
generate the white noise will be `reseeded', i.e. the generated noise
|
|
will be different between invocations.
|
|
.SP
|
|
This effect should not be followed by any other effect that
|
|
affects the audio.
|
|
.SP
|
|
See also the `Dithering' section above.
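.SP
As a minimal sketch (file names are placeholders, and the input is
assumed to be 24-bit audio sampled at 44100Hz so that noise-shaping is
available), a reduction to 16 bits with plain TPDF dither, and then with
noise-shaped dither, might look like:
.EX
sox input.wav \-b 16 output.wav dither
sox input.wav \-b 16 output.wav dither \-s
.EE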
|
|
.TP
|
|
\fBdownsample\fR [\fIfactor\fR(2)]
|
|
Downsample the signal by an integer factor: only the first out of
|
|
each \fIfactor\fR samples is retained, the others are discarded.
|
|
.SP
|
|
No decimation filter is applied. If the input is not a properly
|
|
bandlimited baseband signal, aliasing will occur. This may be
|
|
desirable, e.g., for frequency translation.
|
|
.SP
|
|
For a general resampling effect with anti-aliasing, see \fBrate\fR. See
|
|
also \fBupsample\fR.
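.SP
For illustration (the file names and the factor are arbitrary), the
first command keeps every fourth sample, so that 48kHz input would yield
12kHz output with possible aliasing, whereas the second resamples to the
same rate with anti-aliasing:
.EX
sox input.wav output.wav downsample 4
sox input.wav output.wav rate 12k
.EE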
|
|
.TP
|
|
\fBearwax\fR
|
|
Makes audio easier to listen to on headphones.
|
|
Adds `cues' to 44\*d1kHz stereo (i.e. audio CD format) audio so that
|
|
when listened to on headphones the stereo image is
|
|
moved from inside
|
|
your head (standard for headphones) to outside and in front of the
|
|
listener (standard for speakers).
|
|
.TP
|
|
\fBecho \fIgain-in gain-out\fR <\fIdelay decay\fR>
|
|
Add echoing to the audio.
|
|
Echoes are reflected sound and can occur naturally amongst mountains
|
|
(and sometimes large buildings) when talking or shouting; digital echo
|
|
effects emulate this behaviour and are often used to help fill
|
|
out the sound of a single instrument or vocal. The time difference
|
|
between the original signal and the reflection is the `delay' (time),
|
|
and the loudness of the reflected signal is the `decay'. Multiple echoes
|
|
can have different delays and decays.
|
|
.SP
|
|
Each given
|
|
.I "delay decay"
|
|
pair gives the delay in milliseconds
|
|
and the decay (relative to gain-in) of that echo.
|
|
Gain-out is the volume of the output.
|
|
For example, the following will make it sound as if there are twice as many instruments as are
|
|
actually playing:
|
|
.EX
|
|
play lead.aiff echo 0.8 0.88 60 0.4
|
|
.EE
|
|
If the delay is very short, then it sounds like a (metallic) robot playing
|
|
music:
|
|
.EX
|
|
play lead.aiff echo 0.8 0.88 6 0.4
|
|
.EE
|
|
A longer delay will sound like an open air concert in the mountains:
|
|
.EX
|
|
play lead.aiff echo 0.8 0.9 1000 0.3
|
|
.EE
|
|
One mountain more, and:
|
|
.EX
|
|
play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25
|
|
.EE
|
|
.TP
|
|
\fBechos \fIgain-in gain-out\fR <\fIdelay decay\fR>
|
|
Add a sequence of echoes to the audio.
|
|
Each
|
|
.I "delay decay"
|
|
pair gives the delay in milliseconds
|
|
and the decay (relative to gain-in) of that echo.
|
|
Gain-out is the volume of the output.
|
|
.SP
|
|
Similar to the echo effect; echos stands for `ECHO in Sequel', that is, the first echos
|
|
takes the input, the second the input and the first echos, the third the input
|
|
and the first and the second echos, ... and so on.
|
|
Care should be taken using many echos; a single echos
|
|
has the same effect as a single echo.
|
|
.SP
|
|
The sample will be bounced twice in symmetric echos:
|
|
.EX
|
|
play lead.aiff echos 0.8 0.7 700 0.25 700 0.3
|
|
.EE
|
|
The sample will be bounced twice in asymmetric echos:
|
|
.EX
|
|
play lead.aiff echos 0.8 0.7 700 0.25 900 0.3
|
|
.EE
|
|
The sample will sound as if played in a garage:
|
|
.EX
|
|
play lead.aiff echos 0.8 0.7 40 0.25 63 0.3
|
|
.EE
|
|
.TP
|
|
\fBequalizer \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR] \fIgain\fR
|
|
Apply a two-pole peaking equalisation (EQ) filter.
|
|
With this filter, the signal-level at and around a selected frequency
|
|
can be increased or decreased, whilst (unlike band-pass and band-reject
|
|
filters) that at all other frequencies is unchanged.
|
|
.SP
|
|
\fIfrequency\fR gives the filter's central frequency in Hz,
|
|
\fIwidth\fR, the band-width,
|
|
and \fIgain\fR the required gain
|
|
or attenuation in dB.
|
|
Beware of
|
|
.B Clipping
|
|
when using a positive \fIgain\fR.
|
|
.SP
|
|
In order to produce complex equalisation curves, this effect
|
|
can be given several times, each with a different central frequency.
|
|
.SP
|
|
The filter is described in detail in [1].
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.SP
|
|
See also \fBbass\fR and \fBtreble\fR for shelving equalisation effects.
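.SP
As a sketch of chaining this effect (the frequencies, widths and gains are
arbitrary illustrative values), the following cuts 6dB around 100Hz
with a Q of 1\*d5 and boosts 4dB in a one-octave band around 3kHz:
.EX
sox input.wav output.wav equalizer 100 1.5q \-6 equalizer 3k 1o +4
.EE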
|
|
.TP
|
|
\fBfade\fR [\fItype\fR] \fIfade-in-length\fR [\fIstop-position(=)\fR [\fIfade-out-length\fR]]
|
|
Apply a fade effect to the beginning, end, or both of the audio.
|
|
.SP
|
|
An optional \fItype\fR can be specified to select the shape of the fade
|
|
curve:
|
|
\fBq\fR for quarter of a sine wave, \fBh\fR for half a sine
|
|
wave, \fBt\fR for linear (`triangular') slope, \fBl\fR for logarithmic,
|
|
and \fBp\fR for inverted parabola. The default is logarithmic.
|
|
.SP
|
|
A fade-in starts from the first sample and ramps the signal level from 0
|
|
to full volume over the time given as \fIfade-in-length\fR. Specify 0 if
|
|
no fade-in is wanted.
|
|
.SP
|
|
For fade-outs, the audio will be truncated at
|
|
.I stop-position
|
|
and the signal level will be ramped from full volume down to 0 over an
|
|
interval of \fIfade-out-length\fR before the \fIstop-position\fR. If
|
|
.I fade-out-length
|
|
is not specified, it defaults to the same value as
|
|
\fIfade-in-length\fR.
|
|
No fade-out is performed if
|
|
.I stop-position
|
|
is not specified.
|
|
If the audio length can be determined from the input file header and any
|
|
previous effects, then \fB\-0\fR (or, for historical reasons, \fB0\fR) may
|
|
be specified for
|
|
.I stop-position
|
|
to indicate the usual case of a fade-out that ends at the end of the input
|
|
audio stream.
|
|
.SP
|
|
Any time specification may be used for \fIfade-in-length\fR and
|
|
\fIfade-out-length\fR.
|
|
.SP
|
|
See also the
|
|
.B splice
|
|
effect.
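.SP
For example (file names and durations are illustrative), the first
command below applies a 2 second linear fade-in and a 3 second fade-out
that ends at the end of the audio; the second applies no fade-in and
truncates the audio at 30 seconds with a 5 second half-sine fade-out:
.EX
sox input.wav output.wav fade t 2 \-0 3
sox input.wav output.wav fade h 0 30 5
.EE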
|
|
.TP
|
|
\fBfir\fR [\fIcoefs-file\fR\^|\^\fIcoefs\fR]
|
|
Use SoX's FFT convolution engine with given FIR filter
|
|
coefficients.
|
|
If a single argument is given then this is treated as the name of a file
|
|
containing the filter coefficients (white-space separated; may contain
|
|
`#' comments). If the given filename is `\-', or if no argument is
|
|
given, then the coefficients are read from the `standard input' (stdin);
|
|
otherwise, coefficients may be given on the command line.
|
|
Examples:
|
|
.EX
|
|
sox infile outfile fir 0.0195 \-0.082 0.234 0.891 \-0.145 0.043
|
|
.EE
|
|
.EX
|
|
sox infile outfile fir coefs.txt
|
|
.EE
|
|
with coefs.txt containing
|
|
.EX
|
|
# HP filter
|
|
# freq=10000
|
|
1.2311233052619888e\-01
|
|
\-4.4777096106211783e\-01
|
|
5.1031563346705155e\-01
|
|
\-6.6502926320995331e\-02
|
|
...
|
|
.EE
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.TP
|
|
\fBflanger\fR [\fIdelay depth regen width speed shape phase interp\fR]
|
|
Apply a flanging effect to the audio.
|
|
See [3] for a detailed description of flanging.
|
|
.SP
|
|
All parameters are optional (right to left).
|
|
.ne 15
|
|
.TS
center;
cI cI cI lI
cI c c l.
\ 	Range	Default	Description
delay	0 \- 30	0	Base delay in milliseconds.
depth	0 \- 10	2	Added swept delay in milliseconds.
regen	\-95 \- 95	0	T{
.na
Percentage regeneration (delayed signal feedback).
T}
width	0 \- 100	71	T{
.na
Percentage of delayed signal mixed with original.
T}
speed	0\*d1 \- 10	0\*d5	Sweeps per second (Hz).
shape	\ 	sin	Swept wave shape: \fBsine\fR\^|\^\fBtriangle\fR.
phase	0 \- 100	25	T{
.na
Swept wave percentage phase-shift for multi-channel (e.g. stereo) flange;
0 = 100 = same phase on each channel.
T}
interp	\ 	lin	T{
.na
Digital delay-line interpolation: \fBlinear\fR\^|\^\fBquadratic\fR.
T}
.TE
|
|
.DT
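.SP
As a minimal sketch (guitar1.wav is a placeholder and the values are
arbitrary ones within the ranges tabulated above), the first command
uses all the defaults and the second gives only the leftmost
parameters, leaving the rest at their defaults:
.EX
play guitar1.wav flanger
play guitar1.wav flanger 5 3 20 71 0.3 triangle
.EE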
|
|
.TP
|
|
\fBgain \fR[\fB\-e\fR\^|\^\fB\-B\fR\^|\^\fB\-b\fR\^|\^\fB\-r\fR] [\fB\-n\fR] [\fB\-l\fR\^|\^\fB\-h\fR] [\fIgain-dB\fR]
|
|
Apply amplification or attenuation to the audio signal, or, in some
|
|
cases, to some of its channels.
|
|
Note that use of any of
|
|
.BR \-e ,
|
|
.BR \-B ,
|
|
.BR \-b ,
|
|
.BR \-r ,
|
|
or
|
|
.B \-n
|
|
requires temporary file space to store the audio to be processed, so may
|
|
be unsuitable for use with `streamed' audio.
|
|
.SP
|
|
Without other options,
|
|
.I gain-dB
|
|
is used to adjust the signal power level by the given number of dB:
|
|
positive amplifies (beware of Clipping), negative attenuates. With
|
|
other options, the
|
|
.I gain-dB
|
|
amplification or attenuation is (logically) applied after the processing due to those options.
|
|
.SP
|
|
Given the
|
|
.B \-e
|
|
option, the levels of the audio channels of a multi-channel file are `equalised', i.e.
|
|
gain is applied to all channels other than that with the highest peak
|
|
level, such that all channels attain the same peak level
|
|
(but, without also giving
|
|
.BR \-n ,
|
|
the audio is not `normalised').
|
|
.SP
|
|
The
|
|
.B \-B
|
|
(balance) option is similar to
|
|
.BR \-e ,
|
|
but with
|
|
.BR \-B ,
|
|
the RMS level is used instead of the peak level.
|
|
.B \-B
|
|
might be used to correct stereo imbalance caused by an imperfect record
|
|
turntable cartridge. Note
|
|
that unlike
|
|
.BR \-e ,
|
|
.B \-B
|
|
might cause some clipping.
|
|
.SP
|
|
.B \-b
|
|
is similar to
|
|
.B \-B
|
|
but has clipping protection, i.e. if necessary to prevent clipping
|
|
whilst balancing, attenuation is applied to all channels.
|
|
Note, however, that in conjunction with
|
|
.BR \-n ,
|
|
.B \-B
|
|
and
|
|
.B \-b
|
|
are synonymous.
|
|
.SP
|
|
The
|
|
.B \-r
|
|
option is used in conjunction with a prior invocation of
|
|
.B gain
|
|
with the
|
|
.B \-h
|
|
option\*msee below for details.
|
|
.SP
|
|
The
|
|
.B \-n
|
|
option normalises the audio to 0dB FSD; it is often used in conjunction with a negative
|
|
.I gain-dB
|
|
to the effect that the audio is normalised to a given level below 0dB.
|
|
For example,
|
|
.EX
|
|
sox infile outfile gain \-n
|
|
.EE
|
|
normalises to 0dB, and
|
|
.EX
|
|
sox infile outfile gain \-n \-3
|
|
.EE
|
|
normalises to \-3dB.
|
|
.SP
|
|
The
|
|
.B \-l
|
|
option invokes a simple limiter, e.g.
|
|
.EX
|
|
sox infile outfile gain \-l 6
|
|
.EE
|
|
will apply 6dB of gain but never clip. Note that limiting more than a
|
|
few dBs more than occasionally (in a piece of audio) is not recommended
|
|
as it can cause audible distortion.
|
|
See the
|
|
.B compand
|
|
effect for a more capable limiter.
|
|
.SP
|
|
The
|
|
.B \-h
|
|
option is used to apply gain to provide head-room for subsequent
|
|
processing. For example, with
|
|
.EX
|
|
sox infile outfile gain \-h bass +6
|
|
.EE
|
|
6dB of attenuation will be applied prior to the bass boosting effect
|
|
thus ensuring that it will not clip. Of course, with bass, it is
|
|
obvious how much headroom will be needed, but with other effects (e.g.
|
|
rate, dither) it is not always as clear. Another advantage of using
|
|
\fBgain \-h\fR rather than an explicit attenuation, is that if the
|
|
headroom is not used by subsequent effects, it can be reclaimed with
|
|
\fBgain \-r\fR, for example:
|
|
.EX
|
|
sox infile outfile gain \-h bass +6 rate 44100 gain \-r
|
|
.EE
|
|
The above effects chain guarantees never to clip nor amplify;
|
|
it attenuates if necessary to prevent clipping, but by only as
|
|
much as is needed to do so.
|
|
.SP
|
|
Output formatting (dithering and bit-depth reduction) also requires
|
|
headroom (which cannot be `reclaimed'), e.g.
|
|
.EX
|
|
sox infile outfile gain \-h bass +6 rate 44100 gain \-rh dither
|
|
.EE
|
|
Here, the second
|
|
.B gain
|
|
invocation, reclaims as much of the headroom as it can from the
|
|
preceding effects, but retains as much headroom as is needed for
|
|
subsequent processing.
|
|
The SoX global option
|
|
.B \-G
|
|
can be given to automatically invoke \fBgain \-h\fR and \fBgain \-r\fR.
|
|
.SP
|
|
See also the
|
|
.B norm
|
|
and
|
|
.B vol
|
|
effects.
|
|
.TP
|
|
\fBhighpass\fR\^|\^\fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR]]
|
|
Apply a high-pass or low-pass filter with 3dB point \fIfrequency\fR.
|
|
The filter can be either single-pole (with
|
|
.BR \-1 ),
|
|
or double-pole (the default, or with
|
|
.BR \-2 ).
|
|
.I width
|
|
applies only to double-pole filters;
|
|
the default is Q = 0\*d707 and gives a Butterworth response. The filters
|
|
roll off at 6dB per pole per octave (20dB per pole per decade). The
|
|
double-pole filters are described in detail in [1].
|
|
.SP
|
|
These effects support the \fB\-\-plot\fR global option.
|
|
.SP
|
|
See also \fBsinc\fR for filters with a steeper roll-off.
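.SP
As an illustrative sketch (the file names and cut-off frequencies are
arbitrary), the following removes rumble below about 80Hz with the
default double-pole filter and then gently attenuates content above 8kHz
with a single-pole filter:
.EX
sox input.wav output.wav highpass 80 lowpass \-1 8k
.EE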
|
|
.TP
|
|
\fBhilbert\fR [\fB\-n \fItaps\fR]
|
|
Apply an odd-tap Hilbert transform filter, phase-shifting the signal
|
|
by 90 degrees.
|
|
.SP
|
|
This is used in many matrix coding schemes and for analytic signal
|
|
generation. The process is often written as a multiplication by \fIi\fR
|
|
(or \fIj\fR), the imaginary unit.
|
|
.SP
|
|
An odd-tap Hilbert transform filter has a bandpass characteristic,
|
|
attenuating the lowest and highest frequencies. Its bandwidth can be
|
|
controlled by the number of filter taps, which can be specified with
|
|
\fB\-n\fR. By default, the number of taps is chosen for a cutoff
|
|
frequency of about 75 Hz.
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.TP
|
|
\fBladspa\fR [\fB\-l\fR\^|\^\fB\-r\fR] \fImodule\fR [\fIplugin\fR] [\fIargument\fR ...]
|
|
Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API) plugin.
|
|
Despite the name, LADSPA is not Linux-specific, and a wide range of
|
|
effects is available as LADSPA plugins, such as cmt [6] (the Computer
|
|
Music Toolkit) and Steve Harris's plugin collection [7]. The first
|
|
argument is the plugin module, the second the name of the plugin (a
|
|
module can contain more than one plugin), and any other arguments are
|
|
for the control ports of the plugin. Missing arguments are supplied by
|
|
default values if possible.
|
|
.SP
|
|
Normally, the number of input ports of the plugin must match the number
|
|
of input channels, and the number of output ports determines the output
|
|
channel count. However, the
|
|
.B \-r
|
|
(replicate) option allows cloning a mono plugin to handle multi-channel
|
|
input.
|
|
.SP
|
|
Some plugins introduce latency which SoX may optionally compensate for.
|
|
The
|
|
.B \-l
|
|
(latency compensation) option automatically compensates for latency
|
|
as reported by the plugin via an output control port named "latency".
|
|
.SP
|
|
If found, the environment variable LADSPA_PATH will be used as search
|
|
path for plugins.
|
|
.TP
|
|
\fBloudness\fR [\fIgain\fR [\fIreference\fR]]
|
|
Loudness control\*msimilar to the
|
|
.B gain
|
|
effect, but provides equalisation for the human auditory system. See
|
|
http://en.wikipedia.org/wiki/Loudness for a detailed description of
|
|
loudness. The gain is adjusted by the given
|
|
.I gain
|
|
parameter (usually negative) and the signal equalised according to ISO
|
|
226 w.r.t. a reference level of 65dB, though an alternative
|
|
.I reference
|
|
level may be given if the original audio has been equalised for some
|
|
other optimal level.
|
|
A default gain of \-10dB is used if a
|
|
.I gain
|
|
value is not given.
|
|
.SP
|
|
See also the
|
|
.B gain
|
|
effect.
|
|
.TP
|
|
\fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR]]
|
|
Apply a low-pass filter.
|
|
See the description of the \fBhighpass\fR effect for details.
|
|
.TP
|
|
\fBmcompand\fR \(dq\fIattack1\fB,\fIdecay1\fR{\fB,\fIattack2\fB,\fIdecay2\fR}
|
|
[\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
|
|
.br
|
|
[\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]\(dq {\fIcrossover-freq\fR[\fBk\fR] \(dqattack1,...\(dq}
|
|
.SP
|
|
The multi-band compander is similar to the single-band compander but the
|
|
audio is first divided into bands using Linkwitz-Riley cross-over filters
|
|
and a separately specifiable compander run on each band. See the
|
|
\fBcompand\fR effect for the definition of its parameters. Compand
|
|
parameters are specified between double quotes and the crossover
|
|
frequency for that band is given by \fIcrossover-freq\fR; these can be
|
|
repeated to create multiple bands.
|
|
.SP
|
|
For example, the following (one long) command shows how multi-band
|
|
companding is typically used in FM radio:
|
|
.EX
|
|
.ne 8
|
|
play track1.wav gain \-3 sinc 8000\- 29 100 mcompand \\
|
|
\(dq0.005,0.1 \-47,\-40,\-34,\-34,\-17,\-33\(dq 100 \\
|
|
\(dq0.003,0.05 \-47,\-40,\-34,\-34,\-17,\-33\(dq 400 \\
|
|
\(dq0.000625,0.0125 \-47,\-40,\-34,\-34,\-15,\-33\(dq 1600 \\
|
|
\(dq0.0001,0.025 \-47,\-40,\-34,\-34,\-31,\-31,\-0,\-30\(dq 6400 \\
|
|
\(dq0,0.025 \-38,\-31,\-28,\-28,\-0,\-25\(dq \\
|
|
gain 15 highpass 22 highpass 22 sinc \-n 255 \-b 16 \-17500 \\
|
|
gain 9 lowpass \-1 17801
|
|
.EE
|
|
The audio file is played with a simulated FM radio sound (or broadcast
|
|
signal condition if the lowpass filter at the end is skipped).
|
|
Note that the pipeline is set up with US-style 75us pre-emphasis.
|
|
.SP
|
|
See also
|
|
.B compand
|
|
for a single-band companding effect.
|
|
.TP
|
|
\fBnoiseprof\fR [\fIprofile-file\fR]
|
|
Calculate a profile of the audio for use in noise reduction. See the
|
|
description of the \fBnoisered\fR effect for details.
|
|
.TP
|
|
\fBnoisered\fR [\fIprofile-file\fR [\fIamount\fR]]
|
|
Reduce noise in the audio signal by profiling and filtering. This
|
|
effect is moderately effective at removing consistent background noise
|
|
such as hiss or hum. To use it, first run SoX with the \fBnoiseprof\fR
|
|
effect on a section of audio that ideally would contain silence but in
|
|
fact contains noise\*msuch sections are typically found at the beginning
|
|
or the end of a recording. \fBnoiseprof\fR will write out a noise
|
|
profile to \fIprofile-file\fR, or to stdout if \fIprofile-file\fR is
omitted or given as `\-'. E.g.
|
|
.EX
|
|
sox speech.wav \-n trim 0 1.5 noiseprof speech.noise-profile
|
|
.EE
|
|
To actually remove the noise, run SoX again, this time with the \fBnoisered\fR
|
|
effect;
|
|
.B noisered
|
|
will reduce noise according to a noise profile (which was generated by
|
|
.BR noiseprof ),
|
|
from
|
|
.IR profile-file ,
|
|
or from stdin if \fIprofile-file\fR is omitted or given as `\-'. E.g.
|
|
.EX
|
|
sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
|
|
.EE
|
|
How much noise should be removed is specified by
|
|
.IR amount \*ma
|
|
number between 0 and 1 with a default of 0\*d5. Higher numbers will
|
|
remove more noise but present a greater likelihood of removing wanted
|
|
components of the audio signal. Before replacing an original recording
|
|
with a noise-reduced version, experiment with different
|
|
.I amount
|
|
values to find the optimal one for your audio; use headphones to check
|
|
that you are happy with the results, paying particular attention to quieter
|
|
sections of the audio.
|
|
.SP
|
|
On most systems, the two stages\*mprofiling and reduction\*mcan be combined
|
|
using a pipe, e.g.
|
|
.EX
|
|
sox noisy.wav \-n trim 0 1 noiseprof | play noisy.wav noisered
|
|
.EE
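The two stages can also be wrapped in a small script. The following is
only a sketch and is not part of SoX: the function name
\fBdenoise_cmds\fR, the one-second profiling window, and the naming of the
profile file are assumptions made for illustration. It prints the two
commands rather than running them, so that they can be inspected first;
file names containing spaces are not handled.
.EX
```shell
#!/bin/sh
# Print the profiling and reduction commands for one input file.
# $1 = input file, $2 = output file,
# $3 = amount (falls back to 0.5, the documented default)
denoise_cmds() {
    src=$1
    dst=$2
    amount=${3:-0.5}
    # Hypothetical naming scheme: speech.wav -> speech.noise-profile
    profile="${src%.*}.noise-profile"
    # Stage 1: profile the first second, assumed to contain only noise.
    echo "sox $src -n trim 0 1 noiseprof $profile"
    # Stage 2: filter the whole file using that profile.
    echo "sox $src $dst noisered $profile $amount"
}

denoise_cmds speech.wav cleaned.wav 0.3
```
.EE
Piping the output of the function to \fBsh\fR runs both stages in order.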
|
|
.TP
|
|
\fBnorm\fR [\fIdB-level\fR]
|
|
Normalise the audio.
|
|
.B norm
|
|
is just an alias for \fBgain \-n\fR; see the
|
|
.B gain
|
|
effect for details.
|
|
.TP
|
|
\fBoops\fR
|
|
Out Of Phase Stereo effect.
|
|
Mixes stereo to twin-mono where each mono channel contains the
|
|
difference between the left and right stereo channels.
|
|
This is sometimes known as the `karaoke' effect as it often has the effect
|
|
of removing most or all of the vocals from a recording.
|
|
It is equivalent to \fBremix 1,2i 1,2i\fR.
|
|
.TP
|
|
\fBoverdrive\fR [\fIgain\fR(20) [\fIcolour\fR(20)]]
|
|
Non-linear distortion.
|
|
The \fIcolour\fR parameter controls the amount of even harmonic content
|
|
in the over-driven output.
|
|
.TP
|
|
\fBpad\fR { \fIlength\fR[\fB@\fIposition(=)\fR] }
|
|
Pad the audio with silence, at the beginning, the end, or any
|
|
specified points through the audio.
|
|
.I length
|
|
is the amount of silence to insert and
|
|
.I position
|
|
the position in the input audio stream at which to insert it.
|
|
Any number of lengths and positions may be specified, provided that
|
|
a specified position is not less than the previous one, and any time
|
|
specification may be used for them.
|
|
.I position
|
|
is optional for the first and last lengths specified; if omitted, these
positions default to the beginning and the end of the audio respectively.
|
|
For example,
|
|
.B pad 1\*d5 1\*d5
|
|
adds 1\*d5 seconds of silence padding at each end of the audio, whilst
|
|
.B pad 4000s@3:00
|
|
inserts 4000 samples of silence 3 minutes into the audio.
|
|
If silence is wanted only at the end of the audio, specify either the end
|
|
position or specify a zero-length pad at the start.
|
|
.SP
|
|
See also
|
|
.B delay
|
|
for an effect that can add silence at the beginning of
|
|
the audio on a channel-by-channel basis.
|
|
.TP
|
|
\fBphaser \fIgain-in gain-out delay decay speed\fR [\fB\-s\fR\^|\^\fB\-t\fR]
|
|
Add a phasing effect to the audio.
|
|
See [3] for a detailed description of phasing.
|
|
.SP
|
|
\fIdelay\fR gives the delay in milliseconds, \fIdecay\fR the decay
(relative to \fIgain-in\fR), and \fIspeed\fR the modulation
speed in Hz.
|
|
The modulation is either sinusoidal (\fB\-s\fR) \*mpreferable for multiple
|
|
instruments, or triangular
|
|
(\fB\-t\fR) \*mgives single instruments a sharper phasing effect.
|
|
The decay should be less than 0\*d5 to avoid
|
|
feedback, and usually no less than 0\*d1. Gain-out is the volume of the output.
|
|
.SP
|
|
For example:
|
|
.EX
|
|
play snare.flac phaser 0.8 0.74 3 0.4 0.5 \-t
|
|
.EE
|
|
Gentler:
|
|
.EX
|
|
play snare.flac phaser 0.9 0.85 4 0.23 1.3 \-s
|
|
.EE
|
|
A popular sound:
|
|
.EX
|
|
play snare.flac phaser 0.89 0.85 1 0.24 2 \-t
|
|
.EE
|
|
More severe:
|
|
.EX
|
|
play snare.flac phaser 0.6 0.66 3 0.6 2 \-t
|
|
.EE
|
|
.TP
|
|
\fBpitch \fR[\fB\-q\fR] \fIshift\fR [\fIsegment\fR [\fIsearch\fR [\fIoverlap\fR]]]
|
|
Change the audio pitch (but not tempo).
|
|
.SP
|
|
.I shift
|
|
gives the pitch shift as positive or negative `cents' (i.e. 100ths of a
|
|
semitone). See the
|
|
.B tempo
|
|
effect for a description of the other parameters.
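.SP
Since 1200 cents make up one octave (a doubling of frequency), a shift of
\fIc\fR cents multiplies every frequency by 2 to the power \fIc\fR/1200.
The following sketch shows that arithmetic; the helper name
\fBcents_to_ratio\fR is an assumption made for illustration and is not
something SoX provides, and an \fBawk\fR with the usual \fBexp\fR and
\fBlog\fR functions is assumed.
.EX
```shell
#!/bin/sh
# Convert a pitch shift in cents to the equivalent frequency ratio.
# 100 cents = 1 semitone; 1200 cents = 1 octave (ratio 2).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f", exp(log(2) * c / 1200); print "" }'
}

cents_to_ratio 1200    # one octave up: prints 2.0000
cents_to_ratio -700    # seven semitones down: prints 0.6674
```
.EE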
|
|
.SP
|
|
See also the \fBbend\fR, \fBspeed\fR,
|
|
and
|
|
.B tempo
|
|
effects.
|
|
.TP
|
|
\fBrate\fR [\fB\-q\fR\^|\^\fB\-l\fR\^|\^\fB\-m\fR\^|\^\fB\-h\fR\^|\^\fB\-v\fR] [override-options] \fIRATE\fR[\fBk\fR]
|
|
Change the audio sampling rate (i.e. resample the audio) to any given
|
|
.I RATE
|
|
(even non-integer if this is supported by the output file format)
|
|
using a quality level defined as follows:
|
|
.ne 10
|
|
.TS
|
|
center;
|
|
cI cI2w9 cI2w6 cIw6 lIw17
|
|
cB c c c l.
|
|
\ Quality T{
|
|
.na
|
|
Band-width
|
|
T} Rej dB T{
|
|
.na
|
|
Typical Use
|
|
T}
|
|
\-q T{
|
|
.na
|
|
quick
|
|
T} n/a T{
|
|
.na
|
|
\(~=30 @ \ Fs/4
|
|
T} T{
|
|
.na
|
|
playback on ancient hardware
|
|
T}
|
|
\-l low 80% 100 T{
|
|
.na
|
|
playback on old hardware
|
|
T}
|
|
\-m medium 95% 100 T{
|
|
.na
|
|
audio playback
|
|
T}
|
|
\-h high 95% 125 T{
|
|
.na
|
|
16-bit mastering (use with dither)
|
|
T}
|
|
\-v T{
|
|
.na
|
|
very high
|
|
T} 95% 175 24-bit mastering
|
|
.TE
|
|
.DT
|
|
.SP
|
|
where
|
|
.I Band-width
|
|
is the percentage of the audio frequency band that is preserved and
|
|
.I Rej dB
|
|
is the level of noise rejection. Increasing levels of resampling
|
|
quality come at the expense of increasing amounts of time to process the
|
|
audio. If no quality option is given, the quality level used is `high'
|
|
(but see `Playing & Recording Audio' above regarding playback).
|
|
.SP
|
|
The `quick' algorithm uses cubic interpolation; all others use
|
|
band-limited interpolation. By default, all algorithms have
|
|
a `linear' phase response; for `medium', `high' and
|
|
`very high', the phase response is configurable (see below).
|
|
.SP
|
|
The
|
|
.B rate
|
|
effect is invoked automatically if SoX's \fB\-r\fR option specifies a
|
|
rate that is different to that of the input file(s). Alternatively, if
|
|
this effect is given explicitly, then SoX's
|
|
.B \-r
|
|
option need not be given. For example, the following two commands are
|
|
equivalent:
|
|
.EX
|
|
.ne 2
|
|
sox input.wav \-r 48k output.wav bass \-b 24
|
|
sox input.wav output.wav bass \-b 24 rate 48k
|
|
.EE
|
|
though the second command is more flexible as it allows
|
|
.B rate
|
|
options to be given, and allows the effects to be ordered arbitrarily.
|
|
.TS
|
|
center;
|
|
c8 c8 c.
|
|
* * *
|
|
.TE
|
|
.DT
|
|
.SP
|
|
Warning: technically detailed discussion follows.
|
|
.SP
|
|
The simple quality selection described above provides settings that
|
|
satisfy the needs of the vast majority of resampling tasks.
|
|
Occasionally, however, it may be desirable to fine-tune the resampler's
|
|
filter response; this can be achieved using
|
|
.IR override\ options ,
|
|
as detailed in the following table:
|
|
.ne 6
|
|
.TS
|
|
center;
|
|
lB lw52.
|
|
\-M/\-I/\-L Phase response = minimum/intermediate/linear
|
|
\-s Steep filter (band-width = 99%)
|
|
\-a Allow aliasing/imaging above the pass-band
|
|
\-b\ 74\-99\*d7 Any band-width %
|
|
\-p\ 0\-100 T{
|
|
.na
|
|
Any phase response (0 = minimum, 25 = intermediate, 50 = linear, 100 = maximum)
|
|
T}
|
|
.TE
|
|
.DT
|
|
.SP
|
|
N.B. Override options cannot be used with the `quick' or `low'
|
|
quality algorithms.
|
|
.SP
|
|
All resamplers use filters that can sometimes create `echo' (a.k.a.
|
|
`ringing') artefacts with transient signals such as those that occur
|
|
with `finger snaps' or other highly percussive sounds. Such artefacts are
|
|
much more noticeable to the human ear if they occur before the transient
|
|
(`pre-echo') than if they occur after it (`post-echo'). Note that
|
|
the frequency of any such artefacts is related to the smaller of the
|
|
original and new sampling rates but that if this is at least 44\*d1kHz,
|
|
then the artefacts will lie outside the range of human hearing.
|
|
.SP
|
|
A phase response setting may be used to control the distribution of any
|
|
transient echo between
|
|
`pre' and `post': with minimum phase, there is no pre-echo but the
|
|
longest post-echo; with linear phase, pre and post echo are in equal
|
|
amounts (in signal terms, but not audibility terms); the intermediate
|
|
phase setting attempts to find the best compromise by selecting a small
|
|
length (and level) of pre-echo and a medium-length post-echo.
|
|
.SP
|
|
Minimum, intermediate, or linear phase response is selected using the
|
|
.BR \-M ,
|
|
.BR \-I ,
|
|
or
|
|
.B \-L
|
|
option; a custom phase response can be created with the
|
|
.B \-p
|
|
option. Note that phase responses between `linear' and `maximum'
|
|
(greater than 50) are rarely useful.
|
|
.SP
|
|
A resampler's band-width setting determines how much of the frequency
|
|
content of the original signal (w.r.t. the original sample rate when
|
|
up-sampling, or the new sample rate when down-sampling) is preserved
|
|
during conversion. The term `pass-band' is used to refer to all frequencies
|
|
up to the band-width point (e.g. for 44\*d1kHz sampling rate, and a
|
|
resampling band-width of 95%, the pass-band represents frequencies from
|
|
0Hz (D.C.) to circa 21kHz). Increasing the resampler's band-width
|
|
results in a slower conversion and can increase transient echo
|
|
artefacts (and vice versa).
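.SP
The pass-band figure quoted above can be reproduced with a little
arithmetic: take half of the smaller of the two sampling rates and scale
it by the band-width percentage. The following is a sketch only; the
helper name \fBpassband_hz\fR is hypothetical and not part of SoX.
.EX
```shell
#!/bin/sh
# Upper edge of the pass-band in Hz.
# $1 = original rate (Hz), $2 = new rate (Hz), $3 = band-width (%)
passband_hz() {
    awk -v a="$1" -v b="$2" -v bw="$3" 'BEGIN {
        # The smaller rate limits the band that can be preserved.
        m = (a + 0 < b + 0) ? a : b
        printf "%.1f", m / 2 * bw / 100
        print ""
    }'
}

passband_hz 44100 48000 95    # prints 20947.5, i.e. circa 21kHz
```
.EE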
|
|
.SP
|
|
The
|
|
.B \-s
|
|
`steep filter' option changes resampling band-width from the default 95%
|
|
(based on the 3dB point), to 99%. The
|
|
.B \-b
|
|
option allows the band-width to be set to any value in the range
|
|
74\-99\*d7 %, but note that band-width values greater than 99% are not
|
|
recommended for normal use as they can cause excessive transient echo.
|
|
.SP
|
|
If the
|
|
.B \-a
|
|
option is given, then aliasing/imaging above the pass-band is allowed. For
|
|
example, with 44\*d1kHz sampling rate, and a
|
|
resampling band-width of 95%, this means that frequency content above
|
|
21kHz can be distorted; however, since this is above the pass-band (i.e.
|
|
above the highest frequency of interest/audibility), this may not be a
|
|
problem. The benefits of allowing aliasing/imaging are reduced processing time,
|
|
and reduced (by almost half) transient echo artefacts.
|
|
Note that if this option is given, then
|
|
the minimum band-width allowable with
|
|
.B \-b
|
|
increases to 85%.
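.SP
The limits on
.B \-b
described above can be checked before a long conversion is started. The
following sketch encodes only the ranges stated in this section (74 to
99\*d7 normally, 85 to 99\*d7 when aliasing is allowed); the function name
\fBvalid_bandwidth\fR and its second argument are assumptions made for
illustration.
.EX
```shell
#!/bin/sh
# Succeed if the band-width percentage is within the documented range.
# $1 = band-width (%), $2 = "alias" if the -a option is also given
valid_bandwidth() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        lo = (a == "alias") ? 85 : 74
        exit !(b + 0 >= lo && b + 0 <= 99.7)
    }'
}

if valid_bandwidth 90 alias; then
    echo "90% is acceptable together with -a"
fi
```
.EE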
|
|
.SP
|
|
Examples:
|
|
.EX
|
|
sox input.wav \-b 16 output.wav rate \-s \-a 44100 dither \-s
|
|
.EE
|
|
default (high) quality resampling; overrides: steep filter, allow
|
|
aliasing; to 44\*d1kHz sample rate; noise-shaped dither to 16-bit WAV
|
|
file.
|
|
.EX
|
|
sox input.wav \-b 24 output.aiff rate \-v \-I \-b 90 48k
|
|
.EE
|
|
very high quality resampling; overrides: intermediate phase, band-width 90%;
|
|
to 48k sample rate; store output to 24-bit AIFF file.
|
|
.TS
|
|
center;
|
|
c8 c8 c.
|
|
* * *
|
|
.TE
|
|
.DT
|
|
.SP
|
|
The
|
|
.B pitch
|
|
and
|
|
.B speed
|
|
effects use the
|
|
.B rate
|
|
effect at their core.
|
|
.TP
|
|
\fBremix\fR [\fB\-a\fR\^|\^\fB\-m\fR\^|\^\fB\-p\fR] <\fIout-spec\fR>
|
|
\fIout-spec\fR = \fIin-spec\fR{\fB,\fIin-spec\fR} | \fB0\fR
|
|
.br
|
|
\fIin-spec\fR = [\fIin-chan\fR]\^[\fB\-\fR[\fIin-chan2\fR]]\^[\fIvol-spec\fR]
|
|
.br
|
|
\fIvol-spec\fR = \fBp\fR\^|\^\fBi\fR\^|\^\fBv\^\fR[\fIvolume\fR]
|
|
.br
|
|
.SP
|
|
Select and mix input audio channels into output audio channels. Each output
|
|
channel is specified, in turn, by a given \fIout-spec\fR: a list of
|
|
contributing input channels and volume specifications.
|
|
.SP
|
|
Note that this effect operates on the audio
|
|
.I channels
|
|
within the SoX effects processing chain; it should not be confused with the
|
|
.B \-m
|
|
global option (where multiple
|
|
.I files
|
|
are mix-combined before entering the effects chain).
|
|
.SP
|
|
An
|
|
.I out-spec
|
|
contains comma-separated input channel-numbers and hyphen-delimited
|
|
channel-number ranges; alternatively,
|
|
.B 0
|
|
may be given to create a silent output channel. For example,
|
|
.EX
|
|
sox input.wav output.wav remix 6 7 8 0
|
|
.EE
|
|
creates an output file with four channels, where channels 1, 2, and 3 are
|
|
copies of channels 6, 7, and 8 in the input file, and channel 4 is silent.
|
|
Whereas
|
|
.EX
|
|
sox input.wav output.wav remix 1\-3,7 3
|
|
.EE
|
|
creates a (somewhat bizarre) stereo output file where the left channel
|
|
is a mix-down of input channels 1, 2, 3, and 7, and the right channel is
|
|
a copy of input channel 3.
|
|
.SP
|
|
Where a range of channels is specified, the channel numbers to the left and
|
|
right of the hyphen are optional and default to 1 and to the number of input
|
|
channels respectively. Thus
|
|
.EX
|
|
sox input.wav output.wav remix \-
|
|
.EE
|
|
performs a mix-down of all input channels to mono.
|
|
.SP
|
|
By default, where an output channel is mixed from multiple (n) input
|
|
channels, each input channel will be scaled by a factor of \(S1/\s-2n\s+2.
|
|
Custom mixing volumes can be set by following a given input channel or range
|
|
of input channels with a \fIvol-spec\fR (volume specification).
|
|
This is one of the letters \fBp\fR, \fBi\fR, or \fBv\fR,
|
|
followed by a volume number, the meaning of which depends on the given
|
|
letter and is defined as follows:
|
|
.TS
|
|
center;
|
|
lI lI lI
|
|
c l l.
|
|
Letter Volume number Notes
|
|
p power adjust in dB 0 = no change
|
|
i power adjust in dB T{
|
|
.na
|
|
As `p', but invert the audio
|
|
T}
|
|
v voltage multiplier T{
|
|
.na
|
|
1 = no change, 0\*d5 \(~= 6dB attenuation, 2 \(~= 6dB gain, \-1 = invert
|
|
T}
|
|
.TE
|
|
.DT
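.SP
The decibel figures quoted for
.B v
in the table follow from the usual relation for amplitude ratios, namely
20 times the base-10 logarithm of the multiplier. The following sketch
shows the conversion; the helper name \fBmult_to_db\fR is hypothetical and
the multiplier is assumed to be positive.
.EX
```shell
#!/bin/sh
# Convert a positive voltage multiplier to a level change in dB.
mult_to_db() {
    awk -v v="$1" 'BEGIN { printf "%.2f", 20 * log(v) / log(10); print "" }'
}

mult_to_db 0.5    # prints -6.02
mult_to_db 2      # prints 6.02
```
.EE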
|
|
.SP
|
|
If an
|
|
.I out-spec
|
|
includes at least one
|
|
.I vol-spec
|
|
then, by default, \(S1/\s-2n\s+2 scaling is not applied to any other channels in the
|
|
same out-spec (though may be in other out-specs).
|
|
The \fB\-a\fR (automatic)
option, however, can be given to retain the automatic scaling in this
|
|
case. For example,
|
|
.EX
|
|
sox input.wav output.wav remix 1,2 3,4v0.8
|
|
.EE
|
|
results in channel level multipliers of 0\*d5,0\*d5 1,0\*d8, whereas
|
|
.EX
|
|
sox input.wav output.wav remix \-a 1,2 3,4v0.8
|
|
.EE
|
|
results in channel level multipliers of 0\*d5,0\*d5 0\*d5,0\*d8.
|
|
.SP
|
|
The \fB\-m\fR (manual) option disables all automatic volume adjustments, so
|
|
.EX
|
|
sox input.wav output.wav remix \-m 1,2 3,4v0.8
|
|
.EE
|
|
results in channel level multipliers of 1,1 1,0\*d8.
|
|
.SP
|
|
The volume number is optional and omitting it corresponds to no volume
|
|
change; however, the only case in which this is useful is in conjunction
|
|
with
|
|
.BR i .
|
|
For example, if
|
|
.I input.wav
|
|
is stereo, then
|
|
.EX
|
|
sox input.wav output.wav remix 1,2i
|
|
.EE
|
|
is a mono equivalent of the
|
|
.B oops
|
|
effect.
|
|
.SP
|
|
If the \fB\-p\fR option is given, then any automatic \(S1/\s-2n\s+2 scaling
|
|
is replaced by \(S1/\s-2\(srn\s+2 (`power') scaling; this gives a louder mix
|
|
but one that might occasionally clip.
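.SP
To make the difference between the two automatic scalings concrete, the
following sketch prints the per-channel multiplier for a mix of \fIn\fR
input channels. It only illustrates the arithmetic described above; the
helper name \fBmix_scale\fR and its \fBpower\fR argument are assumptions
and not part of SoX.
.EX
```shell
#!/bin/sh
# Per-channel multiplier applied when n inputs are mixed into one output.
# $1 = n, $2 = "power" for 1/sqrt(n) as with -p; otherwise 1/n
mix_scale() {
    awk -v n="$1" -v mode="$2" 'BEGIN {
        if (mode == "power") s = 1 / sqrt(n); else s = 1 / n
        printf "%.4f", s
        print ""
    }'
}

mix_scale 2          # prints 0.5000
mix_scale 2 power    # prints 0.7071
```
.EE
With two full-scale inputs in phase, the `power' multipliers sum to more
than 1, which is why such a mix might occasionally clip.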
|
|
.TS
|
|
center;
|
|
c8 c8 c.
|
|
* * *
|
|
.TE
|
|
.DT
|
|
.SP
|
|
One use of the
|
|
.B remix
|
|
effect is to split an audio file into a set of files, each containing
|
|
one of the constituent channels (in order to perform subsequent
|
|
processing on individual audio channels). Where more than a few
|
|
channels are involved, a script such as the following (Bourne shell
|
|
script) is useful:
|
|
.EX
|
|
#!/bin/sh
|
|
chans=\`soxi \-c "$1"\`
|
|
while [ $chans \-ge 1 ]; do
|
|
chans0=\`printf %02i $chans\` # 2 digits hence up to 99 chans
|
|
out=\`echo "$1"|sed "s/\\(.*\\)\\.\\(.*\\)/\\1\-$chans0.\\2/"\`
|
|
sox "$1" "$out" remix $chans
|
|
chans=\`expr $chans \- 1\`
|
|
done
|
|
.EE
|
|
If a file
|
|
.I input.wav
|
|
containing six audio channels were given, the script would produce six
|
|
output files:
|
|
.IR input-01.wav ,
|
|
\fIinput-02.wav\fR, ...,
|
|
.IR input-06.wav .
|
|
.SP
|
|
See also the \fBswap\fR effect.
|
|
.TP
|
|
\fBrepeat\fR [\fIcount\fR(1)|\fB\-\fR]
|
|
Repeat the entire audio \fIcount\fR times, or once if \fIcount\fR is not given.
|
|
The special value \fB\-\fR requests infinite repetition.
|
|
Requires temporary file space to store the audio to be repeated.
|
|
Note that repeating once yields two copies: the original audio and the
|
|
repeated audio.
|
|
.TP
|
|
\fBreverb\fR [\fB\-w\fR|\fB\-\-wet-only\fR] [\fIreverberance\fR (50%) [\fIHF-damping\fR (50%)
|
|
[\fIroom-scale\fR (100%) [\fIstereo-depth\fR (100%)
|
|
.br
|
|
[\fIpre-delay\fR (0ms) [\fIwet-gain\fR (0dB)]]]]]]
|
|
.SP
|
|
Add reverberation to the audio using the `freeverb' algorithm. A
|
|
reverberation effect is sometimes desirable for concert halls that are too
|
|
small or contain so many people that the hall's natural reverberance is
|
|
diminished. Applying a small amount of stereo reverb to a (dry) mono signal
|
|
will usually make it sound more natural. See [3] for a detailed description
|
|
of reverberation.
|
|
.SP
|
|
Note that this effect
|
|
increases both the volume and the length of the audio, so to prevent clipping
|
|
in these domains, a typical invocation might be:
|
|
.EX
|
|
play dry.wav gain \-3 pad 0 3 reverb
|
|
.EE
|
|
The
|
|
.B \-w
|
|
option can be given to select only the `wet' signal, thus allowing it to be
|
|
processed further, independently of the `dry' signal. E.g.
|
|
.EX
|
|
play \-m voice.wav "|sox voice.wav \-p reverse reverb \-w reverse"
|
|
.EE
|
|
for a reverse reverb effect.
|
|
.TP
|
|
\fBreverse\fR
|
|
Reverse the audio completely.
|
|
Requires temporary file space to store the audio to be reversed.
|
|
.TP
|
|
\fBriaa\fR
|
|
Apply RIAA vinyl playback equalisation.
|
|
The sampling rate must be one of: 44\*d1, 48, 88\*d2, 96 kHz.
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.TP
|
|
\fBsilence \fR[\fB\-l\fR] \fIabove-periods\fR [\fIduration threshold\fR[\fBd\fR\^|\^\fB%\fR]
|
|
[\fIbelow-periods duration threshold\fR[\fBd\fR\^|\^\fB%\fR]]]
|
|
.SP
|
|
Removes silence from the beginning, middle, or end of the audio.
|
|
`Silence' is determined by a specified threshold.
|
|
.SP
|
|
The \fIabove-periods\fR value is used to indicate if audio should be
|
|
trimmed at the beginning of the audio. A value of zero indicates no
|
|
silence should be trimmed from the beginning. When specifying a
|
|
non-zero \fIabove-periods\fR, it trims audio up until it finds
|
|
non-silence. Normally, when trimming silence from the beginning of the audio,
|
|
the \fIabove-periods\fR will be 1 but it can be increased to higher
|
|
values to trim all audio up to a specific count of non-silence
|
|
periods. For example, if you had an audio file with two songs that
|
|
each contained 2 seconds of silence before the song, you could specify
|
|
an \fIabove-periods\fR of 2 to strip out both silence periods and the
|
|
first song.
|
|
.SP
|
|
When \fIabove-periods\fR is non-zero, you must also specify a
|
|
\fIduration\fR and \fIthreshold\fR. \fIduration\fR indicates the
|
|
amount of time that non-silence must be detected before it stops
|
|
trimming audio. By increasing the duration, bursts of noise can be
|
|
treated as silence and trimmed off.
|
|
.SP
|
|
\fIthreshold\fR is used to indicate what sample value you should treat as
|
|
silence. For digital audio, a value of 0 may be fine but for audio
|
|
recorded from analog, you may wish to increase the value to account
|
|
for background noise.
|
|
.SP
|
|
When optionally trimming silence from the end of the audio, you specify
|
|
a \fIbelow-periods\fR count. In this case, \fIbelow-periods\fR means
|
|
to remove all audio after silence is detected.
|
|
Normally, this will be a value of 1 but it can
|
|
be increased to skip over periods of silence that are wanted. For example,
|
|
if you have a song with 2 seconds of silence in the middle and 2 seconds
at the end, you could set \fIbelow-periods\fR to a value of 2 to skip over the
|
|
silence in the middle of the audio.
|
|
.SP
|
|
For \fIbelow-periods\fR, \fIduration\fR specifies a period of silence
|
|
that must exist before audio is not copied any more. By specifying
|
|
a higher duration, silence that is wanted can be left in the audio.
|
|
For example, if you have a song with an expected 1 second of silence
|
|
in the middle and 2 seconds of silence at the end, a duration of 2
|
|
seconds could be used to skip over the middle silence.
|
|
.SP
|
|
Unfortunately, you must know the length of the silence at the
|
|
end of your audio file to trim off silence reliably. A workaround is
|
|
to use the \fBsilence\fR effect in combination with the \fBreverse\fR effect.
|
|
By first reversing the audio, you can use the \fIabove-periods\fR
|
|
to reliably trim all audio from what looks like the front of the file.
|
|
Then reverse the file again to get back to normal.
|
|
.SP
|
|
To remove silence from the middle of a file, specify a
|
|
\fIbelow-periods\fR that is negative. This value is then
|
|
treated as a positive value and is also used to indicate that the
|
|
effect should restart processing as specified by the
|
|
\fIabove-periods\fR, making it suitable for removing periods of
|
|
silence in the middle of the audio.
|
|
.SP
|
|
The option
|
|
.B \-l
|
|
indicates that \fIbelow-periods\fR \fIduration\fR length of audio
|
|
should be left intact at the beginning of each period of silence.
|
|
This is useful, for example, if you want to shorten long pauses between words
without removing the pauses completely.
|
|
.SP
|
|
\fIduration\fR is a time specification with the peculiarity that a bare
|
|
number is interpreted as a sample count, not as a number of seconds.
|
|
For specifying seconds, either use the \fBt\fR suffix (as in `2t') or
|
|
specify minutes, too (as in `0:02').
|
|
.SP
|
|
\fIthreshold\fR numbers may be suffixed with
|
|
.B d
|
|
to indicate the value is in decibels, or
|
|
.B %
|
|
to indicate a percentage of maximum value of the sample value
|
|
(\fB0%\fR specifies pure digital silence).
|
|
.SP
|
|
The following example shows how this effect can be used to start a recording
|
|
that does not contain the delay at the start which usually occurs between
|
|
`pressing the record button' and the start of the performance:
|
|
.EX
|
|
rec \fIparameters filename other-effects\fR silence 1 5 2%
|
|
.EE
|
|
.na
|
|
.TP
|
|
\fBsinc\fR [\fB\-a\fI att\fR\^|\^\fB\-b\fI beta\fR] [\fB\-p\fI phase\fR\^|\^\fB\-M\fR\^|\^\fB\-I\fR\^|\^\fB\-L\fR] \:[\fB\-t\fI tbw\fR\^|\^\fB\-n\fI taps\fR] [\fIfreqHP\fR]\:[\fB\-\fIfreqLP\fR [\fB\-t\fI tbw\fR\^|\^\fB\-n\fI taps\fR]]
|
|
.ad
|
|
Apply a sinc kaiser-windowed low-pass, high-pass, band-pass, or band-reject filter
|
|
to the signal.
|
|
The \fIfreqHP\fR and \fIfreqLP\fR parameters give the frequencies of the
|
|
6dB points of a high-pass and low-pass filter that may be invoked
|
|
individually, or together. If both are
|
|
given, then \fIfreqHP\fR less than \fIfreqLP\fR creates a band-pass filter,
|
|
\fIfreqHP\fR greater than \fIfreqLP\fR creates a band-reject filter.
|
|
For example, the invocations
|
|
.EX
|
|
sinc 3k
|
|
sinc \-4k
sinc 3k\-4k
sinc 4k\-3k
|
|
.EE
|
|
create a high-pass, low-pass, band-pass, and band-reject filter
|
|
respectively.
|
|
.SP
|
|
The default stop-band attenuation of 120dB can be overridden with
|
|
\fB\-a\fR; alternatively, the kaiser-window `beta' parameter can be
|
|
given directly with \fB\-b\fR.
|
|
.SP
|
|
The default transition band-width of 5% of the total band can be
|
|
overridden with \fB\-t\fR (and \fItbw\fR in Hertz); alternatively, the
|
|
number of filter taps can be given directly with \fB\-n\fR.
|
|
.SP
|
|
If both \fIfreqHP\fR and \fIfreqLP\fR are given, then a \fB\-t\fR or
|
|
\fB\-n\fR option given to the left of the frequencies applies to both
|
|
frequencies; one of these options given to the right of the frequencies
|
|
applies only to \fIfreqLP\fR.
|
|
.SP
|
|
The
|
|
.BR \-p ,
|
|
.BR \-M ,
|
|
.BR \-I ,
|
|
and
|
|
.B \-L
|
|
options control the filter's phase response; see the \fBrate\fR effect
|
|
for details.
|
|
.SP
|
|
This effect supports the \fB\-\-plot\fR global option.
|
|
.TP
|
|
\fBspectrogram \fR[\fIoptions\fR]
|
|
Create a spectrogram of the audio; the audio is passed unmodified
|
|
through the SoX processing chain. This effect is optional\*mtype
|
|
\fBsox \-\-help\fR and check the list of supported effects to see if
|
|
it has been included.
|
|
.SP
|
|
The spectrogram is rendered in a Portable Network Graphic (PNG) file,
|
|
and shows time in the X-axis, frequency in the Y-axis, and audio
|
|
signal magnitude in the Z-axis. Z-axis values are represented by the
|
|
colour (or optionally the intensity) of the pixels in the X-Y plane.
|
|
If the audio signal contains multiple channels then these are shown
|
|
from top to bottom starting from channel 1 (which is the left channel
|
|
for stereo audio).
|
|
.SP
|
|
For example, if `my.wav' is a stereo file, then with
|
|
.EX
|
|
sox my.wav \-n spectrogram
|
|
.EE
|
|
a spectrogram of the entire file will be created in the file
|
|
`spectrogram.png'. More often though, analysis of a smaller portion
|
|
of the audio is required; e.g. with
|
|
.EX
|
|
sox my.wav \-n remix 2 trim 20 30 spectrogram
|
|
.EE
|
|
the spectrogram shows information only from the second (right)
|
|
channel, and of thirty seconds of audio starting from twenty seconds
|
|
in. To analyse a small portion of the frequency domain, the
|
|
.B rate
|
|
effect may be used, e.g.
|
|
.EX
|
|
sox my.wav \-n rate 6k spectrogram
|
|
.EE
|
|
allows detailed analysis of frequencies up to 3kHz (half the sampling
|
|
rate) i.e. where the human auditory system is most sensitive.
|
|
With
|
|
.EX
|
|
sox my.wav \-n trim 0 10 spectrogram \-x 600 \-y 200 \-z 100
|
|
.EE
|
|
the given options control the size of the spectrogram's X, Y & Z axes
|
|
(in this case, the spectrogram area of the produced image will be 600
|
|
by 200 pixels in size and the Z-axis range will be 100 dB). Note that
|
|
the produced image includes axes legends etc. and so will be a little
|
|
larger than the specified spectrogram size. In this example:
|
|
.EX
|
|
sox \-n \-n synth 6 tri 10k:14k spectrogram \-z 100 \-w kaiser
|
|
.EE
|
|
an analysis `window' with high dynamic range is selected to best
|
|
display the spectrogram of a swept triangular wave. For a similar
|
|
example, append the following to the `chime' command in the
|
|
description of the
|
|
.B delay
|
|
effect (above):
|
|
.EX
|
|
rate 2k spectrogram \-X 200 \-Z \-10 \-w kaiser
|
|
.EE
|
|
Options are also available to control the appearance (colour-set,
|
|
brightness, contrast, etc.) and filename of the spectrogram; e.g. with
|
|
.EX
|
|
sox my.wav \-n spectrogram \-m \-l \-o print.png
|
|
.EE
|
|
a spectrogram is created suitable for printing on a `black and white'
|
|
printer.
|
|
.SP
|
|
.I Options:
|
|
.RS
|
|
.IP \fB\-x\ \fInum\fR
|
|
Change the (maximum) width (X-axis) of the spectrogram from its default
|
|
value of 800 pixels to a given number between 100 and 200000.
|
|
See also \fB\-X\fR and \fB\-d\fR.
|
|
.IP \fB\-X\ \fInum\fR
|
|
X-axis pixels/second; the default is auto-calculated to fit the given
|
|
or known audio duration to the X-axis size, or 100 otherwise. If
|
|
given in conjunction with \fB\-d\fR, this option affects the width of
|
|
the spectrogram; otherwise, it affects the duration of the
|
|
spectrogram.
|
|
.I num
|
|
can be from 1 (low time resolution) to 5000 (high time resolution)
|
|
and need not be an integer. SoX
|
|
may make a slight adjustment to the given number for processing
|
|
quantisation reasons; if so, SoX will report the actual number used
|
|
(viewable when the SoX global option
|
|
.B \-V
|
|
is in effect).
|
|
See also \fB\-x\fR and \fB\-d\fR.
|
|
.IP \fB\-y\ \fInum\fR
|
|
Sets the Y-axis size in pixels (per channel); this is the number of
|
|
frequency `bins' used in the Fourier analysis that produces the
|
|
spectrogram. N.B. it can be slow to produce the spectrogram if this
|
|
number is not one more than a power of two (e.g. 129). By default the
|
|
Y-axis size is chosen automatically (depending on the number of
|
|
channels). See
|
|
.B \-Y
|
|
for an alternative way of setting spectrogram height.
|
|
.IP \fB\-Y\ \fInum\fR
|
|
Sets the target total height of the spectrogram(s). The default value
|
|
is 550 pixels. Using this option (and by default), SoX will choose a
|
|
height for individual spectrogram channels that is one more than a
|
|
power of two, so the actual total height may fall short of the given
|
|
number. However, there is also a minimum height per channel so if
|
|
there are many channels, the number may be exceeded.
|
|
See
|
|
.B \-y
|
|
for an alternative way of setting spectrogram height.
|
|
.IP \fB\-z\ \fInum\fR
|
|
Z-axis (colour) range in dB, default 120. This sets the dynamic-range
|
|
of the spectrogram to be \-\fInum\fR\ dBFS to 0\ dBFS.
|
|
.I Num
|
|
may range from 20 to 180. Decreasing dynamic-range effectively
|
|
increases the `contrast' of the spectrogram display, and vice versa.
|
|
.IP \fB\-Z\ \fInum\fR
|
|
Sets the upper limit of the Z-axis in dBFS.
|
|
A negative
|
|
.I num
|
|
effectively increases the `brightness' of the spectrogram display,
|
|
and vice versa.
|
|
.IP \fB\-q\ \fInum\fR
|
|
Sets the Z-axis quantisation, i.e. the number of different colours (or
|
|
intensities) in which to render Z-axis
|
|
values. A small number (e.g. 4) will give a `poster'-like effect making
|
|
it easier to discern magnitude bands of similar level. Small numbers
|
|
also usually
|
|
result in small PNG files. The number given specifies the number of
|
|
colours to use inside the Z-axis range; two colours are reserved to
|
|
represent out-of-range values.
|
|
.IP \fB\-w\ \fIname\fR
|
|
Window: Hann (default), Hamming, Bartlett, Rectangular, Kaiser or Dolph. The
|
|
spectrogram is produced using the Discrete Fourier Transform (DFT)
|
|
algorithm. A significant parameter to this algorithm is the choice of
|
|
`window function'. By default, SoX uses the Hann window which has good
|
|
all-round frequency-resolution and dynamic-range properties. For better
|
|
frequency resolution (but lower dynamic-range), select a Hamming window;
|
|
for higher dynamic-range (but poorer frequency-resolution), select a
|
|
Dolph window. Kaiser, Bartlett and Rectangular windows are also available.
|
|
.IP \fB\-W\ \fInum\fR
|
|
Window adjustment parameter. This can be used to make small
|
|
adjustments to the Kaiser or Dolph window shape. A positive number (up to
|
|
ten) increases its dynamic range, a negative number decreases it.
|
|
.IP \fB\-s\fR
|
|
Allow slack overlapping of DFT windows.
|
|
This can, in some cases, increase image sharpness and give greater adherence
|
|
to the
|
|
.B \-x
|
|
value, but at the expense of a little spectral loss.
|
|
.IP \fB\-m\fR
|
|
Creates a monochrome spectrogram (the default is colour).
|
|
.IP \fB\-h\fR
|
|
Selects a high-colour palette\*mless visually pleasing than the default
|
|
colour palette, but it may make it easier to differentiate different levels.
|
|
If this option is used in conjunction with
|
|
.BR \-m ,
|
|
the result will be a hybrid monochrome/colour palette.
|
|
.IP \fB\-p\ \fInum\fR
|
|
Permute the colours in a colour or hybrid palette.
|
|
The
|
|
.I num
|
|
parameter, from 1 (the default) to 6, selects the permutation.
|
|
.IP \fB\-l\fR
|
|
Creates a `printer friendly' spectrogram with a light background (the
|
|
default has a dark background).
|
|
.IP \fB\-a\fR
|
|
Suppress the display of the axis lines. This is sometimes useful in
|
|
helping to discern artefacts at the spectrogram edges.
|
|
.IP \fB\-r\fR
|
|
Raw spectrogram: suppress the display of axes and legends.
|
|
.IP \fB\-A\fR
|
|
Selects an alternative, fixed colour-set. This is provided only for
|
|
compatibility with spectrograms produced by another package. It should
|
|
not normally be used as it has some problems, not least, a lack of
|
|
differentiation at the bottom end which results in masking of low-level
|
|
artefacts.
|
|
.IP \fB\-t\ \fItext\fR
|
|
Set the image title\*mtext to display above the spectrogram.
|
|
.IP \fB\-c\ \fItext\fR
|
|
Set (or clear) the image comment\*mtext to display below and to the
|
|
left of the spectrogram.
|
|
.IP \fB\-o\ \fIfile\fR
|
|
Name of the spectrogram output PNG file, default `spectrogram.png'.
|
|
If `-' is given, the spectrogram will be sent to standard output
|
|
(stdout).
|
|
.RE
|
|
.TP
|
|
\
|
|
.I Advanced Options:
|
|
.br
|
|
In order to process a smaller section of audio without affecting other
|
|
effects or the output signal (unlike when the
|
|
.B trim
|
|
effect is used), the following options may be used.
|
|
.RS
|
|
.IP \fB\-d\ \fIduration\fR
|
|
This option sets the X-axis resolution such that audio with the given
|
|
.I duration
|
|
(a time specification) fits the selected (or default) X-axis width. For
|
|
example,
|
|
.EX
|
|
sox input.mp3 \-n spectrogram \-d 1:00 stats
|
|
.EE
|
|
creates a spectrogram showing the first minute of the audio, whilst
|
|
the
|
|
.B stats
|
|
effect is applied to the entire audio signal.
|
|
.SP
|
|
See also
|
|
.B \-X
|
|
for an alternative way of setting the X-axis resolution.
|
|
.IP \fB\-S\ \fIposition(=)\fR
|
|
Start the spectrogram at the given point in the audio stream. For
|
|
example
|
|
.EX
|
|
sox input.aiff output.wav spectrogram \-S 1:00
|
|
.EE
|
|
creates a spectrogram showing all but the first minute of the audio
|
|
(the output file, however, receives the entire audio stream).
|
|
.RE
|
|
.TP
|
|
\
|
|
For the ability to perform off-line processing of spectral data, see the
|
|
.B stat
|
|
effect.
|
|
.TP
|
|
\fBspeed \fIfactor\fR[\fBc\fR]
|
|
Adjust the audio speed (pitch and tempo together). \fIfactor\fR
|
|
is either the ratio of the new speed to the old speed: greater
|
|
than 1 speeds up, less than 1 slows down, or, if appended with the
|
|
letter
|
|
`c', the number of cents (i.e. 100ths of a semitone) by
|
|
which the pitch (and tempo) should be adjusted: greater than 0
|
|
increases, less than 0 decreases.
|
|
.SP
|
|
Technically, the speed effect only changes the sample rate information,
|
|
leaving the samples themselves untouched. The \fBrate\fR effect is invoked
|
|
automatically to resample to the output sample rate, using its default
|
|
quality/speed. For higher quality or higher speed
|
|
resampling, in addition to the \fBspeed\fR effect, specify
|
|
the \fBrate\fR effect with the desired quality option.
|
|
.SP
|
|
See also the \fBbend\fR, \fBpitch\fR,
|
|
and
|
|
.B tempo
|
|
effects.
|
|
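.SP
As an illustrative sketch (not taken from the SoX documentation itself), the
two forms of \fIfactor\fR are related in the usual way: a change of
\fIc\fR cents corresponds to a speed ratio of 2 raised to the power
\fIc\fR/1200. Assuming a POSIX
.B awk
is available, the ratio for 100 cents (one semitone) can be computed with:
.EX
awk "BEGIN { print exp(log(2) * 100 / 1200) }"
.EE
which prints 1.05946, so the following two commands (with placeholder file
names) should give very nearly the same result:
.EX
sox in.wav up1.wav speed 100c
sox in.wav up2.wav speed 1.05946
.EE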
.TP
|
|
\fBsplice \fR [\fB\-h\fR\^|\^\fB\-t\fR\^|\^\fB\-q\fR] { \fIposition(=)\fR[\fB,\fIexcess\fR[\fB,\fIleeway\fR]] }
|
|
Splice together audio sections. This effect provides two things over
|
|
simple audio concatenation: a (usually short) cross-fade is applied at
|
|
the join, and a wave similarity comparison is made to help determine the
|
|
best place at which to make the join.
|
|
.SP
|
|
One of the options
|
|
.BR \-h ,
|
|
.BR \-t ,
|
|
or
|
|
.B \-q
|
|
may be given to select the fade envelope as half-cosine wave (the default),
|
|
triangular (a.k.a. linear), or quarter-cosine wave respectively.
|
|
.TS
|
|
center;
|
|
cI lI lI lI
|
|
cB l l l.
|
|
Type Audio Fade level Transitions
|
|
t correlated constant gain abrupt
|
|
h correlated constant gain smooth
|
|
q uncorrelated constant power smooth
|
|
.TE
|
|
.DT
|
|
.SP
|
|
To perform a splice, first use the
|
|
.B trim
|
|
effect to select the audio sections to be joined together. As when
|
|
performing a tape splice, the end of the section to be spliced onto
|
|
should be trimmed with a small
|
|
.I excess
|
|
(default 0\*d005 seconds) of audio after the ideal joining point. The
|
|
beginning of the audio section to splice on should be trimmed with the
|
|
same
|
|
.IR excess
|
|
(before the ideal joining point), plus an additional
|
|
.I leeway
|
|
(default 0\*d005 seconds). Any time specification may be used for these
|
|
parameters. SoX should then be invoked with the two
|
|
audio sections as input files and the
|
|
.B splice
|
|
effect given with the position at which to perform the splice\*mthis is
|
|
the length of the first audio section (including the excess).
|
|
.SP
|
|
The following diagram uses the tape analogy to illustrate the splice
|
|
operation. The effect simulates the diagonal cuts and joins the two pieces:
|
|
.EX
|
|
|
|
length1 excess
|
|
-----------><--->
|
|
_________ : : _________________
|
|
\\ : : :\\ `
|
|
\\ : : : \\ `
|
|
\\: : : \\ `
|
|
* : : * - - *
|
|
\\ : : :\\ `
|
|
\\ : : : \\ `
|
|
_______________\\: : : \\_____`____
|
|
: : : :
|
|
<---> <----->
|
|
excess leeway
|
|
|
|
.EE
|
|
where * indicates the joining points.
|
|
.SP
|
|
For example, a long song begins with two verses which start (as
|
|
determined e.g. by using the
|
|
.B play
|
|
command with the
|
|
.B trim
|
|
(\fIstart\fR) effect) at times 0:30\*d125 and 1:03\*d432.
|
|
The following commands cut out the first verse:
|
|
.EX
|
|
sox too-long.wav part1.wav trim 0 30.130
|
|
.EE
|
|
(5 ms excess, after the first verse starts)
|
|
.EX
|
|
sox too-long.wav part2.wav trim 1:03.422
|
|
.EE
|
|
(5 ms excess plus 5 ms leeway, before the second verse starts)
|
|
.EX
|
|
sox part1.wav part2.wav just-right.wav splice 30.130
|
|
.EE
|
|
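The trim points used above follow directly from the two verse start times
and the default excess and leeway. As a sketch of the arithmetic (assuming a
POSIX
.B awk
is available; the variable names e and l are only illustrative):
.EX
awk "BEGIN { e = 0.005; l = e; print 30.125 + e, 63.432 \- e \- l }"
.EE
prints 30.13 63.422, i.e. the 30.130 and 1:03.422 given to
.B trim
above; the first of these is also the position given to
.BR splice .
.SP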
For another example, the SoX command
|
|
.EX
|
|
play "|sox \-n \-p synth 1 sin %1" "|sox \-n \-p synth 1 sin %3"
|
|
.EE
|
|
generates and plays two notes, but there is a nasty click at the
|
|
transition; the click can be removed by splicing instead of
|
|
concatenating the audio, i.e. by appending \fBsplice 1\fR to the
|
|
command. (Clicks at the beginning and end of the audio can be removed by
|
|
\fIpreceding\fR the splice effect with \fBfade q .01 2 .01\fR).
|
|
.SP
|
|
Provided your arithmetic is good enough, multiple splices can be
|
|
performed with a single
|
|
.B splice
|
|
invocation. For example:
|
|
.EX
|
|
#!/bin/sh
|
|
# Audio Copy and Paste Over
|
|
# acpo infile copy-start copy-stop paste-over-start outfile
|
|
# No chained time specifications allowed for the parameters
|
|
# (i.e. such that contain +/\-).
|
|
e=0.005 # Using default excess
|
|
l=$e # and leeway.
|
|
sox "$1" piece.wav trim $2\-$e\-$l =$3+$e
|
|
sox "$1" part1.wav trim 0 $4+$e
|
|
sox "$1" part2.wav trim $4+$3\-$2\-$e\-$l
|
|
sox part1.wav piece.wav part2.wav "$5" \\
|
|
splice $4+$e +$3\-$2+$e+$l+$e
|
|
.EE
|
|
In the above Bourne shell script,
|
|
two splices are used to `copy and paste' audio.
|
|
.TS
|
|
center;
|
|
c8 c8 c.
|
|
* * *
|
|
.TE
|
|
.DT
|
|
.SP
|
|
It is also possible to use this effect to perform general cross-fades,
|
|
e.g. to join two songs. In this case,
|
|
.I excess
|
|
would typically be a number of seconds, the
|
|
.B \-q
|
|
option would typically be given (to select an `equal power' cross-fade), and
|
|
.I leeway
|
|
should be zero (which is the default if
|
|
.B \-q
|
|
is given). For example, if f1.wav and f2.wav are audio files
|
|
to be cross-faded, then
|
|
.EX
|
|
sox f1.wav f2.wav out.wav splice \-q $(soxi \-D f1.wav),3
|
|
.EE
|
|
cross-fades the files where the point of equal loudness is 3 seconds
|
|
before the end of f1.wav, i.e. the total length of the cross-fade is
|
|
2 \(mu 3 = 6 seconds (Note: the $(...) notation is POSIX shell).
|
|
.TP
|
|
\fBstat\fR [\fB\-s \fIscale\fR] [\fB\-rms\fR] [\fB\-freq\fR] [\fB\-v\fR] [\fB\-d\fR]
|
|
Display time and frequency domain statistical information about the audio.
|
|
Audio is passed unmodified through the SoX processing chain.
|
|
.SP
|
|
The information is output to the `standard error' (stderr) stream and is
|
|
calculated, where
|
|
.I n
|
|
is the duration of the audio in samples,
|
|
.I c
|
|
is the number of audio channels,
|
|
.I r
|
|
is the audio sample rate, and
|
|
.I x\s-2\dk\u\s0
|
|
represents the PCM value (in the range \-1 to +1 by default) of each successive
|
|
sample in the audio,
|
|
as follows:
|
|
.TS
|
|
center;
|
|
lI l l.
|
|
Samples read \fIn\fR\^\(mu\^\fIc\fR \
|
|
Length (seconds) \fIn\fR\^\(di\^\fIr\fR
|
|
Scaled by \ See \-s below.
|
|
Maximum amplitude max(\fIx\s-2\dk\u\s0\fR) T{
|
|
The maximum sample value in the audio; usually this will be a positive number.
|
|
T}
|
|
Minimum amplitude min(\fIx\s-2\dk\u\s0\fR) T{
|
|
The minimum sample value in the audio; usually this will be a negative number.
|
|
T}
|
|
Midline amplitude \(12\^min(\fIx\s-2\dk\u\s0\fR)\^+\^\(12\^max(\fIx\s-2\dk\u\s0\fR)
|
|
Mean norm \(S1/\s-2n\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\(br\^ T{
|
|
The average of the absolute value of each sample in the audio.
|
|
T}
|
|
Mean amplitude \(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR T{
|
|
The average of each sample in the audio. If this figure is non-zero, then it indicates the
|
|
presence of a D.C. offset (which could be removed using the
|
|
.B dcshift
|
|
effect).
|
|
T}
|
|
RMS amplitude \(sr(\(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR\(S2) T{
|
|
The level of a D.C. signal that would have the same power
|
|
as the audio's average power.
|
|
T}
|
|
Maximum delta max(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
|
|
Minimum delta min(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
|
|
Mean delta \(S1/\s-2n\-1\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^
|
|
RMS delta \(sr(\(S1/\s-2n\-1\s+2\^\(*S\^(\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR)\(S2)
|
|
Rough frequency \ In Hz.
|
|
Volume Adjustment \ T{
|
|
The parameter to the
|
|
.B vol
|
|
effect which would make the audio as loud as possible without clipping.
|
|
Note: See the discussion on
|
|
.B Clipping
|
|
above for reasons why it is rarely a good idea actually to do this.
|
|
T}
|
|
.TE
|
|
.DT
|
|
.SP
|
|
Note that the delta measurements are not applicable for multi-channel audio.
|
|
.SP
|
|
The
|
|
.B \-s
|
|
option can be used to scale the input data by a given factor.
|
|
The default value of
|
|
.I scale
|
|
is 2147483647 (i.e. the maximum value of a 32-bit signed integer).
|
|
Internal effects
|
|
always work with signed long PCM data and so the value should relate to this
|
|
fact.
|
|
.SP
|
|
The
|
|
.B \-rms
|
|
option will convert all output average values to `root mean square'
|
|
format.
|
|
.SP
|
|
The
|
|
.B \-v
|
|
option displays only the `Volume Adjustment' value.
|
|
.SP
|
|
The
|
|
.B \-freq
|
|
option calculates the input's power spectrum (4096 point DFT) instead of the
|
|
statistics listed above. This should only be used with a single channel
|
|
audio file.
|
|
.SP
|
|
The
|
|
.B \-d
|
|
option
|
|
displays a hex dump of the 32-bit signed PCM
|
|
audio data in SoX's internal buffer.
|
|
This is mainly used to help track down endian problems that
|
|
sometimes occur in cross-platform versions of SoX.
|
|
.SP
|
|
See also the
|
|
.B stats
|
|
effect.
|
|
.TP
|
|
\fBstats\fR [\fB\-b \fIbits\fR\^|\^\fB\-x \fIbits\fR\^|\^\fB\-s \fIscale\fR] [\fB\-w \fIwindow-time\fR]
|
|
Display time domain statistical information about the audio channels;
|
|
audio is passed unmodified through the SoX processing chain.
|
|
Statistics are calculated and displayed for each audio channel and,
|
|
where applicable, an overall figure is also given.
|
|
.SP
|
|
For example, for a typical well-mastered stereo music file:
|
|
.TS
|
|
center;
|
|
l.
|
|
.ft CW
|
|
Overall Left Right
|
|
DC offset 0.000803 \-0.000391 0.000803
|
|
Min level \-0.750977 \-0.750977 \-0.653412
|
|
Max level 0.708801 0.708801 0.653534
|
|
Pk lev dB \-2.49 \-2.49 \-3.69
|
|
RMS lev dB \-19.41 \-19.13 \-19.71
|
|
RMS Pk dB \-13.82 \-13.82 \-14.38
|
|
RMS Tr dB \-85.25 \-85.25 \-82.66
|
|
Crest factor \- 6.79 6.32
|
|
Flat factor 0.00 0.00 0.00
|
|
Pk count 2 2 2
|
|
Bit-depth 16/16 16/16 16/16
|
|
Num samples 7.72M
|
|
Length s 174.973
|
|
Scale max 1.000000
|
|
Window s 0.050
|
|
.ft R
|
|
.TE
|
|
.DT
|
|
.SP
|
|
.IR DC\ offset ,
|
|
.IR Min\ level ,
|
|
and
|
|
.I Max\ level
|
|
are shown, by default, in the range \(+-1.
|
|
If the
|
|
.B \-b
|
|
(bits) option is given, then these three measurements will be scaled to a signed integer
|
|
with the given number of bits; for example, for 16 bits, the scale would be \-32768 to +32767.
|
|
The
|
|
.B \-x
|
|
option behaves the same way as
|
|
.B \-b
|
|
except that the signed integer values are displayed in hexadecimal.
|
|
The
|
|
.B \-s
|
|
option scales the three measurements by a given floating-point number.
|
|
.SP
|
|
.I Pk\ lev\ dB
|
|
and
|
|
.I RMS\ lev\ dB
|
|
are standard peak and RMS level measured in dBFS.
|
|
.I RMS\ Pk\ dB
|
|
and
|
|
.I RMS\ Tr\ dB
|
|
are peak and trough values for RMS level measured over a short window (default 50ms).
|
|
.SP
|
|
.I Crest\ factor
|
|
is the standard ratio of peak to RMS level (note: not in dB).
|
|
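.SP
The figures in the example table above are mutually consistent, which gives a
way to check one's reading of them. As a sketch (assuming a POSIX
.B awk
is available), the left channel's
.I Pk\ lev\ dB
is 20 times the base-10 logarithm of its largest sample magnitude (0.750977),
and its
.I Crest\ factor
is the difference between
.I Pk\ lev\ dB
and
.I RMS\ lev\ dB
converted back from dB to a ratio:
.EX
awk "BEGIN { print 20 * log(0.750977) / log(10) }"
awk "BEGIN { print exp(log(10) * (\-2.49 + 19.13) / 20) }"
.EE
The two results round to \-2.49 and 6.79 respectively, as shown in the table.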
.SP
|
|
.I Flat\ factor
|
|
is a measure of the flatness (i.e. consecutive samples with the same value) of the signal at
|
|
its peak levels (i.e. either
|
|
.IR Min\ level ,
|
|
or
|
|
.IR Max\ level ).
|
|
.I Pk\ count
|
|
is the number of occasions (not the number of samples) that the signal attained either
|
|
.IR Min\ level ,
|
|
or
|
|
.IR Max\ level .
|
|
.SP
|
|
The right-hand
|
|
.I Bit-depth
|
|
figure is the standard definition of bit-depth i.e. bits less
|
|
significant than the given number are fixed at zero. The left-hand
|
|
figure is the number of most significant bits that are fixed at zero (or
|
|
one for negative numbers) subtracted from the right-hand figure (the
|
|
number subtracted is directly related to
|
|
.IR Pk\ lev\ dB ).
|
|
.SP
|
|
For multi-channel audio, an overall figure for each of the above
|
|
measurements is given and derived from the channel figures as follows:
|
|
.IR DC\ offset :
|
|
maximum magnitude;
|
|
.IR Max\ level ,
|
|
.IR Pk\ lev\ dB ,
|
|
.IR RMS\ Pk\ dB ,
|
|
.IR Bit-depth :
|
|
maximum;
|
|
.IR Min\ level ,
|
|
.IR RMS\ Tr\ dB :
|
|
minimum;
|
|
.IR RMS\ lev\ dB ,
|
|
.IR Flat\ factor ,
|
|
.IR Pk\ count :
|
|
average;
|
|
.IR Crest\ factor :
|
|
not applicable.
|
|
.SP
|
|
.I Length\ s
|
|
is the duration in seconds of the audio, and
|
|
.I Num\ samples
|
|
is equal to the sample-rate multiplied by
|
|
.IR Length .
|
|
.I Scale\ max
|
|
is the scaling applied to the first three measurements;
|
|
specifically, it is the maximum value that could apply to
|
|
.IR Max\ level .
|
|
.I Window\ s
|
|
is the length of the window used for the peak and trough RMS measurements.
|
|
.SP
|
|
See also the
|
|
.B stat
|
|
effect.
|
|
.TP
|
|
\fBswap\fR
|
|
Swap stereo channels. If the input is not stereo, pairs of channels are
|
|
swapped, and a possible odd last channel passed through. E.g., for seven
|
|
channels, the output order will be 2, 1, 4, 3, 6, 5, 7.
|
|
.SP
|
|
See also
|
|
.B remix
|
|
for an effect that allows arbitrary channel selection and ordering
|
|
(and mixing).
|
|
.TP
|
|
\fBstretch \fIfactor\fR [\fIwindow fade shift fading\fR]
|
|
Change the audio duration (but not its pitch).
|
|
This effect is broadly equivalent to the
|
|
.B tempo
|
|
effect with (\fIfactor\fR inverted and)
|
|
.I search
|
|
set to zero, so in general, its results are comparatively poor;
|
|
it is retained as it can sometimes out-perform
|
|
.B tempo
|
|
for small
|
|
.IR factor s.
|
|
.SP
|
|
.I factor
|
|
of stretching: >1 lengthen, <1 shorten duration.
|
|
.I window
|
|
size is in ms. Default is 20ms. The
|
|
.I fade
|
|
option can be `lin'.
|
|
.I shift
|
|
ratio, in [0 1]. Default depends on stretch factor. 1
|
|
to shorten, 0\*d8 to lengthen. The
|
|
.I fading
|
|
ratio, in [0 0\*d5]. The default amount of fading depends on
|
|
.I factor
|
|
and \fIshift\fR.
|
|
.SP
|
|
See also the
|
|
.B tempo
|
|
effect.
|
|
.na
|
|
.TP
|
|
\fBsynth\fR [\fB\-j \fIKEY\fR] [\fB\-n\fR] [\fIlen\fR [\fIoff\fR [\fIph\fR [\fIp1\fR [\fIp2\fR [\fIp3\fR]]]]]] {[\fItype\fR] [\fIcombine\fR] \:[[\fB%\fR]\fIfreq\fR[\fBk\fR][\fB:\fR\^|\^\fB+\fR\^|\^\fB/\fR\^|\^\fB\-\fR[\fB%\fR]\fIfreq2\fR[\fBk\fR]]] [\fIoff\fR [\fIph\fR [\fIp1\fR [\fIp2\fR [\fIp3\fR]]]]]}
|
|
.ad
|
|
This effect can be used to generate fixed or swept frequency audio tones
|
|
with various wave shapes, or to generate wide-band noise of various
|
|
`colours'.
|
|
Multiple synth effects can be cascaded to produce more complex
|
|
waveforms; at each stage it is possible to choose whether the generated
|
|
waveform will be mixed with, or modulated onto
|
|
the output from the previous stage.
|
|
Audio for each channel in a multi-channel audio file can be synthesised
|
|
independently.
|
|
.SP
|
|
Though this effect is used to generate audio, an input file must still
|
|
be given, the characteristics of which will be used to set the
|
|
synthesised audio length, the number of channels, and the sampling rate;
|
|
however, since the input file's audio is not normally needed, a `null
|
|
file' (with the special name \fB\-n\fR) is often given instead (and the
|
|
length specified as a parameter to \fBsynth\fR or by another given
|
|
effect that has an associated length).
|
|
.SP
|
|
For example, the following produces a 3 second, 48kHz,
|
|
audio file containing a sine-wave swept from 300 to 3300\ Hz:
|
|
.EX
|
|
sox \-n output.wav synth 3 sine 300\-3300
|
|
.EE
|
|
and this produces an 8\ kHz version:
|
|
.EX
|
|
sox \-r 8000 \-n output.wav synth 3 sine 300\-3300
|
|
.EE
|
|
Multiple channels can be synthesised by specifying the set of
|
|
parameters shown between braces multiple times;
|
|
the following puts the swept tone in the left channel and adds `brown'
|
|
noise in the right:
|
|
.EX
|
|
sox \-n output.wav synth 3 sine 300\-3300 brownnoise
|
|
.EE
|
|
The following example shows how two synth effects can be cascaded
|
|
to create a more complex waveform:
|
|
.EX
|
|
.ne 2
|
|
play \-n synth 0.5 sine 200\-500 synth 0.5 sine fmod 700\-100
|
|
.EE
|
|
Frequencies can also be given in `scientific' note notation, or, by
|
|
prefixing a `%' character, as a number of semitones relative to
|
|
`middle A' (440\ Hz). For example, the following could be used to
|
|
help tune a guitar's low `E' string:
|
|
.EX
|
|
play \-n synth 4 pluck %\-29
|
|
.EE
|
|
or with a (Bourne shell) loop, the whole guitar:
|
|
.EX
|
|
.ne 2
|
|
for n in E2 A2 D3 G3 B3 E4; do
|
|
play \-n synth 4 pluck $n repeat 2; done
|
|
.EE
|
|
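The `%' notation corresponds to the usual equal-temperament relation, in which
each semitone multiplies the frequency by the twelfth root of two. As a sketch
(assuming a POSIX
.B awk
is available), the frequency selected by %\-29 can be estimated with:
.EX
awk "BEGIN { print 440 * exp(log(2) * \-29 / 12) }"
.EE
which prints 82.4069, i.e. about 82.41\ Hz, the nominal pitch of the note E2
used for the first string in the loop above.
.SP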
See the
|
|
.B delay
|
|
effect (above) and the reference to `SoX scripting examples' (below)
|
|
for more
|
|
.B synth
|
|
examples.
|
|
.SP
|
|
.B N.B.
|
|
This effect generates audio at maximum volume (0dBFS), which means that there
|
|
is a high chance of clipping when using the audio subsequently, so
|
|
in many cases, you will want to follow this effect with the \fBgain\fR
|
|
effect to prevent this from happening. (See also
|
|
.B Clipping
|
|
above.)
|
|
Note that, by default, the
|
|
.B synth
|
|
effect incorporates the functionality of \fBgain \-h\fR (see the
|
|
.B gain
|
|
effect for details);
|
|
.BR synth 's
|
|
.B \-n
|
|
option may be given to disable this behaviour.
|
|
.SP
|
|
A detailed description of each
|
|
.B synth
|
|
parameter follows:
|
|
.SP
|
|
\fIlen\fR is the length of audio to synthesise (any time specification);
|
|
a value of 0 indicates that the input length is to be used, which is also the default.
|
|
.SP
|
|
\fItype\fR is one of sine, square, triangle, sawtooth, trapezium, exp,
|
|
[white]noise, tpdfnoise, pinknoise, brownnoise, pluck; default=sine.
|
|
.SP
|
|
\fIcombine\fR is one of create, mix, amod (amplitude modulation), fmod
|
|
(frequency modulation); default=create.
|
|
.SP
|
|
\fIfreq\fR/\fIfreq2\fR are the frequencies at the beginning/end of
|
|
synthesis in Hz or, if preceded with `%', semitones relative to A
|
|
(440\ Hz); alternatively, `scientific' note notation (e.g. E2) may
|
|
be used. The default frequency is 440Hz. By default, the tuning used
|
|
with the note notations is `equal temperament'; the
|
|
.B \-j
|
|
.I KEY
|
|
option selects `just intonation', where
|
|
.I KEY
|
|
is an integer number of semitones relative to A (so for example, \-9
|
|
or 3 selects the key of C), or a note in scientific notation.
|
|
.SP
|
|
If
|
|
.I freq2
|
|
is given, then
|
|
.I len
|
|
must also have been given and the generated tone will be swept between
|
|
the given frequencies. The two given frequencies must be separated by
|
|
one of the characters `:', `+', `/', or `\-'. This character is used to
|
|
specify the sweep function as follows:
|
|
.RS
|
|
.IP \fB:\fR
|
|
Linear: the tone will change by a fixed number of hertz per second.
|
|
.IP \fB+\fR
|
|
Square: a second-order function is used to change the tone.
|
|
.IP \fB/\fR
|
|
Exponential: the tone will change by a fixed number of semitones per second.
|
|
.IP \fB\-\fR
|
|
Exponential: as `/', but initial phase always zero, and stepped (less
|
|
smooth) frequency changes.
|
|
.RE
|
|
.TP
|
|
\
|
|
Not used for noise.
|
|
.SP
|
|
\fIoff\fR is the bias (DC-offset) of the signal in percent; default=0.
|
|
.SP
|
|
\fIph\fR is the phase shift in percentage of 1 cycle; default=0. Not
|
|
used for noise.
|
|
.SP
|
|
\fIp1\fR is the percentage of each cycle that is `on' (square), or
|
|
`rising' (triangle, exp, trapezium); default=50 (square, triangle, exp),
|
|
default=10 (trapezium), or sustain (pluck); default=40.
|
|
.SP
|
|
\fIp2\fR (trapezium): the percentage through each cycle at which `falling'
|
|
begins; default=50. exp: the amplitude in multiples of 2dB; default=50,
|
|
or tone-1 (pluck); default=20.
|
|
.SP
|
|
\fIp3\fR (trapezium): the percentage through each cycle at which `falling'
|
|
ends; default=60, or tone-2 (pluck); default=90.
|
|
.TP
|
|
\fBtempo \fR[\fB\-q\fR] [\fB\-m\fR\^|\^\fB\-s\fR\^|\^\fB\-l\fR] \fIfactor\fR [\fIsegment\fR [\fIsearch\fR [\fIoverlap\fR]]]
|
|
Change the audio playback speed but not its pitch. This effect uses the
|
|
WSOLA algorithm. The audio is chopped up into segments which are then
|
|
shifted in the time domain and overlapped (cross-faded) at points where
|
|
their waveforms are most similar as determined by measurement of `least
|
|
squares'.
|
|
.SP
|
|
By default, linear searches are used to find the best overlapping
|
|
points. If the optional
|
|
.B \-q
|
|
parameter is given, tree searches are used instead. This makes the effect
|
|
work more quickly, but the result may not sound as good. However, if you
|
|
must improve the processing speed, this generally reduces the sound quality
|
|
less than reducing the search or overlap values.
|
|
.SP
|
|
The
|
|
.B \-m
|
|
option is used to optimize default values of segment, search and
|
|
overlap for music processing.
|
|
.SP
|
|
The
|
|
.B \-s
|
|
option is used to optimize default values of segment, search and
|
|
overlap for speech processing.
|
|
.SP
|
|
The
|
|
.B \-l
|
|
option is used to optimize default values of segment, search and
|
|
overlap for `linear' processing that tends to cause more
|
|
noticeable distortion but may be useful when factor is close to 1.
|
|
.SP
|
|
If \-m, \-s, or \-l is specified, the default value of segment will be
|
|
calculated based on factor, while default search and overlap values are
|
|
based on segment. Any values you provide still override these default
|
|
values.
|
|
.SP
|
|
.I factor
|
|
gives the ratio of new tempo to the old tempo, so e.g. 1.1 speeds up the
|
|
tempo by 10%, and 0.9 slows it down by 10%.
|
|
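.SP
Since
.I factor
is a ratio of tempos, the output duration is, to a close approximation, the
input duration divided by
.IR factor .
For instance (a sketch only; the file names are placeholders and the input is
assumed to be a 180 second speech recording), after
.EX
sox speech.wav faster.wav tempo \-s 1.25
.EE
the output should be about 144 seconds long, which can be checked with:
.EX
awk "BEGIN { print 180 / 1.25 }"
.EE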
.SP
|
|
The optional
|
|
.I segment
|
|
parameter selects the algorithm's segment size in milliseconds. If no other
|
|
flags are specified, the default value is 82 and is typically suited to
|
|
making small changes to the tempo of music. For larger changes (e.g. a factor
|
|
of 2), 41\ ms may give a better result. The \-m, \-s, and \-l flags will cause
|
|
the segment default to be automatically adjusted based on factor.
|
|
For example, using \-s (for speech) with a tempo of 1.25 will calculate a
|
|
default segment value of 32.
|
|
.SP
|
|
The optional
|
|
.I search
|
|
parameter gives the audio length in milliseconds over which
|
|
the algorithm will search for overlapping points. If no other
|
|
flags are specified, the default value is 14.68. Larger values use
|
|
more processing time and may or may not produce better results.
|
|
A practical maximum is half the value of segment. Search
|
|
can be reduced to cut processing time at the risk of degrading output
|
|
quality. The \-m, \-s, and \-l flags will cause
|
|
the search default to be automatically adjusted based on segment.
|
|
.SP
|
|
The optional
|
|
.I overlap
|
|
parameter gives the segment overlap length in milliseconds.
|
|
Default value is 12, but \-m, \-s, or \-l flags automatically
|
|
adjust overlap based on segment size. Increasing overlap increases
|
|
processing time and may increase quality. A practical maximum for overlap
|
|
is the value of search, with overlap typically being (at least) a little
|
|
smaller than search.
|
|
.SP
|
|
See also
|
|
.B speed
|
|
for an effect that changes tempo and pitch together,
|
|
.B pitch
|
|
and \fBbend\fR for effects that change pitch only, and
|
|
.B stretch
|
|
for an effect that changes tempo using a different algorithm.
|
|
.TP
|
|
\fBtreble \fIgain\fR [\fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBs\fR\^|\^\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]]
|
|
Apply a treble tone-control effect.
|
|
See the description of the \fBbass\fR effect for details.
|
|
.TP
|
|
\fBtremolo \fIspeed\fR [\fIdepth\fR]
|
|
Apply a tremolo (low frequency amplitude modulation) effect to the audio.
|
|
The tremolo frequency in Hz is given by
|
|
.IR speed ,
|
|
and the depth as a percentage by
|
|
.I depth
|
|
(default 40).
|
|
.TP
|
|
\fBtrim\fR {\fIposition(+)\fR}
|
|
Cuts portions out of the audio. Any number of \fIposition\fRs may be
|
|
given; audio is not sent to the output until the first \fIposition\fR
|
|
is reached. The effect then alternates between copying and discarding
|
|
audio at each \fIposition\fR. Using a value of 0 for the first \fIposition\fR
|
|
parameter allows copying from the beginning of the audio.
|
|
.SP
|
|
For example,
|
|
.EX
|
|
sox infile outfile trim 0 10
|
|
.EE
|
|
will copy the first ten seconds, while
|
|
.EX
|
|
play infile trim 12:34 =15:00 \-2:00
|
|
.EE
|
|
and
|
|
.EX
|
|
play infile trim 12:34 2:26 \-2:00
|
|
.EE
|
|
will both play from 12 minutes 34 seconds into the audio up to 15 minutes into
|
|
the audio (i.e. 2 minutes and 26 seconds long), then resume playing two
|
|
minutes before the end of audio.
|
|
.TP
|
|
\fBupsample\fR [\fIfactor\fR]
|
|
Upsample the signal by an integer factor: \fIfactor\fR\-1 zero-value
|
|
samples are inserted between each pair of input samples. As a result, the
|
|
original spectrum is replicated into the new frequency space (imaging) and
|
|
attenuated. This attenuation can be compensated for by adding
|
|
\fBvol \fIfactor\fR after any further processing. The upsample effect is
|
|
typically used in combination with filtering effects.
|
|
.SP
|
|
For a general resampling effect with anti-imaging, see \fBrate\fR. See
|
|
also \fBdownsample\fR.
|
|
.TP
|
|
\fBvad \fR[\fIoptions\fR]
|
|
Voice Activity Detector. Attempts to trim silence and quiet
|
|
background sounds from the ends of (fairly high resolution
|
|
i.e. 16-bit, 44\-48kHz) recordings of speech. The algorithm currently
|
|
uses a simple cepstral power measurement to detect voice, so may be
|
|
fooled by other things, especially music. The effect can trim only
|
|
from the front of the audio, so in order to trim from the back, the
|
|
.B reverse
|
|
effect must also be used. E.g.
|
|
.EX
|
|
play speech.wav norm vad
|
|
.EE
|
|
to trim from the front,
|
|
.EX
|
|
play speech.wav norm reverse vad reverse
|
|
.EE
|
|
to trim from the back, and
|
|
.EX
|
|
play speech.wav norm vad reverse vad reverse
|
|
.EE
|
|
to trim from both ends. The use of the
|
|
.B norm
|
|
effect is recommended, but remember that neither
|
|
.B reverse
|
|
nor
|
|
.B norm
|
|
is suitable for use with streamed audio.
|
|
.SP
|
|
.I Options:
|
|
.br
|
|
Default values are shown in parentheses.
|
|
.RS
|
|
.IP \fB\-t\ \fInum\fR\ (7)
|
|
The measurement level used to trigger activity detection. This might
|
|
need to be changed depending on the noise level, signal level and
|
|
other characteristics of the input audio.
|
|
.IP \fB\-T\ \fInum\fR\ (0.25)
|
|
The time constant (in seconds) used to help ignore short bursts of
|
|
sound.
|
|
.IP \fB\-s\ \fInum\fR\ (1)
|
|
The amount of audio (in seconds) to search for quieter/shorter bursts
|
|
of audio to include prior to the detected trigger point.
|
|
.IP \fB\-g\ \fInum\fR\ (0.25)
|
|
Allowed gap (in seconds) between quieter/shorter bursts of audio to
|
|
include prior to the detected trigger point.
|
|
.IP \fB\-p\ \fInum\fR\ (0)
|
|
The amount of audio (in seconds) to preserve before the trigger point
|
|
and any found quieter/shorter bursts.
|
|
.RE
|
|
.TP
|
|
\
|
|
.I Advanced Options:
|
|
.br
|
|
These allow fine tuning of the algorithm's internal parameters.
|
|
.RS
|
|
.IP \fB\-b\ \fInum\fR
|
|
The algorithm (internally) uses adaptive noise estimation/reduction in
|
|
order to detect the start of the wanted audio. This option sets the
|
|
time for the initial noise estimate.
|
|
.IP \fB\-N\ \fInum\fR
|
|
Time constant used by the adaptive noise estimator for when the noise
|
|
level is increasing.
|
|
.IP \fB\-n\ \fInum\fR
|
|
Time constant used by the adaptive noise estimator for when the noise
|
|
level is decreasing.
|
|
.IP \fB\-r\ \fInum\fR
|
|
Amount of noise reduction to use in the detection algorithm (e.g. 0,
|
|
0.5, ...).
|
|
.IP \fB\-f\ \fInum\fR
|
|
Frequency of the algorithm's processing/measurements.
|
|
.IP \fB\-m\ \fInum\fR
|
|
Measurement duration; by default, twice the measurement period; i.e.
|
|
with overlap.
|
|
.IP \fB\-M\ \fInum\fR
|
|
Time constant used to smooth spectral measurements.
|
|
.IP \fB\-h\ \fInum\fR
|
|
`Brick-wall' frequency of high-pass filter applied at the input to the
|
|
detector algorithm.
|
|
.IP \fB\-l\ \fInum\fR
|
|
`Brick-wall' frequency of low-pass filter applied at the input to the
|
|
detector algorithm.
|
|
.IP \fB\-H\ \fInum\fR
|
|
`Brick-wall' frequency of high-pass lifter used in the detector
|
|
algorithm.
|
|
.IP \fB\-L\ \fInum\fR
|
|
`Brick-wall' frequency of low-pass lifter used in the detector
|
|
algorithm.
|
|
.RE
|
|
.TP
|
|
\
|
|
See also the
|
|
.B silence
|
|
effect.
|
|
.TP
|
|
\fBvol \fIgain\fR [\fItype\fR [\fIlimitergain\fR]]
|
|
Apply an amplification or an attenuation to the audio signal.
|
|
Unlike the
|
|
.B \-v
|
|
option (which is used for balancing multiple input files as they enter the
|
|
SoX effects processing chain),
|
|
.B vol
|
|
is an effect like any other, so it can be applied anywhere, and several times
|
|
if necessary, during the processing chain.
|
|
.SP
|
|
The amount to change the volume is given by
|
|
.I gain
|
|
which is interpreted, according to the given \fItype\fR, as follows: if
|
|
.I type
|
|
is \fBamplitude\fR (or is omitted), then
|
|
.I gain
|
|
is an amplitude (i.e. voltage or linear) ratio,
|
|
if \fBpower\fR, then a power (i.e. wattage or voltage-squared) ratio,
|
|
and if \fBdB\fR, then a power change in dB.
|
|
.SP
|
|
When
|
|
.I type
|
|
is \fBamplitude\fR or \fBpower\fR, a
|
|
.I gain
|
|
of 1 leaves the volume unchanged,
|
|
less than 1 decreases it,
|
|
and greater than 1 increases it;
|
|
a negative
|
|
.I gain
|
|
inverts the audio signal in addition to adjusting its volume.
|
|
.SP
|
|
When
|
|
.I type
|
|
is \fBdB\fR, a
|
|
.I gain
|
|
of 0 leaves the volume unchanged,
|
|
less than 0 decreases it,
|
|
and greater than 0 increases it.
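.SP
For example, the following three commands should each roughly halve the
amplitude of the signal, since an amplitude ratio of 0\*d5 corresponds to
a power ratio of 0\*d25 and to a change of about \-6 dB (the file names
are placeholders):
.EX
sox in.wav out.wav vol 0.5 amplitude
sox in.wav out.wav vol 0.25 power
sox in.wav out.wav vol \-6 dB
.EE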
|
|
.SP
|
|
See [4]
|
|
for a detailed discussion on electrical (and hence audio signal)
|
|
voltage and power ratios.
|
|
.SP
|
|
Beware of
|
|
.B Clipping
|
|
when increasing the volume.
|
|
.SP
|
|
The
|
|
.I gain
|
|
and the
|
|
.I type
|
|
parameters can be concatenated if desired, e.g.
|
|
.BR "vol 10dB" .
|
|
.SP
|
|
An optional \fIlimitergain\fR value can be specified and should be a
|
|
value much less
|
|
than 1 (e.g. 0\*d05 or 0\*d02) and is used only on peaks to prevent clipping.
|
|
If this parameter is omitted, no limiter is used. In verbose
|
|
mode, this effect will display the percentage of the audio that needed to be
|
|
limited.
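.SP
As a sketch (the file names and values are illustrative only), the
following applies a 6 dB boost with a
.I limitergain
of 0\*d05, and gives the
.B \-V
option so that the proportion of limited audio is reported:
.EX
sox \-V in.wav out.wav vol 6 dB 0.05
.EE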
|
|
.SP
|
|
See also
|
|
.B gain
|
|
for a volume-changing effect with different capabilities, and
|
|
.B compand
|
|
for a dynamic-range compression/expansion/limiting effect.
|
|
.SH DIAGNOSTICS
|
|
Exit status is 0 for no error, 1 if there is a problem with the
|
|
command-line parameters, or 2 if an error occurs during file processing.
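.SP
A shell script might act on the exit status along the following lines;
this is a sketch only, and the file names and messages are placeholders:
.EX
sox in.wav out.wav vol 0.5
case $? in
  0) echo "ok" ;;
  1) echo "bad command-line parameters" >&2 ;;
  2) echo "error during file processing" >&2 ;;
esac
.EE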
|
|
.SH BUGS
|
|
Please report any bugs found in this version of SoX to the mailing list
|
|
(sox-users@lists.sourceforge.net).
|
|
.SH SEE ALSO
|
|
.BR soxi (1),
|
|
.BR soxformat (7),
|
|
.BR libsox (3)
|
|
.br
|
|
.BR audacity (1),
|
|
.BR gnuplot (1),
|
|
.BR octave (1),
|
|
.BR wget (1)
|
|
.br
|
|
The SoX web site at http://sox.sourceforge.net
|
|
.br
|
|
SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts
|
|
.SS References
|
|
.TP
|
|
[1]
|
|
R. Bristow-Johnson,
|
|
.IR "Cookbook formulae for audio EQ biquad filter coefficients" ,
|
|
http://musicdsp.org/files/Audio-EQ-Cookbook.txt
|
|
.TP
|
|
[2]
|
|
Wikipedia,
|
|
.IR "Q-factor" ,
|
|
http://en.wikipedia.org/wiki/Q_factor
|
|
.TP
|
|
[3]
|
|
Scott Lehman,
|
|
.IR "Effects Explained" ,
|
|
http://harmony-central.com/Effects/effects-explained.html
|
|
.TP
|
|
[4]
|
|
Wikipedia,
|
|
.IR "Decibel" ,
|
|
http://en.wikipedia.org/wiki/Decibel
|
|
.TP
|
|
[5]
|
|
Richard Furse,
|
|
.IR "Linux Audio Developer's Simple Plugin API" ,
|
|
http://www.ladspa.org
|
|
.TP
|
|
[6]
|
|
Richard Furse,
|
|
.IR "Computer Music Toolkit" ,
|
|
http://www.ladspa.org/cmt
|
|
.TP
|
|
[7]
|
|
Steve Harris,
|
|
.IR "LADSPA plugins" ,
|
|
http://plugin.org.uk
|
|
.SH LICENSE
|
|
Copyright 1998\-2013 Chris Bagwell and SoX Contributors.
|
|
.br
|
|
Copyright 1991 Lance Norskog and Sundry Contributors.
|
|
.SP
|
|
This program is free software; you can redistribute it and/or modify
|
|
it under the terms of the GNU General Public License as published by
|
|
the Free Software Foundation; either version 2, or (at your option)
|
|
any later version.
|
|
.SP
|
|
This program is distributed in the hope that it will be useful,
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
GNU General Public License for more details.
|
|
.SH AUTHORS
|
|
Chris Bagwell (cbagwell@users.sourceforge.net).
|
|
Other authors and contributors are listed in the ChangeLog file that
|
|
is distributed with the source code.
|