The GNUstep HTML Linker
Introduction
What the HTML linker does
The GNUstep HTML linker fixes up links from one HTML document to
other HTML documents. By link we mean the standard
<a href="NSString.html"> tag. By fixing up a link we mean modifying
the path in the href so that it points to the actual file on disk.
For example, if the file NSString.html is in the directory
/home/nicola/Doc, then when the linker fixes up the
<a href="NSString.html"> link, it replaces it with
<a href="/home/nicola/Doc/NSString.html">.
Example
Say that you have a collection of HTML files, all in the same
directory, with working links from one file to another. Now you
move the files around, scattering them across many directories, and
of course the links no longer work. When you run the HTML linker on
the files, it modifies all links inside them, resolving each link
to the actual full path of the required file. The links work again.
Real world usage of the linker
In the real world, it's more complicated than this because you
normally spread the HTML files across different directories from
the very beginning. The HTML linker is helpful because you can
create links between these files as if they were all in the same
directory, and then, only at the end, run the HTML linker to
actually fix up the links and make them work. If you move the files
around or change their paths in any way, you can always fix up the
links afterwards by rerunning the linker; you don't need to
regenerate the HTML files.
Specification
Input and destination files
The HTML linker uses input files and destination files. Both types
of files are always expected to be HTML files. Input files are
modified by the linker during the run (they have their links fixed
up), while destination files are only read (and sometimes not even
read). When the HTML linker is run, it first prepares the list of
input files and destination files from the arguments on the command
line. Then it reads each input file from disk in turn, examines all
links in the file, and replaces the file with a new version in
which all links have been fixed up.
Specifying input and destination files
All command line arguments which do not begin with a hyphen (-),
and which are not the values of defaults (for example, not the YES
in -CheckLinks YES, because that is the value of the default
-CheckLinks), are interpreted as input files if they come before
the --Destinations argument, and as destination files if they come
after it. If a directory is specified as an input (or destination)
file, the linker recurses into the directory and adds to the list
of input (or destination) files all files in the directory (and in
its subdirectories, no matter how deeply nested) which have one of
the following extensions: .html, .HTML, .htm or .HTM. The
--Destinations argument ends the list of input files, and is
followed by the list of destination files. A typical invocation of
the linker might be as follows:
HTMLLinker myfile.html --Destinations /usr/GNUstep/System/Documentation
This invokes the linker with a single input file, myfile.html, and
all the HTML files inside the /usr/GNUstep/System/Documentation
directory as destination files.
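The rules above can be sketched in a few lines of Python. This is
only an illustration of the described behaviour, not the linker's
actual code: the function names split_arguments and
collect_html_files are hypothetical, and the sketch assumes that
every single-hyphen option is followed by exactly one value.

```python
import os

# Extensions that the text above says are picked up inside directories.
HTML_EXTENSIONS = (".html", ".HTML", ".htm", ".HTM")


def collect_html_files(path):
    """Return [path] for a plain file, or all HTML files below a directory."""
    if not os.path.isdir(path):
        return [path]
    found = []
    for root, _dirs, files in os.walk(path):  # recurses into subdirectories
        for name in sorted(files):
            if name.endswith(HTML_EXTENSIONS):
                found.append(os.path.join(root, name))
    return found


def split_arguments(argv):
    """Split command line arguments into (input_files, destination_files)."""
    inputs, destinations = [], []
    current = inputs
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False          # value of a default, such as YES
        elif arg == "--Destinations":
            current = destinations     # everything after this is a destination
        elif arg.startswith("--"):
            pass                       # double-hyphen options take no value
        elif arg.startswith("-"):
            skip_next = True           # assumed: one value follows the option
        else:
            current.extend(collect_html_files(arg))
    return inputs, destinations
```

For instance, split_arguments(["myfile.html", "-CheckLinks", "YES",
"--Destinations", "other.html"]) puts myfile.html in the input list
and other.html in the destination list, and ignores both -CheckLinks
and its value.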
What is a link
A link is an anchor tag with an href, such as
<a href="dest.html#location">. The destination file of the link is
the file specified in the href; dest.html in the example. If the
link contains a path to the file, for example
<a href="/nicola/Documentation/dest.html#location">, the path is
ignored, so this link is considered by the linker to be exactly the
same as <a href="dest.html#location">.
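As a rough illustration of how a path in an href is ignored, the
following Python sketch splits an href into the pathless
destination filename and the named location. The function name
split_link is hypothetical and is not part of the linker.

```python
import posixpath


def split_link(href):
    """Return (filename, location) for an href, dropping any directory path."""
    target, _separator, location = href.partition("#")
    # Only the last path component counts as the destination file.
    return posixpath.basename(target), location
```

Under this sketch, "/nicola/Documentation/dest.html#location" and
"dest.html#location" both yield the same pair, which is why the
linker treats the two links as identical.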
Which links are fixed up
Normally, the linker only fixes up links which have the rel
attribute set to dynamical, as in the following example:
<a href="nicola.html" rel="dynamical">. In this way, you can
specify in your HTML document which links you want to be fixed up
and which you don't. In certain situations, however, you might want
to force the linker to attempt to fix up all links; you can run the
linker with the -FixupAllLinks YES option to cause this behaviour.
As a special exception, links which obviously are not to be fixed
up, such as links beginning with mailto: or news:, and links
without a filename (which means they refer to the same file they
are in) are never fixed up.
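The selection rule can be summarised by a small predicate. This is
a sketch under stated assumptions: the name needs_fixup and its
parameters are hypothetical, and only the two schemes named above
(mailto: and news:) are excluded, although the real linker may
exclude other kinds of links as well.

```python
def needs_fixup(href, rel=None, fixup_all_links=False):
    """Decide whether a link should be fixed up, following the rules above."""
    # Links that obviously do not point to a file are never fixed up.
    # Assumption: only the two schemes named in the text are checked here.
    if href.startswith(("mailto:", "news:")):
        return False
    # A link without a filename refers to the file it is in.
    filename = href.partition("#")[0]
    if not filename:
        return False
    # Otherwise fix up links marked rel="dynamical", or all links if forced.
    return fixup_all_links or rel == "dynamical"
```

With the default settings only needs_fixup("nicola.html",
rel="dynamical") is true; passing fixup_all_links=True makes
unmarked links qualify too, but never the special exceptions.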
How links are fixed up
When the HTML linker encounters a link which needs to be fixed up
(say <a href="dest.html#location">), it searches the list of
destination files for a file with the same name as the destination
file of the link. If no such file is found, the HTML linker emits a
warning, and replaces the link in the file with a link to the
pathless filename. In the example, it would simply emit
<a href="dest.html#location">. If, instead, the destination file is
found in the list, the HTML linker replaces the link with the full
path to the destination file on disk. For example, if there is a
/home/nicola/Doc/dest.html destination file, the HTML linker fixes
up the link to be <a href="/home/nicola/Doc/dest.html#location">
(as a special exception, if there is a path mapping which matches
the path to the destination file, it is applied to the path in the
link; see below for a detailed explanation of path mappings).
It's important to notice that you must have unique filenames for
the linker to work properly. For example, if you have two different
destination files with the same name, say NSObject.html, then a
link to NSObject.html is likely not to be resolved properly,
because the linker has no way to know whether you meant the link to
point to the first or the second destination file. You should
choose filenames that identify their contents uniquely, for example
NSObject_class.html and NSObject_protocol.html if the first file
documents the NSObject class and the second one the NSObject
protocol. Then every link clearly refers to one file or the other,
and no confusion arises. If there are multiple destination files
for a link, the linker guesses which one is the right one, and that
might not give the desired result.
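The lookup described in this section can be sketched as follows.
The names build_index and fix_up are hypothetical, path mappings
are left out, and picking the first candidate when several
destination files share a name is an assumption standing in for
whatever guess the real linker makes.

```python
import posixpath


def build_index(destination_files):
    """Map each pathless filename to the destination paths that carry it."""
    index = {}
    for path in destination_files:
        index.setdefault(posixpath.basename(path), []).append(path)
    return index


def fix_up(href, index):
    """Return (new_href, warning); warning is None when the file was found."""
    target, separator, location = href.partition("#")
    name = posixpath.basename(target)
    candidates = index.get(name, [])
    if not candidates:
        # Not found: warn and fall back to the pathless filename.
        warning = "warning: destination file %s not found" % name
        return name + separator + location, warning
    # Assumption: with duplicate filenames, simply take the first one.
    return candidates[0] + separator + location, None
```

Given an index built from ["/home/nicola/Doc/dest.html"], the link
"dest.html#location" becomes "/home/nicola/Doc/dest.html#location",
while a link to a file that is not in the index is reduced to its
pathless filename and produces a warning.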
How links are checked
When a link is fixed up, the linker also checks that the link is
correct. The first basic check, which is always performed, is that
the destination file can be resolved. A warning is emitted if the
destination file can't be resolved; in this case (see above) the
link cannot be fixed up at all, and its path is left unspecified.
Then, if the destination file was found and the linker was run with
the option -CheckLinks YES (note that this option is the default,
so the second check is normally performed unless you pass
-CheckLinks NO to the linker), the linker checks that the link
itself can be resolved. In practice, the linker checks that the
destination file actually contains the specified named section.
For example, consider the link <a href="dest.html#location">. This
link points to the name location inside the dest.html file. When
checking the link, the linker reads the destination file dest.html
and parses it looking for an anchor tag with a name attribute set
to location; in practice something like <a name="location">. If it
does not find one, it emits a warning. If you already know that all
links can be resolved, you can run the linker with -CheckLinks NO
to skip the parsing of destination files and get the best
performance.
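The second check can be illustrated with Python's standard
html.parser module. This sketch is not how the linker itself parses
files; the names AnchorNameCollector, anchor_names and
link_resolves are hypothetical, and the destination file is passed
in as a string rather than read from disk.

```python
from html.parser import HTMLParser


class AnchorNameCollector(HTMLParser):
    """Collect the name attribute of every <a name="..."> tag."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.add(value)


def anchor_names(html_text):
    """Return the set of anchor names defined in an HTML document."""
    collector = AnchorNameCollector()
    collector.feed(html_text)
    collector.close()
    return collector.names


def link_resolves(location, destination_html):
    """True if the destination document defines the named location."""
    return location in anchor_names(destination_html)
```

A link to dest.html#location passes this check only if the text of
dest.html contains an anchor such as <a name="location">; otherwise
the linker would emit its warning.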
Path mappings
Path mappings are an additional feature of the HTML linker which
can be used when exporting documentation to be served by a web
server. If you are not putting your documentation on a web server
but simply reading it from the filesystem, then you don't need
path mappings. The issue with exporting documentation to a web
server is that you refer to files using paths which are not
necessarily the same as the paths where the files are on disk. For
example, suppose that you have some HTML documentation in
/opt/doc/base and some other HTML documentation in /opt/doc/gui.
The HTML files in the two documentation directories refer to each
other. You can run the HTML linker to fix up all links, and
everything works. But now suppose that you set up a web server
which serves URLs beginning with /Base (that is, browser requests
of the form http://www.server.org/Base) by taking files from
/opt/doc/base, and URLs beginning with /Gui by taking files from
/opt/doc/gui.
To fix up the links in this case, you need path mappings. A path
mapping specifies that a certain directory on disk is to be
referred to in some different way in links. In the example, you
would pass
-PathMappings '{ "/opt/doc/base"="/Base"; "/opt/doc/gui"="/Gui"; }'
to the linker.
Each path mapping maps a path on disk to a virtual path. For
example, it maps the path on disk /opt/doc/base to the virtual path
/Base. Each time the linker fixes up a link, after finding the
destination file, it checks the list of path mappings. If the path
to the destination file begins with the path on disk of one of the
path mappings, then that path on disk is replaced with the
corresponding virtual path in the path to the destination file
before the path to the destination file is written out in the link.
For example, if you have the path mapping explained above, and if
the linker is fixing up the link <a href="hi.html">, where the
destination file is /opt/doc/base/nicola/hi.html, then the
destination path matches the path mapping for /opt/doc/base, so the
path mapping is applied and the link is fixed up to be
<a href="/Base/nicola/hi.html"> rather than
<a href="/opt/doc/base/nicola/hi.html"> as it would normally have
been without the path mapping.
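The substitution itself is a simple prefix replacement, which the
following Python sketch illustrates. The function name
apply_path_mappings is hypothetical, and stopping at the first
mapping that matches is an assumption, since the text does not say
what happens when several mappings match the same path.

```python
def apply_path_mappings(path, mappings):
    """Replace a leading path on disk with its virtual path, if one matches."""
    for disk_path, virtual_path in mappings.items():
        # Plain prefix test, as described: the path "begins with" disk_path.
        if path.startswith(disk_path):
            return virtual_path + path[len(disk_path):]
    return path  # no mapping matches: keep the path on disk
```

With the mappings { "/opt/doc/base": "/Base", "/opt/doc/gui": "/Gui" },
the path /opt/doc/base/nicola/hi.html becomes /Base/nicola/hi.html,
while a path outside both directories is returned unchanged.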
Specifying path mappings
On the command line
Each path mapping specifies a mapping of a path on disk to a web
server alias. The first way to specify the mappings is on the
command line, in the form of a dictionary argument to the
-PathMappings option, as in
-PathMappings '{ "/opt/doc/base"="/Base"; "/opt/doc/gui"="/Gui"; }'
where /opt/doc/base and /opt/doc/gui are the paths on disk and
/Base and /Gui are the corresponding web server URL paths.
In a path mappings file
The other way to specify mappings is to write them into a file, in
the format of a dictionary, for example a file containing the
following lines
{
"/opt/doc/base"="/Base";
"/opt/doc/gui"="/Gui";
}
and then tell the linker to read the path mappings from that file,
by giving the filename as the argument of the -PathMappingsFile
option. For example, if the file containing the mappings is called
mappings, then you need to pass
-PathMappingsFile mappings
to the linker to have it read mappings from the file.
Command line path mappings override file path mappings
Both command line path mappings and path mappings from a file can
be used at the same time; in case of conflict, command line path
mappings override path mappings from the file.
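The precedence rule amounts to merging two dictionaries, with the
command line one applied last. The sketch below illustrates this in
Python; the function name merge_path_mappings is hypothetical, and
both sets of mappings are assumed to have already been parsed into
plain dictionaries.

```python
def merge_path_mappings(file_mappings, command_line_mappings):
    """Combine both sources; command line entries win on conflict."""
    merged = dict(file_mappings)          # start from the file's mappings
    merged.update(command_line_mappings)  # command line overrides the file
    return merged
```

If the file maps /opt/doc/base to /Base and the command line maps
the same path to /Docs/Base, the merged result uses /Docs/Base,
while mappings that appear only in the file are kept as they are.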
Summary of all the options
Each of the options beginning with a single hyphen (-) requires an
argument, as in
HTMLLinker Documentation -CheckLinks YES --Destinations Documentation
which sets CheckLinks to YES. The options may appear anywhere on
the command line. Options which begin with a double hyphen (such as
--help) do not require an argument, as in
HTMLLinker --help
-CheckLinks
If set to YES (the default), causes the linker, whenever it fixes
up a link, to check that the destination file actually contains the
target <a name=""> tag (known bug: the id attribute, as used by
newer HTML specifications, is not handled yet). You might set it to
NO if you already know all links are correct; this gives a
performance improvement because the linker then does not need to
parse destination files to check links.
-FixupAllLinks
If set to NO (the default), only links containing the rel attribute
set to dynamical are fixed up in the input files. If set to YES,
all links are fixed up.
-PathMappings
If set to a dictionary, read the dictionary as path mappings. See
above for more details of path mappings.
-PathMappingsFile
If set to a string, consider it to be the name of a file; read
path mappings from that file. The file must contain the path
mappings in the form of a dictionary. See above for more details
on path mappings.
-Verbose
If set to YES, prints some more messages than if set to NO (the
default).
--help
Prints a quick explanation of the command line syntax and exits.
--version
Prints the version and exits.
--Destinations
Ends the list of input files and starts the list of destination
files. All files (or directories) on the command line appearing
before --Destinations are considered input files, while all files
(or directories) after --Destinations are considered destination
files.
Nicola Pero
Last modified: Fri Jan 4 11:44:15 GMT 2002