<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>The GNUstep HTML Linker</title>
</head>
<body bgcolor="white" text="black" alink="red" vlink="purple" link="blue">
<h1>The GNUstep HTML Linker</h1>
<h3>Introduction</h3>
<h4>What the HTML linker does</h4>
The GNUstep HTML linker fixes up links from one HTML
document to other HTML documents. By link we mean the standard
<code>&lt;a href="NSString.html"&gt;</code> tag. By fixing up a
link we mean modifying the path in the <code>href</code> so that
it points to the actual file on disk. For example, if you have
the file <code>NSString.html</code> in the directory
<code>/home/nicola/Doc</code>, when the linker fixes up the
<code>&lt;a href="NSString.html"&gt;</code> link, it will replace
it with <code>&lt;a
href="/home/nicola/Doc/NSString.html"&gt;</code>.
<h4>Example</h4>
Say that you have a collection of HTML files, all in the same
directory, with working links from one file to another. Now
you move the files around, scattering them across many directories;
of course the links no longer work! When you run the HTML linker on
the files, it modifies all links inside them,
resolving each one to point to the actual full path of the
required file. The links will work again.
<h4>Real world usage of the linker</h4>
In the real world, it's more complicated than this because you
normally put the HTML files across different directories from the
very beginning. The HTML linker becomes helpful because you can
create links between these files as if they were in the same
directory and then, only at the end, run the HTML linker to
actually fix up the links and make them work. If you move
the files around or change their paths in any way, you can always
fix up the links afterwards by rerunning the linker; you don't
need to regenerate the HTML files.
<h3>Specification</h3>
<h4>Input and destination files</h4>
The HTML linker uses <em>input files</em> and <em>destination
files</em>. Both types of files are always supposed to be HTML
files. Input files are modified by the linker during the run
(they have their links fixed up), while destination files are only
read (and sometimes not even read). When the HTML linker is run,
it first prepares the list of input files and destination files
from the arguments on the command line. Then, it reads each input
file from disk in turn. It examines all links in the file, and
replaces the file with a new version of the file where all links
have been fixed up.
<h4>Specifying input and destination files</h4>
All command line arguments which do not begin with a hyphen
(<code>-</code>), and which are not the values of defaults (for
example, not the <code>YES</code> in <code>-CheckLinks YES</code>,
because that is the value of the default
<code>-CheckLinks</code>), are interpreted as input files if they
come before the <code>--Destinations</code> argument, and as
destination files if they come after it. If a directory is
specified as an input (or destination) file, the linker will
recurse into the directory and add to the list of input (or
destination) files all files in the directory (and in the
directory's subdirectories, no matter how deeply nested) which
have one of the following extensions: <code>.html</code>,
<code>.HTML</code>, <code>.htm</code> or <code>.HTM</code>. The
<code>--Destinations</code> argument ends the list of input files,
and is followed by the list of destination files. A typical
invocation of the linker might be as follows:
<pre>
HTMLLinker myfile.html --Destinations /usr/GNUstep/System/Documentation
</pre>
This invokes the linker with a single input file,
<code>myfile.html</code>, and all the HTML files inside the
<code>/usr/GNUstep/System/Documentation</code> directory as
destination files.
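As a further illustration of the rules above, the following sketch
passes a directory as the input and two directories as destinations.
The directory name <code>Manual</code> is hypothetical and chosen
only for this example.
<pre>
HTMLLinker Manual --Destinations /opt/doc/base /opt/doc/gui
</pre>
Here every HTML file found under <code>Manual</code>, at any depth,
would be an input file, and every HTML file found under the two
directories after <code>--Destinations</code> would be a destination
file.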
<h4>What is a link</h4>
A link is an anchor tag with an <code>href</code> attribute, such as
<code>&lt;a href="dest.html#location"&gt;</code>. The destination
file of the link is the file specified in the <code>href</code>;
<code>dest.html</code> in the example. If the link contains a
path to the file, for example <code>&lt;a
href="/nicola/Documentation/dest.html#location"&gt;</code>, the
path is ignored, so this link is considered by the linker to be
exactly the same as <code>&lt;a
href="dest.html#location"&gt;</code>.
<h4>Which links are fixed up</h4>
Normally, the linker will only fix up links which have the
<code>rel</code> attribute set to <code>dynamical</code>, as in
the following example: <code>&lt;a href="nicola.html"
rel="dynamical"&gt;</code>. In this way, you can specify in your
HTML document which links you want to be fixed up, and which you
don't. But in certain situations you might want to
force the linker to attempt to fix up all links; you can run the
linker with the <code>-FixupAllLinks YES</code> option to cause
this behaviour. As a special exception, links which obviously are
not to be fixed up, such as links beginning with <i>mailto:</i> or
<i>news:</i>, and links without a filename (which means they refer
to the same file they are in), are never fixed up.
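The rules above can be illustrated with a few links as they might
appear in an input file. The filenames, the address and the link
texts are invented for this example, and the comments describe the
default behaviour.
<pre>
&lt;!-- marked as dynamical: fixed up --&gt;
&lt;a href="NSString.html" rel="dynamical"&gt;NSString&lt;/a&gt;

&lt;!-- not marked: left alone unless the linker is told to fix up all links --&gt;
&lt;a href="NSArray.html"&gt;NSArray&lt;/a&gt;

&lt;!-- mailto: link: never fixed up --&gt;
&lt;a href="mailto:someone@example.org"&gt;Contact&lt;/a&gt;

&lt;!-- no filename, so it refers to the same file: never fixed up --&gt;
&lt;a href="#introduction"&gt;Introduction&lt;/a&gt;
</pre>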
<h4>How links are fixed up</h4>
When the HTML linker encounters a link which needs to be fixed up
(say <code>&lt;a href="dest.html#location"&gt;</code>), it
searches the list of destination files for a file with the same
name as the destination file of the link. If no such file is
found, the HTML linker emits a warning, and replaces the link in
the file with a link to the pathless filename. In the example, it
would simply emit <code>&lt;a
href="dest.html#location"&gt;</code>. If the destination file is
found in the list, instead, the HTML linker replaces the link with
the full path to the destination file on disk. For example, if
there is a <code>/home/nicola/Doc/dest.html</code> destination
file, the HTML linker will fix up the link to be <code>&lt;a
href="/home/nicola/Doc/dest.html#location"&gt;</code>. As a
special exception, if there is a path mapping which matches the
path to the destination file, it is applied to the path in the
link (see below for a detailed explanation of path mappings).
It's important to notice that you must have unique filenames for
the linker to work properly. For example, if you have two
different destination files with the same name, say
<code>NSObject.html</code>, then a link to
<code>NSObject.html</code> is likely not to be properly resolved
because the linker has no way to know if you meant the link to
point to the first or the second destination file! You should
choose filenames that identify their contents unambiguously,
for example <code>NSObject_class.html</code> and
<code>NSObject_protocol.html</code> if the first file documents
the NSObject class and the second one the NSObject protocol. Then
all links will clearly refer to one file or the other one, and no
confusion will arise. If there are multiple destination files for
a link, the linker will guess which one is the right one, and that
might not give the desired result.
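As a sketch of the two outcomes described above, suppose that the
only destination file is <code>/home/nicola/Doc/dest.html</code> and
that the input file contains the two links below. The name
<code>missing.html</code> is hypothetical and stands for a file with
no matching destination file; the links are shown with only their
<code>href</code>, as in the other examples in this section.
<pre>
&lt;!-- input file before the run --&gt;
&lt;a href="/nicola/Documentation/dest.html#location"&gt;
&lt;a href="/nicola/Documentation/missing.html#location"&gt;

&lt;!-- input file after the run --&gt;
&lt;a href="/home/nicola/Doc/dest.html#location"&gt;
&lt;a href="missing.html#location"&gt;
</pre>
The second link produces a warning, and its old path is dropped
because no destination file named <code>missing.html</code> was
found.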
<h4>How links are checked</h4>
When a link is fixed up, the linker also performs some checking
that the link is correct. The first basic check, which is always
performed, is that the destination file can be resolved. A
warning is emitted if the destination file can't be resolved; in
this case (see above) the link cannot even be fixed up, and its
path is left unspecified. Then, if the destination file was found
and if the linker was run with the option <code>-CheckLinks
YES</code> (note that this option is the default, so the second
check is normally performed unless you pass <code>-CheckLinks
NO</code> to the linker), the linker will actually check that the
link can be resolved. In practice, the linker checks that the
destination file actually contains the specified named section.
For example, consider the link <code>&lt;a
href="dest.html#location"&gt;</code>. This link points to the
name <code>location</code> inside the <code>dest.html</code> file.
When checking the link, the linker will read the destination file
<code>dest.html</code>, and parse it looking for an anchor tag
with a name attribute set to <code>location</code> - in practice
something like <code>&lt;a name="location"&gt;</code>. If it does
not find it, it emits a warning. If you already know that all
links can be resolved, you can run the linker with
<code>-CheckLinks NO</code>, which skips the parsing of destination
files and so makes the run faster.
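For illustration, a link passes this second check when the
destination file contains a matching named anchor, as in the
following sketch. The link text and the heading are invented for the
example.
<pre>
&lt;!-- in the input file --&gt;
&lt;a href="dest.html#location"&gt;See the details&lt;/a&gt;

&lt;!-- in dest.html: the named anchor the linker looks for --&gt;
&lt;a name="location"&gt;&lt;/a&gt;
&lt;h4&gt;Details&lt;/h4&gt;
</pre>
If <code>dest.html</code> contained no such anchor, the linker would
emit a warning for this link.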
<h4>Path mappings</h4>
Path mappings are an additional feature of the HTML linker which
can be used when exporting documentation to be served by a web
server. If you are not putting your documentation on a web server
but simply reading it from the filesystem, then you don't need the
path mappings. The issue with exporting documentation to a web
server is that you refer to files using paths which are not
necessarily the same paths where the files are on disk. For
example, suppose that you have some HTML documentation in
<code>/opt/doc/base</code> and some other HTML documentation in
<code>/opt/doc/gui</code>. The HTML files in the two
documentation directories refer to each other. You can run the
HTML linker to fix up all links, and they will work when the files
are read from the filesystem. But now
suppose that you set up a web server; the web server, for example,
will serve URLs beginning with <code>/Base</code> (that is,
requests from a browser of the form
<code>http://www.server.org/Base</code>) by taking files from
<code>/opt/doc/base</code>, and URLs beginning with
<code>/Gui</code> by taking files from <code>/opt/doc/gui</code>.
To fix up the links in this case, you need path mappings. A path
mapping specifies that a certain directory on disk is to be
referred to in some different way in links. In the example, you
would pass
<pre>
-PathMappings '{ "/opt/doc/base"="/Base"; "/opt/doc/gui"="/Gui"; }'
</pre>
to the linker.
Each path mapping maps a <em>path on disk</em> to a <em>virtual
path</em>. For example, it maps the path on disk
<code>/opt/doc/base</code> to the virtual path <code>/Base</code>.
Each time the linker fixes up a link, after finding the
destination file, it checks the list of path mappings. If the
path to the destination file begins with the <em>path on disk</em>
of one of the path mappings, then that <em>path on disk</em> is
replaced with the corresponding <em>virtual path</em> in the path
to the destination file before the path to the destination file is
written out in the link.
For example, if you have the path mapping explained above, and if
the linker is fixing up the link <code>&lt;a
href="hi.html"&gt;</code>, where the destination file is
<code>/opt/doc/base/nicola/hi.html</code>, then the destination
path matches the path mapping for <code>/opt/doc/base</code>, so
the path mapping is applied and the link is fixed up to be
<code>&lt;a href="/Base/nicola/hi.html"&gt;</code> rather than
<code>&lt;a href="/opt/doc/base/nicola/hi.html"&gt;</code> as it
would normally have been without the path mapping.
<h4>Specifying path mappings</h4>
<h5>On the command line</h5>
Each path mapping specifies a mapping of a path on disk to a web
server alias. The first way to specify the mappings is on the
command line, in the form of a dictionary argument to the
<code>-PathMappings</code> option, as in
<pre>
-PathMappings '{ "/opt/doc/base"="/Base"; "/opt/doc/gui"="/Gui"; }'
</pre>
where <code>/opt/doc/base</code> and <code>/opt/doc/gui</code> are
the paths on disk and <code>/Base</code> and <code>/Gui</code> are
the corresponding web server URL paths.
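Putting the pieces together, a complete invocation for the web
server example might look like the following sketch. It assumes that
the two documentation directories are used both as input files and
as destination files.
<pre>
HTMLLinker /opt/doc/base /opt/doc/gui \
  -PathMappings '{ "/opt/doc/base"="/Base"; "/opt/doc/gui"="/Gui"; }' \
  --Destinations /opt/doc/base /opt/doc/gui
</pre>
The dictionary is the value of <code>-PathMappings</code>, so it is
not taken to be an input file even though it comes before
<code>--Destinations</code>.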
<h5>In a path mappings file</h5>
The other way to specify mappings is to write them into a file,
in the format of a dictionary, as, for example, in a file containing
the following lines
<pre>
{
"/opt/doc/base"="/Base";
"/opt/doc/gui"="/Gui";
}
</pre>
and then tell the linker to read the path mappings from that file,
by giving the filename as the argument of the
<code>-PathMappingsFile</code> option. For example, if the file
containing the mappings is called <code>mappings</code>, then you need
to pass
<pre>
-PathMappingsFile mappings
</pre>
to the linker to have it read mappings from the file.
<h5>Command line path mappings override file path mappings</h5>
Both command line path mappings and path mappings from a file can
be used at the same time; in case of conflict, command line path
mappings override path mappings from the file.
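As a sketch of how a conflict is resolved, assume that a conflict
means the same path on disk appearing in both places, and that the
file <code>mappings</code> shown above is combined with a command
line mapping to the hypothetical virtual path
<code>/Docs/Base</code>:
<pre>
-PathMappingsFile mappings -PathMappings '{ "/opt/doc/base"="/Docs/Base"; }'
</pre>
Under that assumption, links to files in <code>/opt/doc/base</code>
would be written with <code>/Docs/Base</code> (from the command
line), while links to files in <code>/opt/doc/gui</code> would still
be written with <code>/Gui</code> (from the file).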
<h3>Summary of all the options</h3>
Each of the options beginning with a single hyphen (<code>-</code>)
requires an argument, as in
<pre>
HTMLLinker Documentation -CheckLinks YES --Destinations Documentation
</pre>
which sets <code>CheckLinks</code> to <code>YES</code>. The options
may appear anywhere on the command line. Options which begin
with a double hyphen (such as <code>--help</code>) do not require
an argument, as in
<pre>
HTMLLinker --help
</pre>
<h4>-CheckLinks</h4>
If set to <code>YES</code> (the default), causes the linker,
whenever it fixes up a link, to check that the destination file
actually contains the target <code>&lt;a name=""&gt;</code> tag
(known bug: the <code>id</code> attribute of newer HTML
specifications is not yet handled). You might set it to <code>NO</code>
if you already know all links are correct; this gives you a
performance improvement because the linker then does not need to parse
destination files to check links.
<h4>-FixupAllLinks</h4>
If set to <code>NO</code> (the default) only links containing the
<code>rel</code> attribute set to <code>dynamical</code> are fixed
up in the input files. If set to <code>YES</code>, all links are
fixed up.
<h4>-PathMappings</h4>
If set to a dictionary, read the dictionary as path mappings. See
above for more details of path mappings.
<h4>-PathMappingsFile</h4>
If set to a string, consider it to be the name of a file; read
path mappings from that file. The file must contain the path
mappings in the form of a dictionary. See above for more details
on path mappings.
<h4>-Verbose</h4>
If set to <code>YES</code>, prints more messages than if set
to <code>NO</code> (the default).
<h4>--help</h4>
Prints a quick explanation of the command line syntax and exits.
<h4>--version</h4>
Prints the version and exits.
<h4>--Destinations</h4>
Ends the list of input files and starts the list of destination
files. All files (or directories) on the command line appearing
before <code>--Destinations</code> are considered input files,
while all files (or directories) after <code>--Destinations</code>
are considered destination files.
<hr>
<address><a href="mailto:n.pero@mi.flashnet.it">Nicola Pero</a></address>
<!-- Created: Fri Jan 4 08:30:07 GMT 2002 -->
<!-- hhmts start -->
Last modified: Fri Jan 4 11:44:15 GMT 2002
<!-- hhmts end -->
</body>
</html>