Updated for rewriting of the tool

git-svn-id: svn+ssh://svn.gna.org/svn/gnustep/libs/base/trunk@12013 72102866-910b-0410-8b05-ffd578937521
nico 2002-01-07 01:09:37 +00:00
parent e626c6fab9
commit e50cbb77ed


@ -13,164 +13,252 @@
The GNUstep HTML linker is able to fixup links from one HTML
document to other HTML ones. By link we mean the standard
<code>&lt;a href="NSString.html"&gt;</code> tag. By fixing up a
link we mean to modify the path in the <code>href</code> so that
it points to the actual file on disk. For example, if you have
the file <code>NSString.html</code> in the directory
<code>/home/nicola/Doc</code>, when the linker fixes up the
<code>&lt;a href="NSString.html"&gt;</code> link, it will replace
it with <code>&lt;a
href="/home/nicola/Doc/NSString.html"&gt;</code>.
<code>&lt;a href="NSString.html#DescriptionOfNSString"&gt;</code>
tag. By fixing up a link we mean to modify the path in the
<code>href</code> so that it points to the actual file on disk.
For example, if the <code>DescriptionOfNSString</code>
location is in the file <code>NSStringOverview.html</code> in the
directory <code>/home/nicola/Doc</code>, when the linker fixes up
the <code>&lt;a
href="NSString.html#DescriptionOfNSString"&gt;</code> link, it
will replace it with <code>&lt;a
href="/home/nicola/Doc/NSStringOverview.html#DescriptionOfNSString"&gt;</code>.
Please note that when fixing up the link, the linker modifies both
the path and the file name that the link points to, but not the
location inside the file (the <code>DescriptionOfNSString</code>
in the example).
<h4>Example</h4>
<h4>Practical Usage of the linker</h4>
Say that you have a collection of HTML files, all in the same
directory, with working links from one file to the other one. Now
you move the files around, scattering them in many directories -
of course the links no longer work! By running the HTML linker on
the files, the HTML linker will modify all links inside the files,
resolving each of them to point to the actual full path of the
required file. The links will work again.
The typical usage of the linker is for maintaining
cross-references in software documentation. You need to establish
some sort of convention used by all your software documentation
for the link names. For example, suppose that your documentation
is about C libraries. For each C function, you might decide to
tag its documentation in the files with the name
<code>function$function_name</code>. For example, the place in
the doc where it documents the <code>start_library()</code>
function would have the HTML tag <code>&lt;a
name="function$start_library"&gt;</code>. Having established this
convention, in any HTML file in your documentation in which you
want to create a link to the documentation for the
<code>start_library()</code> function, you use the code
<code>&lt;a rel="dynamic"
href="#function$start_library"&gt;</code> (please note that you
ignore the problem of locating the actual file which contains the
documentation for the <code>start_library()</code> function; that
is precisely what the linker will do for you). Whenever you
install the documentation for a new project, you first create a
relocation file for the project documentation, by running
<pre>
HTMLLinker -BuildRelocationFileForDir Documentation
</pre>
if for example the project documentation is in the
<code>Documentation</code> subdirectory. This will create a
<code>Documentation/table.htmlink</code> file, which contains a
list of all names found in the project documentation, and for each
of them, the file in which it's found. Then, you install the
project documentation (say for example that it's installed into
<code>/opt/gnustep/Local/Documentation/MyProject</code>), and once
it's installed, you can run the linker to update all links so that
they point to the actual files
<pre>
HTMLLinker /opt/gnustep/Local/Documentation/MyProject \
-l /opt/gnustep/Local/Documentation/MyProject \
-l /opt/gnustep/Local/Documentation/MyOtherProject
</pre>
This will fixup all links in <code>MyProject</code>'s HTML files
by using the relocation files of both <code>MyProject</code> and
<code>MyOtherProject</code>, so all links to anything which is
documented inside those files will be generated correctly.
<h4>Real world usage of the linker</h4>
<h4>Usage of the tool with autogsdoc</h4>
In the real world, it's more complicated than this because you
normally put the HTML files across different directories from the
very beginning. The HTML linker becomes helpful because you can
create links between these files as if they were in the same
directory ... and then - only at the end - run the HTML linker to
actually fixup the links and make them work. If you move around
the files or mess in any way with their paths, you can always
fixup the links afterwards by rerunning the linker - you don't
need to regenerate the HTML files.
You can use the tool with documentation generated by autogsdoc to
perform the linking (or to relink it). Make sure to use the option
<code>-LinksMarker gsdoc</code> because autogsdoc marks the links
to be fixed up by the linker by using <code>rel="gsdoc"</code>.
<h3>Specification</h3>
<h4>Input and destination files</h4>
<h4>Modes of operation</h4>
The HTML linker works in two phases:
The HTML linker uses <em>input files</em> and <em>destination
files</em>. Both types of files are always supposed to be HTML
files. Input files are modified by the linker during the run
(they have their links fixed up), while destination files are only
read (and sometimes not even read). When the HTML linker is run,
it first prepares the list of input files and destination files
from the arguments on the command line. Then, it reads each input
file from disk in turn. It examines all links in the file, and
replaces the file with a new version of the file where all links
have been fixed up.
<ul>
<h4>Specifying input and destination files</h4>
<li> The first (called <i>generation of the relocation
table</i>) preprocesses a given set of HTML files so that it can
be the destination of links. It builds a relocation table for
the given set of HTML files. This relocation table simply maps
all names (as in <code>&lt;a name="xxx"&gt;</code>) in the files
to the file in which the name is found. The HTML files are not
touched. The linker is able to merge this dynamically generated
relocation table with pregenerated relocation tables loaded from
files (called <i>relocation files</i>).
</li>
<li> The second (called <i>linking</i>) links a given file to
the available HTML files on disk, by using the relocation table
to modify the HTML links in the file so that they point to
existing files.
</li>
</ul>
The HTML linker can also be run in a special mode, to generate a
relocation file for later reuse. In this mode, the HTML linker
will build the relocation table for all files in a directory, then
save the relocation table into a <code>table.htmlink</code> file
in that directory for later reuse.
There are three kinds of files:
<ul>
<li>
<em>input files</em>: these are HTML files which are modified
as a consequence of linking; they have their links fixed up.
</li>
<li>
<em>destination files</em>: these are HTML files which are
read to produce relocation tables.
</li>
<li>
<em>relocation files</em>: these files are not HTML files -
they are only created and read by the linker (unless you have
a tool which can manage them), and are in a specific - very
simple - format. They are used to save relocation information
for later reuse, so that the linker can run faster. Normally,
they have a <code>.htmlink</code> extension.
</li>
</ul>
<h4>Linker behaviour</h4>
The linker keeps a main relocation table, which is empty at the
beginning. When run, the linker performs the following steps:
<ol>
<li>
the linker reads and parses all relocation files specified on
the command line, and merges the relocation tables found there
into the main relocation table.
</li>
<li>
the linker reads and parses all destination files specified on
the command line, and builds a relocation table for them,
merging it into the main relocation table.
</li>
<li>
if any input files are specified on the command line, the
linker links the files using the relocation table.
</li>
</ol>
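The first two steps above can be sketched as follows. This is only an
illustration in Python, not the linker's actual implementation: the
names <code>NameCollector</code>, <code>build_relocation_table</code>
and <code>merge_tables</code> are hypothetical, the relocation table is
assumed to be a plain mapping from link name to file path, and keeping
the first definition of a duplicated name is an assumption too.

```python
from html.parser import HTMLParser


class NameCollector(HTMLParser):
    """Collects the values of <a name="..."> attributes in one HTML document."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.append(value)


def build_relocation_table(documents):
    """Map every name to the file defining it.

    `documents` maps a file path to the HTML text of that file
    (hypothetical input shape, chosen to keep the sketch self-contained).
    """
    table = {}
    for path, html in documents.items():
        collector = NameCollector()
        collector.feed(html)
        for name in collector.names:
            # Assumption: the first file seen for a name wins.
            table.setdefault(name, path)
    return table


def merge_tables(main, other):
    """Merge `other` into the main relocation table, keeping existing entries."""
    for name, path in other.items():
        main.setdefault(name, path)
    return main


# Step 1: tables loaded from relocation files; step 2: tables built
# from destination files. Both end up in the same main table.
main_table = {}
merge_tables(main_table, {"function$start_library": "/docs/lib.html"})
merge_tables(
    main_table,
    build_relocation_table(
        {"/docs/NSString.html": '<a name="NSString_i_init"></a>'}
    ),
)
```

The third step, linking, then only needs to look names up in
<code>main_table</code>.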
<h4>Specifying input, destination and relocation files</h4>
All command line arguments which do not begin with a hyphen
(<code>-</code>), and which are not the values of defaults (for
example, not the <code>YES</code> in <code>-CheckLinks YES</code>,
because that is the value of the default
<code>-CheckLinks</code>), are interpreted as input files if they
come before the <code>--Destinations</code> argument, and as
destination files if they come after it. If a directory is
specified as an input (or destination) file, the linker will
recurse into the directory and add to the list of input (or
destination) files all files in the directory (and in the
directory's subdirectories, no matter how deeply nested) which
have one of the following extensions: <code>.html</code>,
<code>.HTML</code>, <code>.htm</code> or <code>.HTM</code>. The
<code>--Destinations</code> argument ends the list of input files,
and is followed by the list of destination files. A typical
invocation of the linker might be as follows:
example, not the <code>YES</code> in <code>-Warn YES</code>,
because that is the value of the default <code>-Warn</code>), are
interpreted as input files. Each destination file is specified by
using a <code>-d</code> option, and each relocation file by using
a <code>-l</code> option. If a directory is specified as an input
(or destination) file, the linker will recurse into the directory
and add to the list of input (or destination) files all files in
the directory (and in the directory's subdirectories, no matter
how deeply nested) which have one of the following extensions:
<code>.html</code>, <code>.HTML</code>, <code>.htm</code> or
<code>.HTM</code>. If a directory is specified as a relocation
file, the linker will add to the list of relocation files all
files in the directory which have the extension
<code>.htmlink</code>. A typical invocation of the linker is as
follows:
<pre>
HTMLLinker myfile.html --Destinations /usr/GNUstep/System/Documentation
HTMLLinker -BuildRelocationFileForDir Doc
</pre>
this invokes the linker with a single input file,
<code>myfile.html</code>, and all the HTML files inside the
<code>/usr/GNUstep/System/Documentation</code> directory as
destination files.
Builds a relocation file for the documentation in the
directory <code>Doc</code>. After this has been done, the
directory <code>Doc</code> can be used as a <code>-l</code>
argument.
<pre>
HTMLLinker test.html -l Doc
</pre>
Links the file <code>test.html</code> using the relocation file
just generated in the <code>Doc</code> directory.
<h4>What is a link</h4>
A link is an anchor tag with an <code>href</code>, such as
<code>&lt;a href="dest.html#location"&gt;</code>. The destination
file of the link is the file specified in the <code>href</code>;
<code>dest.html</code> in the example. If the link contains a
path to the file, for example <code>&lt;a
href="/nicola/Documentation/dest.html#location"&gt;</code>, the
path is ignored, so this link is considered by the linker to be
exactly the same as <code>&lt;a
href="dest.html#location"&gt;</code>.
<code>dest.html</code> in the example. The destination file is
ignored by the linker; the name of the link (which is everything which
follows the <code>#</code>) is used to perform the linking.
<h4>Which links are fixed up</h4>
Normally, the linker will only fixup links which have the
<code>rel</code> attribute set to <code>dynamical</code>, as in
the following example: <code>&lt;a href="nicola.html"
rel="dynamical"&gt;</code>. In this way, you can specify in your
<code>rel</code> attribute set to <code>dynamic</code>, as in the
following example: <code>&lt;a href="nicola.html"
rel="dynamic"&gt;</code>. In this way, you can specify in your
HTML document which links you want to be fixed up, and which you
don't want to be. But in certain situations you might want to
force the linker to attempt to fixup all links; you can run the
linker with the <code>-CheckAllLinks YES</code> option to cause
this behaviour. As a special exception, links which obviously are
not to be fixed up, such as links beginning with <i>mailto:</i> or
<i>news:</i>, and links without a filename (which means they refer
to the same file they are in) are never fixed up.
don't want to be. You can change the type of links to be fixed up
by using the <code>-LinksMarker</code> option, as in
<code>-LinksMarker gsdoc</code>, which causes the linker to fixup
all links with the <code>rel</code> attribute set to
<code>gsdoc</code> rather than <code>dynamic</code>. In certain
situations you might want to force the linker to attempt to fixup
all links; you can run the linker with the <code>-FixupAllLinks
YES</code> option to cause this behaviour. As a special
exception, links which obviously are not to be fixed up, such as
links beginning with <i>mailto:</i> or <i>news:</i>, or links
without a name, are never fixed up.
<h4>How links are fixed up </h4>
When the HTML linker encounters a link which needs to be fixed up
(say <code>&lt;a href="dest.html#location"&gt;</code>), it
searches the list of destination files for a file with the same
name as the destination file of the link. If no such file is
searches the relocation table for a destination file which
contains the <code>location</code> name. If no such file is
found, the HTML linker emits a warning, and replaces the link in
the file with a link to the pathless filename. In the example, it
would simply emit <code>&lt;a
href="dest.html#location"&gt;</code>. If the destination file is
found in the list, instead, the HTML linker replaces the link with
the full path to the destination file on disk. For example, if
there is a <code>/home/nicola/Doc/dest.html</code> destination
file, the HTML linker will fixup the link to be <code>&lt;a
href="/home/nicola/Doc/dest.html#location"&gt;</code> (as a
special exception, if there is a path mapping which matches the
path to the destination file, it's applied to the path in the
the file with a link to the destination without the filename. In
the example, it would simply emit <code>&lt;a
href="#location"&gt;</code>. If the destination file is found in
the list, instead, the HTML linker replaces the link with the full
path to the destination file on disk. For example, if, according
to the relocation table, the file
<code>/home/nicola/Doc/dest.html</code> contains the name
<code>location</code>, the HTML linker will fixup the link to be
<code>&lt;a href="/home/nicola/Doc/dest.html#location"&gt;</code>
(as a special exception, if there is a path mapping which matches
the path to the destination file, it's applied to the path in the
link. See below for a detailed explanation of path mappings).
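The name-based fixup rule described above can be sketched like this.
It is a hedged illustration in Python, not the tool's code: the
function name <code>fixup_href</code>, the dictionary used as the
relocation table and the wording of the warning are all assumptions,
and path mappings are left out.

```python
def fixup_href(href, table, warnings):
    """Return the fixed-up href for one link.

    `table` maps link names to the full path of the file containing
    them; `warnings` is a list that collects messages for unresolved
    names (both are hypothetical shapes chosen for this sketch).
    """
    # Links which obviously are not to be fixed up are left alone.
    if href.startswith(("mailto:", "news:")):
        return href

    # The destination file in the href is ignored; only the name
    # (everything after the '#') is used to perform the linking.
    _, separator, name = href.partition("#")
    if not separator or not name:
        return href  # links without a name are never fixed up

    path = table.get(name)
    if path is None:
        warnings.append("unresolved link name: " + name)
        return "#" + name  # link to the name without the filename

    return path + "#" + name


table = {"location": "/home/nicola/Doc/dest.html"}
warnings = []
resolved = fixup_href("dest.html#location", table, warnings)
unresolved = fixup_href("other.html#missing", table, warnings)
```

With the table above, <code>resolved</code> is the full path followed
by <code>#location</code>, while <code>unresolved</code> falls back to
<code>#missing</code> and one warning is recorded.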
It's important to notice that you must have unique filenames for
It's important to notice that you must have unique link names for
the linker to work properly. For example, if you have two
different destination files with the same name, say
<code>NSObject.html</code>, then a link to
<code>NSObject.html</code> is likely not to be properly resolved
because the linker has no way to know if you meant the link to
point to the first or the second destination file! You should
choose filenames better to more uniquely specify their contents,
for example <code>NSObject_class.html</code> and
<code>NSObject_protocol.html</code> if the first file documents
the NSObject class and the second one the NSObject protocol. Then
all links will clearly refer to one file or the other one, and no
confusion will arise. If there are multiple destination files for
a link, the linker will guess which one is the right one, and that
might not give the desired result.
different destination files containing the same name, say
<code>NSObject.html</code> and <code>NSString.html</code> both
containing the name <code>init</code>, then the linker can't
resolve <code>&lt;a href="#init"&gt;</code>, because it has no way
to know if you meant the link to point to the first or the second
destination file! You should choose better names so that they
uniquely specify what they represent, for example
<code>NSObject_i_init</code> and <code>NSString_i_init</code> if
the first link is in the place documenting the <code>-init</code>
method of the NSObject class and the second one the one of the
NSString class. Then all links will clearly refer to one place or
the other one, and no confusion will arise. If there are multiple
destination files for a link, the linker will guess which one is
the right one, and that might not give the desired result.
<h4>How links are checked</h4>
When a link is fixed up, the linker also performs some checking
that the link is correct. The first basic check, which is always
performed, is that the destination file can be resolved. A
warning is emitted if the destination file can't be resolved - in
this case (see above) the link cannot even be fixed up, and its
path is left unspecified. Then, if the destination file was found
and if the linker was run with the option <code>-CheckLinks
YES</code> (note that this option is the default, so the second
check is normally performed unless you pass <code>-CheckLinks
NO</code> to the linker), the linker will actually check that the
link can be resolved. In practice, the linker checks that the
destination file actually contains the specified named section.
For example, consider the link <code>&lt;a
href="dest.html#location"&gt;</code>. This link points to the
name <code>location</code> inside the <code>dest.html</code> file.
When checking the link, the linker will read the destination file
<code>dest.html</code>, and parse it looking for an anchor tag
with a name attribute set to <code>location</code> - in practice
something like <code>&lt;a name="location"&gt;</code>. If it does
not find it, it emits a warning. If you already know that all
links can be resolved, you can run the linker with
<code>-CheckLinks NO</code> skipping the parsing of destination
files in order to get peak performance.
When a link is fixed up, the linker implicitly checks that the link
is correct, because if the link name can't be found in the relocation
tables, a warning is issued.
<h4>Path mappings</h4>
@ -214,12 +302,12 @@
For example, if you have the path mapping explained above, and if
the linker is fixing up the link <code>&lt;a
href="hi.html"&gt;</code>, where the destination file is
href="hi.html#nicola"&gt;</code>, where the destination file is
<code>/opt/doc/base/nicola/hi.html</code>, then the destination
path matches the path mapping for <code>/opt/doc/base</code>, so
the path mapping is applied and the link is fixed up to be
<code>&lt;a href="/Base/nicola/hi.html"&gt;</code> rather than
<code>&lt;a href="/opt/doc/base/nicola/hi.html"&gt;</code> as it
<code>&lt;a href="/Base/nicola/hi.html#nicola"&gt;</code> rather than
<code>&lt;a href="/opt/doc/base/nicola/hi.html#nicola"&gt;</code> as it
would normally have been without the path mapping.
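The example above can be reproduced with a small sketch. This is an
assumption-laden illustration in Python: the function name
<code>apply_path_mappings</code>, the dictionary form of the mappings
and the rule of matching only on whole path components are
hypothetical, and the real linker may match paths differently.

```python
def apply_path_mappings(path, mappings):
    """Replace a matching leading directory of `path`.

    `mappings` maps a real directory prefix to the prefix that should
    appear in the link instead (hypothetical representation).
    """
    for prefix, replacement in mappings.items():
        prefix = prefix.rstrip("/")
        # Assumption: match whole path components only, so that
        # /opt/doc/base does not match /opt/doc/basement.
        if path == prefix or path.startswith(prefix + "/"):
            return replacement.rstrip("/") + path[len(prefix):]
    return path


mappings = {"/opt/doc/base": "/Base"}
mapped = apply_path_mappings("/opt/doc/base/nicola/hi.html", mappings)
link = mapped + "#nicola"
```

Here <code>link</code> ends up as
<code>/Base/nicola/hi.html#nicola</code>, matching the fixed up link in
the example.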
<h4>Specifying path mappings</h4>
@ -266,33 +354,38 @@
Each of the options beginning with a single hyphen (<code>-</code>)
requires an argument, as in
<pre>
HTMLLinker Documentation -CheckLinks YES --Destinations Documentation
HTMLLinker Documentation -LinksMarker gsdoc -d Documentation
</pre>
which sets <code>CheckLinks</code> to <code>YES</code>. The options
might be anywhere on the command line. Options which do not begin
with a single hyphen (such as <code>--help</code>) do not require
an argument, as in
which sets <code>LinksMarker</code> to <code>gsdoc</code>. The
options might be anywhere on the command line. Options which do
not begin with a single hyphen (such as <code>--help</code>) do not
require an argument, as in
<pre>
HTMLLinker --help
</pre>
<h4>-CheckLinks</h4>
<h4>-d</h4>
If set to <code>YES</code> (the default) causes the linker,
whenever it fixes up a link, to check that the destination file
actually contains the target <code>&lt;a name=""&gt;</code> tag.
(bug - does not manage yet the <code>id</code> attribute as per
newer HTML specifications). You might set it to <code>NO</code>
if you already know all links are correct, this will give you a
performance improvement because the linker does not need to parse
destination files to check links then.
Followed by a destination HTML file, or a directory containing
destination HTML files.
<h4>-l</h4>
Followed by a relocation file, or a directory containing relocation files.
<h4>-FixupAllLinks</h4>
If set to <code>NO</code> (the default) only links containing the
<code>rel</code> attribute set to <code>dynamical</code> are fixed
up in the input files. If set to <code>YES</code>, all links are
fixed up.
<code>rel</code> attribute set to <code>dynamic</code> (or
whatever is specified as <code>LinksMarker</code>) are fixed up in
the input files. If set to <code>YES</code>, all links are fixed
up.
<h4>-LinksMarker</h4>
If set (and if <code>FixupAllLinks</code> is <code>NO</code>),
only links with the <code>rel</code> attribute set to its value
are processed. By default it is set to <code>dynamic</code>.
<h4>-PathMappings</h4>
@ -319,19 +412,11 @@
Prints the version and exits.
<h4>--Destinations</h4>
Ends the list of input files and starts the list of destination
files. All files (or directories) on the command line appearing
before <code>--Destinations</code> are considered input files,
while all files (or directories) after <code>--Destinations</code>
are considered destination files.
<hr>
<address><a href="mailto:n.pero@mi.flashnet.it">Nicola Pero</a></address>
<!-- Created: Fri Jan 4 08:30:07 GMT 2002 -->
<!-- hhmts start -->
Last modified: Fri Jan 4 11:44:15 GMT 2002
Last modified: Sun Jan 6 22:54:58 GMT 2002
<!-- hhmts end -->
</body>
</html>