Updated for rewriting of the tool

git-svn-id: svn+ssh://svn.gna.org/svn/gnustep/libs/base/trunk@12013 72102866-910b-0410-8b05-ffd578937521
nico 2002-01-07 01:09:37 +00:00
parent e626c6fab9
commit e50cbb77ed


@ -13,164 +13,252 @@
The GNUstep HTML linker is able to fixup links from one HTML The GNUstep HTML linker is able to fixup links from one HTML
document to other HTML ones. By link we mean the standard document to other HTML ones. By link we mean the standard
<code>&lt;a href="NSString.html"&gt;</code> tag. By fixing up a <code>&lt;a href="NSString.html#DescriptionOfNSString"&gt;</code>
link we mean to modify the path in the <code>href</code> so that tag. By fixing up a link we mean to modify the path in the
it points to the actual file on disk. For example, if you have <code>href</code> so that it points to the actual file on disk.
the file <code>NSString.html</code> in the directory For example, if the <code>DescriptionOfNSString</code>
<code>/home/nicola/Doc</code>, when the linker fixes up the location is in the file <code>NSStringOverview.html</code> in the
<code>&lt;a href="NSString.html"&gt;</code> link, it will replace directory <code>/home/nicola/Doc</code>, when the linker fixes up
it with <code>&lt;a the <code>&lt;a
href="/home/nicola/Doc/NSString.html"&gt;</code>. href="NSString.html#DescriptionOfNSString"&gt;</code> link, it
will replace it with <code>&lt;a
href="/home/nicola/Doc/NSStringOverview.html#DescriptionOfNSString"&gt;</code>.
Please note that when fixing up the link, the linker modifies both
the path and the file name that the link points to, but not the
location inside the file (the <code>DescriptionOfNSString</code>
in the example).
<h4>Example</h4> <h4>Practical Usage of the linker</h4>
Say that you have a collection of HTML files, all in the same The typical usage of the linker is with maintaining
directory, with working links from one file to the other one. Now cross-references in software documentation. You need to establish
you move the files around, scattering them in many directories - some sort of convention used by all your software documentation
of course the links no longer work! By running the HTML linker on for the link names. For example, suppose that your documentation
the files, the HTML linker will modify all links inside the files, is about C libraries. For each C function, you might decide to
resolving each of them to point to the actual full path of the tag its documentation in the files with the name
required file. The links will work again. <code>function$function_name</code>. For example, the place in
the doc where it documents the <code>start_library()</code>
function would have the HTML tag <code>&lt;a
name="function$start_library"&gt;</code>. Having established this
convention, in any HTML file in your documentation in which you
want to create a link to the documentation for the
<code>start_library()</code> function, you use the code
<code>&lt;a rel="dynamic"
href="#function$start_library"&gt;</code> (please note that you
ignore the problem of locating the actual file which contains the
documentation for the <code>start_library()</code> function, that
is precisely what the linker will do for you). Whenever you
install the documentation for a new project, you first create a
relocation file for the project documentation, by running
<pre>
HTMLLinker -BuildRelocationFileForDir Documentation
</pre>
if for example the project documentation is in the
<code>Documentation</code> subdirectory. This will create a
<code>Documentation/table.htmlink</code> file, which contains a
list of all names found in the project documentation, and for each
of them, the file in which it's found. Then, you install the
project documentation (say for example that it's installed into
<code>/opt/gnustep/Local/Documentation/MyProject</code>), and once
it's installed, you can run the linker to update all links so that
they point to the actual files
<pre>
HTMLLinker /opt/gnustep/Local/Documentation/MyProject \
-l /opt/gnustep/Local/Documentation/MyProject \
-l /opt/gnustep/Local/Documentation/MyOtherProject
</pre>
This will fixup all links in <code>MyProject</code>'s HTML files
by using the relocation files of both <code>MyProject</code> and
<code>MyOtherProject</code>, so all links to anything which is
documented inside those files will be generated correctly.
<h4>Real world usage of the linker</h4> <h4>Usage of the tool with autogsdoc</h4>
In the real world, it's more complicated than this because you You can use the tool with documentation generated by autogsdoc to
normally put the HTML files across different directories from the perform the linking (or to relink it). Make sure to use the option
very beginning. The HTML linker becomes helpful because you can <code>-LinksMarker gsdoc</code> because autogsdoc marks the links
create links between these files as if they were in the same to be fixed up by the linker by using <code>rel="gsdoc"</code>.
directory ... and then - only at the end - run the HTML linker to
actually fixup the links and make them work. If you move around
the files or mess in any way with their paths, you can always
fixup the links afterwards by rerunning the linker - you don't
need to regenerate the HTML files.
<h3>Specification</h3> <h3>Specification</h3>
<h4>Input and destination files</h4> <h4>Modes of operation</h4>
The HTML linker works in two phases:
The HTML linker uses <em>input files</em> and <em>destination <ul>
files</em>. Both types of files are always supposed to be HTML
files. Input files are modified by the linker during the run
(they have their links fixed up), while destination files are only
read (and sometimes not even read). When the HTML linker is run,
it first prepares the list of input files and destination files
from the arguments on the command line. Then, it reads each input
file from disk in turn. It examines all links in the file, and
replaces the file with a new version of the file where all links
have been fixed up.
<h4>Specifying input and destination files</h4> <li> The first (called <i>generation of the relocation
table</i>) preprocesses a given set of HTML files so that it can
be the destination of links. It builds a relocation table for
the given set of HTML files. This relocation table simply maps
all names (as in <code>&lt;a name="xxx"&gt;</code>) in the files
to the file in which the name is found. The HTML files are not
touched. The linker is able to merge this dynamically generated
relocation table with pregenerated relocation tables loaded from
files (called <i>relocation files</i>).
</li>
<li> The second (called <i>linking</i>) links a given file to
the available HTML files on disk, by using the relocation table
to modify the HTML links in the file so that they point to
existing files.
</li>
</ul>
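The first phase described above can be illustrated with a short sketch. This is not the HTML linker's own code (the tool is not written in Python); it is a minimal illustration, assuming the documents are already loaded into a dictionary keyed by path. The names <code>NameCollector</code> and <code>build_relocation_table</code> are hypothetical.

```python
from html.parser import HTMLParser


class NameCollector(HTMLParser):
    """Collects the value of every <a name="..."> attribute in a document."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.append(value)


def build_relocation_table(documents):
    """Map each name to the path of the file in which it is found.

    `documents` is a dict of {path: html_text}; this input shape is an
    assumption made for the sketch. The HTML text itself is not modified.
    """
    table = {}
    for path, html in documents.items():
        collector = NameCollector()
        collector.feed(html)
        for name in collector.names:
            # Assumption: the first file that defines a name wins.
            table.setdefault(name, path)
    return table
```

For example, a document at <code>/home/nicola/Doc/NSStringOverview.html</code> containing <code>&lt;a name="DescriptionOfNSString"&gt;</code> would produce a table mapping <code>DescriptionOfNSString</code> to that path.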
The HTML linker can also be run in a special mode, to generate a
relocation file for later reuse. In this mode, the HTML linker
will build the relocation table for all files in a directory, then
save the relocation table into a <code>table.htmlink</code> file
in that directory for later reuse.
There are three kinds of files:
<ul>
<li>
<em>input files</em>: these are HTML files which are modified
as a consequence of linking; they have their links fixed up.
</li>
<li>
<em>destination files</em>: these are HTML files which are
read to produce relocation tables.
</li>
<li>
<em>relocation files</em>: these files are not HTML files -
they are only created and read by the linker (unless you have
a tool which can manage them), and are in a specific - very
simple - format. They are used to save relocation information
for later reuse, so that the linker can run faster. Normally,
they have a <code>.htmlink</code> extension.
</li>
</ul>
<h4>Linker behaviour</h4>
The linker keeps a main relocation table, which is empty at the
beginning. When run, the linker performs the following steps:
<ol>
<li>
the linker reads and parses all relocation files specified on
the command line, and merges the relocation tables found there
into the main relocation table.
</li>
<li>
the linker reads and parses all destination files specified on
the command line, and builds a relocation table for them,
merging it into the main relocation table.
</li>
<li>
if any input files are specified on the command line, the
linker links the files using the relocation table.
</li>
</ol>
<h4>Specifying input, destination and relocation files</h4>
All command line arguments which do not begin with a hyphen All command line arguments which do not begin with a hyphen
(<code>-</code>), and which are not the values of defaults (for (<code>-</code>), and which are not the values of defaults (for
example, not the <code>YES</code> in <code>-CheckLinks YES</code>, example, not the <code>YES</code> in <code>-Warn YES</code>,
because that is the value of the default because that is the value of the default <code>-Warn</code>), are
<code>-CheckLinks</code>), are interpreted as input files if they interpreted as input files. Each destination file is specified by
come before the <code>--Destinations</code> argument, and as using a <code>-d</code> option, and each relocation file by using
destination files if they come after it. If a directory is a <code>-l</code> option. If a directory is specified as an input
specified as an input (or destination) file, the linker will (or destination) file, the linker will recurse into the directory
recurse into the directory and add to the list of input (or and add to the list of input (or destination) files all files in
destination) files all files in the directory (and in the the directory (and in the directory's subdirectories, no matter
directory's subdirectories, no matter how deeply nested) which how deeply nested) which have one of the following extensions:
have one of the following extensions: <code>.html</code>, <code>.html</code>, <code>.HTML</code>, <code>.htm</code> or
<code>.HTML</code>, <code>.htm</code> or <code>.HTM</code>. The <code>.HTM</code>. If a directory is specified as a relocation
<code>--Destinations</code> argument ends the list of input files, file, the linker will add to the list of relocation files all
and is followed by the list of destination files. A typical files in the directory which have the extension
invocation of the linker might be as follows: <code>.htmlink</code>. A typical invocation of the linker is as
follows:
<pre> <pre>
HTMLLinker myfile.html --Destinations /usr/GNUstep/System/Documentation HTMLLinker -BuildRelocationFileForDir Doc
</pre> </pre>
this invokes the linker with a single input file, Builds a relocation file for the documentation in the
<code>myfile.html</code>, and all the HTML files inside the directory <code>Doc</code>. After this has been done, the
<code>/usr/GNUstep/System/Documentation</code> directory as directory <code>Doc</code> can be used as a <code>-l</code>
destination files. argument.
<pre>
HTMLLinker test.html -l Doc
</pre>
Links the file <code>test.html</code> using the relocation file
just generated in the <code>Doc</code> directory.
<h4>What is a link</h4> <h4>What is a link</h4>
A link is an anchor tag with an <code>href</code>, such as A link is an anchor tag with an <code>href</code>, such as
<code>&lt;a href="dest.html#location"&gt;</code>. The destination <code>&lt;a href="dest.html#location"&gt;</code>. The destination
file of the link is the file specified in the <code>href</code>; file of the link is the file specified in the <code>href</code>;
<code>dest.html</code> in the example. If the link contains a <code>dest.html</code> in the example. The destination file is
path to the file, for example <code>&lt;a ignored by the linker; the name of the link (which is everything which
href="/nicola/Documentation/dest.html#location"&gt;</code>, the follows the <code>#</code>) is used to perform the linking.
path is ignored, so this link is considered by the linker to be
exactly the same as <code>&lt;a
href="dest.html#location&gt;</code>.
<h4>Which links are fixed up</h4> <h4>Which links are fixed up</h4>
Normally, the linker will only fixup links which have the Normally, the linker will only fixup links which have the
<code>rel</code> attribute set to <code>dynamical</code>, as in <code>rel</code> attribute set to <code>dynamic</code>, as in the
the following example: <code>&lt;a href="nicola.html" following example: <code>&lt;a href="nicola.html"
rel="dynamical"&gt;</code>. In this way, you can specify in your rel="dynamic"&gt;</code>. In this way, you can specify in your
HTML document which links you want to be fixed up, and which you HTML document which links you want to be fixed up, and which you
don't want to be. But in certain situations you might want to don't want to be. You can change the type of links to be fixed up
force the linker to attempt to fixup all links; you can run the by using the <code>-LinksMarker</code> option, as in
linker with the <code>-CheckAllLinks YES</code> option to cause <code>-LinksMarker gsdoc</code>, which causes the linker to fixup
this behaviour. As a special exception, links which obviously are all links with the <code>rel</code> attribute set to
not to be fixed up, such as links beginning with <i>mailto:</i> or <code>gsdoc</code> rather than <code>dynamic</code>. In certain
<i>news:</i>, and links without a filename (which means they refer situations you might want to force the linker to attempt to fixup
to the same file they are in) are never fixed up. all links; you can run the linker with the <code>-FixupAllLinks
YES</code> option to cause this behaviour. As a special
exception, links which obviously are not to be fixed up, such as
links beginning with <i>mailto:</i> or <i>news:</i>, or links
without a name, are never fixed up.
<h4>How links are fixed up </h4> <h4>How links are fixed up </h4>
When the HTML linker encounters a link which needs to be fixed up When the HTML linker encounters a link which needs to be fixed up
(say <code>&lt;a href="dest.html#location"&gt;</code>), it (say <code>&lt;a href="dest.html#location"&gt;</code>), it
searches the list of destination files for a file with the same searches the relocation table for a destination file which
name as the destination file of the link. If no such file is contains the <code>location</code> name. If no such file is
found, the HTML linker emits a warning, and replaces the link in found, the HTML linker emits a warning, and replaces the link in
the file with a link to the pathless filename. In the example, it the file with a link to the destination without the filename. In
would simply emit <code>&lt;a the example, it would simply emit <code>&lt;a
href="dest.html#location"&gt;</code>. If the destination file is href="#location"&gt;</code>. If the destination file is found in
found in the list, instead, the HTML linker replaces the link with the list, instead, the HTML linker replaces the link with the full
the full path to the destination file on disk. For example, if path to the destination file on disk. For example, if, according
there is a <code>/home/nicola/Doc/dest.html</code> destination to the relocation table, the file
file, the HTML linker will fixup the link to be <code>&lt;a <code>/home/nicola/Doc/dest.html</code> contains the name
href="/home/nicola/Doc/dest.html#location"&gt;</code> (as a <code>location</code>, the HTML linker will fixup the link to be
special exception, if there is a path mapping which matches the <code>&lt;a href="/home/nicola/Doc/dest.html#location"&gt;</code>
path to the destination file, it's applied to the path in the (as a special exception, if there is a path mapping which matches
the path to the destination file, it's applied to the path in the
link. See below for a detailed explanation of path mappings). link. See below for a detailed explanation of path mappings).
It's important to notice that you must have unique filenames for It's important to notice that you must have unique link names for
the linker to work properly. For example, if you have two the linker to work properly. For example, if you have two
different destination files with the same name, say different destination files containing the same name, say
<code>NSObject.html</code>, then a link to <code>NSObject.html</code> and <code>NSString.html</code> both
<code>NSObject.html</code> is likely not to be properly resolved containing the name <code>init</code>, then the linker can't
because the linker has no way to know if you meant the link to resolve <code>&lt;a href="#init"&gt;</code>, because it has no way
point to the first or the second destination file! You should to know if you meant the link to point to the first or the second
choose filenames better to more uniquely specify their contents, destination file! You should choose names better so that they
for example <code>NSObject_class.html</code> and uniquely specify what they represent, for example
<code>NSObject_protocol.html</code> if the first file documents <code>NSObject_i_init</code> and <code>NSString_i_init</code> if
the NSObject class and the second one the NSObject protocol. Then the first link is in the place documenting the <code>-init</code>
all links will clearly refer to one file or the other one, and no method of the NSObject class and the second one the one of the
confusion will arise. If there are multiple destination files for NSString class. Then all links will clearly refer to one place or
a link, the linker will guess which one is the right one, and that the other one, and no confusion will arise. If there are multiple
might not give the desired result. destination files for a link, the linker will guess which one is
the right one, and that might not give the desired result.
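The fixup rules in the rewritten text can be summarised in a small sketch. It is an illustration of the described behaviour, not the linker's implementation: the function name <code>fixup_href</code> is hypothetical, the relocation table is assumed to be a plain dictionary from name to file path, and path mappings are left out.

```python
import sys


def fixup_href(href, table):
    """Return the fixed-up href for a link, given a {name: path} table."""
    # Links such as mailto: or news:, and links without a name,
    # are never fixed up.
    if href.startswith(("mailto:", "news:")) or "#" not in href:
        return href
    # The destination file in the href is ignored; only the name counts.
    name = href.split("#", 1)[1]
    if not name:
        return href
    path = table.get(name)
    if path is None:
        # Unresolved name: warn and emit the link without a filename.
        print(f"warning: could not resolve link to '{name}'", file=sys.stderr)
        return "#" + name
    return path + "#" + name
```

With a table that maps <code>location</code> to <code>/home/nicola/Doc/dest.html</code>, the link <code>dest.html#location</code> becomes <code>/home/nicola/Doc/dest.html#location</code>, while a link to an unknown name is reduced to <code>#name</code> and a warning is printed.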
<h4>How links are checked</h4> <h4>How links are checked</h4>
When a link is fixed up, the linker also performs some checking When a link is fixed up, the linker implicitly checks that the link
that the link is correct. The first basic check, which is always is correct, because if the link name can't be found in the relocation
performed, is that the destination file can be resolved. A tables, a warning is issued.
warning is emitted if the destination file can't be resolved - in
this case (see above) the link cannot even be fixed up, and its
path is left unspecified. Then, if the destination file was found
and if the linker was run with the option <code>-CheckLinks
YES</code> (note that this option is the default, so the second
check is normally performed unless you pass <code>-CheckLinks
NO</code> to the linker), the linker will actually check that the
link can be resolved. In practice, the linker checks that the
destination file actually contains the specified named section.
For example, consider the link <code>&lt;a
href="dest.html#location"&gt;</code>. This link points to the
name <code>location</code> inside the <code>dest.html</code> file.
When checking the link, the linker will read the destination file
<code>dest.html</code>, and parse it looking for an anchor tag
with a name attribute set to <code>location</code> - in practice
something like <code>&lt;a name="location"&gt;</code>. If it does
not find it, it emits a warning. If you already know that all
links can be resolved, you can run the linker with
<code>-CheckLinks NO</code> skipping the parsing of destination
files in order to get peak performance.
<h4>Path mappings</h4> <h4>Path mappings</h4>
@ -214,12 +302,12 @@
For example, if you have the path mapping explained above, and if For example, if you have the path mapping explained above, and if
the linker is fixing up the link <code>&lt;a the linker is fixing up the link <code>&lt;a
href="hi.html"&gt;</code>, where the destination file is href="hi.html#nicola"&gt;</code>, where the destination file is
<code>/opt/doc/base/nicola/hi.html</code>, then the destination <code>/opt/doc/base/nicola/hi.html</code>, then the destination
path matches the path mapping for <code>/opt/doc/base</code>, so path matches the path mapping for <code>/opt/doc/base</code>, so
the path mapping is applied and the link is fixed up to be the path mapping is applied and the link is fixed up to be
<code>&lt;a href="/Base/nicola/hi.html"&gt;</code> rather than <code>&lt;a href="/Base/nicola/hi.html#nicola"&gt;</code> rather than
<code>&lt;a href="/opt/doc/base/nicola/hi.html"&gt;</code> as it <code>&lt;a href="/opt/doc/base/nicola/hi.html#nicola"&gt;</code> as it
would normally have been without the path mapping. would normally have been without the path mapping.
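The effect of a path mapping on this example can be sketched as a prefix replacement. This is an assumption-laden illustration: the function name <code>apply_path_mappings</code>, the dictionary representation of the mappings, and the choice to try the longest prefix first are all hypothetical and are not taken from the linker itself.

```python
def apply_path_mappings(path, mappings):
    """Rewrite `path` using the first matching prefix in `mappings`.

    `mappings` is a dict of {real_prefix: replacement_prefix}; this
    representation is an assumption made for the sketch.
    """
    # Assumption: prefer the longest (most specific) matching prefix.
    for prefix in sorted(mappings, key=len, reverse=True):
        stem = prefix.rstrip("/")
        # Match whole path components only, so "/opt/doc/base" does not
        # match "/opt/doc/basement".
        if path == stem or path.startswith(stem + "/"):
            return mappings[prefix].rstrip("/") + path[len(stem):]
    return path
```

Given the mapping from <code>/opt/doc/base</code> to <code>/Base</code>, the destination <code>/opt/doc/base/nicola/hi.html</code> is rewritten to <code>/Base/nicola/hi.html</code> before the name is appended to the link.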
<h4>Specifying path mappings</h4> <h4>Specifying path mappings</h4>
@ -266,33 +354,38 @@
Each of the options beginning with a single hyphen (<code>-</code>) Each of the options beginning with a single hyphen (<code>-</code>)
requires an argument, as in requires an argument, as in
<pre> <pre>
HTMLLinker Documentation -CheckLinks YES --Destinations Documentation HTMLLinker Documentation -LinksMarker gsdoc -d Documentation
</pre> </pre>
which sets <code>CheckLinks</code> to <code>YES</code>. The options which sets <code>LinksMarker</code> to <code>gsdoc</code>. The
might be anywhere on the command line. Options which do not begin options might be anywhere on the command line. Options which do
with a single hyphen (such as <code>--help</code>) do not require not begin with a single hyphen (such as <code>--help</code>) do not
an argument, as in require an argument, as in
<pre> <pre>
HTMLLinker --help HTMLLinker --help
</pre> </pre>
<h4>-CheckLinks</h4> <h4>-d</h4>
Followed by a destination HTML file, or a directory containing
destination HTML files.
<h4>-l</h4>
If set to <code>YES</code> (the default) causes the linker, Followed by a relocation file, or a directory containing relocation files.
whenever it fixes up a link, to check that the destination file
actually contains the target <code>&lt;a name=""&gt;</code> tag.
(bug - does not manage yet the <code>id</code> attribute as per
newer HTML specifications). You might set it to <code>NO</code>
if you already know all links are correct, this will give you a
performance improvement because the linker does not need to parse
destination files to check links then.
<h4>-FixupAllLinks</h4> <h4>-FixupAllLinks</h4>
If set to <code>NO</code> (the default) only links containing the If set to <code>NO</code> (the default) only links containing the
<code>rel</code> attribute set to <code>dynamical</code> are fixed <code>rel</code> attribute set to <code>dynamic</code> (or
up in the input files. If set to <code>YES</code>, all links are whatever is specified as <code>LinksMarker</code>) are fixed up in
fixed up. the input files. If set to <code>YES</code>, all links are fixed
up.
<h4>-LinksMarker</h4>
If set (and if <code>FixupAllLinks</code> is <code>NO</code>),
only links with the <code>rel</code> attribute set to its value
are processed. By default it is set to <code>dynamic</code>.
<h4>-PathMappings</h4> <h4>-PathMappings</h4>
@ -319,19 +412,11 @@
Prints the version and exits. Prints the version and exits.
<h4>--Destinations</h4>
Ends the list of input files and starts the list of destination
files. All files (or directories) on the command line appearing
before <code>--Destinations</code> are considered input files,
while all files (or directories) after <code>--Destinations</code>
are considered destination files.
<hr> <hr>
<address><a href="mailto:n.pero@mi.flashnet.it">Nicola Pero</a></address> <address><a href="mailto:n.pero@mi.flashnet.it">Nicola Pero</a></address>
<!-- Created: Fri Jan 4 08:30:07 GMT 2002 --> <!-- Created: Fri Jan 4 08:30:07 GMT 2002 -->
<!-- hhmts start --> <!-- hhmts start -->
Last modified: Fri Jan 4 11:44:15 GMT 2002 Last modified: Sun Jan 6 22:54:58 GMT 2002
<!-- hhmts end --> <!-- hhmts end -->
</body> </body>
</html> </html>