The GNUstep HTML Linker

Introduction

What the HTML linker does

The GNUstep HTML linker fixes up links from one HTML document to other HTML documents. By link we mean the standard <a href="NSString.html"> tag. By fixing up a link we mean modifying the path in the href so that it points to the actual file on disk. For example, if you have the file NSString.html in the directory /home/nicola/Doc, when the linker fixes up the <a href="NSString.html"> link, it will replace it with <a href="/home/nicola/Doc/NSString.html">.

Example

Say that you have a collection of HTML files, all in the same directory, with working links between them. Now you move the files around, scattering them across many directories - of course the links no longer work! When you run the HTML linker on the files, it modifies all links inside them, resolving each one to point to the actual full path of the required file. The links will work again.

Real world usage of the linker

In the real world, it's more complicated than this because you normally spread the HTML files across different directories from the very beginning. The HTML linker becomes helpful because you can create links between these files as if they were in the same directory ... and then - only at the end - run the HTML linker to actually fix up the links and make them work. If you move the files around or change their paths in any way, you can always fix up the links afterwards by rerunning the linker - you don't need to regenerate the HTML files.

Specification

Input and destination files

The HTML linker uses input files and destination files. Both types of files are always supposed to be HTML files. Input files are modified by the linker during the run (they have their links fixed up), while destination files are only read (and sometimes not even read). When the HTML linker is run, it first prepares the list of input files and destination files from the arguments on the command line. Then, it reads each input file from disk in turn. It examines all links in the file, and replaces the file with a new version of the file where all links have been fixed up.

Specifying input and destination files

All command line arguments which do not begin with a hyphen (-), and which are not the values of defaults (for example, not the YES in -CheckLinks YES, because that is the value of the default -CheckLinks), are interpreted as input files if they come before the --Destinations argument, and as destination files if they come after it. If a directory is specified as an input (or destination) file, the linker will recurse into the directory and add to the list of input (or destination) files all files in the directory (and in the directory's subdirectories, no matter how deeply nested) which have one of the following extensions: .html, .HTML, .htm or .HTM. The --Destinations argument ends the list of input files, and is followed by the list of destination files. A typical invocation of the linker might be as follows:
      HTMLLinker myfile.html --Destinations /usr/GNUstep/System/Documentation
    
This invokes the linker with a single input file, myfile.html, and all the HTML files inside the /usr/GNUstep/System/Documentation directory as destination files.
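The classification rules above can be sketched in Python. This is an illustration only, not the linker's actual code: the function names split_arguments and expand_path are hypothetical, and the sketch assumes that every option beginning with a single hyphen is followed by exactly one value.

```python
import os

HTML_EXTENSIONS = (".html", ".HTML", ".htm", ".HTM")


def split_arguments(args):
    """Split command line arguments into (input paths, destination paths)."""
    inputs, destinations = [], []
    current = inputs
    skip_next = False
    for arg in args:
        if skip_next:
            # This argument is the value of the preceding -Option.
            skip_next = False
        elif arg == "--Destinations":
            # Everything after this marker is a destination.
            current = destinations
        elif arg.startswith("--"):
            # Flags such as --help take no value and are not files.
            pass
        elif arg.startswith("-"):
            # Assumption: a single-hyphen option always takes one value.
            skip_next = True
        else:
            current.append(arg)
    return inputs, destinations


def expand_path(path):
    """Return the HTML files under a directory, or the path itself."""
    if not os.path.isdir(path):
        return [path]
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.endswith(HTML_EXTENSIONS):
                found.append(os.path.join(root, name))
    return found
```

Calling split_arguments on the arguments of the invocation shown above would give one input path (myfile.html) and one destination path, which expand_path would then turn into the list of HTML files found under that directory.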

What is a link

A link is an anchor tag with an href, such as <a href="dest.html#location">. The destination file of the link is the file specified in the href; dest.html in the example. If the link contains a path to the file, for example <a href="/nicola/Documentation/dest.html#location">, the path is ignored, so this link is considered by the linker to be exactly the same as <a href="dest.html#location">.
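As a small illustration of how an href reduces to a destination file name and a location, here is a Python sketch. The helper name split_href is hypothetical and this is not the linker's own code.

```python
import posixpath


def split_href(href):
    """Return (file name with any path removed, location or None)."""
    path, sep, location = href.partition("#")
    # Any directory part of the href is ignored; only the name counts.
    return posixpath.basename(path), (location if sep else None)
```

With this helper, both hrefs from the paragraph above reduce to the same pair: the file name dest.html and the location named location.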

Which links are fixed up

Normally, the linker will only fix up links which have the rel attribute set to dynamical, as in the following example: <a href="nicola.html" rel="dynamical">. In this way, you can specify in your HTML document which links you want to be fixed up, and which you don't. But in certain situations you might want to force the linker to attempt to fix up all links; you can run the linker with the -FixupAllLinks YES option to cause this behaviour. As a special exception, links which obviously are not to be fixed up, such as links beginning with mailto: or news:, and links without a filename (which means they refer to the same file they are in) are never fixed up.
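The selection rule can be written down as a short Python predicate. This is a sketch under assumptions: should_fix_up is a hypothetical name, and only the mailto: and news: prefixes named above are excluded, although the real linker may skip other kinds of links as well.

```python
import posixpath


def should_fix_up(href, rel=None, fixup_all=False):
    """Decide whether a link is a candidate for fixing up."""
    # Links that are obviously not files are never fixed up.
    if href.startswith(("mailto:", "news:")):
        return False
    # A link without a filename refers to the file it is in.
    path = href.partition("#")[0]
    if posixpath.basename(path) == "":
        return False
    # Otherwise fix up dynamical links, or every link if forced.
    return fixup_all or rel == "dynamical"
```

Here fixup_all stands for running the linker with all links forced, and rel is the value of the link's rel attribute, if any.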

How links are fixed up

When the HTML linker encounters a link which needs to be fixed up (say <a href="dest.html#location">), it searches the list of destination files for a file with the same name as the destination file of the link. If no such file is found, the HTML linker emits a warning, and replaces the link in the file with a link to the pathless filename. In the example, it would simply emit <a href="dest.html#location">. If, instead, the destination file is found in the list, the HTML linker replaces the link with the full path to the destination file on disk. For example, if there is a /home/nicola/Doc/dest.html destination file, the HTML linker will fix up the link to be <a href="/home/nicola/Doc/dest.html#location"> (as a special exception, if there is a path mapping which matches the path to the destination file, it is applied to the path in the link; see below for a detailed explanation of path mappings).

It's important to notice that you must have unique filenames for the linker to work properly. For example, if you have two different destination files with the same name, say NSObject.html, then a link to NSObject.html is likely not to be properly resolved, because the linker has no way to know whether you meant the link to point to the first or the second destination file! You should choose filenames that identify their contents unambiguously, for example NSObject_class.html and NSObject_protocol.html if the first file documents the NSObject class and the second one the NSObject protocol. Then all links will clearly refer to one file or the other, and no confusion will arise. If there are multiple destination files for a link, the linker will guess which one is the right one, and that might not give the desired result.
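The lookup described above can be sketched in Python as follows. The function name fix_href is hypothetical, path mappings are left out, and taking the first match when several destination files share a name is an assumption of this sketch; the text above only says that the linker guesses.

```python
import posixpath
import sys


def fix_href(href, destination_files):
    """Return href rewritten to point at the matching destination file."""
    path, sep, location = href.partition("#")
    name = posixpath.basename(path)
    for candidate in destination_files:
        if posixpath.basename(candidate) == name:
            # Found: use the full path on disk, keeping the location.
            return candidate + sep + location
    # Not found: warn and fall back to the pathless filename.
    print("warning: no destination file named " + name, file=sys.stderr)
    return name + sep + location
```

Given the single destination file /home/nicola/Doc/dest.html, the href dest.html#location would be rewritten to /home/nicola/Doc/dest.html#location; with an empty destination list it would stay dest.html#location and a warning would be printed.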

How links are checked

When a link is fixed up, the linker also checks that the link is correct. The first basic check, which is always performed, is that the destination file can be resolved. A warning is emitted if the destination file can't be resolved - in this case (see above) the link cannot be fixed up, and its path is left unspecified. Then, if the destination file was found and if the linker was run with the option -CheckLinks YES (note that this option is the default, so the second check is normally performed unless you pass -CheckLinks NO to the linker), the linker will actually check that the link can be resolved. In practice, the linker checks that the destination file actually contains the specified named section. For example, consider the link <a href="dest.html#location">. This link points to the name location inside the dest.html file. When checking the link, the linker will read the destination file dest.html, and parse it looking for an anchor tag with a name attribute set to location - in practice something like <a name="location">. If it does not find it, it emits a warning. If you already know that all links can be resolved, you can run the linker with -CheckLinks NO, which skips the parsing of destination files and gives the best performance.
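The second check amounts to looking for a named anchor in the destination file. Below is a minimal Python sketch of that idea using the standard html.parser module; the names AnchorNameCollector and has_named_anchor are hypothetical, and the real linker uses its own parser. Like the check described above, the sketch looks only at the name attribute of anchor tags.

```python
from html.parser import HTMLParser


class AnchorNameCollector(HTMLParser):
    """Collect the name attribute of every anchor tag in a document."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value is not None:
                    self.names.add(value)


def has_named_anchor(html_text, name):
    """Return True if html_text contains <a name="..."> for name."""
    collector = AnchorNameCollector()
    collector.feed(html_text)
    collector.close()
    return name in collector.names
```

A caller would read the destination file into a string and emit a warning whenever has_named_anchor returns False for the location named in the link.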

Path mappings

Path mappings are an additional feature of the HTML linker which can be used when exporting documentation to be served by a web server. If you are not putting your documentation on a web server but simply reading it from the filesystem, then you don't need path mappings. The issue with exporting documentation to a web server is that you refer to files using paths which are not necessarily the same as the paths of the files on disk. For example, suppose that you have some HTML documentation in /opt/doc/base and some other HTML documentation in /opt/doc/gui. The HTML files in the two documentation directories refer to each other. You can run the HTML linker to fix up all links, and they work when read from the filesystem. But now suppose that you set up a web server; the web server, for example, will serve URLs beginning with /Base (that is, requests from a browser of the form http://www.server.org/Base) by taking files from /opt/doc/base, and URLs beginning with /Gui by taking files from /opt/doc/gui. To fix up the links in this case, you need path mappings. A path mapping specifies that a certain directory on disk is to be referred to in some different way in links. In the example, you would pass
     -PathMappings '{ "/opt/doc/base"="/Base"; "/opt/doc/gui"="/Gui"; }'
    
to the linker. Each path mapping maps a path on disk to a virtual path. For example, it maps the path on disk /opt/doc/base to the virtual path /Base. Each time the linker fixes up a link, after finding the destination file, it checks the list of path mappings. If the path to the destination file begins with the path on disk of one of the path mappings, then that path on disk is replaced with the corresponding virtual path in the path to the destination file before the path to the destination file is written out in the link. For example, if you have the path mapping explained above, and if the linker is fixing up the link <a href="hi.html">, where the destination file is /opt/doc/base/nicola/hi.html, then the destination path matches the path mapping for /opt/doc/base, so the path mapping is applied and the link is fixed up to be <a href="/Base/nicola/hi.html"> rather than <a href="/opt/doc/base/nicola/hi.html"> as it would normally have been without the path mapping.
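The substitution step can be sketched in Python as below. The function name apply_path_mappings is hypothetical, and the sketch takes "begins with" literally as a plain string prefix test; whether the real linker also requires the match to end at a directory boundary is not stated here.

```python
def apply_path_mappings(path, mappings):
    """Replace a leading path on disk with its virtual path, if mapped."""
    for disk_path, virtual_path in mappings.items():
        if path.startswith(disk_path):
            # Swap the on-disk prefix for the virtual one.
            return virtual_path + path[len(disk_path):]
    # No mapping matches: the path on disk is written out unchanged.
    return path
```

With the two mappings from the example, the destination path /opt/doc/base/nicola/hi.html becomes /Base/nicola/hi.html, while a path outside both directories is returned unchanged.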

Specifying path mappings

On the command line
Each path mapping specifies a mapping of a path on disk to a web server alias. The first way to specify the mappings is on the command line, in the form of a dictionary argument to the -PathMappings option, as in
     -PathMappings '{ "/opt/doc/base"="/Base"; "/opt/doc/gui"="/Gui"; }'
    
where /opt/doc/base and /opt/doc/gui are the paths on disk and /Base and /Gui are the corresponding web server URL paths.
In a path mappings file
The other way to specify mappings is to write them into a file, in the format of a dictionary, as, for example, in a file containing the following lines
      { 
        "/opt/doc/base"="/Base"; 
        "/opt/doc/gui"="/Gui"; 
      }
    
and then tell the linker to read the path mappings from that file, by giving the filename as the value of the -PathMappingsFile option. For example, if the file containing the mappings is called mappings, then you need to pass
      -PathMappingsFile mappings
    
to the linker to have it read mappings from the file.
Command line path mappings override file path mappings
Both command line path mappings and path mappings from a file can be used at the same time; in case of conflict, command line path mappings override path mappings from the file.
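The precedence rule can be illustrated with a short Python sketch. The function name merge_path_mappings is hypothetical, and both sets of mappings are assumed to have already been parsed into dictionaries.

```python
def merge_path_mappings(file_mappings, command_line_mappings):
    """Combine both sources; command line entries win on conflict."""
    merged = dict(file_mappings)
    # update() overwrites any entry that is also given on the command line.
    merged.update(command_line_mappings)
    return merged
```

If the file maps /opt/doc/base to /Base and the command line maps the same path on disk to a different virtual path, the merged result keeps the command line value and leaves all other file entries untouched.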

Summary of all the options

Each of the options beginning with a single hyphen (-) requires an argument, as in
      HTMLLinker Documentation -CheckLinks YES --Destinations Documentation
    
which sets CheckLinks to YES. The options may appear anywhere on the command line. Options which do not begin with a single hyphen (such as --help) do not require an argument, as in
      HTMLLinker --help
    

-CheckLinks

If set to YES (the default), causes the linker, whenever it fixes up a link, to check that the destination file actually contains the target <a name=""> tag. (Known bug: the id attribute, as used by newer HTML specifications, is not yet handled.) You might set it to NO if you already know that all links are correct; this improves performance because the linker then does not need to parse destination files to check links.

-FixupAllLinks

If set to NO (the default), only links with the rel attribute set to dynamical are fixed up in the input files. If set to YES, all links are fixed up.

-PathMappings

If set to a dictionary, read the dictionary as path mappings. See above for more details of path mappings.

-PathMappingsFile

If set to a string, consider it to be the name of a file; read path mappings from that file. The file must contain the path mappings in the form of a dictionary. See above for more details on path mappings.

-Verbose

If set to YES, prints more messages than if set to NO (the default).

--help

Prints a quick explanation of the command line syntax and exits.

--version

Prints the version and exits.

--Destinations

Ends the list of input files and starts the list of destination files. All files (or directories) on the command line appearing before --Destinations are considered input files, while all files (or directories) after --Destinations are considered destination files.
Nicola Pero
Last modified: Fri Jan 4 11:44:15 GMT 2002