Updated 21/08/2007: Added workaround
Updated 02/08/2008: Moved workaround to the top

The workaround

One way to get consistent gsiftp URIs with both globus-url-copy and the CoG kit is to use // for absolute paths and /~/ for relative paths. Both forms should work with both clients. What a URL with just one slash points to still depends on which client you use, so you should avoid single-slash URLs if you can.
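
As an illustration of this convention, here is a minimal Python sketch. The helper name portable_gsiftp_uri is made up for this example and is not part of Globus or the CoG kit; it only assembles the string and does not contact any server.

```python
def portable_gsiftp_uri(host, path):
    """Build a gsiftp URI that avoids the ambiguous single-slash form.

    Hypothetical helper, not part of any Globus tool.
    """
    if path.startswith("/"):
        # Absolute path: the separator "/" plus the path's own leading "/"
        # gives the "//" form.
        return "gsiftp://" + host + "/" + path
    # Relative path: anchor it explicitly with "/~/".
    return "gsiftp://" + host + "/~/" + path


print(portable_gsiftp_uri("host", "/home/username/testfile"))
print(portable_gsiftp_uri("host", "testfile"))
```

The first call prints gsiftp://host//home/username/testfile and the second prints gsiftp://host/~/testfile.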

The problem

Globus's GridFTP has become the GGF standard for transferring files in a Grid environment. It is mainly an extension of FTP that is able to use GSI (Grid Security Infrastructure) authentication.

When using protocols such as FTP or HTTP, it is quite natural to use the URI (URL) to refer to a file. Even when FTP is considered separately from the Web (i.e. even if clicking on an FTP URL in a web browser didn't work), the concept of a URI helps a lot to address files. Similarly, I'd like my applications to be able to keep track of the files stored on GridFTP servers using URIs. There is some Globus tool support for using GridFTP URIs (prefixed with gsiftp://), in particular in globus-url-copy (which is a generic tool to copy a file from one URL to another URL) and in the Java CoG kit (which provides a Java implementation of much of the Globus Toolkit, and even more).
Sadly, using gsiftp URIs reliably is just not possible out of the box.

Not only are gsiftp:// URIs not formally defined in the GGF standard[1] (and just barely in the globus-url-copy documentation), but there are no fewer than 3 ways of interpreting the same URI!

1. The default globus-url-copy format (gsiftp://host/absolute-path/file).

In this case, the path refers to the absolute path on the server. A URI to a file in the home directory ($HOME/testfile) can be written like this:

  • gsiftp://host/home/username/testfile

(provided that whoever uses it knows that $HOME is /home/username), or

  • gsiftp://host/~/testfile

The main problem is that it is counter-intuitive when one has the FTP URI format in mind.

2. The RFC 1738 way (similar to FTP URIs), using globus-url-copy -rp.

The standard for FTP URIs says that the path should be relative to the initial directory in which the FTP server places the client after login. For example, ftp://host/path1/path2 should perform the equivalent of cd path1 and then access path2 from there. The vast majority of FTP servers set the default location to the home directory. The same RFC says that if you want an absolute path instead, the first / (root) should be encoded as %2f. Therefore, using the -rp option of globus-url-copy, the following two URIs should refer to the same file:

  • gsiftp://host/testfile, and
  • gsiftp://host/%2fhome/username/testfile
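
To make the RFC 1738 reading concrete, here is a small Python sketch of how the path part of such a URL breaks down into directory changes and a file name. The function rfc1738_steps is a hypothetical name for this example; it models the rule only and is not how globus-url-copy is implemented.

```python
from urllib.parse import unquote


def rfc1738_steps(url_path):
    """Split an FTP-style URL path into (directory changes, file name).

    url_path is the part of the URL after "scheme://host/".
    Hypothetical helper for illustration only.
    """
    # Split on literal "/" first, then percent-decode each segment, so an
    # encoded "%2f" stays inside its segment instead of acting as a separator.
    segments = [unquote(segment) for segment in url_path.split("/")]
    # Every segment but the last is a directory change; the last is the file.
    return segments[:-1], segments[-1]


print(rfc1738_steps("testfile"))
print(rfc1738_steps("%2fhome/username/testfile"))
```

The first call prints ([], 'testfile'): no directory change, so the file is looked up in the login directory. The second prints (['/home', 'username'], 'testfile'): the decoded %2f makes the first directory change start from the root.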

3. The Java CoG format.

This one is also relative, like the RFC 1738 format, but uses // (two slashes) instead of /%2f to designate the root directory. For example: gsiftp://host//home/username/testfile

Conclusions

The problems really start if you use both URIs that have an absolute path and others that have a relative path. For absolute paths, formats 1 and 3 behave more or less in the same manner (at least, URIs written using // would work with 1 and 3); for relative paths, formats 2 and 3 behave in the same manner, but differently from format 1. Since some of our files are at absolute path locations and others are at relative path locations, and since we'd like our application to be partly using globus-url-copy and partly using the Java API (of the Java CoG), using gsiftp:// URIs becomes a bit tricky...
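
The differences can be summed up in a toy Python model of the three interpretations as described in this post. Everything here is a simplification under stated assumptions: resolve is a made-up function, the home directory is assumed to be /home/username, /~/ is only modelled for formats 1 and 3 (the post does not say how -rp treats it), and none of this is taken from the actual clients' code.

```python
import posixpath
from urllib.parse import unquote


def resolve(path, fmt, home="/home/username"):
    """Map the path part of a gsiftp URI to a server-side path.

    path is everything after "gsiftp://host", for example "/testfile".
    fmt is 1 (default globus-url-copy), 2 (globus-url-copy -rp) or
    3 (Java CoG). Toy model for illustration only.
    """
    rest = path[1:]  # drop the "/" that separates the host from the path
    if fmt in (1, 3) and rest.startswith("~/"):
        # "/~/" refers to the home directory (assumed for formats 1 and 3).
        return posixpath.join(home, rest[2:])
    if fmt == 1:
        # Always absolute; extra leading slashes collapse into one.
        return "/" + rest.lstrip("/")
    if fmt == 2:
        # Relative to home unless the path starts with an encoded "/".
        # posixpath.join discards home when the second argument is absolute.
        return posixpath.join(home, unquote(rest))
    if fmt == 3:
        # Relative to home unless a second "/" marks the root.
        return posixpath.join(home, rest)
    raise ValueError("unknown format: %r" % (fmt,))


for fmt in (1, 2, 3):
    print(fmt, resolve("/testfile", fmt))
```

In this model the single-slash URI gives /testfile for format 1 but /home/username/testfile for formats 2 and 3, while resolve("//home/username/testfile", fmt) gives /home/username/testfile for both formats 1 and 3.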

Having 3 possible interpretations of a given URI rather defeats the point of an identifier. This is just unusable.
What I find shocking is that these three different interpretations have actually been produced within a single project: Globus. If Grid interoperability is not achieved within a single project, how can it ever work across several of them?

Note

[1] The GGF standard mentions URLs that could be presented to a server, but the context of use is not quite clear.