gsiftp URI madness
Updated 21/08/2007: Added workaround
Updated 02/08/2008: Moved workaround to the top
One way to have consistent gsiftp URIs with both globus-url-copy and the CoG kit is to use // for absolute paths and /~/ for relative paths. These should work with both clients. What a URL with just one slash points to still depends on which client you use, so you should avoid such URLs if you can.
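As a minimal sketch of this convention, the following Python function builds a gsiftp URI from a host name and a server path. The function name is hypothetical (it is not part of any Globus tool), and it assumes that paths starting with / are absolute while everything else is relative to the home directory. It only produces URI strings; whether /~/ is actually expanded to the home directory is up to the client and the server.

```python
from urllib.parse import quote


def consistent_gsiftp_uri(host: str, path: str) -> str:
    """Build a gsiftp URI with '//' for absolute paths and '/~/' for home-relative ones.

    Hypothetical helper: it applies the workaround convention, nothing more.
    """
    if path.startswith("/"):
        # Absolute server path: the leading '/' of the path follows the host's '/',
        # giving two slashes, e.g. gsiftp://host//home/username/testfile
        return f"gsiftp://{host}/{quote(path, safe='/')}"
    # Path relative to the home directory: gsiftp://host/~/testfile
    return f"gsiftp://{host}/~/{quote(path, safe='/')}"


print(consistent_gsiftp_uri("host", "/home/username/testfile"))  # gsiftp://host//home/username/testfile
print(consistent_gsiftp_uri("host", "testfile"))                 # gsiftp://host/~/testfile
```

Percent-encoding each segment (with / kept as a separator) keeps names containing spaces or other reserved characters valid inside the URI.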
When using protocols such as FTP or HTTP, it is quite natural to use the URI (URL) to refer to a file. Even when FTP is considered separately from the Web (i.e. even if clicking on an FTP URL in a web browser didn't work), the concept of a URI helps a lot when addressing files. Similarly, I'd like my applications to be able to keep track of the files stored on GridFTP servers using URIs. There is some Globus tool support for using GridFTP URIs (prefixed with gsiftp://), in particular in globus-url-copy (a generic tool to copy a file from one URL to another URL) and in the Java CoG kit (which provides a Java implementation of much of the Globus Toolkit, and even more).
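However, there are at least three different ways of interpreting gsiftp URIs across these tools, which makes consistent use of gsiftp URIs just not possible.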
1. The default globus-url-copy format. In this case, the path refers to the absolute path on the server. A URI to a file in the home directory ($HOME/testfile) therefore has to contain the full path of the home directory (provided that whoever uses it knows what $HOME is on that server).
The main problem is that it is counter-intuitive when one has the FTP URI format in mind.
2. The RFC 1738 way (similar to FTP URIs), using globus-url-copy -rp.
The standard for FTP URIs says that the path should be relative to the initial directory in which the FTP server logs the client in. For example, ftp://host/path1/path2 should perform the equivalent of cd path1 followed by cd path2. The vast majority of FTP servers set this initial location to the home directory.
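The same RFC says that if you want an absolute path instead, the first / (root) should be encoded as %2F. Therefore, using the -rp option of globus-url-copy, two URIs should refer to the same file: one whose path is simply testfile (relative to the home directory), and one whose path is the absolute path of $HOME/testfile with its leading / written as %2F.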
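To make these RFC 1738 rules concrete, here is a minimal Python sketch that turns an FTP-style URL into the FTP commands RFC 1738 (section 3.2.2) describes. The function name is hypothetical, and the ;type= typecode is deliberately ignored. Two details matter: the path is split on / before percent-decoding, so %2F ends up inside a CWD argument instead of acting as a separator, and the last segment is the name of the file to retrieve.

```python
from typing import List
from urllib.parse import unquote, urlsplit


def rfc1738_ftp_commands(url: str) -> List[str]:
    """Translate the path of an FTP-style URL into CWD/RETR commands (RFC 1738, 3.2.2).

    Hypothetical helper; typecodes such as ';type=d' are not handled.
    """
    path = urlsplit(url).path  # urlsplit does not percent-decode the path
    if not path or path == "/":
        return []  # no file name, nothing to retrieve
    # Drop the '/' that separates host and url-path, then split *before* decoding.
    *cwd_elements, name = path[1:].split("/")
    commands = [f"CWD {unquote(element)}" for element in cwd_elements]
    commands.append(f"RETR {unquote(name)}")
    return commands


print(rfc1738_ftp_commands("gsiftp://host/testfile"))
print(rfc1738_ftp_commands("gsiftp://host/%2Fhome/username/testfile"))
print(rfc1738_ftp_commands("ftp://host//etc/motd"))
```

The first URL only retrieves testfile from the login directory, while the second changes to /home, then username, before retrieving testfile. The last one shows what RFC 1738 itself says about a double slash: it yields a CWD with a null (empty) argument, which is not the same as changing to the root directory.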
3. The Java CoG format.
This one is also relative, like the RFC 1738 format, but uses // (two slashes) instead of /%2f to designate the root directory.
For example: gsiftp://host//home/username/testfile
The problems really start if you use both URIs that have an absolute path and URIs that have a relative path.
For absolute paths, formats 1 and 3 behave more or less in the same manner (at least, URIs written using // would work with both 1 and 3); for relative paths, formats 2 and 3 behave in the same manner, but differently from format 1.
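The differences between the three formats can be summarised with a small model. This Python sketch is only an emulation of the behaviour described above, not code taken from globus-url-copy or the CoG kit. HOME and the function names are hypothetical, and, as an assumption, an empty segment (a CWD with a null argument in RFC 1738 terms) is treated as a no-op in format 2.

```python
import posixpath
from urllib.parse import unquote, urlsplit

HOME = "/home/username"  # hypothetical login (home) directory on the server


def _clean(path: str) -> str:
    """Collapse repeated slashes so that results are easy to compare."""
    return "/" + "/".join(seg for seg in path.split("/") if seg)


def format1(uri: str) -> str:
    """Default globus-url-copy (model): the URI path is an absolute server path."""
    return _clean(unquote(urlsplit(uri).path))


def format2(uri: str) -> str:
    """RFC 1738 / globus-url-copy -rp (model): each segment is a 'cd' starting from HOME."""
    current = HOME
    # Drop the '/' after the host and split *before* decoding.
    for segment in urlsplit(uri).path[1:].split("/"):
        step = unquote(segment)
        # '%2F...' decodes to an absolute path, so posixpath.join restarts at the root.
        # An empty segment (from '//') is a null-argument CWD, modelled as a no-op.
        if step:
            current = posixpath.join(current, step)
    return _clean(current)


def format3(uri: str) -> str:
    """Java CoG (model): '//' after the host marks the root, a single '/' marks HOME."""
    rest = urlsplit(uri).path[1:]
    if rest.startswith("/"):
        return _clean(unquote(rest))
    return _clean(posixpath.join(HOME, unquote(rest)))


for uri in ("gsiftp://host/testfile",
            "gsiftp://host/home/username/testfile",
            "gsiftp://host//home/username/testfile"):
    print(uri, format1(uri), format2(uri), format3(uri), sep="\n  ")
```

Under these assumptions, gsiftp://host/testfile resolves to /testfile with format 1 but to /home/username/testfile with formats 2 and 3, while gsiftp://host//home/username/testfile resolves to /home/username/testfile only with formats 1 and 3 (format 2 ends up in /home/username/home/username/testfile).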
Since some of our files are at absolute path locations and others are at relative path locations, and since we'd like our application to use partly globus-url-copy and partly the Java API (of the Java CoG), using gsiftp:// URIs becomes a bit tricky...
Having three possible interpretations for a given URI rather defeats the point of an identifier. This is just unusable.
What I find shocking is that these three different interpretations have actually been produced within a single project: Globus. If Grid interoperability is not achieved within a single project, how can it ever work across several of them?