Distributed Matter - Blog

Friday, July 27 2007

A look at WS-Notification

I'm having a look at WS-Notification (the WS-BaseNotification 1.3 OASIS Standard). I usually find reading these specs laborious (and not very rewarding), but I'm trying to show some good will towards Web Services (in the sense of the WS-* specifications). However, shortly after starting, I found a few caveats that would probably make it difficult for all compliant implementations to interoperate.

Here is an excerpt of the introduction to Section 3 (the NotificationConsumer interface):

WS-BaseNotification allows a NotificationConsumer to receive a Notification in one of two forms:
1. The NotificationConsumer MAY simply receive the “raw” Notification (i.e. the application-specific content).
2. The NotificationConsumer MAY receive the Notification data as a Notify message as described below.
[...]
When a Subscriber sends a Subscribe request message, it indicates which form of Notification is required (the raw Notification, or the Notify Message). The NotificationProducer MUST observe this component of the Subscription and use the form that has been requested, if it is able. If it does not support the form requested, it MUST fault.

At first glance, a NotificationConsumer (i.e. the recipient of the notification) can be compliant with the WS-Notification standard so long as it can receive the message, whether or not that message follows the format described in the following pages. There are subsequent mentions of the raw format, but its use seems to imply the use of SOAP (in a context that uses MAYs and SHOULDs).
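
To make the quoted rule concrete, here is a minimal Python sketch of a producer that honours the requested form or faults. It is only an illustration: the class names, the "raw"/"notify" labels and the dictionary used as a Notify envelope are all hypothetical and do not come from the specification, which works in terms of SOAP messages.

```python
class UnsupportedFormFault(Exception):
    """Hypothetical stand-in for the fault a producer must return."""


class Producer:
    def __init__(self, supported_forms):
        # Forms this producer can emit, e.g. {"raw", "notify"} (labels are assumptions).
        self.supported_forms = set(supported_forms)
        self.subscriptions = []  # list of (deliver callable, requested form)

    def subscribe(self, deliver, form):
        # The producer must use the requested form, or fault if it cannot.
        if form not in self.supported_forms:
            raise UnsupportedFormFault(form)
        self.subscriptions.append((deliver, form))

    def publish(self, payload):
        for deliver, form in self.subscriptions:
            # Raw: the application-specific content only.
            # Notify: the same content wrapped in an envelope (a dict here).
            deliver(payload if form == "raw" else {"Notify": payload})
```

Note that nothing in this sketch forces a consumer to understand the Notify form: a consumer that only ever asks for "raw" is still playing by the rules, which is exactly the interoperability concern.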

Later on, in Section 4.2 (NotificationProducer/Subscribe):

The NotificationProducer should specify via WSDL, policy assertions, meta-data or by some other means, the information it expects to be present in a ConsumerReference. If a ConsumerReference does not contain sufficient information, the NotificationProducer MAY choose to fault or it MAY choose to use out of band mechanisms to obtain the required information.

In addition, WS-Notification relies on WSRF (which more or less re-invents HTTP-based resources, but that's another story). The WSRF specification defines a set of accessors to get and set resource properties, but leaves the door open regarding how these should behave, especially when setting multiple properties in one request. Fair enough: HTTP also leaves this responsibility to the applications that use it. Interestingly, though, WS-Notification doesn't say much either about what should happen when using SetResourceProperties.

To sum up, I think the core concepts of NotificationConsumer, NotificationProducer, etc. are sound, but the two excerpts quoted above make me doubt that the specification can actually achieve any real interoperability. It almost sounds like "do what you want so long as you use SOAP and WS-Addressing". I have yet to be convinced that these two add any value for achieving the goals of Web Services (SOAP for security, maybe?).

Friday, December 8 2006

gsiftp URI madness

Updated 21/08/2007: Added workaround
Updated 02/08/2008: Moved workaround to the top

The workaround

One way to get consistent gsiftp URIs with both globus-url-copy and the CoG kit is to use // for absolute paths and /~/ for relative paths; both forms should work with both clients. What a URI with just one slash points to still depends on which client you use, so avoid those if you can.
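
As a rough illustration of this workaround, here is a small Python helper that builds URIs in the two portable forms. The function name is made up for this example, and the percent-encoding step is an extra assumption of mine rather than something either client is known to require.

```python
from urllib.parse import quote


def portable_gsiftp_uri(host, path):
    """Build a gsiftp URI in one of the two forms the workaround relies on."""
    # Assumption: percent-encode unsafe characters, keeping "/" and "~" as-is.
    encoded = quote(path)
    if path.startswith("/"):
        # Absolute path: the host separator plus the path's own slash gives "//".
        return f"gsiftp://{host}/{encoded}"
    # Relative path: anchor it explicitly at the login directory with "/~/".
    return f"gsiftp://{host}/~/{encoded}"
```

For instance, `portable_gsiftp_uri("host", "/home/username/testfile")` gives `gsiftp://host//home/username/testfile`, and `portable_gsiftp_uri("host", "testfile")` gives `gsiftp://host/~/testfile`.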

The problem

Globus's GridFTP has become the GGF standard for transferring files in a Grid environment. It is mainly an extension of FTP that is able to use GSI (Grid Security Infrastructure) authentication.

When using protocols such as FTP or HTTP, it is quite natural to use the URI (URL) to refer to a file. Even when FTP is considered separately from the Web (i.e. even if clicking on an FTP URL in a web browser didn't work), the concept of a URI helps a lot to address files. Similarly, I'd like my applications to be able to keep track of the files stored on GridFTP servers using URIs. There is some Globus tool support for using GridFTP URIs (prefixed with gsiftp://), in particular in globus-url-copy (which is a generic tool to copy a file from one URL to another URL) and in the Java CoG kit (which provides a Java implementation of much of the Globus Toolkit, and even more).
Sadly, using gsiftp URIs reliably is just not possible.

Not only are gsiftp:// URIs not formally defined in the GGF standard[1] (and only barely in the globus-url-copy documentation), but there are no fewer than three ways of interpreting the same URI!

1. The default globus-url-copy format (gsiftp://host/absolute-path/file).

In this case, the path refers to the absolute path on the server. A URI to a file in the home directory ($HOME/testfile) can be written like this:

  • gsiftp://host/home/username/testfile

(provided that whoever uses it knows that $HOME is /home/username), or

  • gsiftp://host/~/testfile

The main problem is that it is counter-intuitive when one has the FTP URI format in mind.

2. The RFC 1738 way (similar to FTP URIs), using globus-url-copy -rp.

The standard for FTP URIs says that the path should be relative to the directory in which the FTP server places the client at login. For example, ftp://host/path1/path2 should perform the equivalent of cd path1 followed by cd path2. The vast majority of FTP servers set this initial location to the home directory. The same RFC says that if you want an absolute path instead, the first / (root) should be encoded as %2f. Therefore, using the -rp option of globus-url-copy, the following two URIs should refer to the same file:

  • gsiftp://host/testfile, and
  • gsiftp://host/%2fhome/username/testfile

3. The Java CoG format.

This one is also relative, like the RFC 1738 format, but uses // (two slashes) instead of /%2f to designate the root directory. For example: gsiftp://host//home/username/testfile
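
The three readings described above can be sketched in Python as follows. This is my own model of the behaviour described in this post, not code taken from Globus or the CoG kit: the `resolve` function, the convention labels and the `HOME` value are hypothetical, and the sketch assumes that `~` expands to the login directory under every convention.

```python
from urllib.parse import unquote, urlsplit

HOME = "/home/username"  # assumed login directory, as in the examples above


def resolve(uri, convention, home=HOME):
    """Return the server-side path that `uri` names under one convention.

    `convention` is "default" (plain globus-url-copy), "rfc1738"
    (globus-url-copy -rp) or "cog" (Java CoG kit).
    """
    # Drop the single slash that separates the host from the path.
    rest = urlsplit(uri).path[1:]

    # Assumption: "~" expands to the login directory everywhere.
    if rest.startswith("~/"):
        return home + "/" + unquote(rest[2:])

    if convention == "default":
        # Everything after the host is read as an absolute path.
        return "/" + unquote(rest).lstrip("/")

    if convention == "rfc1738":
        # Only an encoded slash (%2f) marks the root directory.
        if rest.lower().startswith("%2f"):
            return "/" + unquote(rest[3:])
        if rest.startswith("/"):
            # The post does not say how -rp treats "//", so refuse to guess.
            raise ValueError("'//' is not covered by the description of -rp")
        return home + "/" + unquote(rest)

    if convention == "cog":
        # A second, literal slash marks the root directory.
        if rest.startswith("/"):
            return "/" + unquote(rest[1:])
        return home + "/" + unquote(rest)

    raise ValueError(f"unknown convention: {convention}")
```

With this model, `gsiftp://host/testfile` resolves to `/testfile` under "default" but to `/home/username/testfile` under both "rfc1738" and "cog", which is the single-slash ambiguity in a nutshell.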

Conclusions

The problems really start if you use both URIs that have an absolute path and others that have a relative path. For absolute paths, formats 1 and 3 behave more or less in the same manner (at least, URIs written using // would work with 1 and 3); for relative paths, formats 2 and 3 behave in the same manner, but differently from format 1. Since some of our files are at absolute path locations and others are at relative path locations, and since we'd like our application to use partly globus-url-copy and partly the Java API (of the Java CoG), using gsiftp:// URIs becomes a bit tricky...

Giving three possible interpretations to a given URI rather defeats the point of an identifier. This is just unusable.
What I find shocking is that these three different interpretations have actually been produced within a single project: Globus. If Grid interoperability is not achieved within a single project, how can it ever work across several of them?

Note

[1] The GGF standard mentions URLs that could be presented to a server, but the context of use is not quite clear.