Tuesday, August 17, 2010

To URI or not to URI

I'm not a native English speaker, so don't blame me if this Shakespeare reference is not recognizable. Maybe "URI or not URI" would work better for pronunciation...

Anyway, I'll clear things up for anyone who doesn't know URIs: "URI" stands for Uniform Resource Identifier, and the "not to URI" part of the title refers to "URL", Uniform Resource Locator. Seriously, I don't know what the difference between them is supposed to be, just that they behave pretty differently in Java.
(And for anyone who thinks, "Well, so you do know the difference?" No, the Java classes are an implementation of a specification that I don't understand, and don't even really know. I do know, however, how the implementation behaves. Whether this behavior follows the original specification, which was not made for the Java language but presumably for the internet, I don't know.)

So, from my point of view, both URIs and URLs are meant to identify some sort of resource, like a file on a network drive or some web site.
  • In Java, you can directly connect to a URL and read from and write to it. You can't do that for a URI, but you can convert between both forms (if the resource is legal in both representations).
  • The equals() and hashCode() methods of URL perform a host name lookup (they resolve the host to an IP address), which can make them very slow. URLs are therefore poor choices for map keys, for example for caching downloaded files.
So, it seems like a tied match. The first one is "obvious": you can see in the API that URI can't directly open connections. The slowness of equals() and hashCode() is subtle enough that there is a check for it in FindBugs, a static analysis tool (I use it as an Eclipse plugin) that helps to find common programming mistakes, which is how I learned about it.
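
To make the two points above a bit more concrete, here is a minimal sketch of how one might key a cache by URI and convert to a URL only when a connection is actually needed. The class name UriKeyedCache, its methods, and the example.com address are made up for illustration; they are not from any real library or from my own code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache for downloaded files, keyed by URI rather than URL.
// URI.equals() and URI.hashCode() only compare the components of the
// identifier, so using them as map keys never touches the network.
final class UriKeyedCache {
    private final Map<URI, byte[]> cache = new HashMap<>();

    public void put(URI key, byte[] content) {
        cache.put(key, content);
    }

    public byte[] get(URI key) {
        return cache.get(key);
    }

    // Convert to a URL only at the point where a connection would be opened.
    // toURL() itself does not connect; it fails if the URI is not absolute
    // or if no protocol handler exists for its scheme.
    public static URL toUrl(URI uri) {
        try {
            return uri.toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("No URL form for " + uri, e);
        }
    }

    public static void main(String[] args) {
        UriKeyedCache cache = new UriKeyedCache();
        // example.com is a placeholder address (an assumption for this sketch).
        URI key = URI.create("http://example.com/files/a.txt");
        cache.put(key, new byte[] {1, 2, 3});

        // A second, equal URI finds the same entry without any name lookup.
        byte[] hit = cache.get(URI.create("http://example.com/files/a.txt"));
        System.out.println("cache hit: " + (hit != null));

        // Only here would one call openStream() or openConnection().
        System.out.println("URL form: " + toUrl(key));
    }
}
```

The idea is simply to keep the "identifier" role and the "locator" role apart: URIs for bookkeeping, URLs for the actual I/O.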

The third difference I've experienced so far is very subtle and unexpected, and it is the real reason for this post. Wow, about time to get to the point...

You can represent a file inside a Jar (or Zip) archive using a URL or URI; both work. You can also "resolve" a relative URL/URI against another one to get an absolute one. But the combination of both only works with URL!

jar:file:/path/to/archive.jar!/path/inside/archive/
jar:file:/path/to/archive.jar!/path/inside/archive.txt

are both legitimate URIs/URLs. Say you want to resolve "hello.txt" against these; you would expect:

jar:file:/path/to/archive.jar!/path/inside/archive/hello.txt
jar:file:/path/to/archive.jar!/path/inside/hello.txt

However, this works only with URLs. A URI won't be able to resolve the path and will just return "hello.txt".
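
The behavior described above can be reproduced with a small sketch like the following. The class JarResolveDemo and its two helper methods are hypothetical names chosen for this example; the paths are the ones from this post and don't need to exist, since nothing is opened. As far as I can tell from the Javadoc, the reason is that java.net.URI treats "jar:file:..." as an opaque URI (the part after the scheme doesn't start with a slash), and resolve() on an opaque URI simply returns the argument.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class comparing relative resolution for jar: addresses.
final class JarResolveDemo {

    // Resolve using java.net.URL: the two-argument constructor takes a
    // context URL and a relative spec. (The URL constructors are deprecated
    // in recent Java versions, hence the annotation.)
    @SuppressWarnings("deprecation")
    public static String resolveWithUrl(String base, String relative) {
        try {
            return new URL(new URL(base), relative).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Resolve using java.net.URI. For an opaque base URI, resolve() is
    // documented to return the given URI unchanged.
    public static String resolveWithUri(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String dir = "jar:file:/path/to/archive.jar!/path/inside/archive/";
        String file = "jar:file:/path/to/archive.jar!/path/inside/archive.txt";

        System.out.println("URL, directory base: " + resolveWithUrl(dir, "hello.txt"));
        System.out.println("URL, file base:      " + resolveWithUrl(file, "hello.txt"));
        System.out.println("URI, directory base: " + resolveWithUri(dir, "hello.txt"));
        System.out.println("URI, file base:      " + resolveWithUri(file, "hello.txt"));
        System.out.println("URI is opaque:       " + URI.create(dir).isOpaque());
    }
}
```

One possible workaround, if you want to stay with URIs elsewhere, is to convert to a URL just for the resolution step and convert the result back.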

I said that I recently reimplemented TreeProperties, and I did it based on URIs. As you might guess, resolving paths relative to some context is pretty important for a tree structure. Finding the bug was very annoying, because reading from a Jar file, or even multiple Jar files, is not something I have years of experience with. Fixing the bug was way easier once I sorted out its origin.

I'd like to end this post with another Shakespeare quote:

"But none come to mind. I hope I have at least fooled you with the quotation marks. Just be aware of the differences between Uniform Resource Locators and Identifiers."

5 comments:

Anonymous said...

I always thought URL/URI stood for Uniform Resource etc..

Anyway, bearing in mind that I haven't done much in Java, I think the key difference can be ascertained from "Identifier" vs "Locator". A URI only bothers with identifying and separating resources, while a URL provides the means to access those resources.

Based entirely on blog entries ( ;) ), I'd say URLs are all you need for TreeProperties.

Silly Freak said...

You're right, I have no idea how that slipped in. I know that it stands for Uniform... I guess I was very tired when I wrote that.

Yeah, but even though I'm a really object-oriented programmer, I see no reason for the split in terms of separation of concerns. Separating these constructs seems to cause more code duplication than it enables separation of concerns. Besides, URLs are also totally suited for identification purposes.

In the end, they were, but as my system also adds support for different data types, it will more or less need both. So I asked the question: Which conversion is more "natural": Using URIs and converting to URLs, or the other way round? The rest of the story is my blog post ;)

nantuko84 said...

LandDropAction uses LAND_DROP_COUNTER to make sure land wasn't played this turn.

But if we look deeper into the code, PlayerImpl.getCounter(String name) creates counter using new EditableCounterImpl that creates count using new EditableProperty.
That means that null will be used as initialValue, right?

Next time, when we try to get counters in LandDropAction:
if(c.getCount() > 0) return false;
count.getValue() will return null and as we have

public int getCount() {
return count.getValue();
}

Java will try to unbox the Integer to an int, and as getValue() returns null, a NullPointerException will occur.

Silly Freak said...

That's right; should be fixed now.

nantuko84 said...

I have some experience with Scala. Why are you planning to use it in laterna?
BTW, the IDEA Scala plugin is much better and more stable.