Comparing XML namespace names for equivalence

Don Box's Spoutlet

I recently asked a question internally about how XML namespace names are tested for equivalence.
 
Specifically, if a piece of XML software encountered the following two XML elements in the wild:
 
<Foo xmlns="http://example.com/hello world/" />
<Foo xmlns="http://example.com/hello%20world/" />
 
would that software consider their names to be the same or different?
 
The default Microsoft XML parsers don't normalize, so unless the software above the parser does something special, the names are different.
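
A quick way to see the "no normalization" behavior is to parse both elements and compare the namespace names the parser reports. This is only a sketch using Python's standard-library ElementTree as a stand-in; it is not one of the Microsoft parsers discussed here, and the helper name namespace_of is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The two documents from the question above.
doc_space = '<Foo xmlns="http://example.com/hello world/" />'
doc_escaped = '<Foo xmlns="http://example.com/hello%20world/" />'

def namespace_of(xml_text):
    """Return the root element's namespace name exactly as the parser reports it.

    Hypothetical helper: ElementTree reports qualified names as "{namespace}local".
    """
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

ns_space = namespace_of(doc_space)
ns_escaped = namespace_of(doc_escaped)

# Plain string comparison, with no URI normalization applied.
same_name = ns_space == ns_escaped
print(repr(ns_space), repr(ns_escaped), same_name)
```

With this parser the two names come back untouched, so any equivalence beyond string equality would have to be implemented by the code sitting above the parser.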
 
FWIW, our XML serialization stacks don't normalize, so they also see the two names as different.
 
I haven't had time to see what our XSLT engine(s), our XSD validators, or the XMLDT in SQL do.
 
I recall that sometime during the last millennium there was a big brouhaha over the use of relative URIs (specifically in ADO recordsets) that I could swear resulted in a clarification of this, but alas, I can't find it (even with that other search engine).
 
I'm curious what the state of the practice is around this.
 
Feel free to comment here or send me private email (dbox @ the usual place).

Posted Feb 19 2006, 05:22 PM by don-box

Comments

David Wragg wrote re: Comparing XML namespace names for equivalence
on 02-19-2006 1:14 PM
From the XML Namespaces recommendation <http://www.w3.org/TR/REC-xml-names/#dt-identical>:
> [Definition:] URI references which identify
> namespaces are considered identical when they
> are exactly the same character-for-character.
> Note that URI references which are not
> identical in this sense may in fact be
> functionally equivalent. Examples include URI
> references which differ only in case, or
> which are in external entities which have
> different effective base URIs.

That seems pretty definitive. Sounds like your parsers and XML serialization stacks are getting it right.
Derek Denny-Brown wrote re: Comparing XML namespace names for equivalence
on 02-19-2006 2:46 PM
The accepted wisdom for namespaces is to treat them as strings. Most (all?) parsers do absolutely nothing to validate that a namespace actually conforms to the rules for a proper URI. As the xsi:schemaLocation attribute is defined as a space-separated list, spaces in a namespace are highly discouraged. I haven't looked into this recently, but I also seem to remember that there is some subtle variation regarding whether parsers trim leading/trailing whitespace off the xmlns attribute's value or not, so I have always recommended against having any whitespace in the namespace, anywhere.
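
To illustrate the schemaLocation concern, here is a rough sketch of how a whitespace-separated list of namespace/location pairs falls apart when the namespace name itself contains a space. The schema file name hello.xsd is invented for the example, and real validators may tokenize differently.

```python
# Namespace name from the post, containing a literal space.
namespace = "http://example.com/hello world/"

# A schemaLocation-style value: namespace name, whitespace, schema location.
# "hello.xsd" is a hypothetical file name used only for illustration.
schema_location = namespace + " hello.xsd"

# Split on whitespace, then pair the tokens up as (namespace, location).
tokens = schema_location.split()
pairs = list(zip(tokens[0::2], tokens[1::2]))

print(tokens)
print(pairs)
```

The split yields three tokens instead of two, so the first "pair" ends up being the two halves of the namespace name and the actual schema location is left dangling.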
Sam Ruby wrote re: Comparing XML namespace names for equivalence
on 02-19-2006 5:46 PM
Note that the portion of the spec that David quotes says character for character, which means that if you were to substitute the XML encoded numeric entity &#x20; for the URI encoding of this character, the namespaces *are* to be treated as identical, because the parser resolves the reference to a literal space before any comparison takes place.
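
A small sketch of that distinction, again assuming Python's ElementTree as the parser (other parsers should resolve character references in attribute values the same way, but that is an assumption here): the character reference is expanded by the XML layer, while the percent escape is left alone.

```python
import xml.etree.ElementTree as ET

# Literal space, XML character reference, and URI percent escape.
literal = ET.fromstring('<Foo xmlns="http://example.com/hello world/" />')
char_ref = ET.fromstring('<Foo xmlns="http://example.com/hello&#x20;world/" />')
percent = ET.fromstring('<Foo xmlns="http://example.com/hello%20world/" />')

# The character reference is resolved during parsing, so the names match.
char_ref_matches = literal.tag == char_ref.tag

# The percent escape is just three ordinary characters to the XML parser.
percent_matches = literal.tag == percent.tag

print(char_ref_matches, percent_matches)
```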
Clemens Vasters wrote re: Comparing XML namespace names for equivalence
on 02-20-2006 12:23 AM
A namespace name is URI-typed, and therefore "character by character" should respect RFC3986, Section 2. "%20" is a single character as per these rules. These rules are really the only significant difference between URI and "string" when it comes to using URIs purely as identifiers. I agree with Sam.
David Wragg wrote re: Comparing XML namespace names for equivalence
on 02-20-2006 2:09 AM
Clemens,

I can see where you are coming from, but my interpretation was different. I think "character-by-character" is referring to the sequence of characters after XML attribute value normalization (this is Sam's point), but without any consideration of URI syntax. I guess this shows that a range of interpretations is possible. I now appreciate why Don asked about the state of practice!

With that said, I'll venture into spec-lawyer territory:

- RFC3986, section 6.2 describes a "Comparison Ladder". This includes the method you suggest ("6.2.2.2. Percent-Encoding Normalization") but I do not see a clear reason to favour that over "6.2.1. Simple String Comparison".

- The original "Namespaces in XML" recommendation refers to RFC2396, which RFC3986 superseded. RFC2396 is even less specific about URI equivalence than RFC3986, and discourages percent unescaping except when breaking the URI into its component parts (2.4.2). I think that the authors of the namespaces recommendation were aware that equality of URIs is a problem area, and they sought to avoid the issues entirely with the phrase "character-by-character". Allowing percent unescaping takes you back into that minefield: For instance, if unescaping "%20" to " " is ok, what about unescaping "%2f" to "/"?

- Finally, if you take the "Namespaces in XML 1.1" recommendation into account (which not everyone will do), it goes into much greater detail about comparing namespace IRIs, and contains a number of examples. Returning to the comparison ladder from RFC3986, the only option consistent with these examples seems to be "6.2.1. Simple String Comparison" (though I am not at all confident that I fully understand the consequences of IRIs vs. URIs).
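
The points above can be sketched roughly as follows. The function names are invented for illustration, and percent_decoded_equal is a deliberately naive policy that unescapes every percent escape (the hypothetical "if %20 is ok, what about %2f" case), not a faithful implementation of any step of the RFC3986 ladder.

```python
from urllib.parse import unquote

def simple_string_equal(a, b):
    """Character-for-character comparison, with no URI awareness."""
    return a == b

def percent_decoded_equal(a, b):
    """Naive comparison that unescapes every percent escape before comparing."""
    return unquote(a) == unquote(b)

space_form = "http://example.com/hello world/"
escaped_form = "http://example.com/hello%20world/"

# Hypothetical pair showing the %2f problem: one path segment versus two.
one_segment = "http://example.com/a%2fb"
two_segments = "http://example.com/a/b"

results = {
    "space, simple": simple_string_equal(space_form, escaped_form),
    "space, decoded": percent_decoded_equal(space_form, escaped_form),
    "slash, simple": simple_string_equal(one_segment, two_segments),
    "slash, decoded": percent_decoded_equal(one_segment, two_segments),
}
print(results)
```

Under the naive policy both pairs compare equal, which is the minefield: once "%20" is unescaped there is no obvious place to stop, whereas simple string comparison keeps all four names distinct.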
David Wragg wrote re: Comparing XML namespace names for equivalence
on 02-20-2006 4:51 AM
This might be the result of the brouhaha:
http://www.w3.org/2000/09/xppa

(It's referenced from the "Namespaces in XML 1.1" recommendation, which says that relative namespace URIs are deprecated.)
Dominic Cronin wrote re: Comparing XML namespace names for equivalence
on 04-29-2006 10:17 AM
URL Escaping isn't the only issue here. How do you deal with quote characters?

http://www.dominic.cronin.nl/weblog/archive/2006/04/29/xml-namespace-weirdness-in-msxml4
