The single biggest problem in the Web services (WS) space today is data and service versioning. I've been thinking about this problem for years, and I have finally arrived at an answer that is simple and straightforward and plays by all the rules. The inspiration came from Henry Thompson's presentation at the XSD workshop that happened last year. I also reread parts of the XSD spec, where he had dropped some hints about this model. My current, and I hope final, solution grew from there.
Most OO folks assume that an XML instance is associated with a single XSD. Most XML people do not. Rather, an XML instance may be valid according to a whole range of XSDs. In the case of versioning, those XSDs are related in a simple way: they all share the same target namespace. Each new version may add new top-level constructs (elements, types, groups, etc.) and may extend existing constructs with optional content. The first sort of change is okay because it doesn't change the transitive closure of any existing construct, so it won't break clients (with one caveat: a wildcard that matches the target namespace has to be assumed to match things that may be added in a future version). The second sort of change is okay because the extensions are optional. Instances created with an earlier version of the schema don't have those elements, but that's fine. Any other change to an existing definition requires a new schema with a new target namespace.
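To make the rule concrete, here is a minimal Python sketch of a compatibility check. It assumes a deliberately simplified, hypothetical model (not any real XSD API): each complex type is just a sequence of named child elements, each either required or optional, and new optional content is appended to the end of a sequence, the way XSD type extension does it. The names `Particle`, `SchemaVersion`, `is_compatible_evolution` and the `urn:example:orders` namespace are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Particle:
    """One child element in a simplified xs:sequence (hypothetical model)."""
    name: str
    required: bool  # minOccurs="1" vs minOccurs="0"


@dataclass
class SchemaVersion:
    target_namespace: str
    types: Dict[str, Tuple[Particle, ...]]  # type name -> its sequence


def is_compatible_evolution(old: SchemaVersion, new: SchemaVersion) -> bool:
    """True if `new` can ship under the same target namespace as `old`."""
    if new.target_namespace != old.target_namespace:
        return False  # a different namespace means a different schema, not a version
    for type_name, old_seq in old.types.items():
        new_seq = new.types.get(type_name)
        if new_seq is None:
            return False  # removing a construct breaks existing clients
        if new_seq[:len(old_seq)] != old_seq:
            return False  # existing content changed: needs a new namespace
        if any(p.required for p in new_seq[len(old_seq):]):
            return False  # new required content would reject old instances
    return True  # extra top-level types are always allowed


NS = "urn:example:orders"  # hypothetical target namespace
v1 = SchemaVersion(NS, {"Order": (Particle("id", True), Particle("total", True))})
v2 = SchemaVersion(NS, {
    "Order": (Particle("id", True), Particle("total", True),
              Particle("currency", False)),  # optional extension
    "Customer": (Particle("name", True),),    # new top-level type
})
v2_renamed = SchemaVersion(NS, {"Order": (Particle("id", True),
                                          Particle("amount", True))})
print(is_compatible_evolution(v1, v2))          # True
print(is_compatible_evolution(v1, v2_renamed))  # False
```

The check only encodes the two permitted kinds of change; a real tool would also have to walk nested groups, choices, attributes and wildcards.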
The next problem is what to do when instances of a new version of the schema are sent to a consumer built with an old version. Such instances may contain extra elements that the consumer didn't know about when it was created. This case arises all the time with Web services, where an updated service has to return data to older clients. The solution is for the client to ignore the extra data. Both of the .NET XML-to-object mappers (XmlSerializer and DataContractSerializer) do this, and JAXB 2.0 (which is used by JAX-WS) has an option to do it. In other words, if your code doesn't do this already, it will be able to soon.
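Here is a minimal sketch of what "ignore the extra data" means on the consuming side, written in Python rather than .NET or Java. The `Order` class, the `from_xml` helper and the `urn:example:orders` namespace are hypothetical; this is not how XmlSerializer, DataContractSerializer or JAXB are implemented, just the same behavior in miniature: child elements the class knows about are mapped onto fields, and anything else is skipped.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

NS = "urn:example:orders"  # hypothetical namespace shared by every version


@dataclass
class Order:
    """The consumer's compiled-in (v1) view of an order."""
    id: str = ""
    total: str = ""


def from_xml(cls, xml_text: str):
    """Map child elements onto cls's fields, skipping elements it doesn't know."""
    root = ET.fromstring(xml_text)
    field_for_tag = {"{%s}%s" % (NS, f.name): f.name for f in fields(cls)}
    values = {}
    for child in root:
        name = field_for_tag.get(child.tag)
        if name is not None:
            values[name] = child.text or ""
        # Anything else (e.g. <currency>, added in v2) is silently ignored.
    return cls(**values)


v2_response = ('<Order xmlns="urn:example:orders">'
               '<id>42</id><total>9.99</total><currency>EUR</currency>'
               '</Order>')
print(from_xml(Order, v2_response))  # Order(id='42', total='9.99')
```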
Ah, but what about schema validation? If an application built with the older version of the schema validates data produced with the newer version, any extra elements in the instance will cause validation to fail. The first attempts to fix this involved adding a wildcard to accept the extra content, and then introducing either hierarchical or inline delimiters to work around XSD's determinism constraint (the Unique Particle Attribution rule). The goal was to keep the validation error from occurring at all. This approach leads to schemas and marshalers that are too clever by half, and it creates real interoperability problems between toolkits. Luckily, there is another solution.
The XSD spec does not define validity as a single boolean value. Nor does it say how a system has to react to validation issues. When you validate a document, the processor tells you whether validation was attempted for a given element. If it was attempted, it tells you what schema component was used, and whether the element was valid, invalid, or unknown (because it isn't in the schema). You can decide what to do with that information.
Today, most validation logic treats anything other than a valid element as an error and throws an exception. But you could be more flexible. For instance, you could build a validator that, upon encountering an unknown element in a sequence, simply ignores it and every element after it in that sequence, up to the parent element's closing tag. “Ignore” could mean either not throwing an exception or actually filtering that data out of the element stream. The important thing is that when validation detects extra content, you let it slide. Of course, you can still validate the elements you do know about, up to the point where the unexpected data begins. I have a .NET 2.0 implementation of this working now; hopefully I'll have time to clean it up and post it soon.
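The forgiving validator described above can be sketched as follows. This is a toy content-model checker built on Python's standard `xml.etree.ElementTree`, not a real XSD processor and not the .NET 2.0 implementation mentioned above. It assumes the same hypothetical model as before (a type is a flat sequence of named particles, each occurring at most once, with an optional text check) and borrows the PSVI-style outcome labels "valid", "invalid" and "notKnown". `Particle`, `lenient_validate` and its `filter_extra` flag are illustrative names only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Particle:
    """One child element in a simplified xs:sequence (hypothetical model)."""
    name: str
    required: bool = True                          # minOccurs="1" vs "0"
    check: Optional[Callable[[str], bool]] = None  # test for the element's text


def lenient_validate(parent: ET.Element, sequence: List[Particle], ns: str,
                     filter_extra: bool = False
                     ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Check parent's children against sequence, letting unknown content slide.

    Returns (report, missing): report pairs each child tag with "valid",
    "invalid" or "notKnown"; missing lists required particles never seen.
    """
    prefix = "{" + ns + "}"
    known = {p.name for p in sequence}
    report: List[Tuple[str, str]] = []
    pos = 0  # index of the next particle the sequence allows
    children = list(parent)  # copy, so removing children below is safe
    for i, child in enumerate(children):
        local = child.tag[len(prefix):] if child.tag.startswith(prefix) else None
        if local not in known:
            # First unknown element: treat it and every later sibling, up to
            # the parent's end tag, as newer-version content and let it slide.
            for extra in children[i:]:
                report.append((extra.tag, "notKnown"))
                if filter_extra:
                    parent.remove(extra)
            break
        # Look for this element at or after pos, skipping only optional particles.
        j = pos
        while (j < len(sequence) and sequence[j].name != local
               and not sequence[j].required):
            j += 1
        if j < len(sequence) and sequence[j].name == local:
            p = sequence[j]
            ok = p.check is None or p.check(child.text or "")
            report.append((child.tag, "valid" if ok else "invalid"))
            pos = j + 1
        else:
            report.append((child.tag, "invalid"))  # known element, out of order
    missing = [p.name for p in sequence[pos:] if p.required]
    return report, missing


# Usage: an old (v1) consumer receiving a v2 Order with two extra elements.
NS = "urn:example:orders"  # hypothetical target namespace
ORDER_V1 = [
    Particle("id", check=str.isdigit),
    Particle("total", check=lambda t: t.replace(".", "", 1).isdigit()),
    Particle("note", required=False),
]
doc = ET.fromstring(
    '<Order xmlns="urn:example:orders"><id>42</id><total>9.99</total>'
    '<currency>EUR</currency><rush>true</rush></Order>')
report, missing = lenient_validate(doc, ORDER_V1, NS, filter_extra=True)
for tag, outcome in report:
    print(tag, outcome)
```

Here `id` and `total` come back "valid", `currency` and `rush` come back "notKnown", nothing required is missing, and because `filter_extra=True` the two extra elements are removed from the tree. With `filter_extra=False` the report is the same but the tree is left alone, which corresponds to the "don't throw" reading of "ignore".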
So, my model for versioning comes down to these points:
- A service can evolve its contract in a controlled way without breaking clients
- Clients must assume that the contract they get is a snapshot in time and that the service is free to evolve its contract in a controlled way
- An application producing an XML instance should make sure it matches the schema that application is using
- An application consuming an XML instance should assume it matches the schema that application is using, possibly plus additional elements
- If an application consuming an XML instance wants to schema validate it, it should be forgiving in how it deals with unknown elements in the stream and should not simply throw exceptions
Really, this is just another variation of Postel's law: be careful with what you produce and flexible with what you accept, which is the basis for all the successful distributed systems I know of.
I'm very happy with this model. I think it is pretty intuitive, and it works with today's tools. It doesn't twist instances around the XSD UPA requirement, nor does it try to change the semantics of XSD or of XSD validation. It only argues for a different reaction to issues that arise during validation, which is entirely reasonable.
Posted Apr 14 2006, 02:55 PM by tim-ewald