I often get asked to explain how XSDs work, and every time I explain it in a different manner. I thought that if I document one way here, I could later reference it for the rest of my life 🙂
Below is an example XSD that I will talk about. This is by no means a complete tutorial, but a simple look into the world of XSD.
XSD EXAMPLE
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="student">
    <complexType>
      <sequence>
        <element name="name" type="string"/>
        <element name="pnr">
          <simpleType>
            <restriction base="string">
              <pattern value="[0-9]{2}[0-1][0-9][0-9][0-9][0-9a-z][0-9]{3}"/>
            </restriction>
          </simpleType>
        </element>
        <element name="address" type="string"/>
        <element name="zip" type="integer"/>
        <element name="city">
          <simpleType>
            <restriction base="string">
              <maxLength value="82"/>
              <minLength value="3"/>
            </restriction>
          </simpleType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>
The top-level element declared in this XSD is ‘student’ (the root tag of the XSD file itself is ‘schema’). This gives the name of the “highest” element in the XML. The declaration also gives us the names of the tags below the root element, in the order they must appear: name, pnr, address, zip and city. Let us look at each tag separately:
‘name’ – This tag has to obey the rules set by the built-in type string. This pretty much includes any text of any length (even empty).
‘pnr’ – For this tag we cannot use a built-in XSD type, so we have to create our own. To do this we set a pattern value. This is a regular expression that the whole value must match. In this case it demands that ‘pnr’ starts with 2 digits, followed by a 0 or a 1, then 3 digits, then 1 character that is either a digit or a lowercase letter from a to z, and ends with 3 more digits (10 characters in total).
‘address’ – This element also has to obey the built-in string type rules.
‘zip’ – This element has to obey the integer type, which is stricter than string: only digits with an optional leading sign are allowed.
‘city’ – This element has to obey the two built-in restrictions minLength (3 characters) and maxLength (82 characters).
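The ‘pnr’ pattern can also be written more compactly. Here is a small sketch of what I believe is an equivalent declaration: `[01]` means the same as `[0-1]`, and `[0-9]{3}` the same as three `[0-9]` in a row. Note that an XSD pattern always has to match the whole value, so no `^` or `$` anchors are needed. The schema wrapper around the element is only there to make the fragment complete on its own.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="pnr">
    <simpleType>
      <restriction base="string">
        <!-- same ten characters as in the example above:
             2 digits, a 0 or 1, 3 digits,
             1 digit or lowercase letter, 3 digits -->
        <pattern value="[0-9]{2}[01][0-9]{3}[0-9a-z][0-9]{3}"/>
      </restriction>
    </simpleType>
  </element>
</schema>
```

Which form you choose is a matter of taste. The long form shows every position on its own, the short form is easier to count.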
A few words about simpleType and complexType
When we need to create our own restrictions on a value, we wrap them in a simpleType. A simpleType describes a plain text value with no child elements or attributes. A complexType describes an element that has child elements and/or attributes. That is why, in the example, the whole ‘student’ structure is wrapped in a complexType.
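To make the difference a bit more concrete, here is a minimal sketch. All the names in it (‘course’, ‘title’, ‘points’, ‘code’ and ‘unit’) are made up for this illustration and are not part of the student example. It shows a restricted text value (simpleType), an element with text plus an attribute (complexType with simpleContent), and an element with children and an attribute (complexType).

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- all element and attribute names here are hypothetical -->
  <element name="course">
    <!-- children and an attribute: complexType -->
    <complexType>
      <sequence>
        <!-- text only, with our own restriction: simpleType -->
        <element name="title">
          <simpleType>
            <restriction base="string">
              <minLength value="1"/>
            </restriction>
          </simpleType>
        </element>
        <!-- text plus an attribute: complexType with simpleContent -->
        <element name="points">
          <complexType>
            <simpleContent>
              <extension base="decimal">
                <attribute name="unit" type="string"/>
              </extension>
            </simpleContent>
          </complexType>
        </element>
      </sequence>
      <attribute name="code" type="string" use="required"/>
    </complexType>
  </element>
</schema>
```

An XML such as <course code="XML101"><title>XSD basics</title><points unit="ECTS">7.5</points></course> should match this sketch.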
Matching XML example
<?xml version="1.0"?>
<student>
  <name>Niklas</name>
  <pnr>1214567890</pnr>
  <address>Kungsgatan 1</address>
  <zip>12345</zip>
  <city>Gothenburg</city>
</student>
Non-matching XML example
<?xml version="1.0"?>
<student>
  <name>Niklas</name>
  <pnr>121456-7890</pnr>
  <address>Kungsgatan 1</address>
  <zip>123 45</zip>
  <city>Gothenburg</city>
</student>
It’s often more interesting to talk about non-matching examples, since they give us a more in-depth look at the problems you could encounter while creating your XSD. Here the ‘pnr’ element contains a dash (‘-’), which is not allowed by our pattern. Our pattern only allows digits and, in one position, a lowercase letter. The ‘zip’ element is also invalid, since we demand a value of type ‘integer’. The whitespace between ‘3’ and ‘4’ means the value is not an integer.
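If we actually wanted to accept the dashed and spaced forms too, one possible way is sketched below. This is my own assumption about how the schema could be loosened, not something the original example requires. The two declarations would replace the ‘pnr’ and ‘zip’ elements inside the sequence. Note the trade-off: ‘zip’ is then no longer an integer but a string with a pattern.

```xml
<!-- fragment: meant to replace the pnr and zip declarations in the sequence -->
<element name="pnr">
  <simpleType>
    <restriction base="string">
      <!-- same pattern as before, with an optional dash before the last four characters -->
      <pattern value="[0-9]{2}[01][0-9]{3}-?[0-9a-z][0-9]{3}"/>
    </restriction>
  </simpleType>
</element>
<element name="zip">
  <simpleType>
    <restriction base="string">
      <!-- three digits, an optional single space, two digits -->
      <pattern value="[0-9]{3} ?[0-9]{2}"/>
    </restriction>
  </simpleType>
</element>
```

With these declarations both ‘1214567890’ and ‘121456-7890’ fit the ‘pnr’ pattern, and both ‘12345’ and ‘123 45’ fit the ‘zip’ pattern.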
To get a feel for the restrictions and types that are built in, I have listed a bunch of them below:
Built in types:
decimal, float, double, integer, positiveInteger, negativeInteger, nonPositiveInteger, nonNegativeInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte, dateTime, date, gYearMonth, gYear, duration, gMonthDay, gDay, gMonth, string, normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREFS, ENTITY, ENTITIES, QName, boolean, hexBinary, base64Binary, anyURI, NOTATION
Built in restrictions:
minExclusive, minInclusive, maxExclusive, maxInclusive, totalDigits, fractionDigits, length, minLength, maxLength, enumeration, whiteSpace, pattern
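As a small illustration of two restrictions that the student example does not use, here is a sketch with enumeration and minInclusive/maxInclusive. The element names (‘enrollment’, ‘grade’ and ‘age’) and the chosen limits are made up for this example.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- "enrollment", "grade" and "age" are hypothetical names -->
  <element name="enrollment">
    <complexType>
      <sequence>
        <element name="grade">
          <simpleType>
            <restriction base="string">
              <!-- only these three values are accepted -->
              <enumeration value="A"/>
              <enumeration value="B"/>
              <enumeration value="C"/>
            </restriction>
          </simpleType>
        </element>
        <element name="age">
          <simpleType>
            <restriction base="integer">
              <!-- accepted range is 0 to 150, both ends included -->
              <minInclusive value="0"/>
              <maxInclusive value="150"/>
            </restriction>
          </simpleType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>
```

Here a ‘grade’ of ‘D’ or an ‘age’ of 151 would make the XML non-matching.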
XSDs can easily become very complicated, but when they do, please consider following the KISS rule and simplify them. You might understand an XSD that you have created yourself, no matter how complicated it is, but a colleague might not. This is the reason I have not touched the subjects of namespaces and imports, which often overcomplicate things.
Hope you found this small introduction to the world of XSDs useful.