XSD, a quick introduction

I often get asked to explain how XSDs work, and every time I explain it in a different way. I thought that if I documented one way here, I could refer back to it for the rest of my life 🙂
Below is an example XSD that I will talk about. This is by no means a complete tutorial, just a simple look into the world of XSD.

XSD EXAMPLE

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
 
<element name="student">
  <complexType>
    <sequence>
      <element name="name" type="string"/>
      <element name="pnr">
        <simpleType>
          <restriction base="string">
            <pattern value="[0-9]{2}[0-1][0-9][0-9][0-9][0-9a-z][0-9]{3}"/>
          </restriction>
        </simpleType>
      </element>
      <element name="address" type="string"/>
      <element name="zip" type="integer"/>
      <element name="city">
        <simpleType>
          <restriction base="string">
            <maxLength value="82"/>
            <minLength value="3"/>
          </restriction>
        </simpleType>
      </element>
    </sequence>
  </complexType>
</element>
 
</schema>

The root element of the XSD itself is ‘schema’; the element ‘student’ declared inside it is the root (“highest”) element of the XML being validated. The declaration also gives us the names of the tags below the root element, in the order they must appear: name, pnr, address, zip and city. Let us look at each tag separately:
‘name’ – This tag has to obey the rules of the built-in type string. That pretty much includes any text of any length (even empty).
‘pnr’ – For this tag we cannot use a built-in XSD type, so we have to create our own. To do this we set a pattern value. This is a regular expression that must match the whole value, and in this case it demands that ‘pnr’ starts with 2 digits, followed by a binary digit (0 or 1), 3 digits, and 1 character that can be a digit or a lowercase letter from a to z. We end with 3 digits between 0 and 9.
‘address’ – This element also has to obey the rules of the built-in string type.
‘zip’ – This element has to obey the integer type, which is more stringent than string: only digits with an optional leading sign are allowed.
‘city’ – This element is a string that has to obey two built-in restrictions: minLength (at least 3 characters) and maxLength (at most 82 characters).

A few words about simpleType and complexType
When we need to create our own restrictions we encapsulate them in a simpleType (for a value that is plain text, with no child elements or attributes) or a complexType (for an element that contains child elements or attributes). A complexType can also describe whole structures, as in the example above where the content of student is encapsulated in a complexType.
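
Both kinds of type can also be given a name and declared at the top level, so that several elements can share them. The following is only a sketch of that idea: the names pnrType and studentType are made up for illustration and do not appear in the example above. It uses the xs: prefix instead of a default namespace, because in a schema without a targetNamespace an unprefixed reference such as type="pnrType" would otherwise be looked up in the XML Schema namespace itself.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical named simple type: a restricted text value, declared once -->
  <xs:simpleType name="pnrType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{2}[0-1][0-9][0-9][0-9][0-9a-z][0-9]{3}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical named complex type: child elements in a fixed order -->
  <xs:complexType name="studentType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="pnr" type="pnrType"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The element now refers to the complex type by name -->
  <xs:element name="student" type="studentType"/>

</xs:schema>
```

Whether to use named or anonymous types is mostly a matter of taste; named types pay off when the same restriction is needed in more than one place.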

Matching XML example

<?xml version="1.0"?>
 
<student>
  <name>Niklas</name>
  <pnr>1214567890</pnr>
  <address>Kungsgatan 1</address>
  <zip>12345</zip>
  <city>Gothenburg</city>
</student>

Non matching XML example

<?xml version="1.0"?>
 
<student>
  <name>Niklas</name>
  <pnr>121456-7890</pnr>
  <address>Kungsgatan 1</address>
  <zip>123 45</zip>
  <city>Gothenburg</city>
</student>

It’s often more interesting to talk about non-matching examples, since they give us a more in-depth look at the problems you could encounter while creating your XSD. Here the ‘pnr’ element contains a dash (‘-‘), which is not allowed by our pattern. The pattern only allows digits, plus a lowercase letter in one position. The ‘zip’ element is also invalid, since we demand a value of type ‘integer‘. The whitespace between ‘3’ and ‘4’ means the value is not an integer.
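
If the dash and the space were meant to be acceptable input, one possible approach is to loosen the two declarations. The fragment below is a sketch of that idea and not part of the original schema; both patterns are my own assumptions about what "acceptable" would mean here. With these variants in place of the original ‘pnr’ and ‘zip’ declarations, both the matching and the non-matching example should validate.

```xml
<!-- Hypothetical variant of 'pnr': "-?" allows one optional dash after the sixth digit -->
<element name="pnr">
  <simpleType>
    <restriction base="string">
      <pattern value="[0-9]{2}[0-1][0-9][0-9][0-9]-?[0-9a-z][0-9]{3}"/>
    </restriction>
  </simpleType>
</element>

<!-- Hypothetical variant of 'zip': five digits with an optional space after the third -->
<element name="zip">
  <simpleType>
    <restriction base="string">
      <pattern value="[0-9]{3} ?[0-9]{2}"/>
    </restriction>
  </simpleType>
</element>
```

Note that ‘zip’ is now based on string rather than integer, since an integer can never contain an inner space.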

To get a feel for the restrictions and types that are built in, I have listed a bunch of them below:
Built-in types:
decimal, float, double, integer, positiveInteger, negativeInteger, nonPositiveInteger, nonNegativeInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte, dateTime, date, gYearMonth, gYear, duration, gMonthDay, gDay, gMonth, string, normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREFS, ENTITY, ENTITIES, QName, boolean, hexBinary, base64Binary, anyURI, NOTATION

Built-in restrictions:
minExclusive, minInclusive, maxExclusive, maxInclusive, totalDigits, fractionDigits, length, minLength, maxLength, enumeration, whiteSpace, pattern
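
To show how a couple of these restrictions look in practice, here is a small sketch in the same style as the student example. The elements ‘grade’ and ‘program’ and their allowed values are made up for illustration.

```xml
<!-- Hypothetical element: an integer limited to the range 1 to 5 -->
<element name="grade">
  <simpleType>
    <restriction base="integer">
      <minInclusive value="1"/>
      <maxInclusive value="5"/>
    </restriction>
  </simpleType>
</element>

<!-- Hypothetical element: only the listed values are accepted -->
<element name="program">
  <simpleType>
    <restriction base="string">
      <enumeration value="bachelor"/>
      <enumeration value="master"/>
    </restriction>
  </simpleType>
</element>
```

Which restrictions are available depends on the base type: the range restrictions apply to ordered types such as numbers and dates, while the length restrictions apply to strings and similar types.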

XSDs can easily become very complicated, but when they do, please consider following the KISS rule and simplify them. You might understand an XSD that you have created yourself, no matter how complicated it is, but a colleague might not. This is the reason I have not touched the subjects of namespaces and imports, which often overcomplicate things.

Hope you found this small introduction to the world of XSDs useful.

Validate elements in any order and any number of times using XSD

Sometimes you just want to validate a set of elements in any order and any number of times. There is no intuitive way in XSD to accomplish this, so we have to use a trick to get it to work. The solution looks a little like this:

<xs:element name="myElement">
  <xs:complexType>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:choice>
        <xs:element name="myId" type="xs:int" />
        <xs:element name="myName" type="xs:string" />
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>

This lets us validate any of the elements listed in the <choice> (myId and myName) any number of times and in any order, so the following will validate:

<myElement>
  <myId>3</myId>
  <myName>Niklas</myName>
</myElement>

and:

<myElement>
  <myName>Niklas</myName>
  <myId>3</myId> 
</myElement>

and:

<myElement>
  <myId>3</myId>
  <myName>Niklas</myName>
  <myName>Anders</myName>
</myElement>

but not:

<myElement>
  <myId>hello</myId>
  <myName>Niklas</myName>
</myElement>

The last one does not validate, since ‘hello’ is not an integer.

Tested with xmllint with libxml version 20708
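
As far as I know, the same effect can be had without the surrounding <sequence> by putting the occurrence attributes directly on the <choice>. The variant below is a sketch of that and has not been run against the xmllint version mentioned above.

```xml
<xs:element name="myElement">
  <xs:complexType>
    <!-- Each repetition picks one of the two elements, zero or more times -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="myId" type="xs:int"/>
      <xs:element name="myName" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

Either form accepts an empty myElement as well, because minOccurs is 0.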

Using the xmlns:xml namespace when validating with xsd

The xml prefix is built into XML and does not normally have to be declared, but sometimes you need to deal with it explicitly when working with XSD. The secret here is that the xml prefix can only be bound to one particular URI (“http://www.w3.org/XML/1998/namespace”). Here is an example of an XSD validation using the xml namespace:
XML to validate

<?xml version="1.0"?>
<books xmlns="http://www.my.com"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/XML/1998/namespace xml.xsd
                           http://www.my.com my.xsd"
       xmlns:xml="http://www.w3.org/XML/1998/namespace">

    <book xml:author="Astrid Lindgren"/>
</books>

To validate the xml:author attribute we use the schema definition below (xml.xsd):

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/XML/1998/namespace"
           elementFormDefault="qualified">

    <xs:attribute name="author" type="xs:string"/>

</xs:schema>

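Notice the “targetNamespace”: it has to be the xml namespace URI, so that the ‘author’ attribute is declared in that namespace and matches xml:author in the XML.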
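
The XML above also points to a second schema, my.xsd, which is not shown here. The following is only a guess at what such a file could look like, assuming that ‘books’ holds one or more ‘book’ elements and nothing else. The key parts are the import of the xml namespace and the ref to xml:author.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.my.com"
           elementFormDefault="qualified">

  <!-- Pull in the attribute declared in xml.xsd -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="xml.xsd"/>

  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <!-- Assumption: one or more book elements -->
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <!-- Refer to the attribute from the xml namespace instead of redeclaring it -->
            <xs:attribute ref="xml:author"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The xml prefix does not need its own xmlns:xml declaration in this file, since it is always bound to the xml namespace URI.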