XSD, a quick introduction

I often get asked to explain how XSDs work, and every time I explain it in a different way. I thought that if I documented one way here, I could refer to it for the rest of my life :)
Below is an example XSD that I will talk about. This is by no means a complete tutorial, just a simple look into the world of XSD.

XSD EXAMPLE

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">

<element name="student">
  <complexType>
    <sequence>
      <element name="name" type="string"/>
      <element name="pnr">
        <simpleType>
          <restriction base="string">
            <pattern value="[0-9]{2}[0-1][0-9][0-9][0-9][0-9a-z][0-9]{3}"/>
          </restriction>
        </simpleType>
      </element>
      <element name="address" type="string"/>
      <element name="zip" type="integer"/>
      <element name="city">
        <simpleType>
          <restriction base="string">
            <maxLength value="82"/>
            <minLength value="3"/>
          </restriction>
        </simpleType>
      </element>
    </sequence>
  </complexType>
</element>

</schema>

The root tag of the XSD itself is ‘schema’, and directly inside it we declare the element ‘student’. This gives us the name of the “highest” (root) element in the XML. It also gives us the names of the tags below the root element: name, pnr, address, zip and city. Since they are listed in a ‘sequence’, they must all appear, in exactly this order. Let us look at each tag separately:
‘name’ – This tag has to obey the rules of the built-in type string. This pretty much includes any text of any length (even empty).
‘pnr’ – For this tag we cannot use a built-in XSD type, so we have to create our own. To do this we set a pattern value. This is a regular expression, and in this case it demands that ‘pnr’ starts with 2 digits, followed by a binary digit (0 or 1), 3 more digits and 1 character that is either a digit or a lowercase letter from a to z. It ends with 3 digits between 0 and 9. The pattern has to match the whole value, so a ‘pnr’ is always exactly 10 characters long.
‘address’ – This element also has to obey the rules of the built-in string type.
‘zip’ – This element has to obey the integer type, which is more stringent than string: only whole numbers are accepted.
‘city’ – This element is a string that has to obey the two built-in restrictions minLength (3 characters) and maxLength (82 characters).
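To get a feel for the ‘pnr’ pattern, it can be tried out directly in Java. This is only a small sketch: the class name PnrPatternDemo and the method isValidPnr are made up for this illustration, and it relies on the fact that this particular pattern uses syntax that XSD and java.util.regex have in common (that is not true for every XSD pattern).

```java
import java.util.regex.Pattern;

public class PnrPatternDemo {

    // Same pattern as in the XSD. An XSD pattern always has to match the
    // whole value, which is what Matcher.matches() checks in Java.
    private static final Pattern PNR =
            Pattern.compile("[0-9]{2}[0-1][0-9][0-9][0-9][0-9a-z][0-9]{3}");

    // Hypothetical helper, not part of any library
    public static boolean isValidPnr(String value) {
        return PNR.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPnr("1214567890"));  // true
        System.out.println(isValidPnr("121456-7890")); // false, the dash is not allowed
        System.out.println(isValidPnr("1224567890"));  // false, third character must be 0 or 1
    }
}
```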

A few words about simpleType and complexType
When we need to create our own restrictions, we encapsulate them in a simpleType or a complexType. A simpleType describes a single text value (no child elements and no attributes), like ‘pnr’ and ‘city’ in the example. A complexType describes an element that contains other elements (or attributes), which is how whole structures are described. In the example, the ‘student’ structure is encapsulated in a complexType.

Matching XML example

<?xml version="1.0"?>

<student>
  <name>Niklas</name>
  <pnr>1214567890</pnr>
  <address>Kungsgatan 1</address>
  <zip>12345</zip>
  <city>Gothenburg</city>
</student>

Non matching XML example

<?xml version="1.0"?>

<student>
  <name>Niklas</name>
  <pnr>121456-7890</pnr>
  <address>Kungsgatan 1</address>
  <zip>123 45</zip>
  <city>Gothenburg</city>
</student>

It’s often more interesting to talk about non-matching examples, since they give us a more in-depth look at the problems you could encounter while creating your XSD. Here the ‘pnr’ element contains a dash (‘-’), which is not defined in our pattern. Our pattern only allows digits and, in one position, a lowercase letter. The ‘zip’ element is also invalid, since we demand a value of type ‘integer’. The whitespace between ‘3’ and ‘4’ means the value is not an integer.
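If you want to check this yourself, the validation API that ships with Java (javax.xml.validation) can be used. The sketch below is one possible way to do it, not the only one: the class name StudentXsdDemo and the helpers student and isValid are invented for the example, and the XSD from above is simply embedded as a string.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class StudentXsdDemo {

    // The example XSD, embedded as a string to keep the sketch self-contained
    static final String XSD =
          "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
        + "<element name=\"student\"><complexType><sequence>"
        + "<element name=\"name\" type=\"string\"/>"
        + "<element name=\"pnr\"><simpleType><restriction base=\"string\">"
        + "<pattern value=\"[0-9]{2}[0-1][0-9][0-9][0-9][0-9a-z][0-9]{3}\"/>"
        + "</restriction></simpleType></element>"
        + "<element name=\"address\" type=\"string\"/>"
        + "<element name=\"zip\" type=\"integer\"/>"
        + "<element name=\"city\"><simpleType><restriction base=\"string\">"
        + "<maxLength value=\"82\"/><minLength value=\"3\"/>"
        + "</restriction></simpleType></element>"
        + "</sequence></complexType></element>"
        + "</schema>";

    // Hypothetical helper: builds a student document with the given pnr and zip
    public static String student(String pnr, String zip) {
        return "<student><name>Niklas</name><pnr>" + pnr + "</pnr>"
             + "<address>Kungsgatan 1</address><zip>" + zip + "</zip>"
             + "<city>Gothenburg</city></student>";
    }

    // Returns true if the XML is valid according to the XSD above
    public static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
            // Without a custom ErrorHandler the first validation error is thrown
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Not valid: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(student("1214567890", "12345")));  // true
        System.out.println(isValid(student("121456-7890", "12345"))); // false (pnr)
        System.out.println(isValid(student("1214567890", "123 45"))); // false (zip)
    }
}
```

The error message printed for the invalid documents comes from the XML parser and usually names the element and the rule that was broken, which is a good help when debugging an XSD.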

To give you a feel for the types and restrictions that are built in, I have listed a bunch of them below:
Built in types:
decimal, float, double, integer, positiveInteger, negativeInteger, nonPositiveInteger, nonNegativeInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte, dateTime, date, gYearMonth, gYear, duration, gMonthDay, gDay, gMonth, string, normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREFS, ENTITY, ENTITIES, QName, boolean, hexBinary, base64Binary, anyURI, NOTATION

Built in restrictions:
minExclusive, minInclusive, maxExclusive, maxInclusive, totalDigits, fractionDigits, length, minLength, maxLength, enumeration, whiteSpace, pattern

XSDs can easily become very complicated, but when they do, please consider following the KISS rule and simplify them. You might understand an XSD that you have created yourself, no matter how complicated it is, but a colleague might not. This is the reason I have not touched the subjects of namespaces and imports, which often overcomplicate things.

Hope you found this small introduction to the world of XSDs useful.

My HTMLEncode function in Java

Every now and then I have to work in an environment where importing frameworks is prohibited, and it is in times like that I have had to create my own HTMLEncode function. Here is the result:

public static String HTMLEncode(String inputString) {

    // Nothing to do if the string contains none of the special characters (<>"&)
    if (inputString.indexOf('<') == -1 &&
        inputString.indexOf('>') == -1 &&
        inputString.indexOf('"') == -1 &&
        inputString.indexOf('&') == -1) {
        return inputString;
    }

    // Entities that are left untouched so that already encoded text is not encoded twice
    String[] knownEntities = {"&amp;", "&#38;", "&lt;", "&#60;", "&gt;", "&#62;", "&quot;", "&#34;"};

    StringBuilder out = new StringBuilder();
    for (int i = 0; i < inputString.length(); i++) {
        char c = inputString.charAt(i);

        if (c == '"' || c == '<' || c == '>') {
            // Encode as a numeric character reference, e.g. < becomes &#60;
            out.append("&#").append((int) c).append(';');
        }
        else if (c == '&') {
            // Is the &-sign the start of one of the known HTML entities?
            boolean startsEntity = false;
            for (String entity : knownEntities) {
                if (inputString.startsWith(entity, i)) {
                    startsEntity = true;
                    break;
                }
            }
            if (startsEntity) {
                out.append(c);
            }
            else {
                out.append("&#").append((int) c).append(';');
            }
        }
        else {
            out.append(c);
        }
    }
    return out.toString();
}

NOTE! An & that already starts one of the entities listed in the code is left as it is, so text that has already been encoded is not encoded a second time. Every other & is encoded.

Charset problem in Play framework after upgrading OSX

This is such a strange problem that I just have to write it down for future reference.
Involved systems:
* One Ubuntu 14.04.2 LTS on AWS (Amazon Web Services)
* One MacBook Pro 2010 with 10.7.5 with iTerm 2.1.1
* One MacBook Pro 2015 with 10.10.3 with iTerm 2.1.1
* One Play framework 2.3 application

Problem description:
After starting the Play application from the MacBook running 10.10.3, all files that were written to disk had every non-ASCII character shown as ??. When starting the Play application from the old computer (10.7.5), these characters were displayed correctly.

After quite a lot of trial-and-error I found that the command ‘locale’ on the remote AWS server complained about:
“locale: Cannot set LC_CTYPE to default locale: No such file or directory”
“locale: Cannot set LC_ALL to default locale: No such file or directory”
when using the newer computer, but not with the old one.

Solution:
The ‘locale’ command error led me to the following solution in iTerm:
Untick the Terminal option “Set locale variables automatically” in Preferences

AFAIK this option is on by default in iTerm.

After this was done, the ‘locale’ error was gone and all files had the correct charset.
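The ?? symptom is typical of text being encoded with a charset that cannot represent the characters. I have not verified that this is exactly what happened on the server, so treat the following as an illustration under that assumption: if the JVM ends up with an ASCII default charset because the locale cannot be resolved, every non-ASCII character is replaced with ‘?’ when it is written. The class name CharsetDemo and the method roundTrip are made up for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Encodes the text with the given charset and decodes it again,
    // which is roughly what happens when a file is written and read back
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u00f6 is the letter o with two dots, written as an escape so the
        // example does not depend on the encoding of the source file
        String text = "G\u00f6teborg";

        // The charset the JVM picked up from its environment
        System.out.println("Default charset: " + Charset.defaultCharset());

        // ASCII cannot represent the character, so it becomes '?'
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // G?teborg

        // With an explicit UTF-8 charset the text survives unchanged
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Passing an explicit charset such as UTF-8 whenever text is written to disk makes the application independent of whatever locale the terminal happens to send along.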