Tuesday, May 5, 2009

Adventures in static typing

The computer language I use most when programming is Java. It is not a language I love, but it gets the job done. Sometimes, though, I wish it would allow me to express more within a static type system.

I have been using JAXB in order to do data binding from an XML schema to Java. The generated Java code is not bad, but sometimes, using it is awkward. For example, the XML schema fragment


<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="k"/>
<xs:element ref="r"/>
<xs:element ref="a"/>
</xs:choice>


gets compiled into the Java fragment


/**
* Gets the value of the ksAndRS property.
*
* <p>
* This accessor method returns a reference to the live list,
* not a snapshot. Therefore any modification you make to the
* returned list will be present inside the JAXB object.
* This is why there is not a <CODE>set</CODE> method for the ksAndRS property.
*
* <p>
* For example, to add a new item, do as follows:
* <pre>
* getKSAndRS().add(newItem);
* </pre>
*
*
* <p>
* Objects of the following type(s) are allowed in
* the list
* {@link K }
* {@link A }
* {@link R }
*
*
*/
public List<Object> getKSAndRS() {
if (ksAndRS == null) {
ksAndRS = new ArrayList<Object>();
}
return this.ksAndErrorsAndRS;
}


In other words, all the possible items are stuffed into a List<Object>. This is annoying because the client of the code must be written to perform a runtime check, for each item in the list, of what type the item actually has. For example:


for (Object item : u.getKSAndRS()) {
System.println(doItem(item));
}


where there are definitions


String doItem() throws UnexpectedTypeException {
if (item instanceof K) {
return doK((K) item);
}
else if (item instanceof A) {
return doA((A) item);
}
else if (item instanceof R) {
return doR((R) item);
}
else {
throw new UnexpectedTypeException("u list",
item);
}
}

String doK(K item) {
return "saw K";
}

String doR(R item) {
return "saw R";
}

String doA(A item) {
return "saw K";
}


Ugly stuff. I have to look at the generated Javadoc comment to know what types to look for and handle, and perform these casts. You might wonder why I need the final case, in which I throw an exception for a situation that "cannot" happen. Well, apart from it being defensive programming, it is critical for maintenance purposes. As it turns out, I change the XML schema fairly often, and therefore regenerate the Java code often. Suppose I added another possibility to the schema, e.g:


<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="k"/>
<xs:element ref="r"/>
<xs:element ref="a"/>
<xs:element ref="newthing"/>
</xs:choice>


If I didn't use defensive programming, I could totally forget (being a human being) to update the client code to add a test for the newthing type introduced, and the program would silently do nothing for data that uses the new case.

Here's where a better static type system comes in. I think polymorphic variants, as found in the programming language Objective Caml (OCaml), are the perfect solution, especially since many of these XML elements can occur in many different contexts in different lists. For example, consider the following OCaml code:


type itemType = [`K | `R | `A]

(* Dummy implementation. *)
let getKSAndRS : unit -> itemType list = function
() -> [`K; `R; `K; `A; `R]

let doItem : itemType -> string = function
| `K -> "saw K"
| `R -> "saw R"
| `A -> "saw A"

let _ = List.iter print_endline
(List.map doItem (getKSAndRS ()))


The output is


saw K
saw R
saw K
saw A
saw R


We immediately see one advantage of using this type system: the compiler complains if somehow we forget to treat a case, e.g.,


(* Missing case for `A *)
let doItem : itemType -> string = function
| `K -> "saw K"
| `R -> "saw R"


leads to


File "poly.ml", line 8, characters 4-6:
Error: This pattern matches values of type [< `K | `R ]
but is here used to match values of type itemType
The first variant type does not allow tag(s) `A


Another advantage is the advantage over using just ordinary variant types (also known as algebraic datatypes or sum types), as found in core ML and Haskell and other similar languages. The advantage is that these labels can be used in more than one type, e.g.,


type anotherItemType = [`A | `C | `Z]

(* Dummy implementation. *)
let getCSAndZS : unit -> anotherItemType list = function
() -> [`Z; `A; `C; `A; `Z]

let doAnotherItem : anotherItemType -> string = function
| `C -> "saw C"
| `Z -> "saw Z"
| `A -> "saw A"

let _ = List.iter print_endline
(List.map doAnotherItem (getCSAndZS ()))


can be added in the same module in the same program without `A in one context conflicting with that in the other one.

So not only do we have static type safety, but also, for maintenance purposes, if the definitions of the types change, the compiler will warn us if we either are still handling a case that no longer exists or have not written new code to handle a case that has been added. Unfortunately, it is not practical for me to use languages such as OCaml, for a whole set of reasons, but maybe one day the world will be ready for them. Meanwhile, some people in industry, such as Jane Street Capital, do make use of OCaml, even devoting a blog to their experiences with it. They use features such as polymorphic variants, as described in this paper.

No comments:

Post a Comment