Tuesday, July 29, 2008

Number Format Hell

I am in localised format hell at the moment.

I am currently re-working Apache Cocoon CForms.
My current task is a validating currency field that can properly display and edit any currency in any locale.

CForms (started long before browsers became smart with big Ajax libraries) allows you to use Convertor classes to map in either direction between an Object in the business logic and a localised String to be edited in a form.

Number Convertors are based on the java.text.NumberFormat class.

Let's say that the Model (Bean, XML fragment, etc.) I am editing contains a value that represents a currency amount in Pounds Sterling as a java.math.BigDecimal. I'd set up my Convertor like this:
<fd:field id="dieselprice" required="true">
  <fd:label>Price for a liter diesel:</fd:label>
  <fd:datatype base="decimal">
    <fd:convertor variant="currency" currency="GBP"/>
  </fd:datatype>
</fd:field>

When my number, e.g. "1000.00", goes into the Convertor, it is localised to the format for the viewer's locale, so in the UK someone should see "£1,000.00", in France "1 000,00 UK£", etc. (PS. Diesel will cost £1000 per litre one day, you'll see ....)

So far, so good. The user gets a number in a format they recognise, the edited number is returned to the server and the Convertor performs the reverse operation, to store it back to the java.math.BigDecimal.
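As a rough sketch of that round trip (this is not the CForms Convertor code; the class and method names are made up for illustration, and setParseBigDecimal needs Java 5 or later), the two directions look something like this with plain java.text classes:

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Currency;
import java.util.Locale;

public class CurrencyRoundTrip {

    // Hypothetical helper: a currency format for the viewer's locale,
    // switched to the currency the model value is held in.
    static DecimalFormat formatFor(Locale locale, String currencyCode) {
        DecimalFormat format = (DecimalFormat) NumberFormat.getCurrencyInstance(locale);
        format.setCurrency(Currency.getInstance(currencyCode));
        format.setParseBigDecimal(true); // parse() then yields BigDecimal, not Double
        return format;
    }

    // Model value -> localised String shown in the form.
    static String toDisplay(BigDecimal value, Locale locale, String currencyCode) {
        return formatFor(locale, currencyCode).format(value);
    }

    // Localised String posted back -> model value (null if it cannot be parsed).
    static BigDecimal fromDisplay(String text, Locale locale, String currencyCode) {
        return (BigDecimal) formatFor(locale, currencyCode).parse(text, new ParsePosition(0));
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("1000.00");
        for (Locale locale : new Locale[] { Locale.UK, Locale.FRANCE }) {
            String shown = toDisplay(price, locale, "GBP");
            BigDecimal back = fromDisplay(shown, locale, "GBP");
            System.out.println(locale + ": " + shown + " -> " + back);
        }
    }
}
```

What gets printed for France depends on the JDK's own locale data, which is exactly where the trouble described below comes from.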

Now the fun starts ........

I am changing CForms on the client side to use all Dojo editors. In the case of currency, I use the dijit.form.CurrencyTextBox. It is a cool editor. While the editor is not focussed, it shows the formatted version "1 000,00 UK£" (fr_FR), but as soon as you click to edit it, it gives you the simple version "1000,00", where it is harder to make editing mistakes, and gives you validation feedback while you are typing.

So here comes the first impedance mismatch .... Dojo does not expect the server to format and localise the number as a String. It expects the server to send the value in the same format as a JavaScript primitive, "1000.00". Dojo then uses its vast library of localisation formatting rules to convert the primitive to a formatted String suitable for the locale of the user, allows that to be edited and posts the new primitive back to the server.

So, I can hack Dojo, or I can hack CForms. I hacked Dojo, because CForms has the right behaviour: it should still work when someone has JavaScript turned off. They will see a simpler form, but with properly localised values in it.

So I extended dijit.form.CurrencyTextBox (as cocoon.forms.CurrencyField) to allow it to send and receive formatted strings instead of number primitives (as text).

Boom Boom, job done! (so I thought).

The first hint of trouble appeared when I was testing numbers represented as percentages. They would display fine in some locales but not in others. You see, Dojo needs to be able to interpret the format to be able to validate it, and it turned out that Java and Dojo use different formats for percentages in fr_FR and de_DE (France and Germany).

While Java formats them as "#,##0%" (123%), Dojo formats them as "#,##0 %" (123 %).
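To see which percent pattern a particular JDK uses, something like the following sketch can help. The class name is made up, and the output is deliberately not shown here because it depends on the JDK version and vendor:

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

public class PercentPatterns {

    // Returns the (non-localised) pattern this JDK uses for percentages
    // in the given locale.
    static String percentPattern(Locale locale) {
        NumberFormat format = NumberFormat.getPercentInstance(locale);
        // The factory normally returns a DecimalFormat, but that is not guaranteed.
        return (format instanceof DecimalFormat)
                ? ((DecimalFormat) format).toPattern()
                : "(not a DecimalFormat)";
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] { Locale.UK, Locale.FRANCE, Locale.GERMANY }) {
            String sample = NumberFormat.getPercentInstance(locale).format(1.23);
            System.out.println(locale + "  pattern=" + percentPattern(locale)
                    + "  sample=" + sample);
        }
    }
}
```

Comparing these patterns with the percentFormat entries in Dojo's cldr bundles for the same locales shows where the two sides disagree.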

I am thinking WTF! These are supposed to be international standards! Where's this stuff coming from?

On the Java side, the closest hint I could find was that java.text.DecimalFormat is copyrighted by Taligent (IBM). It is also possible that IBM wrote the number formatting in Dojo, as I believe they contributed all of the internationalisation classes to Dojo, but specifically, Dojo's data source for compiling its lookup tables is Unicode.org. I have yet to ascertain where Java sources its data, but there are clearly problems. I read that JDK7 will package the currency bundles in a way that does not require a whole JDK update when currencies change.

So there is an annoyingly large number of very detailed differences between localised currency formats in Dojo and Java. Simple differences, like an individual locale's currency symbol, can be worked around via Dojo APIs. Deeper problems, like group and decimal separators not matching for Arabic and some far-eastern countries, are proving more wasteful of my time.
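Separator mismatches are hard to spot by eye, because several of the characters involved look identical (a space and a no-break space, for instance). A small sketch that prints the separators as code points makes them comparable; the class name and the choice of locales are only for illustration:

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SeparatorDump {

    // Describes the grouping and decimal separators of a locale as Unicode
    // code points, so that look-alike characters can be told apart.
    static String describe(Locale locale) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(locale);
        return String.format(Locale.ROOT, "%s group=U+%04X decimal=U+%04X",
                locale,
                (int) symbols.getGroupingSeparator(),
                (int) symbols.getDecimalSeparator());
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.JAPAN,
            Locale.forLanguageTag("ar-SA") // forLanguageTag needs Java 7+
        };
        for (Locale locale : locales) {
            System.out.println(describe(locale));
        }
    }
}
```

The same values can then be checked against the "group" and "decimal" entries that Dojo loads for each locale.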

I am currently using Java 1.4.2, will upgrading change the situation? Make it better? Make it worse? My JDK is supplied and maintained by Apple. Will moving to another OS make it better or worse?

It's an abysmal situation!

One tip for the would-be user of java.util.Currency and java.text.DecimalFormat: if you call DecimalFormat.setCurrency, make sure you also copy over the currency's number of decimal fraction digits. DecimalFormat does not update them itself, resulting in a bad format for currencies that have no decimals (Japanese Yen etc.).

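import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Currency;

// 'locale' is the viewer's java.util.Locale
Currency currency = Currency.getInstance("GBP");
int digits = currency.getDefaultFractionDigits();
DecimalFormat format = (DecimalFormat) NumberFormat.getCurrencyInstance(locale);
format.setCurrency(currency);
// setCurrency leaves the fraction digits alone, so copy them over by hand
format.setMinimumFractionDigits(digits);
format.setMaximumFractionDigits(digits);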

I did not find this in any of the tutorials online, but it is in the JavaDocs.
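The pitfall is easiest to see with Yen in a locale whose own currency has two decimals. A minimal, self-contained sketch (the class name and the syncDigits switch are made up for the demonstration):

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class YenFractionDigits {

    // Currency format for the locale, switched to the given currency.
    // With syncDigits == false this reproduces the pitfall: the fraction
    // digits of the locale's own currency are kept.
    static DecimalFormat currencyFormat(Locale locale, String code, boolean syncDigits) {
        Currency currency = Currency.getInstance(code);
        DecimalFormat format = (DecimalFormat) NumberFormat.getCurrencyInstance(locale);
        format.setCurrency(currency);
        if (syncDigits) {
            int digits = currency.getDefaultFractionDigits();
            format.setMinimumFractionDigits(digits);
            format.setMaximumFractionDigits(digits);
        }
        return format;
    }

    public static void main(String[] args) {
        DecimalFormat naive = currencyFormat(Locale.UK, "JPY", false);
        DecimalFormat fixed = currencyFormat(Locale.UK, "JPY", true);
        // The naive format still carries the two decimals of Pounds Sterling.
        System.out.println("naive: " + naive.format(1000)
                + " (max fraction digits " + naive.getMaximumFractionDigits() + ")");
        System.out.println("fixed: " + fixed.format(1000)
                + " (max fraction digits " + fixed.getMaximumFractionDigits() + ")");
    }
}
```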

3 comments:

Jerm said...

During the day and on into the evening, I have identified more types of inconsistencies and implemented more hacks to work around them, and I keep finding more .......

This is a losing game I think.

The best idea I have had for a real solution (so far) is to implement a pipeline in Cocoon to output the dojo/cldr/currency (etc.) URIs not from the files Dojo provides, but generating the content from Java NumberFormat classes directly!!

Get it from the horse's mouth as they say .....

What do you think?

Jerm said...

Ha!
Or another option I suppose, is to have the Convertor generate a java.text.DecimalFormatSymbols Object for the locale, based on the same cldr XML that Dojo uses.
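A minimal sketch of that second option, assuming the symbols and pattern have already been read out of the cldr XML (the parsing is not shown; the class name, method and the hand-typed values in main are stand-ins, not real CLDR lookups):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SymbolsFromCldr {

    // Builds a DecimalFormat from symbols supplied by the caller instead of
    // from the JDK's own locale data. In a real Convertor the arguments
    // would come from the cldr XML for the viewer's locale.
    static DecimalFormat build(String pattern, char decimal, char group, char percent) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(decimal);
        symbols.setGroupingSeparator(group);
        symbols.setPercent(percent);
        return new DecimalFormat(pattern, symbols);
    }

    public static void main(String[] args) {
        // Hand-typed stand-ins for values that would be read from CLDR.
        DecimalFormat percent = build("#,##0 %", ',', '.', '%');
        System.out.println(percent.format(1.23));        // 123 %

        DecimalFormat decimal = build("#,##0.00", ',', '.', '%');
        System.out.println(decimal.format(1234567.891)); // 1.234.567,89
    }
}
```

Currency patterns would need the monetary separators and the currency symbol set in the same way, which this sketch leaves out.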

Jerm said...

I know, it's getting daft ..... Here's another one ..... ((Jeez! Apache is so adaptive! ))

You can use Ant build scripts provided in the Dojo release to build the files that provide the dojo/cldr URIs, for the locales you are interested in (or all of them).

The build script (downloads Saxon!! then) runs some XSLT to transform cldr XML files from Unicode.org into the JSON files that Dojo loads as bundles.

Ant runs in Java.

It could maybe build the cldr files from all of Java's supported Locales, by reading the java.text.DecimalFormatSymbols from each Locale supported by DecimalFormat?
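Very roughly, the Java half of that idea might start out like the sketch below. It only collects the data; writing it out as bundles is not shown. The key names are my guess at the sort of keys the Dojo number bundles use and would need checking against the real files:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class JavaNumberData {

    // Collects what this JDK believes about number formatting in one locale.
    // The key names are assumptions made for this sketch.
    static Map<String, String> describe(Locale locale) {
        Map<String, String> data = new TreeMap<String, String>();
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(locale);
        data.put("decimal", String.valueOf(symbols.getDecimalSeparator()));
        data.put("group", String.valueOf(symbols.getGroupingSeparator()));
        data.put("percentSign", String.valueOf(symbols.getPercent()));

        NumberFormat percent = NumberFormat.getPercentInstance(locale);
        if (percent instanceof DecimalFormat) {
            data.put("percentFormat", ((DecimalFormat) percent).toPattern());
        }
        NumberFormat currency = NumberFormat.getCurrencyInstance(locale);
        if (currency instanceof DecimalFormat) {
            data.put("currencyFormat", ((DecimalFormat) currency).toPattern());
        }
        return data;
    }

    public static void main(String[] args) {
        // Every locale this JDK claims to have number formats for.
        for (Locale locale : NumberFormat.getAvailableLocales()) {
            System.out.println(locale + " " + describe(locale));
        }
    }
}
```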

Ha Ha Ha !!

Any more for any more?