April 14, 2007

UTF-8 Conversion Tricks

Filed under: cplusplus,Optimization,Programming — floodyberry @ 3:04 am
UTF-8 is a wonderfully simple encoding format with some very nice properties, but the juggling required to convert it to UTF-16 and UTF-32 can be a little tricky and is fairly easy to do poorly. This is further compounded by the various error conditions you must watch for, such as overlong encodings, reserved ranges, surrogate markers, incomplete sequences, and so on.
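
To make those error conditions concrete, here is a minimal, unoptimized sketch of a checked single-code-point decoder. The name `decode_utf8` and its signature are hypothetical, made up for illustration, and it is meant as a plain baseline rather than a fast implementation. It rejects incomplete sequences, bad continuation bytes, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration: decodes one code point from
// s[0..len) into cp and stores the number of bytes consumed in used.
// Returns false on malformed input (cp and used are then unspecified).
bool decode_utf8(const unsigned char* s, std::size_t len,
                 std::uint32_t& cp, std::size_t& used) {
    if (len == 0) return false;

    const unsigned char b0 = s[0];
    std::size_t n = 0;         // total sequence length in bytes
    std::uint32_t min_cp = 0;  // smallest code point that needs n bytes

    if (b0 < 0x80) { cp = b0; used = 1; return true; }  // ASCII fast path
    else if ((b0 & 0xE0) == 0xC0) { n = 2; min_cp = 0x80;    cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; min_cp = 0x800;   cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; min_cp = 0x10000; cp = b0 & 0x07; }
    else return false;  // stray continuation byte or invalid lead byte

    if (len < n) return false;  // incomplete sequence

    for (std::size_t i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80) return false;  // not a continuation byte
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min_cp) return false;                   // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range

    used = n;
    return true;
}
```

Every check here costs a branch per code point, which is exactly the kind of overhead worth trimming in a real converter.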

Here are a couple of tricks you can employ to keep the conversion fast and robust.


