Yves, we are thinking about a general API for encoding detection that could initially just check for BOM/Unicode signatures. I believe we already have a feature request for this. Mark and I just brainstormed about what we might want such an API to look like.
The reason for doing what ICU is doing currently is simple pragmatism. None of our converters auto-detects anything, and they write only what you tell them to write.
When you deal with serialized data structures and fields in files or databases, that is exactly what you want.
With signature-carrying files and transmission protocols, more work is necessary.
It seems to me that a converter API, which has to be able to take one byte at a time and has no way to pass additional information ("I know the language of the text..."), is not the best place to implement this.
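To illustrate, a standalone signature check that lives outside any converter might look roughly like the sketch below. It is written in Python only to keep it short; the function name detect_unicode_signature and its return shape are assumptions made for this example, not an existing ICU API. It uses the BOM constants from Python's standard codecs module.

```python
import codecs

# Signature table, longest first: the UTF-32LE signature (FF FE 00 00)
# begins with the UTF-16LE signature (FF FE), so order matters.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def detect_unicode_signature(data: bytes):
    """Hypothetical helper: return (encoding_name, signature_length).

    Returns (None, 0) when the buffer does not start with a known
    Unicode signature. Only the leading bytes are inspected.
    """
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would look at the start of the stream once, pick a converter from the returned name, and skip the reported number of signature bytes before converting. Note that FF FE 00 00 is inherently ambiguous (it could also be UTF-16LE text starting with U+0000); this sketch simply prefers UTF-32LE.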
On output, writing a BOM/signature is easy: if you know you need one, then just call the converter once with U+FEFF.
That said, for this one feature I could imagine an API along the lines of "emit a Unicode signature if you are a converter for a Unicode encoding".
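As a rough sketch of that idea, again in Python rather than against the ICU converter API: the helper below returns the signature bytes for a Unicode encoding by converting U+FEFF once, and returns nothing for any other encoding. The name emit_signature and the set of encodings treated as "Unicode" are assumptions for this example.

```python
import codecs

# Encodings treated as Unicode encodings in this sketch (an assumption).
# Python's plain "utf-16"/"utf-32" codecs already write a BOM themselves,
# so only the explicit-endianness forms and UTF-8 are listed.
_UNICODE_ENCODINGS = {
    "utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be",
}


def emit_signature(encoding: str) -> bytes:
    """Hypothetical helper: signature bytes for a Unicode encoding, else b""."""
    name = codecs.lookup(encoding).name  # normalized codec name
    if name in _UNICODE_ENCODINGS:
        # "Call the converter once with U+FEFF."
        return "\ufeff".encode(name)
    return b""
```

For example, emit_signature("UTF-8") yields the bytes EF BB BF, while emit_signature("latin-1") yields an empty byte string, so a caller can prepend the result unconditionally.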
markus