At this moment Unicode is an implementation detail rather than a language feature.
I must say it: not supporting Unicode (or: explicitly planning not to support it) is a BAD thing. You will hardly notice it if you come from the US. It may get a bit tricky if you come from the UK, as you may want to use the pound or euro symbol. If you come from a diacritics-rich language, then you may start feeling stupid. Prepare to serve yourself and your users messages like:
"Sarra, thas cammanacata has baan adaptad ta tha fana pragrammang langaaga wa ara asang." ("Sorry, this communicate has been adapted to the fine programming language we are using."---it is not that hard guess after all, ain't it?)
No, PG, please don't be that guy.
Python's Unicode support sucked badly at the beginning, but they kept improving it. Right now it is kinda acceptable (though I regularly spend time debugging Unicode errors; you'd think by now I'd be used to it), and in Py3k it's hopefully going to be made right. Ruby's Unicode support still sucks, and that is basically why I don't use it (even though I like its semantics a lot, as it is more lispy than Python). Not being able to divide a word from your own language into three-character substrings (Unicode characters can take more than one byte) is plainly ridiculous... even at the prototype level.
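To make that concrete, here is a minimal Python 3 sketch (the language and the word "naïveté" are chosen purely for illustration, not anything from Arc): splitting the raw UTF-8 bytes into three-byte pieces cuts a character in half, while splitting the decoded string gives the three-character substrings you actually wanted.

    word = "naïveté"                       # 7 characters, 9 bytes in UTF-8
    raw = word.encode("utf-8")

    # Splitting the raw bytes into three-byte pieces cuts the 'ï' in half:
    byte_chunks = [raw[i:i+3] for i in range(0, len(raw), 3)]
    print(byte_chunks)                     # [b'na\xc3', b'\xafve', b't\xc3\xa9']
    # byte_chunks[0].decode("utf-8")       # would raise UnicodeDecodeError

    # Splitting the decoded string works on characters, as intended:
    char_chunks = [word[i:i+3] for i in range(0, len(word), 3)]
    print(char_chunks)                     # ['naï', 'vet', 'é']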
Of course, no one says it has to be done right now. But I'd like to know it is in the plans.
Ruby's Unicode support is acceptable in 1.8, and good in 1.9. I'm not asking for the world; I just want strings to be able to contain text in any encoding, and to be able to split a string into chars, given an encoding.
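For what "given an encoding" buys you, a small Python 3 sketch (the byte values are arbitrary): the same bytes split into different characters depending on which encoding you decode them with.

    data = bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])   # the same five raw bytes...

    print(list(data.decode("utf-8")))      # ['c', 'a', 'f', 'é']       -- 4 chars
    print(list(data.decode("latin-1")))    # ['c', 'a', 'f', 'Ã', '©']  -- 5 chars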
I would like to use numbers in various encodings, like reversed ones (big-endian on little-endian machines and vice versa). I also want the language to natively support all these number encodings and to be able to add two numbers, given their encodings.
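The sarcasm maps onto something real, though: byte order is to integers roughly what encodings are to text, and in both cases you decode once at the boundary and then work with the plain value. A rough Python sketch of the analogy (the bytes are arbitrary):

    import struct

    payload = b"\x00\x00\x01\x00"              # the same four bytes...
    big    = struct.unpack(">i", payload)[0]   # ...read as big-endian:    256
    little = struct.unpack("<i", payload)[0]   # ...read as little-endian: 65536

    # Once decoded to plain ints, arithmetic just works; nobody asks for an
    # "add two numbers, given their endianness" operator in the language.
    print(big, little, big + little)           # 256 65536 65792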
I imagine it only officially supports ASCII because it will be migrated away from MzScheme eventually.
Note: Those "u"s are supposed to have umlauts, but that's apparently normalized away somewhere. The point is, u with an umlaut is treated as a single character by the current implementation.
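Roughly what "normalized away" can mean, shown in a small Python 3 sketch (Python used purely for illustration): the precomposed and decomposed forms of u-with-umlaut are different code-point sequences, and a naive ASCII-only step that drops non-ASCII code points strips the mark from the decomposed form.

    import unicodedata

    u_umlaut = "\u00fc"                            # 'ü' as one precomposed code point
    nfc = unicodedata.normalize("NFC", u_umlaut)   # 1 code point: U+00FC
    nfd = unicodedata.normalize("NFD", u_umlaut)   # 2 code points: 'u' + U+0308

    print(len(nfc), len(nfd))                      # 1 2
    print(nfd.encode("ascii", "ignore"))           # b'u' -- the umlaut is gone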
I'm sure it'll be agnostic, if by "agnostic" you mean that it just reads in strings as a sequence of bytes. It would be easier to do that than to check for non-ASCII characters and handle them specially.
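That kind of agnosticism does go a surprisingly long way, as a quick Python 3 sketch suggests (the Chinese sample text is arbitrary): byte-oriented pass-through round-trips non-ASCII data untouched, and it only falls short once you ask character-level questions.

    data = "第一章".encode("utf-8")     # arbitrary non-ASCII input, kept as raw bytes

    # An "agnostic" runtime can shuffle these bytes around with no special cases:
    out = b"<p>" + data + b"</p>"       # concatenation, I/O and storage round-trip fine
    assert data in out

    # It only breaks down when you ask character-level questions:
    print(len(data))                    # 9 (bytes)
    print(len(data.decode("utf-8")))    # 3 (characters)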
The real question is: how many people will repeat the ASCII business without checking, when the thing supports UTF-8 just fine out of the box? (I'm working on a project in Chinese right now.)
I read his announcement and I completely disagree. Strings are pretty basic, and getting them right is part of the work of a language designer. They're more important than macros. Not getting strings right can cripple a language.
And to call not supporting Unicode "offensive" is missing the point. Only supporting ASCII makes the language less powerful. It means you can't use Arc for solving problems involving text manipulation in languages other than English. That's a big space. Only supporting UTF-8 would make more sense.
And for all the Java bashing nowadays, Java got Strings right, and Perl, Python, PHP and Ruby didn't.
Java's Unicode support has historically been a mess too. They assumed that 16 bits would always and forever be enough for any code point. This was only "fixed" in 2004 and the warts are still there.
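The 16-bit assumption is easy to see from any language; here is a small Python 3 sketch (the character, a musical G clef outside the Basic Multilingual Plane, is chosen arbitrarily): one code point needs two 16-bit code units, which is exactly the surrogate-pair wart a char-is-16-bits string type carries around.

    ch = "\U0001D11E"                        # MUSICAL SYMBOL G CLEF, outside the BMP

    print(len(ch))                           # 1 code point
    print(len(ch.encode("utf-16-be")) // 2)  # 2 sixteen-bit code units (a surrogate pair)

    # A string type built on the 16-bits-per-character assumption reports length 2
    # for this single character, and indexing can land in the middle of the pair.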
I suppose the lesson to take away is that just about every single language has messed up character sets. So it can't be a fatal mistake, but it certainly isn't one that makes any sense to repeat.
I can't believe he said that. At this stage no one really expected Arc to have any sort of UTF/Unicode/I18N support. He should have kept that to himself, and then the users would have built the libraries on top of Arc. Well, I guess time will tell. I'll keep an eye on it.