Perl Brokenness...
2004-02-27 10:55 in /tech/perl
Perl is supposed to recognize the UTF-16 encoding, using a byte order mark (BOM) to determine the endianness. Unfortunately, if you try to read in a file with this encoding, it expects to find a BOM at the beginning of every block it reads in, and bails out on you when it doesn't.
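One possible workaround, sketched here on the assumption that the whole file fits comfortably in memory: skip the encoding layer, slurp the raw bytes, and hand them to Encode's decode('UTF-16', ...) in a single call, so the BOM is only examined once. The helper name slurp_utf16 and the sample data are made up for illustration, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: read every byte from a filehandle, then decode the
# whole buffer at once. Encode's 'UTF-16' (no LE/BE suffix) looks at the BOM
# a single time to pick the byte order and drops it from the result.
sub slurp_utf16 {
    my ($fh) = @_;
    binmode $fh, ':raw';   # make sure no encoding layer touches the bytes
    local $/;              # undef record separator: read everything
    my $bytes = <$fh>;
    return decode('UTF-16', $bytes);
}

# Invented sample: a little-endian BOM (FF FE) followed by UTF-16LE text.
my $sample = "\xFF\xFE" . encode('UTF-16LE', "first line\nsecond line\n");
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $text = slurp_utf16($fh);
close $fh;

# Decoding no longer happens per read, so split into lines afterwards.
my @lines = split /^/m, $text;
print scalar(@lines), " lines\n";
```

The trade-off is memory: the file is held twice (bytes and characters) while decoding, which is fine for modest files but not for huge ones.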
Well, fine, so most of the time you actually "know" the endianness, so just use, say, UTF-16LE instead. Except that now you have that BOM at the start of your first line. Maybe you figure this is no problem, since you were already doing some validation/cleanup on the data and had a s/^\s+// in there. Except that Perl doesn't include the byte order mark (aka "ZERO WIDTH NO-BREAK SPACE", U+FEFF) in its whitespace class, so now you have to do something ugly like s/^[\s\x{FEFF}]+// instead. Bleech!
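A small sketch of the UTF-16LE route, assuming little-endian input that starts with a BOM (the sample bytes are invented). It shows the BOM arriving as the character U+FEFF on the first line, and the widened character class removing it along with ordinary leading whitespace.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Invented sample: little-endian BOM (FF FE), then UTF-16LE text whose first
# line also has ordinary leading spaces.
my $sample = "\xFF\xFE" . encode('UTF-16LE', "  hello\nworld\n");

# Reading through an explicit UTF-16LE layer: the BOM is NOT consumed, so it
# shows up as U+FEFF at the start of the first line.
open my $fh, '<:encoding(UTF-16LE)', \$sample or die "Cannot open: $!";

my @lines;
while (defined(my $line = <$fh>)) {
    # \s does not match U+FEFF, so s/^\s+// alone would leave it in place;
    # adding \x{FEFF} to the class strips it along with normal whitespace.
    $line =~ s/^[\s\x{FEFF}]+//;
    push @lines, $line;
}
close $fh;

print for @lines;
```

If you would rather leave other leading whitespace alone, a narrower variant is to strip only the BOM, e.g. s/^\x{FEFF}// applied to the first line.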