Message81407
My first indication that "macintosh" should be used rather than
"mac_roman" came from Wikipedia, http://en.wikipedia.org/wiki/Mac_OS_Roman,
which states that the charset part of a MIME content-type specification
should be macintosh. I'm not quoting this as any kind of authority, but
rather to point out that people are likely to use this name.
I did a comparison of http://tools.ietf.org/rfc/rfc1345.txt (RFC) and
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT (UNI)
using the attached Perl script. The results:
3 codepoints unused in RFC but defined in UNI: f0, f6, f7
1 codepoint unused in UNI but defined in RFC: 7f
2 codepoints with slightly different character names, same meaning
9 codepoints with actually different definitions:
a5: rfc 2219 BULLET OPERATOR
    uni 2022 BULLET
c4: rfc e023 DUTCH GUILDER SIGN (IBM437 159)
    uni 0192 LATIN SMALL LETTER F WITH HOOK
c6: rfc 0394 GREEK CAPITAL LETTER DELTA
    uni 2206 INCREMENT
c9: rfc 22ef MIDLINE HORIZONTAL ELLIPSIS
    uni 2026 HORIZONTAL ELLIPSIS
d0: rfc 2014 EM DASH
    uni 2013 EN DASH
d1: rfc 2013 EN DASH
    uni 2014 EM DASH
d7: rfc 25c6 BLACK DIAMOND
    uni 25ca LOZENGE
db: rfc 00a4 CURRENCY SIGN
    uni 20ac EURO SIGN
f8: rfc 203e OVERLINE
    uni 00af MACRON
a5 and c6 could be different interpretations of symbols that look pretty
much the same. The introduction of the euro sign instead of the generic
currency sign seems to be a recent modification documented in UNI. The
change of the order of the dashes seems really confusing.
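For anyone who would rather reproduce this kind of comparison without Perl, here is a minimal Python sketch of the idea. It is not the attached script: the function names are made up for illustration, and the sample tables contain only the nine differing entries listed above plus 0x41 (ASCII "A", added here as an example of an entry both tables agree on). Parsing the two source files is left out.

```python
def compare_mappings(rfc, uni):
    """Compare two byte -> code point tables.

    Returns (bytes only in rfc, bytes only in uni,
             {byte: (rfc code point, uni code point)} for differing entries).
    """
    only_rfc = sorted(set(rfc) - set(uni))
    only_uni = sorted(set(uni) - set(rfc))
    differing = {
        byte: (rfc[byte], uni[byte])
        for byte in sorted(set(rfc) & set(uni))
        if rfc[byte] != uni[byte]
    }
    return only_rfc, only_uni, differing


def find_swapped_pairs(differing):
    """Find byte pairs whose definitions are exchanged between the tables."""
    keys = sorted(differing)
    return [
        (a, b)
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
        # (rfc_a, uni_a) == (uni_b, rfc_b) means a and b are swapped
        if differing[a] == differing[b][::-1]
    ]


# Sample data: values copied from the list above; 0x41 is an added
# example of an entry that is identical in both tables.
RFC_SAMPLE = {
    0x41: 0x0041, 0xA5: 0x2219, 0xC4: 0xE023, 0xC6: 0x0394, 0xC9: 0x22EF,
    0xD0: 0x2014, 0xD1: 0x2013, 0xD7: 0x25C6, 0xDB: 0x00A4, 0xF8: 0x203E,
}
UNI_SAMPLE = {
    0x41: 0x0041, 0xA5: 0x2022, 0xC4: 0x0192, 0xC6: 0x2206, 0xC9: 0x2026,
    0xD0: 0x2013, 0xD1: 0x2014, 0xD7: 0x25CA, 0xDB: 0x20AC, 0xF8: 0x00AF,
}

if __name__ == "__main__":
    only_rfc, only_uni, differing = compare_mappings(RFC_SAMPLE, UNI_SAMPLE)
    for byte, (r, u) in differing.items():
        print("%02x: rfc %04x / uni %04x" % (byte, r, u))
    for a, b in find_swapped_pairs(differing):
        print("swapped: %02x <-> %02x" % (a, b))
```

On the sample data the swap check singles out d0 and d1, which is the dash confusion mentioned above.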
Notice also this line in the RFC:
&rem source: The Unicode Standard ver1.0, ISBN 0-201-56788-1, Oct 1991
So it looks like the RFC used the Unicode definition as its source. Which
part of it I'm not sure, and where the differences come from I'm even less sure.
My next steps:
* Look for further references, e.g. from Apple, and compare them as well
* Try some things out on a Mac, see how it behaves in real life
* Compare all this to the current Python implementation
* Write a patch to either provide an alias or a new charset "macintosh"
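As a starting point for the last two steps, something along these lines could be used. It is only a rough sketch: the helper names are hypothetical, the table is copied from the nine differences listed above, and it assumes Python 3 with the standard "mac_roman" codec available. It decodes each disputed byte, reports whether the result matches the RFC or the UNI definition, and checks whether a charset name resolves at all.

```python
import codecs
import unicodedata

# byte: (RFC 1345 code point, Apple ROMAN.TXT code point),
# copied from the list of differing definitions above.
DIFFERENCES = {
    0xA5: (0x2219, 0x2022), 0xC4: (0xE023, 0x0192), 0xC6: (0x0394, 0x2206),
    0xC9: (0x22EF, 0x2026), 0xD0: (0x2014, 0x2013), 0xD1: (0x2013, 0x2014),
    0xD7: (0x25C6, 0x25CA), 0xDB: (0x00A4, 0x20AC), 0xF8: (0x203E, 0x00AF),
}


def classify_codec(codec_name="mac_roman"):
    """Decode each disputed byte and say which table the codec agrees with."""
    result = {}
    for byte, (rfc, uni) in DIFFERENCES.items():
        actual = ord(bytes([byte]).decode(codec_name))
        if actual == uni:
            source = "uni"
        elif actual == rfc:
            source = "rfc"
        else:
            source = "other"
        result[byte] = (actual, source)
    return result


def resolve_charset(name):
    """Return the canonical codec name for a charset label, or None."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


if __name__ == "__main__":
    for byte, (cp, source) in sorted(classify_codec().items()):
        name = unicodedata.name(chr(cp), "<unnamed>")
        print("%02x: %04x %s (%s)" % (byte, cp, name, source))
    for label in ("mac_roman", "macintosh"):
        print(label, "->", resolve_charset(label))
```

If resolve_charset("macintosh") returns None on a given Python version, that is the case an alias or a separate codec would have to cover.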
Help welcome.

Date | User | Action | Args
2009-02-08 18:56:06 | gagern | set | recipients: + gagern, lemburg, zenzen, yenzenz
2009-02-08 18:56:05 | gagern | set | messageid: <1234119365.89.0.970525337917.issue843590@psf.upfronthosting.co.za>
2009-02-08 18:56:05 | gagern | link | issue843590 messages
2009-02-08 18:56:03 | gagern | create |