Message145894
> We shouldn't use the MS codec if we have our own, as they may differ.
OK, I agree. The MS codec has a nice replacement behaviour (it searches for a
similar glyph): for example, cp1252 encodes Ł (U+0141) to b'L'. Our codec raises
a UnicodeEncodeError on u'\u0141'.encode('cp1252').
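A minimal sketch of the behaviour of our codec, assuming Python 3 and using only the built-in cp1252 codec. The helper name encode_cp1252 is hypothetical, and the best-fit mapping to b'L' reported above for the MS codec is not reproduced here:

```python
def encode_cp1252(text, errors="strict"):
    """Encode with Python's built-in cp1252 codec.

    Returns None when the codec raises UnicodeEncodeError
    (hypothetical helper, for illustration only).
    """
    try:
        return text.encode("cp1252", errors)
    except UnicodeEncodeError:
        return None


# U+0141 (Ł) has no mapping in Python's cp1252 table, so 'strict' fails.
strict_result = encode_cp1252("\u0141")
# The 'replace' handler substitutes '?', it does not look for a similar glyph.
replace_result = encode_cp1252("\u0141", "replace")
# A character that cp1252 does contain encodes normally.
mapped_result = encode_cp1252("\u00e9")

print(strict_result, replace_result, mapped_result)
```

With the built-in codec, the closest thing to the MS behaviour is the 'replace' handler, which always emits '?' rather than a look-alike character.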
> As for the 65001 bug: is that actually solved by this codec?
Sorry, which bug?
See the tests using CP_UTF8 in test_codecs. The behaviour on surrogates depends
on the Windows version. Before Windows Vista, surrogates were always encoded,
whereas from Vista on you can choose the behaviour using the Python error
handler:
if self.vista_or_later():
    tests.append(('\udc80', 'strict', None)) # None=UnicodeEncodeError
    tests.append(('\udc80', 'ignore', b''))
    tests.append(('\udc80', 'replace', b'?'))
else:
    tests.append(('\udc80', 'strict', b'\xed\xb2\x80'))
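For comparison, the same lone surrogate can be run through Python's built-in utf-8 codec, which works on any platform. This is only a sketch under that assumption: it does not exercise the Windows CP_UTF8 code page, and the helper name encode_utf8 is hypothetical. The 'surrogatepass' handler yields the bytes that the pre-Vista branch expects:

```python
def encode_utf8(text, errors):
    """Encode with Python's built-in utf-8 codec.

    Returns None when the codec raises UnicodeEncodeError,
    mirroring the None convention used in the test tuples above
    (hypothetical helper, for illustration only).
    """
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


lone_surrogate = "\udc80"

# One entry per error handler, keyed by handler name.
results = {
    handler: encode_utf8(lone_surrogate, handler)
    for handler in ("strict", "ignore", "replace", "surrogatepass")
}

for handler, encoded in results.items():
    print(handler, encoded)
```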
Date | User | Action | Args
2011-10-19 08:15:56 | vstinner | set | recipients: + vstinner, loewis, amaury.forgeotdarc
2011-10-19 08:15:55 | vstinner | link | issue13216 messages
2011-10-19 08:15:55 | vstinner | create |