Message104894
> Your name will end up being partially escaped as surrogate:
>
> 'L\udcf6wis'
>
> Further processing will fail
That depends on the further processing, no?
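To make the quoted example concrete, here is a minimal Python 3 sketch. It assumes the name arrives with the Latin-1 byte 0xF6 for "ö" while the program decodes with UTF-8; that byte string is an illustrative assumption, not taken from the report. The surrogateescape handler turns the undecodable byte into U+DCF6. Whether later processing "fails" depends on what that processing does: encoding back with the same codec and handler recovers the original bytes exactly.

```python
# Assumption: "Löwis" stored as Latin-1 bytes, but decoded as UTF-8.
raw = b"L\xf6wis"

# 0xF6 is not a valid UTF-8 start byte, so surrogateescape maps it
# to the lone surrogate U+DCF6 instead of raising.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'L\udcf6wis'

# Encoding with the same codec and error handler is lossless.
roundtrip = name.encode("utf-8", "surrogateescape")
print(roundtrip == raw)  # True
```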
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in range(256)
Where did you get this error from?
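The report does not say which call produced this error. One call that reproduces the message exactly is sketched below: encoding the escaped string to Latin-1 with the default strict handler. The same encode succeeds once the surrogateescape handler is passed.

```python
name = "L\udcf6wis"  # the surrogate-escaped string from the quoted example

try:
    # Default errors="strict": the lone surrogate has no Latin-1 byte.
    name.encode("latin-1")
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)

# surrogateescape maps U+DCF6 back to the single byte 0xF6.
restored = name.encode("latin-1", "surrogateescape")
print(restored)  # b'L\xf6wis'
```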
> It doesn't work if an application tries to work *with* the data,
> e.g. tries to convert it
Converting it to what?
> parse it
Parsing will work fine.
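A small illustration of why parsing is unaffected, using a hypothetical path string. Splitting and regular-expression matching operate on code points, so they treat the lone surrogate like any other character.

```python
import re

# Hypothetical input: a path whose directory name contains an escaped byte.
path = "/home/L\udcf6wis/notes.txt"

# Plain str methods work on the escaped code point without complaint.
parts = path.split("/")
print(parts)  # ['', 'home', 'L\udcf6wis', 'notes.txt']

# So does the re module: [^/]+ happily matches the surrogate.
m = re.match(r"/home/([^/]+)/(\w+)\.txt$", path)
print(m.group(1) == "L\udcf6wis", m.group(2))  # True notes
```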
> decode it
It's a string. You shouldn't decode it.
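Because the value is already a str, the remaining step at an output boundary is encoding, not decoding. The sketch below assumes the data is written through a text stream configured with the UTF-8 codec and the surrogateescape handler. An in-memory io.BytesIO stands in for a real file.

```python
import io

name = "L\udcf6wis"

# In Python 3, only bytes objects have .decode(); a str is already text.
print(hasattr(name, "decode"))  # False

buf = io.BytesIO()
# The text layer encodes on write, using the given codec and error handler.
with io.TextIOWrapper(buf, encoding="utf-8", errors="surrogateescape") as out:
    out.write(name)
    out.flush()              # push encoded bytes into buf before it is closed
    data = buf.getvalue()    # capture before the with-block closes buf
print(data)  # b'L\xf6wis'
```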
> The reason is
> that information included by the use of the 'surrogateescape'
> error handler is lost along the way and this then causes data
> corruption.
And how would that not happen if it were bytes? The problems you describe
were one of the primary motivations to switch to Unicode: it's *byte*
strings that have these problems.
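The point about byte strings can be illustrated with two hypothetical encodings of the same name. Bytes do not record their encoding, so a consumer that guesses wrong gets either silent mojibake or a decode error.

```python
latin1_bytes = "L\u00f6wis".encode("latin-1")  # "Löwis" -> b'L\xf6wis'
utf8_bytes = "L\u00f6wis".encode("utf-8")      # "Löwis" -> b'L\xc3\xb6wis'

# Wrong guess, silent corruption: UTF-8 bytes read as Latin-1 give mojibake.
misread = utf8_bytes.decode("latin-1")
print(ascii(misread))  # 'L\xc3\xb6wis'

# Wrong guess, hard failure: these Latin-1 bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True
```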

Date | User | Action | Args
2010-05-03 22:11:23 | loewis | set | recipients: + loewis, lemburg, gregory.p.smith, pitrou, vstinner, ezio.melotti, Arfrever
2010-05-03 22:11:20 | loewis | link | issue8603 messages
2010-05-03 22:11:20 | loewis | create |