
Author pitrou
Recipients Arfrever, ezio.melotti, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2014-08-25.18:55:02
Message-id <1408992902.52.0.0500199216556.issue18814@psf.upfronthosting.co.za>
In-reply-to
Content
>    data.encode('utf-8', 'replace').decode('utf-8')
>    data.encode('utf-8', 'ignore').decode('utf-8')

Why not the reverse:

os.fsencode(data).decode('utf-8', 'replace')
os.fsencode(data).decode('utf-8', 'ignore')
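To illustrate the difference between the two directions, here is a minimal sketch. It assumes a UTF-8 filesystem encoding on POSIX, where os.fsencode(data) is equivalent to data.encode('utf-8', 'surrogateescape'); the explicit form is used so the snippet behaves the same on any platform. The sample bytes and variable names are made up for illustration.

```python
# Hypothetical file name containing one byte (0xff) that is not valid UTF-8.
raw = b"caf\xff.txt"

# How such a name is typically exposed as str: the bad byte becomes a lone
# surrogate (U+DCFF) via the surrogateescape error handler.
data = raw.decode("utf-8", "surrogateescape")

# Quoted approach: apply the error handler while *encoding*.
via_encode_replace = data.encode("utf-8", "replace").decode("utf-8")
via_encode_ignore = data.encode("utf-8", "ignore").decode("utf-8")

# Reverse approach: recover the original bytes first, then apply the error
# handler while *decoding*. (Assumption: this matches os.fsencode(data) when
# the filesystem encoding is UTF-8 with surrogateescape.)
original_bytes = data.encode("utf-8", "surrogateescape")
via_decode_replace = original_bytes.decode("utf-8", "replace")
via_decode_ignore = original_bytes.decode("utf-8", "ignore")

print(ascii(via_encode_replace))  # 'caf?.txt'
print(ascii(via_decode_replace))  # 'caf\ufffd.txt'
print(ascii(via_encode_ignore), ascii(via_decode_ignore))  # 'caf.txt' 'caf.txt'
```

With 'ignore' both directions give the same result; with 'replace' the encoding direction substitutes '?', while the decoding direction substitutes U+FFFD REPLACEMENT CHARACTER.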

Note that "backslashreplace" needs to be enhanced to work when decoding too.
Note that "xmlcharrefreplace" doesn't make sense here: it encodes a *character* reference, but you're precisely trying to represent something that cannot be interpreted as a character.

(AFAIK, XML can't represent non-text data, except in NDATA sequences)
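
For reference, a small sketch of how the two handlers behave when decoding. It assumes Python 3.5 or later, where "backslashreplace" is accepted by decode(); on the versions current when this message was written, that call fails. The sample bytes and variable names are hypothetical.

```python
raw = b"caf\xff.txt"  # hypothetical bytes with one byte that is not valid UTF-8

# Python 3.5+ (assumption): the undecodable byte is shown as a \xNN escape.
escaped = raw.decode("utf-8", "backslashreplace")

# "xmlcharrefreplace" is only defined for encoding: it turns an unencodable
# *character* into a numeric character reference.
char_ref = "caf\u00e9".encode("ascii", "xmlcharrefreplace")

# Using it while decoding is rejected, since there is no character to refer to.
try:
    raw.decode("utf-8", "xmlcharrefreplace")
    xml_decode_rejected = False
except TypeError:
    xml_decode_rejected = True

print(escaped)              # caf\xff.txt
print(char_ref)             # b'caf&#233;'
print(xml_decode_rejected)  # True
```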
History
Date                 User    Action  Args
2014-08-25 18:55:02  pitrou  set     recipients: + pitrou, ncoghlan, vstinner, ezio.melotti, Arfrever, r.david.murray, serhiy.storchaka
2014-08-25 18:55:02  pitrou  set     messageid: <1408992902.52.0.0500199216556.issue18814@psf.upfronthosting.co.za>
2014-08-25 18:55:02  pitrou  link    issue18814 messages
2014-08-25 18:55:02  pitrou  create