
Author reingart
Recipients ezio.melotti, reingart
Date 2012-10-28.04:06:09
Message-id <1351397171.27.0.117750740634.issue16343@psf.upfronthosting.co.za>
In-reply-to
Content
While working on an internationalization proposal <http://python.org.ar/pyar/TracebackInternationalizationProposal>,
I got stuck on #9769: multi-byte encodings (like UTF-8) are not supported by PyUnicode_FromFormatV().

Apart from my proposal, I think UTF-8 should be supported for consistency with the other Unicode functions, like PyUnicode_FromString() or even unicode_fromformat_arg().
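
To illustrate the consistency argument, here is a minimal sketch that calls PyUnicode_FromString() from Python through ctypes and shows that it decodes its argument as UTF-8. It assumes a CPython interpreter (ctypes.pythonapi is CPython-specific); the variable names are only illustrative and are not part of the attached patch.

```python
import ctypes

# PyUnicode_FromString(const char *) decodes a NUL-terminated UTF-8 buffer.
# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
from_string = ctypes.pythonapi.PyUnicode_FromString
from_string.argtypes = [ctypes.c_char_p]
from_string.restype = ctypes.py_object

utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9', 5 bytes
decoded = from_string(utf8_bytes)    # multi-byte sequence becomes one character

print(len(utf8_bytes), len(decoded), decoded == "café")
```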

Attached is a patch that:
- enhances the format string iterator to detect multi-byte sequences, with sanity checks on start and continuation bytes
- replaces unicode_write_cstr with PyUnicode_DecodeUTF8Stateful
- adds tests
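
The kind of start/continuation byte check described in the first item can be sketched in Python as follows. This is only an illustration of the general technique, not the code from the attached C patch; the function names are hypothetical, and the check is structural only (overlong forms and surrogates are left for the real decoder to reject).

```python
def utf8_sequence_length(lead):
    """Return the sequence length implied by a start byte, or 0 if invalid."""
    if lead < 0x80:
        return 1                      # plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0                          # continuation byte or invalid start byte


def iter_utf8_sequences(data):
    """Yield each UTF-8 sequence in data as bytes, checking its structure."""
    i = 0
    while i < len(data):
        length = utf8_sequence_length(data[i])
        if length == 0:
            raise ValueError("invalid start byte 0x%02x at offset %d" % (data[i], i))
        chunk = data[i:i + length]
        if len(chunk) < length:
            raise ValueError("truncated sequence at offset %d" % i)
        # Every byte after the start byte must look like 10xxxxxx.
        for byte in chunk[1:]:
            if byte & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte 0x%02x" % byte)
        yield chunk
        i += length


sequences = list(iter_utf8_sequences("añ€".encode("utf-8")))
print(sequences)
```

An iterator like this lets the caller copy a whole multi-byte sequence at once instead of treating each byte as a separate character.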

I hope it helps. This is my first patch for CPython and my C skills are a bit rusty, so please excuse any newbie mistakes.
History
Date                 User      Action  Args
2012-10-28 04:06:11  reingart  set     recipients: + reingart, ezio.melotti
2012-10-28 04:06:11  reingart  set     messageid: <1351397171.27.0.117750740634.issue16343@psf.upfronthosting.co.za>
2012-10-28 04:06:11  reingart  link    issue16343 messages
2012-10-28 04:06:10  reingart  create