
Author vstinner
Recipients larry, loewis, serhiy.storchaka, vstinner
Date 2014-02-09.13:18:24
Message-id <1391951905.18.0.272262636483.issue20574@psf.upfronthosting.co.za>
In-reply-to
Content
(Follow-up to issues #20538 and #20571.) The attached patch implements incremental decoders for multibyte code pages (on Windows), especially for CP_UTF8, known as "cp65001" in Python.

Code pages 932, 936, 949, 950 and 1361 have had an incremental decoder since this changeset:
---
changeset:   38817:549c547700af
branch:      legacy-trunk
user:        Martin v. Löwis <martin@v.loewis.de>
date:        Wed Jun 14 05:21:04 2006 +0000
files:       Doc/api/concrete.tex Include/unicodeobject.h Lib/encodings/mbcs.py Misc/NEWS Modules/_codecsmodule.c Objects/unicodeobject.c
description:
Patch #1455898: Incremental mode for "mbcs" codec.
---

Python currently uses IsDBCSLeadByteEx():
http://msdn.microsoft.com/en-us/library/windows/desktop/dd318667%28v=vs.85%29.aspx

And CharPrevA():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms647471%28v=vs.85%29.aspx

But IsDBCSLeadByteEx() only supports code pages 932, 936, 949, 950 and 1361.
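
To illustrate why a DBCS lead-byte test does not carry over to UTF-8, here is a minimal pure-Python sketch that finds an unfinished UTF-8 sequence at the end of a buffer. The helper name incomplete_tail_length is hypothetical, it does no validation of the bytes, and it is not necessarily how the attached patch works; it only shows that the decoder may have to look back up to three bytes rather than one.

```python
def incomplete_tail_length(data: bytes) -> int:
    """Return how many trailing bytes belong to an unfinished UTF-8 sequence.

    Hypothetical helper for illustration only; it does not validate the input.
    """
    # A UTF-8 sequence is at most 4 bytes long, so an unfinished one
    # can occupy at most the last 3 bytes of the buffer.
    for back in range(1, min(3, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if byte >= 0xF0:
            needed = 4
        elif byte >= 0xE0:
            needed = 3
        elif byte >= 0xC0:
            needed = 2
        else:
            needed = 1  # ASCII byte, always complete
        return back if needed > back else 0
    return 0


euro = "€".encode("utf-8")  # three bytes: e2 82 ac
tail_of_partial = incomplete_tail_length(euro[:2])
tail_of_complete = incomplete_tail_length(euro)
```

An incremental decoder built this way would keep the last tail_of_partial bytes in its buffer and decode only the rest.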

Python has supported code page 65001 (codec "cp65001") since Python 3.3. New tests for incremental decoders were added in Python 3.4; I added a skip for cp65001 because it was not supported (#20571). This issue implements the incremental decoder and therefore removes the skip.
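
As a rough illustration of the behaviour an incremental decoder is expected to have, the sketch below feeds a byte string split in the middle of its characters. It uses the "utf-8" codec as a portable stand-in for "cp65001" (an assumption made so the example runs on any platform); the chunk boundaries are chosen only for demonstration.

```python
import codecs

# Assumption: "utf-8" stands in for "cp65001" so this runs outside Windows.
decoder = codecs.getincrementaldecoder("utf-8")()

data = "é€".encode("utf-8")  # 2-byte character followed by a 3-byte one
# Split inside both characters so no chunk ends on a character boundary
# except the last one.
chunks = [data[:1], data[1:3], data[3:]]

pieces = [decoder.decode(chunk) for chunk in chunks]
# final=True tells the decoder that no more input follows.
pieces.append(decoder.decode(b"", final=True))
result = "".join(pieces)
```

The first call returns an empty string because the decoder only holds half a character; the buffered byte is emitted as part of the next call.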

I prefer to wait for Python 3.5 rather than rush to add this new feature after 3.4 beta 3. cp65001 is mostly used for output (sys.stdout/sys.stderr) on Windows, not for input.
History
Date User Action Args
2014-02-09 13:18:25 vstinner set recipients: + vstinner, loewis, larry, serhiy.storchaka
2014-02-09 13:18:25 vstinner set messageid: <1391951905.18.0.272262636483.issue20574@psf.upfronthosting.co.za>
2014-02-09 13:18:25 vstinner link issue20574 messages
2014-02-09 13:18:24 vstinner create