This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: test_codecs currently failing on several Windows buildbots
Type: behavior Stage: needs patch
Components: Tests, Unicode, Windows Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, doerwalter, ezio.melotti, georg.brandl, larry, lemburg, loewis, pitrou, python-dev, serhiy.storchaka, vstinner
Priority: release blocker Keywords:

Created on 2014-02-09 09:16 by larry, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (11)
msg210732 - Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-09 09:16
The Windows buildbots are currently broken due to a codec issue.  I populated the "nosy" list based on the "unicode" experts from the Experts Index.


http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/4040


test_streamreaderwriter (test.test_codecs.WithStmtTest) ... test test_codecs failed
ok

======================================================================
ERROR: test_readline (test.test_codecs.CP65001Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 157, in test_readline
    self.assertEqual(readalllines("".join(vw), True), "|".join(vw))
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 136, in readalllines
    line = reader.readline(size=size, keepends=keepends)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 548, in readline
    data = self.read(readsize, firstline=True)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 494, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'CP_UTF8' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.
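The traceback ends in StreamReader.read(), which calls self.decode(data, self.errors) and expects a (decoded_text, bytes_consumed) pair. When a codec supports partial input, it decodes only the complete characters and reports a smaller byte count, so read() can keep the leftover bytes for the next call. The sketch below shows that contract using Python's cross-platform utf-8 codec (codecs.utf_8_decode) as a stand-in, because the CP_UTF8 codec in the traceback (code page 65001) exists only on Windows. The sample string is an arbitrary choice.

```python
import codecs

# U+20AC EURO SIGN encodes to three bytes in UTF-8: e2 82 ac.
data = "a\u20ac".encode("utf-8")

# Chunk 1 stops in the middle of the euro sign. With final=False the
# decoder returns only the complete prefix plus the number of bytes it
# consumed; the caller keeps the unconsumed bytes for the next call.
text1, consumed1 = codecs.utf_8_decode(data[:2], "strict", False)

# Chunk 2 starts with the leftover byte and adds the rest of the input.
text2, consumed2 = codecs.utf_8_decode(data[consumed1:], "strict", False)

print(ascii(text1), consumed1)  # 'a' 1
print(ascii(text2), consumed2)  # '\u20ac' 3

# With final=True the same truncated chunk is an error, which is roughly
# what a decoder without partial-input support does with every chunk.
try:
    codecs.utf_8_decode(data[:2], "strict", True)
    final_error = None
except UnicodeDecodeError as exc:
    final_error = exc.reason
print("final=True:", final_error)
```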
msg210733 - Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-09 09:20
Note that this appears to be in Windows-specific code ("CP_UTF8"), rather than being cross-platform code which happens to only fail on Windows.  So we need someone who does both Windows and Unicode.
msg210741 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-09 10:23
It looks related to changeset e988661e458c5402c0236cd1084a8671249a760d:
Issue #20538: UTF-7 incremental decoder produced inconsistant string when
input was truncated in BASE64 section.
msg210743 - Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-09 10:29
Serhiy said on IRC that he doesn't have a Windows development environment, so he doesn't think he can help.
msg210744 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-09 10:34
The UTF-7 decoder is not related to this test.

The test_readline test had been broken since it was written, and part of it did nothing. Fixing it in issue20520 exposed new bugs: issue20538 and this one. This bug stayed hidden until issue20538 was fixed.

Note that there is no test_partial in CP65001Test. Perhaps that is related.

The simplest solution would be to temporarily mark test_readline in CP65001Test as an expected failure:

    test_readline = unittest.expectedFailure(ReadTest.test_readline)
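A minimal sketch of the expected-failure approach, using hypothetical stand-ins for the real ReadTest mixin and codec test classes (the placeholder assertion is not the real readline test). One caveat worth checking before applying the one-liner above: in current Python versions, unittest.expectedFailure() sets a flag on the function object it receives and returns that same object, so wrapping the inherited ReadTest.test_readline directly would also mark it for the other classes that inherit it. Overriding with a fresh method, as below, confines the marker to CP65001Test.

```python
import unittest


class ReadTest:
    """Hypothetical stand-in for the mixin shared by the codec test classes."""
    encoding = None

    def test_readline(self):
        # Placeholder check standing in for the real readline round-trip.
        self.assertEqual(self.encoding, "utf-8")


class UTF8Test(ReadTest, unittest.TestCase):
    encoding = "utf-8"


class CP65001Test(ReadTest, unittest.TestCase):
    encoding = "cp65001"

    # A new function object is decorated here, so only this class's
    # test_readline is flagged; UTF8Test keeps the unmarked original.
    @unittest.expectedFailure
    def test_readline(self):
        super().test_readline()
```

Run with `python -m unittest` on the file containing it: CP65001Test.test_readline is reported as an expected failure while UTF8Test.test_readline still passes normally.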
msg210747 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-09 10:53
The test tries to decode a partial UTF-8 byte string. The problem is that codecs.code_page_decode() doesn't fully implement partial decoding: it only supports it for a few code pages (932, 936, 949, 950, and 1361). Partial decoding currently relies on IsDBCSLeadByteEx():
http://msdn.microsoft.com/en-us/library/windows/desktop/dd318667%28v=vs.85%29.aspx
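For the double-byte code pages listed, a decoder based on IsDBCSLeadByteEx() can hold back a trailing lead byte until its trail byte arrives. Python's own cp932 codec (one of the cross-platform CJK codecs, not the Windows code_page codec discussed here) shows that behavior through its incremental decoder. The sample character is an arbitrary choice.

```python
import codecs

# U+3042 HIRAGANA LETTER A is the two-byte sequence 82 A0 in cp932.
data = "\u3042".encode("cp932")

decoder = codecs.getincrementaldecoder("cp932")()

# 0x82 is a lead byte: alone it is not a complete character, so the
# incremental decoder buffers it and returns an empty string.
first = decoder.decode(data[:1])
# The trail byte arrives and completes the character.
second = decoder.decode(data[1:])
# Nothing is left pending when the stream ends.
last = decoder.decode(b"", final=True)

print(data, ascii(first), ascii(second), ascii(last))  # b'\x82\xa0' '' '\u3042' ''
```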

It may be possible to enhance the decoders, but this is not a regression from Python 3.3, so it can be done in Python 3.5.
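For cp65001, one possible enhancement would use UTF-8's own structure instead of a lead-byte table: find an incomplete multi-byte sequence at the end of the buffer, decode only the complete prefix, and report how many bytes were consumed. This is a hypothetical sketch: utf8_incomplete_tail() and decode_partial() are illustrative names, not CPython functions, and this is not necessarily how the feature was eventually implemented. Python's built-in utf-8 codec stands in for the Windows whole-buffer conversion, and codecs.utf_8_decode() serves as a reference for the expected (text, consumed) result.

```python
import codecs


def utf8_incomplete_tail(data: bytes) -> int:
    """Return how many trailing bytes of `data` begin a UTF-8 sequence that
    is still incomplete (0 if the buffer ends on a character boundary).

    Only the length is computed; rejecting malformed bytes is left to the
    actual decoder.
    """
    end = len(data)
    i = end - 1
    # Step back over at most three continuation bytes (0b10xxxxxx).
    while i >= 0 and end - i <= 3 and (data[i] & 0xC0) == 0x80:
        i -= 1
    if i < 0:
        return 0
    lead = data[i]
    if 0xC2 <= lead <= 0xDF:
        needed = 2
    elif 0xE0 <= lead <= 0xEF:
        needed = 3
    elif 0xF0 <= lead <= 0xF4:
        needed = 4
    else:
        return 0  # ASCII, or a byte that cannot start a multi-byte sequence
    have = end - i
    return have if have < needed else 0


def decode_partial(data: bytes, errors: str = "strict", final: bool = False):
    """Return (text, consumed), holding back an incomplete tail unless final."""
    keep = 0 if final else utf8_incomplete_tail(data)
    complete = data[: len(data) - keep]
    # Whole-buffer conversion of the complete prefix only.
    return complete.decode("utf-8", errors), len(complete)


chunk = "a\u20ac".encode("utf-8")[:2]  # b'a\xe2': the euro sign is cut off
print(decode_partial(chunk))                        # ('a', 1)
print(codecs.utf_8_decode(chunk, "strict", False))  # ('a', 1)
```

Scanning backwards from the end keeps the cost constant per call (at most four bytes are inspected), which matters because StreamReader.readline() may call the decoder many times on small chunks.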

Please just skip failing tests for CP_UTF8 (cp 65001) and maybe other Windows code pages in test_codecs.

(I don't have time to write a patch to skip, sorry.)
msg210751 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-09 12:14
New changeset 4f6499fc2f09 by Victor Stinner in branch 'default':
Issue #20571: skip test_readline() of test_codecs for Windows code page 65001.
http://hg.python.org/cpython/rev/4f6499fc2f09
msg210761 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-09 13:18
I opened #20574 to implement the missing feature for cp65001.
msg211983 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-23 07:34
New changeset d8f48717b74e by Victor Stinner in branch '3.3':
Issue #20571: skip test_readline() of test_codecs for Windows code page 65001.
http://hg.python.org/cpython/rev/d8f48717b74e
msg211984 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2014-02-23 07:34
It would have been nice to do this on the 3.3 branch as well...
msg211986 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-23 09:37
Ah yes, sorry. I forgot that the utf-7 change was also applied to 3.3.
History
Date                 User              Action  Args
2022-04-11 14:57:58  admin             set     github: 64770
2014-02-23 09:37:34  vstinner          set     messages: + msg211986
2014-02-23 07:34:47  georg.brandl      set     nosy: + georg.brandl
                                               messages: + msg211984
2014-02-23 07:34:25  python-dev        set     messages: + msg211983
2014-02-09 13:18:55  vstinner          set     status: open -> closed
                                               resolution: fixed
                                               messages: + msg210761
2014-02-09 12:14:09  python-dev        set     nosy: + python-dev
                                               messages: + msg210751
2014-02-09 10:53:14  vstinner          set     messages: + msg210747
2014-02-09 10:34:42  serhiy.storchaka  set     messages: + msg210744
2014-02-09 10:29:28  larry             set     messages: + msg210743
2014-02-09 10:23:22  vstinner          set     messages: + msg210741
2014-02-09 09:24:32  serhiy.storchaka  set     nosy: + lemburg, doerwalter, serhiy.storchaka
                                               components: + Unicode, Windows
2014-02-09 09:20:30  larry             set     messages: + msg210733
2014-02-09 09:16:49  larry             create