This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: utf_8_sig streamreader bug, patch, and test
Type: crash Stage:
Components: Unicode Versions: Python 2.6, Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: doerwalter Nosy List: doerwalter, jgsack
Priority: normal Keywords: patch

Created on 2007-11-15 08:06 by jgsack, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name               Uploaded
u8sig26.diff            jgsack, 2007-11-15 08:06
test_utf8sig_stream.py  jgsack, 2007-11-15 08:08
diff-u.py26_utf8sig     jgsack, 2007-11-16 04:50
Messages (5)
msg57520 - Author: James G. sack (jim) (jgsack) Date: 2007-11-15 08:06
The StreamReader in utf_8_sig.py fails when asked to read a specified 
byte length of data that ends in the middle of a multibyte UTF-8 sequence.
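A minimal sketch of the failure scenario, written for Python 3 (the issue itself concerns 2.5/2.6, where the same codecs StreamReader machinery applies): a BOM-prefixed UTF-8 byte string is wrapped in io.BytesIO and read through the utf-8-sig StreamReader in fixed-size chunks, where size bounds how many bytes each read() pulls from the stream. The helper name read_in_chunks is hypothetical. Small sizes split both the 3-byte BOM and the multibyte characters, which is the case described above; a correct reader reproduces the original text for every size.

```python
import codecs
import io


def read_in_chunks(raw: bytes, size: int) -> str:
    """Hypothetical helper: decode raw via the utf-8-sig StreamReader, size bytes per read."""
    reader = codecs.getreader("utf-8-sig")(io.BytesIO(raw))
    parts = []
    while True:
        chunk = reader.read(size)
        if not chunk:  # an empty result means the complete input has been consumed
            break
        parts.append(chunk)
    return "".join(parts)


# Two-, three- and four-byte UTF-8 sequences after a 3-byte BOM, so small
# chunk sizes split both the BOM and the multibyte characters.
text = "h\u00e9llo \u20ac \U0001F600"
raw = codecs.BOM_UTF8 + text.encode("utf-8")
for size in range(1, len(raw) + 1):
    assert read_in_chunks(raw, size) == text, size
```

Every chunk size from 1 byte up to the whole input is checked, so any boundary inside the BOM or inside a character is covered.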

I will attach a standalone unittest, test_utf8sig_stream.py (it works when 
run from autotest, but doesn't use test_support).

I will also attach a patch against the 2.6 trunk version, u8sig26.diff.

Regards,
..jim
msg57523 - Author: James G. sack (jim) (jgsack) Date: 2007-11-15 09:12
Oops, it looks like my patch may have broken test_partial in test_codecs. I 
will try to figure out what test_partial does in the next day or so, 
unless someone else can add some insight in the meantime.
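test_partial in test_codecs roughly checks decoding when input arrives a few bytes at a time. A rough sketch in that spirit (not copied from test_codecs), assuming Python 3: bytes are fed to a utf-8-sig StreamReader one at a time through a hypothetical in-memory ByteQueue, and the text decoded so far is recorded after each byte. A correct reader yields nothing until the BOM and each multibyte sequence are complete.

```python
import codecs


class ByteQueue:
    """Hypothetical in-memory stream: write() appends bytes, read() drains them."""

    def __init__(self):
        self._buf = b""

    def write(self, data):
        self._buf += data

    def read(self, size=-1):
        if size < 0:
            data, self._buf = self._buf, b""
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        return data


def partial_results(raw, encoding="utf-8-sig"):
    """Feed raw one byte at a time and record the text decoded so far after each byte."""
    queue = ByteQueue()
    reader = codecs.getreader(encoding)(queue)
    decoded = ""
    results = []
    for i in range(len(raw)):
        queue.write(raw[i:i + 1])
        decoded += reader.read()  # returns "" while the input is still incomplete
        results.append(decoded)
    return results


raw = codecs.BOM_UTF8 + "\u00e9a".encode("utf-8")
print(partial_results(raw))  # ['', '', '', '', 'é', 'éa']
```

The first four entries stay empty: three bytes of BOM, then the first byte of the two-byte "é".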

.jim
msg57524 - Author: James G. sack (jim) (jgsack) Date: 2007-11-15 09:48
One additional clue: test_codecs succeeds in verbose mode but fails in 
non-verbose mode (autotest "verbosity"), I think. My eyes are getting 
blurry. More tomorrow, I guess.

..j
msg57582 - Author: James G. sack (jim) (jgsack) Date: 2007-11-16 04:50
I found the error in my previous patch. It lacked a self.decode=.. line 
in the elif branch of StreamReader.decode.
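The fix hinges on StreamReader.decode rebinding self.decode once the BOM question is settled. A minimal sketch of that idea, assuming a Python 3 codecs.StreamReader subclass; the class name SigStreamReader is hypothetical, and this is neither the committed r59049 change nor a copy of the 2.6 module. Once self.decode points at codecs.utf_8_decode (whose final argument defaults to False), StreamReader.read keeps incomplete trailing bytes in its byte buffer for the next call instead of failing on them.

```python
import codecs
import io


class SigStreamReader(codecs.StreamReader):
    """Hypothetical reader that skips a leading UTF-8 BOM (sketch only)."""

    def reset(self):
        codecs.StreamReader.reset(self)
        # Drop the instance-level decode binding so the BOM check runs again
        # after a seek or reset.
        self.__dict__.pop("decode", None)

    def decode(self, input, errors="strict"):
        if len(input) < 3 and codecs.BOM_UTF8.startswith(input):
            # Too few bytes to tell whether this is a BOM; wait for more data.
            return ("", 0)
        # The BOM question is settled, so later calls go straight to the
        # incremental UTF-8 decoder (final=False keeps partial sequences buffered).
        self.decode = codecs.utf_8_decode
        if input[:3] == codecs.BOM_UTF8:
            output, consumed = codecs.utf_8_decode(input[3:], errors)
            return (output, consumed + 3)  # count the skipped BOM as consumed
        return codecs.utf_8_decode(input, errors)


stream = io.BytesIO(codecs.BOM_UTF8 + "\u00e9\u20ac".encode("utf-8"))
reader = SigStreamReader(stream)
print(reader.read(1), reader.read(1))  # é €
```

Rebinding on the instance rather than re-checking for a BOM on every call keeps later reads from treating bytes in the middle of the stream as a possible BOM; the reset override undoes that binding so reading again from the start strips the BOM once more.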

I attach a replacement patch, diff-u.py26_utf8sig (apply it to the 2.6 version 
of utf_8_sig.py). If allowed, I will next remove the incorrect patch.

This one passes test_codecs.py as well as my previously attached test 
module.

The resulting utf_8_sig.py may benefit from further refactoring, but I 
didn't want to do more than necessary to fix the immediate bug.

Regards,
..jim
msg57633 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2007-11-19 12:50
Checked in your change and the test as r59049 (trunk) and r59050 (2.5).
Thanks for the patch.
History
Date                 User              Action  Args
2022-04-11 14:56:28  admin             set     github: 45785
2007-11-19 12:50:17  doerwalter        set     status: open -> closed; resolution: fixed; messages: + msg57633
2007-11-16 04:50:10  jgsack            set     files: + diff-u.py26_utf8sig; messages: + msg57582
2007-11-15 16:21:07  georg.brandl      set     assignee: doerwalter; nosy: + doerwalter
2007-11-15 16:02:15  christian.heimes  set     keywords: + patch
2007-11-15 09:48:43  jgsack            set     messages: + msg57524
2007-11-15 09:12:16  jgsack            set     messages: + msg57523
2007-11-15 08:08:35  jgsack            set     files: + test_utf8sig_stream.py
2007-11-15 08:06:32  jgsack            create