Rietveld Code Review Tool

#12892: UTF-16 and UTF-32 codecs should reject (lone) surrogates

Created:
6 years, 5 months ago by ezio.melotti
Modified:
4 years, 7 months ago
Reviewers:
victor.stinner, storchaka
CC:
lemburg, gvanrossum, loewis, mjpieters, AntoinePitrou, haypo, ezio.melotti, devnull_psf.upfronthosting.co.za, tchrist_perl.com, kennyluck_csail.mit.edu, storchaka, tmp12342_gmail.com
Visibility:
Public.

Patch Set 1
Patch Set 2
Patch Set 3
Patch Set 4
Patch Set 5 (total comments: 30)
Patch Set 6 (total comments: 13)
Patch Set 7
Patch Set 8

Doc/library/codecs.rst: 2 chunks, +18 lines, -7 lines, 0 comments
Doc/whatsnew/3.4.rst: 1 chunk, +7 lines, -0 lines, 0 comments
Lib/test/test_codecs.py: 10 chunks, +55 lines, -12 lines, 0 comments
Misc/ACKS: 1 chunk, +1 line, -0 lines, 0 comments
Misc/NEWS: 1 chunk, +6 lines, -0 lines, 0 comments
Objects/stringlib/codecs.h: 2 chunks, +182 lines, -16 lines, 0 comments
Objects/unicodeobject.c: 16 chunks, +227 lines, -30 lines, 0 comments
Python/codecs.c: 4 chunks, +146 lines, -17 lines, 0 comments

Messages

Total messages: 6
victor.stinner_gmail.com
http://bugs.python.org/review/12892/diff/9461/Objects/stringlib/codecs.h File Objects/stringlib/codecs.h (right): http://bugs.python.org/review/12892/diff/9461/Objects/stringlib/codecs.h#newcode599 Objects/stringlib/codecs.h:599: #if STRINGLIB_MAX_CHAR >= 0x80 You should duplicate the whole ...
4 years, 7 months ago #1
storchaka_gmail.com
http://bugs.python.org/review/12892/diff/9461/Objects/stringlib/codecs.h File Objects/stringlib/codecs.h (right): http://bugs.python.org/review/12892/diff/9461/Objects/stringlib/codecs.h#newcode599 Objects/stringlib/codecs.h:599: #if STRINGLIB_MAX_CHAR >= 0x80 On 2013/10/10 11:14:46, haypo wrote: ...
4 years, 7 months ago #2
ezio.melotti
http://bugs.python.org/review/12892/diff/9504/Doc/library/codecs.rst File Doc/library/codecs.rst (right): http://bugs.python.org/review/12892/diff/9504/Doc/library/codecs.rst#newcode364 Doc/library/codecs.rst:364: The ``'surrogatepass'`` error handlers now works with utf-16\* and ...
4 years, 7 months ago #3
storchaka_gmail.com
http://bugs.python.org/review/12892/diff/9504/Doc/library/codecs.rst File Doc/library/codecs.rst (right): http://bugs.python.org/review/12892/diff/9504/Doc/library/codecs.rst#newcode364 Doc/library/codecs.rst:364: The ``'surrogatepass'`` error handlers now works with utf-16\* and ...
4 years, 7 months ago #4
ezio.melotti
http://bugs.python.org/review/12892/diff/9504/Doc/whatsnew/3.4.rst File Doc/whatsnew/3.4.rst (right): http://bugs.python.org/review/12892/diff/9504/Doc/whatsnew/3.4.rst#newcode185 Doc/whatsnew/3.4.rst:185: and Serhiy Storchaka in :issue:`12892`. On 2013/10/12 16:56:31, storchaka ...
4 years, 7 months ago #5
storchaka_gmail.com
4 years, 7 months ago #6
http://bugs.python.org/review/12892/diff/9504/Lib/test/test_codecs.py
File Lib/test/test_codecs.py (right):

http://bugs.python.org/review/12892/diff/9504/Lib/test/test_codecs.py#newcode729
Lib/test/test_codecs.py:729: # UTF-16 and UTF-32
On 2013/10/12 17:13:34, ezio.melotti wrote:
> On 2013/10/12 16:56:31, storchaka wrote:
> > On 2013/10/12 15:25:42, ezio.melotti wrote:
> > > What does this comment refer to?
> > > This method is about utf-8; the other classes should either add different
> > > assertions or just inherit the base method as is.
> >
> > This is Victor's or Kang-Hao's comment. Actually, as I have ascertained,
> > this test doesn't make sense for UTF-16 and UTF-32.
>
> You mean that they don't have the surrogateescape error handler?
> If the test is specific to utf-8 and surrogateescape, maybe it could be moved
> into a separate method. Either way, the comment can be removed or replaced by
> something that says that this is specific to utf-8 and/or surrogateescape.

The surrogateescape error handler works only with ASCII-compatible encodings. I
have changed the comment.
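
A minimal sketch of the distinction discussed above, assuming Python 3.4 or later (the release this patch targets). The sample strings and the `raises` helper are made up for illustration and are not taken from the patch or from test_codecs.py. The likely reason for the restriction is that surrogateescape maps each undecodable byte to a single lone surrogate and back to a single byte, which does not fit the 2-byte and 4-byte code units of UTF-16 and UTF-32; surrogatepass is the handler this issue extends to those codecs.

```python
# Illustrative sketch only; sample data and the helper are hypothetical.

def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type, False otherwise."""
    try:
        func(*args)
    except exc_type:
        return True
    return False

# surrogateescape with UTF-8: the undecodable byte 0x80 becomes the lone
# surrogate U+DC80 on decoding and is turned back into 0x80 on encoding.
raw = b"abc\x80"
text = raw.decode("utf-8", "surrogateescape")
roundtrip = text.encode("utf-8", "surrogateescape")

# With the default strict handler, UTF-16 and UTF-32 reject a lone surrogate.
lone = "\ud800"
strict_utf16_rejects = raises(UnicodeEncodeError, lone.encode, "utf-16-le")
strict_utf32_rejects = raises(UnicodeEncodeError, lone.encode, "utf-32-le")

# surrogatepass lets the lone surrogate through as an ordinary code unit.
utf16_bytes = lone.encode("utf-16-le", "surrogatepass")
utf32_bytes = lone.encode("utf-32-le", "surrogatepass")
utf16_back = utf16_bytes.decode("utf-16-le", "surrogatepass")
utf32_back = utf32_bytes.decode("utf-32-le", "surrogatepass")
```

The little-endian variants are used so that no byte order mark is prepended to the encoded output.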

http://bugs.python.org/review/12892/diff/9504/Lib/test/test_codecs.py#newcode890
Lib/test/test_codecs.py:890: class UTF8SigTest(ReadTest, unittest.TestCase):
On 2013/10/12 17:13:34, ezio.melotti wrote:
> On 2013/10/12 16:56:31, storchaka wrote:
> > On 2013/10/12 15:25:42, ezio.melotti wrote:
> > > Should this get an additional surrogateescape assertion like the one added
> > > to the UTF8Test class?
> >
> > I doubt it. This test was not here before, so it is not a regression. And it
> > doesn't test any feature added by this patch.
>
> But adding it wouldn't harm, would it?
> If the behavior is tested for utf8, I guess that utf8-sig would do the same
> thing, and adding the test would ensure this.

Well, UTF8SigTest now inherits from UTF8Test.
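
A rough sketch of what that inheritance buys, not the actual code from Lib/test/test_codecs.py: the class names match the thread, but the `encoding` attribute, the test method, the sample string, and the `run_tests` helper are assumptions made for illustration (the real classes also build on ReadTest, which is omitted here). Any assertion written once in the base class runs again for utf-8-sig through the subclass.

```python
import io
import unittest


class UTF8Test(unittest.TestCase):
    # Hypothetical simplification: the codec under test is a class attribute.
    encoding = "utf-8"

    def test_surrogateescape_roundtrip(self):
        # A lone surrogate in U+DC80..U+DCFF stands for one raw byte.
        text = "abc\udcff"
        data = text.encode(self.encoding, "surrogateescape")
        self.assertEqual(data.decode(self.encoding, "surrogateescape"), text)


class UTF8SigTest(UTF8Test):
    # Inherits every test method; only the codec name changes.
    encoding = "utf-8-sig"


def run_tests():
    """Run both test classes quietly and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(UTF8Test),
        loader.loadTestsFromTestCase(UTF8SigTest),
    ])
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)
```

Calling `run_tests()` executes the same method twice, once per class, which is the coverage ezio.melotti asked for without duplicating the assertion.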