classification
Title: PEP 383 implementation
Type: Stage:
Components: Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: benjamin.peterson, exarkun, loewis, pitrou
Priority: release blocker Keywords: patch

Created on 2009-05-03 20:38 by loewis, last changed 2009-10-13 15:24 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
383.diff loewis, 2009-05-04 05:05
Messages (11)
msg87069 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-03 20:37
This is the implementation of PEP 383. Rietveld submission will follow.
msg87070 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-03 20:42
http://codereview.appspot.com/52095
msg87071 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-03 20:52
Some notes, in addition to the PEP:

- the file system APIs have all stopped using et converters, and use O&
converters, with a new API function PyUnicode_FSConverter. This outputs
a bytes or bytearray object which needs to be released when done; if it
is a bytearray, the bytes also need to be locked when the GIL is released.

- to convert the environment successfully, initialization of the
filesystemddefaultencoding had to be moved up in the code.

- conversion of the command line arguments can't use codecs. Instead, it
uses the C library, hoping that their encoding is the same as the one
that we later determine as the file system encoding. With C99 mbrtowc,
it is possible to put utf8b escapes in the output; without that
function, every non-ASCII byte gets escaped if the CRT fails to convert
the entire string successfully.

- in the unlikely case that listdir still fails (if the file system
encoding cannot convert ASCII bytes sometimes), listdir now raises an
exception, to be consistent with the failure mode in all other places
where this problem may occur.
msg87074 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-05-03 21:41
The calls to bytes2str are not checked in posixmodule.c.


http://codereview.appspot.com/52095/diff/1/5
File Include/unicodeobject.h (right):

http://codereview.appspot.com/52095/diff/1/5#newcode1254
Line 1254: The function is intended to be used for paths and file names
only
If these are only meant to be used internally during bootstrap, should
they have the _Py prefix?

http://codereview.appspot.com/52095/diff/1/10
File Lib/test/test_pep383.py (right):

http://codereview.appspot.com/52095/diff/1/10#newcode1
Line 1: from test import support
I'm not sure this deserves a whole new file. Couldn't UtfbTest go in
test_codecs and FileTests go in test_os?

http://codereview.appspot.com/52095/diff/1/10#newcode2
Line 2: import unittest, shutil, os, sys
PEP 8 says these should be on separate lines.

http://codereview.appspot.com/52095/diff/1/10#newcode33
Line 33: if os.name != 'win32':
Isn't os.name "nt" on Windows?

Also, you could skip this whole class on Windows by decorating the class
with @unittest.skipIf(sys.platform.startswith("win")).

http://codereview.appspot.com/52095/diff/1/10#newcode47
Line 47: f = open(self.bdir + b"/" + fn, "w")
Looks like a good place for os.path.join.

http://codereview.appspot.com/52095/diff/1/13
File Modules/posixmodule.c (right):

http://codereview.appspot.com/52095/diff/1/13#newcode547
Line 547: else if(PyByteArray_Check(o)) {
There's a small white space problem here.

http://codereview.appspot.com/52095/diff/1/13#newcode548
Line 548: if (lock && o->ob_type->tp_as_buffer->bf_getbuffer(o, NULL, 0)
< 0)
I believe you want PyObject_GetBuffer here.

http://codereview.appspot.com/52095/diff/1/13#newcode596
Line 596: bytes2str(name, 0));
What if byte2str returns NULL?

http://codereview.appspot.com/52095/diff/1/2
File Python/codecs.c (right):

http://codereview.appspot.com/52095/diff/1/2#newcode852
Line 852: if (!res)
This leaks "object".

http://codereview.appspot.com/52095/diff/1/2#newcode895
Line 895: return NULL;
This leaks "object".

http://codereview.appspot.com/52095
msg87098 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-04 05:00
Reviewers: report_bugs.python.org, Benjamin,

Message:
Fixed in r72272

http://codereview.appspot.com/52095/diff/1/5
File Include/unicodeobject.h (right):

http://codereview.appspot.com/52095/diff/1/5#newcode1254
Line 1254: The function is intended to be used for paths and file names
only
On 2009/05/03 21:34:26, Benjamin wrote:
> If these are only meant to be used internally during bootstrap, should
they have
> the _Py prefix?

Perhaps. However, I only moved the declarations from a different part of
the file, so I feel renaming them is out of scope for this patch. Only
PyUnicode_FSConverter is new.

http://codereview.appspot.com/52095/diff/1/10
File Lib/test/test_pep383.py (right):

http://codereview.appspot.com/52095/diff/1/10#newcode1
Line 1: from test import support
On 2009/05/03 21:34:26, Benjamin wrote:
> I'm not sure this deserves a whole new file. Couldn't UtfbTest go in
test_codecs
> and FileTests go in test_os?

Done.

http://codereview.appspot.com/52095/diff/1/10#newcode2
Line 2: import unittest, shutil, os, sys
On 2009/05/03 21:34:26, Benjamin wrote:
> PEP 8 says these should be on separate lines.

Done.

http://codereview.appspot.com/52095/diff/1/10#newcode33
Line 33: if os.name != 'win32':
On 2009/05/03 21:34:26, Benjamin wrote:
> Isn't os.name "nt" on Windows?

class with
> @unittest.skipIf(sys.platform.startswith("win")).

Done. It's now in a non-Win32 section of test_os.

http://codereview.appspot.com/52095/diff/1/10#newcode47
Line 47: f = open(self.bdir + b"/" + fn, "w")
On 2009/05/03 21:34:26, Benjamin wrote:
> Looks like a good place for os.path.join.

Done.

http://codereview.appspot.com/52095/diff/1/13
File Modules/posixmodule.c (right):

http://codereview.appspot.com/52095/diff/1/13#newcode547
Line 547: else if(PyByteArray_Check(o)) {
On 2009/05/03 21:34:26, Benjamin wrote:
> There's a small white space problem here.

Sorry, I don't see it: what is the problem?

http://codereview.appspot.com/52095/diff/1/13#newcode548
Line 548: if (lock && o->ob_type->tp_as_buffer->bf_getbuffer(o, NULL, 0)
< 0)
On 2009/05/03 21:34:26, Benjamin wrote:
> I believe you want PyObject_GetBuffer here.

Done. Notice, however, the loss of symmetry with release_bytes, where an
equivalent API function does not appear to exist.

http://codereview.appspot.com/52095/diff/1/13#newcode596
Line 596: bytes2str(name, 0));
On 2009/05/03 21:34:26, Benjamin wrote:
> What if byte2str returns NULL?

It really shouldn't; bytes2str should only be applied to results of the
PyUnicode_FSConverter, which should have checked that the result is
either bytes or bytearray. I have now changed bytes2str to give a
FatalError otherwise.

http://codereview.appspot.com/52095/diff/1/2
File Python/codecs.c (right):

http://codereview.appspot.com/52095/diff/1/2#newcode852
Line 852: if (!res)
On 2009/05/03 21:34:26, Benjamin wrote:
> This leaks "object".

Done. Again :-(

http://codereview.appspot.com/52095/diff/1/2#newcode895
Line 895: return NULL;
On 2009/05/03 21:34:26, Benjamin wrote:
> This leaks "object".

Done.

Please review this at http://codereview.appspot.com/52095

Affected files:
   M     Doc/library/codecs.rst
   M     Doc/library/os.rst
   M     Include/unicodeobject.h
   A     Lib/test/test_pep383.py
   M     Modules/_io/fileio.c
   M     Modules/posixmodule.c
   M     Modules/python.c
   M     Objects/unicodeobject.c
   M     Python/codecs.c
   M     Python/pythonrun.c
   M     configure
   M     configure.in
   M     pyconfig.h.in
msg87120 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-04 12:25
A couple of really small things.

http://codereview.appspot.com/52095/diff/30/1013
File Lib/test/test_codecs.py (right):

http://codereview.appspot.com/52095/diff/30/1013#newcode1545
Line 1545: b"foo\xa5bar")
You decode with iso-8859-3 but encode with iso-8859-4. Is it intended?

http://codereview.appspot.com/52095/diff/30/1014
File Lib/test/test_os.py (right):

http://codereview.appspot.com/52095/diff/30/1014#newcode704
Line 704: filenames = [b'foo\xf6bar', b'foo\xf6bar']
Is my sight bad, or is this twice the same filename?

http://codereview.appspot.com/52095/diff/30/1016
File Modules/python.c (right):

http://codereview.appspot.com/52095/diff/30/1016#newcode47
Line 47: /* Try conversion with mbsrtwocs (C99), and escape
non-decodable bytes. */
Typo: s/mbsrtwocs/mbrtowc/

http://codereview.appspot.com/52095
msg87173 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-05-04 20:53
Aside from Antoine's comments, I think it looks good now.
msg87207 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-05 04:02
Fixed in r72310

http://codereview.appspot.com/52095/diff/30/1013
File Lib/test/test_codecs.py (right):

http://codereview.appspot.com/52095/diff/30/1013#newcode1545
Line 1545: b"foo\xa5bar")
On 2009/05/04 12:25:43, Antoine Pitrou wrote:
> You decode with iso-8859-3 but encode with iso-8859-4. Is it intended?

No, fixed.

http://codereview.appspot.com/52095/diff/30/1014
File Lib/test/test_os.py (right):

http://codereview.appspot.com/52095/diff/30/1014#newcode704
Line 704: filenames = [b'foo\xf6bar', b'foo\xf6bar']
On 2009/05/04 12:25:43, Antoine Pitrou wrote:
> Is my sight bad, or is this twice the same filename?

Oops, fixed.

http://codereview.appspot.com/52095/diff/30/1016
File Modules/python.c (right):

http://codereview.appspot.com/52095/diff/30/1016#newcode47
Line 47: /* Try conversion with mbsrtwocs (C99), and escape
non-decodable bytes. */
On 2009/05/04 12:25:43, Antoine Pitrou wrote:
> Typo: s/mbsrtwocs/mbrtowc/

Done.

http://codereview.appspot.com/52095
msg87209 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-05 04:43
Committed as r72313
msg93731 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2009-10-08 00:07
It would be useful to have the surrogateescape error handler backported
to 2.7 to make it easier to start handling the kind of data it is needed
for.
msg93924 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-10-13 15:24
I'm skeptical that a backport is a useful thing. First, the handler is 
primarily meant for use with PEP 383, which means that it's all 
internal; few applications will ever need to use it explicitly.

Furthermore, applications/libraries that do use it most likely will have 
to support 2.6 and earlier, as well, so they aren't helped with the 
handler. My advise to such libraries is that they should include and 
register their own implementation of it, or else fall back to "strict" 
on 2.x (say).

In any case: this issue is closed by PEP 383 being implemented. If you 
still desire a backport, please create a new report.
History
Date User Action Args
2009-10-13 15:24:25loewissetmessages: + msg93924
2009-10-08 00:07:04exarkunsetnosy: + exarkun
messages: + msg93731
2009-05-05 04:44:02loewissetstatus: open -> closed
resolution: accepted
messages: + msg87209
2009-05-05 04:02:26loewissetmessages: + msg87207
2009-05-04 20:53:43benjamin.petersonsetassignee: benjamin.peterson -> loewis
messages: + msg87173
2009-05-04 12:25:44pitrousetnosy: + pitrou
messages: + msg87120
2009-05-04 05:07:03loewissetfiles: + 383.diff
2009-05-04 05:04:56loewissetfiles: - 383.diff
2009-05-04 05:01:07loewissetmessages: + msg87098
2009-05-03 21:41:40benjamin.petersonsetmessages: + msg87074
2009-05-03 20:52:48loewissetmessages: + msg87071
2009-05-03 20:42:40loewissetmessages: + msg87070
2009-05-03 20:38:14loewiscreate