Issue5915
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009-05-03 20:38 by loewis, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
383.diff | loewis, 2009-05-04 05:05 |
Messages (11) | |||
---|---|---|---|
msg87069 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-03 20:37 | |
This is the implementation of PEP 383. Rietveld submission will follow. |
|||
msg87070 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-03 20:42 | |
http://codereview.appspot.com/52095 |
|||
msg87071 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-03 20:52 | |
Some notes, in addition to the PEP: - the file system APIs have all stopped using et converters, and use O& converters, with a new API function PyUnicode_FSConverter. This outputs a bytes or bytearray object which needs to be released when done; if it is a bytearray, the bytes also need to be locked when the GIL is released. - to convert the environment successfully, initialization of the filesystemddefaultencoding had to be moved up in the code. - conversion of the command line arguments can't use codecs. Instead, it uses the C library, hoping that their encoding is the same as the one that we later determine as the file system encoding. With C99 mbrtowc, it is possible to put utf8b escapes in the output; without that function, every non-ASCII byte gets escaped if the CRT fails to convert the entire string successfully. - in the unlikely case that listdir still fails (if the file system encoding cannot convert ASCII bytes sometimes), listdir now raises an exception, to be consistent with the failure mode in all other places where this problem may occur. |
|||
msg87074 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2009-05-03 21:41 | |
The calls to bytes2str are not checked in posixmodule.c. http://codereview.appspot.com/52095/diff/1/5 File Include/unicodeobject.h (right): http://codereview.appspot.com/52095/diff/1/5#newcode1254 Line 1254: The function is intended to be used for paths and file names only If these are only meant to be used internally during bootstrap, should they have the _Py prefix? http://codereview.appspot.com/52095/diff/1/10 File Lib/test/test_pep383.py (right): http://codereview.appspot.com/52095/diff/1/10#newcode1 Line 1: from test import support I'm not sure this deserves a whole new file. Couldn't UtfbTest go in test_codecs and FileTests go in test_os? http://codereview.appspot.com/52095/diff/1/10#newcode2 Line 2: import unittest, shutil, os, sys PEP 8 says these should be on separate lines. http://codereview.appspot.com/52095/diff/1/10#newcode33 Line 33: if os.name != 'win32': Isn't os.name "nt" on Windows? Also, you could skip this whole class on Windows by decorating the class with @unittest.skipIf(sys.platform.startswith("win")). http://codereview.appspot.com/52095/diff/1/10#newcode47 Line 47: f = open(self.bdir + b"/" + fn, "w") Looks like a good place for os.path.join. http://codereview.appspot.com/52095/diff/1/13 File Modules/posixmodule.c (right): http://codereview.appspot.com/52095/diff/1/13#newcode547 Line 547: else if(PyByteArray_Check(o)) { There's a small white space problem here. http://codereview.appspot.com/52095/diff/1/13#newcode548 Line 548: if (lock && o->ob_type->tp_as_buffer->bf_getbuffer(o, NULL, 0) < 0) I believe you want PyObject_GetBuffer here. http://codereview.appspot.com/52095/diff/1/13#newcode596 Line 596: bytes2str(name, 0)); What if byte2str returns NULL? http://codereview.appspot.com/52095/diff/1/2 File Python/codecs.c (right): http://codereview.appspot.com/52095/diff/1/2#newcode852 Line 852: if (!res) This leaks "object". http://codereview.appspot.com/52095/diff/1/2#newcode895 Line 895: return NULL; This leaks "object". http://codereview.appspot.com/52095 |
|||
msg87098 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-04 05:00 | |
Reviewers: report_bugs.python.org, Benjamin, Message: Fixed in r72272 http://codereview.appspot.com/52095/diff/1/5 File Include/unicodeobject.h (right): http://codereview.appspot.com/52095/diff/1/5#newcode1254 Line 1254: The function is intended to be used for paths and file names only On 2009/05/03 21:34:26, Benjamin wrote: > If these are only meant to be used internally during bootstrap, should they have > the _Py prefix? Perhaps. However, I only moved the declarations from a different part of the file, so I feel renaming them is out of scope for this patch. Only PyUnicode_FSConverter is new. http://codereview.appspot.com/52095/diff/1/10 File Lib/test/test_pep383.py (right): http://codereview.appspot.com/52095/diff/1/10#newcode1 Line 1: from test import support On 2009/05/03 21:34:26, Benjamin wrote: > I'm not sure this deserves a whole new file. Couldn't UtfbTest go in test_codecs > and FileTests go in test_os? Done. http://codereview.appspot.com/52095/diff/1/10#newcode2 Line 2: import unittest, shutil, os, sys On 2009/05/03 21:34:26, Benjamin wrote: > PEP 8 says these should be on separate lines. Done. http://codereview.appspot.com/52095/diff/1/10#newcode33 Line 33: if os.name != 'win32': On 2009/05/03 21:34:26, Benjamin wrote: > Isn't os.name "nt" on Windows? class with > @unittest.skipIf(sys.platform.startswith("win")). Done. It's now in a non-Win32 section of test_os. http://codereview.appspot.com/52095/diff/1/10#newcode47 Line 47: f = open(self.bdir + b"/" + fn, "w") On 2009/05/03 21:34:26, Benjamin wrote: > Looks like a good place for os.path.join. Done. http://codereview.appspot.com/52095/diff/1/13 File Modules/posixmodule.c (right): http://codereview.appspot.com/52095/diff/1/13#newcode547 Line 547: else if(PyByteArray_Check(o)) { On 2009/05/03 21:34:26, Benjamin wrote: > There's a small white space problem here. Sorry, I don't see it: what is the problem? http://codereview.appspot.com/52095/diff/1/13#newcode548 Line 548: if (lock && o->ob_type->tp_as_buffer->bf_getbuffer(o, NULL, 0) < 0) On 2009/05/03 21:34:26, Benjamin wrote: > I believe you want PyObject_GetBuffer here. Done. Notice, however, the loss of symmetry with release_bytes, where an equivalent API function does not appear to exist. http://codereview.appspot.com/52095/diff/1/13#newcode596 Line 596: bytes2str(name, 0)); On 2009/05/03 21:34:26, Benjamin wrote: > What if byte2str returns NULL? It really shouldn't; bytes2str should only be applied to results of the PyUnicode_FSConverter, which should have checked that the result is either bytes or bytearray. I have now changed bytes2str to give a FatalError otherwise. http://codereview.appspot.com/52095/diff/1/2 File Python/codecs.c (right): http://codereview.appspot.com/52095/diff/1/2#newcode852 Line 852: if (!res) On 2009/05/03 21:34:26, Benjamin wrote: > This leaks "object". Done. Again :-( http://codereview.appspot.com/52095/diff/1/2#newcode895 Line 895: return NULL; On 2009/05/03 21:34:26, Benjamin wrote: > This leaks "object". Done. Please review this at http://codereview.appspot.com/52095 Affected files: M Doc/library/codecs.rst M Doc/library/os.rst M Include/unicodeobject.h A Lib/test/test_pep383.py M Modules/_io/fileio.c M Modules/posixmodule.c M Modules/python.c M Objects/unicodeobject.c M Python/codecs.c M Python/pythonrun.c M configure M configure.in M pyconfig.h.in |
|||
msg87120 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2009-05-04 12:25 | |
A couple of really small things. http://codereview.appspot.com/52095/diff/30/1013 File Lib/test/test_codecs.py (right): http://codereview.appspot.com/52095/diff/30/1013#newcode1545 Line 1545: b"foo\xa5bar") You decode with iso-8859-3 but encode with iso-8859-4. Is it intended? http://codereview.appspot.com/52095/diff/30/1014 File Lib/test/test_os.py (right): http://codereview.appspot.com/52095/diff/30/1014#newcode704 Line 704: filenames = [b'foo\xf6bar', b'foo\xf6bar'] Is my sight bad, or is this twice the same filename? http://codereview.appspot.com/52095/diff/30/1016 File Modules/python.c (right): http://codereview.appspot.com/52095/diff/30/1016#newcode47 Line 47: /* Try conversion with mbsrtwocs (C99), and escape non-decodable bytes. */ Typo: s/mbsrtwocs/mbrtowc/ http://codereview.appspot.com/52095 |
|||
msg87173 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2009-05-04 20:53 | |
Aside from Antoine's comments, I think it looks good now. |
|||
msg87207 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-05 04:02 | |
Fixed in r72310 http://codereview.appspot.com/52095/diff/30/1013 File Lib/test/test_codecs.py (right): http://codereview.appspot.com/52095/diff/30/1013#newcode1545 Line 1545: b"foo\xa5bar") On 2009/05/04 12:25:43, Antoine Pitrou wrote: > You decode with iso-8859-3 but encode with iso-8859-4. Is it intended? No, fixed. http://codereview.appspot.com/52095/diff/30/1014 File Lib/test/test_os.py (right): http://codereview.appspot.com/52095/diff/30/1014#newcode704 Line 704: filenames = [b'foo\xf6bar', b'foo\xf6bar'] On 2009/05/04 12:25:43, Antoine Pitrou wrote: > Is my sight bad, or is this twice the same filename? Oops, fixed. http://codereview.appspot.com/52095/diff/30/1016 File Modules/python.c (right): http://codereview.appspot.com/52095/diff/30/1016#newcode47 Line 47: /* Try conversion with mbsrtwocs (C99), and escape non-decodable bytes. */ On 2009/05/04 12:25:43, Antoine Pitrou wrote: > Typo: s/mbsrtwocs/mbrtowc/ Done. http://codereview.appspot.com/52095 |
|||
msg87209 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-05 04:43 | |
Committed as r72313 |
|||
msg93731 - (view) | Author: Jean-Paul Calderone (exarkun) * | Date: 2009-10-08 00:07 | |
It would be useful to have the surrogateescape error handler backported to 2.7 to make it easier to start handling the kind of data it is needed for. |
|||
msg93924 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-10-13 15:24 | |
I'm skeptical that a backport is a useful thing. First, the handler is primarily meant for use with PEP 383, which means that it's all internal; few applications will ever need to use it explicitly. Furthermore, applications/libraries that do use it most likely will have to support 2.6 and earlier, as well, so they aren't helped with the handler. My advise to such libraries is that they should include and register their own implementation of it, or else fall back to "strict" on 2.x (say). In any case: this issue is closed by PEP 383 being implemented. If you still desire a backport, please create a new report. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:48 | admin | set | github: 50165 |
2009-10-13 15:24:25 | loewis | set | messages: + msg93924 |
2009-10-08 00:07:04 | exarkun | set | nosy:
+ exarkun messages: + msg93731 |
2009-05-05 04:44:02 | loewis | set | status: open -> closed resolution: accepted messages: + msg87209 |
2009-05-05 04:02:26 | loewis | set | messages: + msg87207 |
2009-05-04 20:53:43 | benjamin.peterson | set | assignee: benjamin.peterson -> loewis messages: + msg87173 |
2009-05-04 12:25:44 | pitrou | set | nosy:
+ pitrou messages: + msg87120 |
2009-05-04 05:07:03 | loewis | set | files: + 383.diff |
2009-05-04 05:04:56 | loewis | set | files: - 383.diff |
2009-05-04 05:01:07 | loewis | set | messages: + msg87098 |
2009-05-03 21:41:40 | benjamin.peterson | set | messages: + msg87074 |
2009-05-03 20:52:48 | loewis | set | messages: + msg87071 |
2009-05-03 20:42:40 | loewis | set | messages: + msg87070 |
2009-05-03 20:38:14 | loewis | create |