classification
Title: parser: store the filename as an unicode object
Type:
Stage:
Components: Interpreter Core, Unicode
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: belopolsky, benjamin.peterson, python-dev, vstinner
Priority: normal
Keywords: patch

Created on 2010-12-28 02:40 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
parser_filename_obj-3.patch vstinner, 2011-01-05 04:26
Messages (9)
msg124755 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 02:40
The Python parser stores the filename as a byte string, but decodes it when an error occurs because most Python functions now use Unicode strings. Decoding the filename at error time may itself raise a new error, so I propose decoding it when the parser object is created and storing the filename only as Unicode.

This issue would prepare the last part of the full unicode support (#3080).
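
To illustrate the difference between decoding at error time and decoding at creation, here is a minimal Python sketch. It is not the CPython parser code: the class names `LazyTokenizer` and `EagerTokenizer` are hypothetical, and strict UTF-8 is assumed only to make the failure easy to reproduce.

```python
class LazyTokenizer:
    """Hypothetical: keeps the filename as bytes, decodes only on error."""

    def __init__(self, filename: bytes):
        self.filename = filename

    def syntax_error(self, msg: str, lineno: int) -> SyntaxError:
        # Decoding here may raise UnicodeDecodeError, which would
        # replace the syntax error we were trying to report.
        name = self.filename.decode("utf-8")
        return SyntaxError(msg, (name, lineno, 0, ""))


class EagerTokenizer:
    """Hypothetical: decodes the filename once, at creation."""

    def __init__(self, filename: bytes):
        # A decoding failure surfaces here, before any parsing starts.
        self.filename = filename.decode("utf-8")

    def syntax_error(self, msg: str, lineno: int) -> SyntaxError:
        # No decoding on the error path, so nothing new can fail here.
        return SyntaxError(msg, (self.filename, lineno, 0, ""))
```

With an undecodable name such as `b"caf\xe9.py"`, the lazy variant is created without complaint and only fails inside `syntax_error()`, while the eager variant fails immediately in its constructor.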
msg124823 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-28 22:14
I like the idea, but I don't like the trend that parser code continues to diverge from pgen.  I understand that most of the Python runtime is not available to pgen, but maybe a more elegant solution than changing the type conditional on PGEN can be found.  For example, maybe filename could be decoded from FS encoding to UTF-8?
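
The alternative raised here, re-encoding the filename from the filesystem encoding to UTF-8 so that the parser keeps a byte string in one known encoding, could look roughly like the sketch below. The helper name `fs_bytes_to_utf8` and the explicit `fs_encoding` parameter are assumptions for illustration; this is not what the final patch does.

```python
def fs_bytes_to_utf8(raw: bytes, fs_encoding: str) -> bytes:
    """Hypothetical helper: transcode a filename to UTF-8 bytes."""
    # Decode with the filesystem encoding, then re-encode as UTF-8.
    # Both steps are strict, so an undecodable name raises here.
    return raw.decode(fs_encoding).encode("utf-8")
```

For a Latin-1 filesystem encoding, `fs_bytes_to_utf8(b"caf\xe9.py", "latin-1")` would yield the UTF-8 bytes of `café.py`.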
msg124826 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 23:15
> maybe a more elegant solution than changing the type conditional 
> on PGEN can be found

In pgen, the filename is only used to display the following warning, in indenterror():

   <filename>: inconsistent use of tabs and spaces in indentation

In practice, this warning never occurs for Grammar/Grammar: this file doesn't use indentation at all, only continuation lines.

A better solution is maybe just to drop the filename for pgen. Anyway, pgen only compiles *one* file (Grammar/Grammar), so we don't need the input filename ;-)
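
As a rough Python sketch of the idea (the real indenterror() is C code in the tokenizer; the function name `indent_warning` and its optional parameter are hypothetical), the warning can simply omit the prefix when no filename is stored:

```python
from typing import Optional


def indent_warning(filename: Optional[str] = None) -> str:
    """Hypothetical: build the tabs/spaces warning, with or without a filename."""
    msg = "inconsistent use of tabs and spaces in indentation"
    # Without a filename (the pgen case discussed above), emit only the message.
    if filename is None:
        return msg
    return f"{filename}: {msg}"
```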
msg124827 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 23:16
When testing my patch, I found and fixed two bugs in pgen:
 - r87557: PGEN was not defined when compiling pgenmain.c and printgrammar.c
 - r87558: pgen errors were ignored by "make Parser/pgen.stamp" (when executing pgen to compile the grammar)
msg124828 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 23:32
Version 2 of the patch:
 - remove the filename attribute from the perrdetail and tok_state structures in PGEN mode, and add a comment to explain why
 - rename filename_obj to filename
 - indenterror() no longer prints the input filename in PGEN mode
msg125302 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-04 11:02
err_clear() should set err->filename to NULL.
msg125409 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-05 04:26
Version 3 of the patch, which also fixes #9319.
msg130937 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-15 01:02
@Benjamin: You told me that you don't want two versions of pgen, but I don't remember why. As my work on #3080 is mostly done, I now plan to patch the Python parser to store the filename as Unicode. So could you please review the patch attached to this issue?
msg132990 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-04 23:48
New changeset 6e9dc970ac0e by Victor Stinner in branch 'default':
Issue #10785: Store the filename as Unicode in the Python parser.
http://hg.python.org/cpython/rev/6e9dc970ac0e
History
Date User Action Args
2022-04-11 14:57:10  admin  set  github: 54994
2011-04-04 23:56:24  vstinner  set  status: open -> closed; resolution: fixed
2011-04-04 23:48:20  python-dev  set  nosy: + python-dev; messages: + msg132990
2011-03-15 01:02:57  vstinner  set  nosy: belopolsky, vstinner, benjamin.peterson; messages: + msg130937
2011-01-06 13:03:32  pitrou  set  nosy: + benjamin.peterson
2011-01-05 04:26:52  vstinner  set  files: - parser_filename_obj-2.patch; nosy: belopolsky, vstinner
2011-01-05 04:26:50  vstinner  set  files: - parser_filename_obj.patch; nosy: belopolsky, vstinner
2011-01-05 04:26:45  vstinner  set  files: + parser_filename_obj-3.patch; nosy: belopolsky, vstinner; messages: + msg125409
2011-01-04 11:02:42  vstinner  set  nosy: belopolsky, vstinner; messages: + msg125302; versions: - Python 3.2
2010-12-28 23:32:43  vstinner  set  files: + parser_filename_obj-2.patch; nosy: belopolsky, vstinner; messages: + msg124828
2010-12-28 23:16:39  vstinner  set  nosy: belopolsky, vstinner; messages: + msg124827
2010-12-28 23:15:11  vstinner  set  nosy: belopolsky, vstinner; messages: + msg124826
2010-12-28 22:14:19  belopolsky  set  nosy: + belopolsky; messages: + msg124823
2010-12-28 02:50:16  vstinner  set  files: + parser_filename_obj.patch
2010-12-28 02:49:34  vstinner  set  files: - parse_filename_obj.patch
2010-12-28 02:40:20  vstinner  create