classification
Title: Cannot distinguish b"str" from "str" in ast module.
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, kayhayen
Priority: normal Keywords:

Created on 2010-08-26 00:47 by kayhayen, last changed 2010-08-27 13:18 by benjamin.peterson. This issue is now closed.

Messages (8)
msg114950 - Author: Kay Hayen (kayhayen) Date: 2010-08-26 00:47
When unicode_literals has been made the default, there is no way to tell from the AST whether a string literal was written as a non-unicode b"..." literal. Please see:

>>> import ast
>>> ast.dump( ast.parse( """c = "d" """ ) )
"Module(body=[Assign(targets=[Name(id='c', ctx=Store())], value=Str(s='d'))])"
>>> from __future__ import unicode_literals
>>> ast.dump( ast.parse( """c = "d" """ ) )
"Module(body=[Assign(targets=[Name(id='c', ctx=Store())], value=Str(s='d'))])"
>>> ast.dump( ast.parse( """c = b"d" """ ) )
"Module(body=[Assign(targets=[Name(id='c', ctx=Store())], value=Str(s='d'))])"
>>> ast.dump( ast.parse( """c = u"d" """ ) )
"Module(body=[Assign(targets=[Name(id='c', ctx=Store())], value=Str(s=u'd'))])"

I have checked the node's fields, but didn't find anything there either. Without an indication of the literal's prefix on the Str node, its intended type cannot be detected. In both cases the value is a "str", which may or may not have to be converted to a unicode value.
msg114952 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-08-26 01:52
You'll have to look at the compile flags or search the future flags.
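A minimal sketch of what "search the future flags" could look like when only the AST is available. The helper name has_unicode_literals is hypothetical (it is not part of the ast module), and the sketch only inspects the top-level statements of a module:

```python
import ast


def has_unicode_literals(tree):
    """Return True if the module imports unicode_literals from __future__."""
    for stmt in tree.body:
        # A future import is an ImportFrom node whose module is "__future__".
        if isinstance(stmt, ast.ImportFrom) and stmt.module == "__future__":
            if any(alias.name == "unicode_literals" for alias in stmt.names):
                return True
    return False


with_flag = ast.parse("from __future__ import unicode_literals\nc = 'd'\n")
without_flag = ast.parse("c = 'd'\n")
print(has_unicode_literals(with_flag), has_unicode_literals(without_flag))
```

This only answers the module-wide default; it says nothing about an individual literal that was written with a b prefix.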
msg114957 - Author: Kay Hayen (kayhayen) Date: 2010-08-26 06:38
You didn't understand. Please tell me how to decide whether this is a unicode literal or a str (2.x) literal:

value=Str(s='d')

It's just not possible. When I have found a "from __future__ import unicode_literals" earlier in the code, it means I should convert "value.s" to unicode, fine. But the syntax allows b"d" to make an exception for individual strings. Your test "test_compile.py" contains such a case.

May I therefore ask you not to close this bug, since your proposal is not feasible? I really need ast.parse() to return different nodes for the string literals "d" and b"d"; otherwise I cannot detect the non-unicode literals when unicode literals are the default.
msg114970 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-08-26 12:57
I see that it's a problem, but there's nothing we can do about it now, so you'll have to determine whether it was unicode literals or not based on compile flags.
msg115015 - Author: Kay Hayen (kayhayen) Date: 2010-08-26 18:08
Hello Benjamin,

thank you for the response. What do you mean by there being "nothing we can do about it"? Is it not possible to add another field indicating the prefix given to a literal?

BTW: I believe raw strings are also no longer recognizable. Fortunately I do not need to do that, but look here:

>>> ast.dump( ast.parse( r"""a = r'\n'""" ) )
"Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=Str(s='\\\\n'))])"

Currently the only workaround for not being able to tell whether there was a b"" in the source code is to open the file and check myself. And getting the actual raw string is not feasible at all.

So why not at least have an "ast" module that allows one to decide without ambiguity what the user wrote? I agree that raw strings can be resolved before the AST and that it doesn't matter much. But I don't think it's acceptable that CPython can execute the code correctly while the AST nodes give no way to tell.

I thought they share code?

Yours,
Kay
msg115018 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-08-26 18:20
2010/8/26 Kay Hayen <report@bugs.python.org>:
>
> Kay Hayen <kayhayen@gmx.de> added the comment:
>
> Hello Benjamin,
>
> thank you for the response. What do you mean by there being "nothing we can do about it"? Is it not possible to add another field indicating the prefix given to a literal?

We can't do anything about it because 2.7 has been released and a "new
flag" would be a disallowed new feature.

>
> BTW: I believe raw strings are also no longer recognizable. Fortunately I do not need to do that, but look here:

If you need to recognize those (or unicode literals for that matter),
you could look at the parser module's raw trees.
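The parser module itself is not shown here. As a substitute technique, assuming Python 3 and the standard tokenize module, the token stream keeps the source text of each literal, prefix included. The helper name string_tokens is hypothetical:

```python
import io
import tokenize


def string_tokens(source):
    """Return the source text of every string literal token, prefix included."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]


# The raw prefix and the bytes prefix both survive in the token text.
for text in string_tokens("a = r'\\n'\nb = b\"d\"\n"):
    print(text)
```

Each token also carries its start position, so it could be matched against the lineno and col_offset of an AST node.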
msg115055 - Author: Kay Hayen (kayhayen) Date: 2010-08-27 07:04
This is to inform you that I worked around the bug by reading the source file in question and checking the indicated position. This is currently the only way to decide whether a literal should be unicode or str when unicode_literals is imported from __future__.

It goes like this:

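        # _sources is assumed to map each filename to its list of source lines.
        # The character at the node's position is the first character of the literal.
        kind = _sources[ filename ][ node.lineno - 1][ node.col_offset ]

        if kind != 'b':
            # No b prefix: unicode_literals applies, so convert to unicode.
            value = unicode( node.s )
        else:
            # Explicit b"..." literal: keep the byte string as is.
            value = node.s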
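
A slightly more defensive variant of the same idea, as a sketch under stated assumptions: it is written for Python 3.8 or later (where string literals are ast.Constant nodes and col_offset points at the start of the literal, prefix included), it treats col_offset as a character index (which holds for ASCII-only lines), and the helper names string_prefix and literal_prefixes are hypothetical. Instead of testing only the first character, it collects everything before the opening quote, so prefixes such as B or br are handled as well:

```python
import ast

QUOTES = "\"'"


def string_prefix(source_lines, node):
    """Return the lowercased prefix ('', 'b', 'r', 'br', ...) of a literal node."""
    rest = source_lines[node.lineno - 1][node.col_offset:]
    prefix = []
    for ch in rest:
        if ch in QUOTES:
            break  # reached the opening quote, the prefix is complete
        prefix.append(ch)
    return "".join(prefix).lower()


def literal_prefixes(source):
    """Return the prefix of every str/bytes literal, in source order."""
    lines = source.splitlines()
    nodes = [
        n for n in ast.walk(ast.parse(source))
        if isinstance(n, ast.Constant) and isinstance(n.value, (str, bytes))
    ]
    nodes.sort(key=lambda n: (n.lineno, n.col_offset))
    return [string_prefix(lines, n) for n in nodes]


print(literal_prefixes('a = "d"\nb = b"d"\nc = r"\\n"\n'))
```

A literal counts as a byte string when "b" is in the returned prefix, and as a raw string when "r" is.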


I don't see how removing the ambiguous representation of what I presume is a wanted language construct can be considered a new feature. But that is your decision to make.

Best regards,
Kay Hayen
msg115073 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-08-27 13:18
2010/8/27 Kay Hayen <report@bugs.python.org>:
>
> Kay Hayen <kayhayen@gmx.de> added the comment:
>
> This is to inform you that I worked around the bug by reading the source file in question and checking the indicated position. This is currently the only way to decide whether a literal should be unicode or str when unicode_literals is imported from __future__.

I see. I'm not really sure what your problem is, though, because if
unicode_literals is in effect, the AST will have decoded the literal
into unicode.

"Module(body=[ImportFrom(module='__future__',
names=[alias(name='unicode_literals', asname=None)], level=0),
Expr(value=Tuple(elts=[Str(s=u's'), Str(s=u's')], ctx=Load()))])"
History
Date User Action Args
2010-08-27 13:18:12 benjamin.peterson set messages: + msg115073
2010-08-27 07:04:11 kayhayen set messages: + msg115055
2010-08-26 18:20:50 benjamin.peterson set messages: + msg115018
2010-08-26 18:08:18 kayhayen set messages: + msg115015
2010-08-26 12:57:59 benjamin.peterson set messages: + msg114970
2010-08-26 06:38:47 kayhayen set messages: + msg114957
2010-08-26 01:52:07 benjamin.peterson set status: open -> closed; nosy: + benjamin.peterson; messages: + msg114952; resolution: wont fix
2010-08-26 00:47:52 kayhayen create