classification
Title: ast.parse gives wrong position for some Names when non-ascii characters occur before
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Aivar.Annamaa, serhiy.storchaka
Priority: normal Keywords:

Created on 2013-09-25 18:28 by Aivar.Annamaa, last changed 2013-09-28 21:25 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
bug.txt Aivar.Annamaa, 2013-09-25 18:28
Messages (5)
msg198406 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 18:28
ast.parse gives col_offset=4 for the Name x in the program "a + x", but col_offset=5 for x when the Name a is replaced with a-umlaut (I can't write that character here, because it seems that the issue system doesn't handle non-ASCII characters either).

See the attached Python shell transcript for explanation (it's in UTF-8).
msg198407 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-25 18:39
This looks similar to issue2382 and issue10382.
msg198408 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 18:43
Serhiy, it's similar in that it has to do with encodings, but I think it's caused by a different bug.

I suspect it has something to do with the fact that the tokenizer gives positions as character offsets, while ast gives them as UTF-8 byte offsets.
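For readers who want to check this suspicion themselves, here is a minimal sketch that compares the two sets of positions. The helper names `ast_name_columns` and `token_name_columns` are made up for illustration and are not part of either module; the a-umlaut is written as the escape `\u00e4` so the snippet stays ASCII-only.

```python
import ast
import io
import tokenize


def ast_name_columns(source):
    """Map each identifier to the col_offset ast reports for its Name node."""
    return {
        node.id: node.col_offset
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name)
    }


def token_name_columns(source):
    """Map each identifier to the column tokenize reports for its NAME token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {
        tok.string: tok.start[1]
        for tok in tokens
        if tok.type == tokenize.NAME
    }


ascii_source = "a + x"
umlaut_source = "\u00e4 + x"  # a-umlaut in place of "a"

for source in (ascii_source, umlaut_source):
    # ascii() keeps the printed source readable on terminals without UTF-8.
    print(ascii(source), ast_name_columns(source), token_name_columns(source))
```

On the ASCII program both helpers should agree on the column of `x`. On the a-umlaut program the ast column should be one larger than the tokenize column, because a-umlaut occupies two bytes in UTF-8.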
msg198409 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 18:47
I should explain more -- the string containing the program is read in correctly, the trouble occurs during parse.
msg198410 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 19:02
> ast [gives positions as] offsets of UTF-8 bytes

Oops, I'm sorry, I realized only now that this is the explanation of this behaviour. So, after all it seems to be a feature (albeit very weird), not a bug.
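Since col_offset counts UTF-8 bytes, a tool that needs character positions has to convert. The following is a rough sketch of one way to do that, assuming the source line is available as a str; `char_col` is a hypothetical helper name, not something the ast module provides.

```python
import ast


def char_col(source_line, byte_col):
    """Convert a UTF-8 byte offset within one line to a character offset."""
    # Encode the line, cut at the byte offset, and count the characters
    # in the decoded prefix. Assumes byte_col falls on a character boundary,
    # which should hold for offsets reported by ast.
    return len(source_line.encode("utf-8")[:byte_col].decode("utf-8"))


source = "\u00e4 + x"  # a-umlaut in place of "a"
x_node = ast.parse(source, mode="eval").body.right  # the Name node for x

byte_offset = x_node.col_offset
char_offset = char_col(source, byte_offset)
print(byte_offset, char_offset)
```

For multi-line programs the same conversion would be applied to the line selected by the node's lineno, since col_offset is relative to the start of that line.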
History
Date User Action Args
2013-09-28 21:25:49 terry.reedy set status: open -> closed
stage: resolved
2013-09-25 19:02:32 Aivar.Annamaa set resolution: not a bug
messages: + msg198410
2013-09-25 18:47:03 Aivar.Annamaa set messages: + msg198409
2013-09-25 18:43:59 Aivar.Annamaa set messages: + msg198408
2013-09-25 18:39:03 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg198407
2013-09-25 18:28:29 Aivar.Annamaa create