Title: ast.parse gives wrong position for some Names when non-ascii characters occur before
Messages (5)
msg198406 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 18:28
ast.parse gives col_offset=4 for the Name x when given the program "a + x", but col_offset=5 for x when the Name a is replaced with a-umlaut (I can't write that character here, because it seems that the issue system doesn't handle non-ASCII characters either).

See the attached Python shell transcript for explanation (it's in UTF-8).
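
For readers without the attachment, the report can be reproduced with a minimal sketch along these lines. The helper name name_col_offset is hypothetical and used only for illustration, and the a-umlaut is written as the escape "\u00e4" so the snippet stays pure ASCII:

```python
import ast


def name_col_offset(source, name):
    """Return col_offset of the first ast.Name node whose id matches `name`.

    Hypothetical helper for illustration; not part of the ast module.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            return node.col_offset
    return None


# Same program twice; only the first identifier differs.
ascii_offset = name_col_offset("a + x", "x")
umlaut_offset = name_col_offset("\u00e4 + x", "x")  # a-umlaut + x

print(ascii_offset, umlaut_offset)  # 4 5
```

The x is the fifth character in both programs, yet the reported offset differs by one once the two-byte a-umlaut precedes it.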
msg198407 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-25 18:39
This looks similar to issue2382 and issue10382.
msg198408 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 18:43
Serhiy, it's similar in that it has to do with encodings, but I think it's caused by a different bug.

I suspect it has something to do with the fact that the tokenizer gives positions as character offsets, while ast gives them as offsets in UTF-8 bytes.
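
A short sketch of that comparison, assuming the tokenizer in question is the standard tokenize module driven through tokenize.generate_tokens (the names tok_col and ast_col are illustrative only):

```python
import ast
import io
import tokenize

source = "\u00e4 + x\n"  # a-umlaut + x

# Column of the NAME token "x" as reported by the tokenize module.
tok_col = next(
    tok.start[1]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NAME and tok.string == "x"
)

# col_offset of the Name node "x" as reported by ast.
ast_col = next(
    node.col_offset
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.Name) and node.id == "x"
)

print(tok_col, ast_col)  # 4 5
```

The two numbers agree for pure-ASCII source and diverge as soon as a multi-byte character appears earlier on the same line.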
msg198409 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 18:47
I should explain more -- the string containing the program is read in correctly, the trouble occurs during parse.
msg198410 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2013-09-25 19:02
> ast [gives positions as] offsets of UTF-8 bytes

Oops, I'm sorry, I realized only now that this is the explanation of this behaviour. So it seems to be a feature after all (albeit a very weird one), not a bug.
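
For anyone who needs character columns from ast, one possible workaround is to convert the byte offset using the source line. This is only a sketch: byte_to_char_offset is a hypothetical helper, not something the ast module provides, and it assumes the offset falls on a character boundary (which should hold for offsets reported by ast, since they point at token starts).

```python
def byte_to_char_offset(line, byte_offset):
    """Convert a UTF-8 byte offset within `line` to a character offset.

    Hypothetical helper; assumes byte_offset lands on a character
    boundary, otherwise the decode step raises UnicodeDecodeError.
    """
    prefix = line.encode("utf-8")[:byte_offset]
    return len(prefix.decode("utf-8"))


line = "\u00e4 + x"  # a-umlaut + x
char_col = byte_to_char_offset(line, 5)  # 5 is the col_offset ast reports for x
print(char_col)  # 4
```

For pure-ASCII lines the conversion is a no-op, so it can be applied unconditionally.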
