Author ocean-city
Recipients ocean-city
Date 2008-03-18.05:22:29
Message-id <1205817752.04.0.892564898323.issue2382@psf.upfronthosting.co.za>
In-reply-to
Content
Hello. I found another problem related to issue2301.
The SyntaxError cursor "^" is shifted when there are multibyte
characters in the line (before the "^").

I think this is because err->text is stored as UTF-8,
which requires 3 bytes for each of these multibyte characters,
while cp932 (my console encoding) requires only 2 bytes for each.

So "^" is shifted 5 bytes to the right because there are 5 multibyte characters.

C:\Documents and Settings\WhiteRabbit>py3k x.py
push any key....

  File "x.py", line 3
    print "あいうえお"
                          ^
SyntaxError: invalid syntax
[22567 refs]
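
The 5-byte shift can be checked with a short Python 3 sketch. It only compares the encoded lengths of the string from the traceback above; it assumes the offset is counted in UTF-8 bytes while the console lays the line out in cp932, which is the guess made in this message rather than a confirmed diagnosis.

```python
# Sample text taken from the traceback above.
text = "あいうえお"

utf8_len = len(text.encode("utf-8"))    # 3 bytes per character here
cp932_len = len(text.encode("cp932"))   # 2 bytes per character here

# One extra byte per character is how far the caret would drift right.
drift = utf8_len - cp932_len
print(utf8_len, cp932_len, drift)  # 15 10 5
```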

Sorry, I didn't know what PyTokenizer_RestoreEncoding was really doing.
That function adjusted err_ret->offset to account for this encoding conversion,
so Python 2.5 can print the cursor in the right place. (Of course, if the source
encoding is not compatible with the console encoding, a broken string is printed,
but the cursor is still in the right place.)

C:\Documents and Settings\WhiteRabbit>py a.py
  File "a.py", line 2
    x "、「、、、ヲ、ィ、ェ"
                 ^
SyntaxError: invalid syntax
[8728 refs]
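
The kind of offset adjustment described above might look roughly like the following Python 3 sketch. It is not the C implementation of PyTokenizer_RestoreEncoding; the helper name console_offset and its parameters are hypothetical, and it assumes the offset is a 0-based byte count into the UTF-8 line.

```python
def console_offset(line_utf8: bytes, utf8_offset: int, console_encoding: str) -> int:
    """Map a byte offset in a UTF-8 line to a byte offset in console_encoding.

    Hypothetical helper for illustration only.
    """
    # Decode only the part of the line that precedes the error position.
    prefix = line_utf8[:utf8_offset].decode("utf-8", errors="replace")
    # The length of that prefix in the console encoding is the adjusted offset.
    return len(prefix.encode(console_encoding, errors="replace"))


line = 'print "あいうえお"'.encode("utf-8")
adjusted = console_offset(line, len(line), "cp932")
print(len(line), adjusted)  # 23 18
```

With the line from x.py, the end of the line moves from byte 23 in UTF-8 to byte 18 in cp932, which matches the 5-byte shift seen in the py3k output.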

I tried to fix this problem, but I'm not sure how to fix it.