Message83700
Proof-of-concept patch fixing this issue:
- parse_syntax_error() reads the text line into a PyUnicodeObject*
instead of a "const char**"
- create utf8_to_unicode_offset(): convert byte offset to a number of
characters. The Python version should be something like:
    def utf8_to_unicode_offset(text, byte_offset):
        utf8 = text.encode("utf-8")
        utf8 = utf8[:byte_offset]
        text = str(utf8, "utf-8")
        return len(text)
- reuse adjust_offset() from
py3k_adjust_cursor_at_syntax_error_v2.patch, but force the use of
wcswidth() because HAVE_WCSWIDTH is not defined by configure
- print_error_text() works on unicode characters and not on bytes!
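For reference, the byte-offset conversion described in the list above can be tried as a standalone Python sketch. It follows the same steps as the Python version given in the message, with one assumption that the message does not make: a byte offset that falls in the middle of a multi-byte character is tolerated by dropping the partial character (errors="ignore") instead of raising UnicodeDecodeError. The C function in the patch may behave differently.

```python
def utf8_to_unicode_offset(text, byte_offset):
    """Convert a byte offset into the UTF-8 encoding of `text`
    into a number of characters."""
    # Keep only the bytes located before the offset.
    utf8 = text.encode("utf-8")[:byte_offset]
    # Assumption (not from the patch): if the offset cuts a multi-byte
    # sequence in half, the incomplete trailing bytes are ignored.
    return len(utf8.decode("utf-8", errors="ignore"))


# "é" takes 2 bytes in UTF-8, so byte offset 3 is character offset 2.
print(utf8_to_unicode_offset("héllo", 3))
```

This is why the cursor under a syntax error is misplaced when the line contains non-ASCII text: the parser reports a byte offset, while the line is displayed in characters.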
The patch should be refactored:
- move adjust_offset(), utf8_to_unicode_offset(), utf8_len() into
unicodeobject.c. You might create a new method "width()" for the
unicode type. This method can be used to fix center(), ljust() and
rjust() unicode methods (see issue #3446).
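To illustrate what a "width()" method could compute, here is a rough Python sketch. It is only an approximation of wcswidth() and not the code from the patch: it assumes that wide and fullwidth East Asian characters take two columns, that combining characters take none, and that everything else takes one. The names width() and ljust_by_width() are hypothetical helpers used for the example.

```python
import unicodedata


def width(text):
    """Approximate the number of terminal columns used by `text`."""
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2  # wide or fullwidth character
        else:
            total += 1
    return total


def ljust_by_width(text, columns, fillchar=" "):
    """Like str.ljust(), but pads according to display width."""
    return text + fillchar * max(0, columns - width(text))


print(repr("日本".ljust(6)))           # pads by character count
print(repr(ljust_by_width("日本", 6)))  # pads by display width
```

str.ljust() pads according to len(), so text containing wide characters ends up longer on screen than the requested number of columns; padding by width keeps the columns aligned.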
Date | User | Action | Args
2009-03-17 21:40:31 | vstinner | set | recipients: + vstinner, amaury.forgeotdarc, ocean-city, LambertDW
2009-03-17 21:40:30 | vstinner | set | messageid: <1237326030.9.0.435545192021.issue2382@psf.upfronthosting.co.za>
2009-03-17 21:40:27 | vstinner | link | issue2382 messages
2009-03-17 21:40:26 | vstinner | create |