Message 83700 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	LambertDW, amaury.forgeotdarc, ocean-city, vstinner
Date	2009-03-17.21:40:24
SpamBayes Score	2.2604474e-11
Marked as misclassified	No
Message-id	<1237326030.9.0.435545192021.issue2382@psf.upfronthosting.co.za>
In-reply-to

Content

Proof of concept of a patch fixing this issue:
 - parse_syntax_error() reads the text line into a PyUnicodeObject* 
instead of a "const char**"
 - add utf8_to_unicode_offset(), which converts a byte offset into a 
number of characters. The Python version would be something like:

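   def utf8_to_unicode_offset(text, byte_offset):
       # byte_offset counts bytes in the UTF-8 encoding of the line
       utf8 = text.encode("utf-8")
       # keep the bytes before the offset (assumed to be on a character boundary)
       utf8 = utf8[:byte_offset]
       # decode them back: the number of characters is the unicode offset
       text = str(utf8, "utf-8")
       return len(text)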

 - reuse adjust_offset() from 
py3k_adjust_cursor_at_syntax_error_v2.patch, but force the use of 
wcswidth(), because configure does not define HAVE_WCSWIDTH
 - print_error_text() works on Unicode characters, not on bytes!
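The steps above (byte offset to character offset, then character offset to display column) can be sketched in pure Python. This is a hedged illustration, not the patch's C code: display_width() and format_error_line() are hypothetical names, byte_offset is assumed to be the number of bytes preceding the error position, and display_width() only approximates wcswidth() with unicodedata.east_asian_width() (wide/fullwidth characters take two columns, combining characters none), so it may disagree with the C library for some characters.

```python
import unicodedata


def utf8_to_unicode_offset(text, byte_offset):
    """Convert a byte offset in the UTF-8 encoding of text into a character offset.

    Raises UnicodeDecodeError if byte_offset falls inside a multibyte character.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


def display_width(text):
    """Approximate wcswidth(): number of terminal columns needed to show text.

    Assumption: East Asian Wide/Fullwidth characters take two columns,
    combining characters take none, everything else takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def format_error_line(text, byte_offset):
    """Return the source line followed by a caret line aligned on the error."""
    char_offset = utf8_to_unicode_offset(text, byte_offset)
    # The caret column is the display width of everything before the error,
    # not the byte offset and not the character offset.
    column = display_width(text[:char_offset])
    return text + "\n" + " " * column + "^"
```

For example, in text = "s = '日本' $" the "$" starts at byte offset 13 but at character offset 9, and the two wide characters push the caret to column 11.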

The patch should be refactored:
 - move adjust_offset(), utf8_to_unicode_offset() and utf8_len() into 
unicodeobject.c. A new "width()" method could be added to the 
unicode type; it could also be used to fix the center(), ljust() and 
rjust() unicode methods (see issue #3446).
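To illustrate the width() idea, here is a hedged pure-Python sketch of centering by display width instead of by character count. str has no width() method; display_width() and center_by_width() are hypothetical helpers, and the width rule is the same East Asian Width approximation of wcswidth(), not necessarily what the patch would implement.

```python
import unicodedata


def display_width(text):
    """Approximate terminal width: wide/fullwidth = 2, combining = 0, else 1."""
    return sum(
        0 if unicodedata.combining(ch)
        else 2 if unicodedata.east_asian_width(ch) in ("W", "F")
        else 1
        for ch in text
    )


def center_by_width(text, width, fillchar=" "):
    """Center text in a field of `width` terminal columns (not characters)."""
    padding = max(width - display_width(text), 0)
    left = padding // 2
    # In this sketch an odd extra column goes on the right.
    right = padding - left
    return fillchar * left + text + fillchar * right
```

"日本".center(6) adds four spaces because it counts characters, so the result occupies eight columns, whereas center_by_width("日本", 6) returns " 日本 ", which occupies six. ljust() and rjust() would follow the same pattern, putting all the padding on one side.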

History
Date	User	Action	Args
2009-03-17 21:40:31	vstinner	set	recipients: + vstinner, amaury.forgeotdarc, ocean-city, LambertDW
2009-03-17 21:40:30	vstinner	set	messageid: <1237326030.9.0.435545192021.issue2382@psf.upfronthosting.co.za>
2009-03-17 21:40:27	vstinner	link	issue2382 messages
2009-03-17 21:40:26	vstinner	create