Message 138484 - Python tracker



Author	vstinner
Recipients	amaury.forgeotdarc, loewis, ocean-city, vstinner
Date	2011-06-17.00:35:33
SpamBayes Score	1.9934339e-07
Marked as misclassified	No
Message-id	<1308270934.11601.13.camel@marge>
In-reply-to	<1308268297.4.0.0514915174467.issue12281@psf.upfronthosting.co.za>

Content

> What is the use of these code_page_encode() functions?

I wrote them to be able to write tests.

We could maybe use them to implement the Python code page codecs through
a custom codec search function (registered with codecs.register()): see
msg138246. Windows codecs seem to be less reliable and less portable than
the Python builtin codecs: they behave differently depending on the
Windows version. Windows codecs may be faster; I should write and run a
benchmark.
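
A minimal sketch of what such a search function could look like, assuming a
Python build that exposes codecs.code_page_encode() and
codecs.code_page_decode() (Windows only). The "wincp" name prefix and the
helper names are hypothetical, chosen here so the names do not clash with the
builtin cpXXXX codecs; this is not the code from the patch.

```python
import codecs
import re

# Hypothetical naming scheme: "wincp1252" -> Windows code page 1252.
_CP_NAME = re.compile(r"^wincp(\d+)$")


def parse_code_page(name):
    """Return the code page number encoded in name, or None."""
    match = _CP_NAME.match(name)
    return int(match.group(1)) if match else None


def search_code_page(name):
    """Codec search function backed by the Windows code page functions."""
    code_page = parse_code_page(name)
    # Decline the lookup on platforms without the Windows-only functions.
    if code_page is None or not hasattr(codecs, "code_page_encode"):
        return None

    def encode(text, errors="strict"):
        return codecs.code_page_encode(code_page, text, errors)

    def decode(data, errors="strict"):
        # final=True: treat the input as complete (stateless decoding).
        return codecs.code_page_decode(code_page, data, errors, True)

    return codecs.CodecInfo(encode, decode, name=name)


codecs.register(search_code_page)
```

On Windows, "abc".encode("wincp1252") would then go through the Windows
codec, which makes it possible to compare it against the builtin cp1252 codec
in tests or in a benchmark.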

My main concern is to fix error handling of the Python mbcs codec.

--

I am also trying to refactor the code in posixmodule.c: when a function
has two implementations (bytes and Unicode) only for Windows, I would
like to remove the bytes implementation. The idea is to decode filenames
exactly as Windows does and reuse the Unicode implementation. I don't
know yet how Windows decodes bytes filenames (especially how it handles
undecodable bytes); I suppose that it uses MultiByteToWideChar with
cp=CP_ACP and flags=0.

We may patch os.fsdecode() to handle undecodable bytes like Windows
does. codecs.code_page_decode() would help with this specific idea,
except that my current patch doesn't allow specifying the flags
directly. The "replace" and "ignore" error handlers don't behave like
flags=0, at least not in some cases. codecs.code_page_decode() should
allow specifying an error handler *or* the flags (mutually exclusive
options).

Example:

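import codecs

def fsdecode(filename):
    if isinstance(filename, bytes):
        # flags=0 is the proposed parameter; the current patch only
        # accepts an error handler
        return codecs.code_page_decode(codecs.CP_ACP, filename, flags=0)
    elif isinstance(filename, str):
        return filename
    else:
        raise TypeError("expected bytes or str, got %s"
                        % type(filename).__name__)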
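
The "error handler or flags, but not both" rule could be checked up front,
before any decoding happens. The helper below is only a sketch of that
argument check: the name decode_options() and its return convention are
assumptions made for illustration, not part of the patch or of the codecs
module.

```python
def decode_options(errors=None, flags=None):
    """Validate the mutually exclusive 'errors' and 'flags' options.

    Hypothetical helper: returns ("flags", value) when raw
    MultiByteToWideChar flags were requested, otherwise
    ("errors", handler_name), defaulting to "strict".
    """
    if errors is not None and flags is not None:
        raise TypeError("errors and flags are mutually exclusive")
    if flags is not None:
        return ("flags", flags)
    return ("errors", errors if errors is not None else "strict")
```

With this check, decode_options(flags=0) selects the raw-flags path that
fsdecode() wants, while decode_options(errors="replace", flags=0) is rejected
with a TypeError instead of silently preferring one option.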

History
Date	User	Action	Args
2011-06-17 00:35:35	vstinner	set	recipients: + vstinner, loewis, amaury.forgeotdarc, ocean-city
2011-06-17 00:35:34	vstinner	link	issue12281 messages
2011-06-17 00:35:33	vstinner	create