mbcs encoding ignores errors #39624
Comments
The following snippet:
>>> u'@test-\u5171'.encode("mbcs", "strict")
'@test-?'
Should raise a UnicodeError. The errors param is Attaching a test case, and the start of a patch. The
Comments/guidance appreciated. |
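The expectation in this report can be written down as a small check. The sketch below is not the attached test case: the helper name `strict_encode_raises` is hypothetical, and it is written for Python 3 rather than the Python 2 session quoted above. Because mbcs exists only on Windows and its result depends on the active ANSI code page, the mbcs call is guarded, and the ascii codec is used to show what strict mode is expected to do.

```python
import sys


def strict_encode_raises(text, encoding):
    """Return True if encoding `text` in strict mode raises UnicodeEncodeError."""
    try:
        text.encode(encoding, "strict")
    except UnicodeEncodeError:
        return True
    return False


# ascii cannot represent U+5171, so strict mode must raise here.
ascii_raises = strict_encode_raises("@test-\u5171", "ascii")

if sys.platform == "win32":
    # Only meaningful on Windows. The outcome depends on the ANSI code page:
    # U+5171 may be encodable on, for example, a Chinese or Japanese code page.
    print("mbcs strict raises:", strict_encode_raises("@test-\u5171", "mbcs"))
```

On a code page that cannot represent the character, a `False` result for mbcs would correspond to the behavior reported here.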
Logged In: YES Attaching a patch. This patch also attempts to handle As I mentioned, patch has a few issues |
Logged In: YES No idea why this was assigned to me - unicode is certainly |
Logged In: YES The conventional semantics of "ignore" would be "remove the You could try to get more detailed error indication by In the .encode case, you could try using \0 as the default What is the meaning of the WC_DEFAULTCHAR flag, in I'm somewhat concerned with backwards compatibility, since |
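For reference, the conventional handler semantics mentioned in this comment can be seen with a codec that already honors the errors argument. This is only an illustration using the ascii codec on Python 3, not a statement about what mbcs does or should do.

```python
text = "@test-\u5171"

# "ignore" drops the character that cannot be encoded.
ignored = text.encode("ascii", "ignore")

# "replace" substitutes a replacement marker ("?") for it.
replaced = text.encode("ascii", "replace")

print(ignored)   # b'@test-'
print(replaced)  # b'@test-?'
```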
Is this behavior still present? If so, is it still interesting to change it? |
It is still present, but I'm not sure what problems can be seen due to |
I patched py3k with mbcs_errors.patch (only encode_mbcs, not the decoder function) and most tests pass: I opened bpo-8784 for the test_tarfile failure. I don't think that it's a problem that mbcs only supports a few error handlers, e.g. 'strict', 'replace' and 'errors' (but not 'ignore' nor 'surrogateescape'). mbcs should be avoided anyway :-) It is kept for backward compatibility (with Python 2). Python 3 tries to avoid it by using the Unicode functions of the Windows API. I don't know exactly where mbcs is still used in Python 3. If mbcs becomes more strict and raises new errors, I would say that the problem comes from the program, not from the encoding, and the program should be fixed (especially if the "program" is the Python standard library). About backward compatibility with Python < 3.2: I don't know exactly if this change would be a problem or not. I bet that few people use (directly or indirectly) mbcs with Python 3.1 (on Windows), and few people (or nobody) would notice this change. And as I wrote, if someone notices a problem: the problem should be fixed in the function using mbcs, not in the codec. |
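One way to read "the problem should be fixed in the function using mbcs" is that the caller handles the strict-mode error itself. The sketch below is an assumption about what such a caller might look like, not code from the patch: `encode_with_fallback` is a hypothetical helper, and the UTF-8 fallback is chosen only for illustration.

```python
def encode_with_fallback(text, preferred, fallback="utf-8"):
    """Encode strictly with `preferred`; on failure, use `fallback` instead.

    Returns the encoded bytes together with the name of the codec used,
    so the caller knows which encoding the data ended up in.
    """
    try:
        return text.encode(preferred, "strict"), preferred
    except UnicodeEncodeError:
        return text.encode(fallback, "strict"), fallback


# ascii stands in for a code page that cannot represent U+5171.
data, used = encode_with_fallback("@test-\u5171", "ascii")
print(used, data)
```

A strict codec makes this kind of decision visible to the caller, whereas silently emitting "?" hides the data loss.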
Since this change breaks backward compatibility, it's a very bad idea to change mbcs codec in Python 2.7: remove this version from this issue. |
Updated version of the patch for py3k:
The whole test suite passes with this patch. |
I worked again on the patch. I opened new issues to prepare the new mbcs codec:
bpo-8967 can be used to get the translated message of a mbcs encode error. PyErr_GetWindowsMessage() returns a PyUnicodeObject, whereas make_translate_exception() and PyUnicodeTranslateError_SetReason() expect a "char*". Another patch is required: translate_reason_unicode.patch (attached to this issue, not tested). But I don't think that the message is very important for now :-) bpo-8784 (tarfile/Windows: Don't use mbcs as the default encoding) is still open. |
New version of the patch:
The patch requires bpo-8969 patch (use mbcs in strict mode to encode/decode filenames). |
Tim: are you interested in testing this patch? |
Updated the patch (I committed the patch on the tarfile module): version 3. |
Patch version 4:
|
I committed the last patch to py3k: r82037. Let's see how the buildbots react :-) |
I'm unlikely to get to it soon. If there's no urgency I can On 12/06/2010 01:02, STINNER Victor wrote:
|
Close this issue: nothing special on the buildbots. |