classification
Title: Missing documentation for codecs.escape_decode
Type: Stage: patch review
Components: Documentation Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: asvetlov, carlbordum, docs@python, gregory.p.smith, mdartiailh, njs, paulehoffman, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-06-07 15:02 by mdartiailh, last changed 2019-07-14 14:55 by serhiy.storchaka.

Pull Requests
URL Status Linked
PR 14747 open carlbordum, 2019-07-13 14:19
Messages (9)
msg295342 - Author: Matthieu Dartiailh (mdartiailh) Date: 2017-06-07 15:02
codecs.escape_decode does not appear in the codecs documentation. As far as I know, this function is the only convenient way to process the escape sequences in a string literal (I found it here: https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python). It is most useful when implementing a parser for a language that extends Python's semantics while retaining Python's processing of strings (cf. https://github.com/MatthieuDartiailh/enaml).

Is there a reason this function is not documented?
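For context, here is a minimal sketch of what `codecs.escape_decode` does in CPython 3.x. Because the function is undocumented, its behavior here is observed rather than guaranteed and could change between releases. It takes bytes containing backslash escapes and returns a `(decoded_bytes, consumed_length)` tuple.

```python
import codecs

# Bytes as they would appear inside a Python bytes literal, with escapes still unprocessed.
raw = b"tab:\\t newline:\\n hex:\\x41 octal:\\101"

# escape_decode is undocumented; it returns (decoded bytes, number of input bytes consumed).
decoded, consumed = codecs.escape_decode(raw)

print(decoded)               # b'tab:\t newline:\n hex:A octal:A'
print(consumed == len(raw))  # True
```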
msg295344 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-06-07 15:22
This is an internal function kept for compatibility. It is used only for decoding pickle protocol 0 data created in Python 2. Look at the unicode_escape and raw_unicode_escape codecs for doing similar decoding to str in Python 3.
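The following is a short sketch of the suggested alternative, which uses the documented `unicode_escape` and `raw_unicode_escape` codecs. Both decode bytes to `str`. `unicode_escape` processes all backslash escapes, while `raw_unicode_escape` only processes `\uXXXX` and `\UXXXXXXXX`. The sample input is ASCII-only, which is the case these codecs handle cleanly.

```python
# Both codecs take bytes and return str.
escaped = b"caf\\u00e9\\tend\\n"

# unicode_escape interprets every backslash escape.
text = escaped.decode("unicode_escape")
print(repr(text))  # 'café\tend\n'

# raw_unicode_escape only interprets \u and \U sequences; \t and \n stay as backslash + letter.
raw_text = escaped.decode("raw_unicode_escape")
print(repr(raw_text))  # 'café\\tend\\n'
```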
msg295347 - Author: Matthieu Dartiailh (mdartiailh) Date: 2017-06-07 15:36
The issue is that unicode_escape does not properly handle strings that mix Unicode characters and escape sequences, because it assumes Latin-1-compatible characters only. For example, given the literal string r'Δ\nΔ', one cannot encode it using Latin-1, and encoding it using UTF-8 and then decoding with unicode_escape produces the wrong output: 'Î\x94\nÎ\x94'. However, codecs.escape_decode(r'Δ\nΔ'.encode('utf-8'))[0].decode('utf-8') gives the proper output. The Python parser handles this case internally, but I was unable to find where, and this is the closest solution I found. It may be possible to do this with error handlers, but that seems much more cumbersome.

Best regards

Matthieu
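The problem described above can be reproduced as follows. The first attempt shows the mojibake. The second is the `escape_decode` workaround from the message. The third is one way to use an error handler (`backslashreplace`) so that only documented codecs are involved. This is a sketch: the error-handler approach can misinterpret input where a literal backslash directly precedes a non-Latin-1 character, so it is not an exact replica of Python's own literal processing.

```python
import codecs

# A string containing a literal backslash-n between two non-Latin-1 characters.
source = r"Δ\nΔ"

# 1. Encoding to UTF-8 and decoding with unicode_escape treats each UTF-8 byte as Latin-1.
wrong = source.encode("utf-8").decode("unicode_escape")
print(repr(wrong))  # 'Î\x94\nÎ\x94'

# 2. Workaround from the message: process escapes at the bytes level, then decode UTF-8.
#    This relies on the undocumented codecs.escape_decode.
via_escape_decode = codecs.escape_decode(source.encode("utf-8"))[0].decode("utf-8")
print(repr(via_escape_decode))  # 'Δ\nΔ'

# 3. Documented codecs only: turn non-Latin-1 characters into \uXXXX escapes first,
#    then let unicode_escape process every escape sequence.
via_error_handler = source.encode("latin-1", "backslashreplace").decode("unicode_escape")
print(repr(via_error_handler))  # 'Δ\nΔ'
```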
msg327259 - Author: Paul Hoffman (paulehoffman) Date: 2018-10-06 21:48
Bumping this thread a bit. It appears that this "internal" function is being talked about out in the real world. I came across it in a recent blog post, saw that it wasn't in the official documentation, and went looking here.

I propose that it be documented, even if it feels like a bit of a kludge.
msg327268 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2018-10-07 07:58
-1
"Internal function" means you can use it at your own risk, but the function can be changed or even removed in any Python release.
I see no point in documenting and making it public.
msg339469 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2019-04-05 01:03
We can't change it or remove it; it is public by virtue of its name. We should document it.

Removing it or renaming it to be _private requires a PendingDeprecationWarning -> DeprecationWarning -> removal cycle. It is well known and used.

https://stackoverflow.com/questions/14820429/how-do-i-decodestring-escape-in-python3/23151714#23151714
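A removal cycle like the one described above usually starts by making the old name warn when it is used. The following is a minimal sketch with hypothetical names (`public_helper`, `_private_helper`). It is not how CPython actually handles `escape_decode`. It uses `warnings.warn` with `PendingDeprecationWarning`, and a later release would switch the category to `DeprecationWarning` before removing the alias.

```python
import warnings


def _private_helper(data):
    """Hypothetical renamed, private implementation."""
    return data.upper()


def public_helper(data):
    """Hypothetical public alias kept during the deprecation period."""
    # Stage 1 of the cycle: PendingDeprecationWarning. In a later release this
    # category would become DeprecationWarning, and after that the alias is removed.
    warnings.warn(
        "public_helper() is deprecated; use _private_helper() instead",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return _private_helper(data)


# Record warnings so the effect is visible (PendingDeprecationWarning is ignored by default).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = public_helper(b"abc")

print(result)                       # b'ABC'
print(caught[0].category.__name__)  # PendingDeprecationWarning
```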
msg347827 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2019-07-13 14:32
I disagree. We can change, rename or remove it, because it is not a public function and never was. But we cannot simply remove it while it is used in the pickle module, and there is no reason to change it, as it works well for its purpose.

If you want to make it public and maintain it, I suggest first discussing this on the Python-Ideas mailing list. You would need to show that the benefit of adding it outweighs the cost of maintaining it.
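The following sketch illustrates the pickle use mentioned above. Protocol 0 pickles written by Python 2 store `str` values with the `S` opcode as a quoted, escaped string, and Python 3's unpickler has to undo those escapes. The byte string below is a hand-written example of what Python 2's `pickle.dumps('a\nb')` is assumed to produce; it was not captured from a real Python 2 run. Passing `encoding='bytes'` keeps the result as `bytes` rather than decoding it to `str`.

```python
import pickle

# Protocol 0 pickle of the Python 2 str 'a\nb': S opcode, quoted escaped string,
# a PUT into memo slot 0, then STOP. (Hand-written example of Python 2 output.)
py2_pickle = b"S'a\\nb'\np0\n."

# encoding="bytes" returns Python 2 str objects as bytes; the \n escape has been processed.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")
print(as_bytes)  # b'a\nb'

# With the default encoding="ASCII", the same data is decoded to str.
as_text = pickle.loads(py2_pickle)
print(repr(as_text))  # 'a\nb'
```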
msg347919 - Author: Carl Bordum Hansen (carlbordum) Date: 2019-07-14 14:30
You have a point: the function is not in codecs.__all__. Still, judging from the Stack Overflow questions, this function does seem to be useful.
msg347922 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2019-07-14 14:55
Reading the Stack Overflow questions, I am not sure that this function would be useful for the author of the question. All we know is that he needs to remove b'\\000'. There are many ways to do that, and after using codecs.escape_decode() you would still need to remove b'\000'.
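To make the point concrete, here is a sketch with a made-up input, since the actual data from the Stack Overflow question is not known. Removing the four-byte sequence `b'\\000'` directly needs no escape processing at all. By contrast, `escape_decode` first turns it into a NUL byte, which then has to be removed anyway.

```python
import codecs

# Hypothetical input: the four bytes backslash, '0', '0', '0' embedded in other data.
data = b"abc\\000def"

# Direct approach: strip the literal escape sequence.
direct = data.replace(b"\\000", b"")
print(direct)  # b'abcdef'

# escape_decode turns b'\\000' into a single NUL byte (b'\x00') ...
decoded = codecs.escape_decode(data)[0]
print(decoded)  # b'abc\x00def'

# ... which still has to be removed in a second step.
print(decoded.replace(b"\x00", b""))  # b'abcdef'
```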

If you want to add a feature similar to the Python 2 "string-escape" codec to Python 3, it is better to provide it officially as a new codec, "bytes-escape" (functions like codecs.utf_16_le_decode() are internal). But we should discuss its behavior, taking into account the difference between string literals in Python 2 and bytes literals in Python 3. For example, how should non-escaped non-ASCII bytes be treated? They were acceptable in Python 2, but not in Python 3.
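The following is a rough sketch of what registering such a codec could look like, using `codecs.register` and `codecs.CodecInfo`. The codec name `bytes_escape` and both helper functions are hypothetical; nothing like this ships with Python. For brevity, decoding delegates to the undocumented `codecs.escape_decode`, which an official codec would probably not do. The encoder is a simple hand-written escaper chosen for illustration and does not reproduce Python 2's `string_escape` exactly. The open question about non-escaped non-ASCII bytes is deliberately left unresolved: this decoder simply passes them through.

```python
import codecs


def _bytes_escape_encode(data, errors="strict"):
    """Hypothetical encoder: escape bytes so the output is printable ASCII only."""
    out = bytearray()
    for b in data:
        if b == 0x5C:  # backslash must be escaped so decoding is unambiguous
            out += b"\\\\"
        elif 0x20 <= b < 0x7F:  # printable ASCII passes through unchanged
            out.append(b)
        else:  # everything else becomes a \xNN escape
            out += b"\\x%02x" % b
    return bytes(out), len(data)


def _bytes_escape_decode(data, errors="strict"):
    """Hypothetical decoder: delegates to the undocumented codecs.escape_decode."""
    return codecs.escape_decode(bytes(data), errors)


def _search(name):
    # codecs.lookup() passes a normalized, lower-case name; accept both spellings.
    if name in ("bytes_escape", "bytes-escape"):
        return codecs.CodecInfo(
            encode=_bytes_escape_encode,
            decode=_bytes_escape_decode,
            name="bytes_escape",
        )
    return None


codecs.register(_search)

encoded = codecs.encode(b"a\nb\\\x00\xff", "bytes_escape")
print(encoded)                                 # b'a\\x0ab\\\\\\x00\\xff'
print(codecs.decode(encoded, "bytes_escape"))  # b'a\nb\\\x00\xff'
```

Because the decoder returns bytes rather than str, the codec is used through `codecs.encode()` and `codecs.decode()` rather than through `bytes.decode()`.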
History
Date  User  Action  Args
2019-07-14 14:55:14  serhiy.storchaka  set  messages: + msg347922
2019-07-14 14:30:25  carlbordum  set  nosy: + carlbordum; messages: + msg347919
2019-07-13 14:32:02  serhiy.storchaka  set  messages: + msg347827
2019-07-13 14:19:48  carlbordum  set  keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request14542
2019-04-05 01:03:44  gregory.p.smith  set  stage: needs patch
2019-04-05 01:03:29  gregory.p.smith  set  nosy: + gregory.p.smith, njs; messages: + msg339469; versions: + Python 3.8, - Python 3.3, Python 3.4, Python 3.5, Python 3.6
2019-04-05 01:02:19  gregory.p.smith  link  issue36530 superseder
2018-10-07 07:58:25  asvetlov  set  nosy: + asvetlov; messages: + msg327268
2018-10-06 21:48:10  paulehoffman  set  nosy: + paulehoffman; messages: + msg327259
2017-06-07 15:36:21  mdartiailh  set  messages: + msg295347
2017-06-07 15:22:14  serhiy.storchaka  set  nosy: + serhiy.storchaka; messages: + msg295344
2017-06-07 15:02:55  mdartiailh  create