classification
Title: uu package uses old encoding
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: LawfulEvil, martin.panter, serhiy.storchaka, xiang.zhang
Priority: normal Keywords:

Created on 2017-04-19 18:22 by LawfulEvil, last changed 2017-05-03 03:18 by xiang.zhang. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1326 merged xiang.zhang, 2017-04-27 16:08
Messages (14)
msg291893 - (view) Author: Kyle Glowacki (LawfulEvil) Date: 2017-04-19 18:22
Looking at the man page for uuencode and uudecode (http://www.manpagez.com/man/5/uuencode/), I see that the encoding used to map to ASCII 32 through 95, but that 32 is deprecated and newer releases generally use 33 through 96 (with 96 used in place of 32). This replaces the " " in the encoding with "`".

For example, the newest version of busybox only accepts the new encoding.

The uu package has no way to select this new encoding, which makes it a pain to integrate. Oddly, the uu.decode function does properly decode files encoded using "`", but encode is unable to create them.
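
The two character ranges described above can be sketched as follows. This is only an illustration: uu_char is a hypothetical helper, not part of the uu or binascii modules. It maps a 6-bit value to its output character under either convention.

```python
def uu_char(value, backtick=False):
    """Map a 6-bit value (0-63) to its uuencode character.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if not 0 <= value <= 63:
        raise ValueError("value must fit in 6 bits")
    if backtick and value == 0:
        return "`"  # ASCII 96 stands in for ASCII 32 (space)
    return chr(32 + value)


# Old alphabet covers ASCII 32-95; new alphabet covers ASCII 33-96.
old_alphabet = "".join(uu_char(v) for v in range(64))
new_alphabet = "".join(uu_char(v, backtick=True) for v in range(64))
```

The two alphabets differ only in the character used for the value zero.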
msg292416 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-27 10:23
Looks like Perl already encodes this way:

[~]$ perl -e 'print pack("u","Ca\x00t")'
$0V$`=```

> Oddly, the uu.decode function does properly decode files encoded using "`", but encode is unable to create them.

The decoder source code explicitly states that it accepts backticks, since some encoders use '`' instead of space.
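
A minimal check of the decoder behaviour described here, using binascii.a2b_uu on a single encoded line. The sample lines are the encodings of the two bytes "Ca" that appear elsewhere in this issue.

```python
import binascii

# The same two bytes ("Ca") encoded with a space and with a backtick
# as the character for the value zero.
with_space = b'"0V$ \n'
with_backtick = b'"0V$`\n'

decoded_space = binascii.a2b_uu(with_space)
decoded_backtick = binascii.a2b_uu(with_backtick)
```

Both calls should yield the same bytes, which is why only the encoding side needs a new option.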

To maintain backwards compatibility, I think we can add a keyword-only backtick parameter to binascii.b2a_uu and uu.encode.
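
The changeset that closed this issue added the option under the name backtick. Assuming Python 3.7 or later, the binascii side of the proposal can be exercised roughly like this:

```python
import binascii

data = b"Ca"

# Default behaviour is unchanged: zero is encoded as a space.
default_line = binascii.b2a_uu(data)

# Opt in to the newer convention with the keyword-only argument.
backtick_line = binascii.b2a_uu(data, backtick=True)
```

Because the parameter is keyword-only and defaults to False, existing callers keep getting space-based output.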
msg292464 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-27 16:40
Is there any standard?

From Wikipedia [1]:

"""
Note that 96 ("`" grave accent) is a character that is seen in uuencoded files but is typically only used to signify a 0-length line, usually at the end of a file. It will never naturally occur in the actual converted data since it is outside the range of 32 to 95. The sole exception to this is that some uuencoding programs use the grave accent to signify padding bytes instead of a space. However, the character used for the padding byte is not standardized, so either is a possibility.
"""

This obviously makes it impossible to use "`" as zero instead of space.

[1] https://en.wikipedia.org/wiki/Uuencoding#Uuencode_table
msg292465 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-27 16:57
There seems to be no standard. I also read the Wikipedia article, but Perl and uuencode on my Linux machine now both use backticks instead of spaces to represent zero.

[~]$ perl -e 'print pack("u","Ca\x00t")'
$0V$`=```
[~]$ cat /tmp/test
Ca[~]$ uuencode /tmp/test -
begin 664 -
"0V$`
`
end

while Python now:

>>> import uu
>>> uu.encode('/tmp/test', '-')
begin 664 test
"0V$ 
 
end

Besides the link Kyle gave, the FreeBSD manpage also describes the new algorithm: http://www.unix.com/man-page/freebsd/5/uuencode/

I don't propose changing the current behaviour, which would break backwards compatibility. But I think it's reasonable to provide a way for users to opt in to backticks.
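
As a rough sketch of what opting in could look like, the helper below frames data the way the uuencode output above is framed. uuencode_bytes is a hypothetical name introduced for illustration; it relies on binascii.b2a_uu with the backtick argument, so it assumes Python 3.7 or later.

```python
import binascii


def uuencode_bytes(data, name, mode=0o664, *, backtick=True):
    """Hypothetical helper: return data framed as a uuencoded file."""
    lines = [b"begin %o %s\n" % (mode, name.encode("ascii"))]
    # b2a_uu accepts at most 45 bytes per call, one output line each.
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        lines.append(binascii.b2a_uu(chunk, backtick=backtick))
    # A zero-length line marks the end of the data.
    lines.append(binascii.b2a_uu(b"", backtick=backtick))
    lines.append(b"end\n")
    return b"".join(lines)


new_style = uuencode_bytes(b"Ca", "-")
old_style = uuencode_bytes(b"Ca", "-", backtick=False)
```

With backtick=True the body matches the uuencode output shown above; with backtick=False it matches the space-based output Python produced at the time.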
msg292466 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-27 17:39
What about other popular languages? Java, PHP, Ruby, Tcl, C#, JavaScript, Swift, Go, Rust? Do any languages provide a way to configure the zero character, and what are the names of the options? Are there languages that use "`" instead of a space only for padding, but not for representing an ordinal zero?
msg292491 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-04-28 00:50
FWIW I am using NXP LPC microcontrollers at the moment, whose bootloader uses the grave/backtick instead of spaces. (NXP application note AN11229.)  Although in practice it does seem to accept Python's spaces instead of graves.

I wouldn't put too much weight on Wikipedia, especially where it says graves are not used for encoded data (vs length and padding). Earlier versions of Wikipedia did mention graves in regular data.

I understand the reason for avoiding spaces is that spaces get stripped (e.g. by email, copy and paste, etc). You have to avoid spaces in data, not just padding, because a data space may still appear at the end of a line.
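
A small illustration of the stripping problem, assuming Python 3.7 or later for the backtick argument. The input ends in a zero byte, so the last character of the encoded line carries real data rather than padding, and rstrip() stands in for a transport that trims trailing whitespace.

```python
import binascii

data = b"Ca\x00"  # the final 6-bit group of this input is zero

spaced = binascii.b2a_uu(data).rstrip(b"\n")
graved = binascii.b2a_uu(data, backtick=True).rstrip(b"\n")

# Simulate a transport (email, copy and paste) that trims trailing spaces.
spaced_after = spaced.rstrip()
graved_after = graved.rstrip()

lost_spaced = len(spaced) - len(spaced_after)
lost_graved = len(graved) - len(graved_after)
```

The space-based line loses its final data character, while the backtick-based line passes through unchanged.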
msg292513 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-28 05:52
Uuencode has no official standard, and everything depends on the implementation. Among other languages, I could only find official implementations for PHP, Java, and ActiveTcl(?). PHP and ActiveTcl default to backticks with no option to change that. Java defaults to spaces, also with no option.
msg292515 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-28 06:29
Thanks Martin and Xiang. Wikipedia is not a reliable source, but it is usually based on reliable sources. In this case it seems to be wrong.

The next question is the parameter name. Wikipedia uses the name "grave accent", the FreeBSD uuencode manpage uses "backquote", and the proposed patch uses "backtick". "Grave accent" is the official Unicode name; "backquote" and "backtick" are commonly used in programming contexts. We could also use a name containing "space", with the default value True.
msg292516 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-28 06:39
I think "grave accent" is not suitable. Although it's the standard Unicode name, it's not commonly used in programming, so it's not direct enough. "backquote" and "backtick" seem interchangeable, and I don't have any preference. Perl seems to use backtick instead of backquote when ` is part of the language. Yeah, space is also a choice.
msg292517 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-28 07:25
Python 2 used the term "backquote" when ` was part of the language.
msg292518 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-28 07:38
token defines it as backquote. But in the docs there are also several places that call it backticks [1][2]. Do you have any preference, Serhiy and Martin?

[1] https://docs.python.org/release/3.0.1/whatsnew/3.0.html#removed-syntax
[2] https://docs.python.org/2/library/2to3.html?highlight=backtick#2to3fixer-repr
msg292559 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-04-29 03:26
I think I would prefer b2a_uu(data, grave=True), but am also happy with Xiang’s backtick=True if others prefer that. :) In my mind “grave accent” is the pure ASCII character; it just got abused for other things. Other options:

b2a_uu(data, space=False)
b2a_uu(data, avoid_spaces=True)
b2a_uu(data, use_0x60=True)
msg292579 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-29 08:54
I'm +0 for something containing "space" in the option name, since the purpose of changing the UU encoding is to avoid stripped spaces. But this is not a strong preference.

Actually there is no need to add a new option in b2a_uu(), since we can just use b2a_uu(data).replace(b' ', b'`'). It is added just for convenience.
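
The equivalence mentioned here can be spot-checked on a few inputs, assuming Python 3.7 or later so that the backtick argument is available. The sample inputs are arbitrary choices for illustration.

```python
import binascii

# Empty input, short inputs, and full 45-byte lines (the per-call maximum).
samples = [b"", b"Ca", b"Ca\x00t", bytes(45), bytes(range(45))]

mismatches = [
    s for s in samples
    if binascii.b2a_uu(s).replace(b" ", b"`")
    != binascii.b2a_uu(s, backtick=True)
]
```

An empty mismatches list on these samples is consistent with the option being a convenience over the replace() call rather than a different encoding.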
msg292834 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-05-03 03:16
New changeset 13f1f423fac39f8f14a3ce919dd236975517d5c6 by Xiang Zhang in branch 'master':
bpo-30103: Allow Uuencode in Python using backtick as zero instead of space (#1326)
https://github.com/python/cpython/commit/13f1f423fac39f8f14a3ce919dd236975517d5c6
History
Date User Action Args
2017-05-03 03:18:36 xiang.zhang set status: open -> closed; resolution: fixed; stage: resolved
2017-05-03 03:16:23 xiang.zhang set messages: + msg292834
2017-04-29 08:54:13 serhiy.storchaka set messages: + msg292579
2017-04-29 03:26:31 martin.panter set messages: + msg292559
2017-04-28 07:38:58 xiang.zhang set messages: + msg292518
2017-04-28 07:25:11 serhiy.storchaka set messages: + msg292517
2017-04-28 06:39:29 xiang.zhang set messages: + msg292516
2017-04-28 06:29:52 serhiy.storchaka set messages: + msg292515
2017-04-28 05:52:41 xiang.zhang set messages: + msg292513
2017-04-28 00:50:14 martin.panter set nosy: + martin.panter; messages: + msg292491
2017-04-27 17:39:56 serhiy.storchaka set messages: + msg292466
2017-04-27 16:57:26 xiang.zhang set messages: + msg292465
2017-04-27 16:40:12 serhiy.storchaka set messages: + msg292464
2017-04-27 16:08:15 xiang.zhang set pull_requests: + pull_request1437
2017-04-27 10:23:42 xiang.zhang set nosy: + serhiy.storchaka, xiang.zhang; messages: + msg292416
2017-04-19 18:30:17 r.david.murray set components: + Library (Lib), - Extension Modules
2017-04-19 18:30:04 r.david.murray set type: behavior -> enhancement; versions: + Python 3.7, - Python 3.4
2017-04-19 18:22:20 LawfulEvil create