msg291893 - (view) |
Author: Kyle Glowacki (LawfulEvil) |
Date: 2017-04-19 18:22 |
Looking at the man page for uuencode and uudecode (http://www.manpagez.com/man/5/uuencode/), I see that the encoding used to map to ASCII 32 through 95, but that 32 is deprecated and newer releases generally use 33 through 96 (with 96 used in place of 32). This replaces the " " in the encoding with "`".
For example, the newest version of busybox only accepts the new encoding.
The uu module has no way to select this new encoding, which makes it a pain to integrate. Oddly, the uu.decode function does properly decode files encoded using "`", but encode is unable to create them.
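The asymmetry can be reproduced with binascii, which uu uses underneath. This is a minimal sketch; the sample bytes are only an illustration. The default encoder emits spaces for zero-valued 6-bit groups, while the decoder accepts either spelling.

```python
import binascii

data = b"Ca\x00t"  # sample payload containing a zero byte

# Default encoding: zero-valued 6-bit groups come out as spaces (ASCII 32).
spaced = binascii.b2a_uu(data)

# The same line as backtick-style encoders write it (ASCII 96 for zero).
backticked = b"$0V$`=```\n"

# a2b_uu decodes both spellings to the same bytes.
print(spaced)                        # b'$0V$ =   \n'
print(binascii.a2b_uu(spaced))       # b'Ca\x00t'
print(binascii.a2b_uu(backticked))   # b'Ca\x00t'
```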
|
msg292416 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2017-04-27 10:23 |
It looks like Perl already encodes this way:
[~]$ perl -e 'print pack("u","Ca\x00t")'
$0V$`=```
> Oddly, the uu.decode function does properly decode files encoded using "`", but encode is unable to create them.
The decoder source code explicitly notes that it accepts backticks, since some encoders use '`' instead of space.
To maintain backwards compatibility, I think we can add a keyword-only backtick parameter to binascii.b2a_uu and uu.encode.
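As a sketch of how such a keyword-only flag reads in practice, the snippet below assumes Python 3.7 or later, where `binascii.b2a_uu` accepts `backtick`. The sample payload mirrors the Perl one-liner above.

```python
import binascii

data = b"Ca\x00t"

# Assumes Python 3.7+, where b2a_uu accepts the keyword-only `backtick` flag.
legacy = binascii.b2a_uu(data)                  # zeros as spaces (default)
modern = binascii.b2a_uu(data, backtick=True)   # zeros as backticks

print(legacy)  # b'$0V$ =   \n'
print(modern)  # b'$0V$`=```\n'

# The flag is keyword-only, so passing it positionally is rejected.
try:
    binascii.b2a_uu(data, True)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
print(positional_rejected)
```

Keeping the default at False means existing callers see no change in output.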
|
msg292464 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-04-27 16:40 |
Is there any standard?
From Wikipedia [1]:
"""
Note that 96 ("`" grave accent) is a character that is seen in uuencoded files but is typically only used to signify a 0-length line, usually at the end of a file. It will never naturally occur in the actual converted data since it is outside the range of 32 to 95. The sole exception to this is that some uuencoding programs use the grave accent to signify padding bytes instead of a space. However, the character used for the padding byte is not standardized, so either is a possibility.
"""
This obviously makes it impossible to use "`" as zero instead of space.
[1] https://en.wikipedia.org/wiki/Uuencoding#Uuencode_table
|
msg292465 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2017-04-27 16:57 |
There seems to be no standard. I also read the Wikipedia article, but both perl and uuencode on my Linux system now use backticks instead of spaces to represent zero.
[~]$ perl -e 'print pack("u","Ca\x00t")'
$0V$`=```
[~]$ cat /tmp/test
Ca[~]$ uuencode /tmp/test -
begin 664 -
"0V$`
`
end
while Python currently produces:
>>> import uu
>>> uu.encode('/tmp/test', '-')
begin 664 test
"0V$
end
Besides the link Kyle gives, the FreeBSD manpage also describes the new algorithm: http://www.unix.com/man-page/freebsd/5/uuencode/
I don't propose changing the current behaviour, since that would break backwards compatibility. But I think it's reasonable to give users a way to opt in to backticks.
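For comparison, the uuencode output shown above can be rebuilt from raw bytes with binascii alone. This is only a sketch: `uuencode_bytes` is a hypothetical helper, not a stdlib API, and it assumes Python 3.7+ for the `backtick` flag.

```python
import binascii

def uuencode_bytes(data, name="-", mode=0o664, backtick=True):
    """Return uuencoded text for `data` (hypothetical helper, not a stdlib API)."""
    lines = [b"begin %o %s\n" % (mode, name.encode("ascii"))]
    # Each body line carries at most 45 input bytes.
    for i in range(0, len(data), 45):
        lines.append(binascii.b2a_uu(data[i:i + 45], backtick=backtick))
    # A zero-length line terminates the body, followed by "end".
    lines.append(b"`\n" if backtick else b" \n")
    lines.append(b"end\n")
    return b"".join(lines)

print(uuencode_bytes(b"Ca").decode("ascii"))
```

With `backtick=False` the same helper produces the space-based form that Python emits by default.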
|
msg292466 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-04-27 17:39 |
What about other popular languages? Java, PHP, Ruby, Tcl, C#, JavaScript, Swift, Go, Rust? Do any languages provide a way to configure the zero character, and if so, what are the options called? Are there languages that use "`" instead of a space only for padding, but not for representing an ordinal zero?
|
msg292491 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2017-04-28 00:50 |
FWIW I am using NXP LPC microcontrollers at the moment, whose bootloader uses the grave/backtick instead of spaces (NXP application note AN11229), although in practice it does seem to accept Python's spaces instead of graves.
I wouldn't put too much weight on Wikipedia, especially where it says graves are not used for encoded data (vs length and padding). Earlier versions of Wikipedia did mention graves in regular data.
I understand the reason for avoiding spaces is that spaces get stripped (e.g. by email, copy and paste, etc). You have to avoid spaces in data, not just padding, because a data space may still appear at the end of a line.
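A small sketch of the stripping problem, assuming Python 3.7+ for the `backtick` flag: the payload below ends in a zero byte, so the last character of the encoded line is a data character, not padding, and it is lost when trailing whitespace is trimmed.

```python
import binascii

data = b"ab\x00"  # the final 6-bit group of this payload is zero

spaced = binascii.b2a_uu(data).rstrip(b"\n")
ticked = binascii.b2a_uu(data, backtick=True).rstrip(b"\n")

print(spaced)  # b'#86( '
print(ticked)  # b'#86(`'

# Simulate a mail gateway or editor that trims trailing whitespace.
trimmed_spaced = spaced.rstrip()
trimmed_ticked = ticked.rstrip()

print(len(spaced) - len(trimmed_spaced))  # 1 (the data character is gone)
print(trimmed_ticked == ticked)           # True (nothing to trim)
```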
|
msg292513 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2017-04-28 05:52 |
Uuencode has no official standard; it all depends on the implementation. Among other languages, I could only find official implementations for PHP, Java, and, I think, ActiveTcl. PHP and ActiveTcl default to backticks and offer no option. Java defaults to spaces and also offers no option.
|
msg292515 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-04-28 06:29 |
Thanks Martin and Xiang. Wikipedia is not a reliable source, but it is usually based on reliable sources. In this case it seems to be wrong.
The next question is about the parameter name. Wikipedia uses the name "grave accent", the FreeBSD uuencode manpage uses "backquote", and the proposed patch uses "backtick". "Grave accent" is the official Unicode name; "backquote" and "backtick" are commonly used in programming contexts. We could also use a name containing "space", with a default value of True.
|
msg292516 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2017-04-28 06:39 |
I think "grave accent" is not suitable. Although it's the standard Unicode name, it's not commonly used in programming, so it's not direct enough. "backquote" and "backtick" seem to be interchangeable and I don't have a preference. Perl seems to use backtick rather than backquote where ` is part of the language. Yeah, space is also a choice.
|
msg292517 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-04-28 07:25 |
Python 2 used the term "backquote" where ` is part of the language.
|
msg292518 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2017-04-28 07:38 |
The token module defines it as backquote, but several places in the docs also call it backticks [1][2]. Do you have a preference, Serhiy and Martin?
[1] https://docs.python.org/release/3.0.1/whatsnew/3.0.html#removed-syntax
[2] https://docs.python.org/2/library/2to3.html?highlight=backtick#2to3fixer-repr
|
msg292559 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2017-04-29 03:26 |
I think I would prefer b2a_uu(data, grave=True), but am also happy with Xiang’s backtick=True if others prefer that. :) In my mind “grave accent” is the pure ASCII character; it just got abused for other things. Other options:
b2a_uu(data, space=False)
b2a_uu(data, avoid_spaces=True)
b2a_uu(data, use_0x60=True)
|
msg292579 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-04-29 08:54 |
I'm +0 for something containing "space" in the option name, since the purpose of changing the UU encoding is to avoid spaces being stripped. But this is not a strong preference.
Actually there is no need to add a new option in b2a_uu(), since we can just use b2a_uu(data).replace(b' ', b'`'). It is added just for convenience.
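The equivalence can be spot-checked on a few inputs. This is a sketch assuming Python 3.7+ for the `backtick` flag; the sample payloads are arbitrary and each fits in one line (at most 45 bytes).

```python
import binascii

# Arbitrary single-line payloads, including empty and all-zero input.
samples = [b"", b"Ca", b"Ca\x00t", bytes(45), bytes(range(45))]

results = []
for data in samples:
    via_replace = binascii.b2a_uu(data).replace(b" ", b"`")
    via_flag = binascii.b2a_uu(data, backtick=True)
    results.append(via_replace == via_flag)

all_equal = all(results)
print(all_equal)
```

The replacement works because a space only ever stands for the value zero: a nonzero length byte or 6-bit group always maps to a character above ASCII 32.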
|
msg292834 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2017-05-03 03:16 |
New changeset 13f1f423fac39f8f14a3ce919dd236975517d5c6 by Xiang Zhang in branch 'master':
bpo-30103: Allow Uuencode in Python using backtick as zero instead of space (#1326)
https://github.com/python/cpython/commit/13f1f423fac39f8f14a3ce919dd236975517d5c6
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:45 | admin | set | github: 74289 |
2017-05-03 03:18:36 | xiang.zhang | set | status: open -> closed resolution: fixed stage: resolved |
2017-04-28 00:50:14 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg292491
|
2017-04-27 16:08:15 | xiang.zhang | set | pull_requests:
+ pull_request1437 |
2017-04-27 10:23:42 | xiang.zhang | set | nosy:
+ serhiy.storchaka, xiang.zhang messages:
+ msg292416
|
2017-04-19 18:30:17 | r.david.murray | set | components:
+ Library (Lib), - Extension Modules |
2017-04-19 18:30:04 | r.david.murray | set | type: behavior -> enhancement versions:
+ Python 3.7, - Python 3.4 |
2017-04-19 18:22:20 | LawfulEvil | create | |