msg217633 - (view) |
Author: Joshua J Cogliati (Joshua.J.Cogliati) * |
Date: 2014-04-30 18:00 |
The -3 option should warn about str to bytes conversions and str to bytes comparisons:
For example in Python 3 the following happens:
python3
Python 3.3.2 <snip>
Type "help", "copyright", "credits" or "license" for more information.
>>> b"a" + "a"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
>>> b"a" == "a"
False
>>>
But even python2 -3 does not warn about either of these uses:
python2 -3
Python 2.7.5 <snip>
Type "help", "copyright", "credits" or "license" for more information.
>>> b"a" + "a"
'aa'
>>> b"a" == "a"
True
>>> u"a" + "a"
u'aa'
>>> u"a" == "a"
True
>>>
These two issues are some of the more significant problems I have in trying to get python2 code working with python3, and if -3 does not warn about them, the porting is harder to do.
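One way to sidestep both differences is to convert explicitly at the boundary, so that the same expression means the same thing under either interpreter. The following is only a minimal sketch: the helper name to_bytes and the ASCII default are assumptions made for illustration, not something proposed in this issue.

```python
def to_bytes(value, encoding="ascii"):
    """Return value as bytes, encoding it first if it is text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)

# Both operands are bytes, so neither interpreter has to guess.
joined = b"a" + to_bytes(u"a")
same = b"a" == to_bytes(u"a")
```

With the conversion spelled out, the concatenation no longer raises in Python 3 and the comparison gives the same answer in both versions.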
|
msg217703 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2014-05-01 14:58 |
Unfortunately it's impossible to warn against this in Python 2 since the bytes type is just another name for the str type:
>>> str == bytes
True
>>> type(b'1')
<type 'str'>
What we could potentially do, though, is change things such that -3 does what you are after when comparing bytes/str to unicode in Python 2. Unfortunately in that instance it's still a murky question as to whether that will help things more than hurt them as some people explicitly leave strings as-is in both Python 2 and Python 3 for either speed or code simplicity reasons.
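The aliasing described above can be checked from code. This is a small sketch that runs on either major version; the function name bytes_is_str is hypothetical.

```python
import sys

def bytes_is_str():
    """Report whether the built-in names bytes and str refer to one type."""
    return bytes is str

# True on Python 2, where bytes is an alias for str; False on Python 3.
aliased = bytes_is_str()
running_py2 = sys.version_info[0] == 2
```

Because the two names are the same object in Python 2, the interpreter has no way to tell b"a" from "a" at run time unless unicode_literals or an explicit u prefix is involved.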
|
msg217707 - (view) |
Author: Joshua J Cogliati (Joshua.J.Cogliati) * |
Date: 2014-05-01 15:14 |
Hm. That is a good point. Possibly it could only be done when
from __future__ import unicode_literals
has been used. For example:
python2 -3
Python 2.7.5 <snip>
Type "help", "copyright", "credits" or "license" for more information.
>>> type(b"a") == type("a")
True
>>> from __future__ import unicode_literals
>>> type(b"a") == type("a")
False
>>> b"a" == "a"
True
>>> b"a" + "a"
u'aa'
>>>
After unicode_literals is used, then b"a" and "a" have a different type and the same code would be an issue in python3:
python3
Python 3.3.2 <snip>
>>> type(b"a") == type("a")
False
>>> b"a" == "a"
False
>>> b"a" + "a"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
>>>
|
msg217752 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2014-05-02 14:51 |
Yes, that's a possibility if we want to take that route and essentially prevent people from ever explicitly knowing that a str in Python 2 will be a str in Python 3 and that they are okay with that.
|
msg218393 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-05-13 01:28 |
Attached py2_warn_cmp_bytes_text.patch adds BytesWarning for bytes == unicode, bytes != unicode, unicode == bytes, unicode != bytes and similar comparisons with bytearray. The new warnings are added when -b or -bb command line options are used.
As a consequence, a lot of tests are failing with the patch applied and -bb command line option.
Some tests are obviously wrong (unicode expected, but the tests use bytes), but it's much more complex to fix tricky modules like urllib, os.path, json and re (sre_parse) so that they handle bytes and unicode "correctly". In some other cases, the warning should be silenced because the same test compares bytes and then text.
It also means that programs currently working fine with Python 2.7.6 with "-3 -b" options will start to see new BytesWarning warnings. Is it acceptable?
What is the purpose of -b in Python 2? Is it to help developers notice future Unicode issues in their programs earlier, or to help them port their code to Python 3?
Maybe the new warnings should only be emitted if the -3 and -b options are used at the same time?
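For readers who want to see what -b changes, a script can inspect sys.flags.bytes_warning and record any warnings raised by a mixed comparison. This is a sketch written with Python 3 in mind; the helper name compare_and_collect is hypothetical, and on an unpatched Python 2 the comparison simply returns True without a BytesWarning.

```python
import sys
import warnings

def compare_and_collect(data, text):
    """Compare bytes with text; return (result, warning categories seen)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" records the warning instead of raising it under -bb.
        warnings.simplefilter("always")
        result = data == text
    return result, [w.category for w in caught]

result, categories = compare_and_collect(b"abc", u"abc")
if sys.flags.bytes_warning:
    print("-b is active; warnings seen: %r" % (categories,))
else:
    print("-b is not active; comparison returned %r" % (result,))
```

Running the same file with and without -b shows whether the interpreter reports the comparison at all.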
Tell me if you would like to see my work-in-progress patch to fix the whole test suite. Just the stats:
$ hg diff --stat
Lib/_pyio.py | 16 ++++++++--------
Lib/ctypes/test/test_arrays.py | 12 ++++++------
Lib/ctypes/test/test_buffers.py | 20 ++++++++++----------
Lib/ctypes/test/test_cast.py | 2 +-
Lib/ctypes/test/test_memfunctions.py | 10 +++++-----
Lib/ctypes/test/test_prototypes.py | 10 +++++-----
Lib/ctypes/test/test_structures.py | 2 +-
Lib/fractions.py | 1 +
Lib/sqlite3/dump.py | 8 ++++----
Lib/sqlite3/test/dump.py | 4 ++--
Lib/sre_parse.py | 3 ++-
Lib/test/string_tests.py | 6 +++---
Lib/test/test_builtin.py | 14 ++++++++++----
Lib/test/test_bytes.py | 27 +++++++++++++++++++++++++--
Lib/test/test_format.py | 10 +++++++---
Lib/test/test_future4.py | 2 +-
Lib/test/test_pyexpat.py | 28 ++++++++++++++--------------
Lib/test/test_sax.py | 20 ++++++++++----------
Lib/test/test_tempfile.py | 4 ++--
Objects/bytearrayobject.c | 2 +-
Objects/stringobject.c | 9 +++++++++
Objects/unicodeobject.c | 8 ++++++++
22 files changed, 135 insertions(+), 83 deletions(-)
A funny one:
diff -r 670fb496f1f6 Lib/test/test_future4.py
--- a/Lib/test/test_future4.py Sun May 11 23:37:26 2014 -0400
+++ b/Lib/test/test_future4.py Tue May 13 03:28:12 2014 +0200
@@ -43,5 +43,5 @@ class TestFuture(unittest.TestCase):
 def test_main():
     test_support.run_unittest(TestFuture)
 
-if __name__ == "__main__":
+if __name__ == b"__main__":
     test_main()
|
msg218394 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-05-13 01:31 |
The title of the issue is "python2 -3 does not warn about str/unicode to bytes conversions and comparisons".
IMO it would be insane to emit BytesWarning on unicode(str). It would break most code using unicode. six.u() function is based on this feature. For example, six.u("abc") calls unicode("abc") in Python 2.
I have no opinion on the encode operation: str(unicode).
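To make the concern concrete, here is a simplified stand-in for a six.u()-style helper. It only illustrates the unicode(str) pattern described above and is not six's actual source; the name u is reused purely for illustration.

```python
import sys

if sys.version_info[0] >= 3:
    def u(s):
        # Text literals are already unicode in Python 3.
        return s
else:
    def u(s):
        # Python 2: implicit ASCII decode of a byte string, i.e. unicode(str).
        return unicode(s)  # noqa: F821 (this name exists only in Python 2)

text = u("abc")
```

If unicode(str) emitted BytesWarning, every call to a helper of this shape would warn on Python 2, even though the code is already written to be portable.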
|
msg218421 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-05-13 09:23 |
See also issue19656.
|
msg218466 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2014-05-13 15:36 |
I thought we gave ourselves the wiggle room to change the warnings we emitted for -3 (I unfortunately can't find a reference to something relating to that in the Python 2.7 PEP)?
|
msg218497 - (view) |
Author: Josh Cogliati (jrincayc) |
Date: 2014-05-14 01:43 |
Other than in the source code in Modules/main.c, is -b documented anywhere? (For 2.7.6, the HTML docs, the man page, and --help all fail to mention it.)
|
msg219505 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-06-01 15:13 |
I think that even if we accept this change (I am unsure in this), a warning should be raised only when bytes and unicode objects are equal. When they are not equal, a warning should not be raised, because this matches Python 3 behavior.
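The rule proposed here can be modelled in pure Python. This is only a sketch of the idea, not the attached patch: the function name py2_style_eq and the warning text are made up, and the ASCII decode stands in for Python 2's default implicit conversion.

```python
import warnings

def py2_style_eq(data, text):
    """Model Python 2's bytes == unicode with the proposed warning rule."""
    try:
        equal = data.decode("ascii") == text
    except UnicodeDecodeError:
        # Python 2 reports a UnicodeWarning here and treats them as unequal.
        equal = False
    if equal:
        # Only this outcome differs from Python 3, where the result is False.
        warnings.warn("bytes and unicode compare equal only in Python 2",
                      BytesWarning, stacklevel=2)
    return equal

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    equal_case = py2_style_eq(b"abc", u"abc")     # warns
    unequal_case = py2_style_eq(b"abc", u"abd")   # stays silent
```

Under this rule a warning marks exactly the comparisons whose result would change after porting.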
|
msg219558 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-06-02 09:21 |
Serhiy wrote:
"I think that even if we accept this change (I am unsure in this), a warning should be raised only when bytes and unicode objects are equal. When they are not equal, a warning should not be raised, because this matches Python 3 behavior."
Python 3 warns even if strings are equal.
$ python3 -b -Wd
Python 3.3.2 (default, Mar 5 2014, 08:21:05)
Type "help", "copyright", "credits" or "license" for more information.
>>> b'abc' == 'abc'
__main__:1: BytesWarning: Comparison between bytes and string
False
>>> b'abc' == 'abc'
False
The warning is not repeated in the interactive interpreter because the second one would be emitted at the same location, "__main__:1".
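The once-per-location behaviour comes from the "default" filter action, which is what -Wd selects. It can be reproduced with the warnings module alone; in this sketch the helper names noisy and count_reported are hypothetical, and the warning is raised by hand rather than by a real bytes comparison.

```python
import warnings

def noisy():
    # Every call warns from this same line, so the location never changes.
    warnings.warn("Comparison between bytes and string", BytesWarning)

def count_reported(action):
    """Call noisy() twice under the given filter action and count reports."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        noisy()
        noisy()
    return len(caught)

once_per_location = count_reported("default")   # second call is suppressed
every_time = count_reported("always")           # both calls are reported
```

Passing -Wa (or -W always) instead of -Wd would make the interactive session show the warning on every comparison.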
|
msg228559 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-10-05 11:06 |
> Python 3 warns even if strings are equal.
Did you mean "not equal"? In Python 3 strings and bytes are always not equal.
|
msg363574 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2020-03-07 04:01 |
Python 2 is done.
|
Date | User | Action | Args
2022-04-11 14:58:03 | admin | set | github: 65600
2020-03-07 04:01:46 | benjamin.peterson | set | status: open -> closed; nosy: + benjamin.peterson; messages: + msg363574; resolution: rejected; stage: resolved
2020-03-06 20:52:42 | brett.cannon | set | status: pending -> open; nosy: - brett.cannon
2017-02-19 19:12:00 | serhiy.storchaka | set | status: open -> pending
2014-10-05 11:06:27 | serhiy.storchaka | set | messages: + msg228559
2014-06-02 09:21:11 | vstinner | set | messages: + msg219558
2014-06-01 15:13:05 | serhiy.storchaka | set | messages: + msg219505
2014-05-16 05:13:53 | cvrebert | set | nosy: + cvrebert
2014-05-14 01:43:13 | jrincayc | set | nosy: + jrincayc; messages: + msg218497
2014-05-13 15:36:48 | brett.cannon | set | messages: + msg218466
2014-05-13 09:23:35 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg218421
2014-05-13 05:42:00 | Arfrever | set | nosy: + Arfrever
2014-05-13 01:31:51 | vstinner | set | messages: + msg218394
2014-05-13 01:28:26 | vstinner | set | files: + py2_warn_cmp_bytes_text.patch; keywords: + patch; messages: + msg218393
2014-05-02 14:51:54 | brett.cannon | set | messages: + msg217752
2014-05-01 15:14:12 | Joshua.J.Cogliati | set | messages: + msg217707
2014-05-01 14:58:59 | brett.cannon | set | nosy: + brett.cannon; messages: + msg217703; title: python2 -3 does not warn about str to bytes conversions and comparisons -> python2 -3 does not warn about str/unicode to bytes conversions and comparisons
2014-04-30 18:00:16 | Joshua.J.Cogliati | create |