classification
Title: python2 -3 does not warn about str/unicode to bytes conversions and comparisons
Type:
Stage: resolved
Components: Unicode
Versions: Python 2.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, Joshua.J.Cogliati, benjamin.peterson, cvrebert, ezio.melotti, jrincayc, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2014-04-30 18:00 by Joshua.J.Cogliati, last changed 2020-03-07 04:01 by benjamin.peterson. This issue is now closed.

Files
File name: py2_warn_cmp_bytes_text.patch
Uploaded: vstinner, 2014-05-13 01:28
Messages (13)
msg217633 - (view) Author: Joshua J Cogliati (Joshua.J.Cogliati) * Date: 2014-04-30 18:00
The -3 option should warn about str to bytes conversions and str to bytes comparisons:
For example in Python 3 the following happens:

python3
Python 3.3.2 <snip>
Type "help", "copyright", "credits" or "license" for more information.
>>> b"a" + "a"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
>>> b"a" == "a"
False
>>> 

But even python2 -3 does not warn about either of these uses:

python2 -3
Python 2.7.5 <snip>
Type "help", "copyright", "credits" or "license" for more information.
>>> b"a" + "a"
'aa'
>>> b"a" == "a"
True
>>> u"a" + "a"
u'aa'
>>> u"a" == "a"
True
>>> 

These two issues are some of the more significant problems I have in trying to get python2 code working with python3, and if -3 does not warn about them, this is harder to do.
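The Python 3 half of the report can also be written as a small script, which may be handy as a porting check. This is only a sketch for Python 3; the helper name try_concat is made up for illustration, and the exact TypeError wording varies between Python 3 releases.

```python
def try_concat(raw, text):
    """Attempt bytes + str and report what happens instead of crashing."""
    try:
        return raw + text
    except TypeError as exc:
        # Python 3 refuses to mix bytes and str in a concatenation.
        return "TypeError: %s" % exc


concat_result = try_concat(b"a", "a")
# In Python 3 bytes and str never compare equal, even for ASCII-only content.
equal_result = (b"a" == "a")

print(concat_result)
print(equal_result)
```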
msg217703 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-01 14:58
Unfortunately it's impossible to warn against this in Python 2 since the bytes type is just another name for the str type:

>>> str == bytes
True
>>> type(b'1')
<type 'str'>

What we could potentially do, though, is change things such that -3 does what you are after when comparing bytes/str to unicode in Python 2. Unfortunately in that instance it's still a murky question as to whether that will help things more than hurt them as some people explicitly leave strings as-is in both Python 2 and Python 3 for either speed or code simplicity reasons.
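The aliasing described above can be checked from code that runs on either version. A minimal sketch (the function name describe_bytes_type is hypothetical, used only for this illustration):

```python
def describe_bytes_type():
    """Report whether bytes is just another name for str on this interpreter."""
    if bytes is str:
        # Python 2: b"a" and "a" are the same type, so there is nothing to warn about.
        return "bytes is an alias of str"
    # Python 3: the two types are distinct and do not mix implicitly.
    return "bytes and str are distinct types"


print(describe_bytes_type())
```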
msg217707 - (view) Author: Joshua J Cogliati (Joshua.J.Cogliati) * Date: 2014-05-01 15:14
Hm.  That is a good point.  Possibly it could only be done when 
from __future__ import unicode_literals
has been used.  For example:

python2 -3
Python 2.7.5 <snip>
Type "help", "copyright", "credits" or "license" for more information.
>>> type(b"a") == type("a")
True
>>> from __future__ import unicode_literals
>>> type(b"a") == type("a")
False
>>> b"a" == "a"
True
>>> b"a" + "a"
u'aa'
>>> 


Once unicode_literals is in effect, b"a" and "a" have different types, and the same code would be a problem in python3:
python3
Python 3.3.2 <snip>
>>> type(b"a") == type("a")
False
>>> b"a" == "a"
False
>>> b"a" + "a"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
>>>
msg217752 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-02 14:51
Yes, that's a possibility, if we want to take that route and essentially prevent people from ever explicitly knowing that a str in Python 2 will be a str in Python 3 and they are okay with that.
msg218393 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-05-13 01:28
Attached py2_warn_cmp_bytes_text.patch adds BytesWarning for bytes == unicode, bytes != unicode, unicode == bytes, unicode != bytes and similar comparisons with bytearray. The new warnings are added when -b or -bb command line options are used.

As a consequence, a lot of tests are failing with the patch applied and -bb command line option.

Some tests are obviously wrong (unicode expected, but the tests use bytes), but it's much more complex to fix tricky modules like urllib, os.path, json and re (sre_parse) to handle bytes and unicode "correctly". In some other cases, the warning should be silenced because the same test compares bytes and then text.

It also means that programs currently working fine with Python 2.7.6 with "-3 -b" options will start to see new BytesWarning warnings. Is it acceptable?

What is the purpose of -b in Python 2? Is it to help developers notice future Unicode issues in their programs earlier, or to help them port their code to Python 3?

Maybe the new warnings should only be emitted if the -3 and -b options are used at the same time?
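For reference, the existing Python 3 behaviour of -bb (BytesWarning raised as an error) can be observed from a child interpreter. This sketch assumes Python 3.7 or later and says nothing about what the attached Python 2 patch does; run_comparison is a hypothetical helper name.

```python
import subprocess
import sys


def run_comparison(flags):
    """Evaluate b'abc' == 'abc' in a child interpreter started with the given flags."""
    command = [sys.executable] + flags + ["-c", "print(b'abc' == 'abc')"]
    return subprocess.run(command, capture_output=True, text=True)


plain = run_comparison([])        # no -b: the comparison silently yields False
strict = run_comparison(["-bb"])  # -bb: the comparison raises BytesWarning

print(plain.returncode, plain.stdout.strip())
print(strict.returncode, strict.stderr.strip().splitlines()[-1:])
```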

Tell me if you would like to see my work-in-progress patch to fix the whole test suite. Just the stats:

$ hg diff --stat
 Lib/_pyio.py                         |  16 ++++++++--------
 Lib/ctypes/test/test_arrays.py       |  12 ++++++------
 Lib/ctypes/test/test_buffers.py      |  20 ++++++++++----------
 Lib/ctypes/test/test_cast.py         |   2 +-
 Lib/ctypes/test/test_memfunctions.py |  10 +++++-----
 Lib/ctypes/test/test_prototypes.py   |  10 +++++-----
 Lib/ctypes/test/test_structures.py   |   2 +-
 Lib/fractions.py                     |   1 +
 Lib/sqlite3/dump.py                  |   8 ++++----
 Lib/sqlite3/test/dump.py             |   4 ++--
 Lib/sre_parse.py                     |   3 ++-
 Lib/test/string_tests.py             |   6 +++---
 Lib/test/test_builtin.py             |  14 ++++++++++----
 Lib/test/test_bytes.py               |  27 +++++++++++++++++++++++++--
 Lib/test/test_format.py              |  10 +++++++---
 Lib/test/test_future4.py             |   2 +-
 Lib/test/test_pyexpat.py             |  28 ++++++++++++++--------------
 Lib/test/test_sax.py                 |  20 ++++++++++----------
 Lib/test/test_tempfile.py            |   4 ++--
 Objects/bytearrayobject.c            |   2 +-
 Objects/stringobject.c               |   9 +++++++++
 Objects/unicodeobject.c              |   8 ++++++++
 22 files changed, 135 insertions(+), 83 deletions(-)

A funny one:

diff -r 670fb496f1f6 Lib/test/test_future4.py
--- a/Lib/test/test_future4.py  Sun May 11 23:37:26 2014 -0400
+++ b/Lib/test/test_future4.py  Tue May 13 03:28:12 2014 +0200
@@ -43,5 +43,5 @@ class TestFuture(unittest.TestCase):
 def test_main():
     test_support.run_unittest(TestFuture)
 
-if __name__ == "__main__":
+if __name__ == b"__main__":
     test_main()
msg218394 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-05-13 01:31
The title of the issue is "python2 -3 does not warn about str/unicode to bytes conversions and comparisons".

IMO it would be insane to emit BytesWarning on unicode(str). It would break most code using unicode. six.u() function is based on this feature. For example, six.u("abc") calls unicode("abc") in Python 2.
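To illustrate why this matters, here is a simplified stand-in for a six.u()-style helper. It follows the description above (a plain unicode() call in Python 2); the real helper in six may differ in detail, and the name u_sketch is made up for this example.

```python
import sys

if sys.version_info[0] >= 3:
    def u_sketch(text):
        # Python 3: str literals are already text, nothing to convert.
        return text
else:
    def u_sketch(text):
        # Python 2: implicit str -> unicode decoding, the very operation
        # that a BytesWarning on unicode(str) would flag.
        return unicode(text)  # noqa: F821 (name only exists in Python 2)


print(u_sketch("abc"))
```

Code written this way relies on unicode(str) staying silent in Python 2, so warning about it would hit every call site of such a helper.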

I have no opinion on the encode operation: str(unicode).
msg218421 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-13 09:23
See also issue19656.
msg218466 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-13 15:36
I thought we gave ourselves the wiggle room to change the warnings we emitted for -3 (I unfortunately can't find a reference to something relating to that in the Python 2.7 PEP)?
msg218497 - (view) Author: Josh Cogliati (jrincayc) Date: 2014-05-14 01:43
Other than in the source code in Modules/main.c, is -b documented anywhere? (For 2.7.6, the HTML docs, man page, and --help all fail to mention it.)
msg219505 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-06-01 15:13
I think that even if we accept this change (I am unsure about this), a warning should be raised only when bytes and unicode objects are equal. When they are not equal, a warning should not be raised, because this matches Python 3 behavior.
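The rule proposed here can be modelled in Python 3 as follows. This is only a sketch under the assumption that Python 2 compares str with unicode by decoding the str as ASCII and treating undecodable input as unequal; both function names are hypothetical.

```python
def py2_style_equal(raw, text):
    """Model Python 2's str == unicode: decode as ASCII, then compare."""
    try:
        return raw.decode("ascii") == text
    except UnicodeDecodeError:
        # Assumed Python 2 behaviour: undecodable bytes compare unequal.
        return False


def should_warn(raw, text):
    """Warn only when Python 2 says True, since Python 3 always says False."""
    return py2_style_equal(raw, text)


print(should_warn(b"abc", "abc"))    # results differ between versions
print(should_warn(b"abc", "abd"))    # both versions agree on False
print(should_warn(b"\xff", "\xff"))  # undecodable bytes: modelled as unequal
```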
msg219558 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-02 09:21
Serhiy wrote:
"I think that even if we accept this change (I am unsure about this), a warning should be raised only when bytes and unicode objects are equal. When they are not equal, a warning should not be raised, because this matches Python 3 behavior."

Python 3 warns even if strings are equal.

$ python3 -b -Wd
Python 3.3.2 (default, Mar  5 2014, 08:21:05) 
Type "help", "copyright", "credits" or "license" for more information.
>>> b'abc' == 'abc'
__main__:1: BytesWarning: Comparison between bytes and string
False
>>> b'abc' == 'abc'
False

The warning is not repeated in the interactive interpreter because the second occurrence is emitted at the same location, "__main__:1".
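The once-per-location behaviour of the "default" action (what -Wd selects) can be reproduced with the warnings module alone. A sketch for Python 3; count_shown is a hypothetical helper, and the warning is raised by hand rather than by a real bytes/str comparison so that it works without -b.

```python
import warnings


def count_shown(action):
    """Emit the same warning twice from one source line; count how many get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(2):
            # Same text, category and line number on both iterations.
            warnings.warn("Comparison between bytes and string", BytesWarning)
    return len(caught)


shown_default = count_shown("default")  # repeats from the same location are dropped
shown_always = count_shown("always")    # every occurrence is reported

print(shown_default, shown_always)
```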
msg228559 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-05 11:06
> Python 3 warns even if strings are equal.

Did you mean "not equal"? In Python 3 strings and bytes are always not equal.
msg363574 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2020-03-07 04:01
Python 2 is done.
History
Date User Action Args
2020-03-07 04:01:46  benjamin.peterson  set  status: open -> closed; nosy: + benjamin.peterson; messages: + msg363574; resolution: rejected; stage: resolved
2020-03-06 20:52:42  brett.cannon  set  status: pending -> open; nosy: - brett.cannon
2017-02-19 19:12:00  serhiy.storchaka  set  status: open -> pending
2014-10-05 11:06:27  serhiy.storchaka  set  messages: + msg228559
2014-06-02 09:21:11  vstinner  set  messages: + msg219558
2014-06-01 15:13:05  serhiy.storchaka  set  messages: + msg219505
2014-05-16 05:13:53  cvrebert  set  nosy: + cvrebert
2014-05-14 01:43:13  jrincayc  set  nosy: + jrincayc; messages: + msg218497
2014-05-13 15:36:48  brett.cannon  set  messages: + msg218466
2014-05-13 09:23:35  serhiy.storchaka  set  nosy: + serhiy.storchaka; messages: + msg218421
2014-05-13 05:42:00  Arfrever  set  nosy: + Arfrever
2014-05-13 01:31:51  vstinner  set  messages: + msg218394
2014-05-13 01:28:26  vstinner  set  files: + py2_warn_cmp_bytes_text.patch; keywords: + patch; messages: + msg218393
2014-05-02 14:51:54  brett.cannon  set  messages: + msg217752
2014-05-01 15:14:12  Joshua.J.Cogliati  set  messages: + msg217707
2014-05-01 14:58:59  brett.cannon  set  nosy: + brett.cannon; messages: + msg217703; title: python2 -3 does not warn about str to bytes conversions and comparisons -> python2 -3 does not warn about str/unicode to bytes conversions and comparisons
2014-04-30 18:00:16  Joshua.J.Cogliati  create