Message 184428 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	terry.reedy
Recipients	barry, durin42, gward, ncoghlan, r.david.murray, terry.reedy
Date	2013-03-18.06:17:57
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1363587478.73.0.248187729813.issue17445@psf.upfronthosting.co.za>
In-reply-to

Content

Changing behavior that already matches the docs is an enhancement, not a bugfix, and one that will almost certainly break code. It is therefore one that would normally require a deprecation period. I think the most you should ask for is to skip the deprecation period.

I believe the urllib and difflib problems are quite different. I am going to presume that urllib simply converts bytes input to str and goes on from there, returning the result as str rather than (possibly) converting back to bytes. That is an example of this issue.
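A small illustration of the bytes-in, str-out pattern presumed above, using urllib.parse.quote, which accepts either str or bytes but always returns str. This is only one urllib function; whether it is the one behind the behaviour discussed in this issue is an assumption, and other urllib.parse functions (urlparse, for example) do return bytes for bytes input.

```python
from urllib.parse import quote

# quote() accepts str or bytes input, but always returns str.
result = quote(b'a b')
print(type(result).__name__, result)  # str a%20b
```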

difflib.unified_diff, on the other hand, raises an exception rather than returning an unexpected or undesired type. The three sections like the one below have two problems when given the toy input of two bytes objects.

            if tag in {'replace', 'delete'}:
                for line in a[i1:i2]:
                    yield '-' + line

First, iterating bytes or a slice of bytes yields ints, not 1-byte bytes objects, so '-' + line raises a TypeError. Even if that were worked around, the mixed str constant + bytes expression would still raise a TypeError. One fix for both problems would be to change the expression to '-' + str(line).
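A quick check of both problems described above against the Python 3 standard library; the variable names are illustrative only. It also shows what the str(line) workaround produces: it avoids the exception, but the text it yields is the decimal value of the byte (or the bytes repr), not the byte itself.

```python
import difflib

chunk = b'ab'

# Iterating bytes, or a slice of bytes, yields ints rather than 1-byte bytes.
as_ints = list(chunk[0:2])
print(as_ints)  # [97, 98]

# Concatenating a str constant with an int or with bytes raises TypeError.
raised = []
for line in (chunk[0], chunk[0:1]):
    try:
        '-' + line
    except TypeError as exc:
        raised.append(type(line).__name__)
        print('TypeError for', type(line).__name__, '->', exc)

# The str() workaround avoids the exception, but does not recover the character.
print('-' + str(chunk[0]))    # -97
print('-' + str(chunk[0:1]))  # -b'a'

# unified_diff is a generator, so its TypeError appears only once it is consumed.
gen = difflib.unified_diff(b'ab', b'c')
try:
    list(gen)
except TypeError as exc:
    raised.append('unified_diff')
    print('unified_diff ->', exc)
```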

Neither of these problems is a bug. The docs say "Compare a and b (lists of strings)". Actually, 'sequence of strings' is sufficient. For the operations of unified_diff, a string looks like a sequence of 1-char strings, which is why

>>> for l in difflib.unified_diff('ab', 'c'): print(l)

--- 

+++ 

@@ -1,2 +1 @@

-a
-b
+c

works.
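For comparison, the documented usage passes lists of lines. A minimal sketch, assuming the inputs are built with str.splitlines(keepends=True) so that each line carries its own newline; the file names 'old' and 'new' are placeholders.

```python
import difflib

# Documented usage: lists of str lines, each ending in a newline.
old = 'a\nb\n'.splitlines(keepends=True)
new = 'c\n'.splitlines(keepends=True)

diff = list(difflib.unified_diff(old, new, fromfile='old', tofile='new'))
print(''.join(diff), end='')
```

Because every input line already ends in '\n', the default lineterm='\n' on the header lines gives a clean patch without the blank lines seen in the interactive session.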

The other lines yielded by unified_diff (the ---, +++ and @@ headers) are produced with str.format, which bytes lacks, and % formatting does not seem to work with bytes either. So a dual string/bytes function would not be completely trivial.
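The formatting point can be checked directly: bytes has no format() method. Note that this message predates Python 3.5, where PEP 461 added printf-style % formatting for bytes, so on current versions that part of the remark no longer holds. The header below is only an illustration of an @@ line, not how difflib builds it.

```python
# bytes has no format() method, so the str.format calls that build the
# '---', '+++' and '@@' header lines cannot simply be reused for bytes.
print(hasattr(str, 'format'), hasattr(bytes, 'format'))  # True False

# Since Python 3.5 (PEP 461), printf-style % formatting works on bytes.
header = b'@@ -%d,%d +%d @@\n' % (1, 2, 1)
print(header)  # b'@@ -1,2 +1 @@\n'
```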

Greg, can you convert bytes to strings, or strings to bytes, for your tests, or do you have non-ASCII codes in your bytes? Otherwise, I think it might be better to write a new function 'unified_diff_bytes' that does exactly what you want than to try to make unified_diff accept sequences of bytes.
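One possible shape for the 'unified_diff_bytes' idea, sketched under the assumption that a lossless bytes-to-str round trip is acceptable: decode every line with latin-1 (which maps each of the 256 byte values to a single code point), run the existing unified_diff, and encode the output back. The function name comes from the message above, but its signature here is hypothetical. For reference, Python 3.5 later added difflib.diff_bytes(dfunc, a, b, ...), which takes a similar decode/re-encode approach using ASCII with the surrogateescape error handler.

```python
import difflib


def unified_diff_bytes(a, b, fromfile=b'', tofile=b'', n=3, lineterm=b'\n'):
    """Hypothetical bytes wrapper around difflib.unified_diff (a sketch).

    latin-1 maps each byte value 0-255 to exactly one code point, so
    decoding and re-encoding is lossless even for non-ASCII bytes.
    """
    def to_str(seq):
        return [line.decode('latin-1') for line in seq]

    for line in difflib.unified_diff(
            to_str(a), to_str(b),
            fromfile=fromfile.decode('latin-1'),
            tofile=tofile.decode('latin-1'),
            n=n, lineterm=lineterm.decode('latin-1')):
        yield line.encode('latin-1')


old = [b'a\n', b'\xff\n']  # the second line is neither valid ASCII nor UTF-8
new = [b'c\n']
diff = list(unified_diff_bytes(old, new, b'old', b'new'))
for line in diff:
    print(line)
```

Because a byte such as b'\xff' survives the round trip unchanged, this sketch also covers the non-ASCII case asked about above.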

History
Date	User	Action	Args
2013-03-18 06:17:58	terry.reedy	set	recipients: + terry.reedy, barry, gward, ncoghlan, durin42, r.david.murray
2013-03-18 06:17:58	terry.reedy	set	messageid: <1363587478.73.0.248187729813.issue17445@psf.upfronthosting.co.za>
2013-03-18 06:17:58	terry.reedy	link	issue17445 messages
2013-03-18 06:17:57	terry.reedy	create