Author terry.reedy
Recipients barry, durin42, gward, ncoghlan, r.david.murray, terry.reedy
Date 2013-03-18.06:17:57
Message-id <1363587478.73.0.248187729813.issue17445@psf.upfronthosting.co.za>
Content
Changing behavior that already matches the docs is an enhancement, not a bugfix, and one that will almost certainly break code. It is therefore one that would normally require a deprecation period. I think the most you should ask for is to skip the deprecation period.

I believe the urllib and difflib problems are quite different. I am going to presume that urllib simply converts bytes input to str and goes on from there, returning the result as str rather than (possibly) converting back to bytes. That is an example for this issue.

difflib.unified_diff, on the other hand, raises rather than returning an unexpected or undesired type. The three sections like the one below have two problems given the toy input of two bytes objects.

            if tag in {'replace', 'delete'}:
                for line in a[i1:i2]:
                    yield '-' + line

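First, iterating over bytes or a slice of bytes yields ints, not 1-byte bytes objects, so '-' + line raises a TypeError. Second, even if that were worked around, the mixed string constant + bytes expression would also raise a TypeError. One change that would avoid both exceptions is to rewrite the expression as '-' + str(line), although the yielded text would then contain the str() of an int or bytes object rather than the original characters.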
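The two failure modes can be checked in isolation. This is a minimal sketch assuming Python 3; the variable names are illustrative and not taken from difflib.

```python
data = b'ab'

# Iterating a bytes object (or a slice of one) yields ints, not 1-byte bytes.
items = list(data[0:2])
print(items)

# A str constant plus an int raises TypeError.
try:
    '-' + items[0]
    int_concat_failed = False
except TypeError:
    int_concat_failed = True

# A str constant plus a bytes object raises TypeError as well.
try:
    '-' + b'a'
    bytes_concat_failed = False
except TypeError:
    bytes_concat_failed = True

print(int_concat_failed, bytes_concat_failed)
```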

Neither of these problems is a bug. The doc says "Compare a and b (lists of strings)". Actually, 'sequence of strings' is sufficient. For the operations of unified_diff, a string looks like a sequence of 1-char strings, which is why

>>> for l in difflib.unified_diff('ab', 'c'): print(l)

--- 

+++ 

@@ -1,2 +1 @@

-a
-b
+c

works.
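As a rough check of the 'sequence of strings' reading, the sketch below compares the same toy input passed as plain strings and as lists of 1-char strings. It passes lineterm='' so that the header lines carry no trailing newline; that choice is mine, made only to keep the comparison readable.

```python
import difflib

# A str behaves like a sequence of 1-char strings for unified_diff.
from_strings = list(difflib.unified_diff('ab', 'c', lineterm=''))

# The equivalent input spelled out as lists of strings.
from_lists = list(difflib.unified_diff(['a', 'b'], ['c'], lineterm=''))

for line in from_strings:
    print(line)
```

Both calls yield the same six lines; the blank lines in the interactive output above come from print() adding a newline after header lines that already end with the default lineterm.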

The other lines yielded by unified_diff are produced with str.format, which bytes does not have, and % formatting does not seem to work with bytes either. So a dual string/bytes function would not be completely trivial.

Greg, can you convert bytes to strings, or strings to bytes, for your tests, or do you have non-ASCII codes in your bytes? Otherwise, I think it might be better to write a new function 'unified_diff_bytes' that does exactly what you want than to try to make unified_diff accept sequences of bytes.
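One possible shape for such a function is sketched below. It is only an illustration of the convert-and-convert-back idea: the name unified_diff_bytes, its signature, and the use of the 'surrogateescape' error handler to carry non-ASCII bytes through str are all assumptions, not an existing difflib API.

```python
import difflib


def unified_diff_bytes(a, b, fromfile=b'', tofile=b'', n=3, lineterm=b'\n'):
    """Hypothetical helper: diff two sequences of bytes lines, yielding bytes."""

    def to_str(value):
        # 'surrogateescape' maps each non-ASCII byte to a lone surrogate,
        # so arbitrary bytes survive the round trip through str.
        return value.decode('ascii', 'surrogateescape')

    lines = difflib.unified_diff(
        [to_str(line) for line in a],
        [to_str(line) for line in b],
        to_str(fromfile),
        to_str(tofile),
        n=n,
        lineterm=to_str(lineterm),
    )
    for line in lines:
        # Convert each yielded str line back to the original byte values.
        yield line.encode('ascii', 'surrogateescape')


result = list(unified_diff_bytes([b'a\n', b'\xff\n'], [b'c\n']))
for line in result:
    print(line)
```

Keeping the conversion in a separate wrapper leaves unified_diff itself untouched, so existing callers that pass strings would see no change in behavior.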