Author gward
Recipients barry, durin42, gward, ncoghlan, r.david.murray, terry.reedy
Date 2013-03-18.18:50:26
Message-id <1363632627.1.0.872503303943.issue17445@psf.upfronthosting.co.za>
In-reply-to
Content
Replying to Terry Reedy:
> So a dual string/bytes function would not be completely trivial.

Correct. I have one working, but it makes my eyes bleed. I feel ashamed to have written it.

> Greg, can you convert bytes to strings, or strings to bytes

Nope. Here is the hypothetical use case: I have a text file written in Polish, encoded in ISO-8859-2, committed to a Mercurial repository. (Or saved in a filesystem somewhere: doesn't really matter, except that Mercurial repositories are immutable, long-term, and *must* *not* *lose* *data*.) Then I decide I should play nicely with the rest of the world and transcode to UTF-8, so I commit a new rev in UTF-8.

Years later, I need to look at the diff between those two old revisions. Rev 1 is a pile of ISO-8859-2 bytes, and rev 2 is a pile of UTF-8 bytes. The output of diff looks like

  - blah blah [iso-8859-2 bytes] blah
  + blah blah [utf-8 bytes] blah

Note this: the output of diff has some lines that are iso-8859-2 bytes and some that are utf-8 bytes. *There is no single encoding* that applies.

Note also that diff output must contain the exact original bytes, so that it can be consumed by patch. Diffs are read both by humans and by machines.
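
To make the problem concrete, here is a small sketch in Python 3. The sample text and the helper name decodes_as are hypothetical, made up for illustration; only the two encodings come from the use case above. It shows that the combined diff lines are not valid UTF-8, and that reading the UTF-8 line as ISO-8859-2 gives garbage rather than the original text.

```python
# Hypothetical sample line; any Polish text with non-ASCII letters would do.
text = "blah zażółć gęślą jaźń blah\n"

rev1 = text.encode("iso-8859-2")  # bytes as stored in the old revision
rev2 = text.encode("utf-8")       # bytes as stored in the transcoded revision


def decodes_as(data, encoding):
    """Return True if data can be decoded strictly with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The diff body mixes lines from both revisions.
joined = b"- " + rev1 + b"+ " + rev2

# UTF-8 cannot decode the whole diff: the ISO-8859-2 line is invalid UTF-8.
utf8_ok = decodes_as(joined, "utf-8")

# Treating the UTF-8 line as ISO-8859-2 does not recover the text either.
mojibake = rev2.decode("iso-8859-2", errors="replace")
```

Whichever single encoding is picked, at least one of the two lines comes out wrong, so any decode step has to be reversible byte for byte rather than "correct".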

> Otherwise, I think it might be better to write a new function 
> 'unified_diff_bytes' that did exactly what you want than to try to 
> make unified_diff accept sequences of bytes.

Good idea. That might be much less revolting than what I have now. I'll give it a shot.
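
For illustration only, one way such a function could be sketched, assuming Python 3: wrap difflib.unified_diff, map every byte string to str with the "surrogateescape" error handler (which round-trips arbitrary bytes), and map the output back. The name unified_diff_bytes is the one suggested above; the nested helpers to_str and to_bytes are hypothetical, and this is not necessarily what the eventual patch looks like.

```python
import difflib


def unified_diff_bytes(a, b, fromfile=b"", tofile=b"", n=3, lineterm=b"\n"):
    """Sketch: unified diff over lists of bytes lines, yielding bytes lines.

    No real encoding is assumed. Non-ASCII bytes become lone surrogates
    on the way in and are restored exactly on the way out.
    """

    def to_str(data):
        return data.decode("ascii", "surrogateescape")

    def to_bytes(line):
        return line.encode("ascii", "surrogateescape")

    lines = difflib.unified_diff(
        [to_str(x) for x in a],
        [to_str(x) for x in b],
        fromfile=to_str(fromfile),
        tofile=to_str(tofile),
        n=n,
        lineterm=to_str(lineterm),
    )
    for line in lines:
        yield to_bytes(line)


# Usage with the use case above: same text, two encodings.
old = ["blah zażółć blah\n".encode("iso-8859-2")]
new = ["blah zażółć blah\n".encode("utf-8")]
out = list(unified_diff_bytes(old, new, b"rev1", b"rev2"))
```

The "-" line carries the original ISO-8859-2 bytes and the "+" line the original UTF-8 bytes, so the output stays usable by patch.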