Message184490
Replying to Terry Reedy:
> So a dual string/bytes function would not be completely trivial.
Correct. I have one working, but it makes my eyes bleed. I feel ashamed to have written it.
> Greg, can you convert bytes to strings, or strings to bytes
Nope. Here is the hypothetical use case: I have a text file written in Polish, encoded in ISO-8859-2, committed to a Mercurial repository. (Or saved in a filesystem somewhere: doesn't really matter, except that Mercurial repositories are immutable, long-term, and *must* *not* *lose* *data*.) Then I decide I should play nicely with the rest of the world and transcode to UTF-8, so I commit a new rev in UTF-8.
Years later, I need to look at the diff between those two old revisions. Rev 1 is a pile of ISO-8859-2 bytes, and rev 2 is a pile of UTF-8 bytes. The output of diff looks like
- blah blah [iso-8859-2 bytes] blah
+ blah blah [utf-8 bytes] blah
Note this: the output of diff has some lines that are iso-8859-2 bytes and some that are utf-8 bytes. *There is no single encoding* that applies.
Note also that diff output must contain the exact original bytes, so that it can be consumed by patch. Diffs are read both by humans and by machines.
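To make that concrete, here is a small sketch of the two revisions. The sample sentence and the variable names are made up for illustration; nothing here comes from a real repository.

```python
# The same Polish text, stored as two different piles of bytes.
text = "zażółć gęślą jaźń\n"          # made-up sample line
rev1 = text.encode("iso-8859-2")       # the original revision
rev2 = text.encode("utf-8")            # the transcoded revision

# A diff between the revisions carries both byte sequences verbatim.
diff_output = b"-" + rev1 + b"+" + rev2

# Decoding the whole diff as UTF-8 fails on the ISO-8859-2 line...
try:
    diff_output.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# ...and decoding it as ISO-8859-2 "works" but garbles the UTF-8 line.
as_latin2 = diff_output.decode("iso-8859-2")
plus_line_intact = ("+" + text) in as_latin2
```

Either choice of encoding gets one of the two lines wrong, which is why the diff has to be handled as bytes from end to end.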
> Otherwise, I think it might be better to write a new function
> 'unified_diff_bytes' that did exactly what you want than to try to
> make unified_diff accept sequences of bytes.
Good idea. That might be much less revolting than what I have now. I'll give it a shot.
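A rough sketch of what such a function could look like, and not what I actually have: it assumes that a lossless latin-1 round trip is acceptable, because latin-1 maps every byte to exactly one code point and back. The name `unified_diff_bytes` comes from the suggestion above; the signature is only a guess at something reasonable.

```python
import difflib


def unified_diff_bytes(a, b, fromfile=b"", tofile=b"", n=3):
    """Hypothetical helper: diff two lists of bytes lines, yield bytes lines.

    Every byte is mapped to one code point via latin-1, the existing
    str-based unified_diff does the work, and the result is mapped back.
    The original bytes therefore survive unchanged, whatever their encoding.
    """
    def to_str(data):
        return data.decode("latin-1")

    lines = difflib.unified_diff(
        [to_str(line) for line in a],
        [to_str(line) for line in b],
        fromfile=to_str(fromfile),
        tofile=to_str(tofile),
        n=n,
    )
    for line in lines:
        yield line.encode("latin-1")


# Usage with a made-up sample line in both encodings.
sample = "zażółć gęślą jaźń\n"
old = [b"blah blah\n", sample.encode("iso-8859-2")]
new = [b"blah blah\n", sample.encode("utf-8")]
result = list(unified_diff_bytes(old, new, b"rev1", b"rev2"))
```

The decoded strings are meaningless as text, but the diff machinery only compares lines for equality, so that does not matter here.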
Date | User | Action | Args
2013-03-18 18:50:27 | gward | set | recipients: + gward, barry, terry.reedy, ncoghlan, durin42, r.david.murray
2013-03-18 18:50:27 | gward | set | messageid: <1363632627.1.0.872503303943.issue17445@psf.upfronthosting.co.za>
2013-03-18 18:50:27 | gward | link | issue17445 messages
2013-03-18 18:50:26 | gward | create |