classification
Title: readline not implemented for UTF-16
Type: enhancement Stage:
Components: Unicode Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lemburg Nosy List: benjamin.peterson, bob.ippolito, jimjjewett, lemburg
Priority: low Keywords:

Created on 2004-03-21 22:37 by bob.ippolito, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
utf16reader.py bob.ippolito, 2004-05-18 23:22 monkeypatch to get utf16 readline support
utf16reader.py bob.ippolito, 2004-05-19 18:38 second revision of monkeypatch
Messages (11)
msg54114 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2004-03-21 22:37
The StreamReaders for UTF-16 (all three of them) don't
implement readline.
msg54115 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2004-03-21 22:44
Patches are welcome :-)
msg54116 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2004-03-21 22:54
I don't need it enough to write a patch, but this is what I used instead,
and it seems like it might work:

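    try:
        for line in inFile:
            tline = translator(line)
            outFile.write(tline)
    except NotImplementedError:
        # Fallback: readline is unavailable, so read decoded text in
        # chunks and split on LF by hand.
        BUFFER = 16384
        buf = inFile.read(BUFFER)
        while buf:
            lines = buf.split(u'\n')
            # The last piece may be an incomplete line; carry it over.
            buf = lines.pop()
            for line in lines:
                # split() drops the LF, so restore it to match the
                # lines yielded by the iterator branch above.
                tline = translator(line + u'\n')
                outFile.write(tline)
            newdata = inFile.read(BUFFER)
            buf += newdata
            if not newdata and buf:
                # EOF with a partial last line: terminate it so the
                # next pass flushes it.
                buf += u'\n'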
msg54117 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2004-05-18 23:22
I've attached a monkeypatch to get readline support for the UTF-16 codecs:

import utf16reader
utf16reader.install()

It can be trivially inserted into the utf16 encodings implementation. It
would be really cool if someone would audit the implementation and
sneak it in before Python 2.4 :)
msg54118 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2004-05-19 08:19
Thanks for the patch. Some comments:

* Unicode has a lot more line-end markers than just LF;
  you should use .splitlines() to break lines at all of them

* please collapse both methods (sized + unsized) into
  one method and default to 256 bytes for the buffer
  size
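
To illustrate the first comment, here is a minimal sketch (not taken from the attached utf16reader.py; the sample string is made up) comparing a split on LF alone with splitlines(), which for Unicode strings also breaks on CR LF, NEL (U+0085), and LINE SEPARATOR (U+2028):

```python
# Sample text is an assumption for illustration: four different line-end markers.
text = u"one\ntwo\r\nthree\u2028four\x85five"

# Splitting on LF only leaves a stray CR and misses the other markers.
lf_only = text.split(u"\n")

# splitlines() breaks at every Unicode line boundary and drops the markers.
all_breaks = text.splitlines()

# Passing True keeps the line ends, which a readline implementation needs.
with_ends = text.splitlines(True)

print(lf_only)
print(all_breaks)
print(with_ends)
```
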
msg54119 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2004-05-19 18:38
Attaching a revised monkeypatch:
* splitlines is used (I wasn't aware of the other Unicode EOL markers)
* 256 bytes is the new default buffer size

Why do you want sized and unsized to be in the same function?  They're 
both dispatched from readline as appropriate, and they are very different 
code paths.  It would be much uglier as one function, so I'm not going to 
do it in my own code.
msg54120 - (view) Author: Jim Jewett (jimjjewett) Date: 2004-05-19 23:10
It might be just an upload/download quirk, but when I tried, 
this concatenated short lines.  u"\n".join(...) worked better, 
but I'm not sure how that plays with other line breaks.  

It might work better to wrap the readline functions in a class,
so that self.buff can always be a (state-preserved) list: just
return the first row until the list length gets down to one,
then concatenate new data onto that last row and resplit.
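
One way to read that suggestion is the sketch below. It is only an illustration, not the attached utf16reader.py: the class name LineBuffer, the chunk_size parameter, and the use of io.StringIO as a stand-in for a decoded stream are all assumptions.

```python
import io


class LineBuffer(object):
    """Hypothetical helper: buffered readline over any text read(n) method."""

    def __init__(self, stream, chunk_size=256):
        self.stream = stream          # object whose read(n) returns decoded text
        self.chunk_size = chunk_size  # 256 follows the default suggested earlier
        self.buff = [u""]             # pending rows; the last one may be partial

    def readline(self):
        # Refill while only the (possibly incomplete) last row is left.
        while len(self.buff) == 1:
            data = self.stream.read(self.chunk_size)
            if not data:
                # EOF: the leftover row is the final line (empty when done).
                line, self.buff = self.buff[0], [u""]
                return line
            # Concatenate onto the partial row and resplit, keeping line ends.
            self.buff = (self.buff[0] + data).splitlines(True)
        return self.buff.pop(0)


# Usage with a tiny chunk size so lines and a CR LF pair straddle chunks.
reader = LineBuffer(io.StringIO(u"one\ntwo\r\nthree\u2028four"), chunk_size=3)
lines = []
while True:
    line = reader.readline()
    if not line:
        break
    lines.append(line)
print(lines)
```

Holding back the last row until more data arrives is what keeps a CR LF pair that straddles two chunks from being reported as two separate line ends.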
msg54121 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2004-05-26 19:41
I don't have time to review this now, but will get back to
it after EuroPython if you ping me. Thanks.
msg54122 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2004-05-26 19:46
Can you please give an example of a case where short lines get 
concatenated?  I can't fix it if I don't know what's wrong.
msg54123 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2004-05-26 19:52
Also, I've moved the latest copy of the code to my public repository at:
http://svn.red-bean.com/bob/unicode/trunk/utf16reader.py

This should be free of any quirks, but I still can't reproduce whatever
problem Jim is having.
msg65415 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-04-12 20:23
It seems this is no longer true: the UTF-16 StreamReaders now appear to implement readline.
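
A quick way to check this on a current interpreter is sketched below. It assumes Python 3 and uses an in-memory io.BytesIO with made-up sample text in place of a real file.

```python
import codecs
import io

# Sample data is an assumption; encoding with "utf-16" prepends a BOM.
data = u"first\nsecond\n".encode("utf-16")

# Wrap the byte stream in the UTF-16 StreamReader from the codecs registry.
reader = codecs.getreader("utf-16")(io.BytesIO(data))

first = reader.readline()
second = reader.readline()
at_eof = reader.readline()

print(repr(first), repr(second), repr(at_eof))
```
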
History
Date                 User               Action  Args
2022-04-11 14:56:03  admin              set     github: 40061
2008-04-12 20:23:54  benjamin.peterson  set     status: open -> closed; resolution: fixed; messages: + msg65415; nosy: + benjamin.peterson
2004-03-21 22:37:17  bob.ippolito       create