Message 258634 - Python tracker

Author	eryksun
Recipients	Zero, benjamin.peterson, docs@python, eryksun, fornax, martin.panter, pitrou, serhiy.storchaka, socketpair, steve.dower, stutzbach
Date	2016-01-19.23:34:45
Message-id	<1453246486.06.0.790685821146.issue26158@psf.upfronthosting.co.za>
In-reply-to

Content
FYI, you can parse the cookie using struct or ctypes. For example:

    import ctypes
    import sys

    # Fields of the opaque cookie that a text file's tell() returns.
    class Cookie(ctypes.Structure):
        _fields_ = (('start_pos',     ctypes.c_longlong),
                    ('dec_flags',     ctypes.c_int),
                    ('bytes_to_feed', ctypes.c_int),
                    ('chars_to_skip', ctypes.c_int),
                    ('need_eof',      ctypes.c_byte))
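
For the struct route, a minimal sketch might look like the following. It assumes the cookie uses the packed little-endian layout that the field list above implies (8 + 4 + 4 + 4 + 1 = 21 bytes), which is a CPython implementation detail rather than a documented format. The names `COOKIE_FORMAT`, `CookieState` and `parse_cookie` are made up for illustration.

```python
import struct
from collections import namedtuple

# Assumption: packed little-endian layout matching the Cookie fields above.
COOKIE_FORMAT = '<qiiib'

CookieState = namedtuple(
    'CookieState',
    'start_pos dec_flags bytes_to_feed chars_to_skip need_eof')

def parse_cookie(cookie):
    """Split the integer returned by a text file's tell() into its fields."""
    raw = cookie.to_bytes(struct.calcsize(COOKIE_FORMAT), 'little')
    return CookieState._make(struct.unpack(COOKIE_FORMAT, raw))
```

For a plain byte offset such as `parse_cookie(1234)`, only `start_pos` should come back non-zero.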

In the simple case only start_pos is non-zero, so the result of tell() is just the 64-bit file pointer. In Serhiy's UTF-7 example the cookie also needs to convey the bytes_to_feed and chars_to_skip values:

    >>> f.tell()
    680564735109527527154978616360239628288
    >>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> state.start_pos
    0
    >>> state.dec_flags
    0
    >>> state.bytes_to_feed
    16
    >>> state.chars_to_skip
    2
    >>> state.need_eof
    0

So a seek(0, SEEK_CUR) in this case has to seek the buffer to 0, read and decode 16 bytes, and skip 2 characters. 
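
Those three steps can be sketched against an in-memory buffer. This is only an illustration of the idea, not how TextIOWrapper is implemented: it assumes dec_flags is 0, so decoding can start from a fresh decoder state, and the helper name `decoded_after_restore` is hypothetical.

```python
import io

def decoded_after_restore(buffer, start_pos, bytes_to_feed, chars_to_skip,
                          encoding):
    """Return the decoded characters that follow the restored position.

    Assumption: dec_flags is 0, so decoding starts from a fresh state.
    """
    buffer.seek(start_pos)                      # back to the safe start point
    decoded = buffer.read(bytes_to_feed).decode(encoding)
    return decoded[chars_to_skip:]              # drop what was already read

raw = io.BytesIO(b'+BDAEMQQyBDMENA-')           # the five characters 'абвгд'
pending = decoded_after_restore(raw, 0, 16, 2, 'utf-7')
```

Here `pending` holds the three characters that a later read() would return first, and the underlying buffer is left positioned after the 16 bytes that were fed to the decoder.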

Isn't this solvable at least for the case of truncating, Martin? It could do a tell(), seek to the start_pos, read and decode the bytes_to_feed, re-encode the chars_to_skip, seek back to the start_pos, write the encoded characters, and then truncate.

    >>> f = open('temp.txt', 'w+', encoding='utf-7')
    >>> f.write(b'+BDAEMQQyBDMENA-'.decode('utf-7'))
    5
    >>> _ = f.seek(0); f.read(2)
    'аб'
    >>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> f.buffer.seek(state.start_pos)
    0
    >>> buf = f.buffer.read(state.bytes_to_feed)
    >>> s = buf.decode(f.encoding)[:state.chars_to_skip]
    >>> f.buffer.seek(state.start_pos)
    0
    >>> f.buffer.write(s.encode(f.encoding))
    8
    >>> f.buffer.truncate()
    8
    >>> f.close()
    >>> open('temp.txt', encoding='utf-7').read()
    'аб'
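
The steps in the session above could be collected into a helper along these lines. It is a sketch under the same assumptions as the session: dec_flags is 0, the cookie fields have already been parsed, and the codec can encode the kept characters on their own. The name `truncate_at_text_position` is hypothetical, and an in-memory buffer stands in for `f.buffer`.

```python
import io

def truncate_at_text_position(buffer, start_pos, bytes_to_feed,
                              chars_to_skip, encoding):
    """Truncate a binary buffer at a text position given by cookie fields.

    Assumption: dec_flags is 0 and the kept characters can be encoded on
    their own, which holds for this UTF-7 example but maybe not elsewhere.
    """
    buffer.seek(start_pos)
    chunk = buffer.read(bytes_to_feed)
    kept = chunk.decode(encoding)[:chars_to_skip]   # characters before the cut
    buffer.seek(start_pos)
    buffer.write(kept.encode(encoding))             # re-terminated sequence
    return buffer.truncate()                        # new size in bytes

raw = io.BytesIO(b'+BDAEMQQyBDMENA-')
new_size = truncate_at_text_position(raw, 0, 16, 2, 'utf-7')
```

Passing the fields explicitly keeps the helper independent of how the cookie was parsed, so either the ctypes structure or a struct-based parser could supply them.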

Rewriting the encoded bytes is necessary to properly terminate the UTF-7 sequence, which makes me doubt whether this simple approach will work for all codecs. But something like this is possible, no?
