This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author eryksun
Recipients Zero, benjamin.peterson, docs@python, eryksun, fornax, martin.panter, pitrou, serhiy.storchaka, socketpair, steve.dower, stutzbach
Date 2016-01-19.23:34:45
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1453246486.06.0.790685821146.issue26158@psf.upfronthosting.co.za>
In-reply-to
Content
FYI, you can parse the cookie using struct or ctypes. For example:

    class Cookie(ctypes.Structure):
        _fields_ = (('start_pos',     ctypes.c_longlong),
                    ('dec_flags',     ctypes.c_int),
                    ('bytes_to_feed', ctypes.c_int),
                    ('chars_to_skip', ctypes.c_int),
                    ('need_eof',      ctypes.c_byte))

In the simple case only the buffer start_pos is non-zero, and the result of tell() is just the 64-bit file pointer. In Serhiy's UTF-7 example it needs to also convey the bytes_to_feed and chars_to_skip values:

    >>> f.tell()
    680564735109527527154978616360239628288
    >>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> state.start_pos
    0
    >>> state.dec_flags
    0
    >>> state.bytes_to_feed
    16
    >>> state.chars_to_skip
    2
    >>> state.need_eof
    0

So a seek(0, SEEK_CUR) in this case has to seek the buffer to 0, read and decode 16 bytes, and skip 2 characters. 

Isn't this solvable at least for the case of truncating, Martin? It could do a tell(), seek to the start_pos, read and decode the bytes_to_feed, re-encode the chars_to_skip, seek back to the start_pos, write the encoded characters, and then truncate.

    >>> f = open('temp.txt', 'w+', encoding='utf-7')
    >>> f.write(b'+BDAEMQQyBDMENA-'.decode('utf-7'))
    5
    >>> _ = f.seek(0); f.read(2)
    'аб'
    >>> cookie_bytes = f.tell().to_bytes(sizeof(Cookie), byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> f.buffer.seek(state.start_pos)
    0
    >>> buf = f.buffer.read(state.bytes_to_feed)
    >>> s = buf.decode(f.encoding)[:state.chars_to_skip]
    >>> f.buffer.seek(state.start_pos)
    0
    >>> f.buffer.write(s.encode(f.encoding))
    8
    >>> f.buffer.truncate()
    8
    >>> f.close()
    >>> open('temp.txt', encoding='utf-7').read()
    'аб'

Rewriting the encoded bytes is necessary to properly terminate the UTF-7 sequence, which makes me doubt whether this simple approach will work for all codecs. But something like this is possible, no?
History
Date User Action Args
2016-01-19 23:34:46eryksunsetrecipients: + eryksun, pitrou, benjamin.peterson, stutzbach, Zero, docs@python, socketpair, martin.panter, serhiy.storchaka, steve.dower, fornax
2016-01-19 23:34:46eryksunsetmessageid: <1453246486.06.0.790685821146.issue26158@psf.upfronthosting.co.za>
2016-01-19 23:34:46eryksunlinkissue26158 messages
2016-01-19 23:34:45eryksuncreate