Message 223491 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	abarnert
Recipients	Douglas.Alan, abarnert, amaury.forgeotdarc, benjamin.peterson, eric.araujo, facundobatista, georg.brandl, jcon, ncoghlan, nessus42, pitrou, r.david.murray, ralph.corderoy, rhettinger, ysj.ray
Date	2014-07-20.00:41:33
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1405816895.54.0.705421525632.issue1152248@psf.upfronthosting.co.za>
In-reply-to

Content

While we're at it, Douglas Alan's solution wouldn't be ideal even if it were a builtin. A fileLineIter doesn't support the stream API, so you end up with two objects that share the same file but have separate buffers and out-of-sync file pointers. It's also a lot slower.
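To make the buffering problem concrete, here is a minimal sketch. fileLineIter's code isn't reproduced in this message, so `chunked_records` below is a hypothetical stand-in that, like any chunk-reading iterator, reads fixed-size chunks and keeps its own leftover buffer; `io.StringIO` stands in for a real file. After pulling a single record, the file object has already advanced a whole chunk, so reading from `f` directly skips records that are sitting in the iterator's buffer.

```python
import io
from functools import partial


def chunked_records(f, separator, chunksize=16):
    # Hypothetical stand-in for a fileLineIter-style reader (not its real code):
    # it reads fixed-size chunks from f and buffers the incomplete tail itself.
    buf = ''
    for chunk in iter(partial(f.read, chunksize), ''):
        buf += chunk
        *complete, buf = buf.split(separator)
        yield from complete
    if buf:
        yield buf


f = io.StringIO('a\0b\0c\0' + 'x' * 20)
records = chunked_records(f, '\0')
first = next(records)    # 'a'
# f has already been advanced past the first 16 characters, so a direct
# read skips 'b' and 'c', which live only in the generator's buffer.
rest = f.read()          # 'xxxxxxxxxx'
# The generator still holds 'b', 'c' and its own copy of the x's, so that
# tail is seen twice: once via f.read() and once via the iterator.
leftover = list(records)
```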

That being said, I think it may be useful enough to put in the stdlib, even more so if you pull the resplit-an-iterator-of-strings code out into its own function:

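def resplit(strings, separator):
    """Re-split an iterable of str (or bytes) chunks on separator.

    Whatever follows the last separator in one chunk is carried into the
    next chunk, so a separator may straddle a chunk boundary.
    """
    partialLine = None
    for s in strings:
        if not s:
            # An empty chunk is not necessarily end-of-input (an empty line
            # or packet body, say), so skip it instead of stopping early.
            continue
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        lines = partialLine.split(separator)
        partialLine = lines.pop()  # possibly incomplete last piece
        yield from lines
    if partialLine:
        yield partialLine  # final piece with no trailing separator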
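A quick sanity check of resplit's behavior. The copy of resplit here is self-contained and skips empty chunks (`continue`) rather than stopping at the first one; the inputs are made-up examples, not data from this issue. It shows a two-character separator found across a chunk boundary, bytes input, and an empty chunk in the middle of the stream.

```python
def resplit(strings, separator):
    """Re-split an iterable of str (or bytes) chunks on separator."""
    partialLine = None
    for s in strings:
        if not s:
            continue  # skip empty chunks instead of treating them as EOF
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        lines = partialLine.split(separator)
        partialLine = lines.pop()  # keep the incomplete tail for next time
        yield from lines
    if partialLine:
        yield partialLine


# 'ab\r' is carried over and joined to '\ncd\r\n', so the '\r\n'
# that straddles the boundary is still found.
crlf = list(resplit(['ab\r', '\ncd\r\n', 'ef'], '\r\n'))
# bytes work the same way, provided the separator is bytes too.
nul = list(resplit([b'x\0y', b'\0z'], b'\0'))
# The empty chunk is skipped, and 'a' and 'b' join into one piece.
gap = list(resplit(['a', '', 'b\n'], '\n'))
```

Note that a trailing separator does not produce a final empty piece: `'\ncd\r\n'` ends with the separator, and only `'ef'` follows it.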

Now, you can do this:

from functools import partial

with open('rdm-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')
    lines = (line + '\n' for line in lines)
    # lines is lazy: consume it here, before the with block closes f.

# Or, if you're just going to strip off the newlines anyway:
with open('file-0-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')

# Or, if you have a binary file:
with open('binary-example', 'rb') as f:
    chunks = iter(partial(f.read, 8192), b'')
    lines = resplit(chunks, b'\0')

# Or, if I understand ysj.ray's example:
with open('ysj.ray-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\r\n')
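    # This treats lines as one continuous stream and splits it on '\t',
    # so fields are not cut at line boundaries.
    records = resplit(lines, '\t')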

# Or, if you have something that isn't a file at all
# (packets: any iterable of objects with a string .body):
lines = resplit((packet.body for packet in packets), '\n')
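Here is a runnable version of the file and non-file cases above, as a sketch under some assumptions: `io.StringIO` stands in for a real '\0'-separated file, a deliberately tiny chunk size forces records to straddle chunk boundaries, and `Packet` is a hypothetical namedtuple standing in for whatever object carries a `.body` string.

```python
import io
from collections import namedtuple
from functools import partial


def resplit(strings, separator):
    """Re-split an iterable of str (or bytes) chunks on separator."""
    partialLine = None
    for s in strings:
        if not s:
            continue  # skip empty chunks instead of treating them as EOF
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        lines = partialLine.split(separator)
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine


# File-like source: 4-character reads split every record across chunks.
f = io.StringIO('alpha\0beta\0gamma\0')
chunks = iter(partial(f.read, 4), '')
file_records = list(resplit(chunks, '\0'))

# Non-file source: a hypothetical packet type with a .body attribute.
Packet = namedtuple('Packet', 'body')
packets = [Packet('GET /a\nGET'), Packet(' /b\n'), Packet('GET /c')]
packet_lines = list(resplit((packet.body for packet in packets), '\n'))
```

The same function handles both sources because it only needs an iterable of chunks, whether they come from `iter(callable, sentinel)` over a file or from a generator expression.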

History
Date	User	Action	Args
2014-07-20 00:41:35	abarnert	set	recipients: + abarnert, georg.brandl, rhettinger, facundobatista, amaury.forgeotdarc, ncoghlan, pitrou, benjamin.peterson, nessus42, eric.araujo, ralph.corderoy, r.david.murray, ysj.ray, Douglas.Alan, jcon
2014-07-20 00:41:35	abarnert	set	messageid: <1405816895.54.0.705421525632.issue1152248@psf.upfronthosting.co.za>
2014-07-20 00:41:35	abarnert	link	issue1152248 messages
2014-07-20 00:41:33	abarnert	create