
Author amaury.forgeotdarc
Recipients BreamoreBoy, amaury.forgeotdarc, benjamin.peterson, brian.curtin, mrabarnett, nnorwitz, pitrou, rhettinger, runedevik, tim.golden
Date 2010-09-17.20:11:21
Message-id <1284754284.21.0.0958011972447.issue1744752@psf.upfronthosting.co.za>
In-reply-to
Content
I think there's actually a bug in the MSVCRT read() function, and it was not too hard to spot (see explanation below).  In short, when a CRLF file is opened in text mode, a newline may be skipped once the file position is past 4GB.

I'm re-closing the issue as "won't fix". There's really nothing we can do about it.  But note that Python 3.x is not affected (raw files are always opened in binary mode and CRLF translation is done by Python); with 2.7, you may use io.open().
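
A minimal sketch of the io.open() workaround for 2.7 mentioned above. The helper names and the tiny sample file are made up for illustration; an actual reproduction would need a CRLF file larger than 4GB. The point is only that io.open() reads the raw file in binary mode and does the CRLF translation in Python, so the CRT text-mode path is not involved.

```python
import io
import os
import tempfile


def make_sample():
    """Write a small CRLF file in binary mode (hypothetical stand-in data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"first\r\nsecond\r\nthird\r\n")
    return path


def count_lines(path):
    """Count lines with io.open() instead of the builtin 2.7 open().

    newline=None (the default) turns on universal newlines, so "\r\n"
    is translated to "\n" by the io module, not by the C runtime.
    """
    count = 0
    with io.open(path, "r", encoding="ascii", newline=None) as f:
        for _line in f:
            count += 1
    return count
```

On 3.x the builtin open() is the same function as io.open(), so nothing needs to change there.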

Other issues (issue1142, issue1672853, issue1451466) also report the same end-of-line problem on Windows; I just searched for "windows gb" in the tracker. I'll close them as well.

Now, the explanation of the bug; it's not easy to reproduce because it depends both on the internal FILE buffer size and the number of chars passed to fread().
In the Microsoft CRT source code, in open.c, there is a block starting with this encouraging comment "This is the hard part.  We found a CR at end of buffer.  We must peek ahead to see if next char is an LF."
Oddly, there is an almost exact copy of this function in Perl source code:
http://perl5.git.perl.org/perl.git/blob/4342f4d6df6a7dfa22a470aa21e54a5622c009f3:/win32/win32.c#l3668
The problem is in the call to SetFilePointer(), used to step back one position after the lookahead; past 4GB it fails, because it cannot return the new position in a 32-bit DWORD. [The fix is easy; do you see it?]
At this point, the function thinks that the next read() will return the LF, but it won't because the file pointer was not moved back.
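
To make the sequence above concrete, here is a toy model in Python. It is not the CRT code: FakeFile, step_back_one() and read_text() are invented names, and the 4GB offset is only pretended through a start_offset parameter. It models just the behaviour described above: a CR at the end of the buffer, a one-byte lookahead, and a step back whose failure goes unnoticed.

```python
import io

DWORD_LIMIT = 2 ** 32  # positions at or above this do not fit in 32 bits


class FakeFile(object):
    """Hypothetical stand-in for a file handle; not a real Windows API."""

    def __init__(self, data, start_offset=0):
        self._stream = io.BytesIO(data)
        self._start = start_offset  # pretend the data begins this far into a file

    def read(self, n):
        return self._stream.read(n)

    def step_back_one(self):
        """Model of a relative seek that must report a 32-bit position.

        Returns False and leaves the pointer alone when the new position
        does not fit.
        """
        new_position = self._start + self._stream.tell() - 1
        if new_position >= DWORD_LIMIT:
            return False
        self._stream.seek(-1, io.SEEK_CUR)
        return True


def read_text(f, n):
    """Simplified CRLF-translating read of up to n raw bytes."""
    raw = f.read(n)
    out = bytearray()
    i = 0
    while i < len(raw):
        byte = raw[i:i + 1]
        if byte != b"\r":
            out += byte
        elif i + 1 < len(raw):
            # CR inside the buffer: drop it only when an LF follows.
            if raw[i + 1:i + 2] != b"\n":
                out += byte
        else:
            # CR is the last byte of the buffer: peek one byte ahead.
            peek = f.read(1)
            if peek == b"":
                out += byte        # end of file, keep the CR
            elif peek == b"\n" and not out:
                out += b"\n"       # nothing emitted yet: emit the LF, no step back
            else:
                f.step_back_one()  # result ignored: this is the modelled bug
                if peek != b"\n":
                    out += byte
                # otherwise the CR is dropped and the LF is expected next time
        i += 1
    return bytes(out)


def read_all(f, chunk):
    """Read to end of file in chunks, as a caller of read_text() would."""
    parts = []
    while True:
        part = read_text(f, chunk)
        if not part:
            break
        parts.append(part)
    return b"".join(parts)
```

With the data b"ab\r\ncd\r\n" and a chunk size of 3, each CR lands at the end of a buffer. Read with start_offset=0 the model returns b"ab\ncd\n"; with start_offset=2**32 the step back fails silently and the result is b"abcd", i.e. the newlines are gone.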
History
Date User Action Args
2010-09-17 20:11:24 amaury.forgeotdarc set recipients: + amaury.forgeotdarc, nnorwitz, rhettinger, pitrou, runedevik, tim.golden, benjamin.peterson, mrabarnett, brian.curtin, BreamoreBoy
2010-09-17 20:11:24 amaury.forgeotdarc set messageid: <1284754284.21.0.0958011972447.issue1744752@psf.upfronthosting.co.za>
2010-09-17 20:11:22 amaury.forgeotdarc link issue1744752 messages
2010-09-17 20:11:21 amaury.forgeotdarc create