Title: file object method .tell() sometimes returns large number when position is right before a line break
msg337148 - Author: Erik Wennstrom (erwenn) Date: 2019-03-04 20:45
Sometimes, when the position on a text file object is right before a line break, the file object method .tell() returns a bizarre large number (18446744073709551621) instead of the correct position.

The incorrect behavior occurs consistently for certain text files, but sometimes, a slight modification of the file will cause the behavior to revert to normal.

I can get this behavior in both Python 3.7.2 and 3.6.5. I've seen it on two different Windows X machines.

I've included two sample text files and a program that tests them both with the same code, which opens the file, reads 4 characters from the file, and then prints the result of the .tell() method. Both should print 4, but one of them prints 18446744073709551621. The only difference between the text files is that one of them has a single extra character before the last line break (which I should note is several lines away from the line where the weird behavior occurs).

Frankly, I don't even have a sliver of an inkling of a notion as to how this error might happen. I encountered it in the middle of teaching an intro programming lecture where I was showing them how file object positions work with .read(). Brought the entire class to a screeching halt.
msg337149 - Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-03-04 21:06
Your attached file doesn't seem to be a valid zip file.

Also, note that the result of `tell` on a file opened in text mode is documented [1] as being an opaque integer; there is no guarantee that the result of `tell` has any relation to the number of characters read from the file.  `tell` on a binary file does return the exact position of the cursor in the file, though.
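To illustrate the difference, here is a minimal sketch. It assumes a small file with CRLF line endings created on the fly; the name `sample.txt` and its contents are made up for illustration and are not the reporter's attachments. It reads 4 bytes in binary mode and 4 characters in text mode, then records what `tell` reports in each case. Only the binary result is a byte offset; the text-mode value may or may not look like one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file, written as raw bytes so the CRLF endings are exact.
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write(b"abcd\r\nefgh\r\n")

    # Binary mode: tell() is the byte offset of the cursor.
    with open(sample, "rb") as f:
        f.read(4)
        binary_pos = f.tell()

    # Text mode: tell() is an opaque integer, so no particular value is assumed here.
    with open(sample, "r", encoding="ascii") as f:
        f.read(4)
        text_cookie = f.tell()

print("binary tell():", binary_pos)
print("text tell():  ", text_cookie)
```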

msg337151 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-03-04 21:27
Stuff like that happens in any language supporting a tell() function for a file opened in text mode on Windows, inherited from the platform C's ftell() implementation:

"The value returned by ftell and _ftelli64 may not reflect the physical byte offset for streams opened in text mode, because text mode causes carriage return-linefeed translation."

The _only_ legitimate use for a tell() result from a file opened in text mode is to pass it as an argument to fseek() later.

As Zachary said, if you need tell() to return an actual byte offset, you need to open the file in binary mode instead.
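A rough sketch of that legitimate use in Python, where the counterpart of fseek() is the file object's `seek` method: save the `tell` result and hand it back to `seek` later, without ever interpreting the number. The file name and contents are again hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file; newline="\r\n" writes each "\n" as CRLF.
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8", newline="\r\n") as f:
        f.write("abcd\nefgh\nijkl\n")

    with open(sample, "r", encoding="utf-8") as f:
        f.read(4)                     # stop right before the first line break
        cookie = f.tell()             # opaque value; do not treat it as a count
        expected_rest = f.read()      # everything after the saved position
        f.seek(cookie)                # return to the saved position
        rest_after_seek = f.read()    # reads the same text again

print(rest_after_seek == expected_rest)
```

Whatever integer `cookie` happens to be, passing it back to `seek` on the same file restores the position it was taken at.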
