This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Wrong file.tell() function results (Windows 10/Python 64 3.8.2/3.7 - no bug in PyPy3.6/Python2.7)
Type: behavior Stage: resolved
Components: Interpreter Core, IO, Windows Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Cezary.Wagner, eric.smith, eryksun, paul.moore, steve.dower, tim.golden, tim.peters, zach.ware
Priority: normal Keywords:

Created on 2020-03-13 23:39 by Cezary.Wagner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (15)
msg364126 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-13 23:39
I wrote code that scans a very large PGN file (a chess games database).

But I found that the tell() function seems to be buggy; see the results below.

Here is some code:
    with open('../s01_parser_eval/data/out-6976.txt') as pgn:

            is_game_parsed = parser.parse_game(visitor=visitor)

            # if processing_statistics.games % 100 == 0:
            print(processing_statistics.games,
                  processing_statistics.positions,
                  processing_statistics.moves,
                  '%.2f' % processing_statistics.get_games_to_moves(),
                  '%.2f' % processing_statistics.get_positions_to_moves(),
                  '%.2f' % speed if speed else speed,
                  pgn.tell())
            print(pgn.tell())

This code can be simplified to this:
    with open('../s01_parser_eval/data/out-6976.txt') as pgn:
        while True:
            line = pgn.readline()
            if not line:  # readline() returns '' at end of file; stop instead of looping forever
                break
            print(pgn.tell())

1 1 0 0.00 0.00 318.64 1008917597
1008917597
2 47 46 23.00 1.02 343.64 1008917599
1008917599
3 47 46 15.33 1.02 291.08 1008920549
1008920549
4 107 107 26.75 1.00 292.03 1008920551
1008920551
5 107 107 21.40 1.00 185.41 18446744074718477807 <- ???
18446744074718477807
6 234 235 39.17 1.00 157.63 1008926192
1008926192
7 234 235 33.57 1.00 167.75 1008928371
1008928371
8 276 278 34.75 0.99 180.48 1008928373
1008928373
9 276 278 30.89 0.99 185.30 1008931145
1008931145
10 334 336 33.60 0.99 192.58 1008931147
1008931147
11 334 336 30.55 0.99 164.90 1008937220
1008937220
12 468 472 39.33 0.99 149.00 1008937222
1008937222
13 468 472 36.31 0.99 157.58 1008938833
1008938833
14 495 502 35.86 0.99 165.96 1008938835
1008938835
15 495 502 33.47 0.99 167.89 1008941875
1008941875
16 556 567 35.44 0.98 172.10 1008941877
1008941877
17 556 567 33.35 0.98 177.84 1008943769
1008943769
18 591 604 33.56 0.98 184.09 1008943771
1008943771
19 591 604 31.79 0.98 185.38 1008946692
1008946692
20 653 666 33.30 0.98 188.68 1008946694
1008946694
21 653 666 31.71 0.98 192.90 18446744074718500485  <- ???
18446744074718500485
msg364127 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-13 23:44
Here is a very short snippet for testing.

with open('../s01_parser_eval/data/out-6976.txt') as pgn:
    pgn.seek(1008915299)
    while True:
        line = pgn.readline()
        if not line:  # stop at end of file
            break
        print(pgn.tell())

1008915327
1008915366
1008915387
1008915409
1008915425
1008915449
1008915471
1008915490
1008915509
1008915534
1008915559
1008915572
1008915631
1008915654
1008915678
1008915680
1008917597
1008917599
1008917631
1008917670
1008917696
1008917718
1008917734
1008917758
1008917780
1008917799
1008917818
1008917843
1008917868
1008917881
1008917942
1008917965
1008917989
1008917991
1008920549
1008920551
1008920583
1008920622
1008920643
1008920663
1008920679
1008920703
1008920725
1008920744
1008920763
1008920788
1008920813
1008920826
1008920877
1008920900
1008920924
1008920926
18446744074718477807 <- ???
1008926192
1008926220
1008926259
1008926276
1008926304
1008926320
1008926344
1008926366
1008926385
1008926404
1008926428
1008926452
1008926465
1008926521
1008926544
1008926568
1008926570
1008928371
1008928373
1008928401
1008928440
1008928460
1008928491
1008928507
1008928531
1008928553
1008928572
1008928591
1008928615
1008928640
1008928653
1008928690
1008928713
1008928737
1008928739
1008931145
1008931147
1008931175
1008931214
1008931233
1008931253
1008931269
1008931293
1008931315
1008931334
1008931353
1008931377
1008931401
1008931414
1008931463
1008931486
1008931516
1008931518
1008937220
1008937222
1008937254
1008937293
1008937315
1008937340
1008937356
1008937380
1008937402
1008937421
1008937440
1008937465
1008937490
1008937503
1008937536
1008937559
18446744074718489200 <- ???
msg364128 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-13 23:59
I did some tests: the bug exists in 3.8 and 3.7, but not in PyPy3.6 or Python 2.7.
msg364129 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-03-14 00:02
tell() is opaque when a file is opened in text mode: you can't interpret its output, and its only use is as input to seek().

From the docs https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects:

"f.tell() returns an integer giving the file object’s current position in the file represented as number of bytes from the beginning of the file when in binary mode and an opaque number when in text mode."

Does the value returned from tell() not work in seek()? That would be the only bug here.
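The contract Eric describes can be checked directly. Here is a minimal sketch, assuming a small throwaway file with CRLF line endings (the file name and contents are hypothetical). It records the tell() cookie before each readline(), then reopens the file and seeks back to every saved cookie to confirm that the same line is read again.

```python
import os
import tempfile

# Hypothetical sample file with CRLF line endings, written as raw bytes.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b'[Event "A"]\r\n1. e4 e5\r\n\r\n[Event "B"]\r\n1. d4 d5\r\n')

# First pass: record the opaque cookie returned by tell() before each line.
cookies = []
with open(path, encoding="ascii") as f:
    while True:
        cookie = f.tell()
        line = f.readline()
        if not line:  # '' means end of file
            break
        cookies.append((cookie, line))

# Second pass, in a freshly opened file: seek() to each saved cookie
# (in reverse order) and check that the same line is read back.
with open(path, encoding="ascii") as f:
    for cookie, line in reversed(cookies):
        f.seek(cookie)
        assert f.readline() == line

print(f"all {len(cookies)} cookies round-tripped")
```

The sketch uses readline() rather than `for line in f` because CPython text files raise OSError ("telling position disabled by next() call") if tell() is called while the file is being iterated.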
msg364130 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-14 00:13
Let's test it now.
msg364131 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-03-14 00:19
This is very well known on Windows, and the behavior is inherited from the Windows C libraries.  If you need a byte count instead, then - as the docs already say - you need to open the file in binary mode instead.
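Here is a minimal sketch of the binary-mode approach, assuming UTF-8 data and a hypothetical sample file (substitute the file's real encoding). In binary mode tell() is a plain byte count, so each line has to be decoded by hand. Dividing tell() by the file size then gives a progress estimate.

```python
import os
import tempfile

# Hypothetical sample file with Windows (CRLF) line endings, written as raw bytes.
data = b'[Event "A"]\r\n1. e4 e5\r\n\r\n[Event "B"]\r\n1. d4 d5\r\n'
path = os.path.join(tempfile.mkdtemp(), "games.pgn")
with open(path, "wb") as f:
    f.write(data)

size = os.path.getsize(path)
offsets = []
with open(path, "rb") as f:                        # binary mode: tell() is a byte count
    while True:
        raw = f.readline()                         # bytes up to and including b"\n"
        if not raw:
            break
        text = raw.decode("utf-8").rstrip("\r\n")  # decode manually (UTF-8 assumed)
        pos = f.tell()
        offsets.append(pos)
        print(f"{pos:>3} bytes {100 * pos / size:6.1f}%  {text!r}")
```

Splitting on b"\n" before decoding is safe for UTF-8 and for ASCII-compatible single-byte encodings, but not for encodings such as UTF-16.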
msg364132 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-14 00:32
Really, really strange, but it works :)
"an opaque number when in text mode." -> so it comes from the Windows C libraries.

I use this in production code too, so my heart sped up when I saw that number, but it works as you said.

Opening the file in binary mode looks complicated/slow.
I can do it, but isn't that an ugly workaround just to get the position?

It looks like someone decided that speed is more important than functionality, so binary mode is the only option for, say, estimating progress in a progress meter.

I think there should be some function to convert this text-mode .tell() value into a real .tell() (a byte offset).

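with open('../s01_parser_eval/data/out-6976.txt') as pgn:
    pgn.seek(1008915299)
    t = None
    while True:
        if t is not None:
            pgn.seek(t)  # seek back to the last tell() value to check that it round-trips
        line = pgn.readline()
        if not line:  # end of file
            break
        pt = t
        t = pgn.tell()
        if pt is not None and pt > t:
            # tell() returned a smaller number than the previous call
            print('Strange %s!' % t)
            pgn.seek(t)

        print(pgn.tell())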
msg364133 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-14 00:34
Thank you for the very good explanation. This was hard to understand.

I have been programming a lot (10+ years) in many languages, but I am still learning new things.
msg364134 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-03-14 01:02
Sorry, but there is no documented relationship between byte offsets and tell() results for text-mode files in Windows:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/ftell-ftelli64?view=vs-2019
msg364137 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-03-14 02:17
> Sorry, but there is no documented relationship between byte 
> offsets and tell() results for text-mode files in Windows:

The I/O stack in Python 3 does not use C FILE streams, and this issue is not related to Windows. TextIOWrapper.tell returns a "cookie" based on the decoder state:

https://github.com/python/cpython/blob/3.8/Modules/_io/textio.c#L2589
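Eryk's point can be checked against the "???" values in the report. This is a minimal sketch that assumes CPython's internal cookie layout, which is an implementation detail and not a public API. Both the C `_io` and the pure-Python `_pyio` versions of TextIOWrapper appear to keep the restart byte offset in the low 64 bits of the cookie and pack decoder state into the bits above it, although they lay out those higher fields differently. The helper name `split_cookie` is hypothetical.

```python
def split_cookie(cookie):
    """Split a text-mode tell() value into (byte position, packed decoder state).

    Assumption: CPython keeps the restart byte offset in the low 64 bits and
    packs decoder state above it. This is an implementation detail, not an API.
    """
    position = cookie & ((1 << 64) - 1)  # low 64 bits: byte offset of a safe restart point
    state = cookie >> 64                 # high bits: decoder flags, bytes to re-feed, chars to skip
    return position, state


for cookie in (1008926192, 18446744074718477807, 18446744073709554618):
    print(cookie, split_cookie(cookie))
```

For the odd values this gives an ordinary byte offset plus a state of 1. That is most likely the universal-newline decoder's "pending \r" flag: the restart point sits just after a \r whose \n has not been decoded yet. The value that follows in the report (1008926192, one byte after 1008926191) fits this reading, and it would explain why such cookies show up with CRLF-terminated files.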
msg364138 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-03-14 02:22
Good to know, Eryk - thanks!
msg364157 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-14 12:48
> The I/O stack in Python 3 does not use C FILE streams, and this issue is not related to Windows. TextIOWrapper.tell returns a "cookie" based on the decoder state:

That could be a big problem for serialization if f.tell() returns a "cookie".

If I serialize it and run the program again, f.seek() will not work.

I will test it, but I think this could be a big problem, since this behavior is very unclear and non-standard (compared to C++/C#/Java ...).

Maybe there should be some method to get the real position, not an "opaque position".
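One way to get a position that is safe to serialize, following the binary-mode advice earlier in the thread, is to store a byte offset taken from a binary-mode file. This is a minimal sketch under several assumptions: the data is UTF-8, and the file names (`games.pgn`, `progress.json`) and the helper `read_some()` are hypothetical. It saves the offset as a plain integer in JSON and resumes from it on the next run.

```python
import json
import os
import tempfile

# Hypothetical file names; adjust to your own data and state locations.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "games.pgn")
state_path = os.path.join(workdir, "progress.json")
with open(data_path, "wb") as f:
    f.write(b"game 1\r\ngame 2\r\ngame 3\r\ngame 4\r\n")


def read_some(limit):
    """Read up to `limit` lines, resuming from the saved byte offset if any."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as s:
            offset = json.load(s)["offset"]
    lines = []
    with open(data_path, "rb") as f:
        f.seek(offset)                    # plain byte offset, valid across runs
        while len(lines) < limit:
            raw = f.readline()
            if not raw:                   # end of file
                break
            lines.append(raw.decode("utf-8").rstrip("\r\n"))
        offset = f.tell()
    with open(state_path, "w") as s:
        json.dump({"offset": offset}, s)  # an ordinary int, safe to serialize
    return lines


print(read_some(2))  # ['game 1', 'game 2']
print(read_some(2))  # ['game 3', 'game 4']
```

Each call behaves like a separate program run: the only state carried between calls is the integer in `progress.json`.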
msg364158 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-14 12:59
I tested it, and it does not behave like a "state cookie": it behaves like an "absolute position" and it is "stateless".

> The I/O stack in Python 3 does not use C FILE streams, and this issue is not related to Windows. TextIOWrapper.tell returns a "cookie" based on the decoder state:

I will study https://github.com/python/cpython/blob/3.8/Modules/_io/textio.c#L2589. Thank you for the reference.

See this test code: seeking to the "wrong" value works even though the decoder has no prior state (it is the first thing the program does).
You can swap these two fragments and it still works.

print('seek 18446744073709554618')
with open('../s01_parser_eval/data/out-6976.txt') as pgn:
    pgn.seek(18446744073709554618)
    while pgn.tell() != 3003:
        pgn.readline()
        print(pgn.tell())

print()
print('seek 0')
with open('../s01_parser_eval/data/out-6976.txt') as pgn:
    pgn.seek(0)
    while pgn.tell() != 18446744073709554618:
        pgn.readline()
        print(pgn.tell())
    pgn.readline()
    print('next', pgn.tell())

print('seek 18446744073709554618')
with open('../s01_parser_eval/data/out-6976.txt') as pgn:
    pgn.seek(18446744073709554618)
    while pgn.tell() != 3003:
        pgn.readline()
        print(pgn.tell())
msg364159 - Author: Cezary Wagner (Cezary.Wagner) Date: 2020-03-14 13:00
C:\root\Python\Python38\python.exe "C:/Users/Cezary Wagner/PycharmProjects/chess-lichess-eval-parse/sandbox/s03_create_tree/s03_python_bug.py"
seek 18446744073709554618
3003

seek 0
75
114
145
165
181
205
227
246
265
290
315
328
365
387
411
413
18446744073709554618
next 3003
seek 18446744073709554618
3003

Process finished with exit code 0
msg364163 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-03-14 13:48
> That can big problem when I use serialization if f.tell() is "cookie".

I'm sorry, but that's the way it is with text files. You'll need to find some other way to accomplish what you're trying to achieve.

Since this isn't a bug, I'm closing this issue. You might want to try stackoverflow or python-list if you'd like some additional help.
History
Date                 User           Action  Args
2022-04-11 14:59:28  admin          set     github: 84143
2020-03-14 13:48:52  eric.smith     set     status: open -> closed
                                            type: crash -> behavior
                                            messages: + msg364163
                                            resolution: not a bug
                                            stage: resolved
2020-03-14 13:00:35  Cezary.Wagner  set     messages: + msg364159
2020-03-14 12:59:34  Cezary.Wagner  set     messages: + msg364158
2020-03-14 12:48:10  Cezary.Wagner  set     messages: + msg364157
2020-03-14 02:22:39  tim.peters     set     messages: + msg364138
2020-03-14 02:17:10  eryksun        set     nosy: + eryksun
                                            messages: + msg364137
2020-03-14 01:02:34  tim.peters     set     messages: + msg364134
2020-03-14 00:34:10  Cezary.Wagner  set     messages: + msg364133
2020-03-14 00:32:32  Cezary.Wagner  set     messages: + msg364132
2020-03-14 00:19:25  tim.peters     set     nosy: + tim.peters
                                            messages: + msg364131
2020-03-14 00:13:24  Cezary.Wagner  set     messages: + msg364130
2020-03-14 00:02:31  eric.smith     set     nosy: + eric.smith
                                            messages: + msg364129
2020-03-14 00:00:44  Cezary.Wagner  set     nosy: + paul.moore, tim.golden, zach.ware, steve.dower
                                            components: + Interpreter Core, Windows, IO
                                            versions: + Python 3.7
2020-03-13 23:59:47  Cezary.Wagner  set     messages: + msg364128
                                            title: Wrong file.tell() function results (Windows 10/Python 64 3.8.2/3.7 - no bug in PyPy2.6/Python2.7) -> Wrong file.tell() function results (Windows 10/Python 64 3.8.2/3.7 - no bug in PyPy3.6/Python2.7)
2020-03-13 23:58:47  Cezary.Wagner  set     title: Wrong tell function results (Windows 10/Python 64 3.8.2) -> Wrong file.tell() function results (Windows 10/Python 64 3.8.2/3.7 - no bug in PyPy2.6/Python2.7)
2020-03-13 23:49:42  Cezary.Wagner  set     title: Wrong tell function results. -> Wrong tell function results (Windows 10/Python 64 3.8.2)
2020-03-13 23:44:08  Cezary.Wagner  set     messages: + msg364127
2020-03-13 23:39:01  Cezary.Wagner  create