This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: SpooledTemporaryFile does not seek correctly after being rolled over
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Gary Fernie, James Hennessy, graham.coster, martin.panter, methane, nubirstein, r.david.murray, serhiy.storchaka, terry.reedy
Priority: normal Keywords:

Created on 2019-10-28 23:25 by graham.coster, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
spooled-temp-file-does-not-seek-correctly-after-being-rolled.py graham.coster, 2019-10-28 23:25 Script demonstrating StopIteration exception after file is rolled over
Messages (7)
msg355602 - (view) Author: Graham Coster (graham.coster) * Date: 2019-10-28 23:46
Hi there,

As demonstrated in the attached script, I found that a StopIteration exception occurs once a SpooledTemporaryFile has been rolled over from memory to disk, even though seek(0) had move the file position to the top of the file.

I had previously been using a TemporaryFile, which wrote directly to disk, without any issues.  However, once I swapped to a SpooledTemporaryFile, the StopIteration exception started to appear.  

I'm guessing the problem relates to the csv reader having an Iterator that is no longer looking at the correct file after the SpooledTemporaryFile is rolled-over.  I found that instantiating a new csv reader post roll-over works around the problem.

I had expected that I could just swap from a TemporaryFile to a SpooledTemporaryFile with no change to behaviour. However, if this is not the case, perhaps the SpooledTemporaryFile documentation should explicitly state this.

Cheers, Graham.
msg357410 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2019-11-24 19:54
The docs currently say "The returned object is a file-like object whose _file attribute is either an io.BytesIO or io.StringIO object (depending on whether binary or text mode was specified) or a true file object, depending on whether rollover() has been called."  The fact that taking an iterator gets you whatever the *current* _file object is is implied by that but not made explicit.  A doc update to make that explicit would probably be appropriate.
msg357423 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-11-25 02:37
SpooledTemporaryFile has very serious bug which causes data corruption (#26730).  Please don't use it with text mode until it is fixed.
msg357473 - (view) Author: Graham Coster (graham.coster) * Date: 2019-11-26 01:48
This may be a silly question, however, does SpooledTemporaryFile need to exist at all?

From some testing on macOS, SpooledTemporaryFile appeared to never have a performance advantage over OS file caching, but with max_size greater than 4GB, it was a significant disadvantage.   

I found that the macOS built-in file cache was increasing in size as I wrote bigger TemporaryFile files, up to some limit the OS had decided. So, it seems the OS is automatically doing the same job as SpooledTemporaryFile.  Once the OS decided to write to disk, there was no sudden hit to performance, it just slowed down.

However, when SpooledTemporaryFile rolled-over large max_size files, there was a temporary big hit to performance, which then became a consistent slow down the same as TemporaryFile.

A big issue came with very large SpooledTemporaryFile  max_sizes hogging RAM and causing the OS to start swapping all processes.  This caused a huge performance hit to my program and the system as a whole. Once my program did finish, it took the system considerable time to reclaim swap.

I’m guessing SpooledTemporaryFile may have benefits on light weight embedded OSes that have no, or poor, file caching.  However, tuning the max_size to work with embedded systems’ limited RAM could be tricky for developers and would be hardware dependent. So, perhaps leaving file caching to operating systems is actually a better, and safer, option than offering it in Python?

If there are no benefits to SpooledTemporaryFile, should it be deprecated? If so, as it is phasesd out, could it be patched to be a TemporaryFile wrapper, with no rollover functionality?

> On 25 Nov 2019, at 1:37 pm, Inada Naoki <report@bugs.python.org> wrote:
> 
> 
> Inada Naoki <songofacandy@gmail.com> added the comment:
> 
> SpooledTemporaryFile has very serious bug which causes data corruption (#26730).  Please don't use it with text mode until it is fixed.
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue38625>
> _______________________________________
msg406446 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-11-17 03:57
Is this bug fixed by #26730?
msg406448 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-11-17 04:08
I confirmed that this bug is fixed, but I found another error.
msg406455 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-11-17 05:49
The another error I found is already reported as #42868.
History
Date User Action Args
2022-04-11 14:59:22adminsetgithub: 82806
2021-11-17 05:49:37methanesetmessages: + msg406455
2021-11-17 04:08:52methanesetstatus: open -> closed
stage: resolved
resolution: fixed
versions: + Python 3.8, Python 3.9
2021-11-17 04:08:21methanesetmessages: + msg406448
2021-11-17 03:57:55methanesetmessages: + msg406446
2019-11-26 01:48:04graham.costersetmessages: + msg357473
2019-11-25 02:37:27methanesetmessages: + msg357423
2019-11-24 19:54:43r.david.murraysetmessages: + msg357410
2019-10-28 23:46:46graham.costersetmessages: + msg355602
2019-10-28 23:25:13graham.costercreate