Author steve.dower
Recipients doerwalter, lemburg, paul.moore, serhiy.storchaka, steve.dower, terry.reedy, tim.golden, zach.ware
Date 2019-08-02.21:34:20
Message-id <1564781660.58.0.735421605814.issue36311@roundup.psfhosted.org>
In-reply-to
Content
If we reduce our chunk size below INT_MAX, then we avoid the issue entirely. Our logic for hitting the middle of a multibyte character is fine (perhaps fixed since this issue was opened?); there's just a weird edge case at 2 GiB in the API call.

As a bonus, smaller chunks seem to have a performance benefit too. INT_MAX/4 looks like the sweet spot: my 2 GiB test case took about a quarter of the time it took with INT_MAX (and we're measuring in tens of seconds here, so I'm pretty comfortable with the direction of the result). INT_MAX/2 and INT_MAX/8 were both slower than INT_MAX/4.
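
To make the idea concrete, here is a minimal Python sketch of decoding a large buffer in chunks capped well below INT_MAX. It is not the actual change discussed in this issue; `decode_in_chunks`, `CHUNK_SIZE`, and the hard-coded `INT_MAX` value are hypothetical names and assumptions used only for illustration. An incremental decoder carries any partial multibyte character over to the next chunk, which is the "middle of a multibyte character" case mentioned above.

```python
import codecs

# Assumption: C's INT_MAX is 2**31 - 1, as on common platforms.
INT_MAX = 2**31 - 1
# INT_MAX/4 is the chunk size reported above as the fastest in one test.
CHUNK_SIZE = INT_MAX // 4


def decode_in_chunks(data, encoding="utf-8", chunk_size=CHUNK_SIZE):
    """Decode bytes piecewise, never handing more than chunk_size bytes
    to the decoder in a single call (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Tell the decoder when this is the last chunk so that a
        # truncated trailing sequence is reported as an error.
        final = start + chunk_size >= len(data)
        parts.append(decoder.decode(chunk, final))
    return "".join(parts)
```

Calling `decode_in_chunks(payload, chunk_size=3)` on a short UTF-8 payload is an easy way to check that characters split across chunk boundaries still decode correctly. The sketch says nothing about speed; the timings quoted above come from the author's own 2 GiB test case.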
History
Date                 User         Action  Args
2019-08-02 21:34:20  steve.dower  set     recipients: + steve.dower, lemburg, doerwalter, terry.reedy, paul.moore, tim.golden, zach.ware, serhiy.storchaka
2019-08-02 21:34:20  steve.dower  set     messageid: <1564781660.58.0.735421605814.issue36311@roundup.psfhosted.org>
2019-08-02 21:34:20  steve.dower  link    issue36311 messages
2019-08-02 21:34:20  steve.dower  create