Author p-ganssle
Recipients belopolsky, izbyshev, p-ganssle, serhiy.storchaka
Date 2018-08-21.22:05:29
Message-id <1534889129.36.0.56676864532.issue34454@psf.upfronthosting.co.za>
In-reply-to
Content
So this is related to something I was actually meaning to fix. When I wrote this code I didn't understand the way PyUnicode works; there's actually no need to call `PyUnicode_AsUTF8AndSize()` on the entire unicode string.

My understanding is that each glyph (character) in a given PyUnicode object is stored at the same width, so the string can be indexed by position directly, which means that this section of the code can go: https://github.com/python/cpython/blob/master/Modules/_datetimemodule.c#L4862
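The fixed-width storage can be observed from Python. The following is a minimal sketch, assuming CPython 3.3 or later (the flexible string representation of PEP 393); `bytes_per_char` is a hypothetical helper written for this illustration, and `sys.getsizeof` results are an implementation detail rather than a guaranteed API.

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate per-character storage by growing a string of `ch` by n characters.

    Hypothetical helper: both strings have the same kind of object header,
    so the header cancels out in the subtraction.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


print(bytes_per_char("a"))           # ASCII character
print(bytes_per_char("\u20ac"))      # BMP character outside Latin-1
print(bytes_per_char("\U0001F600"))  # character outside the BMP
```

On CPython the width is chosen per string from the widest character it contains, so every position in one string occupies the same number of bytes.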

Instead we can just break the string up as glyphs 0-10 and 11+ and pass them on. Since by the contract of the function glyphs 0-10 and 11+ *must* be ASCII, a character that cannot be represented in UTF-8 can only appear in those parts when the input is invalid, so the only possible outcome in that case is an error.
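The splitting idea can be sketched in pure Python. This is only an illustration of the approach, not the C implementation: `split_iso_string` is a hypothetical helper, and it assumes the date part occupies the first 10 characters (`YYYY-MM-DD`), the separator sits at index 10, and the time part starts at index 11. It also assumes Python 3.7+ for `str.isascii()`.

```python
def split_iso_string(dt_str):
    """Split an ISO-format string into (date part, separator, time part).

    Hypothetical helper: assumes a 10-character date, a single separator
    character at index 10, and the time portion from index 11 onwards.
    """
    if len(dt_str) < 10:
        raise ValueError(f"Invalid isoformat string: {dt_str!r}")

    date_part = dt_str[:10]
    separator = dt_str[10:11]  # may be any single character, or empty
    time_part = dt_str[11:]

    # Only the date and time parts have to be ASCII; the separator is
    # never encoded, so an arbitrary character there is harmless.
    for part in (date_part, time_part):
        if not part.isascii():
            raise ValueError(f"Invalid isoformat string: {dt_str!r}")

    return date_part, separator, time_part


# A lone surrogate as separator cannot be encoded as UTF-8,
# but it never needs to be once the string is split first.
print(split_iso_string("2018-08-21\ud80022:05:29"))
```

Splitting before encoding means the encoder only ever sees the two parts that must be ASCII in valid input.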

Obviously the null pointer error needs to be fixed regardless, since that case should raise an exception rather than segfault.
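For context, `PyUnicode_AsUTF8AndSize()` returns NULL with an exception set when the string cannot be encoded, so the caller has to check the result before using it. A rough Python-level analogue of the desired behaviour, assuming a hypothetical helper `utf8_or_error` that is not part of any real API:

```python
def utf8_or_error(text):
    """Encode `text` as UTF-8, turning an encoding failure into ValueError.

    Hypothetical helper: the try/except plays the role of the NULL check
    that the C code needs after calling PyUnicode_AsUTF8AndSize().
    """
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError(f"Invalid isoformat string: {text!r}") from exc


print(utf8_or_error("2018-08-21"))

try:
    utf8_or_error("2018-08-21\ud80022:05:29")
except ValueError as err:
    print("rejected:", err)
```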

I'd be happy to do the part where the string is broken up *before* being passed to PyUnicode_AsUTF8AndSize() if it would make it easier to implement your PR (which seems to touch a lot of other parts of the code as well).
History
Date User Action Args
2018-08-21 22:05:29 p-ganssle set recipients: + p-ganssle, belopolsky, serhiy.storchaka, izbyshev
2018-08-21 22:05:29 p-ganssle set messageid: <1534889129.36.0.56676864532.issue34454@psf.upfronthosting.co.za>
2018-08-21 22:05:29 p-ganssle link issue34454 messages
2018-08-21 22:05:29 p-ganssle create