classification
Title: _zoneinfo.c incorrectly checks bounds of `day` variable in calenderrule_new
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ammar2, p-ganssle, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-12-16 15:59 by p-ganssle, last changed 2020-12-17 21:16 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 23825 open serhiy.storchaka, 2020-12-17 21:05
Messages (10)
msg383180 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-12-16 15:59
This is a code style issue — in https://github.com/python/cpython/pull/23614, a regression was deliberately introduced to satisfy an overzealous compiler. The `day` variable has logical bounds `0 <= day <= 6`. In the original code, both sides of this boundary condition were explicitly checked (since this logically documents the bounds of the variable).

Some compilers complain about checking `day < 0`, because `day` is an unsigned type. It is not an immutable fact that `day` will always be an unsigned type, and implicitly relying on this fact makes the code both less readable and more fragile. This was changed over my objections and despite the fact that I had a less fragile solution available that also satisfied the overzealous compiler.

In the short term, my preferred solution would be to add in a static assertion that `day` is an unsigned type — this does not have to work on every platform, it simply needs to serve as a notification to make the code less fragile and to document our assumptions to both readers and the compiler.

In the long term, I think we need a way to solve the problem that it is apparently not possible to disable any compiler warnings even if they don't apply to the situation!
msg383182 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-12-16 16:27
I do not think there is any problem *now*. The type of day is uint8_t, it cannot be negative.

BTW, why is it uint8_t? Would not be easier to use type int by default? I doubt that it saves memory or makes the code faster.
msg383186 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-12-16 16:38
As I mentioned, it's a style issue. Yes this did not introduce any user-observable bugs, nor should it have changed the machine code emitted by the assembly on any competent compiler.

The issue is that I had deliberately chosen to do a "redundant" check to improve the readability and to be robust against someone saying, "Why is this a unit8_t instead of an int? I don't think it makes anything faster..." or some other justification for changing all the uint8_t fields to ints.

Previous to this, we had something where the compiler would fix any bugs caused by that by itself by emitting an additional bounds check. In my proposed fix to the compiler warnings, there was a static assertion where the machine would alert us if the situation changed. The problem is that we now have a non-machine-checked comment for something that would be difficult to issue tests for.

Frankly, I think that even the worst case scenario here isn't that bad — TZif files malformed in a very specific way would be accepted as valid. As it's coded now, this would probably just lead to a crash later, or possibly some weird results in time zone math. Still, I think we need a solution to this problem because our blind adherence to an invalid compiler warning is leading us to write more fragile code, whic h is especially problematic in C, which is notoriously dangerous and problematic to write.
msg383187 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2020-12-16 16:48
> Some compilers complain about checking `day < 0`, because `day` is an unsigned type

Just my two cents, this isn't just "some compilers". Everything from gcc, msvc, C# to the rust compiler complain about this sort of code. As they should, this is effectively dead code.

I think the more pragmatic way to enforce and document this assumption would be to have a unit test that actually checks that the constructor fails with "negative" days. It'll continue to fail right now as its interpretation as an unsigned int will be large and it will start failing if someone changes this to a signed type.
msg383188 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-12-16 17:07
> Just my two cents, this isn't just "some compilers". Everything from gcc, msvc, C# to the rust compiler complain about this sort of code. As they should, this is effectively dead code.

They complain because most of the time it's a sign that people were intending to use a signed integer, and it's an indicator that they may be susceptible to integer overflow.

In this case, it was a deliberate choice to include the extra check knowing it's dead code, because it is essentially a machine-checked documentation of the bounds of that particular variable.


> I think the more pragmatic way to enforce and document this assumption would be to have a unit test that actually checks that the constructor fails with "negative" days. It'll continue to fail right now as its interpretation as an unsigned int will be large and it will start failing if someone changes this to a signed type.

This would be helpful, but it's not an either-or situation. This particular thing would be tricky to write a targeted unit test for, because it's a very deep implementation detail. Such a unit test would also be quite physically separated from the code in question, whereas if we left the code as-is, we'd never see this type of bug, and if we added a static assertion we'd be told at compile time exactly what is wrong.

The biggest problem here is the fact that we cannot disable compiler warnings when complying with them makes the code less readable and/or less robust. This makes compiler warnings less useful, since it changes the calculus around whether or not it's worth it to introduce a new compiler warning.
msg383192 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-12-16 17:47
I propose to change the type of day to int. It will remove any objections againsy the range check and make the code less error-prone for integer overflow and sign loss.
msg383200 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-12-16 18:47
There are at least two issues with this:

1. This is a constructor for a struct, and the struct would really unnecessarily balloon in size if we switched everything over to be 32-bit integers (which I think is your suggestion?). This is not a major concern because in typical operation there are not likely to be more than 500 of these structs in existence at any one time (due to the cache), but it's still a minor annoyance.

I would guess that accepting `int` in the constructor function and converting it to uint8_t as part of building the struct would be maybe neutral to negative, since it involves casting a large signed type to a small unsigned one. This could cause more problems with overflow, not less.

2. In the current implementation `day` is unsigned by its nature — it represents a value indexed at 0 for which negative indices have no meaning. If we leave `day` as a uint8_t, then that tells the compiler at least something about the valid range of the value. By turning `day` (and presumably the other elements of this struct) into a less-suitable type, we're depriving ourselves of possibly *useful* compiler warnings.

Barring a solution where we have a simple and robust way to suppress warnings, I think the next-best solution is to add a static assertion about the signedness of the type for this particular case. I'd be happy to write it in a relatively general way so that it can be re-used elsewhere in CPython for the same purpose.
msg383202 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-12-16 19:08
I propose to change types of function parameters and local variables. In any case the constructor checks the range of parameters, so there is no problem with packing them into compact structure.

This will help to avoid errors of implicit conversion between different integer types. Also it can help to avoid code duplication in parsing integers of different size and signedness.

Adding a static assertion about the signedness of uint8_t looks meaningless to me.
msg383203 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-12-16 19:24
> Adding a static assertion about the signedness of uint8_t looks meaningless to me.

My proposal is to add a static assertion that `day` is an unsigned type. In the code it would look something like it does in the backports.zoneinfo code (https://github.com/pganssle/zoneinfo/blob/07ec80ad5dc7e7e4b4f861ddbb61a9b71e9f27c7/lib/zoneinfo_module.c#L1247-L1255):


```
    // The actual bounds of day are (day >= 0 && day <= 6), but since day is an
    // unsigned variable, day >= 0 is always true. To ensure that a bug is not
    // introduced in the event that someone changes day to a signed type, we
    // will assert that it is an unsigned type.
    Py_ASSERT_VAR_UNSIGNED(day);
    if (day > 6) {
        PyErr_Format(PyExc_ValueError, "Day must be in [0, 6]");
        return -1;
    }
```

> I propose to change types of function parameters and local variables. In any case the constructor checks the range of parameters, so there is no problem with packing them into compact structure.

> This will help to avoid errors of implicit conversion between different integer types. Also it can help to avoid code duplication in parsing integers of different size and signedness.

This is not entirely unreasonable in my opinion, though in this case everything is determined by the POSIX standard and an RFC, so it is unlikely that we'll ever see anything outside of 8 bit integers for these things. If you'd like to refactor the parsing code to use signed integers unconditionally and have them converted as part of the constructor that seems reasonable.
msg383266 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-12-17 21:16
PR 23825 refactors parsing. There is now a single function for parsing numbers. It saves around 70 lines of code and makes easy to implement support of full range of transition hours.

It could be possible to represent transition time as 32-bit offset in seconds in CalendarRule and DayRule instead of 16-bit hour, and 8-bit minute and second. In any case we need that value instead of separate values of hour, minute and second.
History
Date User Action Args
2020-12-17 21:16:01serhiy.storchakasetmessages: + msg383266
2020-12-17 21:05:55serhiy.storchakasetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request22687
2020-12-16 19:24:28p-gansslesetmessages: + msg383203
2020-12-16 19:08:22serhiy.storchakasetmessages: + msg383202
2020-12-16 18:47:18p-gansslesetmessages: + msg383200
2020-12-16 17:47:33serhiy.storchakasetmessages: + msg383192
2020-12-16 17:07:26p-gansslesetmessages: + msg383188
2020-12-16 16:48:22ammar2setnosy: + ammar2
messages: + msg383187
2020-12-16 16:38:20p-gansslesetmessages: + msg383186
2020-12-16 16:27:12serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg383182
2020-12-16 15:59:28p-gansslecreate