Author p-ganssle
Recipients doodspav, nmaynes, p-ganssle, xtreak
Date 2020-08-04.15:07:29
Message-id <1596553650.09.0.387297141821.issue41371@roundup.psfhosted.org>
Content
I think for now skipping the tests when lzma is missing is the easiest thing, though another option would be to drop the compression on the input test data so that the tests don't depend on lzma.

Taking a look at the data files, gzip roughly halves the file and lzma does slightly better (down to about 42% of the original size), but the uncompressed file is only about 31 kB to start with:

    $ du -b tests/data/*
    31054   tests/data/zoneinfo_data.json
    15127   tests/data/zoneinfo_data.json.gz
    12895   tests/data/zoneinfo_data.json.lz
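
For anyone who wants to reproduce this kind of comparison without touching the real data files, here is a minimal sketch. The `compressed_sizes` helper and the sample payload are made up for illustration; they are not part of the test suite, and the numbers it prints will not match the ones above.

```python
import gzip
import json

try:
    import lzma  # optional: missing on builds without liblzma
except ImportError:
    lzma = None


def compressed_sizes(raw: bytes) -> dict:
    """Return the size in bytes of `raw` under each available codec."""
    sizes = {"raw": len(raw), "gzip": len(gzip.compress(raw))}
    if lzma is not None:
        sizes["lzma"] = len(lzma.compress(raw))
    return sizes


# Hypothetical stand-in for the JSON test data: repetitive text compresses well.
sample = json.dumps(
    {"data": ["transition %d" % i for i in range(2000)]}
).encode("utf-8")

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name:5s} {size:7d} bytes  {size / sizes['raw']:.0%} of raw")
```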

We're also currently using the "fat" binaries that `zic` produces (which include hard-coded transitions all the way until 2038). The new default for `zic` is to produce "slim" binaries, and the script to update test data does nothing to explicitly request fat binaries. If we were to switch over to "slim" binaries, the result would be more like this:

    $ du -b tests/data/*
    8297    tests/data/zoneinfo_data_slim.json.gz
    7750    tests/data/zoneinfo_data_slim.json.lz
    15551   tests/data/zoneinfo_data_unc_slim.json

So we're still looking at ~2:1 compression for both gzip and lzma, but the overall file size is about 50% of what it was to start with. The biggest downside is that in "slim" binaries, once a rule repeats indefinitely, `zic` stops producing explicit transitions for it and falls back to a simple repeating rule, so the current set of tests would take a different code path.
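
As a quick check on the arithmetic, the ratios can be computed directly from the `du -b` figures quoted in this message. The dictionary layout and the `ratio` helper below are just for illustration.

```python
# Byte counts copied from the `du -b` listings in this message.
fat = {"raw": 31054, "gzip": 15127, "lzma": 12895}
slim = {"raw": 15551, "gzip": 8297, "lzma": 7750}


def ratio(part: int, whole: int) -> float:
    """Size of `part` as a fraction of `whole`."""
    return part / whole


for label, sizes in (("fat", fat), ("slim", slim)):
    for codec in ("gzip", "lzma"):
        print(f"{label:4s} {codec}: {ratio(sizes[codec], sizes['raw']):.1%} of raw")

# Uncompressed slim data relative to uncompressed fat data.
slim_vs_fat = ratio(slim["raw"], fat["raw"])
print(f"slim raw vs fat raw: {slim_vs_fat:.1%}")
```

Note that the uncompressed slim file is already only slightly larger than the gzip-compressed fat file.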

I think we can go with the following course of action (3 or 4 different PRs):

1. Start by skipping the tests when `lzma` is missing.
2. Update the test suite so that it is testing more or less the same thing when the binaries are compiled with `-b slim`.
3. Change `Lib/test/test_zoneinfo/data/update_test_data.py` so that it pulls the raw data from the `tzdata` module on PyPI (which is compiled with `-b slim`) instead of the user's machine.
4. Change `update_test_data.py` to stop using `lzma` and change the tests so that they are able to process the new format of the JSON files.

If we ever decide that we really want the compression again, I assume that `gzip` is found more commonly than `lzma` among systems that don't build the whole standard library, so it might be mildly preferable to switch to `gzip`.
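
A loader that accepts either plain or gzip-compressed JSON might look something like this. It is only a sketch: `load_test_data` is a hypothetical helper, the sample dictionary is invented, and the real JSON layout may differ.

```python
import gzip
import json
import pathlib
import tempfile


def load_test_data(path):
    """Load JSON from `path`, decompressing first if it ends in .gz."""
    path = pathlib.Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def demo():
    """Write the same sample both ways and load each copy back."""
    sample = {"metadata": {"version": "hypothetical"}, "data": {}}
    with tempfile.TemporaryDirectory() as tmp:
        plain = pathlib.Path(tmp) / "zoneinfo_data.json"
        packed = pathlib.Path(tmp) / "zoneinfo_data.json.gz"
        plain.write_text(json.dumps(sample), encoding="utf-8")
        with gzip.open(packed, "wt", encoding="utf-8") as f:
            json.dump(sample, f)
        return load_test_data(plain), load_test_data(packed)
```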