Message 374822 - Python tracker



Author	p-ganssle
Recipients	doodspav, nmaynes, p-ganssle, xtreak
Date	2020-08-04.15:07:29
Message-id	<1596553650.09.0.387297141821.issue41371@roundup.psfhosted.org>

Content

For now, I think skipping the tests when `lzma` is missing is the easiest option, though another would be to drop the compression on the input test data so that the tests don't depend on `lzma` at all.
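A minimal sketch of the skip, assuming the affected tests live in an ordinary `unittest.TestCase`. The class and test names below are hypothetical, not the ones actually used in `test_zoneinfo`:

```python
import unittest

try:
    import lzma  # optional: CPython can be built without the _lzma extension
except ImportError:
    lzma = None


@unittest.skipIf(lzma is None, "requires the lzma module")
class ZoneInfoDataTest(unittest.TestCase):  # hypothetical test class
    def test_roundtrip(self):
        payload = b"TZif example payload"
        # Without the skip, this would raise AttributeError (lzma is None)
        # on builds that lack lzma support.
        self.assertEqual(lzma.decompress(lzma.compress(payload)), payload)
```

Run with `python -m unittest`: on builds without `lzma` the whole class is reported as skipped instead of erroring out.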

Looking at the data files, either `lzma` or `gzip` shrinks the data to roughly half its size (about 42% and 49% of the original, respectively), but the uncompressed file is only about 31 kB to start with:

$ du -b tests/data/*
31054 tests/data/zoneinfo_data.json
15127 tests/data/zoneinfo_data.json.gz
12895 tests/data/zoneinfo_data.json.lz
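The ratios can be checked directly from the `du -b` figures above; this snippet just redoes that arithmetic:

```python
# Sizes in bytes, copied from the `du -b` output above.
sizes = {
    "zoneinfo_data.json": 31054,
    "zoneinfo_data.json.gz": 15127,
    "zoneinfo_data.json.lz": 12895,
}

original = sizes["zoneinfo_data.json"]
for name in ("zoneinfo_data.json.gz", "zoneinfo_data.json.lz"):
    compressed = sizes[name]
    # Fraction of the original size that remains, and the equivalent N:1 ratio.
    print(f"{name}: {compressed / original:.1%} of original "
          f"({original / compressed:.2f}:1)")
```

This prints about 48.7% (2.05:1) for gzip and 41.5% (2.41:1) for lzma, so lzma has a modest edge on this data.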

We're also currently using the "fat" binaries that `zic` produces, which include hard-coded transitions all the way until 2038. The new default for `zic` is to produce "slim" binaries, and the script that updates the test data does nothing to explicitly request fat ones. If we were to switch over to "slim" binaries, the result would look more like this:

$ du -b tests/data/*
8297 tests/data/zoneinfo_data_slim.json.gz
7750 tests/data/zoneinfo_data_slim.json.lz
15551 tests/data/zoneinfo_data_unc_slim.json

So we're still looking at ~2:1 compression for both gzip and lzma, but the uncompressed file is now half the size it was to start with (15551 vs. 31054 bytes). The biggest downside is that with "slim" binaries, once a rule repeats indefinitely, `zic` stops emitting explicit transitions for it and falls back to a simple repeating rule (the POSIX-style TZ string in the file's footer), so the current set of tests would exercise a different code path.
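To make the "simple repeating rule" concrete: in version 2+ TZif files (RFC 8536), the data ends with a footer made of a newline, a POSIX-style TZ string, and a final newline, and slim binaries rely on that string for times after the last explicit transition. A minimal sketch of extracting it, assuming well-formed input; `read_tz_footer` is a hypothetical helper, not part of the `zoneinfo` API:

```python
from typing import Optional


def read_tz_footer(data: bytes) -> Optional[str]:
    """Return the POSIX TZ string from the footer of a TZif v2+ file.

    Assumes well-formed input; returns None for version-1 files,
    which have no footer.
    """
    if data[:4] != b"TZif":
        raise ValueError("not a TZif file")
    if data[4:5] == b"\x00":  # version byte 0 means version 1: no footer
        return None
    if not data.endswith(b"\n"):
        raise ValueError("missing footer terminator")
    # The footer is b"\n" + TZ string + b"\n", and the TZ string never
    # contains a newline, so the last newline before the final one marks
    # the start of the footer.
    start = data.rindex(b"\n", 0, len(data) - 1)
    return data[start + 1 : -1].decode("ascii")
```

On a typical Linux system, `read_tz_footer(open("/usr/share/zoneinfo/America/New_York", "rb").read())` should return a rule such as `EST5EDT,M3.2.0,M11.1.0`; that rule is what a slim file falls back on once explicit transitions stop.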

I think we can go with the following course of action (3 or 4 different PRs):

1. Start by skipping the tests when `lzma` is missing.
2. Update the test suite so that it is testing more or less the same thing when the binaries are compiled with `-b slim`.
3. Change `Lib/test/test_zoneinfo/data/update_test_data.py` so that it pulls the raw data from the `tzdata` package on PyPI (which is compiled with `-b slim`) instead of from the user's machine.
4. Change `update_test_data.py` to stop using `lzma` and change the tests so that they are able to process the new format of the JSON files.
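Step 4 could look roughly like the following: store each zone's raw TZif bytes in the JSON file as base85 text with no compression, so that reading the test data needs only `json` and `base64`. The layout (a `"data"` mapping from zone key to a list of base85 lines) and both helper names are hypothetical, not the format `update_test_data.py` currently writes:

```python
import base64
import json


def encode_zones(zones: dict[str, bytes], line_length: int = 70) -> str:
    """Serialize {zone key: TZif bytes} to JSON without any compression."""
    encoded = {}
    for key, raw in sorted(zones.items()):
        text = base64.b85encode(raw).decode("ascii")
        # Split into short lines so diffs of the JSON file stay readable.
        encoded[key] = [text[i : i + line_length]
                        for i in range(0, len(text), line_length)]
    return json.dumps({"data": encoded}, indent=2)


def decode_zones(document: str) -> dict[str, bytes]:
    """Inverse of encode_zones: rebuild the raw TZif bytes for each zone."""
    data = json.loads(document)["data"]
    return {key: base64.b85decode("".join(lines)) for key, lines in data.items()}
```

Base85 costs about 25% over the raw bytes, which is the price of keeping binary data in a text file without any compression module.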

If we ever decide that we really want compression again, I'd guess that `gzip` is more commonly available than `lzma` on systems that don't build the whole standard library, so it might be mildly preferable to switch to `gzip`.
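To check which compression modules a particular interpreter actually has, a quick probe is enough. Each module depends on an optional C library or extension (`gzip` imports `zlib`, `lzma` wraps `_lzma`, `bz2` wraps `_bz2`), so any of them can be missing on a partial build. A small sketch; the function name is hypothetical:

```python
import importlib


def available_compression_modules(names=("gzip", "zlib", "lzma", "bz2")) -> dict:
    """Map each module name to whether it can be imported in this interpreter."""
    result = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            result[name] = False
        else:
            result[name] = True
    return result


print(available_compression_modules())
```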

History
Date	User	Action	Args
2020-08-04 15:07:30	p-ganssle	set	recipients: + p-ganssle, xtreak, doodspav, nmaynes
2020-08-04 15:07:30	p-ganssle	set	messageid: <1596553650.09.0.387297141821.issue41371@roundup.psfhosted.org>
2020-08-04 15:07:30	p-ganssle	link	issue41371 messages
2020-08-04 15:07:29	p-ganssle	create