classification
Title: Addition of a "list of available time zones" function to zoneinfo
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: p-ganssle Nosy List: FFY00, belopolsky, gaborbernat, lemburg, lukasz.langa, p-ganssle, vstinner
Priority: deferred blocker Keywords: patch

Created on 2020-05-06 17:19 by p-ganssle, last changed 2020-05-18 15:21 by lukasz.langa.

Pull Requests
URL Status Linked Edit
PR 20158 merged p-ganssle, 2020-05-17 18:06
Messages (9)
msg368285 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-05-06 17:19
One thing that I sort of overlooked in PEP 615 that I think will be a common feature request for zoneinfo is the ability to get a list of time zones available on the current TZPATH.

This is more complicated to implement than it sounds like, but luckily I already did it in order to implement the property tests for the zoneinfo module:

https://github.com/pganssle/zoneinfo/blob/ffd21a6d065e04725e04b37bb430c2559fefd5fa/tests/test_zoneinfo_property.py#L23-L70

The biggest complication is that there are files on TZPATH that are not necessarily time zones, and when I looked I couldn't easily find a list of all available time zones, so the only way to tell which ones are time zones and which ones aren't is to open each one and read the first 4 bytes. However, `tzdata` does ship with a list of available time zones, so the way I've got the function working right now is:

1. If `tzdata` is installed – despite the fact that it's technically the end of the search path, convert the list of available zones that ships with `tzdata` to your starting set (since you know these are all valid zones).
2. Walk the `tzpath`, and every time you find something not in the set of valid keys, open it and read the first 4 bytes, then add that to the set.

The common cases will be that `tzdata` is not available (in which case no harm done), or `tzdata` has the same set of keys as the TZPATH, in which case you never have to open any of the TZif files. The fact that the search order is inverted from what it normally is doesn't matter because the output is the union of all the sets anyway.

I don't know that the PEP needs to be amended – if this were a feature request for Python 3.10 I don't think it would need a PEP to go through, but I don't mind amending the PEP to document it.

Design decisions (with my initial answers, loosely held):

1. Should this be a classmethod or a free-standing function?

I'm inclined towards free-standing function because zoneinfo.TZPATH is at the module level and not the class level.

2. What should the return type be?

I think frozenset() or set(); a sorted list is also a reasonable option, but for people who just want to use "in" or show some subset of available zones, that would be wasteful.

We should go with frozenset() if we want there to be some possibility of caching the result in the future.

3. Should we try to cache the result?

I would say no, at least for now. It would not be super easy to get the cache invalidation right in the general case, and not many people are likely to be sensitive to the performance of this operation – the people who are can easily create their own cache.

4. What should it be called?

Naming things is hard. Options:

- zoneinfo.timezones()
- zoneinfo.all_timezones()
- zoneinfo.timezone_{list,set}()
- zoneinfo.valid_timezones()
- zoneinfo.valid_keys()
- zoneinfo.available_timezones()

`pytz` has a similar thing and calls it all_timezones. It also has something called common_timezones, but that is a bit harder to pull off and I'm not confident that we'll get something satisfactory before the feature freeze.
msg368530 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-05-09 17:06
I have an initial implementation against the reference implementation here: https://github.com/pganssle/zoneinfo/pull/60

Once GH-19909 is merged, I will turn that into a PR against CPython.

For the first pass I went with:

1. free-standing function
2. returning set()
3. no cache
4. available_timezones()
msg368799 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-05-13 19:59
From some discussion on the reference implementation PR, it seems that this may be a more complicated feature than I had bargained for: https://github.com/pganssle/zoneinfo/pull/60

The issue is that the current implementation picks up the posix/ and right/ directories as well as the posixrules file, none of which is *wrong* — they are available time zones, after all — but they're not really canonical zones. In `tzdata` we have a list of zones sourced from tzdata's `make zonenames`, which does not include the posix/ and right/ directories.

There is `zone1970.tab` and `zone.tab`, but that has *too* strict of a subset — it only includes canonical zones associated with a specific country, as far as I can tell, and not things like UTC. It also includes, for example, America/Nuuk but not America/Godthab (which was the canonical name up until 2020a).

I'm considering postponing this feature to Python 3.10 so that we can have more time to figure out a decent API for this.

Łukasz: Question for you as release manager — would you prefer that we put this in for 3.9 with the understanding that before the final release we may end up needing to remove it as a nuisance or change what it returns (and possibly flags that would be passed to it), or would that be too disruptive in the beta period? I'm confident that we can make a final decision before October, just not confident that we can make a final decision before Monday.
msg369143 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-05-17 18:09
I've opened a PR adding this feature and tagged this as release blocker, since I'd like to make sure this is in the beta if Łukasz does not object.

The concerns about the stability of the function I expressed earlier have not changed much, though we did get some feedback from the tz database list that at least make me think that the new approach (excluding posix/, right/ and posixrules) is *probably* the right way to go.
msg369148 - (view) Author: gaborbernat (gaborbernat) * Date: 2020-05-17 19:26
I think semantically the correct expression would be available timezones, free function, set and no cache. Document the operation is expensive and users should cache if they want repeated access or provide an available timezones cached function 👍 my 2c
msg369162 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-05-18 01:55
New changeset e527ec8abe0849e784ce100f53c2736986b670ae by Paul Ganssle in branch 'master':
bpo-40536: Add zoneinfo.available_timezones (GH-20158)
https://github.com/python/cpython/commit/e527ec8abe0849e784ce100f53c2736986b670ae
msg369163 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-05-18 01:56
I've merged the existing implementation, but I'm leaving this staged as "release blocker" so that Łukasz can have final say on whether this goes into 3.9.

Thanks for the help everyone!
msg369209 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-18 12:47
def valid_key(fpath):
        try:
            with open(fpath, "rb") as f:
                return f.read(4) == b"TZif"
        except Exception:  # pragma: nocover
            return False


Why not only catching "except OSError:" rather than any exception?

Or even "except (FileNotFoundError, PermissionError):" if you want to be pedantic :-p
msg369243 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-05-18 15:21
This goes into 3.9, but please settle on a final design long before October. Scratch that, please do it ideally before beta 2.

I'm downgrading from release blocker, exception handling can be discussed and changed after beta 1 as well.
History
Date User Action Args
2020-05-18 15:21:48lukasz.langasetpriority: release blocker -> deferred blocker
2020-05-18 15:21:42lukasz.langasetmessages: + msg369243
2020-05-18 12:47:57vstinnersetnosy: + vstinner
messages: + msg369209
2020-05-18 01:56:54p-gansslesetmessages: + msg369163
2020-05-18 01:55:15p-gansslesetmessages: + msg369162
2020-05-17 19:26:48FFY00setnosy: + FFY00
2020-05-17 19:26:00gaborbernatsetnosy: + gaborbernat
messages: + msg369148
2020-05-17 18:09:42p-gansslesetpriority: normal -> release blocker

messages: + msg369143
2020-05-17 18:06:44p-gansslesetpull_requests: + pull_request19461
2020-05-13 19:59:37p-gansslesetassignee: p-ganssle

messages: + msg368799
nosy: + lukasz.langa
2020-05-11 17:15:34p-gansslesetpull_requests: - pull_request19344
2020-05-11 17:14:31p-gansslesetkeywords: + patch
stage: patch review
pull_requests: + pull_request19344
2020-05-09 17:06:15p-gansslesetmessages: + msg368530
2020-05-06 17:19:51p-gansslecreate