Title: Remove unicode_format.h from stringlib
Components: Interpreter Core Versions: Python 3.8
Assigned To: eric.smith Nosy List: anthonypjshaw, eric.smith
Created on 2015-08-25 21:30 by eric.smith, last changed 2022-04-11 14:58 by admin.

msg249160 Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-08-25 21:30
Objects/stringlib/unicode_format.h does not belong in stringlib. Back when it was originally written for 2.x, it used stringlib to provide the str and unicode versions of str.format, str.__format__, int.__format__, etc.

However, in 3.x, and especially with PEP 393 (Flexible String Representation), not only is the stringlib functionality no longer needed, it's not used at all.

My suggestion is to just copy the source into Objects/unicodeobject.c, which is the only place it's used. Then delete the stringlib file.

The only downside of including it in unicodeobject.c is that it makes our largest C file about 8% larger:

wc -l says:
1284  Objects/stringlib/unicode_format.h
15414 Objects/unicodeobject.c

There's an argument to be made for separating out the int.__format__, float.__format__, etc. code and moving it to some other library. I don't think it's a huge part of unicode_format.h, and separating it out would require creating some _Py_* functions to do the work. But it's probably the right thing to do. I'll investigate.
msg249212 Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-08-26 18:43
Actually, int.__format__, etc. are not in this file. So that's good.

The things that are in this file but are unrelated to unicodeobject.c are the support routines for implementing string.Formatter. I think I'll move those elsewhere, as a first step.
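For context on what those string.Formatter support routines do, here is a minimal sketch of what they expose at the Python level. It assumes CPython 3.x, where Lib/string.py implements `Formatter.parse()` by calling the private `_string.formatter_parser`. The format string is an arbitrary example, not taken from the issue.

```python
import string

# Arbitrary example: a conversion (!r), a format spec (:>10), and a
# compound field name that uses both indexing and attribute access.
fmt = "Hello {name!r:>10} and {0[key].attr}!"

# Formatter.parse() yields (literal_text, field_name, format_spec, conversion)
# tuples. In CPython this is backed by the C-level markup parser that lives
# in unicode_format.h.
parsed = list(string.Formatter().parse(fmt))
for literal, field, spec, conv in parsed:
    print(repr(literal), repr(field), repr(spec), repr(conv))
```

A field with no format spec reports `''` for the spec and `None` for the conversion. Trailing literal text with no field after it reports `None` for all three field parts.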
msg341626 Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-06 19:47
Eric, there have been further changes to Objects/stringlib/unicode_format.h since your original note. I've raised a PR that follows the intent of your note from 2015.

The situation hasn't changed either: unicode_format.h is still used only in unicodeobject.c.
msg341627 Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-06 19:48
> The things that are in this file but are unrelated to unicodeobject.c
> are the support routines for implementing string.Formatter.

I'm not sure which functions that refers to. If you could let me know, I'd be happy to add those to the PR.
msg341632 Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-05-06 19:56
I think I meant things like PyFieldNameIter_Type, but it would require some analysis.
msg341642 Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-06 20:42
The code is mostly:

FieldNameIterator * related functions
FormatterIterator * related functions
MarkupIterator * related functions

There are a few other utility methods in there as well.
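To show what the FieldNameIterator family does, here is a sketch that uses the private `_string.formatter_field_name_split` function. It assumes CPython 3.x, where `string.Formatter.get_field()` calls this function and, as far as I know, the `_string` module itself is defined in unicodeobject.c. `_string` is an implementation detail, not a public API. The field name and the `Obj`/`data` objects are hypothetical examples.

```python
import _string  # private CPython module; an implementation detail

# Hypothetical example objects to resolve the field name against.
class Obj:
    attr = "found"

data = {"key": Obj()}
args = (data,)  # positional arguments, as passed to str.format()

# Split "0[key].attr" into its first part and an iterator over the
# remaining accessors. A decimal first part comes back as an int.
first, rest = _string.formatter_field_name_split("0[key].attr")
accessors = list(rest)  # (is_attribute, name) pairs
print(first, accessors)

# Walk the accessors the way Formatter.get_field() does:
# attribute lookup for ".name", item lookup for "[name]".
obj = args[first]
for is_attr, name in accessors:
    obj = getattr(obj, name) if is_attr else obj[name]
print(obj)
```

Splitting first and then resolving lazily lets string.Formatter subclasses override `get_value()` for the first part while reusing the same C-level parsing for the rest.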
