classification
Title: [EASY] hex() documentation: mention "%x" % int
Type:
Stage:
Components: Documentation
Versions: Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Manvi B, Mariatta, Sharan Yalburgi, docs@python, eric.smith, ezio.melotti, haypo, serhiy.storchaka, wolma
Priority: normal
Keywords: easy, patch

Created on 2016-03-07 17:47 by haypo, last changed 2017-07-06 19:31 by Mariatta.

Files
File name Uploaded
issue26506.diff Manvi B, 2016-03-20 11:28
issue26506.diff Manvi B, 2016-03-21 08:33
issue26506.diff Manvi B, 2016-03-21 08:42
issue26506.diff Manvi B, 2016-03-22 07:46
Pull Requests
URL Status Linked
PR 2479 closed Sharan Yalburgi, 2017-06-28 16:54
PR 2525 merged Manvi B, 2017-07-01 10:17
Messages (29)
msg261308 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-07 17:47
I regularly see Python code using hex(value)[2:], whereas "%x" % value does the same thing. We should mention "%x" % value in the hex() doc. Maybe also mention "%#X" % value to format in upper case?
msg261312 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2016-03-07 18:58
For 3.5 and 2.7, I'd suggest:
format(value, 'x')
or:
format(value, 'X')

Although you might disagree because of the verbosity. But at least you're not parsing a string at runtime. 

And for 3.6 with PEP-498:
f'{value:x}'

There are of course options for padding and adding the '0x', as well.
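The options mentioned in this message can be compared side by side. This is only a minimal sketch, assuming Python 3.6+ for the f-string line; the sample number is illustrative and not taken from the thread:

```python
value = 255

# hex() always includes the '0x' prefix.
with_prefix = hex(value)          # '0xff'

# format() with 'x' / 'X' gives the bare digits in lower / upper case.
lower = format(value, 'x')        # 'ff'
upper = format(value, 'X')        # 'FF'

# '#' adds the prefix; a leading 0 plus a width pads with zeros.
prefixed = format(value, '#x')    # '0xff'
padded = format(value, '04x')     # '00ff'

# f-string equivalent (Python 3.6+, PEP 498).
fstring = f'{value:x}'            # 'ff'
```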
msg261478 - Author: Wolfgang Maier (wolma) * Date: 2016-03-09 21:15
Your two suggestions prompted me to do a speed comparison between them and the result surprised me.

I tried:

import random
nums = [random.randint(0, 255) for n in range(10000000)]

then timed the simple:

for n in nums:
    hx = '%X' % n  # or hx = format(n, 'X')

I also tested a number of more complex formats like:
hx = '%{:02X}'.format(n) vs hx = '%%%02X' % n

In all cases, the old and new formatting styles are rather similar in speed with my system Python 2.7.6 (with maybe a slight advantage for the format-based formatting).
In Python 3.5.0, however, old-style %-formatting is much speedier than under Python 2, while new-style formatting doesn't appear to have changed much, with the result that %-formatting is now 30-50% faster than format-based formatting.

So I guess my questions are:

- are my timings wrong?

and if not:

- how did %-formatting get improved (generally? or for %X specifically?)
- can this speed up be transferred to format-based formatting somehow?
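A comparison like the one described above could be reproduced with the timeit module. The following is a rough sketch, not the script used in this message: the helper names are made up for illustration, the sample is far smaller than the 10,000,000 values quoted so that it finishes quickly, and absolute numbers will vary with the machine and the Python version.

```python
import random
import timeit

# Much smaller sample than in the message above, so the sketch runs quickly.
nums = [random.randint(0, 255) for _ in range(10_000)]

def percent_style():
    # Old-style %-formatting.
    return ['%X' % n for n in nums]

def format_builtin():
    # The format() builtin with the same format code.
    return [format(n, 'X') for n in nums]

# Both approaches must produce identical strings.
assert percent_style() == format_builtin()

# Best of 5 repeats, 20 runs each.
t_percent = min(timeit.repeat(percent_style, number=20, repeat=5))
t_format = min(timeit.repeat(format_builtin, number=20, repeat=5))
print(f'%-formatting: {t_percent:.4f}s   format(): {t_format:.4f}s')
```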
msg261479 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2016-03-09 21:43
Without lots of analysis (and disassembly), I can't speak to how valid your tests are, but they don't seem unreasonable.

format() will always be slower, because it's more general (primarily in that it can be extended to new types). Plus, it involves at least a name lookup that %-formatting can skip. The usual ways to optimize this lookup hold here, too, if speed is really that critical (which I'm skeptical of).

For example, say you had a custom type which implemented __format__ to understand the "X" format code. Using format(), this type could format itself as hex. %-formatting can't do that.

In any event, I don't think we want to promulgate the fastest way to do a hex conversion, just the clearest.

I can't say why format() in 3.5 is slower. There are many changes and tracking it down would be quite time consuming.
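The point about custom types implementing __format__ might look like this in practice. The Register class is a hypothetical example invented for this sketch; it simply delegates the format spec to the integer it wraps:

```python
class Register:
    """Hypothetical wrapper around an integer (not from the thread)."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        # Delegate the format spec ('x', 'X', '#06x', ...) to the wrapped int.
        return format(self.value, spec)


reg = Register(255)
via_format = format(reg, 'X')          # 'FF'
via_method = '{:#06x}'.format(reg)     # '0x00ff'

# %-formatting does not consult __format__, so '%X' needs a real integer.
try:
    '%X' % reg
except TypeError:
    percent_ok = False
else:
    percent_ok = True
```

Here format() and str.format() both reach the custom method, while the %-operator raises TypeError because Register is not an integer.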
msg261480 - Author: Wolfgang Maier (wolma) * Date: 2016-03-09 21:47
Ah, but it's not that format() is slower in 3.5, but that %-formatting got faster.
It looks as if it got optimized and I was wondering whether the same optimization could be applied to format().
msg261481 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-09 23:16
> Ah, but it's not that format() is slower in 3.5, but that %-formatting got faster.

Hum, python3 looks faster on this dummy microbenchmark yeah. Who said that Python 3 is slower? :-)

$ python2 -m timeit -s 'import random; nums = [random.randint(0, 255) for n in range(10**5)]' '["%x" % x for x in nums]'
10 loops, best of 3: 43.7 msec per loop

$ python3 -m timeit -s 'import random; nums = [random.randint(0, 255) for n in range(10**5)]' '["%x" % x for x in nums]'
10 loops, best of 3: 19.2 msec per loop

I spent a lot of time micro-optimizing str%args, str.format(args), and operations on str in general in Python 3. I wrote a first article to explain my optimization work:
https://haypo.github.io/pybyteswriter.html

I have a draft article explaining other kinds of optimizations related to PEP 393.

> It looks as if it got optimized and I was wondering whether the same optimization could be applied to format().

str.format(args) was also optimized, but it's still faster than str%args.

On Python 3, "%x" % 0x1234abc takes 17 nanoseconds according to timeit. It's super fast! Any extra work can have a non-negligible overhead. For example, it's known that operators are faster than functions in Python. One reason is that a call requires looking up the function in namespaces (local, global or builtin namespaces). It can be even worse (slower) to look up a method (especially with a custom __getattr__ method).

--

Hum, I don't recall why you started to talk about performance :-D

Why not document "%x" % value *and* format(value, 'x')?

I prefer "%x" % value. I never use format(value, ...) but sometimes I use "{0:x}".format(value).

f'{value:x}' looks too magical for me.
msg261494 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-10 11:21
> I regularly see Python code using hex(value)[2:]

In fact, it's even worse, I also saw Python 2 code stripping trailing "L", since hex(long) adds a L suffix...

$ python2
Python 2.7.10 (default, Sep  8 2015, 17:20:17) 
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2
>>> hex(123L)
'0x7bL'
>>> "%x" % 123L
'7b'
>>> format(123L, "x")
'7b'
>>> "%#x" % 123L
'0x7b'
>>> format(123L, "#x")
'0x7b'
msg261516 - Author: Wolfgang Maier (wolma) * Date: 2016-03-10 17:25
> Hum, python3 looks faster on this dummy microbenchmark yeah. Who said that Python 3 is slower? :-)

If you're alluding to that seemingly endless thread over on python-list, let me say that it is not my motivation to start anything like that here. Sorry also if I sort of hijacked your documentation issue with my performance question.

I really only wondered whether there would be any argument for or against either of the two versions (%-interpolation, format-based) other than stylistic ones.
That's why I ran the micro-benchmark and, in fact, I was expecting %-interpolation to be faster exactly because it is less flexible.
What I am surprised by is not the fact that %-interpolation got faster in Python3, but the fact that format didn't.
I was wondering whether %-interpolation maybe takes some fast path in Python3 that simply wasn't implemented for format. If that was the case it could have been rewarding to just optimize format the same way.
As I know Victor is working on performance stuff I thought I'd just ask here, but from your answer I gather that things are not that simple and that's ok.

> I wrote a first article to explain my work on optimization:
https://haypo.github.io/pybyteswriter.html

Thanks for the link.

> str.format(args) was also optimized, but it's still faster than str%args.

You mean slower, I assume?

> Hum, I don't recall why you started to talk about performance :-D

See above.

> Why not document "%x" % value *and* format(value, 'x')?

> I prefer "%x" % value. I never use format(value, ...) but sometimes I use "{0:x}".format(value).

I prefer the last version, use the first sometimes, but documenting several ways seems reasonable.
msg261518 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-10 17:47
> That's why I ran the micro-benchmark and, in fact, I was expecting %-interpolation to be faster exactly because it is less flexible.

Actually %-interpolation is more flexible.

>>> '%x' % 123
'7b'
>>> '%0X' % 123
'7B'
>>> '%#x' % 123
'0x7b'
>>> '%04x' % 123
'007b'

If we document alternatives for hex(), we should also document formatting alternatives for bin(), oct(), repr(), ascii(), str(), chr(), str.ljust(), str.rjust(), str.center(), str.zfill().
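For a few of the functions listed above, the formatting counterparts can be checked directly. This is a small sketch assuming Python 3 (where oct() uses the '0o' prefix); the sample values are arbitrary, and the widths were picked so that the padding comes out the same in every variant:

```python
value = 10

# bin() and oct() add a prefix just like hex(); the format codes make it optional.
assert bin(value) == '0b1010' and format(value, 'b') == '1010'
assert oct(value) == '0o12' and format(value, 'o') == '12'
assert format(value, '#b') == '0b1010'

# The padding helpers have formatting counterparts too.
text = 'ab'
assert text.ljust(5) == '%-5s' % text == format(text, '<5')    # 'ab   '
assert text.rjust(5) == '%5s' % text == format(text, '>5')     # '   ab'
assert text.center(6) == format(text, '^6')                    # '  ab  '
assert '7b'.zfill(4) == '%04x' % 123 == format(123, '04x')     # '007b'
```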
msg261542 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-11 06:38
Serhiy Storchaka added the comment:
> If we document alternatives for hex(), we should also document formatting
alternatives for bin(), oct(),

Ok for these two since they also add a prefix. But I don't see the point of
documenting alternatives for the other listed functions. The matter here is
the 0x prefix.
msg261544 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-11 06:56
There is no harm in using hex(value)[2:]. It's a matter of taste.

We can mention "%x" % value in case the user simply doesn't know about this alternative. The same goes for value.ljust(5) and '%-5s' % value.
msg261549 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-11 08:37
I opened the issue when I read this change:
https://review.openstack.org/#/c/288224/2/neutron/common/utils.py

    rndstr = hex(...)[2:]
    # Whether there is a trailing 'L' is a py2/3 incompatibility
    rndstr = rndstr.rstrip('L')
    return rndstr.zfill(length)

can be simply written

    return "{0:0{1}x}".format(..., length)

It's less readable, but it's more efficient.
msg261551 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-11 08:51
I agree with you and always prefer formatting strings.

Your example shows that at least an alternative to str.zfill() should be mentioned for educational purposes.

With C-style formatting your example can be written more laconically:

    return "%0*x" % (length, ...)
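The three spellings discussed in the last two messages can be checked against each other. In this sketch the function names and the sample arguments are made up for illustration, and the elided expression from the original code is replaced by a plain number parameter. It assumes non-negative integers, since the slicing version breaks on negative values:

```python
def random_hex_old(number, length):
    """Style from the reviewed change: slice, strip, then pad."""
    rndstr = hex(number)[2:]
    rndstr = rndstr.rstrip('L')  # only matters for Python 2 longs
    return rndstr.zfill(length)

def random_hex_format(number, length):
    # Nested replacement field: argument 1 supplies the width.
    return "{0:0{1}x}".format(number, length)

def random_hex_percent(number, length):
    # '*' takes the width from the argument tuple.
    return "%0*x" % (length, number)

sample = random_hex_format(0x7b, 6)
assert sample == random_hex_old(0x7b, 6) == random_hex_percent(0x7b, 6)
```

All three return '00007b' for these arguments.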
msg262068 - Author: Manvi B (Manvi B) * Date: 2016-03-20 11:28
Modified documentation for the functions bin(), hex() and oct() as mentioned in the comments. Submitted the patch.
msg262084 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-20 17:14
You misunderstood the whole purpose of my issue! You must not write
hex()[2:] (it's inefficient)! Please remove it from your patch.
msg262085 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-20 17:23
The documentation for hex() doesn't look like the best place for examples of using string formatting. I think it is enough to add short references to the corresponding formatting codes.
msg262106 - Author: Manvi B (Manvi B) * Date: 2016-03-21 08:33
Removed hex()[2:] from the patch.
msg262107 - Author: Manvi B (Manvi B) * Date: 2016-03-21 08:42
Modified the patch with '%x' % value.
msg262108 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-21 09:00
Serhiy Storchaka:
> The documentation for hex() doesn't look like the best place for examples of using string formatting. I think it is enough to add short references to the corresponding formatting codes.

I like Manvi B's patch with many examples. It's hard to read formatting strings, it's hard to compute the result, so full examples are just more obvious.

I don't think that it hurts to add many formatting examples. I expect that most users will combine the result of bin/hex/oct with another string, so suggesting using formatting functions will probably help them to simplify the code.

For example,
   print("x=", hex(x), "y=", hex(y))
can be written:
   print("x=%#x y=%#x" % (x, y))
or
   print("x={:#x} y={:#x}".format(x, y))
or
   print(f"x={x:#x} y={y:#x}")

The first expression using hex() adds spaces after "=", but well, it's just to give a simple example. IMHO formatting strings are more readable.
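The spacing difference mentioned above can be made visible by building the strings instead of printing them. A small sketch with arbitrary sample values, assuming Python 3.6+ for the f-string:

```python
x, y = 255, 16

# print() with several arguments separates them with a single space,
# which is what this join reproduces.
joined = ' '.join(["x=", hex(x), "y=", hex(y)])   # 'x= 0xff y= 0x10'

percent = "x=%#x y=%#x" % (x, y)                  # 'x=0xff y=0x10'
new_style = "x={:#x} y={:#x}".format(x, y)        # 'x=0xff y=0x10'
fstring = f"x={x:#x} y={y:#x}"                    # 'x=0xff y=0x10'
```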
msg262109 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2016-03-21 09:02
> The documentation for hex() doesn't look like the best place for examples
> of using string formatting. I think it is enough to add short
> references to corresponding formatting codes.

I think those examples take too much space compared to the actual docs of the functions.

I can think of 3 possible solutions:

1) keep the examples but condense them so that they don't take so much space:
>>> n = 255
>>> f'{n:#x}', format(n, '#x'), '%#x' % n
('0xff', '0xff', '0xff')
>>> f'{n:x}', format(n, 'x'), '%x' % n
('ff', 'ff', 'ff')
>>> f'{n:X}', format(n, 'X'), '%X' % n
('FF', 'FF', 'FF')

or

>>> '%#x' % 255, '%x' % 255, '%X' % 255
('0xff', 'ff', 'FF')
>>> format(255, '#x'), format(255, 'x'), format(255, 'X')
('0xff', 'ff', 'FF')
>>> f'{255:#x}', f'{255:x}', f'{255:X}'
('0xff', 'ff', 'FF')

(the latter should only go in 3.6 though)

2) add a direct link to https://docs.python.org/3/library/string.html#format-examples where there are already some examples (more can be added if needed);

3) add a single footnote for all 3 functions that includes examples using old/new string formatting and f-strings, mentions the fact that # adds the prefix (and leaving it out omits the prefix) and the fact that x and X select lowercase and uppercase output (b and o have no uppercase variant).

FWIW I don't think that performance matters too much in this case, but I also dislike hex(value)[2:] and agree it should not be mentioned.
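How the '#' flag and the letter case interact across the binary, octal and hexadecimal codes can be checked with format(). This is only an illustrative sketch; the sample value is arbitrary:

```python
value = 255
codes = ('b', 'o', 'x', 'X')

# Without '#': bare digits only.
plain = [format(value, code) for code in codes]
# ['11111111', '377', 'ff', 'FF']

# With '#': the matching prefix is added; 'X' also upper-cases the prefix.
prefixed = [format(value, '#' + code) for code in codes]
# ['0b11111111', '0o377', '0xff', '0XFF']

# Only hexadecimal has an uppercase code; 'B' is rejected for integers.
try:
    format(value, 'B')
except ValueError:
    upper_b_supported = False
else:
    upper_b_supported = True
```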
msg262110 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-21 09:23
Ezio Melotti added the comment:
> I can think of 3 possible solutions:
>
> 1) keep the examples but condense them so that they don't take so much space:
> >>> n = 255
> >>> f'{n:#x}', format(n, '#x'), '%#x' % n
> ('0xff', '0xff', '0xff')
> >>> f'{n:x}', format(n, 'x'), '%x' % n
> ('ff', 'ff', 'ff')
> >>> f'{n:X}', format(n, 'X'), '%X' % n
> ('FF', 'FF', 'FF')

Hum. It's not easy to read these complex formatting strings when they are written like that.

> or
>
> >>> '%#x' % 255, '%x' % 255, '%X' % 255
> ('0xff', 'ff', 'FF')
> >>> format(255, '#x'), format(255, 'x'), format(255, 'X')
> ('0xff', 'ff', 'FF')
> >>> f'{255:#x}', f'{255:x}', f'{255:X}'
> ('0xff', 'ff', 'FF')

I really prefer it when the same kind of formatting strings are written on the same line. I really like this example. Short, obvious, easy to read.

I have a preference for an example using a variable name rather than a number literal. It's more common to manipulate variables than number literals.

If you use a variable, please use a variable name longer than "n" to get a more readable example. Otherwise, it's not obvious what the variable is in "{n:x}": is "n" the variable? is "x" the variable?


In short, I suggest this example:

>>> value = 255
>>> '%#x' % value, '%x' % value, '%X' % value
('0xff', 'ff', 'FF')
>>> format(value, '#x'), format(value, 'x'), format(value, 'X')
('0xff', 'ff', 'FF')
>>> f'{value:#x}', f'{value:x}', f'{value:X}'
('0xff', 'ff', 'FF')


Note: Ezio, do you prefer format(value, 'x') or '{:x}'.format(value)?


> 2) add a direct link to https://docs.python.org/3/library/string.html#format-examples where there are already some examples (more can be added if needed);

IMHO it's ok to add formatting examples to bin/hex/oct. Using your compact example, it's not going to make the doc too long.
msg262119 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2016-03-21 11:11
> Note: Ezio, do you prefer format(value, 'x') or '{:x}'.format(value)?

When formatting a single value the former is better/shorter, but the latter is perhaps more common since you usually have something else in the string too.

The latter can also be used to do something like:
>>> '{num:x} {num:X} {num:#x} {num:#X}'.format(num=255)
'ff FF 0xff 0XFF'
msg262168 - Author: Manvi B (Manvi B) * Date: 2016-03-22 07:46
Modified the patch based on the review from STINNER Victor (haypo) and the other comments.
msg297104 - Author: STINNER Victor (haypo) * (Python committer) Date: 2017-06-28 01:10
Can someone pick the last patch and convert it to a pull request? CPython moved to GitHub in the meantime! See http://docs.python.org/devguide/ ;-)
msg297195 - Author: Sharan Yalburgi (Sharan Yalburgi) * Date: 2017-06-28 16:31
Hey, I am new to Open Source, can I work on this?
msg297196 - Author: STINNER Victor (haypo) * (Python committer) Date: 2017-06-28 16:34
> Hey, I am new to Open Source, can I work on this?

Hi, did you read http://docs.python.org/devguide/ ? IMHO it's a good start. You can also join the https://www.python.org/dev/core-mentorship/ group to get help!
msg297198 - Author: Mariatta Wijaya (Mariatta) * (Python committer) Date: 2017-06-28 17:01
When uploading a patch from another person, please include "Original patch by <original author>" in the PR and in the commit message.
Thanks.
msg297200 - Author: Sharan Yalburgi (Sharan Yalburgi) * Date: 2017-06-28 17:05
> Hi, did you read http://docs.python.org/devguide/ ? IMHO it's a good start. You can also join the https://www.python.org/dev/core-mentorship/ group to get help!

Yes I did. Thank you. I have made a PR. It says I haven't signed the CLA yet. I am doing that right now.

> When uploading a patch from another person, please include "Original patch by <original author>" in the PR and in the commit message.

Will do that, thank you.
msg297836 - Author: Mariatta Wijaya (Mariatta) * (Python committer) Date: 2017-07-06 19:31
New changeset 67ba4fa467ffff825d6a0c0a21cc54ff1df2ed1b by Mariatta (Manvisha Kodali) in branch 'master':
bpo-26506: hex() documentation: mention %x % int (GH-2525)
https://github.com/python/cpython/commit/67ba4fa467ffff825d6a0c0a21cc54ff1df2ed1b
History
Date | User | Action | Args
2017-07-06 19:31:00 | Mariatta | set | messages: + msg297836
2017-07-01 10:17:19 | Manvi B | set | pull_requests: + pull_request2590
2017-06-28 17:05:24 | Sharan Yalburgi | set | messages: + msg297200
2017-06-28 17:01:56 | Mariatta | set | nosy: + Mariatta; messages: + msg297198
2017-06-28 16:54:36 | Sharan Yalburgi | set | pull_requests: + pull_request2533
2017-06-28 16:34:29 | haypo | set | messages: + msg297196
2017-06-28 16:31:08 | Sharan Yalburgi | set | nosy: + Sharan Yalburgi; messages: + msg297195
2017-06-28 01:10:55 | haypo | set | keywords: + easy; title: hex() documentation: mention "%x" % int -> [EASY] hex() documentation: mention "%x" % int
2017-06-28 01:10:40 | haypo | set | messages: + msg297104
2016-03-22 07:46:42 | Manvi B | set | files: + issue26506.diff; messages: + msg262168
2016-03-21 11:11:35 | ezio.melotti | set | messages: + msg262119
2016-03-21 09:23:57 | haypo | set | messages: + msg262110
2016-03-21 09:02:17 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg262109
2016-03-21 09:00:55 | haypo | set | messages: + msg262108
2016-03-21 08:42:38 | Manvi B | set | files: + issue26506.diff; messages: + msg262107
2016-03-21 08:33:31 | Manvi B | set | files: + issue26506.diff; messages: + msg262106
2016-03-20 17:23:44 | serhiy.storchaka | set | messages: + msg262085
2016-03-20 17:14:57 | haypo | set | messages: + msg262084
2016-03-20 11:28:45 | Manvi B | set | files: + issue26506.diff; nosy: + Manvi B; messages: + msg262068; keywords: + patch
2016-03-11 08:51:36 | serhiy.storchaka | set | messages: + msg261551
2016-03-11 08:37:32 | haypo | set | messages: + msg261549
2016-03-11 06:56:51 | serhiy.storchaka | set | messages: + msg261544
2016-03-11 06:38:43 | haypo | set | messages: + msg261542
2016-03-10 17:47:35 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg261518
2016-03-10 17:25:59 | wolma | set | messages: + msg261516
2016-03-10 11:21:10 | haypo | set | messages: + msg261494
2016-03-09 23:16:57 | haypo | set | messages: + msg261481
2016-03-09 21:47:42 | wolma | set | messages: + msg261480
2016-03-09 21:43:11 | eric.smith | set | messages: + msg261479
2016-03-09 21:15:46 | wolma | set | nosy: + wolma; messages: + msg261478
2016-03-07 18:58:59 | eric.smith | set | nosy: + eric.smith; messages: + msg261312
2016-03-07 17:47:03 | haypo | create |