classification
Title: Fixes for profile/cprofile
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: alexandre.vassalotti, brett.cannon, christian.heimes, gvanrossum
Priority: normal Keywords: patch

Created on 2007-10-19 22:18 by christian.heimes, last changed 2008-01-06 22:29 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
py3k_profile_fix.patches christian.heimes, 2007-10-19 22:31
py3k_profile_fix2.patches christian.heimes, 2007-10-21 04:31
py3k_profile_fix-3.patch alexandre.vassalotti, 2007-10-22 01:10
py3k_profile_fix3.patch christian.heimes, 2007-10-25 10:47
Messages (13)
msg56569 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-19 22:18
The patch fixes the output for profile and cProfile. Another patch from
Alexandre and me added additional calls to the UTF-8 codec.
msg56570 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-19 22:29
There is no patch.  =)
msg56571 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-19 22:31
Sure there is a patch ... well it's ... *uhm* ... it's hidden under your
bed. O:-)
msg56621 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-21 04:31
The new patch does a far more better job. I had the idea after a
discussion with Alexandre on #python and a small debugging session.

The tests for profile, cProfile and doctest are failing on my Linux box
because Py_FileSystemDefaultEncoding is "UTF-8" but PyUnicode_Decode()
has a shortcut for "utf-8" only. I've added a tolower loop and shortcuts
for UTF-16 and UTF-32 to PyUnicode_Decode(). 

I've also disabled the tests on system with a fs encoding that isn't
built in.
msg56635 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2007-10-22 01:10
I don't think it's possible to add shortcuts in PyUnicode_Decode for
UTF-16 and UTF-32 because the byte-order can be different depending of
the platform. So, these two need to pass through the codecs module.

I am sure if it's better, but I factored out the normalization routine
into its own function.
msg56637 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-22 01:17
Alexandre Vassalotti wrote:
> I don't think it's possible to add shortcuts in PyUnicode_Decode for
> UTF-16 and UTF-32 because the byte-order can be different depending of
> the platform. So, these two need to pass through the codecs module.

utf-16 and utf-32 are the the names for the native codecs. The explicit
names are e.g. utf-16-be or utf-32-le. The last argument 0 also means
"native byte order".

I used a shorter algorithm to optimize the normalization for the special
cases of the strcmp() shortcuts. Your version is fine but takes several
CPU cycles longer. I don't think it has a large performance impact. ;)

Christian
msg56718 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-24 19:07
I like Chris's patch better, but he should use (sizeof(lower)-1) instead
of 8, and he should use the TOLOWER macro from bytes_methods.h.

Please submit a new one and I'll check it in!
msg56739 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-25 10:47
Why should I use (sizeof(lower)-1)? Do you mean
PyMem_Malloc(strlen(encoding) + 1)?

Changes since last patch:
* added #include "bytes_methods.h" in unicodeobject.c
* fix all is* to use the macros from bytes_methods.h
* use malloc/free the lower version instead of using a fixed buffer
* replace '_' with '-' in encoding

I still think that a fixed buffer of 12 chars (strlen(iso-8859-1) + 1 +
\0) is sufficient and faster.
* also check for iso-8859-1
msg56764 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-25 23:48
Committed revision 58659.

I'm sorry I confused you; I was fine with the version that has char
lower[N] but I wanted you to not repeat N-1 later in the code. See what
I checked in. :-)
msg56765 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-26 00:00
Guido van Rossum wrote:
> Guido van Rossum added the comment:
> 
> Committed revision 58659.
> 
> I'm sorry I confused you; I was fine with the version that has char
> lower[N] but I wanted you to not repeat N-1 later in the code. See what
> I checked in. :-)

I still don't understand why you are using (sizeof lower) - 2 and what
&lower[(sizeof lower) - 2] returns. Is it the memory address of lower[17]?
msg56766 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2007-10-26 00:19
> I still don't understand why you are using (sizeof lower) - 2 

It is simply to avoid duplicating the constant (a.k.a. the Don't Repeat
Yourself (DRY) rule).

> and what &lower[(sizeof lower) - 2] returns. Is it the memory address
> of lower[17]?

It the address of lower[18] to be exact. (l < &lower[(sizeof lower) -
2]) is simply tricky notation to check the bound of the array.
Personally, I used like to subtract pointer, ((lower - l + 1) < (sizeof
lower)) to get the bound. But now, I find Guido's trick more cute (and
less error-prone). :)
msg56767 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2007-10-26 00:29
> Personally, I used like to subtract pointer, ((lower - l + 1) < 
> (sizeof lower)) to get the bound.

Doh. I got it backward. It's (l - lower + 1), not (lower - l + 1).

> But now, I find Guido's trick more cute (and less error-prone).

At least, I got that right. :)
msg56768 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-26 00:30
Alexandre Vassalotti wrote:
> It the address of lower[18] to be exact. (l < &lower[(sizeof lower) -
> 2]) is simply tricky notation to check the bound of the array.
> Personally, I used like to subtract pointer, ((lower - l + 1) < (sizeof
> lower)) to get the bound. But now, I find Guido's trick more cute (and
> less error-prone). :)

Wow, that's a cool trick. The old wizard has still some astonishing
tricks in the sleeves of his robe. :]
History
Date User Action Args
2008-01-06 22:29:45adminsetkeywords: - py3k
versions: Python 3.0
2007-10-26 00:30:36christian.heimessetmessages: + msg56768
2007-10-26 00:29:04alexandre.vassalottisetmessages: + msg56767
2007-10-26 00:19:29alexandre.vassalottisetmessages: + msg56766
2007-10-26 00:00:48christian.heimessetmessages: + msg56765
2007-10-25 23:48:39gvanrossumsetstatus: open -> closed
resolution: accepted
messages: + msg56764
2007-10-25 10:47:36christian.heimessetfiles: + py3k_profile_fix3.patch
messages: + msg56739
2007-10-24 19:07:21gvanrossumsetpriority: normal
messages: + msg56718
2007-10-22 01:17:03christian.heimessetmessages: + msg56637
2007-10-22 01:11:00alexandre.vassalottisetfiles: + py3k_profile_fix-3.patch
messages: + msg56635
2007-10-21 04:32:01christian.heimessetfiles: + py3k_profile_fix2.patches
nosy: + alexandre.vassalotti
type: behavior
messages: + msg56621
2007-10-20 04:58:37loewissetkeywords: + patch
2007-10-19 22:31:03christian.heimessetfiles: + py3k_profile_fix.patches
messages: + msg56571
2007-10-19 22:29:48brett.cannonsetkeywords: + py3k
assignee: gvanrossum
messages: + msg56570
nosy: + brett.cannon
2007-10-19 22:18:02christian.heimescreate