classification
Title: Cannot build 2.7 with --enable-unicode=no
Type: behavior Stage: resolved
Components: Build, Extension Modules, Regular Expressions, Unicode Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: amaury.forgeotdarc, ezio.melotti, lemburg, mrabarnett, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-05-14 23:27 by amaury.forgeotdarc, last changed 2013-05-21 19:59 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
re_nounicode.patch serhiy.storchaka, 2013-05-17 11:11
Messages (10)
msg189255 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-05-14 23:27
Python 2.7 can't be compiled with --enable-unicode=no
because of a crash in the re module. It's a regression from 2.7.3.

$ ./python -c 'import re; re.compile("([a-zA-Z][a-zA-Z0-9_]+)\s*=\s*(.*)")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/amauryfa/python/cpython2.7/Lib/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/home/amauryfa/python/cpython2.7/Lib/re.py", line 240, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/amauryfa/python/cpython2.7/Lib/sre_compile.py", line 533, in compile
    groupindex, indexgroup
RuntimeError: invalid SRE code


The cause is in sre.h: when Py_USING_UNICODE is false, SRE_CODE is defined as "unsigned long" instead of "unsigned short"!

With that fixed, the following modules still did not compile:
_io _json _sqlite3 _ssl _testcapi
Which modules are supposed to work without unicode?
msg189273 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-05-15 07:34
Please note that the official way to build Python without Unicode 
support is (see http://bugs.python.org/issue445762):

./configure --disable-unicode

See http://bugs.python.org/issue8767 for the most recent set of fixes 
that were supplied to make it work again in Python 2.7.4.

If the above doesn't work, it's a bug (as well) - and has been for
quite a few releases. Technically, --disable-unicode is mapped to 
--enable-unicode=no by configure.

I guess we'd need a buildbot to check whether --disable-unicode still
works, i.e. produces a Python version that compiles and passes
the basic tests.
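A basic check of the kind such a buildbot might run could look like the sketch below. It only exercises the regular expression from the original report; the names `PATTERN` and `re_smoke_test` and the sample input `"name = value"` are illustrative assumptions, not part of any existing test suite.

```python
import re

# Pattern from the original report (msg189255), written here as a raw string.
PATTERN = r"([a-zA-Z][a-zA-Z0-9_]+)\s*=\s*(.*)"


def re_smoke_test(sample="name = value"):
    """Return (ok, detail). `sample` is an illustrative input, not from the report."""
    try:
        compiled = re.compile(PATTERN)
    except RuntimeError as exc:  # e.g. "invalid SRE code" on an affected build
        return False, str(exc)
    match = compiled.match(sample)
    if match is None:
        return False, "pattern compiled but did not match"
    return True, "groups: %r" % (match.groups(),)


if __name__ == "__main__":
    ok, detail = re_smoke_test()
    print("%s: %s" % ("OK" if ok else "FAIL", detail))
```

On an affected build the failure is expected to surface inside `re.compile`, so catching `RuntimeError` there keeps the check from aborting a larger test run.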
msg189277 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-15 08:49
I can't reproduce this when building with --disable-unicode.
msg189314 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-05-15 22:05
Can't build here after "./configure --disable-unicode".
Serhiy, which OS did you try? I'm running 64-bit Debian, with gcc 4.6.3.
msg189445 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-17 11:11
Ubuntu 32-bit, gcc 4.6.3. The bug requires 64 bit.

This patch should fix it.
msg189447 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-05-17 11:42
On 17.05.2013 13:11, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
> Ubuntu 32-bit, gcc 4.6.3. The bug requires 64 bit.
> 
> This patch should fix it.

int and long are the same size on Linux 64-bit platforms.

You probably want to use "short" instead.
msg189448 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-05-17 11:47
On 17.05.2013 13:42, Marc-Andre Lemburg wrote:
> 
> Marc-Andre Lemburg added the comment:
> 
> On 17.05.2013 13:11, Serhiy Storchaka wrote:
>>
>> Serhiy Storchaka added the comment:
>>
>> Ubuntu 32-bit, gcc 4.6.3. The bug requires 64 bit.
>>
>> This patch should fix it.
> 
> int and long are the same size on Linux 64-bit platforms.

Sorry, scratch that. It's the same on Windows x64, not Linux x64.

> You probably want to use "short" instead.

This should still be safer.
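To see how the C types discussed here compare on a given machine, a small sketch using the standard `ctypes` module can print their sizes. The helper name `c_type_sizes` is an assumption made for illustration; the output depends on the platform's data model, so none is shown.

```python
import ctypes


def c_type_sizes():
    """Return the size in bytes of the unsigned C types discussed in this thread."""
    return {
        "unsigned short": ctypes.sizeof(ctypes.c_ushort),
        "unsigned int": ctypes.sizeof(ctypes.c_uint),
        "unsigned long": ctypes.sizeof(ctypes.c_ulong),
    }


sizes = c_type_sizes()
for name in ("unsigned short", "unsigned int", "unsigned long"):
    print("%-14s %d bytes" % (name, sizes[name]))

# True on platforms where long is wider than int, False where they match.
print("int and long differ: %s" % (sizes["unsigned int"] != sizes["unsigned long"]))
```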
msg189449 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-17 12:19
SRE_CODE should be at least 32 bits wide to support long regular expressions.
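The limit of a 16-bit code word can be illustrated with the standard `array` module, whose `"H"` and `"I"` typecodes correspond to C `unsigned short` and `unsigned int`. This is only a sketch of the size argument, not how the re module stores its code; the helper `fits` and the value `70000` are hypothetical.

```python
from array import array


def fits(typecode, value):
    """Return True if `value` can be stored in one array item of `typecode`."""
    try:
        array(typecode, [value])
    except OverflowError:
        return False
    return True


# Hypothetical code word larger than 65535, the maximum of a 2-byte unsigned type.
BIG = 70000

for typecode in ("H", "I"):
    print("%s: itemsize=%d, holds %d: %s"
          % (typecode, array(typecode).itemsize, BIG, fits(typecode, BIG)))
```

The sketch assumes the common case of a 2-byte `unsigned short` and an `unsigned int` of at least 4 bytes.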
msg189780 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-21 19:54
New changeset 8408eed151eb by Serhiy Storchaka in branch '2.7':
Issue #17979: Fixed the re module in build with --disable-unicode.
http://hg.python.org/cpython/rev/8408eed151eb
msg189781 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-21 19:59
The re module with --disable-unicode is still broken on platforms with sizeof(int) > 4, but a totally portable fix requires more code.
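One reading of this remark is that the code word should be exactly 4 bytes wide rather than whatever size `int` happens to have. The sketch below shows that selection idea with `ctypes`; it is not the committed fix, and `CANDIDATES` and `pick_exact_width` are names invented for illustration.

```python
import ctypes

# Candidate unsigned C types, narrowest first (as exposed by ctypes).
CANDIDATES = (
    ("unsigned short", ctypes.c_ushort),
    ("unsigned int", ctypes.c_uint),
    ("unsigned long", ctypes.c_ulong),
)


def pick_exact_width(n_bytes=4):
    """Return the name of the first candidate exactly `n_bytes` wide, or None."""
    for name, ctype in CANDIDATES:
        if ctypes.sizeof(ctype) == n_bytes:
            return name
    return None


print(pick_exact_width())
```

Returning `None` when no candidate matches makes the unsupported case explicit instead of silently falling back to a wider type.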
History
Date User Action Args
2013-05-21 19:59:39 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg189781; stage: patch review -> resolved
2013-05-21 19:54:41 python-dev set nosy: + python-dev; messages: + msg189780
2013-05-17 12:19:16 serhiy.storchaka set messages: + msg189449
2013-05-17 11:47:26 lemburg set messages: + msg189448
2013-05-17 11:42:25 lemburg set messages: + msg189447
2013-05-17 11:11:31 serhiy.storchaka set files: + re_nounicode.patch; components: + Build, Extension Modules, Regular Expressions, Unicode; keywords: + patch; nosy: + mrabarnett; messages: + msg189445; stage: patch review
2013-05-15 22:05:39 amaury.forgeotdarc set messages: + msg189314
2013-05-15 09:23:33 ezio.melotti set nosy: + ezio.melotti; type: behavior
2013-05-15 08:49:22 serhiy.storchaka set messages: + msg189277
2013-05-15 07:34:28 lemburg set nosy: + lemburg; messages: + msg189273
2013-05-14 23:27:00 amaury.forgeotdarc create