classification
Title: Removal of basestring type
Type: Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: christian.heimes, gvanrossum
Priority: normal Keywords: patch

Created on 2007-10-10 21:23 by christian.heimes, last changed 2007-10-24 19:57 by gvanrossum. This issue is now closed.

Files
File name Uploaded Description
py3k_basestring_removal.patch christian.heimes, 2007-10-10 21:23
py3k_basestring_removal3.patch christian.heimes, 2007-10-15 16:10
fix_basestr.py christian.heimes, 2007-10-15 23:52
Messages (18)
msg56326 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-10 21:23
The patch removes the basestring type from Python 3.0. PyString and
PyUnicode are now direct subclasses of PyBaseObject_Type. Each occurrence
of basestring was replaced with str, mostly in isinstance(egg, basestring)
checks, with a few exceptions. PyObject_TypeCheck(args, &PyBaseString_Type)
is replaced with a check for PyUnicode and PyString.
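
For readers who want to see the Python-level change, here is a rough sketch. It is not taken from the patch, and is_text is a hypothetical helper name. A check that Python 2 code wrote against basestring is spelled against str in Python 3:

```python
def is_text(value):
    """Return True for text strings only.

    Python 2 code would have written isinstance(value, basestring);
    with basestring gone, the same check uses str.
    """
    return isinstance(value, str)


print(is_text("egg"))   # a text string
print(is_text(b"egg"))  # bytes are a separate type in Python 3
```

Code that really has to accept both text and bytes would need to name both types explicitly, e.g. isinstance(value, (str, bytes)).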
msg56327 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-10 21:39
Thanks, evaluating!
msg56328 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-10 21:54
I see 10 failing tests:

    test_ctypes test_email test_httplib test_inspect test_os test_re
    test_subprocess test_sys test_xml_etree test_xml_etree_c
msg56329 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-10 22:34
test_ctypes: works for me
test_email: need some help from an email expert
test_httplib: __file__ has a wrong type str8. I'm looking into it.
test_inspect: same issue as httplib
test_os: same issue
test_re: I had the failing test before my changes
File "Lib/test/test_re.py", line 622, in test_empty_array
ValueError: bad typecode (must be b, B, u, h, H, i, I, l, L, f or d)
test_subprocess: I don't understand why it fails. The traceback is
missing a line
test_sys: related to __file__
test_xml_etree / test_xml_etree_c: a str8 / io error that may be related
to __file__
msg56330 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-10 22:43
On 10/10/07, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> test_ctypes: works for me

Did you svn up, make clean and rebuild?

> test_email: need some help from an email expert

Which test is failing?

> test_httplib: __file__ has a wrong type str8. I'm looking into it.

Yes, __file__ always has that type. Fixing it is messy because it
requires using the default filesystem encoding. Can you try that as a
separate patch?

> test_inspect: same issue as httplib
> test_os: same issue
> test_re: I had the failing test before my changes

But it passes for me.

> File "Lib/test/test_re.py", line 622, in test_empty_array
> ValueError: bad typecode (must be b, B, u, h, H, i, I, l, L, f or d)

Hm. It passes for me.

> test_subprocess: I don't understand why it fails. The traceback is
> missing a line
> test_sys: related to __file__
> test_xml_etree / test_xml_etree_c: a str8 / io error that may be related
> to __file__

Thanks for looking into these!!
msg56331 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-10 23:16
Guido van Rossum wrote:
> Did you svn up, make clean and rebuild?

The ctypes package didn't change since my last rebuild an hour ago. I'm
on Linux (Ubuntu i386)
> 
>> test_email: need some help from an email expert
> 
> Which test is failing?

test_decoded_generator()
The generator tries to print a str8 to a text file.

> Yes, __file__ always has that type. Fixing it is messy because it
> requires using the default filesystem encoding. Can you try that as a
> separate patch?

I'm already working on it. Can I introduce a new function
_PyUnicode_AsDefaultFSEncodedString that encodes unicode using
Py_FileSystemDefaultEncoding or UTF-8?

>> File "Lib/test/test_re.py", line 622, in test_empty_array
>> ValueError: bad typecode (must be b, B, u, h, H, i, I, l, L, f or d)
> 
> Hm. It passes for me.

I'm going to look into the issue later.
msg56337 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-11 03:17
On 10/10/07, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> Guido van Rossum wrote:
> > Did you svn up, make clean and rebuild?
>
> The ctypes package didn't change since my last rebuild an hour ago. I'm
> on Linux (Ubuntu i386)

Odd. I'll investigate when I have more time.

> >> test_email: need some help from an email expert
> >
> > Which test is failing?
>
> test_decoded_generator()
> The generator tries to print a str8 to a text file.

Thought so. I have a tentative fix that I want approved by Barry
Warsaw before checking; you can see if it works for you too:

--- Lib/email/generator.py      (revision 58412)
+++ Lib/email/generator.py      (working copy)
@@ -288,7 +288,7 @@
         for part in msg.walk():
             maintype = part.get_content_maintype()
             if maintype == 'text':
-                print(part.get_payload(decode=True), file=self)
+                print(part.get_payload(decode=False), file=self)
             elif maintype == 'multipart':
                 # Just skip this
                 pass
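
To illustrate why the decode flag matters here, the following is a small sketch run against a current Python 3 release, not the 2007 py3k branch (where the result was a str8). The sample message is made up for the example. With decode=True the payload comes back as bytes, while decode=False returns the text payload:

```python
import email

# A made-up single-part text message, only for illustration.
raw = "Content-Type: text/plain\n\nhello\n"
msg = email.message_from_string(raw)

as_text = msg.get_payload(decode=False)  # payload as stored, a str
as_bytes = msg.get_payload(decode=True)  # payload after decoding, bytes

print(type(as_text).__name__, type(as_bytes).__name__)
```

Printing the bytes object to a text file is what the failing test ran into, which is why the tentative fix switches to decode=False.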

> > Yes, __file__ always has that type. Fixing it is messy because it
> > requires using the default filesystem encoding. Can you try that as a
> > separate patch?
>
> I'm already working on it. Can I introduce a new function
> _PyUnicode_AsDefaultFSEncodedString that encodes unicode using
> Py_FileSystemDefaultEncoding or UTF-8?

That's a rather long name... I don't think it needs a leading
underscore. How about

PyUnicode_AsFSString()?
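
A Python-level sketch of the behaviour being discussed, under the assumption that the proposed C function encodes with the filesystem default encoding and falls back to UTF-8 when none is set. encode_for_filesystem is a hypothetical name used only for this example:

```python
import sys


def encode_for_filesystem(name):
    """Encode a text file name to bytes for the operating system.

    Uses the interpreter's filesystem encoding and falls back to
    UTF-8 if none is reported (the fallback mirrors the proposal
    above; current interpreters always report an encoding).
    """
    encoding = sys.getfilesystemencoding() or "utf-8"
    return name.encode(encoding)


print(encode_for_filesystem("spam.py"))
```

Later Python 3 releases provide os.fsencode() for this job; it additionally applies an error handler so that undecodable file names can round-trip.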
msg56444 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-15 15:48
Here is an updated patch which applies cleanly and fixes some additional
unit tests and removes one that doesn't make sense any more (re.compile
doesn't accept bytes).

The unit tests profile, cProfile and doctest fail w/ and w/o this patch.
They seem to suffer from the latest changes of our previous patch and
additional calls to utf_8_decode().
msg56446 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-15 15:58
Hm? This is a one-word patch to email/generator.py.

On 10/15/07, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> Here is an updated patch which applies cleanly and fixes some additional
> unit tests and removes one that doesn't make sense any more (re.compile
> doesn't accept bytes).
>
> The unit tests profile, cProfile and doctest fail w/ and w/o this patch.
> They seem to suffer from the latest changes of our previous patch and
> additional calls to utf_8_decode().
msg56447 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-15 16:10
> Hm? This is a one-word patch to email/generator.py.

Yes, I already noticed it and I'm creating a new patch now. I saw your
fix for the email generator problem in the bug report and wanted to add
it to my patch. I accidentally replaced the patch with the one liner.

Here is the new patch
msg56467 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-15 22:14
> The unit tests profile, cProfile and doctest fail w/ and w/o this patch.
> They seem to suffer from the latest changes of our previous patch and
> additional calls to utf_8_decode().

Any details on those? They don't fail for me.
msg56469 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-15 22:57
I'll check this in as soon as there's agreement on the list about this.

Not that I expect disagreement, but I just realized it was never brought
up and it isn't in PEP 3137 (yet).
msg56470 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-15 23:11
> Any details on those? They don't fail for me.

Here you are.

$ ./python Lib/test/test_cProfile.py

         121 function calls (101 primitive calls) in 1.000 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.000    1.000 <string>:1(<module>)
        8    0.064    0.008    0.080    0.010 test_cProfile.py:103(subhelper)
       28    0.028    0.001    0.028    0.001 test_cProfile.py:115(__getattr__)
        1    0.270    0.270    1.000    1.000 test_cProfile.py:30(testfunc)
     23/3    0.150    0.007    0.170    0.057 test_cProfile.py:40(factorial)
       20    0.020    0.001    0.020    0.001 test_cProfile.py:53(mul)
        2    0.040    0.020    0.600    0.300 test_cProfile.py:60(helper)
        4    0.116    0.029    0.120    0.030 test_cProfile.py:78(helper1)
        2    0.000    0.000    0.140    0.070 test_cProfile.py:89(helper2_indirect)
        8    0.312    0.039    0.400    0.050 test_cProfile.py:93(helper2)
        1    0.000    0.000    0.000    0.000 utf_8.py:15(decode)
        1    0.000    0.000    0.000    0.000 {_codecs.utf_8_decode}
        1    0.000    0.000    1.000    1.000 {exec}
       12    0.000    0.000    0.012    0.001 {hasattr}
        4    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        4    0.000    0.000    0.000    0.000 {sys.exc_info}

   Ordered by: standard name

Function                                          called...
                                                      ncalls  tottime  cumtime
<string>:1(<module>)                              ->       1    0.270    1.000  test_cProfile.py:30(testfunc)
test_cProfile.py:103(subhelper)                   ->      16    0.016    0.016  test_cProfile.py:115(__getattr__)
test_cProfile.py:115(__getattr__)                 ->
test_cProfile.py:30(testfunc)                     ->       1    0.014    0.130  test_cProfile.py:40(factorial)
                                                           2    0.040    0.600  test_cProfile.py:60(helper)
test_cProfile.py:40(factorial)                    ->    20/3    0.130    0.147  test_cProfile.py:40(factorial)
                                                          20    0.020    0.020  test_cProfile.py:53(mul)
test_cProfile.py:53(mul)                          ->
test_cProfile.py:60(helper)                       ->       4    0.116    0.120  test_cProfile.py:78(helper1)
                                                           2    0.000    0.140  test_cProfile.py:89(helper2_indirect)
                                                           6    0.234    0.300  test_cProfile.py:93(helper2)
test_cProfile.py:78(helper1)                      ->       4    0.000    0.004  {hasattr}
                                                           4    0.000    0.000  {method 'append' of 'list' objects}
                                                           4    0.000    0.000  {sys.exc_info}
test_cProfile.py:89(helper2_indirect)             ->       2    0.006    0.040  test_cProfile.py:40(factorial)
                                                           2    0.078    0.100  test_cProfile.py:93(helper2)
test_cProfile.py:93(helper2)                      ->       8    0.064    0.080  test_cProfile.py:103(subhelper)
                                                           8    0.000    0.008  {hasattr}
utf_8.py:15(decode)                               ->       1    0.000    0.000  {_codecs.utf_8_decode}
{_codecs.utf_8_decode}                            ->
{exec}                                            ->       1    0.000    1.000  <string>:1(<module>)
                                                           1    0.000    0.000  utf_8.py:15(decode)
{hasattr}                                         ->      12    0.012    0.012  test_cProfile.py:115(__getattr__)
{method 'append' of 'list' objects}               ->
{method 'disable' of '_lsprof.Profiler' objects}  ->
{sys.exc_info}                                    ->

   Ordered by: standard name

Function                                          was called by...
                                                      ncalls  tottime  cumtime
<string>:1(<module>)                              <-       1    0.000    1.000  {exec}
test_cProfile.py:103(subhelper)                   <-       8    0.064    0.080  test_cProfile.py:93(helper2)
test_cProfile.py:115(__getattr__)                 <-      16    0.016    0.016  test_cProfile.py:103(subhelper)
                                                          12    0.012    0.012  {hasattr}
test_cProfile.py:30(testfunc)                     <-       1    0.270    1.000  <string>:1(<module>)
test_cProfile.py:40(factorial)                    <-       1    0.014    0.130  test_cProfile.py:30(testfunc)
                                                        20/3    0.130    0.147  test_cProfile.py:40(factorial)
                                                           2    0.006    0.040  test_cProfile.py:89(helper2_indirect)
test_cProfile.py:53(mul)                          <-      20    0.020    0.020  test_cProfile.py:40(factorial)
test_cProfile.py:60(helper)                       <-       2    0.040    0.600  test_cProfile.py:30(testfunc)
test_cProfile.py:78(helper1)                      <-       4    0.116    0.120  test_cProfile.py:60(helper)
test_cProfile.py:89(helper2_indirect)             <-       2    0.000    0.140  test_cProfile.py:60(helper)
test_cProfile.py:93(helper2)                      <-       6    0.234    0.300  test_cProfile.py:60(helper)
                                                           2    0.078    0.100  test_cProfile.py:89(helper2_indirect)
utf_8.py:15(decode)                               <-       1    0.000    0.000  {exec}
{_codecs.utf_8_decode}                            <-       1    0.000    0.000  utf_8.py:15(decode)
{exec}                                            <-
{hasattr}                                         <-       4    0.000    0.004  test_cProfile.py:78(helper1)
                                                           8    0.000    0.008  test_cProfile.py:93(helper2)
{method 'append' of 'list' objects}               <-       4    0.000    0.000  test_cProfile.py:78(helper1)
{method 'disable' of '_lsprof.Profiler' objects}  <-
{sys.exc_info}                                    <-       4    0.000    0.000  test_cProfile.py:78(helper1)

####################################
$ ./python Lib/test/test_doctest.py

doctest (doctest) ... 66 tests with zero failures
**********************************************************************
File "/home/heimes/dev/python/py3k/Lib/test/test_doctest.py", line 1570, in test.test_doctest.test_debug
Failed example:
    try: doctest.debug_src(s)
    finally: sys.stdin = real_stdin
Expected:
    > <string>(1)<module>()
    (Pdb) next
    12
    --Return--
    > <string>(1)<module>()->None
    (Pdb) print(x)
    12
    (Pdb) continue
Got:
    > /home/heimes/dev/python/py3k/Lib/encodings/utf_8.py(16)decode()
    -> return codecs.utf_8_decode(input, errors, True)
    (Pdb) next
    --Return--
    > /home/heimes/dev/python/py3k/Lib/encodings/utf_8.py(16)decode()->('<string>', 8)
    -> return codecs.utf_8_decode(input, errors, True)
    (Pdb) print(x)
    *** NameError: NameError("name 'x' is not defined",)
    (Pdb) continue
    12
**********************************************************************
1 items had failures:
   1 of   4 in test.test_doctest.test_debug
***Test Failed*** 1 failures.
Traceback (most recent call last):
  File "Lib/test/test_doctest.py", line 2422, in <module>
    test_main()
  File "Lib/test/test_doctest.py", line 2406, in test_main
    test_support.run_doctest(test_doctest, verbosity=True)
  File "/home/heimes/dev/python/py3k/Lib/test/test_support.py", line 569, in run_doctest
    raise TestFailed("%d of %d doctests failed" % (f, t))
test.test_support.TestFailed: 1 of 414 doctests failed

####################################
$ ./python Lib/test/test_email.py

Traceback (most recent call last):
  File "Lib/test/test_email.py", line 13, in <module>
    test_main()
  File "Lib/test/test_email.py", line 10, in test_main
    test_support.run_unittest(suite())
  File "/home/heimes/dev/python/py3k/Lib/test/test_support.py", line 541, in run_unittest
    _run_suite(suite)
  File "/home/heimes/dev/python/py3k/Lib/test/test_support.py", line 524, in _run_suite
    raise TestFailed(err)
test.test_support.TestFailed: Traceback (most recent call last):
  File "/home/heimes/dev/python/py3k/Lib/email/test/test_email.py", line 1445, in test_same_boundary_inner_outer
    msg = self._msgobj('msg_15.txt')
  File "/home/heimes/dev/python/py3k/Lib/email/test/test_email.py", line 67, in _msgobj
    return email.message_from_file(fp)
  File "/home/heimes/dev/python/py3k/Lib/email/__init__.py", line 46, in message_from_file
    return Parser(*args, **kws).parse(fp)
  File "/home/heimes/dev/python/py3k/Lib/email/parser.py", line 68, in parse
    data = fp.read(8192)
  File "/home/heimes/dev/python/py3k/Lib/io.py", line 1240, in read
    readahead, pending = self._read_chunk()
  File "/home/heimes/dev/python/py3k/Lib/io.py", line 1136, in _read_chunk
    pending = self._decoder.decode(readahead, not readahead)
  File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbe in position 86: unexpected code byte
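
The last error above can be reproduced in isolation. This is only a sketch with made-up bytes: it does not use the contents of msg_15.txt, and the Latin-1 decoding is an assumption chosen to show that the same byte is valid in a single-byte encoding, not a claim about what the test file actually contains:

```python
# 0xbe is a UTF-8 continuation byte, so it cannot start a character.
raw = b"caf\xbe"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the offending byte
    print("utf-8 failed at byte", failed_at)

# A single-byte encoding such as Latin-1 accepts every byte value.
text = raw.decode("latin-1")
print(text)
```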
msg56471 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-15 23:13
BTW we need a 2to3 fixer for this.  Should be trivial -- just replace
*all* occurrences of basestring with str.
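
What such a rewrite does can be sketched without the 2to3 machinery. The code below is not the fixer that was later attached to this issue; it is a stand-alone illustration built on the standard tokenize module, and replace_basestring is a hypothetical name. It rewrites every NAME token spelled basestring and leaves string literals and comments alone:

```python
import io
import tokenize


def replace_basestring(source):
    """Return source with every NAME token 'basestring' changed to 'str'."""
    # Split on "\n" the same way the tokenizer's readline does.
    lines = io.StringIO(source).readlines()
    hits = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == "basestring"
    ]
    # Edit from the end so earlier column offsets stay valid.
    for tok in reversed(hits):
        row, start_col = tok.start
        end_col = tok.end[1]
        line = lines[row - 1]
        lines[row - 1] = line[:start_col] + "str" + line[end_col:]
    return "".join(lines)


old = 'x = "basestring"\nif isinstance(egg, basestring):\n    pass  # basestring\n'
print(replace_basestring(old))
```

Because the match is on tokens rather than raw text, the string literal and the comment in the sample stay unchanged. Attribute names spelled basestring would be rewritten as well, which matches the "replace all occurrences" rule stated above.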
msg56473 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-15 23:27
Even before this patch, the re module doesn't work very well on byte
strings. IMO this should be fixed.  I've filed a separate bug to remind
us: bug 1282.
msg56475 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-15 23:52
Guido van Rossum wrote:
> BTW we need a 2to3 fixer for this.  Should be trivial -- just replace
> *all* occurrences of basestring with str.

I believe you that it's trivial for *you* but I've never dealt with the
fixers or the grammar. Fortunately for me I was able to copy the fixer
for standarderror. It took just some minor tweaks :)

Let's see if the mail interface can handle attachments.
msg56501 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-16 18:13
Committed revision 58495.

Thanks Christian!!!
msg56725 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-24 19:57
2007/10/15, Christian Heimes <report@bugs.python.org>:
>
> Christian Heimes added the comment:
>
> Guido van Rossum wrote:
> > BTW we need a 2to3 fixer for this.  Should be trivial -- just replace
> > *all* occurrences of basestring with str.
>
> I believe you that it's trivial for *you* but I've never dealt with the
> fixers or the grammar. Fortunately for me I was able to copy the fixer
> for standarderror. It took just some minor tweaks :)
>
> Let's see if the mail interface can handle attachments.

It did. :-) I renamed it to fix_basestring and submitted it.  See:

Committed revision 58644.
History
Date                 User              Action  Args
2007-10-24 19:57:40  gvanrossum        set     messages: + msg56725
2007-10-16 18:13:58  gvanrossum        set     status: open -> closed; resolution: accepted; messages: + msg56501
2007-10-15 23:52:41  christian.heimes  set     files: + fix_basestr.py; messages: + msg56475
2007-10-15 23:27:01  gvanrossum        set     messages: + msg56473
2007-10-15 23:13:21  gvanrossum        set     messages: + msg56471
2007-10-15 23:11:45  christian.heimes  set     messages: + msg56470
2007-10-15 22:57:55  gvanrossum        set     messages: + msg56469
2007-10-15 22:14:26  gvanrossum        set     messages: + msg56467
2007-10-15 22:07:38  gvanrossum        set     files: - py3k_basestring_removal2.patch
2007-10-15 16:10:54  christian.heimes  set     files: + py3k_basestring_removal3.patch; messages: + msg56447
2007-10-15 15:58:24  gvanrossum        set     messages: + msg56446
2007-10-15 15:48:47  christian.heimes  set     files: + py3k_basestring_removal2.patch; messages: + msg56444
2007-10-11 05:07:48  loewis            set     keywords: + patch
2007-10-11 03:17:24  gvanrossum        set     messages: + msg56337
2007-10-10 23:16:38  christian.heimes  set     messages: + msg56331
2007-10-10 22:44:00  gvanrossum        set     messages: + msg56330
2007-10-10 22:34:00  christian.heimes  set     messages: + msg56329
2007-10-10 21:54:50  gvanrossum        set     assignee: gvanrossum; messages: + msg56328
2007-10-10 21:39:28  gvanrossum        set     nosy: + gvanrossum; messages: + msg56327
2007-10-10 21:23:31  christian.heimes  create