classification
Title: Pickle fails on BeautifulSoup's navigableString instances
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: nnorwitz Nosy List: altherac, georg.brandl, taleinat
Priority: normal Keywords:

Created on 2007-07-19 18:23 by taleinat, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name       Uploaded
bug-175062.py   altherac, 2007-08-24 11:47
Messages (4)
msg32531 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2007-07-19 18:23
Trying to pickle an instance of BeautifulSoup's NavigableString class fails with:
"RuntimeError: maximum recursion depth exceeded"


Diagnosis: pickling such an instance sends pickle into unbounded recursion, which eventually hits the maximum recursion depth. This happens regardless of the protocol used.

Possibly related to SF bug #1581183: "pickle protocol 2 failure on int subclass"
http://sourceforge.net/tracker/index.php?func=detail&aid=1581183&group_id=5470&atid=105470


See http://mail.python.org/pipermail/idle-dev/2007-July/002600.html (originally a bug report for IDLE on the IDLE-dev list) for details (including how to recreate the error).

Related IDLE bug report: #1757057
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1757057&group_id=5470
msg55250 - (view) Author: Christophe Michel (altherac) Date: 2007-08-24 11:47
I started by isolating the smallest piece of code that triggers the error.
If you strip NavigableString down far enough, you end up with the
attached code.
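
The attachment itself is not reproduced in this thread. As a rough idea of the kind of class involved, here is a minimal sketch ported to Python 3, where str stands in for unicode and __str__ for __unicode__. The class body is an assumption for illustration, not the contents of bug-175062.py.

```python
class EvilString(str):
    """Hypothetical stand-in for the attached EvilString (Python 3 port)."""

    def __str__(self):
        # Mirrors a __unicode__ method that returns self instead of
        # a plain string object.
        return self


evil = EvilString("spam")
converted = str(evil)

# The conversion hands back the subclass instance, not a plain str.
print(type(converted).__name__)
print(converted is evil)
```

The point of the sketch is only that converting the instance to its base type does not produce a base-type object, which is the property the rest of the analysis depends on.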

As expected, this program fails with "RuntimeError: maximum recursion
depth exceeded".
The evil recursion proceeds as follows:

>>  File "C:\Python25\lib\pickle.py", line 1364, in dump
>>    Pickler(file, protocol).dump(obj)

Initial call to dump(), as intended.

>>  File "C:\Python25\lib\pickle.py", line 224, in dump
>>    self.save(obj)

save() calls obj.__reduce_ex__(), obj being our EvilString instance.

That call ends up in _reduce_ex(), defined in copy_reg.py at line 58. For
my example it returns a tuple containing three elements:
1) the _reconstructor function, as defined in copy_reg.py, line 46
2) a tuple: (<class '__main__.EvilString'>, <type 'unicode'>,
<'__main__.EvilString' instance at 0xXXXXXXXX>)
   The first element is the actual class of obj, the second is the base
class, and the third is the current instance (known as the state).
3) an empty dict {}

>>  File "C:\Python25\lib\pickle.py", line 331, in save
>>    self.save_reduce(obj=obj, *rv)

save_reduce() calls self.save() twice:
- first on the func argument, which is the _reconstructor function. This
call works as intended.
- next on the args tuple (<class '__main__.EvilString'>, <type 'unicode'>,
<'__main__.EvilString' instance at 0xXXXXXXXX>)

>>  File "C:\Python25\lib\pickle.py", line 403, in save_reduce
>>    save(args)
>>  File "C:\Python25\lib\pickle.py", line 286, in save
>>    f(self, obj) # Call unbound method with explicit self

save() finds out that its argument is a tuple and dispatches to
save_tuple().

>>  File "C:\Python25\lib\pickle.py", line 564, in save_tuple
>>    save(element)

... and save_tuple() calls save() on each element of the tuple.
See what's wrong?
The third element is the EvilString instance itself, so save() gets
called on it again. That in turn calls save_reduce() on it, and so on.
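
The cycle described above can be sketched with a toy saver. This is not pickle's code: toy_save, DepthCapReached and the depth cap are hypothetical names introduced for illustration, the base type is hard-coded to str, and the sketch is written for Python 3 (str in place of unicode, __str__ in place of __unicode__).

```python
class DepthCapReached(Exception):
    """Raised by the toy saver instead of exhausting the interpreter stack."""


class PlainString(str):
    """A str subclass that keeps the inherited conversion."""


class EvilString(str):
    def __str__(self):
        return self  # conversion returns the instance itself


def toy_save(obj, depth=0, cap=50):
    """Return the deepest nesting level reached while 'saving' obj."""
    if depth > cap:
        raise DepthCapReached(depth)
    if isinstance(obj, tuple):
        # Like save_tuple(): save every element in turn.
        return max([toy_save(item, depth + 1, cap) for item in obj],
                   default=depth)
    if isinstance(obj, type) or type(obj) is str:
        # Leaves: classes and plain strings need no further work.
        return depth
    # Like the reduce path: build (class, base, state) and save that tuple.
    args = (type(obj), str, str(obj))
    return toy_save(args, depth + 1, cap)


print(toy_save(PlainString("spam")))
try:
    toy_save(EvilString("spam"))
except DepthCapReached as exc:
    print("gave up at depth", exc.args[0])
```

For PlainString the state is a plain str, so the walk bottoms out after the args tuple. For EvilString the state is the object being saved, so every pass through the tuple starts another pass, and only the artificial cap stops it.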

The problem lies in _reduce_ex(), in the definition of the state of the
object:

copy_reg.py, lines 65 to 70:
    if base is object:
        state = None
    else:
        if base is self.__class__:
            raise TypeError, "can't pickle %s objects" % base.__name__
        state = base(self)

When this code gets executed on an EvilString instance, base is the type
'unicode'.
Since base is neither object nor the instance's own class EvilString, the
following line gets executed:
state = base(self)

This amounts to unicode(self), which calls self.__unicode__(), and that
returns the EvilString instance itself rather than a plain unicode object.
That is where the recursion starts.
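
To see the three branches side by side, here is a small sketch that mirrors the quoted lines in Python 3. compute_state is a hypothetical helper, not a function from the standard library, and it takes base as a parameter instead of deriving it from the class hierarchy, which is a simplification.

```python
def compute_state(obj, base):
    """Mirror of the quoted branch, with `base` supplied by the caller."""
    if base is object:
        return None
    if base is obj.__class__:
        raise TypeError("can't pickle %s objects" % base.__name__)
    return base(obj)


class PlainString(str):
    """Keeps the inherited conversion, which yields a plain str."""


class EvilString(str):
    def __str__(self):
        return self  # plays the role of __unicode__ returning self


plain_state = compute_state(PlainString("spam"), str)
evil_state = compute_state(EvilString("spam"), str)

print(type(plain_state).__name__)
print(type(evil_state).__name__)
```

With the inherited conversion the state is a genuine base-type object that can be saved directly. With the overriding method the state is still an EvilString, so saving the state means saving the original object again.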

I don't know whether this is a flaw in the design of _reduce_ex, or a flaw
inherent in having __unicode__(self) return self.
My guess is the latter.
msg55252 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-08-24 11:58
This is indeed tricky. The docs say __unicode__ "should return a Unicode
object", so I'm inclined to blame BeautifulSoup.

Asking Neal for a second opinion.
msg66796 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-13 19:25
Closing as "won't fix".
History
Date                 User          Action  Args
2022-04-11 14:56:25  admin         set     github: 45222
2008-05-13 19:25:07  georg.brandl  set     status: open -> closed
                                           resolution: wont fix
                                           messages: + msg66796
2007-08-24 11:58:25  georg.brandl  set     assignee: nnorwitz
                                           messages: + msg55252
                                           nosy: + georg.brandl
2007-08-24 11:48:01  altherac      set     files: + bug-175062.py
                                           nosy: + altherac
                                           messages: + msg55250
2007-07-19 18:23:56  taleinat      create