Message 400223 - Python tracker



Author	brilee
Recipients	brilee, rhettinger
Date	2021-08-24.18:22:17
Message-id	<1629829337.71.0.74510811072.issue44992@roundup.psfhosted.org>

Content
Thanks for clarifying - I see now that the docs specifically call out the lack of guarantees here with "usually but not always regard them as equivalent".

I did want to explain the context of my bug:

1. NumPy strings behave unexpectedly because they come in two forms: fixed-length strings (represented inline) and variable-length strings (which are pointers to Python strings). Various arcana dictate which form you get, and wrappers like pandas.read_csv can also throw a wrench in the mix. It is quite easy for the nominal "string type" to change from under you, which is how I stumbled on this bug.

2. I was using functools.cache as a way to intern objects and short-circuit otherwise very expensive equality calculations by reducing them to pointer comparisons - hence my desire for exact cache hits when typed=False.
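
To illustrate point 2, here is a minimal sketch of interning through a cache. The `Shape` class and `make_shape` factory are hypothetical names invented for this example, not code from the original report; the idea is only that equal arguments return the very same object, so an identity check can stand in for an expensive comparison.

```python
import functools


class Shape:
    """Hypothetical value object; imagine its __eq__ being very expensive."""

    def __init__(self, points):
        self.points = points


@functools.lru_cache(maxsize=None)  # same behavior as functools.cache on 3.9+
def make_shape(points):
    # Called once per distinct argument; later calls return the cached object.
    return Shape(points)


a = make_shape(((0, 0), (1, 1)))
b = make_shape(((0, 0), (1, 1)))

# Both names refer to one interned object, so `is` replaces a deep comparison.
same_object = a is b
print(same_object)
```

This only works as an interning scheme if every "equal" argument maps to the same cache entry, which is why separately cached entries for equal strings matter here.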

While I agree this is Working As Documented, it does not Work As Expected in my opinion. I would expect the stdlib optimized implementation to follow the same behavior as this naive implementation, which does consider "hello world" and np.str_("hello world") to be equivalent.

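import functools

def cache(func):
  _cache = {}  # maps argument key -> cached result
  @functools.wraps(func)
  def wrapped(*args, **kwargs):
    # Plain tuple key: equal, same-hash arguments share one entry regardless of type.
    cache_key = tuple(args) + tuple(kwargs.items())
    if cache_key not in _cache:
      _cache[cache_key] = func(*args, **kwargs)
    return _cache[cache_key]
  return wrapped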
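
The difference can be sketched without NumPy. The block below is an illustration under stated assumptions: `StrSubclass` is a hypothetical stand-in for np.str_ (a str subclass that hashes and compares like str), and `naive_cache` repeats the naive implementation so the example is self-contained. The naive cache returns one shared object for both arguments, while on the CPython versions I am aware of the stdlib cache may record two separate misses.

```python
import functools


class StrSubclass(str):
    """Hypothetical stand-in for np.str_: hashes and compares like str."""


def naive_cache(func):
    _cache = {}

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        cache_key = tuple(args) + tuple(kwargs.items())
        if cache_key not in _cache:
            _cache[cache_key] = func(*args, **kwargs)
        return _cache[cache_key]

    return wrapped


naive_calls = []  # records each real (uncached) call


@naive_cache
def naive_lookup(s):
    naive_calls.append(s)
    return object()


@functools.lru_cache(maxsize=None)  # same behavior as functools.cache on 3.9+
def stdlib_lookup(s):
    return object()


plain = "hello world"
subclassed = StrSubclass("hello world")

# The naive cache treats the two arguments as one key.
naive_same = naive_lookup(plain) is naive_lookup(subclassed)

# The stdlib cache may store them separately even with typed=False.
stdlib_same = stdlib_lookup(plain) is stdlib_lookup(subclassed)
stdlib_misses = stdlib_lookup.cache_info().misses

print(naive_same, stdlib_same, stdlib_misses)
```

Whether the stdlib result shares an entry is an implementation detail, so code that relies on interning should not assume either outcome.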
