Message 371843 - Python tracker


Author	Seth.Troisi
Recipients	Seth.Troisi, eric.smith, ezio.melotti, matpi, mrabarnett, rhettinger, serhiy.storchaka
Date	2020-06-19.00:05:57
Message-id	<1592525158.06.0.866302062175.issue39949@roundup.psfhosted.org>
In-reply-to

Content

I was thinking about how to add the end quote and found these weird cases:
  >>> "asdf'asdf'asdf"
  "asdf'asdf'asdf"
  >>> "asdf\"asdf\"asdf"
  'asdf"asdf"asdf'
  >>> "asdf\"asdf'asdf"
  'asdf"asdf\'asdf'
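
A minimal sketch of the quoting rule these cases suggest. The helper name `predicted_quote` is hypothetical, and the rule (double quotes only when the string contains a single quote and no double quote) is an assumption that the loop checks against the real `repr`:

```python
def predicted_quote(s: str) -> str:
    """Guess the delimiter repr(s) will use (assumed rule, verified below)."""
    # Double quotes are used only when s contains ' but no ".
    return '"' if ("'" in s and '"' not in s) else "'"

samples = ["asdf", "asdf'asdf'asdf", 'asdf"asdf"asdf', "asdf\"asdf'asdf"]
for s in samples:
    # The first character of repr(s) is the delimiter actually chosen.
    assert repr(s)[0] == predicted_quote(s)
    print(repr(s), "-> delimiter", repr(s)[0])
```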

This means that len(repr(s)) is not always len(s) + 2 (or len(s) + 3 for bytes),
e.g.

>>> s = "\"''''''"
>>> s
'"\'\'\'\'\'\''
>>> len(s)
7
>>> len(repr(s))
15
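
To make the length mismatch concrete, here is a small sketch that measures how many extra characters `repr` adds. The helper name `repr_overhead` is hypothetical and exists only for this illustration:

```python
def repr_overhead(obj) -> int:
    """Number of characters repr() adds on top of len(obj)."""
    return len(repr(obj)) - len(obj)

print(repr_overhead("asdf"))        # plain str: just the two quotes
print(repr_overhead(b"asdf"))       # bytes: the b prefix plus two quotes
print(repr_overhead("\"''''''"))    # every ' is escaped, so the overhead grows
print(repr_overhead("\x00" * 30))   # each NUL is rendered as the 4 characters \x00
```

The last case is a 30-character string whose repr is far longer than 50 characters, so a length check on the original string cannot predict whether the repr will be truncated.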

This can lead to a weird partial trailing escape sequence when the repr is truncated:
  >>> re.match(".*", "a"*48 + "'\"")
  <_sre.SRE_Match object; span=(0, 50), match='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\>
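
The truncation can be imitated in pure Python. This is a sketch that assumes `%.50R` behaves like slicing the repr to 50 characters; the helper name `truncated_repr` is hypothetical:

```python
def truncated_repr(s: str, limit: int = 50) -> str:
    """Emulate %.<limit>R by cutting repr(s) to at most `limit` characters."""
    return repr(s)[:limit]

s = "a" * 48 + "'\""          # 50 characters, ending in ' and "
out = truncated_repr(s)

print(len(repr(s)))           # full repr: quotes plus one escaped '
print(out)                    # cut falls between the backslash and the '
print(out.endswith("\\"))     # the dangling backslash seen in the match repr
```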


This means I'll need to rethink len(group0) >= 48 as the condition for truncation, since even a 30-character string can be truncated by %.50R.

Maybe it makes sense to write the repr of group0 to a temporary string, check whether that was truncated, and extract the quote character from it
OR
PyUnicode_FromFormat('%R', group0[:50]) # avoids a trailing escape character ('\') but might be longer than 50 characters
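
A rough Python sketch of the two options, not the actual C change. The function names and the `LIMIT` constant are hypothetical, and slicing stands in for the `%.50R` precision:

```python
LIMIT = 50  # assumed display limit, mirroring %.50R

def truncate_then_close(s: str, limit: int = LIMIT) -> str:
    """Option 1: build the full repr, detect truncation, re-append the quote."""
    full = repr(s)
    if len(full) <= limit:
        return full                      # nothing was cut off
    quote = full[0]                      # delimiter chosen by repr()
    return full[:limit] + quote          # may still end in a split escape

def repr_of_prefix(s: str, limit: int = LIMIT) -> str:
    """Option 2: repr only the first `limit` characters of the string."""
    return repr(s[:limit])               # escapes stay whole, length may exceed limit

s = "a" * 48 + "'\""
print(truncate_then_close(s))            # closing quote present, backslash still dangling
print(len(repr_of_prefix(s)))            # longer than LIMIT
```

Option 1 keeps the output bounded but can still cut an escape sequence in half; option 2 keeps every escape intact at the cost of an unbounded length.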

History
Date	User	Action	Args
2020-06-19 00:05:58	Seth.Troisi	set	recipients: + Seth.Troisi, rhettinger, eric.smith, ezio.melotti, mrabarnett, serhiy.storchaka, matpi
2020-06-19 00:05:58	Seth.Troisi	set	messageid: <1592525158.06.0.866302062175.issue39949@roundup.psfhosted.org>
2020-06-19 00:05:58	Seth.Troisi	link	issue39949 messages
2020-06-19 00:05:57	Seth.Troisi	create