This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Raw Strings lack parody
Type: enhancement Stage: resolved
Components: Unicode Versions: Python 3.11
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Nicholas Willhite, ezio.melotti, mark.dickinson, steven.daprano, vstinner
Priority: normal Keywords:

Created on 2021-06-04 04:21 by Nicholas Willhite, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (9)
msg395064 - (view) Author: Nicholas Willhite (Nicholas Willhite) Date: 2021-06-04 04:21
I'm really sure this isn't filed correctly. I'm a total noob to this process, so feel free to redirect me. :) 

Bytes can be created either with a function or with a prefixed literal. You can prefix a string literal with "b" and get the expected data type, or you can use the builtin function "bytes" to get the same value:

  bytes('foo\bar', 'utf-8') == b'foo\bar'
  True

But there's no builtin function for r'foo\bar' that gives you 'foo\\bar'.

This would be really handy for applications that accept a regular expression. If that regex were part of the source code, I'd just write r'foo\bar' to get the expected string. Being able to accept something like bytes and do:

  data = b'foo\bar'
  raw_string(data)
  'foo\\bar'

would be really useful for applications that accept a regex as input. 

Is there an obvious way to do this that I'm not seeing? Has my Google-fu failed me? Feels like a function that should exist in the stdlib.

Again, really sure I'm not "doing this correctly." So please direct me! :) 

Appreciative,
-Nick Willhite
msg395066 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-06-04 05:11
I think you have missed something important here:

    >>> data = b'foo\bar'
    >>> len(data)
    6
    >>> print(data)
    b'foo\x08ar'


If you want bytes including a backslash followed by a b, you need to use raw bytes rb'foo\bar' or escape the backslash.
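An illustrative check of the three spellings mentioned above (plain, raw, and escaped bytes literals); this sketch is not part of the original thread:

```python
# In b'foo\bar', Python turns the escape \b into byte 0x08 (backspace).
plain = b'foo\bar'
# A raw bytes literal keeps the backslash and the 'b' as two separate bytes.
raw = rb'foo\bar'
# Escaping the backslash produces the same bytes as the raw literal.
escaped = b'foo\\bar'

print(len(plain), plain)  # 6 b'foo\x08ar'
print(len(raw), raw)      # 7 b'foo\\bar'
print(raw == escaped)     # True
```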

Also, Python 3.8 is in feature freeze, so the earliest this new feature could be added to the language is now 3.11.
msg395067 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-06-04 05:27
Remember that backslash escapes are only a Python syntactic feature. If you read data from a file, or from the input() builtin, that contains a backslash, it remains a backslash:

    >>> s = input()
    a\b
    >>> print(len(s), s == r'a\b')
    3 True

Backslashes are only special in two cases: as source code, and when displaying a string (or bytes) using `repr`.
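To make the `repr` case concrete, a small sketch (not from the thread) comparing how `str` and `repr` display a string that holds one real backslash:

```python
s = r'a\b'       # three characters: 'a', a backslash, 'b'
print(len(s))    # 3
print(s)         # a\b     (str() shows the backslash as-is)
print(repr(s))   # 'a\\b'  (repr() escapes it so the output is valid source code)
```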

So if you get a regex from the user, say by reading it from a file, or from stdin, or from a text field in a GUI, etc. and that regex contains a backslash, your string will contain a backslash and you don't need anything special.
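A minimal sketch of that situation, assuming the regex arrives as text from a file. Here `io.StringIO` stands in for the real file, and the doubled backslash appears only in the literal used to build the fake file; a real file would simply contain `foo\d+`:

```python
import io
import re

# Simulated file whose first line is the six characters: f o o \ d +
fake_file = io.StringIO("foo\\d+\n")
pattern_text = fake_file.readline().rstrip("\n")

# No escape processing happens when text is read, so the backslash survives.
print(len(pattern_text))  # 6

pattern = re.compile(pattern_text)
match = pattern.search("xx foo42 yy")
print(match.group())      # foo42
```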

Does this solve your problem?
msg395076 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-04 11:47
You can use br"\n" to get 2 bytes: b"\\" and b"n".

IMO, it's best practice to use raw strings for regular expressions.

Converting a regular string to a raw string sounds like a bad idea.
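A short illustration (not from the thread) of why raw strings are the usual advice for regexes: in a normal literal, "\b" is the backspace character, so the regex engine never sees a word-boundary escape:

```python
import re

text = "a foo b"

# "\bfoo\b" contains two backspace characters (U+0008), so nothing matches.
print(re.search("\bfoo\b", text))           # None

# r"\bfoo\b" contains backslash + 'b', which re reads as a word boundary.
print(re.search(r"\bfoo\b", text).group())  # foo

# Doubling the backslash in a normal literal gives the same pattern string.
print(r"\bfoo\b" == "\\bfoo\\b")            # True
```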
msg395077 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-06-04 11:58
> But there's no builtin function for r'foo\bar' that gives you 'foo\\bar'.

I'm confused about what's being requested here. r'foo\bar' and 'foo\\bar' are different source code representations of the exact same string (same type, same contents), so the identity function is such a function.

>>> f(r'foo\bar') == 'foo\\bar'
True

Nicholas: presumably you're after something more than the identity function. Can you clarify what the input and output types to your proposed function would be, and then give example input and output values?
msg395079 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-06-04 12:00
Sorry, I missed the definition of f in the last message. Trying again:

>>> def f(x): return x
... 
>>> f(r'foo\bar') == 'foo\\bar'
True
msg395080 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-06-04 12:07
Ah, I think I see: you want a function that turns the string "foo\bar" into "foo\\bar". Even if this were a good idea, I don't think it's feasible to do it in a non-surprising way.

For example, given such a function f, what outputs would you expect for:

  (a) f("\012"), and
  (b) f("\n")?

(And yes, this is a trick question: "\012" and "\n" are the same string.)
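One way to see why such a function cannot be well defined: the closest existing tool, the `unicode_escape` codec, re-escapes characters by their value, not by how they were typed in the source. In the sketch below, `reescape` is a hypothetical helper name used only for illustration; it shows that "\012" and "\n" come back identical, and that "foo\bar" comes back with \x08 rather than \b:

```python
# "\012" (octal escape) and "\n" are the same one-character string.
print("\012" == "\n")  # True

def reescape(s: str) -> str:
    # Hypothetical helper: re-escape control characters via unicode_escape.
    # The result depends only on the string's contents, so the original
    # spelling of an escape in the source code cannot be recovered.
    return s.encode("unicode_escape").decode("ascii")

print(reescape("\012"))     # \n
print(reescape("\n"))       # \n
print(reescape("foo\bar"))  # foo\x08ar
```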
msg395086 - (view) Author: Nicholas Willhite (Nicholas Willhite) Date: 2021-06-04 14:05
Wow, thanks for all the helpful responses! I see how this was a classic PEBKAC issue on my end. 

Please close this ticket at your convenience.
msg395092 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-04 15:33
I'm closing the issue. Glad that you found a solution to your issue.
History
Date                 User                Action  Args
2022-04-11 14:59:46  admin               set     github: 88474
2021-06-04 15:33:04  vstinner            set     status: open -> closed
                                                 resolution: not a bug
                                                 messages: + msg395092
                                                 stage: resolved
2021-06-04 14:05:11  Nicholas Willhite   set     messages: + msg395086
2021-06-04 12:07:29  mark.dickinson      set     messages: + msg395080
2021-06-04 12:00:32  mark.dickinson      set     messages: + msg395079
2021-06-04 11:58:32  mark.dickinson      set     nosy: + mark.dickinson
                                                 messages: + msg395077
2021-06-04 11:47:03  vstinner            set     messages: + msg395076
2021-06-04 05:27:50  steven.daprano      set     messages: + msg395067
2021-06-04 05:11:15  steven.daprano      set     versions: + Python 3.11, - Python 3.8
                                                 nosy: + steven.daprano
                                                 messages: + msg395066
                                                 type: behavior -> enhancement
2021-06-04 04:21:26  Nicholas Willhite   create