
classification
Title: cgi.escape Can Lead To XSS Vulnerabilities
Type: security
Stage:
Components: Documentation, Library (Lib)
Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Copy cgi.escape() to html
View: 2830
Assigned To: docs@python
Nosy List: Craig.Younkins, barry, docs@python, eric.araujo, fdrake, georg.brandl, orsenthil
Priority: critical
Keywords:

Created on 2010-06-23 15:46 by Craig.Younkins, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (10)
msg108457 - (view) Author: Craig Younkins (Craig.Younkins) Date: 2010-06-23 15:46
The method in question: http://docs.python.org/library/cgi.html#cgi.escape
http://svn.python.org/view/python/tags/r265/Lib/cgi.py?view=markup   # at the bottom
http://code.python.org/hg/trunk/file/3be6ff1eebac/Lib/cgi.py#l1031

"Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character ('"') is also translated; this helps for inclusion in an HTML attribute value, as in <A HREF="...">. If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead."

cgi.escape never escapes single quote characters, which can easily lead to a Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, but a quick search reveals many are using cgi.escape for HTML attribute escaping.

The intended use of this method is unclear to me. Up to and including Mako 0.3.3, this method was the HTML escaping method. Used in this manner, single-quoted attributes with user-supplied data are easily susceptible to cross-site scripting vulnerabilities.

While the documentation says "if the value to be quoted might include single- or double-quote characters... [use the] xml.sax.saxutils module instead," it also implies that this method will make input safe for HTML. Because this method escapes 4 of the 5 key XML characters, it is reasonable to expect some will use it for HTML escaping.

I suggest rewording the documentation for the method making it more clear what it should and should not be used for. I would like to see the method changed to properly escape single-quotes, but if it is not changed, the documentation should explicitly say this method does not make input safe for inclusion in HTML.

This is definitely affecting the security of some Python web applications. I already mentioned Mako, but I've found this type of bug in other frameworks and engines because the creators either called cgi.escape directly or modeled their own after it.

Craig Younkins
msg108469 - (view) Author: Craig Younkins (Craig.Younkins) Date: 2010-06-23 18:22
Proof of concept:
import cgi
print("""<body class='%s'></body>""" % cgi.escape("' onload='alert(1);' bad='"))
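
For readers on a current interpreter, where cgi.escape is no longer available (it was removed in Python 3.8), the proof of concept can be reproduced with a stand-in. The sketch below assumes a hypothetical helper, legacy_escape, written only to mirror the documented behaviour of cgi.escape (escape &, < and >, plus " when quote is true). It is not the library's code.

```python
def legacy_escape(s, quote=False):
    """Hypothetical stand-in mirroring the documented behaviour of cgi.escape."""
    s = s.replace("&", "&amp;")  # must run first so later entities are not re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s  # single quotes are never touched


payload = "' onload='alert(1);' bad='"
page = "<body class='%s'></body>" % legacy_escape(payload, quote=True)
print(page)
```

Because the payload's single quotes pass through unchanged, the first one closes the class attribute and the rest of the payload is parsed as additional attributes, including an onload handler.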
msg108473 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-06-23 18:41
On Wed, Jun 23, 2010 at 03:46:35PM +0000, Craig Younkins wrote:
> cgi.escape never escapes single quote characters, which can easily
> lead to a Cross-Site Scripting (XSS) vulnerability. This seems to be
> known by many, but a quick search reveals many are using cgi.escape
> for HTML attribute escaping.

cgi.escape is for HTML attribute escaping only.  I guess you should
explain, or point to resources showing, where a single quote represented
in non-entity form in an HTML page has led to XSS.

> The intended use of this method is unclear to me. 

It escapes the characters that are special in HTML, most commonly >, <, & and ",
mostly when constructing responses in which these characters must appear literally.

> While the documentation says "if the value to be quoted might
> include single- or double-quote characters... [use the]
> xml.sax.saxutils module instead," it also implies that this method
> will make input safe for HTML. Because this method escapes 4 of the

"More suitable" for HTML would be the correct interpretation, rather
than making the "input safe". You might check the reference documentation
for xml.sax.saxutils.

> I suggest rewording the documentation for the method making it more
> clear what it should and should not be used for. 

The very next paragraph seems to address the security considerations
while using the cgi module itself, rather than limiting it to
cgi.escape. It says that:

"To be on the safe side, if you must pass a string gotten from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods."

With respect to your bug report:

1. Any doc change suggestions you propose?  (After pointing out the
resources requested in first para)

2. If cgi.escape needs to escape single quotes, what should they be escaped as:
lsquo/rsquo (for XHTML), or &#x27; or &#39; for others?
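
For reference, the candidate encodings in this question can be compared directly. The sketch below assumes Python 3.4 or later, where html.unescape is available; the variable names are illustrative.

```python
import html

# Hexadecimal and decimal numeric references decode to the same character.
hex_form = html.unescape("&#x27;")
dec_form = html.unescape("&#39;")
print(hex_form == dec_form == "'")

# lsquo/rsquo name the typographic quotes U+2018 and U+2019,
# which are different characters from the ASCII apostrophe U+0027.
left, right = html.unescape("&lsquo;"), html.unescape("&rsquo;")
print([hex(ord(c)) for c in (left, right)])
```

The two numeric forms are interchangeable for a browser, whereas lsquo/rsquo would change the text that is displayed rather than escape it.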
msg108475 - (view) Author: Craig Younkins (Craig.Younkins) Date: 2010-06-23 19:05
> cgi.escape is for HTML attribute escaping only.

It is not safe for HTML attribute escaping because it does not encode single quotes.

> "More suitable" for HTML would be the correct interpretation, rather than making the "input safe".

"More suitable, but not quite secure"

Regardless of the intended use of this method, many, many people are using it for insecure HTML entity escaping.

> you should explain, or point to resources showing, where
> a single quote represented in non-entity form
> in an HTML page has led to XSS.

import cgi
print("<body class='%s'></body>" % cgi.escape("' onload='alert(1);' bad='"))

> The very next paragraph seems to address the security considerations
> while using the cgi module itself, rather than limiting it to
> cgi.escape. It says that:
> "To be on the safe side, if you must pass a string gotten from a form
> to a shell command, you should make sure the string contains only
> alphanumeric characters, dashes, underscores, and periods."

The security concerns related to output on the web are very different from the concerns related to sending user input to a shell command. The escaping needed is completely different. Also, the security advice above is woefully inadequate.

> Any doc change suggestions you propose?

Convert the characters '&', '<' and '>' in string s to their HTML entity encoded values. If the optional flag quote is true, the double-quotation mark character ('"') is also encoded. Note that the output of this method is not safe to put in an HTML attribute because it does not escape single quotes. If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead.
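
A minimal sketch of the quoteattr() alternative named in the wording above, assuming Python's xml.sax.saxutils module. quoteattr() returns the value with its surrounding quotes already chosen, so the template must not add its own; the template string and variable names here are illustrative.

```python
from xml.sax.saxutils import quoteattr

payload = "' onload='alert(1);' bad='"

# quoteattr() picks a quote character the value cannot break out of
# and returns the value already wrapped in it.
attr = quoteattr(payload)

# Note: no quotes around %s, because attr carries its own.
page = "<body class=%s></body>" % attr
print(page)
```

Here the payload contains only single quotes, so the value comes back wrapped in double quotes and the single quotes stay inside the attribute value.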

> If cgi.escape needs to escape single quotes, what should they be escaped as:
> lsquo/rsquo (for XHTML), or &#x27; or &#39; for others?

Sorry, I should have included that in the OP. It should escape to &#x27; 
It is also advised to escape the forward slash character ('/') to &#x2F;

See OWASP.org for an explanation of the complexities of the escaping:
http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content
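
The escaping described in this message could be sketched as follows. escape_html is a hypothetical helper written for illustration, not an existing library function; it covers the four characters cgi.escape handles with quote set, plus the single quote and forward slash suggested above.

```python
def escape_html(s):
    """Hypothetical helper: escape the six characters named in this message."""
    replacements = [
        ("&", "&amp;"),   # must come first so later entities are not re-escaped
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#x27;"),
        ("/", "&#x2F;"),
    ]
    for char, entity in replacements:
        s = s.replace(char, entity)
    return s


payload = "' onload='alert(1);' bad='"
page = "<body class='%s'></body>" % escape_html(payload)
print(page)
```

With the single quotes encoded, the payload can no longer close the single-quoted attribute. On Python 3.2 and later, html.escape with its default quote=True also encodes the single quote as &#x27;, though it leaves the forward slash alone.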
msg112508 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-08-02 18:22
Unless someone can upload a specific patch to review in the next couple of hours, I'm going to reduce the priority for 2.6.6rc1.
msg112509 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-08-02 18:33
Applied doc patch to 2.6 in r83539.
msg112587 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-03 12:21
Are 2.6 docs built by an older Sphinx version? I wonder why the text uses “the :func:`quoteattr` function in the :mod:`xml.sax.saxutils` module” and not “:func:`~xml.sax.saxutils.quoteattr`” to get a direct link (or even just “consider using :func:`xml.sax.saxutils.quoteattr`.”).
msg112588 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-08-03 12:25
No, that's just a relic from the olden LaTeX days, and I've not paid enough attention to fix it :)
msg112600 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-08-03 13:13
Such constructs are notoriously tedious to grep for; patches are welcome.
msg113869 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-14 05:12
Markup nit fixed in r83999 (py3k) and r84001 (stupid typo), r84002 (3.1), r84003 (2.7).
History
Date | User | Action | Args
2022-04-11 14:57:02 | admin | set | github: 53307
2016-02-23 17:41:45 | gregory.p.smith | link | issue26398 superseder
2010-08-24 01:31:39 | benjamin.peterson | set | status: open -> closed; resolution: duplicate; superseder: Copy cgi.escape() to html
2010-08-14 05:12:08 | eric.araujo | set | messages: + msg113869
2010-08-03 13:13:21 | fdrake | set | nosy: + fdrake; messages: + msg112600
2010-08-03 12:25:03 | georg.brandl | set | messages: + msg112588
2010-08-03 12:21:23 | eric.araujo | set | nosy: + eric.araujo; messages: + msg112587
2010-08-02 18:33:28 | georg.brandl | set | priority: release blocker -> critical; versions: - Python 2.6, Python 2.5; nosy: + georg.brandl; messages: + msg112509
2010-08-02 18:22:46 | barry | set | nosy: + barry; messages: + msg112508
2010-07-31 19:57:46 | georg.brandl | set | priority: normal -> release blocker
2010-06-23 19:05:59 | Craig.Younkins | set | messages: + msg108475
2010-06-23 18:41:38 | orsenthil | set | nosy: + orsenthil; messages: + msg108473
2010-06-23 18:22:40 | Craig.Younkins | set | messages: + msg108469
2010-06-23 15:47:38 | Craig.Younkins | set | type: security
2010-06-23 15:46:33 | Craig.Younkins | create |