Monday, April 25, 2011

Preventing Cross-Site Scripting

To prevent XSS, developers must be very careful with user-supplied data that is served
back to users. We define user-supplied data as any data that reaches a web application
over an outside network connection. It could be a username submitted in an HTML form
at login, a backend AJAX request that was supposed to come from the JavaScript code
the developer wrote, an e-mail, or even HTTP headers. Treat all data entering a
web application from an outside network connection as potentially harmful.
For all user-supplied data that is later displayed back to users in any HTTP response,
such as web pages and AJAX responses (HTTP response code 200), page-not-found errors
(HTTP response code 404), server errors (like HTTP response code 502), redirects (like
HTTP response code 302), and so on, the developer must do one of the following:
• Escape the data properly so it is not interpreted as HTML (to browsers) or XML
(to Flash).
• Remove characters or strings that can be used maliciously.
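The rule applies to error responses as well as normal 200 pages. As a minimal sketch of the escaping option (the not_found_page function is hypothetical, not part of any framework), here is a 404 page body that echoes the requested path only after HTML-entity-escaping it with Python's standard html.escape:

```python
import html


def not_found_page(requested_path: str) -> str:
    """Build the body of a hypothetical 404 page that echoes the requested path.

    The path comes straight from the request line, so it is user-supplied data
    and is HTML-entity-escaped before being placed in the page.
    """
    safe_path = html.escape(requested_path, quote=True)
    return f"<html><body><p>Sorry, {safe_path} was not found.</p></body></html>"


print(not_found_page("/<script>alert(1)</script>"))
```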
Removing characters generally hurts the user experience. For instance, if the developer
removed apostrophes (’), people with last names like O’Reilly would be frustrated that
their names are not displayed properly.
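The difference between the two options shows up with a name like O’Reilly. In this Python sketch (both helper functions are hypothetical illustrations), removal changes the stored value itself, while escaping keeps every character and lets the browser render O&#x27;Reilly back as O'Reilly:

```python
import html


def strip_apostrophes(value: str) -> str:
    # The "remove characters" option: delete ASCII (') and typographic (’) apostrophes.
    return value.replace("'", "").replace("\u2019", "")


def escape_for_html(value: str) -> str:
    # The "escape" option: keep every character, but encode the ones HTML treats specially.
    return html.escape(value, quote=True)


name = "O'Reilly"
print(strip_apostrophes(name))  # OReilly
print(escape_for_html(name))    # O&#x27;Reilly
```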
We strongly discourage developers from removing strings, because strings can be
represented in many ways. The strings are also interpreted differently by applications and
browsers. For example, the SAMY worm took advantage of the fact that IE does not treat
new lines as word delimiters. Thus, IE interprets javascript and jav%0dascr%0dipt
as the same. Unfortunately, MySpace interpreted new lines as delimiting words and
allowed such a split javascript URL to be placed on Samy’s (and others’) MySpace pages.
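Both weaknesses of string removal can be illustrated with a toy filter. The sketch below assumes a hypothetical blacklist filter that deletes the word "javascript", plus a simplified model of a parser that ignores tab, carriage-return, and line-feed characters, approximating IE's behavior in the SAMY case (the WHATWG URL Standard also strips these characters from URLs). Neither function is taken from MySpace or from any browser:

```python
import re


def remove_javascript(value: str) -> str:
    # Hypothetical blacklist filter: delete the word "javascript" wherever it appears.
    return re.sub("javascript", "", value, flags=re.IGNORECASE)


def drop_tabs_and_newlines(value: str) -> str:
    # Simplified model of a parser that ignores tab, CR, and LF inside a URL.
    return re.sub(r"[\t\r\n]", "", value)


# 1. Splitting the keyword with carriage returns (%0d) slips past the filter...
split = "jav\rascr\ript:alert(1)"
print(remove_javascript(split) == split)  # True: nothing was removed
# ...but a parser that drops the carriage returns still sees a javascript: URL.
print(drop_tabs_and_newlines(remove_javascript(split)))  # javascript:alert(1)

# 2. Removal can even assemble the forbidden word from the surrounding pieces.
nested = "javajavascriptscript:alert(1)"
print(remove_javascript(nested))  # javascript:alert(1)
```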
We recommend escaping all user-supplied data that is sent back to a web browser within
AJAX calls, mobile applications, web pages, redirects, and so on. However, escaping
strings is not simple; you must use URL encoding, HTML entity encoding, or JavaScript
encoding, depending on where the user-supplied data is placed in the HTTP response.
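Which encoding to use depends on the output context. The following Python sketch pairs each context named above with a standard-library encoder; the wrapper names are hypothetical, and the JavaScript helper is one common approach (a JSON string literal with the characters <, >, and & additionally escaped), not the only correct one:

```python
import html
import json
import urllib.parse


def for_html(value: str) -> str:
    # HTML entity encoding: for text between tags or inside a quoted attribute.
    return html.escape(value, quote=True)


def for_url_component(value: str) -> str:
    # URL (percent) encoding: for a single query-string value or path segment.
    # safe="" means even "/" is encoded.
    return urllib.parse.quote(value, safe="")


def for_js_string(value: str) -> str:
    # JavaScript encoding: a quoted string literal for use inside a <script> block.
    # json.dumps escapes quotes, backslashes, control characters, and non-ASCII
    # (including U+2028/U+2029). <, >, and & are also escaped so the literal
    # cannot contain markup such as </script> that the HTML parser would act on.
    literal = json.dumps(value)
    return literal.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")


user_input = "</script><b>\"hi\" & 'bye'</b>"
print(f"<p>{for_html(user_input)}</p>")
print(f'<a href="/search?q={for_url_component(user_input)}">')
print(f"<script>var q = {for_js_string(user_input)};</script>")
```

Each encoder protects only its own context: html.escape, for example, leaves backslashes and line breaks untouched, so its output is not a safe JavaScript string literal.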