As others have pointed out, scrubbing input data is not the correct approach. Here’s why:
-
The way data needs to be scrubbed depends on the context of how it is going to be used. You can’t know up front how the data will ultimately be used to you can’t make the proper decision of how it should be scrubbed when it is entered. For example, the OWASP sample scrubber routines distinguish between data that is going to be output as JavaScript, HTML Attributes, and raw HTML (as well as a couple others).
-
You can’t guarantee that all data that ends up in your database will have come through your input scrubber. It can come from another compromised system, sql injection, or even flaws in your own input scrubber.
-
Once you find out that XSS data exists in your database it is nearly impossible to fix. For example, if you find out that your original input scrubber was flawed you now have to figure out how to get rid of all of the problem data. If you use output scrubbing instead of input scrubbing you can simply alter your output scrubber and leave the data alone. Always assuming that the data could be bad means that it can stay bad in the database without impacting the application.
-
There is no reason to scrub data more than once. You have to do it on output anyway for the reasons listed above.
-
Other systems are likely to need the data and will puke if it is already scrubbed. Even if you don’t interface with any other systems now you never know when your boss is going to come to you and say that his boss wants to be able to run some simple queries using Crystal Reports in which your scrubbed input data can’t easily be unscrubbed before use.
-
Scrubbed data can mess up certain types of SQL statements. For example, depending on your scrubbing mechanism, sorting might be broken. Like clauses may also not work correctly. You want the data in your database to be in a pure unaltered form for the best results.
These are just a few reasons. There could be many more.