digitalmars.D - Html escaping for security: howto in D?

reply Fitz <fitz figmentengine.com> writes:
Hello (I am a newbie to dlang)

What's the recommended way to escape user input when outputting 
html?

intent: to stop XSS/etc, see 
https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

thanks in advance!

Fitz
Jul 06 2020
next sibling parent reply Fitz <fitz figmentengine.com> writes:
On Monday, 6 July 2020 at 11:56:17 UTC, Fitz wrote:
 Hello (I am a newbie to dlang)

 What's the recommended way to escape user input when outputting 
 html?

 intent: to stop XSS/etc, see 
 https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

 thanks in advance!

 Fitz
looks like this forum uses https://github.com/CyberShadow/ae/blob/master/utils/text/html.d to do escaping. This code only escapes 4 of the 6 characters on the OWASP list; it does not escape these:
' --> &#x27;
/ --> &#x2F;
which looks risky? If the escaped value is stored in "$encoded", given
<div class='$encoded'>hello, world</div>
then
$encoded="blue' onclick='alert()"
results in:
<div class='blue' onclick='alert()'>hello, world</div>
which could be nasty
Jul 06 2020
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Monday, 6 July 2020 at 12:26:01 UTC, Fitz wrote:
 looks like this forum uses 
 https://github.com/CyberShadow/ae/blob/master/utils/text/html.d 
 to do escaping. This code only escape 4/6 characters, not these:
 ' --> &#x27;
 / --> &#x2F;
 which looks risky?, if its storeed in "$encode", given
 <div class='$encoded'>hello, world</div>
 then
 $encode="blue' onclick='alert()"
 results in:
 <div class='blue' onclick='alert()'>hello, world</div>
 could be nasty
If you don't escape single quotes, then don't use single quotes to delimit attributes. I fixed the function to also escape single quotes. Thanks for the report. But, I think you should look at Vibe.d or Hunt for a more complete framework.
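To illustrate the point about quotes, here is a minimal sketch. The helper name escapeAttr is hypothetical and this is not the actual ae code; it only assumes that the five characters &, <, >, " and ' get replaced with entities, so that neither kind of quote can close an attribute value.

```d
import std.array : appender;

/// Hypothetical helper for illustration only (not the ae implementation):
/// replaces the five characters that matter in text and quoted attributes.
string escapeAttr(string input)
{
    auto w = appender!string;
    foreach (char c; input)
    {
        switch (c)
        {
            case '&':  w ~= "&amp;";  break;
            case '<':  w ~= "&lt;";   break;
            case '>':  w ~= "&gt;";   break;
            case '"':  w ~= "&quot;"; break;
            case '\'': w ~= "&#x27;"; break;
            default:   w ~= c;        break;
        }
    }
    return w.data;
}

unittest
{
    // The payload from the report above: the single quotes are now entities,
    // so they stay inside the class attribute instead of starting onclick.
    string payload = "blue' onclick='alert()";
    assert(`<div class='` ~ escapeAttr(payload) ~ `'>`
        == `<div class='blue&#x27; onclick=&#x27;alert()'>`);
}
```

With a function that leaves ' alone, the same markup is only safe if the attribute is delimited with double quotes.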
Jul 06 2020
parent Fitz <fitz figmentengine.com> writes:
On Monday, 6 July 2020 at 12:39:42 UTC, Vladimir Panteleev wrote:
 On Monday, 6 July 2020 at 12:26:01 UTC, Fitz wrote:
 looks like this forum uses 
 https://github.com/CyberShadow/ae/blob/master/utils/text/html.d to do
escaping. This code only escape 4/6 characters, not these:
 ' --> &#x27;
 / --> &#x2F;
 which looks risky?, if its storeed in "$encode", given
 <div class='$encoded'>hello, world</div>
 then
 $encode="blue' onclick='alert()"
 results in:
 <div class='blue' onclick='alert()'>hello, world</div>
 could be nasty
If you don't escape single quotes, then don't use single quotes to delimit attributes. I fixed the function to also escape single quotes. Thanks for the report. But, I think you should look at Vibe.d or Hunt for a more complete framework.
thank you! I'll have a look at them to see what they provide
Jul 06 2020
prev sibling next sibling parent reply aberba <karabutaworld gmail.com> writes:
On Monday, 6 July 2020 at 11:56:17 UTC, Fitz wrote:
 Hello (I am a newbie to dlang)

 What's the recommended way to escape user input when outputting 
 html?

 intent: to stop XSS/etc, see 
 https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

 thanks in advance!

 Fitz
So in D you'll have to do multiple things.

The first is to use some kind of stripTags(), as available in PHP. I had it in mind some time ago to create a collection of such handy utilities... a very long, long time ago... two yrs 😜. See https://code.dlang.org/packages/sanival for stripTags(). It's a very limited implementation and uses std.regex, which many people here who are critical about performance will speak against. I'm yet to see an alternative, so you could use that if you don't find a better one.

That's just the first step. The second would be to use prepared statements in whatever database you use, if it's vulnerable to such attacks (SQL injection, for instance). Not all databases are.

The third would be to have a server-side validation function that checks for unexpected characters/tags and reports an error to the user. You should probably do the third one first 😀

You could go as deep as you want, but that is how I might do it.
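A rough sketch of what the third step could look like. The function name isAcceptableInput and the rule it applies (reject anything containing < or >) are assumptions made for illustration; it is not part of sanival, and a real application would pick its own rules.

```d
/// Hypothetical server-side check, for illustration only:
/// returns false as soon as an angle bracket is found.
bool isAcceptableInput(string input)
{
    foreach (char c; input)
    {
        if (c == '<' || c == '>')
            return false;
    }
    return true;
}

unittest
{
    assert(isAcceptableInput("just a comment"));
    // Anything that looks like a tag is rejected, so the caller can
    // report an error instead of storing the text.
    assert(!isAcceptableInput("<script>alert()</script>"));
}
```

A check this blunt also rejects harmless text such as "a < b", so it only suits fields where angle brackets are never expected.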
Jul 06 2020
parent reply Fitz <fitz figmentengine.com> writes:
On Monday, 6 July 2020 at 14:57:22 UTC, aberba wrote:
 utilities...a very long long time ago...two yrs 😜. See 
 https://code.dlang.org/packages/sanival for stripTags()
 Its a very limited implementation and uses std.regex which many 
 people here who are critical about performance will speak 
 against. I'm yet to see an alternative. So you could use that 
 if you don't find a better alternative.
Can't see stripTags? in https://code.dlang.org/packages/sanival
Jul 07 2020
parent reply aberba <karabutaworld gmail.com> writes:
On Tuesday, 7 July 2020 at 17:55:44 UTC, Fitz wrote:
 On Monday, 6 July 2020 at 14:57:22 UTC, aberba wrote:
 utilities...a very long long time ago...two yrs 😜. See 
 https://code.dlang.org/packages/sanival for stripTags()
 Its a very limited implementation and uses std.regex which 
 many people here who are critical about performance will speak 
 against. I'm yet to see an alternative. So you could use that 
 if you don't find a better alternative.
Can't see stripTags? in https://code.dlang.org/packages/sanival
string stripTags(string input, in string[] allowedTags = [])
{
    import std.regex: Captures, replaceAll, ctRegex;

    // Matches simple tags without attributes, e.g. <b> or </b>.
    auto regex = ctRegex!(`</?(\w*)>`);

    string regexHandler(Captures!(string) match)
    {
        // Turns an opening tag such as "<b>" into its closing form "</b>".
        string insertSlash(in string tag)
        in
        {
            assert(tag.length, "Argument must contain one or more characters");
        }
        do
        {
            return tag[0..1] ~ "/" ~ tag[1..$];
        }

        // Keep the matched tag only if it is an allowed tag or its closing form.
        bool allowed = false;
        foreach (tag; allowedTags)
        {
            if (tag == match.hit || insertSlash(tag) == match.hit)
            {
                allowed = true;
                break;
            }
        }
        return allowed ? match.hit : "";
    }

    return input.replaceAll!(regexHandler)(regex);
}

unittest
{
    assert(stripTags("<html><b>bold</b></html>") == "bold");
    assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == "<html>bold</html>");
}
Jul 07 2020
parent reply Kagamin <spam here.lot> writes:
On Tuesday, 7 July 2020 at 20:10:14 UTC, aberba wrote:
 unittest
 {
 	assert(stripTags("<html><b>bold</b></html>") == "bold");
 	assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == 
 "<html>bold</html>");
 }
Meh, Skype strips tags and it's infuriating: basically any text that contains < or > disappears.
Jul 07 2020
parent aberba <karabutaworld gmail.com> writes:
On Wednesday, 8 July 2020 at 05:29:16 UTC, Kagamin wrote:
 On Tuesday, 7 July 2020 at 20:10:14 UTC, aberba wrote:
 unittest
 {
 	assert(stripTags("<html><b>bold</b></html>") == "bold");
 	assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == 
 "<html>bold</html>");
 }
Meh, skype strips tags and it's infuriating, basically any text that contains < or > disappears.
It's not perfect, and there surely can be a better implementation that covers those edge cases. However, stripTags() has its place. It's a widely used function, available in PHP among others, for specific use cases. I can't stress "specific" enough. Sometimes removing tags (those not whitelisted) is the desired behaviour: you don't want to encode them, you simply want to remove them.

These days manual tag entry is being phased out in favour of rich text editors, and the rest are using Markdown. Nevertheless, stripTags() has its place.
Jul 08 2020
prev sibling parent reply aberba <karabutaworld gmail.com> writes:
On Monday, 6 July 2020 at 11:56:17 UTC, Fitz wrote:
 Hello (I am a newbie to dlang)

 What's the recommended way to escape user input when outputting 
 html?

 intent: to stop XSS/etc, see 
 https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

 thanks in advance!

 Fitz
stripTags() is for when you want to leave other safe tags in comments. If you want to completely remove all tags, https://code.dlang.org/packages/plain might be better.
Jul 06 2020
parent reply Fitz <fitz figmentengine.com> writes:
On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:

 If you want to completely removed all tags, 
 https://code.dlang.org/packages/plain might be better.
seems overkill, just implemented something simple:

// https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
import std.array : appender;

string encodeSafely(string input)
{
    auto w = appender!string;
    foreach (c; input)
    {
        switch (c)
        {
            case '&':  w ~= "&amp;";  break;
            case '<':  w ~= "&lt;";   break;
            case '>':  w ~= "&gt;";   break;
            case '"':  w ~= "&quot;"; break;
            case '\'': w ~= "&#x27;"; break;
            case '/':  w ~= "&#x2F;"; break;
            default:   w ~= c;        break;
        }
    }
    return w[];
}
Jul 07 2020
next sibling parent reply bauss <jj_1337 live.dk> writes:
On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
 On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:

 If you want to completely removed all tags, 
 https://code.dlang.org/packages/plain might be better.
 seems overkill, just implemented something simple: // https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
There is no reason to escape / and it might break some parsers for links etc. You should only escape <, >, &, " and '
Jul 07 2020
next sibling parent bauss <jj_1337 live.dk> writes:
On Tuesday, 7 July 2020 at 18:30:38 UTC, bauss wrote:
 On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
 On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:

 If you want to completely removed all tags, 
 https://code.dlang.org/packages/plain might be better.
 seems overkill, just implemented something simple: // https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
There is no reason to escape / and it might break some parsers for links etc. You should only escape <, >, &, " and '
Oh, and control characters (basically anything below space in ASCII, except tabs)
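For the control characters, something along these lines might do. This is only a sketch: the name dropControlChars is made up, and it takes the rule literally (drop every byte below space except tab), which also drops \n and \r, so relax the condition if line breaks should survive.

```d
import std.array : appender;

/// Hypothetical helper: removes ASCII control characters below space,
/// keeping only tab. Bytes of multi-byte UTF-8 sequences are all >= 0x80,
/// so non-ASCII text passes through untouched.
string dropControlChars(string input)
{
    auto w = appender!string;
    foreach (char c; input)
    {
        if (c >= ' ' || c == '\t')
            w ~= c;
    }
    return w.data;
}

unittest
{
    // NUL and ESC are removed, the tab is kept.
    assert(dropControlChars("a\x00z\ty\x1Bq") == "az\tyq");
}
```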
Jul 07 2020
prev sibling parent reply Fitz <fitz figmentengine.com> writes:
On Tuesday, 7 July 2020 at 18:30:38 UTC, bauss wrote:
 On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
 On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:
 There is no reason to escape / and it might break some parsers 
 for links etc. You should only escape <, >, &, " and '
'/' is on the OWASP list: you can use it to break out of an HTML tag. tbh I can't think of how it could be used?
Jul 08 2020
parent Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 8 July 2020 at 17:27:25 UTC, Fitz wrote:
 '/' is in on the OSWASP list. you can use it to break out of a 
 html tag.
 tbh I can't think about how it can be used?
A javascript string including </script> will end the script interpreter and then spit out html. So a lot of things will do \/ instead to prevent this. If you do context-aware encoding though a lot of this goes away.
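A minimal sketch of that idea for the JavaScript string context. The function name escapeForJsString is hypothetical (it is not from arsd or any other library), and the set of characters it handles is an assumption that is deliberately small; a production encoder would cover more, such as other control characters.

```d
import std.array : appender;

/// Hypothetical encoder for text placed inside a quoted JavaScript string
/// literal in a <script> block. Backtick literals below are WYSIWYG, so
/// `\/` is a backslash followed by a slash.
string escapeForJsString(string input)
{
    auto w = appender!string;
    foreach (char c; input)
    {
        switch (c)
        {
            case '\\': w ~= `\\`; break;
            case '"':  w ~= `\"`; break;
            case '\'': w ~= `\'`; break;
            case '/':  w ~= `\/`; break; // keeps "</script>" from ending the block
            case '\n': w ~= `\n`; break;
            case '\r': w ~= `\r`; break;
            default:   w ~= c;    break;
        }
    }
    return w.data;
}

unittest
{
    assert(escapeForJsString("</script>") == `<\/script>`);
}
```

The HTML entity encoding discussed earlier would be the wrong tool here, which is the point of context-aware encoding: each output context gets its own escaping rules.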
Jul 08 2020
prev sibling parent reply aberba <karabutaworld gmail.com> writes:
On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
 On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:

 If you want to completely removed all tags, 
 https://code.dlang.org/packages/plain might be better.
seems overkill, just implemented something simple: // https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
Again I'm not sure I really understood what you want. If you're trying to escape them with html entities, then my suggestions don't apply. I believe Adam (arsd) has some function in his library for doing html entities of tags.
Jul 07 2020
parent reply aberba <karabutaworld gmail.com> writes:
On Tuesday, 7 July 2020 at 20:21:19 UTC, aberba wrote:
 On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
 On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:

 If you want to completely removed all tags, 
 https://code.dlang.org/packages/plain might be better.
seems overkill, just implemented something simple: // https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
I believe Adam (arsd) has some function in his library for doing html entities of tags.
See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html
Jul 07 2020
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 7 July 2020 at 23:19:46 UTC, aberba wrote:
I believe Adam (arsd) has some function in his
 library for doing html entities of tags.
See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html
Yeah, that function will encode basically everything so you can concat it into HTML. My libs also have sanitization functions that go even further: you can do an HTML tag and attribute whitelist via the dom (html.d in my repo) and construct things with those functions too (using just dom.d for this).

But I haven't documented all that stuff, so you're kinda on your own in figuring it all out... that's why I don't advertise as much as the others. It is easy to use once you get to know it, but instead of writing beginner-friendly documentation I often just answer individuals' emails. Maybe I will blog about it later though.
Jul 07 2020
parent Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 8 July 2020 at 02:17:31 UTC, Adam D. Ruppe wrote:
 On Tuesday, 7 July 2020 at 23:19:46 UTC, aberba wrote:
I believe Adam (arsd) has some function in his
 library for doing html entities of tags.
See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html
Oh, another note: that specific function does not encode ' either. So if you are using it in an attribute, make sure you double quote it correctly. If you build a tree using dom.d's Element class, it will do that consistently for you.
Jul 07 2020