digitalmars.D - Html escaping for security: howto in D?
- Fitz (7/7) Jul 06 2020 Hello (I am a newbie to dlang)
- Fitz (13/20) Jul 06 2020 looks like this forum uses
- Vladimir Panteleev (6/18) Jul 06 2020 If you don't escape single quotes, then don't use single quotes
- Fitz (2/19) Jul 06 2020 thank you! I'll have a look at them to see what they provide
- aberba (19/26) Jul 06 2020 So in D you'll have to do multiple things. The first one is using
- Fitz (2/8) Jul 07 2020 Can't see stripTags() in https://code.dlang.org/packages/sanival
- aberba (5/12) Jul 06 2020 stripTags() is for when you want to leave other safe tags in
- Fitz (33/35) Jul 07 2020 seems overkill, just implemented something simple:
- bauss (3/38) Jul 07 2020 There is no reason to escape / and it might break some parsers
- bauss (3/45) Jul 07 2020 Oh and control characters (basically anything not tabs below
- Fitz (4/8) Jul 08 2020 '/' is on the OWASP list. You can use it to break out of a
- Adam D. Ruppe (5/8) Jul 08 2020 A javascript string including </script> will end the script
- aberba (5/11) Jul 07 2020 Again I'm not sure I really understood what you want. If you're
- aberba (3/14) Jul 07 2020 See
- Adam D. Ruppe (12/16) Jul 07 2020 Yeah, that function will encode basically everything so you can
- Adam D. Ruppe (6/12) Jul 07 2020 oh another note: that specific function does not encode ' either.
Hello (I am a newbie to dlang).
What's the recommended way to escape user input when outputting HTML? Intent: to stop XSS etc., see https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
Thanks in advance!
Fitz
Jul 06 2020
On Monday, 6 July 2020 at 11:56:17 UTC, Fitz wrote:
> What's the recommended way to escape user input when outputting HTML?
Looks like this forum uses https://github.com/CyberShadow/ae/blob/master/utils/text/html.d to do escaping. This code only escapes 4 of the 6 characters on the OWASP list; it does not escape these:
' --> &#x27;
/ --> &#x2F;
That looks risky. If user input is stored in "$encoded", then given
<div class='$encoded'>hello, world</div>
and $encoded="blue' onclick='alert()", the result is:
<div class='blue' onclick='alert()'>hello, world</div>
which could be nasty.
Jul 06 2020
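To make the attack above concrete, here is a minimal D sketch. It assumes a stand-in escaper that handles only &, <, > and " (it is not the ae library's actual code), and the names `escapeFour` and `renderDiv` are hypothetical.

```d
import std.array : replace;

// Hypothetical stand-in escaper that, like the one described above, handles
// only four characters: &, <, > and ". Ampersand goes first so the entities
// produced by the later replacements are not escaped a second time.
string escapeFour(string s)
{
    return s.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace("\"", "&quot;");
}

// Builds the single-quoted attribute from the example in the post.
string renderDiv(string cls)
{
    return "<div class='" ~ escapeFour(cls) ~ "'>hello, world</div>";
}
```

With the payload from the post, `renderDiv("blue' onclick='alert()")` returns `<div class='blue' onclick='alert()'>hello, world</div>`: the unescaped single quote closes the class attribute early and the rest becomes a new onclick attribute.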
On Monday, 6 July 2020 at 12:26:01 UTC, Fitz wrote:
> Looks like this forum uses https://github.com/CyberShadow/ae/blob/master/utils/text/html.d to do escaping. This code only escapes 4 of the 6 characters on the OWASP list [...]
If you don't escape single quotes, then don't use single quotes to delimit attributes.
I fixed the function to also escape single quotes. Thanks for the report.
But I think you should look at Vibe.d or Hunt for a more complete framework.
Jul 06 2020
On Monday, 6 July 2020 at 12:39:42 UTC, Vladimir Panteleev wrote:
> But I think you should look at Vibe.d or Hunt for a more complete framework.
Thank you! I'll have a look at them to see what they provide.
Jul 06 2020
On Monday, 6 July 2020 at 11:56:17 UTC, Fitz wrote:
> What's the recommended way to escape user input when outputting HTML?
So in D you'll have to do multiple things. The first is to use some kind of stripTags(), as available in PHP. I had it in me some time ago to create a collection of such handy utilities... a very, very long time ago... two years 😜. See https://code.dlang.org/packages/sanival for stripTags(). It's a very limited implementation and uses std.regex, which many people here who care about performance will argue against, but I have yet to see an alternative. So you could use that if you don't find a better one.
That's just the first step. The second is to use prepared statements in whatever database you use, if it's vulnerable to injection attacks (SQL injection, for instance). Not all databases are.
The third is to have a server-side validation function that checks for unexpected characters/tags and reports an error to the user. You should probably do the third one first 😀
You could go as deep as you want, but that's how I might do it.
Jul 06 2020
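The third step (server-side validation) can be sketched as follows. This is a minimal illustration under assumptions: `validateComment` is a hypothetical name, and both the length limit and the "no < or >" policy are example choices for an imagined comment field, not a general rule. Validation like this complements output escaping rather than replacing it.

```d
// Hypothetical server-side check for an assumed comment field. It returns an
// error message for unacceptable input, or null when the input passes.
// Which characters count as "unexpected" is an application decision;
// rejecting '<' and '>' is only an example policy.
string validateComment(string input, size_t maxLength = 2000)
{
    if (input.length == 0)
        return "Comment must not be empty.";
    // length is measured in UTF-8 code units, not user-visible characters
    if (input.length > maxLength)
        return "Comment is too long.";
    foreach (char c; input)
    {
        if (c == '<' || c == '>')
            return "Comment must not contain '<' or '>'.";
    }
    return null; // no problems found
}
```

A handler would show the returned message to the user and refuse to store the input whenever the result is not null.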
On Monday, 6 July 2020 at 14:57:22 UTC, aberba wrote:
> See https://code.dlang.org/packages/sanival for stripTags()
I can't find stripTags() in https://code.dlang.org/packages/sanival.
Jul 07 2020
On Tuesday, 7 July 2020 at 17:55:44 UTC, Fitz wrote:
> Can't see stripTags() in https://code.dlang.org/packages/sanival

    import std.regex : Captures, ctRegex, replaceAll;

    /// Removes HTML tags from `input`, keeping only those listed in
    /// `allowedTags` (given in opening form, e.g. "<b>"; the matching
    /// closing tag is then allowed too).
    string stripTags(string input, in string[] allowedTags = [])
    {
        // Matches bare opening or closing tags such as <b> or </b>.
        auto regex = ctRegex!(`</?(\w*)>`);

        string regexHandler(Captures!(string) match)
        {
            // Turns "<b>" into "</b>".
            string insertSlash(in string tag)
            in { assert(tag.length, "Argument must contain one or more characters"); }
            do { return tag[0 .. 1] ~ "/" ~ tag[1 .. $]; }

            foreach (tag; allowedTags)
            {
                if (tag == match.hit || insertSlash(tag) == match.hit)
                    return match.hit; // whitelisted: keep the tag
            }
            return ""; // not whitelisted: drop the tag
        }

        return input.replaceAll!(regexHandler)(regex);
    }

    unittest
    {
        assert(stripTags("<html><b>bold</b></html>") == "bold");
        assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == "<html>bold</html>");
    }
Jul 07 2020
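One limitation of the pattern in stripTags() is worth spelling out. A sketch, assuming the same pattern as above: `</?(\w*)>` only matches bare tags, so any tag carrying attributes, such as `<img src=x onerror=alert(1)>`, is not matched and would pass through stripTags() untouched. `looksLikeTag` is a hypothetical helper for checking this.

```d
import std.regex : ctRegex, matchFirst;

// The tag pattern used by stripTags() above.
enum tagPattern = `</?(\w*)>`;

// Returns true when the pattern finds something it treats as a tag in `s`.
bool looksLikeTag(string s)
{
    return !matchFirst(s, ctRegex!tagPattern).empty;
}
```

`looksLikeTag("<b>")` and `looksLikeTag("</b>")` are true, but `looksLikeTag("<img src=x onerror=alert(1)>")` is false, because the space after `img` stops the match. So stripTags() alone should not be relied on to neutralise hostile markup.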
On Tuesday, 7 July 2020 at 20:10:14 UTC, aberba wrote:
> assert(stripTags("<html><b>bold</b></html>") == "bold");
Meh, Skype strips tags and it's infuriating: basically any text that contains < or > disappears.
Jul 07 2020
On Wednesday, 8 July 2020 at 05:29:16 UTC, Kagamin wrote:
> Meh, Skype strips tags and it's infuriating: basically any text that contains < or > disappears.
It's not perfect, and there can surely be a better implementation that covers those edge cases. However, stripTags() has its place. It's a widely used function, available in PHP among others, for specific use cases, and I can't stress "specific" enough. Sometimes removing tags (those not whitelisted) is the desired behaviour: you don't want to encode them, you simply want to remove them. These days manual tag entry is being phased out in favour of rich text editors, and the rest use Markdown. Nevertheless, stripTags() has its place.
Jul 08 2020
On Monday, 6 July 2020 at 11:56:17 UTC, Fitz wrote:
> What's the recommended way to escape user input when outputting HTML?
stripTags() is for when you want to leave other safe tags in comments. If you want to remove all tags completely, https://code.dlang.org/packages/plain might be better.
Jul 06 2020
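For the "remove all tags" case, a minimal hand-rolled sketch could look like the following. `removeAllTags` is a hypothetical helper, not the API of the plain package. It simply drops everything from a '<' up to the next '>' and does not understand comments, a '>' inside quoted attribute values, or the contents of <script> elements.

```d
import std.array : appender;

// Hypothetical helper (not the `plain` package's API): drops every character
// from a '<' up to and including the next '>'. It is a text-cleanup step,
// not an XSS defence on its own.
string removeAllTags(string input)
{
    auto result = appender!string;
    bool insideTag = false;
    foreach (char c; input)
    {
        if (insideTag)
        {
            if (c == '>')
                insideTag = false; // tag finished, resume copying
        }
        else if (c == '<')
            insideTag = true; // start skipping
        else
            result.put(c);
    }
    return result.data;
}
```

`removeAllTags("<p>Hello <b>world</b></p>")` gives `"Hello world"`, but it also shows the behaviour complained about above: `removeAllTags("1 < 2 and 3 > 2")` gives `"1  2"`.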
On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:
> If you want to remove all tags completely, https://code.dlang.org/packages/plain might be better.
Seems like overkill; I just implemented something simple:

    import std.array : appender;

    // https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
    // Replaces the six characters listed by OWASP with HTML entities.
    // Iterates over UTF-8 code units; multi-byte characters pass through unchanged.
    string encodeSafely(string input)
    {
        auto w = appender!string;
        foreach (c; input)
        {
            switch (c)
            {
                case '&':  w ~= "&amp;";  break;
                case '<':  w ~= "&lt;";   break;
                case '>':  w ~= "&gt;";   break;
                case '"':  w ~= "&quot;"; break;
                case '\'': w ~= "&#x27;"; break;
                case '/':  w ~= "&#x2F;"; break;
                default:   w ~= c;        break; // copy everything else as-is
            }
        }
        return w.data;
    }
Jul 07 2020
On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
> Seems like overkill; I just implemented something simple: [...] string encodeSafely(string input) [...]
There is no reason to escape / and it might break some parsers for links etc. You should only escape <, >, &, " and '.
Jul 07 2020
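A sketch of the five-character version suggested above, assuming the same switch-based approach as encodeSafely(). The name `escapeHtml` is hypothetical. It uses `&#x27;` for the single quote, the form the OWASP cheat sheet lists.

```d
import std.array : appender;

// Hypothetical helper: escapes only & < > " and ', leaving '/' alone.
// Ampersand is handled like any other case because each input character
// is examined exactly once, so nothing is escaped twice.
string escapeHtml(string input)
{
    auto w = appender!string;
    foreach (char c; input)
    {
        switch (c)
        {
            case '&':  w.put("&amp;");  break;
            case '<':  w.put("&lt;");   break;
            case '>':  w.put("&gt;");   break;
            case '"':  w.put("&quot;"); break;
            case '\'': w.put("&#x27;"); break;
            default:   w.put(c);        break; // everything else, including '/'
        }
    }
    return w.data;
}
```

Applied to the payload from earlier in the thread, `escapeHtml("blue' onclick='alert()")` yields `blue&#x27; onclick=&#x27;alert()`. The value can then no longer close a single-quoted attribute, while URLs such as `a/b` come through unchanged.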
On Tuesday, 7 July 2020 at 18:30:38 UTC, bauss wrote:
> You should only escape <, >, &, " and '.
Oh, and control characters (basically anything below space in ASCII other than tabs).
Jul 07 2020
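A sketch of the control-character advice, under two assumptions that go beyond the post. First, it drops the characters instead of encoding them, because a numeric character reference to most C0 control characters is itself a parse error in HTML. Second, it keeps '\n' and '\r' as well as tab so that multi-line input survives. The name `stripControlChars` is hypothetical.

```d
import std.array : appender;

// Hypothetical filter: drops ASCII control characters below 0x20.
// Tab is kept as suggested in the thread; keeping '\n' and '\r' is an
// assumption so that multi-line text is preserved. UTF-8 continuation
// bytes are >= 0x80, so non-ASCII text is not affected.
string stripControlChars(string input)
{
    auto w = appender!string;
    foreach (char c; input)
    {
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
            continue; // skip the control character
        w.put(c);
    }
    return w.data;
}
```

In a pipeline this would run before the entity escaping, for example `escapeHtml(stripControlChars(input))` with the five-character escaper described above.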
On Tuesday, 7 July 2020 at 18:30:38 UTC, bauss wrote:
> There is no reason to escape / and it might break some parsers for links etc. You should only escape <, >, &, " and '.
'/' is on the OWASP list: you can use it to break out of an HTML tag. tbh I can't think of how it could be used, though.
Jul 08 2020
On Wednesday, 8 July 2020 at 17:27:25 UTC, Fitz wrote:
> '/' is on the OWASP list: you can use it to break out of an HTML tag. tbh I can't think of how it could be used, though.
A JavaScript string containing </script> will end the script block, and whatever follows is then parsed as HTML. So a lot of things will emit \/ instead to prevent this. If you do context-aware encoding, though, a lot of this goes away.
Jul 08 2020
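As an illustration of context-aware encoding for the JavaScript-string case described above, here is a minimal sketch. `escapeJsString` is a hypothetical name, and it only targets one context: the inside of a double-quoted JavaScript string literal in an inline <script> element. HTML entities would be wrong there, and values placed in other contexts need their own encoders.

```d
import std.array : appender;

// Hypothetical helper: makes `input` safe to place between double quotes in
// a JavaScript string literal inside an inline <script> element.
// Turning '/' into "\/" means "</script>" can never appear in the output,
// so the HTML parser will not end the script early; JavaScript still reads
// "\/" as a plain '/'.
string escapeJsString(string input)
{
    auto w = appender!string;
    foreach (char c; input)
    {
        switch (c)
        {
            case '\\': w.put(`\\`); break; // backslash first-class: escape it
            case '"':  w.put(`\"`); break;
            case '\'': w.put(`\'`); break;
            case '/':  w.put(`\/`); break;
            case '\n': w.put(`\n`); break;
            case '\r': w.put(`\r`); break;
            default:   w.put(c);    break;
        }
    }
    return w.data;
}
```

For example, `escapeJsString("</script><b>")` returns `<\/script><b>`, which the HTML parser does not treat as an end tag.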
On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
> Seems like overkill; I just implemented something simple: [...]
Again, I'm not sure I really understood what you want. If you're trying to escape them with HTML entities, then my suggestions don't apply. I believe Adam (arsd) has a function in his library for encoding tags as HTML entities.
Jul 07 2020
On Tuesday, 7 July 2020 at 20:21:19 UTC, aberba wrote:
> I believe Adam (arsd) has a function in his library for encoding tags as HTML entities.
See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html
Jul 07 2020
On Tuesday, 7 July 2020 at 23:19:46 UTC, aberba wrote:
> See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html
Yeah, that function will encode basically everything, so you can concatenate its output into HTML.
My libs also have sanitization functions that go even further: you can apply an HTML tag and attribute whitelist via the DOM (html.d in my repo) and construct things with those functions too (using just dom.d for this).
But I haven't documented all that stuff, so you're kinda on your own in figuring it all out... that's why I don't advertise it as much as the others do. It is easy to use once you get to know it, but instead of writing beginner-friendly documentation I often just answer individuals' emails. Maybe I will blog about it later, though.
Jul 07 2020
On Wednesday, 8 July 2020 at 02:17:31 UTC, Adam D. Ruppe wrote:
> Yeah, that function will encode basically everything, so you can concatenate its output into HTML.
Oh, another note: that specific function does not encode ' either. So if you're using it in an attribute, make sure you delimit the attribute value with double quotes. If you build a tree using dom.d's Element class, it will do that consistently for you.
Jul 07 2020
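A minimal sketch of the "always double-quote the attribute" rule, written without the arsd library. Both `escapeAttributeValue` and `attribute` are hypothetical helpers. Like the function discussed above, the escaper leaves ' alone, which is safe here only because the value is always wrapped in double quotes. The attribute name is assumed to be a trusted, hard-coded string.

```d
import std.array : appender;

// Escapes the characters that matter inside a double-quoted attribute value.
// It deliberately leaves ' alone; that is only safe because attribute()
// below always wraps the value in double quotes.
string escapeAttributeValue(string value)
{
    auto w = appender!string;
    foreach (char c; value)
    {
        switch (c)
        {
            case '&': w.put("&amp;");  break;
            case '<': w.put("&lt;");   break;
            case '>': w.put("&gt;");   break;
            case '"': w.put("&quot;"); break;
            default:  w.put(c);        break;
        }
    }
    return w.data;
}

// Hypothetical helper: `name` must be a trusted, hard-coded attribute name;
// only `value` is treated as untrusted input.
string attribute(string name, string value)
{
    return name ~ `="` ~ escapeAttributeValue(value) ~ `"`;
}
```

With the payload from the start of the thread, `attribute("class", "blue' onclick='alert()")` produces `class="blue' onclick='alert()"`. The single quotes stay inside the double-quoted value, so no onclick attribute is created.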