digitalmars.D - Crash my webserver!
- Andrea Fontana (10/10) May 13 2023 Hi everyone, as I had already announced in the discord channel, I
- Vladimir Panteleev (12/13) May 13 2023 Not bad. What I found in 10 minutes:
- Andrea Fontana (13/27) May 13 2023 I've seen your tests! Thank you Vladimir!
- Andrea Fontana (10/14) May 13 2023 From RFC:
- Vladimir Panteleev (7/14) May 13 2023 I get a 400 with 1.0 too.
- Andrea Fontana (8/24) May 14 2023 Hmm I don't think you can use utf-8 encoding in your request. I
- Vladimir Panteleev (9/17) May 14 2023 Well, bytes are bytes until you decide to look at them in a
- Vladimir Panteleev (6/8) May 14 2023 Oh also, I noticed that bad UTF-8 in URLs is rejected. Unless
- Andrea Fontana (13/21) May 14 2023 I'm doing some validations on data because that data is parsed
- Vladimir Panteleev (16/18) May 14 2023 This doesn't throw for me:
- Andrea Fontana (3/21) May 14 2023 You mean %ff not \xff!
- Johan (5/14) May 13 2023 Have you already fuzzed your server code?
- psyscout (12/14) May 14 2023 Hi Andrea,
- Andrea Fontana (7/22) May 14 2023 No: workers are not separated threads, but isolated processes.
Hi everyone,

as I had already announced in the Discord channel, I was wondering if any of you would like to run some tests on my HTTP server (serverino). I don't mean a stress test/DDoS, of course. I'm interested in request parsing errors or any bug that can crash the server (5xx error).

Source: https://github.com/trikko/serverino/
Online using nginx as proxy: http://test.andreafontana.it (also https)
Online into the wild listening on port 57123.

Andrea
May 13 2023
On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
> Online into the wild listening on port 57123.

Not bad. What I found in 10 minutes:

- LF line endings are not accepted
- Host header is mandatory, but not for nginx
- Raw UTF-8 gets mangled in URL and POST parameters, you might be decoding those twice
- `multipart/form-data` encoding errors are silently discarded
- The server seems to handle `application/x-www-form-urlencoded` very differently from `multipart/form-data`? Even though they're both alternative options for HTML `<form>` parameters, and one is somewhat of a superset of the other

Hope this helps.
May 13 2023
On Saturday, 13 May 2023 at 11:21:53 UTC, Vladimir Panteleev wrote:
> On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
>> Online into the wild listening on port 57123.
>
> Not bad. What I found in 10 minutes:

I've seen your tests! Thank you, Vladimir!

> - LF line endings are not accepted

Do you mean as a line separator in headers? I know some (old?) clients use it, but I think the HTTP protocol requires CRLF.

> - Host header is mandatory, but not for nginx

Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?

> - Raw UTF-8 gets mangled in URL and POST parameters, you might be decoding those twice

Interesting, could you please give me an example?

> - `multipart/form-data` encoding errors are silently discarded

They are (and a warning is written to the server error log). You're probably right, and I should send back a 400 Bad Request. Or something else?

> - The server seems to handle `application/x-www-form-urlencoded` very differently from `multipart/form-data`? Even though they're both alternative options for HTML `<form>` parameters, and one is somewhat of a superset of the other

Yes, somewhat. But I can't really build a superset; that's why they are handled in two different ways.

> Hope this helps.

Sure! Thanks!
May 13 2023
On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
> On Saturday, 13 May 2023 at 11:21:53 UTC, Vladimir Panteleev wrote:
>> - LF line endings are not accepted
>
> Do you mean as line separator in headers? I know some (old?) clients use it but I think HTTP protocol requires CRLF

From the RFC:

«Although the line terminator for the start-line and fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR»

Of course, MAY means it is optional (RFC 2119). I don't think I'm going to implement a special case for this; in 2023 it's rarely used, and only by old clients :)

Good point, anyway.

Andrea
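For illustration only: if a server did choose to honour the optional rule quoted above, header lines could be split on LF, dropping a CR that immediately precedes it. This D sketch assumes the header block is already buffered as raw bytes; `splitHeaderLines` is a hypothetical name, not serverino's API.

```d
/// Hypothetical helper (not part of serverino): splits a raw header block
/// into lines, accepting CRLF and also a bare LF, in which case a preceding
/// CR, if any, is dropped.
const(ubyte)[][] splitHeaderLines(const(ubyte)[] block)
{
    const(ubyte)[][] lines;
    size_t start = 0;
    foreach (i, b; block)
    {
        if (b != '\n')
            continue;                     // not a line terminator yet
        auto line = block[start .. i];    // bytes before the LF
        if (line.length > 0 && line[$ - 1] == '\r')
            line = line[0 .. $ - 1];      // CRLF: ignore the CR
        lines ~= line;
        start = i + 1;
    }
    // Bytes after the last LF form an unterminated line and are not returned.
    return lines;
}

unittest
{
    auto ls = splitHeaderLines(cast(const(ubyte)[]) "Host: a\r\nX: 1\n\r\n");
    assert(ls.length == 3);
    assert(ls[0] == cast(const(ubyte)[]) "Host: a");  // CRLF line
    assert(ls[1] == cast(const(ubyte)[]) "X: 1");     // bare LF line
    assert(ls[2].length == 0);                        // empty line ending the headers
}
```

The `unittest` block can be run with `dmd -unittest -main -run file.d`.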
May 13 2023
On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
> Do you mean as line separator in headers? I know some (old?) clients use it but I think HTTP protocol requires CRLF

Ah, OK. I thought the specification allowed either.

>> - Host header is mandatory, but not for nginx
>
> Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?

I get a 400 with 1.0 too.

>> - Raw UTF-8 gets mangled in URL and POST parameters, you might be decoding those twice
>
> Interesting, could you please give me an example?

`printf 'GET /?ппп=ĂÎȘȚ HTTP/1.0\r\nHost: test.andreafontana.it\r\n\r\n' | nc -v test.andreafontana.it 57123`

It returns mojibake. However, only for URL and form parameters. Normally these get percent-encoded by user-agents though.
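To see what a user agent would normally send instead of the raw bytes in the `printf` example, Phobos' `std.uri.encodeComponent` can percent-encode a value. This minimal sketch assumes that module emits one uppercase `%XX` escape per UTF-8 byte; the value `ппп` (three times U+043F) is just an example.

```d
import std.uri : decodeComponent, encodeComponent;

unittest
{
    // A user agent percent-encodes the UTF-8 bytes of non-ASCII characters,
    // so "ппп" (U+043F, UTF-8 D0 BF, three times) travels as plain ASCII.
    immutable wire = encodeComponent("ппп");
    assert(wire == "%D0%BF%D0%BF%D0%BF");
    // Decoding the escapes gives the original UTF-8 string back.
    assert(decodeComponent(wire) == "ппп");
}
```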
May 13 2023
On Saturday, 13 May 2023 at 22:25:28 UTC, Vladimir Panteleev wrote:
> On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
>> Do you mean as line separator in headers? I know some (old?) clients use it but I think HTTP protocol requires CRLF
>
> Ah, OK. I thought the specification allowed either.
>
>>> - Host header is mandatory, but not for nginx
>>
>> Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?
>
> I get a 400 with 1.0 too.
>
>>> - Raw UTF-8 gets mangled in URL and POST parameters, you might be decoding those twice
>>
>> Interesting, could you please give me an example?
>
> `printf 'GET /?ппп=ĂÎȘȚ HTTP/1.0\r\nHost: test.andreafontana.it\r\n\r\n' | nc -v test.andreafontana.it 57123`
>
> It returns mojibake. However, only for URL and form parameters. Normally these get percent-encoded by user-agents though.

Hmm, I don't think you can use UTF-8 encoding in your request. I think everything must be encoded as plain old US-ASCII. Otherwise, how can I know in advance what encoding you're using? You could be using UTF-8 or Big5 and I couldn't tell, or am I missing something?

Andrea
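A sketch of the stricter position described above, assuming the request target is available as raw bytes: accept only visible US-ASCII and answer anything else with 400 Bad Request. `isAsciiTarget` is hypothetical, and the byte range 0x21 to 0x7E is a simplification, not the full RFC 3986 grammar.

```d
import std.algorithm.searching : all;

/// Hypothetical check (not serverino's code): accept a request target only if
/// every byte is a visible US-ASCII character, 0x21 ('!') to 0x7E ('~').
/// This is coarser than the URI grammar but rejects raw UTF-8 outright.
bool isAsciiTarget(const(ubyte)[] target)
{
    return target.length > 0 && target.all!(b => b >= 0x21 && b <= 0x7E);
}

unittest
{
    assert(isAsciiTarget(cast(const(ubyte)[]) "/?%D0%BF=1"));  // percent-encoded: accepted
    assert(!isAsciiTarget(cast(const(ubyte)[]) "/?п=1"));      // raw UTF-8: rejected
}
```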
May 14 2023
On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
>> It returns mojibake. However, only for URL and form parameters. Normally these get percent-encoded by user-agents though.
>
> Hmm I don't think you can use utf-8 encoding in your request. I think everything must be encoded as old US-ASCII. How can I understand in advance what encoding you're using, otherwise? You could use utf-8 or big5 but I couldn't tell, or am I missing something?

Well, bytes are bytes until you decide to look at them in a certain way. Yes, the input may be invalid as per the spec; however, if the mojibake indicates that you're decoding them twice, you're probably doing something that's at least unnecessarily inefficient. Maybe you're passing the bytes as char arrays to std.algorithm, which produces dchars, which are then cast to char before being decoded again? I think that would produce this sort of mojibake.
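Two D facts behind this guess fit in a few lines. Range code over a `string` auto-decodes to `dchar`, and treating each UTF-8 code unit as a code point of its own and re-encoding it (shown with a hypothetical helper) is one well-known way to produce mojibake. Whether serverino does exactly this is not established in the thread.

```d
import std.range.primitives : ElementType;

// Range primitives auto-decode strings: iterating a string with
// std.algorithm yields dchar code points, not char code units.
static assert(is(ElementType!string == dchar));

/// Hypothetical helper reproducing a classic mojibake bug: each UTF-8 code
/// unit is treated as a code point of its own (as if the bytes were Latin-1)
/// and encoded to UTF-8 again.
string reencodeUnitsAsCodePoints(string s)
{
    string result;
    foreach (char unit; s)          // explicit char: code units, no decoding
        result ~= cast(dchar) unit; // appending a dchar writes its UTF-8 form
    return result;
}

unittest
{
    assert(reencodeUnitsAsCodePoints("abc") == "abc");  // ASCII survives
    // U+0102 is C4 82 in UTF-8; re-encoded it becomes U+00C4 U+0082 (4 bytes).
    assert(reencodeUnitsAsCodePoints("\u0102") == "\u00C4\u0082");
}
```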
May 14 2023
On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
> Hmm I don't think you can use utf-8 encoding in your request. I think everything must be encoded as old US-ASCII.

Oh, also: I noticed that bad UTF-8 in URLs is rejected. Unless you're decoding UTF-8 in order to validate it, so that further logic doesn't have to deal with bad UTF-8, that also indicates a potential inefficiency. Web servers don't need to do any UTF-8 decoding, but it's very easy to do it accidentally in D.
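One way to avoid accidental decoding is to work on the bytes directly, for example via `std.string.representation`. This sketch splits a request target at the first `?` without any UTF-8 decoding; `splitTarget` and the `Target` tuple are hypothetical names.

```d
import std.algorithm.searching : countUntil;
import std.string : representation;
import std.typecons : Tuple;

alias Target = Tuple!(const(ubyte)[], "path", const(ubyte)[], "query");

/// Hypothetical helper: splits a request target at the first '?' by looking
/// only at raw bytes. Nothing is UTF-8 decoded, so bad UTF-8 cannot throw here;
/// validation, if wanted, can happen later and only where text is needed.
Target splitTarget(const(ubyte)[] target)
{
    immutable i = target.countUntil(cast(ubyte) '?');
    if (i < 0)
        return Target(target, target[$ .. $]);   // no query part
    return Target(target[0 .. i], target[i + 1 .. $]);
}

unittest
{
    auto t = splitTarget("/p\xFF?a=1".representation);  // 0xFF: not valid UTF-8
    assert(t.path == "/p\xFF".representation);
    assert(t.query == "a=1".representation);
}
```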
May 14 2023
On Sunday, 14 May 2023 at 11:32:46 UTC, Vladimir Panteleev wrote:
> On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
>> Hmm I don't think you can use utf-8 encoding in your request. I think everything must be encoded as old US-ASCII.
>
> Oh also, I noticed that bad UTF-8 in URLs is rejected. Unless you're decoding UTF for the purpose of validating that further logic doesn't have to deal with bad UTF-8, that also indicates a potential inefficiency. Web servers don't need to do any UTF-8 decoding, but it's very easy to do it accidentally in D.

I'm doing some validation on the data because it is parsed and stored for serverino's users :) The UTF problem is actually a caught UTFException thrown by the URL encode/decode functions of the standard library. I'm trying to keep things a bit safer for the user, let's say. I don't think any browser will send an invalid UTF-8 sequence in a URL: it looks like you're attempting some kind of attack, so I give you back a 400 Bad Request error. It's not the only check I'm doing, anyway.

I'm still trying to understand what's wrong with the mojibake; I'm not yet sure it is a bug :)

Andrea
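A sketch of the catch-and-reject approach described above, assuming query values are percent-decoded with `std.uri.decodeComponent`. The post mentions `UTFException`; the sketch also catches `std.uri.URIException`, which that module uses for malformed escapes. `tryDecodeComponent` is a hypothetical helper.

```d
import std.uri : decodeComponent, URIException;
import std.utf : UTFException;

/// Hypothetical helper: percent-decodes a query value and reports failure
/// instead of letting the exception escape, so the caller can answer
/// 400 Bad Request.
bool tryDecodeComponent(string raw, out string decoded)
{
    try
    {
        decoded = decodeComponent(raw);
        return true;
    }
    catch (URIException) {}   // e.g. a bad escape such as "%G1", or "%FF"
    catch (UTFException) {}   // the exception type reported in the thread
    return false;
}

unittest
{
    string d;
    assert(tryDecodeComponent("caf%C3%A9", d) && d == "caf\u00E9");
    assert(!tryDecodeComponent("%FF", d));   // lone 0xFF is never valid UTF-8
}
```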
May 14 2023
On Sunday, 14 May 2023 at 13:53:49 UTC, Andrea Fontana wrote:
> The UTF problem is actually a catched UTFException thrown by urlencode/decode of std library.

This doesn't throw for me:

```d
void main()
{
    import std.uri;
    decode("\xFF");
    encode("\xFF");
}
```

But... looking at the implementation, it does have a baked-in UTF-8 decoder, which is a little ridiculous. `decode` actually decodes percent-encoded UTF-8, and then encodes it back, but makes no attempt to validate the non-encoded parts of the string. The module is pretty old, though, so maybe it predates the facilities in `std.utf`.
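An alternative that sidesteps `std.uri`'s built-in UTF-8 decoder is to percent-decode to raw bytes and call `std.utf.validate` only where the application actually needs text. The helpers below are hypothetical and only handle `%XX` escapes (no `+` for spaces).

```d
import std.ascii : isHexDigit;
import std.utf : UTFException, validate;

/// Value of one hexadecimal digit (the caller has already checked isHexDigit).
private ubyte hexValue(char c)
{
    if (c >= '0' && c <= '9') return cast(ubyte)(c - '0');
    if (c >= 'a' && c <= 'f') return cast(ubyte)(c - 'a' + 10);
    return cast(ubyte)(c - 'A' + 10);
}

/// Hypothetical helper: percent-decodes to raw bytes without looking at UTF-8
/// at all. Returns false for a malformed escape such as "%G1" or a trailing "%".
bool percentDecode(const(char)[] s, out ubyte[] result)
{
    for (size_t i = 0; i < s.length; i++)
    {
        if (s[i] != '%')
        {
            result ~= cast(ubyte) s[i];
            continue;
        }
        if (i + 2 >= s.length || !isHexDigit(s[i + 1]) || !isHexDigit(s[i + 2]))
            return false;
        result ~= cast(ubyte)((hexValue(s[i + 1]) << 4) | hexValue(s[i + 2]));
        i += 2;   // skip the two hex digits
    }
    return true;
}

/// UTF-8 validation via std.utf, applied only where the bytes must be text.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```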
May 14 2023
On Sunday, 14 May 2023 at 14:57:07 UTC, Vladimir Panteleev wrote:
> On Sunday, 14 May 2023 at 13:53:49 UTC, Andrea Fontana wrote:
>> The UTF problem is actually a catched UTFException thrown by urlencode/decode of std library.
>
> This doesn't throw for me:
>
> ```d
> void main()
> {
>     import std.uri;
>     decode("\xFF");
>     encode("\xFF");
> }
> ```
>
> But... looking at the implementation, it does have a baked-in UTF-8 decoder, which is a little ridiculous. `decode` actually decodes percent-encoded UTF-8, and then encodes it back, but makes no attempt to validate the non-encoded parts of the string. The module is pretty old though, so maybe it predates the facilities in `std.utf`.

You mean `%ff`, not `\xff`!

Andrea
May 14 2023
On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
> Hi everyone, as I had already announced in the discord channel, I was wondering if any of you would like to try and do some tests on my http server (serverino). I don't mean a stress test/ddos, of course. I'm interested in request parsing errors or any bug that can crash the server (5xx error).
>
> Source: https://github.com/trikko/serverino/
> Online using nginx as proxy: http://test.andreafontana.it (also https)
> Online into the wild listening on port 57123.

Have you already fuzzed your server code?
https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html

cheers,
  Johan
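A minimal libFuzzer-style harness in D, assuming an LDC build with `-fsanitize=fuzzer` as described in the linked article. `parseRequestLine` is a hypothetical stand-in for the server's real parsing code; runtime-initialization and build details should be checked against the article and the LDC documentation.

```d
import std.algorithm.searching : countUntil;

/// Hypothetical stand-in for the server's request-line parser; a real harness
/// would call serverino's own parsing code here instead.
bool parseRequestLine(const(ubyte)[] line)
{
    // Expect "METHOD SP TARGET SP VERSION" with non-empty parts.
    immutable sp1 = line.countUntil(cast(ubyte) ' ');
    if (sp1 <= 0)
        return false;
    const rest = line[sp1 + 1 .. $];
    immutable sp2 = rest.countUntil(cast(ubyte) ' ');
    return sp2 > 0 && cast(size_t) sp2 + 1 < rest.length;
}

/// libFuzzer entry point. When built with -fsanitize=fuzzer, libFuzzer calls
/// this repeatedly with mutated inputs. Errors that are not caught here, such
/// as a failed assert, are expected to abort the run so libFuzzer can report
/// the input that caused them.
extern (C) int LLVMFuzzerTestOneInput(const(ubyte)* data, size_t size)
{
    const input = data[0 .. size];   // empty slice when size == 0
    try
    {
        cast(void) parseRequestLine(input);
    }
    catch (Exception)
    {
        // Rejecting malformed input is fine; only crashes are interesting.
    }
    return 0;
}
```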
May 13 2023
On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
> Hi everyone, as I had already announced in the discord channel...

Hi Andrea,

this question may not be completely related, but hopefully you can answer it.

I can see there's a worker concept, and each worker is a completely separate application that doesn't share context with other workers. For example, I have `__gshared` state with some data that is updated by a separate thread. I need all workers (threads) to be able to access that state without recreating it multiple times. Is it possible to achieve this without introducing a separate cache or database, just inside a single app, with Serverino serving data requests through multiple threads?
May 14 2023
On Sunday, 14 May 2023 at 15:19:24 UTC, psyscout wrote:
> On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
>> Hi everyone, as I had already announced in the discord channel...
>
> Hi Andrea, this question may be not completely related, but hopefully you can answer. I can see a worker concept and each worker is a completely separate application and doesn't share context with other workers. For example I have a __gshared state with some data which is being updated by separate thread. So I need all workers (threads) be able to access that state without recreating it multiple times. Is it possible to achieve it without introducing a separate cache or database, just inside single app and Serverino serving data requests through multiple threads?

No: workers are not separate threads, but isolated processes. You should also consider that the number of workers is dynamic; they can be created and killed as required. You can still use some IPC (sockets, pipes, etc.), but a DB is probably easier to manage.

Andrea
May 14 2023