digitalmars.D.learn - try find the fastest way to convert a group string into index?
- learnfirst1 (10/10) Sep 16 2018 The use case is translate http header key into enum.
- Vladimir Panteleev (32/34) Sep 16 2018 This is a more general problem than any specific programming
- learnfirst1 (29/33) Sep 16 2018 thanks for reply, minimal perfect hashing seems good for this
- learnfirst1 (5/7) Sep 16 2018 It has to be case Case Insensitive, so before I run mpfh for
- learnfirst1 (5/8) Sep 16 2018 I will test pcre solution vs mpfc for benchmark. the pcre is
- Diederik de Groot (27/27) Sep 16 2018 Homework ?
The use case is translate http header key into enum. this is the current code : https://run.dlang.io/is/lpw29w In this fake code I only list a few headers, it should be more. but less then 128 and only include the a few codes. how to make this more fast like with one loop and get the results. I am think of use PCRE compile for this, but not sure how. It should also save the non default headers in to a new array, so when search non default headers will reduce the total size. I am plan to write a more fast http server with D betterC compare to vibed.
Sep 16 2018
On Sunday, 16 September 2018 at 10:04:09 UTC, learnfirst1 wrote:how to make this more fast like with one loop and get the results.This is a more general problem than any specific programming language; you may want to look into perfect hashing: https://en.wikipedia.org/wiki/Perfect_hash_function There exists software to generate perfect hash functions, e.g. GNU gperf: https://www.gnu.org/software/gperf/ It looks like Basile wrote one for D: https://github.com/BBasile/IsItThere If you don't want to go that far, you can generate a string switch using string mixins: void main() { static immutable string[] keywords = ["foo", "bar", "baz"]; enum Keyword { foo, bar, baz } Keyword parseKeyword(string s) { switch (s) { mixin({ string code; foreach (keyword; keywords) code ~= `case "` ~ keyword ~ `": return Keyword.` ~ keyword ~ `;`; return code; }()); default: throw new Exception("No such keyword!"); } } assert(parseKeyword("bar") == Keyword.bar); }
Sep 16 2018
On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir Panteleev wrote:On Sunday, 16 September 2018 at 10:04:09 UTC, learnfirst1 wrote:thanks for reply, minimal perfect hashing seems good for this task, I will do more test for confirm. I try to storage the value into a static array, so when diff middleware try access the key it will reused the array offset without need to run hash on key string. (all middleware can access the key by static enum name). with Perfect Hash Function, I can translate the hash number into static enum and use them. The code also need provide string get(string key) method, in this case the header key may not on the static preset arrays, in this case it should search a smaller array (save on the parse process). with this solution I has to run hash for each of request headers , with less than 127 items (maybe around 16 at max) I am not sure it is fast then direct string compare. for example: foreach(int header_index, ref header; new_request.headers) { int hash = mpfh(header.key); auto enum_index = static_eum[ hash ]; new_request.header[enum_index] = header_index ; } the simple static switch is not the best way as I know here. with compiled PCRE match it will "Context-Type", "Context-Encoding" in once ( and maybe there is some SSE4 to pass 16 byte for each loop).how to make this more fast like with one loop and get the results.please ignore the Header.data assign part. which just for the demo. in the real app it will only parse once and storage in pre -alloc request/response object.Diederik de Groot
Sep 16 2018
On Sunday, 16 September 2018 at 11:30:11 UTC, learnfirst1 wrote:On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir Panteleev wrote:It has to be case Case Insensitive, so before I run mpfh for each new request headers, I need to convert the keys into low or up case. this value can be storage on a stack local tmp cache. and I has to run it on the get(string key) method again.
Sep 16 2018
On Sunday, 16 September 2018 at 11:51:24 UTC, learnfirst1 wrote:On Sunday, 16 September 2018 at 11:30:11 UTC, learnfirst1 wrote:I will test pcre solution vs mpfc for benchmark. the pcre is easy to deal with low/up case. The PCRE: /^(?:Accept(*:0)|Accept\-Charset(*:1)|Accept\-Encoding(*:2)|Accept\-Language(*:3)|Access\-Control\-Allow\-Credentials(*:4)|Access\-Control\-Allow\-Origin(*:5)|Access\-Control\-Allow\-Methods(*:6)|Access\-Control\-Allow\-Headers(*:7)|Access\-Control\-Max\-Age(*:8)|Access\-Control\-Expose\-Headers(*:9)|Access\-Control\-Request\-Method(*:10)|Access\-Control\-Request\-Headers(*:11)|Age(*:12)|Allow(*:13)|Authorization(*:14)|Cache\-Control(*:15)|Connection(*:16)|Content\-Encoding(*:17)|Content\-Language(*:18)|Content\-Length(*:19)|Content\-Location(*:20)|Content\-Range(*:21)|Content\-Security\-Policy(*:22)|Content\-Type(*:23)|Cookie(*:24)|Date(*:25)|ETag(*:26)|Expect(*:27)|Expires(*:28)|Host(*:29)|If\-Match(*:30)|If\-Modified\-Since(*:31)|If\-None\-Match(*:32)|If\-Range(*:33)|If\-Unmodified\-Since(*:34)|Last\-Modified(*:35)|Location(*:36)|Origin(*:37)|Pragma(*:38)|Proxy\-Authenticate(*:39)|Proxy\-Authorization(*:40)|Range(*:41)|Referer(*:42)|Retry\-After(*:43)|Sec\-Websocket\-Extensions(*:44)|Sec\-Websocket\-Key(*:45)|Sec\-Websocket\-Origin(*:46)|Sec\-Websocket\-Protocol(*:47)|Sec\-Websocket\-Version(*:48)|Server(*:49)|Set\-Cookie(*:50)|Strict\-Transport\-Security(*:51)|Transfer\-Encoding(*:52)|Upgrade(*:53)|User\-Agent(*:54)|Vary(*:55)|Via(*:56)|WWW\-Authenticate(*:57))$/iOn Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir Panteleev wrote:
Sep 16 2018
Homework ? I think it would also be better to only walk/match the Header.data and header_keys only once (using the generated switch statement that Vladimir suggested). So your Headers struct should have a parse() function, which will match all of the headers to the keys and store them locally in a 'matches[Type] = string' array (aka map). Then the get() function can simply return matches[type] without any work. Other improvements/Bonus Points: - Because the parse() function is only called/initialize once, you could make the matches array immutable/const if. - Your Type enum and header_keys contain the same data, and if you are going to use Vlafimir's suggestion of using a mixin, you can stringify the enum names an use them instead. This way you don't have to keep anything in sync. - If these Headers are coming from a client over the internet, it might be wise to set and check some (arbitrary) bounds for header-length and content. You are currently blindly copying it and expecting it is 'clean/correct'. - Why copy the data to Header.data, instead of just providing it by ref to the get()/parse() function. You are copying it to then later indexing into Header.data using the map(). Why not copy/move piecemeal and assign them to an indexed array after inspecting/matching them. That could make for cleaner code and nicer API. (With two caveats: we could loose the original headers if we don't save them somewhere else and unmatched/unknown Header Types will not be copied).
Sep 16 2018