digitalmars.D.learn - Which option is faster...
- jicman (31/31) Aug 05 2013 Greetings!
- Timon Gehr (2/33) Aug 05 2013 They are both equally slow.
- jicman (2/42) Aug 05 2013 How would you make it faster in D1?
- bearophile (10/11) Aug 05 2013 Compute std.string.tolower(fext[0]) and put it in a temporary
- jicman (2/13) Aug 05 2013 Thanks. This is great.
- jicman (10/41) Aug 05 2013 Second option...
- dennis luehring (6/37) Aug 05 2013 did you benchmarked your current szenario - how do you know that this is...
- jicman (2/9) Aug 05 2013 Ok, how would you make it faster?
- dennis luehring (8/19) Aug 05 2013 i don't see a better solution here - how to reduce ONE lowercase and
- jicman (5/28) Aug 05 2013 It is a tool that was a script, but I have turned it into do,
- dennis luehring (8/14) Aug 05 2013 so its totaly unclear if the presented code is your 2h monster, what was
- jicman (9/27) Aug 05 2013 The files are in a network drive, so, that has some slowness
- Ali Çehreli (9/15) Aug 05 2013 And that is proof that a sizable amount of time is spent during
- dennis luehring (19/39) Aug 05 2013 can you describe more detailed what are you doing - are you also reading
- H. S. Teoh (15/22) Aug 05 2013 If you really want optimal performance, use std.regex:
- Andrej Mitrovic (3/4) Aug 06 2013 Yes and if you also want to bloat your executable to the point that
- John Colvin (25/56) Aug 05 2013 better:
- jicman (2/66) Aug 05 2013 As the Hispanic speaking community say, muchas gracias.
- monarch_dodra (49/74) Aug 05 2013 Arguably, you'd do even better with:
- jicman (5/86) Aug 05 2013 :-) Now that's funny. He he he he...
- David (7/24) Aug 05 2013 That's how I would do it.
- John Colvin (5/31) Aug 05 2013 In case some of the strings are actually rather long, one could
- JS (39/70) Aug 05 2013 Both are slow, those trying to optimize out tolower are missing
- H. S. Teoh (13/71) Aug 05 2013 [...]
- Raphaël Jakse (74/105) Aug 05 2013 Like others said, writing if( ...) continue; if(...) continue; versus
- Andre Artus (9/40) Aug 05 2013 What exactly are you trying to do with this? I get the impression
- jicman (4/51) Aug 06 2013 The files are in a network drive and doing a list foreach *.doc,
- Andre Artus (10/65) Aug 06 2013 Again, what are you trying to achieve?
- jicman (8/75) Aug 06 2013 It's a long story and I will return in a few months and give you
- dennis luehring (24/40) Aug 06 2013 after makeing us girls all wet to help you - your reply is
- Andre Artus (11/44) Aug 06 2013 Just for a sanity check I implemented a quick client-server setup
- H. S. Teoh (12/27) Aug 06 2013 In this case, the bottleneck most likely is in the network latency, so
- H. S. Teoh (16/50) Aug 05 2013 [...]
- Jonathan M Davis (23/56) Aug 05 2013 As others have pointed out, the two code snippets are semantically ident...
Greetings!  I have this code,

foreach (...)
{
  if (std.string.tolower(fext[0]) == "doc" ||
      std.string.tolower(fext[0]) == "docx" ||
      std.string.tolower(fext[0]) == "xls" ||
      std.string.tolower(fext[0]) == "xlsx" ||
      std.string.tolower(fext[0]) == "ppt" ||
      std.string.tolower(fext[0]) == "pptx")
    continue;
}

foreach (...)
{
  if (std.string.tolower(fext[0]) == "doc") continue;
  if (std.string.tolower(fext[0]) == "docx") continue;
  if (std.string.tolower(fext[0]) == "xls") continue;
  if (std.string.tolower(fext[0]) == "xlsx") continue;
  if (std.string.tolower(fext[0]) == "ppt") continue;
  if (std.string.tolower(fext[0]) == "pptx") continue;
  ...
  ...
}

thanks.

josé
Aug 05 2013
On 08/05/2013 03:59 PM, jicman wrote:
> Greetings! I have this code, foreach (...) { if (std.string.tolower(fext[0]) == "doc" || std.string.tolower(fext[0]) == "docx" || std.string.tolower(fext[0]) == "xls" || std.string.tolower(fext[0]) == "xlsx" || std.string.tolower(fext[0]) == "ppt" || std.string.tolower(fext[0]) == "pptx") continue; } foreach (...) { if (std.string.tolower(fext[0]) == "doc") continue; if (std.string.tolower(fext[0]) == "docx") continue; if (std.string.tolower(fext[0]) == "xls") continue; if (std.string.tolower(fext[0]) == "xlsx") continue; if (std.string.tolower(fext[0]) == "ppt") continue; if (std.string.tolower(fext[0]) == "pptx") continue; ... ... } thanks. josé

They are both equally slow.
Aug 05 2013
On Monday, 5 August 2013 at 14:13:33 UTC, Timon Gehr wrote:
> On 08/05/2013 03:59 PM, jicman wrote:
>> Greetings! I have this code, foreach (...) { if (std.string.tolower(fext[0]) == "doc" || std.string.tolower(fext[0]) == "docx" || std.string.tolower(fext[0]) == "xls" || std.string.tolower(fext[0]) == "xlsx" || std.string.tolower(fext[0]) == "ppt" || std.string.tolower(fext[0]) == "pptx") continue; } foreach (...) { if (std.string.tolower(fext[0]) == "doc") continue; if (std.string.tolower(fext[0]) == "docx") continue; if (std.string.tolower(fext[0]) == "xls") continue; if (std.string.tolower(fext[0]) == "xlsx") continue; if (std.string.tolower(fext[0]) == "ppt") continue; if (std.string.tolower(fext[0]) == "pptx") continue; ... ... } thanks. josé
>
> They are both equally slow.

How would you make it faster in D1?
Aug 05 2013
jicman:
> How would you make it faster in D1?

Compute std.string.tolower(fext[0]) and put it in a temporary variable. And then compare that variable with all your string literals. In most cases that's fast enough.

If it's not enough, you could create a little finite state machine that represents your directed acyclic word graph, and uses gotos to jump around states. The amount of strings is small, so perhaps there are not enough code cache misses to nullify this optimization.

Bye,
bearophile
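A minimal D2 sketch of the temporary-variable idea (the helper name isOfficeExt is hypothetical, and fext[0] is assumed to hold the extension without the dot; under D1 the call would be std.string.tolower instead of toLower):

import std.string : toLower;

// Lower-case once, then compare against all the literals.
bool isOfficeExt(string ext)
{
    auto e = ext.toLower();
    return e == "doc" || e == "docx"
        || e == "xls" || e == "xlsx"
        || e == "ppt" || e == "pptx";
}

void main()
{
    assert(isOfficeExt("DOCX"));
    assert(!isOfficeExt("txt"));
}

The point is one case conversion per filename instead of up to six.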
Aug 05 2013
On Monday, 5 August 2013 at 14:50:27 UTC, bearophile wrote:
> jicman:
>> How would you make it faster in D1?
>
> Compute std.string.tolower(fext[0]) and put it in a temporary variable. And then compare that variable with all your string literals. In most cases that's fast enough. If it's not enough, you could create a little finite state machine that represents your directed acyclic word graph, and uses gotos to jump around states. The amount of strings is small, so perhaps there are not enough code cache misses to nullify this optimization.
>
> Bye,
> bearophile

Thanks. This is great.
Aug 05 2013
On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
> Greetings! I have this code,

First option...

> foreach (...) { if (std.string.tolower(fext[0]) == "doc" || std.string.tolower(fext[0]) == "docx" || std.string.tolower(fext[0]) == "xls" || std.string.tolower(fext[0]) == "xlsx" || std.string.tolower(fext[0]) == "ppt" || std.string.tolower(fext[0]) == "pptx") continue; }

Second option...

> foreach (...) { if (std.string.tolower(fext[0]) == "doc") continue; if (std.string.tolower(fext[0]) == "docx") continue; if (std.string.tolower(fext[0]) == "xls") continue; if (std.string.tolower(fext[0]) == "xlsx") continue; if (std.string.tolower(fext[0]) == "ppt") continue; if (std.string.tolower(fext[0]) == "pptx") continue; ... ... } thanks. josé

So, after I saw this post I asked myself: what? So, the question is: which of the two foreach loop options is faster:

1. The concatenated if (||)
2. The single ifs

I am trying to see if it matters. I have a project with lots of files and if one is faster, then it will matter to code it the faster way. Thanks.
Aug 05 2013
Did you benchmark your current scenario - how do you know that this is the slow part? Or are you working on an extension-compare-only tool?

btw: they are both equal and slow - and full of partial code duplication, std.string.tolower(fext[0]) multiple times; I hope your list isn't going to get much longer.

On 05.08.2013 15:59, jicman wrote:
> Greetings! I have this code, foreach (...) { if (std.string.tolower(fext[0]) == "doc" || std.string.tolower(fext[0]) == "docx" || std.string.tolower(fext[0]) == "xls" || std.string.tolower(fext[0]) == "xlsx" || std.string.tolower(fext[0]) == "ppt" || std.string.tolower(fext[0]) == "pptx") continue; } foreach (...) { if (std.string.tolower(fext[0]) == "doc") continue; if (std.string.tolower(fext[0]) == "docx") continue; if (std.string.tolower(fext[0]) == "xls") continue; if (std.string.tolower(fext[0]) == "xlsx") continue; if (std.string.tolower(fext[0]) == "ppt") continue; if (std.string.tolower(fext[0]) == "pptx") continue; ... ... } thanks. josé
Aug 05 2013
On Monday, 5 August 2013 at 14:27:43 UTC, dennis luehring wrote:
> Did you benchmark your current scenario - how do you know that this is the slow part? Or are you working on an extension-compare-only tool?
>
> btw: they are both equal and slow - and full of partial code duplication, std.string.tolower(fext[0]) multiple times; I hope your list isn't going to get much longer.

Ok, how would you make it faster?
Aug 05 2013
> Ok, how would you make it faster?

I don't see a better solution here - how would you reduce ONE lowercase and SOME compares any further? (I don't think a hash or something will help.) But I know that anything like your continue-party is worth nothing (feels a little bit like script kiddies' "do it in assembler, that would make it a million times faster" blabla).

Question: is this the slow part in your project? Do you know it for sure, or is it just a feeling - HOW do you benchmark?

On 05.08.2013 16:31, jicman wrote:
> On Monday, 5 August 2013 at 14:27:43 UTC, dennis luehring wrote:
>> Did you benchmark your current scenario - how do you know that this is the slow part? Or are you working on an extension-compare-only tool? btw: they are both equal and slow - and full of partial code duplication, std.string.tolower(fext[0]) multiple times; I hope your list isn't going to get much longer.
>
> Ok, how would you make it faster?
Aug 05 2013
On Monday, 5 August 2013 at 14:47:37 UTC, dennis luehring wrote:
> I don't see a better solution here - how would you reduce ONE lowercase and SOME compares any further? (I don't think a hash or something will help.) But I know that anything like your continue-party is worth nothing (feels a little bit like script kiddies' "do it in assembler, that would make it a million times faster" blabla).
>
> Question: is this the slow part in your project? Do you know it for sure, or is it just a feeling - HOW do you benchmark?

It is a tool that was a script, but I have turned it into D, which has now taken two hours off the last jscript run. I have not benchmarked it yet. I may. But I see that a great idea has been provided, which I will use. Thanks for the help.
Aug 05 2013
On 05.08.2013 17:18, jicman wrote:
> I have not benchmarked it, yet. I may.

So it's totally unclear whether the presented code is your 2h monster. What was the runtime of your jscript?

> But I see that a great idea has been provided

Using a local variable so you don't lowercase on every if is not a great idea, it is default programming style. And I don't think you'll be able to implement the tree state machine if you write such simple performance killers as multiple lowercase calls and try to help yourself by introducing "continue"...
Aug 05 2013
On Monday, 5 August 2013 at 15:27:09 UTC, dennis luehring wrote:
> So it's totally unclear whether the presented code is your 2h monster. What was the runtime of your jscript?

The files are on a network drive, so that has some slowness already involved. The jscript used to take over 8 hours. The new D program has dropped that to under six. This is huge for us. But I know that I can probably fine-tune the program to make it a few minutes less. :-)

> Using a local variable so you don't lowercase on every if is not a great idea, it is default programming style

This will help. If there are 100K files, and I know that there are more than that, it will help a little bit.

> And I don't think you'll be able to implement the tree state machine if you write such simple performance killers as multiple lowercase calls and try to help yourself by introducing "continue"...

Perhaps, but it's a good idea nonetheless.
Aug 05 2013
On 08/05/2013 10:04 AM, jicman wrote:
> The files are on a network drive, so that has some slowness already involved.

Which may dominate the running time.

> The jscript used to take over 8 hours. The new D program has dropped that to under six.

And that is proof that a sizable amount of time is spent during computations. :)

> But I know that I can probably fine-tune the program to make it a few minutes less. :-)

You really should profile where the time is spent. If file I/O is still a big issue, there are WAN optimization products that may solve that problem amazingly efficiently:

http://en.wikipedia.org/wiki/WAN_optimization

Ali
Aug 05 2013
On 05.08.2013 19:04, jicman wrote:
> The files are on a network drive, so that has some slowness already involved. The jscript used to take over 8 hours. The new D program has dropped that to under six. This is huge for us. But I know that I can probably fine-tune the program to make it a few minutes less. :-)

Can you describe in more detail what you are doing - are you also reading these files? Why not run your tool on the server and only collect the results over the network (wouldn't that be much faster)? What is the impact of using the network drive? Just copy your biggest scenario onto a local machine and run against that, to get a feeling for how painful the network-drive communication is.

> This will help. If there are 100K files, and I know that there are more than that, it will help a little bit.
> Perhaps, but it's a good idea nonetheless.

Maybe your program can be better optimized globally, but we need more information about what the program is doing - can you give some pseudo code like:

1. read all filenames recursively -> 100k filenames
2. reduce down to known extensions -> 10k filenames
3. ...
4. ...
5. ...

I don't think that your lower-casing is the major part of the 6h; there must be other very slow parts in your project, which you won't find without putting in some benchmarking (and I don't understand why you don't start with benchmarking).
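To make the per-phase benchmarking concrete, here is a minimal sketch using std.datetime.stopwatch, which is a D2 module (the OP is on D1, where you would bracket each phase with timestamps instead; the phase names are placeholders, not the OP's actual code):

import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writeln;

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    // phase 1: e.g. read all filenames recursively
    // ...
    writeln("listing: ", sw.peek.total!"msecs", " ms");

    sw.reset();
    // phase 2: e.g. filter by extension
    // ...
    writeln("filtering: ", sw.peek.total!"msecs", " ms");
}

Timing each phase separately shows whether the network I/O or the string handling actually dominates.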
Aug 05 2013
On Mon, Aug 05, 2013 at 04:47:36PM +0200, dennis luehring wrote:
>> Ok, how would you make it faster?
>
> i don't see a better solution here - how to reduce ONE lowercase and SOME compares in any way? (i dont think a hash or something will help) but i know that anything like your continue-party is worth nothing (feels a little bit like script-kiddies "do it with assembler that would make it a million times faster" blabla)

If you really want optimal performance, use std.regex:

    import std.regex;

    auto reExtMatch = ctRegex!(`doc|docx|exe|...`, "i");

    foreach (...) {
        if (fext[0].match(reExtMatch))
            continue;
    }

The regex is set up to ignore case (the "i" flag), and is compiled at compile time to generate optimal matching code for the given list of extensions. To get any faster than this, you'll have to hand-optimize or write in assembly. :)

T

--
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices. -- Gene Wirchenko
Aug 05 2013
On 8/5/13, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
> If you really want optimal performance, use std.regex:

Yes, and if you also want to bloat your executable to the point that other parts of the system start slowing down.
Aug 06 2013
On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
> Greetings! I have this code, foreach (...) { if (std.string.tolower(fext[0]) == "doc" || std.string.tolower(fext[0]) == "docx" || std.string.tolower(fext[0]) == "xls" || std.string.tolower(fext[0]) == "xlsx" || std.string.tolower(fext[0]) == "ppt" || std.string.tolower(fext[0]) == "pptx") continue; } foreach (...) { if (std.string.tolower(fext[0]) == "doc") continue; if (std.string.tolower(fext[0]) == "docx") continue; if (std.string.tolower(fext[0]) == "xls") continue; if (std.string.tolower(fext[0]) == "xlsx") continue; if (std.string.tolower(fext[0]) == "ppt") continue; if (std.string.tolower(fext[0]) == "pptx") continue; ... ... } thanks. josé

better:

foreach (...)
{
    auto tmp = std.string.tolower(fext[0]);
    if (tmp == "doc" || tmp == "docx" ||
        tmp == "xls" || tmp == "xlsx" ||
        tmp == "ppt" || tmp == "pptx")
    {
        continue;
    }
}

but still not super-fast as (unless the compiler is very clever) it still means multiple passes over tmp. Also, it converts the whole string to lower case even when it's not necessary.

If you have large numbers of possible matches you will probably want to be clever with your data structures / algorithms. E.g. you could create a tree-like structure to quickly eliminate possibilities as you read successive letters. You read one character, follow the appropriate branch, check if there are any further branches, if not then no match and break. Else, read the next character and follow the appropriate branch and so on.... Infeasible for large (or even medium-sized) character sets without hashing, but might be pretty fast for a-z and a large number of short strings.
Aug 05 2013
On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:
> better: foreach (...) { auto tmp = std.string.tolower(fext[0]); if(tmp == "doc" || tmp == "docx" || tmp == "xls" || tmp == "xlsx" || tmp == "ppt" || tmp == "pptx") { continue; } } but still not super-fast as (unless the compiler is very clever) it still means multiple passes over tmp. Also, it converts the whole string to lower case even when it's not necessary. If you have large numbers of possible matches you will probably want to be clever with your data structures / algorithms. E.g. You could create a tree-like structure to quickly eliminate possibilities as you read successive letters. You read one character, follow the appropriate branch, check if there are any further branches, if not then no match and break. Else, read the next character and follow the appropriate branch and so on.... Infeasible for large (or even medium-sized) character-sets without hashing, but might be pretty fast for a-z and a large number of short strings.

As the Spanish-speaking community says: muchas gracias.
Aug 05 2013
On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:
> better: foreach (...) { auto tmp = std.string.tolower(fext[0]); if(tmp == "doc" || tmp == "docx" || tmp == "xls" || tmp == "xlsx" || tmp == "ppt" || tmp == "pptx") { continue; } } but still not super-fast as (unless the compiler is very clever) it still means multiple passes over tmp. Also, it converts the whole string to lower case even when it's not necessary. If you have large numbers of possible matches you will probably want to be clever with your data structures / algorithms. E.g. You could create a tree-like structure to quickly eliminate possibilities as you read successive letters. [...] Infeasible for large (or even medium-sized) character-sets without hashing, but might be pretty fast for a-z and a large number of short strings.

Arguably, you'd do even better with:

foreach (...)
{
    auto tmp = std.string.tolower(fext[0]);
    switch (tmp)
    {
        case "doc", "docx":
        case "xls", "xlsx":
        case "ppt", "pptx":
            continue;
        default:
    }
}

Since it gives the compiler more wiggle room to optimize things as it wishes. For example, it *could* (who knows :D !) implement the switch as a hash table, or a tree.

BTW, a very convenient "tree-like" structure that could be used is a "heap": it is a basic binary tree, but stored inside an array. You could build it during compilation, and then simply search it.

A possible optimization is to first switch on string length:

foreach (...)
{
    auto tmp = std.string.tolower(fext[0]);
    switch (tmp.length)
    {
        case 3:
            switch (tmp)
            {
                case "doc", "xls", "ppt": continue;
                default:
            }
            break;
        case 4:
            switch (tmp)
            {
                case "docx", "xlsx", "pptx": continue;
                default:
            }
            break;
        default:
    }
}

That said, I'm not even sure this would be faster, so a bench would be called for.
Furthermore, I'd really be tempted to say that at this point, we are in the realm of premature optimization.
Aug 05 2013
On Monday, 5 August 2013 at 17:04:36 UTC, monarch_dodra wrote:
> Arguably, you'd do even better with: foreach (...) { auto tmp = std.string.tolower(fext[0]); switch(tmp) { case "doc", "docx": case "xls", "xlsx": case "ppt", "pptx": continue; default: } } Since it gives the compiler more wiggle room to optimize things as it wishes. For example, it *could* (who knows :D !) implement the switch as a hash table, or a tree. BTW, a very convenient "tree-like" structure that could be used is a "heap": it is a basic binary tree, but stored inside an array. You could build it during compilation, and then simply search it. A possible optimization is to first switch on string length: foreach (...) { auto tmp = std.string.tolower(fext[0]); switch(tmp.length) { case 3: switch(tmp) { case "doc", "xls", "ppt": continue; default: } break; case 4: switch(tmp) { case "docx", "xlsx", "pptx": continue; default: } break; default: } } That said, I'm not even sure this would be faster, so a bench would be called for. Furthermore, I'd really be tempted to say that at this point, we are in the realm of premature optimization.

:-) Now that's funny. He he he he...

By the way, your coding makes sense. For 100K-plus files it may not mean much, but we are thinking that this amount will only be growing.
Aug 05 2013
On 05.08.2013 15:59, jicman wrote:
> Greetings! I have this code, foreach (...) { if (std.string.tolower(fext[0]) == "doc" || std.string.tolower(fext[0]) == "docx" || std.string.tolower(fext[0]) == "xls" || std.string.tolower(fext[0]) == "xlsx" || std.string.tolower(fext[0]) == "ppt" || std.string.tolower(fext[0]) == "pptx") continue; }

That's how I would do it:

if (["doc", "docx", "xls", "xlsx", "ppt", "pptx"].canFind(fext[0].toLower))
{
    continue;
}

With the array being a runtime/compile-time constant with a sensible name, WORD_EXTENSIONS or something.
Aug 05 2013
On Monday, 5 August 2013 at 15:21:25 UTC, David wrote:
> That's how I would do it: if (["doc", "docx", "xls", "xlsx", "ppt", "pptx"].canFind(fext[0].toLower)) { continue; } With the array being a runtime/compile-time constant with a sensible name, WORD_EXTENSIONS or something.

In case some of the strings are actually rather long, one could use toLower lazily (either std.ascii.toLower or std.uni.toLower, depending on which you need):

if (WORD_EXT.canFind(fext[0].map!toLower()))
Aug 05 2013
On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
> Greetings! I have this code, foreach (...) { if (std.string.tolower(fext[0]) == "doc" || std.string.tolower(fext[0]) == "docx" || std.string.tolower(fext[0]) == "xls" || std.string.tolower(fext[0]) == "xlsx" || std.string.tolower(fext[0]) == "ppt" || std.string.tolower(fext[0]) == "pptx") continue; } foreach (...) { if (std.string.tolower(fext[0]) == "doc") continue; if (std.string.tolower(fext[0]) == "docx") continue; if (std.string.tolower(fext[0]) == "xls") continue; if (std.string.tolower(fext[0]) == "xlsx") continue; if (std.string.tolower(fext[0]) == "ppt") continue; if (std.string.tolower(fext[0]) == "pptx") continue; ... ... } thanks. josé

Both are slow; those trying to optimize out tolower are missing the boat. A good compiler should optimize that out. The issue is that n compares are done in those cases, with each successive compare getting slower. If the match is doc then it happens in one compare in both cases. If it's pptx then it is 6. So if there are many pptx matches then the code will perform slower than for doc. This can be good or bad behavior depending on whether you are guaranteed that your order is representative of what actually happens (many docs, few pptxs).

To get O(1) behavior, you must use some sort of jump list. A hash table can provide such behavior. If there are many comparisons needed then at some point this will be, on average, faster.

If performance is critical and the string length is small enough, then the fastest method is to use a direct lookup table. For n characters of char it is 256^n entries. For n = 4, this is 4GB of memory! (This is one reason to use a hash table.) Since you are not using the full ascii you could compact this a bit, say 36^n; for n = 4 it is only 1.6MB of memory. The cost is packing the strings for lookup and computing the index.

One way to speed up the loop is to also compare for what is not used. In this case you can compare just on the first char.
if (fext[0][0] == 'd') { ... }
if (fext[0][0] == 'x') { ... }
if (fext[0][0] == 'p') { ... }

This pares down the results. In this case it is only 3 comparisons + 2 comparisons = 5 for pptx rather than 6. You can try to develop your own hash to get optimal amortized performance.
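A D2 sketch of the first-character pare-down, written as a switch (the helper name is hypothetical, and the input is assumed to be already lower-cased, e.g. via std.string.toLower):

// Assumes `e` is already lower case.
bool isOfficeExt(string e)
{
    if (e.length == 0) return false;
    switch (e[0])
    {
        case 'd': return e == "doc" || e == "docx";
        case 'x': return e == "xls" || e == "xlsx";
        case 'p': return e == "ppt" || e == "pptx";
        default:  return false;
    }
}

void main()
{
    assert(isOfficeExt("pptx"));
    assert(!isOfficeExt("txt"));
}

Any non-matching first character falls through to the default case immediately, so at most one character switch plus two string compares are done for any input, instead of up to six full string compares.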
Aug 05 2013
On Mon, Aug 05, 2013 at 07:30:36PM +0200, JS wrote:
> Both are slow, those trying to optimize out tolower are missing the boat. A good compiler should optimize that out. The issue is that n compares are done in those cases with each successive compare getting slower. If the match is doc then it happens in one compare in both cases. If it's pptx then it is 6. So if there are many pptx matches then the code will perform slower than for doc. This can be good or bad behavior depending on whether you are guaranteed that your order is representative of what actually happens (many docs, few pptxs). To get O(1) behavior, you must use some sort of jump list. A hash table can provide such behavior. If there are many comparisons needed then at some point this will be, on average, faster.
[...]

There's no need to reinvent the wheel here. std.regex.ctRegex will create the equivalent of jump tables for you.

Besides, I doubt the OP's real performance bottleneck is in this code; sounds like the program is I/O bound. Compared to I/O, even calling std.string.tolower repeatedly is just roundoff error in terms of total execution time.

Using a profiler will reveal the *real* bottleneck. Based on past experience, I'm expecting that it will reveal that the bottleneck isn't anywhere we're predicting it is. :)

T

--
VI = Visual Irritation
Aug 05 2013
On 05/08/2013 15:59, jicman wrote:

> Greetings! I have this code,
>
> foreach (...)
> {
>   if (std.string.tolower(fext[0]) == "doc" ||
>       std.string.tolower(fext[0]) == "docx" ||
>       std.string.tolower(fext[0]) == "xls" ||
>       std.string.tolower(fext[0]) == "xlsx" ||
>       std.string.tolower(fext[0]) == "ppt" ||
>       std.string.tolower(fext[0]) == "pptx")
>     continue;
> }
>
> foreach (...)
> {
>   if (std.string.tolower(fext[0]) == "doc") continue;
>   if (std.string.tolower(fext[0]) == "docx") continue;
>   if (std.string.tolower(fext[0]) == "xls") continue;
>   if (std.string.tolower(fext[0]) == "xlsx") continue;
>   if (std.string.tolower(fext[0]) == "ppt") continue;
>   if (std.string.tolower(fext[0]) == "pptx") continue;
>   ...
>   ...
> }
>
> thanks.
>
> josé

Like others said, writing if(...) continue; if(...) continue; versus if(... || ...) continue; is not the problem in this code. Computing the lower-case version of the string for each comparison is a bigger problem, one that can be spotted at first glance. Storing the value in a variable and then using that variable in the comparisons would be better. Both versions of your code should be equivalent, though this should be verified.

I would suggest another way of doing this:

  switch(std.string.tolower(fext[0]))
  {
    case "doc":
    case "docx":
    case "xls":
    case "xlsx":
    case "ppt":
    case "pptx":
      continue;
    default:
      // do something else
  }

This way of writing your code seems far more readable and elegant to me. Even more concise:

  switch(std.string.tolower(fext[0]))
  {
    case "doc", "docx", "xls", "xlsx", "ppt", "pptx":
      continue;
    default:
      // do something else
  }

Much like what monarch_dodra proposed, I guess. But notice that tolower is deprecated in the current D2 version; toLower should be used instead. If you still use D1, maybe a better thing to do before considering such optimization would be opting for D2.

I would also suggest avoiding continue if you don't need it. Here, it is likely that you can write something more structured, like:

  import std.string : toLower;
  // ...
  foreach(...)
  {
    switch(fext[0].toLower)
    {
      case "doc", "docx", "xls", "xlsx", "ppt", "pptx":
        // do what is to be done for these cases, or nothing
        break;
      default:
        // do something for the other cases
    }
  }

If you need fext[0].toLower in every case, I would suggest you write this kind of code instead:

  import std.string : toLower;
  // ...
  auto ext = fext[0].toLower;
  switch(ext)
  {
    case "doc", "docx", "xls", "xlsx", "ppt", "pptx":
      // do what is to be done for these cases, or nothing
      // use ext instead of fext[0].toLower, if needed
      break;
    default:
      // do something for the other cases
      // use ext instead of fext[0].toLower, if needed
  }

The line auto ext = fext[0].toLower; should be placed before the foreach loop if fext[0] isn't changed in the loop (avoid computing a value more than once when possible, though thinking before applying this rule, or any other rule, is not forbidden).

There might exist faster solutions, like testing the first letter before the second, using the length of the string, or using a finite state machine to save tests, but beware of the maintainability of your code, and if you go down this road, triple-check that these solutions are indeed more efficient in terms of execution time. See the posts from monarch_dodra, JS, bearophile and others for more precise information. If this part of your code is not known to be a bottleneck, I would opt for readability and elegance over over-optimization, and for this particular case I would trust the compiler to optimize the switch version well enough (though this is no more than an opinion).
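[Editor's note: putting those suggestions together, a minimal self-contained sketch of the switch-based filter might look like the following. This is D2; the hard-coded file list stands in for the real directory scan, and std.path.extension is used here for illustration (it keeps the leading dot, so the case labels include it):]

```d
import std.path : extension;
import std.stdio : writeln;
import std.string : toLower;

void main()
{
    // Hypothetical file list standing in for the real directory scan.
    auto files = ["report.DOC", "notes.txt", "data.XLSX", "slides.pptx"];

    foreach (file; files)
    {
        // extension() keeps the leading dot, e.g. ".DOC";
        // lower-case it once, then compare against all candidates.
        auto ext = file.extension.toLower;
        switch (ext)
        {
            case ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx":
                continue;       // skip Office documents
            default:
                writeln(file);  // process everything else; prints: notes.txt
        }
    }
}
```

Note that continue inside the switch applies to the enclosing foreach, which is exactly the behavior the original code relies on.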
Aug 05 2013
On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:

> Greetings! I have this code, [...]

What exactly are you trying to do with this? I get the impression that there is an attempt at "local optimization" when a broader approach could lead to better results. For instance, using the OS's facilities to filter (six requests, one each for "*.doc", "*.docx", and so on) could actually end up being a lot faster. If you could give more detail about what you are trying to achieve, then it could be possible to get better results.
Aug 05 2013
On Tuesday, 6 August 2013 at 04:10:57 UTC, Andre Artus wrote:

> What exactly are you trying to do with this? I get the impression that there is an attempt at "local optimization" when a broader approach could lead to better results. [...]

The files are on a network drive, and doing a listing for each of *.doc, *.docx, etc. will be more expensive than getting the list of all the files at once and then processing them accordingly.
Aug 06 2013
On Tuesday, 6 August 2013 at 12:32:13 UTC, jicman wrote:

> The files are on a network drive, and doing a listing for each of *.doc, *.docx, etc. will be more expensive than getting the list of all the files at once and then processing them accordingly.

Again, what are you trying to achieve? Your statement is not necessarily true, for a myriad of reasons, but it entirely depends on what you want to do. I would reiterate Dennis Luehring's reply: why are you not benchmarking? It seems like you are guessing at what the problems are, and that's hardly ever useful. One of the first rules of network optimization is to reduce the amount of data, which normally means filtering at the server; the next is that coarse-grained is better than fine-grained (BOCTAOE/L).
Aug 06 2013
On Tuesday, 6 August 2013 at 14:49:42 UTC, Andre Artus wrote:

> Again, what are you trying to achieve? Your statement is not necessarily true, for a myriad of reasons, but it entirely depends on what you want to do. I would reiterate Dennis Luehring's reply: why are you not benchmarking? [...]

It's a long story and I will return in a few months and give you the whole story, but right now, time is not on my side. I have answers for all the questions you folks have asked, and I appreciate all the input. I have the answer that I was looking for, so in a few months, I will come back and explain the whole story. Thanks for all the responses and suggestions.

jic
Aug 06 2013
On 07.08.2013 06:30, jicman wrote:

> It's a long story and I will return in a few months and give you the whole story, but right now, time is not on my side. I have answers for all the questions you folks have asked, and I appreciate all the input. I have the answer that I was looking for, so in a few months, I will come back and explain the whole story. Thanks for all the responses and suggestions.

After making us girls all wet to help you, your reply is "no sex on the first date, I'm a gentleman... but maybe in a few months".

So: you have a JScript doing something with files and file extensions over a network drive, and it runs around 8h. You ported that JScript to D, and now it runs for 6h. You noob-guessed that the lowercase-if party could be evil (btw: it costs more time to guess than to benchmark). You got trivial answers that won't gain you very much; the lowercase change would not boost your speed that much, and the network-drive latency will kill all the other state-machine ideas. You don't answer trivial questions about the big picture, and now you're out of time.

Open questions:

- Why not collect the data on the server itself, instead of grabbing tiny bits over the network? This is for understanding your environment.
- How big is the speed drop between your tool on the very same drive locally and over the network drive? This is for understanding the latency.
- Are you also reading these files, or just doing a filename search (recursively?) and throwing out non-Office extensions? This is for getting an idea of whether built-in OS (operating system) features can help.

See you in a few months.
Aug 06 2013
Just for a sanity check, I implemented a quick client-server setup where the daemon takes a filespec from the client and returns a line-by-line list compressed into one packet. The total running time on 8 terabytes of files, stored over a dozen drives and searched recursively: less than 1 minute. Same over slow WiFi: a negligible difference (the list compresses to a few Kb with LZMA). I did not even bother to search each physical drive separately; I just produced the list sequentially.

> It's a long story and I will return in a few months and give you the whole story [...]

> after making us girls all wet to help you [...] see you in a few months

It's impossible to help people who refuse to give basic information.
Aug 06 2013
On Tue, Aug 06, 2013 at 02:32:11PM +0200, jicman wrote:

> On Tuesday, 6 August 2013 at 04:10:57 UTC, Andre Artus wrote:
> > What exactly are you trying to do with this? [...]
>
> The files are on a network drive, and doing a listing for each of *.doc, *.docx, etc. will be more expensive than getting the list of all the files at once and then processing them accordingly.

[...]

In this case, the bottleneck is most likely the network latency, so optimizing string comparisons won't get you very far. You should instead be focusing on how to improve the performance of the network accesses. What kind of processing are you doing with the filtered list of files? If it's a complicated operation, you might want to consider having a separate thread fetch the list of files while work is being done on them in the main thread.

T

--
Computers shouldn't beep through the keyhole.
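[Editor's note: a minimal sketch of that separate-thread idea, using D2's std.concurrency. The directory path and the empty-string end-of-listing sentinel are assumptions for illustration; a real version would need error handling, and this is not available on D1, which the original poster uses:]

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

// Background thread: walk the (possibly slow, networked) directory tree
// and stream each file name back to the owner thread as it is found.
void lister(Tid owner, string root)
{
    foreach (entry; dirEntries(root, SpanMode.depth))
        send(owner, entry.name);
    send(owner, "");    // empty string as an end-of-listing sentinel
}

void main()
{
    auto worker = spawn(&lister, thisTid, ".");
    for (;;)
    {
        auto name = receiveOnly!string();
        if (name.length == 0)
            break;      // the lister has finished scanning
        // ... process `name` here while the lister keeps scanning ...
        writeln(name);
    }
}
```

The point of the design is overlap: the main thread does its per-file work while the listing thread is blocked on network I/O, instead of the two phases running back to back.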
Aug 06 2013
On Mon, Aug 05, 2013 at 03:59:23PM +0200, jicman wrote:

> Greetings! I have this code, [...]

It would appear that your bottleneck is not in the continue statements, but in the repeated calls to std.string.tolower. It's probably better to write it this way:

  foreach (...)
  {
      auto ext = std.string.tolower(fext[0]);
      if (ext == "doc" || ext == "docx" || ... )
          continue;
  }

This way you save the overhead of many identical function calls, which probably outweighs any benefit you get from optimizing continue.

T

--
It's amazing how careful choice of punctuation can leave you hanging:
Aug 05 2013
On Monday, August 05, 2013 15:59:23 jicman wrote:

> Greetings! I have this code, [...]

As others have pointed out, the two code snippets are semantically identical, and there's a decent chance that the compiler will generate identical code for both. It wouldn't surprise me if the first snippet resulted in more concise assembly code, but there shouldn't be any performance difference between the two. But the first snippet is cleaner, so between those two, you should use that.

Of course, as others have pointed out, it would be far more efficient to simply calculate std.string.tolower(fext[0]) once and re-use the result, which should be a very large performance gain here (assuming that the code around it doesn't dwarf it in cost). However, another possibility that I don't think anyone has pointed out is std.string.icmp, which does a case-insensitive comparison, meaning that you could write

  if(fext[0].icmp("doc") == 0 ||
     fext[0].icmp("docx") == 0 ||
     ...)

and that might be more efficient, as it avoids allocating a new string when fext[0] isn't already all lowercase. However, you would have to benchmark the code to be sure: depending on the typical input, the code around this code, and how many times this code gets called, calling toLower once could be more efficient, or using icmp could be more efficient, since toLower allocates whereas icmp doesn't, but icmp does more work than a simple ==. But if you really care about efficiency, you should try both ways and see which is faster for your use case.

- Jonathan M Davis
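[Editor's note: one way to run that comparison is a small timing harness. A rough sketch follows; the sample extension list, the iteration count, and the use of StopWatch are illustrative assumptions. This uses the 2013-era API, where StopWatch lives in std.datetime and peek() returns a TickDuration; in recent compilers it moved to std.datetime.stopwatch:]

```d
import std.datetime : StopWatch;
import std.stdio : writefln;
import std.string : icmp, toLower;

void main()
{
    // Hypothetical mix of extensions standing in for real file names.
    auto exts = ["doc", "TXT", "Xlsx", "ppt", "md", "PPTX"];
    auto office = ["doc", "docx", "xls", "xlsx", "ppt", "pptx"];
    enum runs = 100_000;

    StopWatch sw;
    int hits;

    // Variant 1: lower-case once per extension, then compare with ==.
    sw.start();
    foreach (_; 0 .. runs)
        foreach (e; exts)
        {
            auto low = e.toLower;   // allocates when e isn't already lowercase
            foreach (o; office)
                if (low == o) { ++hits; break; }
        }
    sw.stop();
    writefln("toLower+==: %s ms (hits=%s)", sw.peek().msecs, hits);

    // Variant 2: case-insensitive compare, no allocation.
    sw.reset();
    hits = 0;
    sw.start();
    foreach (_; 0 .. runs)
        foreach (e; exts)
            foreach (o; office)
                if (e.icmp(o) == 0) { ++hits; break; }
    sw.stop();
    writefln("icmp:       %s ms (hits=%s)", sw.peek().msecs, hits);
}
```

Both variants must report the same hit count; only the timings should differ, and as the post says, which one wins depends on the input mix.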
Aug 05 2013