digitalmars.D - A Class in Composition
- Chris (48/48) Aug 27 2013 I had a very simple method that would read a text file, parse it
- bearophile (22/29) Aug 27 2013 Two ways to reduce "boilerplate code" for that code:
- John Colvin (10/17) Aug 27 2013 If you've got a lot of repeated template constraints, simply
- H. S. Teoh (72/126) Aug 28 2013 Yep, this is a typical structure conflict caused by the mismatch between
I had a very simple method that would read a text file, parse it and create a lexicon of the type string[string].

public void loadLexicon(ref string[string] lex, string src)

It was high time I made the method more sophisticated and flexible (allowing for comments in the source file, checking for formatting errors etc.). Because neither the comment bit (C-style line comments "//") nor the formatting check is too demanding, I decided to do it in the existing loop, which looked something like:

while (file.readln(buf))
{
    line = to!string(buf);
    // Do something with line
}

Soon, very soon indeed, I ran into all the problems associated with loops, which basically boil down to "Where the f.. am I now?". So I said to myself that component programming was the perfect fit for this kind of problem. I rearranged the program and now I have one line in the function that does the job:

public void loadLexicon(ref string[string] lex, string src)
{
    // ...
    // arr is of type string[] and holds the lines of the source file.
    auto dictionary = Lexicon();  // The output range
    lex = arr.byCommentFilter().byEntry().copy(dictionary).lexicon;
}

It's very nice indeed. The beauty of it is not only that I now have a nice, sequentially ordered "one-liner"; outsourcing the checks and filters freed my head from the loop logic, which helped me to focus on the respective algorithms. This led to a leaner and cleaner implementation of each algorithm, because there are no dependencies on loop conditions or the position in the loop. I could easily remove the comment filter if the deployment version of the program ships with tidied-up source files, or I could add a new component if necessary. In a loop, however trivial it may appear at first glance, it would not be so simple to add or remove parts of the logic. One drawback is that using ranges creates some overhead and code duplication.
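A pipeline of this shape can also be sketched with Phobos primitives alone. This is a hedged illustration, not the poster's actual code: byCommentFilter and byEntry are the poster's own components, and the "key = value" line format assumed here is an invention for the example.

```d
import std.algorithm : each, filter, map, startsWith;
import std.array : split;
import std.string : strip;

// Sketch: drop comment/blank lines, split the rest into key/value pairs,
// and collect them into an associative array.
string[string] loadLexicon(string[] arr)
{
    string[string] lex;
    arr.map!(l => l.strip)
       .filter!(l => l.length != 0 && !l.startsWith("//")) // drop blanks and // comments
       .map!(l => l.split("="))                            // split into key/value
       .filter!(p => p.length == 2)                        // skip malformed lines
       .each!(p => lex[p[0].strip] = p[1].strip);
    return lex;
}
```

Each stage can be removed or replaced independently, which is exactly the property the post is after.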
But the neatness of it is amazing, and I daresay that a lot of bugs are hidden in loops simply because the loop logic distracts and confuses the programmer, and bugs finally find loopholes they can slip through. Another drawback is that ranges demand a lot of boilerplate code. If properly implemented, there is a lot of code you have to write over and over again, such as

if (isInputRange!Range
    && isInputRange!(ElementType!Range)
    && is(ElementType!(ElementType!Range) == MyType))

I don't know if this could possibly be reduced. Or maybe it's just my lack of experience.
Aug 27 2013
Chris:
> Another drawback is that ranges demand a lot of boilerplate code. If
> properly implemented, there is a lot of code you have to write over and
> over again such as
>
> if (isInputRange!Range && isInputRange!(ElementType!Range)
>     && is(ElementType!(ElementType!Range) == MyType))
>
> I don't know if this could possibly be reduced. Or maybe it's just my
> lack of experience.

Two ways to reduce "boilerplate code" for that code:

One way is to introduce in D a yield syntax for generator functions, something like:

yield!int inverseGrayCode()
{
    yield 0;
    foreach (x; inverseGrayCode())
        if (x & 1)
        {
            yield 2 * x + 1;
            yield 2 * x;
        }
        else
        {
            if (x)
                yield 2 * x;
            yield 2 * x + 1;
        }
}

The yield keyword could also be used to denote a range of a type, to reduce this code of yours:

if (isInputRange!Range && isInputRange!(ElementType!Range)
    && is(ElementType!(ElementType!Range) == MyType))

Bye,
bearophile
Aug 27 2013
On Tuesday, 27 August 2013 at 14:13:25 UTC, Chris wrote:
> Another drawback is that ranges demand a lot of boilerplate code. If
> properly implemented, there is a lot of code you have to write over and
> over again such as
>
> if (isInputRange!Range && isInputRange!(ElementType!Range)
>     && is(ElementType!(ElementType!Range) == MyType))
>
> I don't know if this could possibly be reduced. Or maybe it's just my
> lack of experience.

If you've got a lot of repeated template constraints, simply moving them out into a separate template can be useful, e.g.

template isIRofIRofT(Range, T)
{
    enum isIRofIRofT = isInputRange!Range
        && isInputRange!(ElementType!Range)
        && is(ElementType!(ElementType!Range) == T);
}

if (isIRofIRofT!(Range, MyType))
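For what it's worth, here is a self-contained version of that factored-out constraint together with a usage example. countElems is a made-up function purely for illustration; the constraint itself is the one from the thread (note the eponymous member must share the template's name for isIRofIRofT!(Range, T) to evaluate directly).

```d
import std.range.primitives : ElementType, isInputRange;

// The factored-out constraint: Range is an input range of input ranges
// whose innermost element type is T.
template isIRofIRofT(Range, T)
{
    enum isIRofIRofT = isInputRange!Range
        && isInputRange!(ElementType!Range)
        && is(ElementType!(ElementType!Range) == T);
}

// Example use: a function that only accepts ranges of ranges of int.
size_t countElems(Range)(Range r)
    if (isIRofIRofT!(Range, int))
{
    size_t n;
    foreach (inner; r)
        foreach (elem; inner)
            ++n;
    return n;
}
```

The signature constraint now reads as a single predicate instead of three chained traits.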
Aug 27 2013
On Tue, Aug 27, 2013 at 04:13:24PM +0200, Chris wrote:
> I had a very simple method that would read a text file, parse it and
> create a lexicon of the type string[string].
>
> public void loadLexicon(ref string[string] lex, string src)
>
> It was high time I made the method more sophisticated and flexible
> (allowing for comments in the source file, checking for formatting
> errors etc.). Because neither the comment bit (C-style line comments
> "//") nor the formatting check is too demanding, I decided to do it in
> the existing loop, which looked something like:
>
> while (file.readln(buf))
> {
>     line = to!string(buf);
>     // Do something with line
> }
>
> Soon, very soon indeed, I ran into all the problems associated with
> loops, which basically boil down to "Where the f.. am I now?".

Yep, this is a typical structure conflict caused by the mismatch between the lines in a file and the structures they represent. :)

> So I said to myself that component programming was the perfect fit for
> this kind of problem. I rearranged the program and now I have one line
> in the function that does the job:
>
> public void loadLexicon(ref string[string] lex, string src)
> {
>     // ...
>     // arr is of type string[] and holds the lines of the source file.
>     auto dictionary = Lexicon();  // The output range
>     lex = arr.byCommentFilter().byEntry().copy(dictionary).lexicon;
> }
>
> It's very nice indeed. The beauty of it is not only that I now have a
> nice, sequentially ordered "one-liner"; outsourcing the checks and
> filters freed my head from the loop logic, which helped me to focus on
> the respective algorithms. This led to a leaner and cleaner
> implementation of each algorithm, because there are no dependencies on
> loop conditions or the position in the loop.

I usually separate out the loop body into smaller functions that handle each part of the parsing.
Since code is the most straightforward when its structure matches that of the data, this usually means the outer loop no longer iterates over lines, but over larger, logical structures (e.g., an entry in your lexicon). That way, your code becomes something like:

void loadLexicon(...)
{
    auto dictionary = Lexicon();
    do
    {
        dictionary.put(parseEntry(input, ...));
    } while (!input.empty);
}

Entry parseEntry(...)
{
    skipDelimitingBlankLines(...);
    auto key = parseHeadword(...);
    auto value = parseBody(...);
    return Entry(key, value);
}

Key parseHeadword(...) { ... }
Value parseBody(...) { ... }

This way, your code structure has a 1-to-1 correspondence with the format of your input, which makes it far more readable and less bug-prone. Of course, we can then take this to the next level, which is to encapsulate each of these functions into a range-based component, and so we end up with your nice one-line function. :)

> I could easily remove the comment filter if the deployment version of
> the program ships with tidied-up source files, or I could add a new
> component if necessary. In a loop, however trivial it may appear at
> first glance, it would not be so simple to add or remove parts of the
> logic.

Yeah, when loops get beyond the simplest incarnations, their complexity quickly explodes into something unmanageable, usually resulting in tight coupling between distant parts of the code. That makes it very hard (or outright impossible) to add/remove parts.

> One drawback is that using ranges creates some overhead and code
> duplication. But the neatness of it is amazing, and I daresay that a
> lot of bugs are hidden in loops simply because the loop logic distracts
> and confuses the programmer and bugs finally find loopholes they can
> slip through.

Overly complex loops make a programmer go loopy and bugs crawl out from loopholes...
;-)

As for overheads, I wonder if, given enough experience over time, we can one day identify common patterns that the compiler can take advantage of and optimize into more efficient code. For example, if the compiler recognizes a linear composition of range-based components, it could transform them into optimized nested loops that eliminate some of the overhead involved. I'm not sure how feasible this is with the current DMD implementation, though. But it's certainly something to keep in mind for the future.

> Another drawback is that ranges demand a lot of boilerplate code. If
> properly implemented, there is a lot of code you have to write over and
> over again such as
>
> if (isInputRange!Range && isInputRange!(ElementType!Range)
>     && is(ElementType!(ElementType!Range) == MyType))
>
> I don't know if this could possibly be reduced. Or maybe it's just my
> lack of experience.

You can encapsulate this into a template:

template isRoR(R, ElemType)
{
    enum isRoR = isInputRange!R
        && isInputRange!(ElementType!R)
        && is(ElementType!(ElementType!R) == ElemType);
}

// Or, once 2.064 is out:
enum isRoR(R, ElemType) = isInputRange!R
    && isInputRange!(ElementType!R)
    && is(ElementType!(ElementType!R) == ElemType);

// Now the sig constraint is much more readable and easier to
// type.
auto myFunc(R)(R range)
    if (isRoR!(R, MyType))
{ ... }

Another source of boilerplate is in repeatedly defining .empty, .front, and .popFront. Every single input range, unless it can be defined as a composition of existing Phobos ranges, must have this boilerplate:

struct MyRange(T)
{
    @property bool empty() { ... }
    @property T front() { ... }
    void popFront() { ... }
}

And once you need a forward range, you need to also add:

@property typeof(this) save() { ... }

There might be a way to reduce the amount of boilerplate by using mixins or factory functions, though.

T

--
When solving a problem, take care that you do not become part of the problem.
Aug 28 2013
On Wednesday, 28 August 2013 at 18:30:11 UTC, H. S. Teoh wrote:
> As for overheads, I wonder if, given enough experience over time, we
> can one day identify common patterns that the compiler can take
> advantage of and optimize into more efficient code. For example, if the
> compiler recognizes a linear composition of range-based components, it
> could transform them into optimized nested loops that eliminate some of
> the overhead involved. I'm not sure how feasible this is with the
> current DMD implementation, though. But it's certainly something to
> keep in mind for the future.

Thanks for your comments. I was thinking the same thing, i.e. that maybe future implementations of the compiler can optimize the overhead away. If CP is THE major thing in D (and it makes perfect sense that it is), then the compiler (and syntax) will adapt to it over time, I guess (and hope). I'll try to introduce CP/ranges wherever it makes sense from now on. It was unbelievable how well the code worked once I could free myself from the loop. And I can take the components and use them "as is" in a separate program I use to add entries to the lexicon source files. I'm only wondering how to integrate the new pattern properly into an otherwise OO framework. I seriously wonder whether CP/ranges will make OO obsolete. After all, structs are "classy".
Aug 29 2013
I re-used the components (see above) in a different program (again a parser, with different needs) and I ended up with more or less the same one-liner:

auto dictionary = Lexicon();  // output range
string[string] lex = arr.byEntry().copy(dictionary).lexicon;

The only change necessary was the logic bit in the Entry struct (two lines had to be changed). I could omit the byCommentFilter() bit, because there are no comments in the source, without any complaints or side effects. Great stuff!
Aug 30 2013