digitalmars.D.learn - Can't recreate a range?
- Casey (75/75) Apr 29 2020 So, I'm trying to run some tests and I had code that looks
- Paul Backus (5/15) Apr 29 2020 The code you posted looks correct to me. If you can post a
- Casey (99/99) Apr 30 2020 Here's a minimal code example that duplicates the issue:
- Paul Backus (5/15) Apr 30 2020 Using a default value like this means that it will be shared
- Simen =?UTF-8?B?S2rDpnLDpXM=?= (7/25) Apr 30 2020 Yup, that's the one. No need to assign it at all, in fact - the
- Casey (9/13) Apr 30 2020 I'll give it a try when I get back to it (fixing lint issues),
- Casey (7/15) Apr 30 2020 O.K. It's working now, but I suspect there may still be
- Steven Schveighoffer (24/37) Apr 30 2020 I would say part of the issue is that you are doing all your work in
- Casey (14/33) Apr 30 2020 Interesting. I'll take this into account. I was putting the
- Steven Schveighoffer (29/36) Apr 30 2020 You don't necessarily have to do it in a separate function, because
- Paul Backus (7/15) Apr 30 2020 [...]
- H. S. Teoh (22/26) Apr 30 2020 Really?? IME, usually you *need* to do work in popFront instead of
- Casey (12/37) Apr 30 2020 Well, I just remembered that I had a tab open to this:
- Steven Schveighoffer (8/35) Apr 30 2020 There could be expensive operations done to construct the element that
- Paul Backus (12/30) Apr 30 2020 There are certainly cases where it can't be avoided. Filter is
- Steven Schveighoffer (32/43) Apr 30 2020 generate used to do everything on front. But that meant it wasn't a true...
- Steven Schveighoffer (13/29) Apr 30 2020 In the general scheme of things, this only makes sense for the first
- Simen =?UTF-8?B?S2rDpnLDpXM=?= (10/13) Apr 29 2020 I mean, it might be you messed up in posting this, but having an
- Casey (9/13) Apr 30 2020 I was just trying to get the example to a very minimal state. I
So, I'm trying to run some tests and I had code that looks similar to this: unittest { auto range = readStream(File("test_data/multiple.xml").byLine); int count = 0; while (!range.empty) { count++; range.popFront(); } assert(count == 3); } However, if I have a second block above/below it doing the same thing, the last assert will fail because in the second block it appears the ranges have been joined together. E.g.: unittest { auto range = readStream(File("test_data/multiple.xml").byLine); int count = 0; while (!range.empty) { count++; range.popFront(); } assert(count == 3); // Passes } unittest { auto range = readStream(File("test_data/another.xml").byLine); int count = 0; while (!range.empty) { count++; range.popFront(); } assert(count == 2); // Fails } To me, it looks like range is not being overwritten. The code for readStream looks similar to this: auto readStream(Range)(auto ref Range r) if (isInputRange!(Unqual!Range)) { struct StreamRange(Range) { alias R = Unqual!Range; R _input; this(R input) { this._input = input; } bool empty() { return this._input.empty; } string front() { // Do stuff... } void popFront() { } } return StreamRange!(Range)(r); } I feel like I'm missing something obscure and it's driving me a bit batty. Any clue as to why this is happening? I'd like to not have to worry about creating new variable names between tests. To me, it seems like each unittest block is independent of each other and I haven't come across anything that contradicts that. However, I didn't find anything that confirms it either. Thanks.
Apr 29 2020
On Wednesday, 29 April 2020 at 20:43:20 UTC, Casey wrote:So, I'm trying to run some tests and I had code that looks similar to this:[...]I feel like I'm missing something obscure and it's driving me a bit batty. Any clue as to why this is happening? I'd like to not have to worry about creating new variable names between tests. To me, it seems like each unittest block is independent of each other and I haven't come across anything that contradicts that. However, I didn't find anything that confirms it either. Thanks.The code you posted looks correct to me. If you can post a complete example that reproduces the problem, it will be much easier for others to help you debug.
Apr 29 2020
Here's a minimal code example that duplicates the issue: import std.array, std.range, std.stdio, std.traits, std.string; auto readStream(Range)(auto ref Range r) if (isInputRange!(Unqual!Range)) { struct StreamRange(Range) { alias R = Unqual!Range; R _input; auto buff = appender!string; this(R input) { this._input = input; } bool empty() { return this._input.empty; } string front() { if (buff.capacity == 0) { bool iterate = true; bool doCapture = false; buff.reserve(1000); while (iterate) { if (this._input.empty) break; auto value = this._input.front; if (value.strip == "<main>") { doCapture = true; buff.put(value.strip); buff.put("\n"); } else if (value.strip == "</main>") { buff.put(value.strip); doCapture = false; iterate = false; } else if (doCapture) { buff.put(value.strip); buff.put("\n"); } this._input.popFront(); } } return buff[]; } void popFront() { buff = appender!string; } } return StreamRange!(Range)(r); } unittest { auto range = readStream(File("test1.xml").byLine); int count = 0; while (!range.empty) { writeln(range.front); count++; writeln("Current count: ", count); range.popFront(); } assert(count == 1); } unittest { auto range = readStream(File("test2.xml").byLine); int count = 0; while (!range.empty) { writeln(range.front); count++; writeln("Current count: ", count); range.popFront(); } assert(count == 1); } Here are the two XML files (very simple): <?xml version="1.0"> <main> <text>Here is some text for the first file.</text> </main> <?xml version="1.0"> <main> <text>Here is some text for the second file.</text> </main> I've even tried putting wrapping the code that is being tested in a function and looping over it with different parameters (filename and expected count) without any luck either. I've also tried declaring the file handle separate and ensuring it was closed. No luck.
Apr 30 2020
On Thursday, 30 April 2020 at 13:04:47 UTC, Casey wrote:Here's a minimal code example that duplicates the issue: import std.array, std.range, std.stdio, std.traits, std.string; auto readStream(Range)(auto ref Range r) if (isInputRange!(Unqual!Range)) { struct StreamRange(Range) { alias R = Unqual!Range; R _input; auto buff = appender!string;Using a default value like this means that it will be shared among all instances of the struct. Instead, you should set `buff = appender!string` in the constructor, so that each struct will have its own appender.
Apr 30 2020
On Thursday, 30 April 2020 at 13:23:25 UTC, Paul Backus wrote:On Thursday, 30 April 2020 at 13:04:47 UTC, Casey wrote:Yup, that's the one. No need to assign it at all, in fact - the line can be replaced with Appender!string buff; And things just work. -- SimenHere's a minimal code example that duplicates the issue: import std.array, std.range, std.stdio, std.traits, std.string; auto readStream(Range)(auto ref Range r) if (isInputRange!(Unqual!Range)) { struct StreamRange(Range) { alias R = Unqual!Range; R _input; auto buff = appender!string;Using a default value like this means that it will be shared among all instances of the struct. Instead, you should set `buff = appender!string` in the constructor, so that each struct will have its own appender.
Apr 30 2020
On Thursday, 30 April 2020 at 13:23:25 UTC, Paul Backus wrote:Using a default value like this means that it will be shared among all instances of the struct. Instead, you should set `buff = appender!string` in the constructor, so that each struct will have its own appender.I'll give it a try when I get back to it (fixing lint issues), but are you sure that's the issue? In popFront, I recreate the appender. So, the appender should be clear before the empty check after it processes the last of the data from _input. Still a good thing to do as a best practice; I'm just wondering if something else is causing an issue and even if initializing the appender in the constructor solves the problem, I'd be suspect of future issues cropping up.
Apr 30 2020
On Thursday, 30 April 2020 at 15:42:03 UTC, Casey wrote:I'll give it a try when I get back to it (fixing lint issues), but are you sure that's the issue? In popFront, I recreate the appender. So, the appender should be clear before the empty check after it processes the last of the data from _input. Still a good thing to do as a best practice; I'm just wondering if something else is causing an issue and even if initializing the appender in the constructor solves the problem, I'd be suspect of future issues cropping up.O.K. It's working now, but I suspect there may still be something else going on. The more I think of it, the more I suspect there's a newline at the end of the file that isn't accounted for until after the last tag is read from the file, so the input is not empty, so my code's empty check fails. Regardless, thanks for the help!
Apr 30 2020
On 4/30/20 11:42 AM, Casey wrote:On Thursday, 30 April 2020 at 13:23:25 UTC, Paul Backus wrote:I would say part of the issue is that you are doing all your work in front and not popFront. What happens is that Appender is a pointer-to-implementation struct, and the compiler will allocate the first one shared amongst all initial StreamRange instances. On your first call to front, it's going to utilize that shared one. Then on the call to popFront, it will reset it to another one. For the second unittest, in your first call to front, it notices that it's already been filled, so it doesn't do any work (and returns the existing buffer). another problem, your empty condition is based on the input, which violates range expectations. indeed, on the first call to front, the range all of a sudden becomes empty. I'd say: 1. move your work to the popFront function (you then need to call popFront once before returning the range in your factory method). 2. change empty to check if the buffer is empty instead of the input. Some other observations: 3. You don't need a further Range parameter for the nested struct, it can use the template parameter from the containing function. 4. Make it a static struct, or else it will retain a pointer to the function stack frame needlessly. -SteveUsing a default value like this means that it will be shared among all instances of the struct. Instead, you should set `buff = appender!string` in the constructor, so that each struct will have its own appender.I'll give it a try when I get back to it (fixing lint issues), but are you sure that's the issue? In popFront, I recreate the appender. So, the appender should be clear before the empty check after it processes the last of the data from _input. Still a good thing to do as a best practice; I'm just wondering if something else is causing an issue and even if initializing the appender in the constructor solves the problem, I'd be suspect of future issues cropping up.
Apr 30 2020
On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer wrote:I would say part of the issue is that you are doing all your work in front and not popFront. What happens is that Appender is a pointer-to-implementation struct, and the compiler will allocate the first one shared amongst all initial StreamRange instances. On your first call to front, it's going to utilize that shared one. Then on the call to popFront, it will reset it to another one. For the second unittest, in your first call to front, it notices that it's already been filled, so it doesn't do any work (and returns the existing buffer).Interesting. I'll take this into account. I was putting the work into front because I didn't want to do the work until it was requested. Putting the work in popFront makes more sense in some ways, but the fact you have to call it before getting any records seems like it would break normal range algorithms. (Please correct me if I'm wrong) I'm wondering if putting the work into it's own method and calling it one from the constructor and from popFront the rest of the way.another problem, your empty condition is based on the input, which violates range expectations. indeed, on the first call to front, the range all of a sudden becomes empty.I was about to argue the point, but after thinking about the use of popFront to execute the work, that would work.3. You don't need a further Range parameter for the nested struct, it can use the template parameter from the containing function. 4. Make it a static struct, or else it will retain a pointer to the function stack frame needlessly.I'll definitely try that out. Regarding 3, I think that was in the example code I used, so I just went with it.
Apr 30 2020
On 4/30/20 2:03 PM, Casey wrote:Interesting. I'll take this into account. I was putting the work into front because I didn't want to do the work until it was requested. Putting the work in popFront makes more sense in some ways, but the fact you have to call it before getting any records seems like it would break normal range algorithms. (Please correct me if I'm wrong) I'm wondering if putting the work into it's own method and calling it one from the constructor and from popFront the rest of the way.You don't necessarily have to do it in a separate function, because until you return it from your factory method (the only way you can create such a thing), it technically doesn't need to be "sane". So what I mean is at the bottom of your function you have: auto readStream(Range)(auto ref Range r) if (isInputRange!(Unqual!Range)) { ... // define your struct auto result = StreamRange(r); if(!r.empty) result.popFront(); // prime range return result; } There is a valid argument to be made about not doing any work until requested. You could do this inside front, but in that case, I would say you have an additional boolean that checks if it has ever been run, and run popFront once (setting that bool to false). But I don't necessarily like that, because that puts an extra burden on front. front is supposed to be non-mutating, called as much as you need (ideally it should be inout). Practically, you can make it do work, but really the bulk of the work should be done in popFront. Note that you will have to check for priming in popFront as well, because if you did for instance: auto r = readStream(...); r.popFront(); // skip first element? This would not do what is expected. In fact, thinking about it some more, you should be able to call popFront as many times as you want and have it skip that many elements without calling front once. Your range will not do that. -Steve
Apr 30 2020
On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer wrote:I would say part of the issue is that you are doing all your work in front and not popFront.[...]I'd say: 1. move your work to the popFront function (you then need to call popFront once before returning the range in your factory method). 2. change empty to check if the buffer is empty instead of the input.Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.
Apr 30 2020
On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.Really?? IME, usually you *need* to do work in popFront instead of front, because otherwise .empty wouldn't be able to tell whether there is another element or not. E.g., in filtering a range based on some criterion on each element, you can't defer computing the next element until .front because you can't predict whether there will be another element that won't be dropped by popFront. Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data. Even when the range involves some expensive per-element computation, I find that it's simpler to just compute and cache in .popFront instead of adding extra baggage to .front to know when the computation has already been performed. I'm hard-pressed to come up with an example where deferring computation to .front is a good idea! T -- A mathematician is a device for turning coffee into theorems. -- P. Erdos
Apr 30 2020
On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]Well, I just remembered that I had a tab open to this: http://ddili.org/ders/d.en/ranges.html Reading, you see the following: * empty: specifies whether the range is empty; it must return true when the range is considered to be empty, and false otherwise * front: provides access to the element at the beginning of the range * popFront(): shortens the range from the beginning by removing the first element Looking at that, if popFront shortens the range, then to me it sounds like the work should be done in popFront.Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.Really?? IME, usually you *need* to do work in popFront instead of front, because otherwise .empty wouldn't be able to tell whether there is another element or not. E.g., in filtering a range based on some criterion on each element, you can't defer computing the next element until .front because you can't predict whether there will be another element that won't be dropped by popFront. Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data. Even when the range involves some expensive per-element computation, I find that it's simpler to just compute and cache in .popFront instead of adding extra baggage to .front to know when the computation has already been performed. I'm hard-pressed to come up with an example where deferring computation to .front is a good idea! T
Apr 30 2020
On 4/30/20 2:30 PM, H. S. Teoh wrote:On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]There could be expensive operations done to construct the element that aren't necessary to iterate the range, but generally you achieve this by using something like map and cache. But really the only valid use case for this is if you create a range but never use it. Some may need this, but I think a wrapper would be a better solution than enforcing a certain usage pattern. -SteveDoing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.Really?? IME, usually you *need* to do work in popFront instead of front, because otherwise .empty wouldn't be able to tell whether there is another element or not. E.g., in filtering a range based on some criterion on each element, you can't defer computing the next element until .front because you can't predict whether there will be another element that won't be dropped by popFront. Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data. Even when the range involves some expensive per-element computation, I find that it's simpler to just compute and cache in .popFront instead of adding extra baggage to .front to know when the computation has already been performed. I'm hard-pressed to come up with an example where deferring computation to .front is a good idea!
Apr 30 2020
On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]There are certainly cases where it can't be avoided. Filter is one; file I/O (as Steven Schveighoffer pointed out) is another one. Obviously, you gotta do what you gotta do. Still, I think that as long as work *can* be deferred to .front, it should be. That's the essence of lazy evaluation: only do your computation once you're absolutely sure it's necessary.Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.Really?? IME, usually you *need* to do work in popFront instead of front, because otherwise .empty wouldn't be able to tell whether there is another element or not. E.g., in filtering a range based on some criterion on each element, you can't defer computing the next element until .front because you can't predict whether there will be another element that won't be dropped by popFront.Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data.std.range.generate is actually a perfect example of the problem with doing work in popFront. Because it has to call popFront both on construction *and* after every element, consuming n elements will call the generator function n+1 times. The final call is completely wasted.
Apr 30 2020
On 4/30/20 6:39 PM, Paul Backus wrote:On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:generate used to do everything on front. But that meant it wasn't a true range as multiple calls to front generated different data (popFront was a noop). I fixed it a while ago. It should be the status quo to do all work in popFront, and then you should wrap if you need different behavior. I'm ok with something like this (I'm kind of surprised something like this doesn't exist already): struct LazyRange(R) { R src; bool frontEvaluated; void sync() { if(!frontEvaluated) { src.popFront; frontEvaluated = true; } } auto front() { sync(); return src.front; } bool empty() { sync(); return src.empty; } void popFront() { sync(); frontEvaluated = false; } } -SteveAlso, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data.std.range.generate is actually a perfect example of the problem with doing work in popFront. Because it has to call popFront both on construction *and* after every element, consuming n elements will call the generator function n+1 times. The final call is completely wasted.
Apr 30 2020
On 4/30/20 2:05 PM, Paul Backus wrote:On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer wrote:In the general scheme of things, this only makes sense for the first front access or empty access. From then on, popFront should do the work. In most cases, you can't tell a range is empty unless you've done the work to determine the next element. After you call popFront, you need to call empty before calling front, or else you may be using an invalid range. So all the work is going to be done anyway. A good example, byLine. How do you know that there is no more data until you attempt to read the next line from the stream? At which point, you have to do all the work. -Steve P.S. I see H.S. Teoh said much of the same thing, but I had already typed this out before reading it ;)I would say part of the issue is that you are doing all your work in front and not popFront.[...]I'd say: 1. move your work to the popFront function (you then need to call popFront once before returning the range in your factory method). 2. change empty to check if the buffer is empty instead of the input.Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.
Apr 30 2020
On Wednesday, 29 April 2020 at 20:43:20 UTC, Casey wrote:void popFront() { }I mean, it might be you messed up in posting this, but having an empty popFront and expecting it to do something is a tad optimistic. Apart from that, it seems like the code should do what you want it to. What's the value of count when the code asserts? I'm afeared we'll need some code that actually compiles and shows off the issue to give any more answers. -- Simen
Apr 29 2020
On Wednesday, 29 April 2020 at 22:32:00 UTC, Simen Kjærås wrote:I mean, it might be you messed up in posting this, but having an empty popFront and expecting it to do something is a tad optimistic.I was just trying to get the example to a very minimal state. I just added more descriptive code that replicates the bug.What's the value of count when the code asserts?The count ends up being the count of the previous test + the count of the current test. In the code I uploaded, the first test will get 1, but the second test will get 2. I'm printing out the output of the range to force it to actually perform the loop, so you can see that the contents of the first file are printed twice.
Apr 30 2020