digitalmars.D.learn - string-ish range/stream from curl ubyte[] chunks?
- Vlad (27/27) May 16 2014 Hello D programmers,
- Steven Schveighoffer (20/44) May 16 2014 There is an effort by myself and Dmitry Olshansky to create a stream API...
- Vlad (14/45) May 16 2014 Thanks Steve for your prompt reply. This is exactly why I asked
- Steven Schveighoffer (10/56) May 16 2014 Most likely. I would expect a curl-based stream to fit right in, it's ju...
Hello D programmers,

I am toying with writing my own HTML parser as a pet project, and I strive to have a range API for the tokenizer and the parser output itself.

However it occurs to me that in real-life browsers the advantage of this type of 'streaming' parsing would be given by also having the string that plays as input to the tokenizer treated as a 'stream'/'range'.

While D's *string classes do play as ranges, what I want to write is a 'ChunkDecoder' range that would take curl 'byChunk' output and make it consumable by the tokenizer.

Now, the problem: string itself has ElementType!string == dchar. Consuming a string a dchar at a time looks like a wasteful operation if e.g. your string is UTF-8 or UTF-16. So, naturally, I would like to use indexOf() - instead of countUntil() - and opSlice (without opDollar?) on my ChunkDecoder (forward) range.

Q: Is anything like this already in use somewhere in the standard library or a project you know?

Q2: Or do you have any pointers for what the smallest API would be for a string-like range class?

And bonus: Q3: any uses of such a string-ish range in other standard library methods that you can think of and could be contributed to? e.g. suppose this doesn't exist and I / we come up with a proposal of minimal API to consume a string from left to right.

Thanks for your time and your suggestions!
May 16 2014
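To make the ElementType!string point concrete, here is a minimal sketch contrasting the decoded (dchar) view of a string with its code-unit view. It is an illustration only, not something proposed in the thread. It assumes a recent Phobos: std.range.primitives and std.utf.byCodeUnit are, as far as I know, newer than this discussion.

```d
import std.range.primitives : ElementType;
import std.string : indexOf;
import std.utf : byCodeUnit;

void main()
{
    string s = "<p>h\u00E9llo</p>";

    // Range primitives auto-decode narrow strings, so the element type is dchar.
    static assert(is(ElementType!string == dchar));

    // Indexing and slicing work on UTF-8 code units, with no decoding.
    static assert(is(typeof(s[0]) == immutable(char)));

    // byCodeUnit presents the same data as a range of code units.
    static assert(is(ElementType!(typeof(s.byCodeUnit)) == immutable(char)));

    // indexOf returns a code-unit offset, which can be used directly for slicing.
    auto i = s.indexOf('>');
    assert(i == 2);
    assert(s[i + 1 .. $] == "h\u00E9llo</p>");
}
```

The offset returned by indexOf is what makes it attractive for a tokenizer: the slice after the match is taken without walking the string one dchar at a time.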
On Fri, 16 May 2014 16:57:41 -0400, Vlad <b100dian gmail.com> wrote:

> Q: Is anything like this already in use somewhere in the standard library or a project you know?

There is an effort by myself and Dmitry Olshansky to create a stream API that looks like a range. I am way behind on getting it to work, but I have something that compiles. The effort is to replace the underlying mechanism for std.stdio (optionally), and to replace std.stream.

> Q2: Or do you have any pointers for what the smallest API would be for a string-like range class?

I think Dmitry has a pretty good API. I will hopefully be posting my prototype soon. I hate to say wait for it, because I have been very lousy at getting things finished lately. But I want to have something to show before the conference.

The code I have will support all encodings, and provide a range API that works with dchar-like ranges. The idea is to be able to make code that works with both arrays and streams seamlessly.

> And bonus: Q3: any uses of such a string-ish range in other standard library methods that you can think of and could be contributed to? e.g. suppose this doesn't exist and I / we come up with a proposal of minimal API to consume a string from left to right.

I hate for you to duplicate efforts, hold off until we get something workable. Then we can discuss the API.

Dmitry's message is here: http://forum.dlang.org/post/l9q66g$2he3$1 digitalmars.com

My updates have not been posted yet to github, I don't want to post half-baked code yet. Stay tuned.

-Steve
May 16 2014
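As a rough illustration of "code that works with both arrays and streams seamlessly", the sketch below constrains a function on any input range of dchar-like elements. It is not the API discussed in the post above, which had not been published at this point; countTags is a hypothetical name, and joiner over a list of strings merely stands in for a stream that delivers text in pieces.

```d
import std.algorithm.iteration : joiner;
import std.range.primitives : ElementType, isInputRange;

// Hypothetical consumer: counts '<' characters in any range of dchar-like elements.
size_t countTags(R)(R input)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    size_t n;
    foreach (dchar c; input)
    {
        if (c == '<')
            ++n;
    }
    return n;
}

void main()
{
    // An in-memory string (an array)...
    assert(countTags("<p>hi</p>") == 2);

    // ...and a lazily joined sequence of pieces go through the same code.
    assert(countTags(["<p>h", "i</p>"].joiner) == 2);
}
```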
On Friday, 16 May 2014 at 21:35:04 UTC, Steven Schveighoffer wrote:

> I hate for you to duplicate efforts, hold off until we get something workable. Then we can discuss the API.

Thanks Steve for your prompt reply. This is exactly why I asked on the forums, since it was hard for me to believe I was the only one thinking of this.

I would also hate to duplicate the effort, so I'll just code my parser against string and wait to see how your proposal and Dmitry's turn out (I did check his post, and it sounds EXACTLY like the problem I was facing with my toy parser!).

Just to make one thing clear: would this future module work with e.g. the ubyte[] chunks I receive from curl?

Thanks!

p.s. Is this the talk? http://dconf.org/2014/talks/olshansky.html
May 16 2014
On Fri, 16 May 2014 18:36:02 -0400, Vlad <b100dian gmail.com> wrote:

> Just to make one thing clear: would this future module work with e.g. the ubyte[] chunks I receive from curl?

Most likely. I would expect a curl-based stream to fit right in, it's just passing in bytes.

One piece that I haven't quite fleshed out is how to drive the process. In some cases, you are pulling data from the source (traditional stream-based I/O), in other cases, something else is pushing the data (CURL). We need to handle both seamlessly. I admit I have never looked at D's curl package, just used it via C/C++.

> p.s. Is this the talk? http://dconf.org/2014/talks/olshansky.html

That is Dmitry's talk, from the same guy. But I think this is not about his I/O ideas, but his excellent std.regex package.

-Steve
May 16 2014
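For the pull side of the question, a 'ChunkDecoder' along the lines Vlad describes could be sketched as below. This is a hedged illustration, not the stream API mentioned in the thread: ChunkDecoder, chunkDecoder and incompleteTail are hypothetical names, the input is assumed to be UTF-8, and the source is assumed to be any input range of ubyte[] chunks (an in-memory array of chunks stands in for curl here). The only subtle part is holding back a multi-byte sequence that is split across two chunks.

```d
import std.exception : enforce;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.utf : validate;

/// Number of bytes at the end of `data` that start a UTF-8 sequence
/// but do not complete it (0 if the data ends on a sequence boundary).
size_t incompleteTail(const(ubyte)[] data)
{
    size_t i = data.length;
    size_t continuation = 0;
    // Step back over up to three continuation bytes (bit pattern 10xxxxxx).
    while (i > 0 && continuation < 3 && (data[i - 1] & 0xC0) == 0x80)
    {
        --i;
        ++continuation;
    }
    if (i == 0)
        return 0; // no lead byte found; validate() will report bad input

    immutable lead = data[i - 1];
    size_t needed;
    if (lead < 0x80) needed = 1;
    else if ((lead & 0xE0) == 0xC0) needed = 2;
    else if ((lead & 0xF0) == 0xE0) needed = 3;
    else if ((lead & 0xF8) == 0xF0) needed = 4;
    else return 0; // invalid lead byte; validate() will report it

    immutable have = continuation + 1;
    return have < needed ? have : 0;
}

/// Input range of strings, each holding only complete UTF-8 code points.
struct ChunkDecoder(R)
    if (isInputRange!R && is(ElementType!R : const(ubyte)[]))
{
    private R source;
    private ubyte[] carry; // incomplete sequence held over from the last chunk
    private string current;
    private bool done;

    this(R source)
    {
        this.source = source;
        popFront();
    }

    @property bool empty() const { return done; }
    @property string front() const { return current; }

    void popFront()
    {
        while (!source.empty)
        {
            // Copy into a fresh buffer: chunk sources may reuse theirs.
            ubyte[] buf;
            buf ~= carry;
            buf ~= source.front;
            source.popFront();

            immutable tail = incompleteTail(buf);
            carry = buf[$ - tail .. $].dup;
            // buf is freshly allocated and never mutated again, so this cast is safe.
            auto text = cast(string) buf[0 .. $ - tail];
            validate(text); // throws UTFException on malformed input
            if (text.length)
            {
                current = text;
                return;
            }
        }
        enforce(carry.length == 0, "input ended inside a UTF-8 sequence");
        current = null;
        done = true;
    }
}

auto chunkDecoder(R)(R source) { return ChunkDecoder!R(source); }

void main()
{
    // U+00E9 is C3 A9 in UTF-8; the two bytes are split across the chunks.
    ubyte[][] chunks = [
        [0x3C, 0x70, 0x3E, 0x68, 0xC3], // "<p>h" plus the first byte
        [0xA9, 0x6C, 0x6C, 0x6F],       // the second byte plus "llo"
    ];
    string[] pieces;
    foreach (piece; chunkDecoder(chunks))
        pieces ~= piece;
    assert(pieces == ["<p>h", "\u00E9llo"]);
}
```

Because each piece is an ordinary string, a tokenizer can use indexOf and slicing on it. A push-driven source, as Steve notes for curl, would need the same carry-over logic behind a callback instead of popFront.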