digitalmars.D - Streaming parsers in D a novel design
- Dmitry Olshansky (11/11) Apr 18 2023 A streaming parser is doing 2 things:
- Petar Kirov [ZombineDev] (18/29) Apr 18 2023 Sounds like a natural fit! I think it's interesting to explore
- Sebastiaan Koppe (4/5) Apr 18 2023 For an implementation in D, see:
- H. S. Teoh (19/30) Apr 18 2023 [...]
- Steven Schveighoffer (7/10) Apr 18 2023 This is exactly how jsoniopipe works (though the input is not a "range"
- Walter Bright (3/6) Apr 18 2023 The lexer produces a stream of tokens, not the parser. The output of the...
- Sebastiaan Koppe (3/14) Apr 18 2023 I think it will work beautifully until the day you want to do it
- Dmitry Olshansky (3/24) Apr 18 2023 Photon takes care of that:
- Jacob Shtokolov (10/17) Apr 19 2023 Ranges can be async, in this case, you just need to return the
- Sebastiaan Koppe (4/11) Apr 19 2023 Maybe. But not without major hurdles. And you forgot (async)
- Dmitry Olshansky (6/18) Apr 19 2023 Cancelation is trivial, you can break a fiber at any safe point
- Sebastiaan Koppe (10/16) Apr 19 2023 Interrupting a fiber and cancelling work aren't the same thing.
- Dmitry Olshansky (4/20) Apr 19 2023 I think your structured concurrency stuff would fit nicely in
- Sebastiaan Koppe (18/20) Apr 20 2023 I actually considered going with photon before starting to
- Adam D Ruppe (7/8) Apr 19 2023 I recently wrote a new stream class that works kinda like this
- Dmitry Olshansky (4/13) Apr 19 2023 Cool stuff, as usual!
- ikod (23/31) Apr 20 2023 This is how `requests` handle input stream from the remote end
A streaming parser is doing 2 things: - waiting for input - processing the current input and spitting out Events This maps to D beautifully: 1. Waiting for input means it's an OutputRange with explicit put method! 2. Processing events means it's an InputRange of events. It may even be ForwardRange, mine first of this kind is Forward range. Destroy! -- Olshansky Dmitry
Apr 18 2023
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:A streaming parser is doing 2 things: - waiting for input - processing the current input and spitting out Events This maps to D beautifully: 1. Waiting for input means it's an OutputRange with explicit put method! 2. Processing events means it's an InputRange of events. It may even be ForwardRange, mine first of this kind is Forward range. Destroy! -- Olshansky DmitrySounds like a natural fit! I think it's interesting to explore related concepts that other languages have like their versions of iterators/ranges, but also streams, observables, interaction with promises (observables vs async iterables). Another related topic is validation vs parsing (converting data from unstructured input to a more structured form). I don't have time to discuss in detail at the moment, so I'll just share a few links. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ https://serokell.io/blog/parser-combinators-in-haskell http://csl.stanford.edu/~christos/pldi2010.fit/meijer.duality.pdf https://www.youtube.com/watch?v=sTSQlYX5DU0&ab_channel=ReactConference https://dev.solita.fi/2021/10/14/grokking-clojure-transducers.html https://www.youtube.com/watch?v=6mTbuzafcII https://hypirion.com/musings/recursive-transducers https://hypirion.com/musings/haskell-transducers https://www.youtube.com/watch?v=vohGJjGxtJQ&ab_channel=CppCon https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2300r5.html
Apr 18 2023
On Tuesday, 18 April 2023 at 13:01:40 UTC, Petar Kirov [ZombineDev] wrote:https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2300r5.htmlFor an implementation in D, see: https://github.com/symmetryinvestments/concurrency
Apr 18 2023
On Tue, Apr 18, 2023 at 08:13:05AM +0000, Dmitry Olshansky via Digitalmars-d wrote:A streaming parser is doing 2 things: - waiting for input - processing the current input and spitting out Events This maps to D beautifully: 1. Waiting for input means it's an OutputRange with explicit put method! 2. Processing events means it's an InputRange of events. It may even be ForwardRange, mine first of this kind is Forward range.[...] In practice, the input would be an input range consumed by the parser which returns an input range of tokens. IOW the parser is a filter that transforms an input range of characters into a range of tokens. You'd only need an output range interface if you needed to explicitly feed characters to the parser, such as if you were using an async I/O API and you just received a block of fresh input, and now you need to stuff the input into the parser. But even then, I wouldn't do it this way, because on the other end, the input range wouldn't be able to return a result if, say, the next block of input hasn't arrived yet. So .empty would have to block, which is an ugly design for something with an input range API. At the end of the day, I think treating a parser as a transformation (range of char -> range of tokens) is a better abstraction than trying to shoehorn it into something involved output ranges. T -- Nearly all men can stand adversity, but if you want to test a man's character, give him power. -- Abraham Lincoln
Apr 18 2023
On 4/18/23 7:24 AM, H. S. Teoh wrote:At the end of the day, I think treating a parser as a transformation (range of char -> range of tokens) is a better abstraction than trying to shoehorn it into something involved output ranges.This is exactly how jsoniopipe works (though the input is not a "range" but an iopipe of course). The resulting thing is essentially a "range", but not with the range functions. The implementation doesn't quite fit the range paradigm. https://github.com/schveiguy/jsoniopipe -Steve
Apr 18 2023
On 4/18/2023 7:24 AM, H. S. Teoh wrote:In practice, the input would be an input range consumed by the parser which returns an input range of tokens. IOW the parser is a filter that transforms an input range of characters into a range of tokens.The lexer produces a stream of tokens, not the parser. The output of the parser is an AST.
Apr 18 2023
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:A streaming parser is doing 2 things: - waiting for input - processing the current input and spitting out Events This maps to D beautifully: 1. Waiting for input means it's an OutputRange with explicit put method! 2. Processing events means it's an InputRange of events. It may even be ForwardRange, mine first of this kind is Forward range. Destroy! -- Olshansky DmitryI think it will work beautifully until the day you want to do it asynchronously.
Apr 18 2023
On Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe wrote:On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:Photon takes care of that: https://github.com/DmitryOlshansky/photonA streaming parser is doing 2 things: - waiting for input - processing the current input and spitting out Events This maps to D beautifully: 1. Waiting for input means it's an OutputRange with explicit put method! 2. Processing events means it's an InputRange of events. It may even be ForwardRange, mine first of this kind is Forward range. Destroy! -- Olshansky DmitryI think it will work beautifully until the day you want to do it asynchronously.
Apr 18 2023
On Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe wrote:On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:Ranges can be async, in this case, you just need to return the Result type (pending|error|success) and provide hints to something that will be pulling from this range. It will look like an infinite range, but not fully infinite: it returns "success" once the result is ready. The sync version will always return `error|success` without the 'pending' phase. That kind of interface is similar to Rust's futures, providing poll-based primitives with some wakeup hints (callbacks).A streaming parser is doing 2 things: - waiting for input - processing the current input and spitting out EventsI think it will work beautifully until the day you want to do it asynchronously.
Apr 19 2023
On Wednesday, 19 April 2023 at 16:07:18 UTC, Jacob Shtokolov wrote:On Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe wrote:Maybe. But not without major hurdles. And you forgot (async) cancellation, which is a big deal, ask Rust.I think it will work beautifully until the day you want to do it asynchronously.Ranges can be async, in this case, you just need to return the Result type (pending|error|success) and provide hints to something that will be pulling from this range.
Apr 19 2023
On Wednesday, 19 April 2023 at 19:13:25 UTC, Sebastiaan Koppe wrote:On Wednesday, 19 April 2023 at 16:07:18 UTC, Jacob Shtokolov wrote:Cancelation is trivial, you can break a fiber at any safe point (where it yields) -- Dmitry OlshanskyOn Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe wrote:Maybe. But not without major hurdles. And you forgot (async) cancellation, which is a big deal, ask Rust.I think it will work beautifully until the day you want to do it asynchronously.Ranges can be async, in this case, you just need to return the Result type (pending|error|success) and provide hints to something that will be pulling from this range.
Apr 19 2023
On Wednesday, 19 April 2023 at 19:50:35 UTC, Dmitry Olshansky wrote:Cancelation is trivial, you can break a fiber at any safe point (where it yields)Interrupting a fiber and cancelling work aren't the same thing. The latter might involve some cleanup, which itself might be async. Quote from Rust's wg-async https://rust-lang.github.io/wg-async/vision/roadmap/scopes.html#cancellation:In today's Rust, any async function can be synchronously cancelled at any await point: the code simply stops executing, and destructors are run for any extant variables. This leads to a lot of bugs.Also a blog post on uring and rust, https://without.boats/blog/io-uring/ Suffice to say its tricky.
Apr 19 2023
On Wednesday, 19 April 2023 at 20:46:23 UTC, Sebastiaan Koppe wrote:On Wednesday, 19 April 2023 at 19:50:35 UTC, Dmitry Olshansky wrote:I think your structured concurrency stuff would fit nicely in photon, maybe we should get in touch and discuss it.Cancelation is trivial, you can break a fiber at any safe point (where it yields)Interrupting a fiber and cancelling work aren't the same thing. The latter might involve some cleanup, which itself might be async. Quote from Rust's wg-async https://rust-lang.github.io/wg-async/vision/roadmap/scopes.html#cancellation:In today's Rust, any async function can be synchronously cancelled at any await point: the code simply stops executing, and destructors are run for any extant variables. This leads to a lot of bugs.Also a blog post on uring and rust, https://without.boats/blog/io-uring/ Suffice to say its tricky.
Apr 19 2023
On Thursday, 20 April 2023 at 00:17:05 UTC, Dmitry Olshansky wrote:I think your structured concurrency stuff would fit nicely in photon, maybe we should get in touch and discuss it.I actually considered going with photon before starting to implement C++'s Senders/Receivers. Its an awesome idea, but what held me back was the portability concerns around fibers. WebAssembly doesn't support them for instance. We also needed support for async algorithms. Racing TCP connections is a classic example, where you initiate 2 connections in parallel and continue with whoever responds first. Key is doing that in a structured way, so there is no possibility of leaking something. There is probably a way we could integrate the 2 libs, I already have a MR for supporting D fibers, so we can probably extend it to whatever executor photon has. I bet there is a way to autowrap any regular blocking libraries that way, and get both structured concurrency and async syscalls. Interesting.
Apr 20 2023
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:This maps to D beautifully:I recently wrote a new stream class that works kinda like this and thought about doing put() and front etc, but I wanted heterogeneous types. Still, it kinda did work. https://github.com/adamdruppe/arsd/blob/master/core.d#L5040 there's the test rn to show how it works
Apr 19 2023
On Wednesday, 19 April 2023 at 16:15:53 UTC, Adam D Ruppe wrote:On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:Cool stuff, as usual! -- Dmitry OlshanskyThis maps to D beautifully:I recently wrote a new stream class that works kinda like this and thought about doing put() and front etc, but I wanted heterogeneous types. Still, it kinda did work. https://github.com/adamdruppe/arsd/blob/master/core.d#L5040 there's the test rn to show how it works
Apr 19 2023
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:A streaming parser is doing 2 things: - waiting for input - processing the current input and spitting out Events This maps to D beautifully: 1. Waiting for input means it's an OutputRange with explicit put method! 2. Processing events means it's an InputRange of events. It may even be ForwardRange, mine first of this kind is Forward range.This is how `requests` handle input stream from the remote end and convert it to "input stream" of http response body chunks. Input is a stream of bytes, arbitrary splitted by remote end or by network, etc. Moreover,response content can be gzipped, chunk-encoded and whatever else. So you have to process each network input with several "processors", like "inflate", "chunk-encoded decoder" and so on. +------------+ +--------------+ +---------+ +-----------------+ |input range | | chunk-encoded| | gzip | | input range | Net->-| of byte[] |->-| decoder |->-| decoder |->-| of response body|->- App | | | | | | | chunks: byte[] | +------------+ +--------------+ +---------+ +-----------------+ There is some problems like push all unprocessed data to the right end of the picture at the end of the network stream and maybe some other... But essentially it works. Immutability of underlying data on every step of this processing stream helps very much.
Apr 20 2023