digitalmars.D.announce - IAP Tools for D

reply Jakob Jenkov <jakob jenkov.com> writes:
Hi D Community,

I am currently working on a cloud project where we intend to 
reinvent a lot of the old, less-than-optimal technologies. Among 
the technologies we are working on is a new general purpose 
network protocol called IAP.

IAP comes with a general purpose binary data format called ION 
(IAP Object Notation). ION is similar to MessagePack and CBOR, 
but with a few additions. ION has a table mode which can be used 
to model tables (like CSV files) efficiently, and which can also 
be used in larger object graphs. Our early serialized length + 
performance benchmarks look promising (tables can be down to 1/5 
of JSON, and up to 2 x the speed of parsing CBOR).

ION can be used both inside IAP and separately, with HTTP and in 
data and log files.

We already have a working toolkit in Java (we have Java 
backgrounds), but since we really find D interesting, we would 
like to make a D toolkit too.

Since we are rather new to D, would anyone be interested in 
helping us out a bit with making such a library? We can probably do 
the coding ourselves, but might need some tips on how to package 
it nicely as a D library which can be used with Dub etc.
Dec 16 2015
next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 16/12/15 10:47 PM, Jakob Jenkov wrote:
 Hi D Community,

 I am currently working on a cloud project where we intend to reinvent a
 lot of the old, less-than-optimal technologies. Among the technologies
 we are working on is a new general purpose network protocol called IAP.

 IAP comes with a general purpose binary data format called ION (IAP
 Object Notation). ION is similar to MessagePack and CBOR, but with a few
 additions. ION has a table mode which can be used to model tables (like
 CSV files) efficiently, and which can also be used in larger object
 graphs. Our early serialized length + performance benchmarks look
 promising (tables can be down to 1/5 of JSON, and up to 2 x the speed of
 parsing CBOR).

 ION can be used both inside IAP, but also separately with HTTP and in
 data and log files.

 We already have a working toolkit in Java (we have Java backgrounds),
 but since we really find D interesting, we would like to make a D
 toolkit too.

 Since we are rather new to D, would anyone be interested in helping us a
 bit out making such a library? We can probably do the coding ourselves,
 but might need some tips about how to pack it nicely into a D library
 which can be used with Dub etc.
If you hop onto IRC (#d on Freenode), there may be somebody around from time to time who can give you a hand. Or, at worst, help solve some of your problems.
Dec 16 2015
parent reply Jakob Jenkov <jakob jenkov.com> writes:
 If you hop onto IRC #d Freenode, there maybe somebody from time 
 to time that can give you a hand. Or at worst help solve some 
 of your problems.
Thanks! Oh, I forgot to mention that the IAP Tools for D library will be open source, under the Apache 2 License.
Dec 16 2015
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 16 December 2015 at 10:08:14 UTC, Jakob Jenkov 
wrote:
 If you hop onto IRC #d Freenode, there maybe somebody from 
 time to time that can give you a hand. Or at worst help solve 
 some of your problems.
 Thanks! Oh, I forgot to tell that the IAP Tools for D library will be open source, Apache 2 License.
Sounds like an interesting thing. I will lend a hand.
Dec 16 2015
parent reply Jakob Jenkov <jakob jenkov.com> writes:
 Sounds like an interesting thing. I will lend a hand.
Great! We probably won't get started until January, as we have some documentation work to do on the Java library still, and some more systematic benchmarks to run etc. We will announce it here again when we get there. A GitHub repo would suffice, right?
Dec 16 2015
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 16 December 2015 at 11:06:21 UTC, Jakob Jenkov 
wrote:
 Sounds like an interesting thing. I will lend a hand.
 Great! We probably won't get started until January, as we have some documentation work to do on the Java library still, and some more systematic benchmarks to run etc. We will announce it here again when we get there. A GitHub repo would suffice, right?
Yeah, I think so.
Dec 16 2015
prev sibling next sibling parent reply belkin <belkin yahoo.in.com> writes:
On Wednesday, 16 December 2015 at 09:47:35 UTC, Jakob Jenkov 
wrote:
 Hi D Community,
 ION is similar to MessagePack and CBOR,
 but with a few additions. ION has a table mode which can be 
 used to model tables (like CSV files) efficiently, and which 
 can also be used in larger object graphs. Our early serialized 
 length + performance benchmarks look promising (tables can be 
 down to 1/5 of JSON, and up to 2 x the speed of parsing CBOR).
How does the performance of ION compare with Protocol Buffers (https://developers.google.com/protocol-buffers/?hl=en) and Apache Thrift (https://thrift.apache.org/)?
Dec 19 2015
next sibling parent reply Jakob Jenkov <jakob jenkov.com> writes:
 How does the performance of ION compare with Protocol Buffers 
 (https://developers.google.com/protocol-buffers/?hl=en) and 
 Apache Thrift ( https://thrift.apache.org/)?
That depends on what API you use, and how much metadata (e.g. class names and property names) you write in the serialized ION data. ION is quite flexible about how much metadata you want to include.

If you remove property names and rely only on the sequence of fields, ION can write faster than Google Protocol Buffers. When reading, if you rely only on the sequence of fields, ION is a bit slower than Google Protocol Buffers. All in all I believe performance will be on par with Google Protocol Buffers. We have some benchmarks here:

http://tutorials.jenkov.com/iap/ion-performance-benchmarks.html

We still have a few minor optimizations to do and more benchmarks to run, but perhaps also some validations to add, so the benchmarks on this page (for Java) are probably not too far off from the final numbers.

Regarding Apache Avro and Thrift, I looked at them today. It seems that Avro's encoding is similar to ION (and MessagePack and CBOR), although without e.g. tables. According to Thrift's own docs their binary encoding is not compact. For compact encoding it seems they refer to Protobuf.

ION has several advantages over Protobuf as a general purpose data format. ION is self describing, so you can iterate it without a schema. This means that you can do pretty fast arbitrary hierarchical navigation of an ION "file/message".

Protobuf's own docs say that Protobuf is not good for large amounts of raw bytes (e.g. files). ION is capable of modeling raw binary data (e.g. files), JSON, XML and CSV efficiently. You could even convert ION to a restricted XML format, edit it in a text editor, and convert it back to ION (we have not implemented this yet, but we have planned it). We also believe that ION can support cyclic object graphs, but this is also not fully implemented and tested yet.

ION has a very compact encoding of arrays of objects in "Tables", which are similar to CSV files with only 1 header row and N value rows. It is very common to transport arrays of objects over the network, e.g. N search results from a service. Thus ION tables are a major advantage. Tables can also be used inside object graphs where an object has 0..N children (in an array).

We have a comparison of ION to other data formats here:

http://tutorials.jenkov.com/iap/ion-vs-other-formats.html
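
To illustrate why writing the field names only once pays off for arrays of objects, here is a rough sketch in Python. It is not the ION encoding: it uses JSON text as a stand-in for both layouts, and the names `to_table` and `from_table` are made up for this example. It only shows the "1 header row, N value rows" idea and its effect on serialized length.

```python
import json


def to_table(records):
    """Turn a list of dicts sharing the same keys into one header row plus value rows."""
    if not records:
        return {"header": [], "rows": []}
    header = list(records[0].keys())
    rows = [[rec[name] for name in header] for rec in records]
    return {"header": header, "rows": rows}


def from_table(table):
    """Rebuild the list of dicts from the header row and the value rows."""
    return [dict(zip(table["header"], row)) for row in table["rows"]]


# Hypothetical "search results": 100 objects with identical field names.
records = [{"id": i, "title": f"result {i}", "score": i * 0.5} for i in range(100)]

# Layout 1: every object repeats its field names.
as_objects = json.dumps(records, separators=(",", ":"))
# Layout 2: field names appear once, in the header row.
as_table = json.dumps(to_table(records), separators=(",", ":"))

print(len(as_objects), len(as_table))
```

The saving grows with the number of rows and with the length of the field names. A binary format would shrink both layouts further, so the ratio printed here says nothing about the real ION numbers.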
Dec 19 2015
next sibling parent reply Paolo Invernizzi <paolo.invernizzi no.address> writes:
On Sunday, 20 December 2015 at 01:16:46 UTC, Jakob Jenkov wrote:
 [...]
 That depends on what API you use, and how much "meta data" (e.g. class names and property names) you write in the serialized ION data. ION is quite flexible about how much meta you want to include. [...]
I suggest also comparing against this [1]. The author, Kenton Varda, was the primary author of Protocol Buffers version 2, which is the version that Google released as open source.

[1] https://capnproto.org

/Paolo
Dec 20 2015
next sibling parent Jakob Jenkov <jakob jenkov.com> writes:
 I suggest to compare also against this [1].
 The author, Kenton Varda, was the primary author of Protocol 
 Buffers version 2, which is the version that Google released 
 open source.

 [1] https://capnproto.org
Will do - at some point. Writing proper benchmarks against other frameworks / encodings takes time though. That's why we have started with MessagePack, CBOR and Google Protocol Buffers.
Dec 20 2015
prev sibling parent reply Jakob Jenkov <jakob jenkov.com> writes:
 I suggest to compare also against this [1].
 The author, Kenton Varda, was the primary author of Protocol 
 Buffers version 2, which is the version that Google released 
 open source.

 [1] https://capnproto.org
I just had a look at Cap'n Proto. From what I can see in the encoding spec, performance of ION will be comparable. Cap'n Proto claims to be "infinitely faster" than Google Protocol Buffers, but that is only if you do not pack the CP data, in which case it will transfer slower over the network. CP solves that using packing, but then you are back to serialization / deserialization, and the original promise of being "infinitely faster" is gone.

Cap'n Proto also has the "problem" that its messages require an external schema. To iterate through a Cap'n Proto file / message you must already know what data is in it (the schema). Some see this as an advantage, because it forces you to write a schema for your data structure, and you get slightly faster encoding / decoding time. Others see this as a disadvantage, because you now have to import schemas, or generate code, in order to read a serialized message. You cannot just step through it like you can with e.g. XML or JSON. I tend to be in this camp, although I am not blind to the arguments in favor of external schemas. Speed matters, but so does ease of use.

On a network protocol level I tend to disagree with the "distributed object" model. I know Cap'n Proto tries to explain why this model is not a problem with CP. However, fine grained communication between fine grained distributed objects *is* a performance killer in the long run, regardless of whether you "pipeline" requests.

ION is intended to be the message format for our IAP network protocol. IAP will be message oriented, so you can do one-way messaging, request-response, subscriptions (e.g. to a stream), pipelining, routing of messages via intermediate nodes etc.

Anyway, if you really want to use Cap'n Proto (or something else) over IAP (+ION) you can just nest a binary message inside an IAP message, and then parse it any way you like when it comes out.
Dec 20 2015
parent reply John Carter <john.carter taitradio.com> writes:
On Sunday, 20 December 2015 at 17:52:40 UTC, Jakob Jenkov wrote:
 I just had a look at Cap'n Proto. From what I can see in the 
 encoding spec, performance of ION will be comparable.
"If a disease has many treatments, it has no cure". This is certainly true for serialization protocols.

The major advantage I see in Cap'n Proto is that the pipelining can do quite a lot to reduce round-trip latency. (You don't have to google far to find rants pointing out that latency is often more important than bandwidth in determining throughput.)

I was just reading your IAP web site when I came across "No Stateful Communication" under the heading "What is Wrong With HTTP?". The designers of HTTP would strongly argue that this is a major thing HTTP got right, and is the feature primarily responsible for its huge success.

Certainly in the realm of IoT, HTTP is way too heavy, so in that domain I would reach for http://coap.technology/

The use case I keep challenging my colleagues with is this: one end or the other dies. Or resets. Or fades and comes back. Or changes batteries. These are IoT things. It will happen, and you will be required to recover the whole end-to-end system automatically, without manual intervention. What is your plan?

Too often the answer is: "We don't have a plan, but we will have a wheel restarting the link... umm, then a wheel resending the stuff that was lost in the link buffers when the link went down... and, errrr, maybe a wheel restarting Everything when we realise the other side has lost its state about our connection."

And in practice the only wheel that works is shutting everything down and restarting everything.

Suddenly "No stateful communication" is looking really, really Good. CoAP clearly has thought these issues through.
Dec 20 2015
parent reply Jakob Jenkov <jakob jenkov.com> writes:
 The designers of HTTP would strongly argue that is a major 
 thing HTTP got right, and is the feature primarily responsible 
 for its huge success.
Then why is HTTP/2 moving away from it? And Web Sockets? Clearly, having the choice between keeping state and not keeping state is preferable to HTTP taking that choice away from you.

Lots of apps also spend quite an effort to mimic stateful communication on top of HTTP. Sessions? Authentication tokens? Cookies? Caching in the browser? HTML5 Local Storage? No, HTTP did not get "stateless" right.

Your "fix-the-network" problem is definitely valid.

At this point we have mostly focused on ION, the binary object / message format for IAP. However, we have a pretty good idea about how IAP will work on a conceptual level.

IAP will have a set of "semantic protocols". Each semantic protocol can address its own area of concern: file exchange, time, RPC, distributed transactions, P2P, streaming etc.

You can also define your own semantic protocol to address exactly your specific situation (e.g. the Byzantine Generals Problem, i.e. distributed consensus).

Not everything is in place yet, but we will get there step by step.
Dec 20 2015
parent Joakim <dlang joakim.fea.st> writes:
On Sunday, 20 December 2015 at 21:37:35 UTC, Jakob Jenkov wrote:
 The designers of HTTP would strongly argue that is a major 
 thing HTTP got right, and is the feature primarily responsible 
 for its huge success.
 Then why is HTTP 2 moving away from it? And Web Sockets? Clearly, having the choice between keeping state and not keeping state is preferable to HTTP taking that choice away from you. Lots of apps also spend quite an effort to mimic stateful communication on top of HTTP. Sessions? Authentication tokens? Cookies? Caching in the browser? HTML5 Local Storage? No, HTTP did not get "stateless" right.
Yep, the whole stateless argument is a complete joke; it has not been true except maybe in the very beginning. HTTP/2 is a huge step forward, for this, for its binary encoding, and for other reasons.
 Your "fix-the-network" problem is definitely valid.

 At this point we have mostly focused on ION - the binary object 
 / message format for IAP.
 However, we have a pretty good idea about how IAP will work on 
 a conceptual
 level.

 IAP will have a set of "semantic protocols". Each semantic 
 protocol can address
 its own area of concern. File exchange, time, RPC, distributed 
 transactions,
 P2P, streaming etc.

 You can also define your own semantic protocol to address 
 exactly your specific
 situation (e.g. the Byzantine Generals Problem - distributed 
 consensus).

 Everything is not yet in place - but we will get there step by 
 step.
Interesting effort, I'll check it out.
Dec 21 2015
prev sibling parent reply David Nadlinger <code klickverbot.at> writes:
On Sunday, 20 December 2015 at 01:16:46 UTC, Jakob Jenkov wrote:
 According to Thrift's own docs their binary encoding is not 
 compact. For compact encoding it seems they refer to Protobuf.
There seems to be a confusion of terminology here. Thrift has a "Binary" protocol, which is not compact in the sense that it consists of the data fields more or less blitted into a message. There is also a "Compact" protocol, which is also a binary format, but employs things like variable-length integers to reduce size – similar to Protobuf. — David
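
As a concrete picture of what "variable-length integers" means here, the following is a minimal Python sketch of the base-128 varint scheme that Protobuf documents for unsigned integers: seven payload bits per byte, with the high bit set on every byte except the last. It is an illustration only, not code taken from Thrift or Protobuf, and it leaves out the zigzag step those formats apply to signed values. The function names are made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low-order group first)."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0):
    """Decode one varint starting at offset; return (value, offset after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


print(encode_varint(1).hex())    # 01
print(encode_varint(300).hex())  # ac02
```

Small values fit in one byte where a fixed-width 32-bit or 64-bit field would always take 4 or 8, which is where the size reduction comes from.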
Dec 20 2015
parent Jakob Jenkov <jakob jenkov.com> writes:
On Sunday, 20 December 2015 at 19:16:19 UTC, David Nadlinger 
wrote:
 On Sunday, 20 December 2015 at 01:16:46 UTC, Jakob Jenkov wrote:
 According to Thrift's own docs their binary encoding is not 
 compact. For compact encoding it seems they refer to Protobuf.
 There seems to be a confusion of terminology here. Thrift has a "Binary" protocol, which is not compact in the sense that it consists of the data fields more or less blitted into a message. There is also a "Compact" protocol, which is also a binary format, but employs things like variable-length integers to reduce size, similar to Protobuf.
Thanks for the clarification! I couldn't really make out from the Thrift website whether they had their own compact protocol or had switched to Protobuf. Now I know that they do have their own.
Dec 20 2015
prev sibling parent Jakob Jenkov <jakob jenkov.com> writes:
 How does the performance of ION compare with Protocol Buffers 
 (https://developers.google.com/protocol-buffers/?hl=en) and 
 Apache Thrift ( https://thrift.apache.org/)?
Oh - one final thing: If you *really* want speed you should not parse ION into objects before using the data. Since ION is self describing, you can just navigate through it and find the data you need, and ignore the rest. This should be faster than first parsing the data into objects, especially if you parse an array of objects which may end up scattered all over the heap and thus lead to cache misses. Accessing the data directly in the message buffer might save you the ION-to-object parse time, and it might also play better with the L1, L2 and L3 caches. We have not yet benchmarked this, but we will before long. In this mode I expect the read+use time to be faster than Google Protocol Buffers.
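
A rough sketch of what "navigate and ignore the rest" can look like, in Python. The byte layout below is a hypothetical length-prefixed format invented for this example (1-byte key length, 2-byte value length, key, value); it is not the ION encoding, and `write_field` and `find_field` are made-up names. The point is only that length prefixes let a reader jump over fields it does not need, without building any objects.

```python
import struct


def write_field(key: str, value: bytes) -> bytes:
    """Serialize one field as: key length (1 byte), value length (2 bytes), key, value."""
    key_bytes = key.encode("utf-8")
    return struct.pack(">BH", len(key_bytes), len(value)) + key_bytes + value


def find_field(buf, wanted: str):
    """Scan the buffer for one field and return a zero-copy view of its value, or None."""
    view = memoryview(buf)
    wanted_bytes = wanted.encode("utf-8")
    pos = 0
    while pos < len(view):
        key_len, val_len = struct.unpack_from(">BH", view, pos)
        pos += 3                      # size of the two length prefixes
        key = view[pos:pos + key_len]
        pos += key_len
        if key == wanted_bytes:
            return view[pos:pos + val_len]
        pos += val_len                # skip the value without looking at it
    return None


message = (
    write_field("id", b"\x00\x2a")
    + write_field("payload", b"x" * 1000)
    + write_field("name", b"ION")
)

# Only the "name" field is touched; the 1000-byte payload is skipped over.
print(bytes(find_field(message, "name")))
```

Whether this actually beats parsing into objects depends on the access pattern, so it would need the same kind of benchmarking as everything else.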
Dec 19 2015
prev sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 16 December 2015 at 09:47:35 UTC, Jakob Jenkov 
wrote:
 Since we are rather new to D, would anyone be interested in 
 helping us a bit out making such a library? We can probably do 
 the coding ourselves, but might need some tips about how to 
 pack it nicely into a D library which can be used with Dub etc.
Be sure to look at how MsgPack is implemented in D: https://github.com/msgpack/msgpack-d It has a very easy interface, and is one of the better D libraries out there.
Dec 22 2015