digitalmars.D - [GSoC] Improved FlatBuffers and/or Protobuf Support ~ Binary

Ahmet Sait <nightmarex1337 hotmail.com> writes:

Hi,
I've been thinking about working on binary serialization as my
potential GSoC project. It's originally one of the older entries on
the GSoC ideas page [0]. I think D is pretty much cut out for this
kind of task and serialization is a topic I'm rather interested in,
so hopefully it will be a great candidate.

My name is Ahmet Sait Koçak and I'm currently studying Computer
Science in Turkey. I first met D way back in high school, which is an
interesting story in itself.

I was introduced to programming in my first year of high school
with C for

continued coding as a hobby, having lots of fun. One of my biggest
projects was LF2 IDE [1] - a modding tool for the game LF2. It didn't
take long for me to fall in love with open source since that project
made use of several OSS libraries itself, which led me to using
version control, embracing git and open sourcing nearly anything I
code on GitHub from there on.

Fast forward 4 years I was hitting walls trying to do low level

and the fact that bytecode-compiled languages are too easy to reverse
engineer was hindering my motivation to do anything commercial with
them. I never liked C++ but gave it another try, telling myself "come
on it's not that bad", but failed miserably. There had to be a better
way. Besides, I was already crafting my dream language in my head.

One day, I sat in front of my computer and thought "I bet there is a
language called D". It was a once-in-a-lifetime magical moment,
reading through the home page and seeing that it was the same
language I would have created myself (static reflection, native
compiled, GC...). My first project in D was IDL [2] - it made it
possible for LF2 IDE to hot reload modded data files into the game's
memory. It was amazing working with slices for the first time. I've
been a D user and an evangelist ever since.

- Improving & updating the D implementation of flatbuffers and/or
  protobuf
- Contributing the D support to the upstream repositories
- Better documentation & samples
- Benchmarking and making sure D rocks

- Meta-programming (DbI, CTFE, mixin...)
I plan to make D meta features shine in this library.

- It should be possible to parse schema and output mixable D code at
  compile time
    const schema = `message Person
    {
        required string name = 1;
        required int32 id = 2;
    }`;
    mixin(fromProtoSchema(schema));
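
The compile-time idea above can be sketched roughly as follows. This is a minimal sketch, not how dproto or protobuf-d do it: `fromProtoSchema` here is a hypothetical toy that understands only a `message` header line followed by `required` scalar fields, and it ignores the field numbers entirely.

```d
import std.algorithm.searching : startsWith;
import std.array : split;
import std.string : splitLines, strip;

/// Hypothetical toy generator: turns a tiny subset of .proto syntax
/// into the source text of a D struct. Runs at compile time via CTFE.
string fromProtoSchema(string schema)
{
    string code;
    foreach (rawLine; schema.splitLines)
    {
        auto line = rawLine.strip;
        if (line.startsWith("message "))
        {
            // "message Person" -> "struct Person {"
            code ~= "struct " ~ line["message ".length .. $].strip ~ "\n{\n";
        }
        else if (line.startsWith("required "))
        {
            // tokens: required, <type>, <name>, "=", "<number>;"
            auto tokens = line.split;
            string dType = tokens[1] == "int32" ? "int" : tokens[1];
            code ~= "    " ~ dType ~ " " ~ tokens[2] ~ ";\n";
        }
        else if (line.startsWith("}"))
        {
            code ~= "}\n";
        }
    }
    return code;
}

enum schema = `message Person
{
    required string name = 1;
    required int32 id = 2;
}`;

// The generated text is compiled as if it had been written by hand.
mixin(fromProtoSchema(schema));

void main()
{
    auto p = Person("Walter", 42);
    static assert(is(typeof(p.id) == int));
    assert(p.name == "Walter");
}
```

A real implementation would need a proper parser for the .proto grammar, which is where most of the effort would go.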

- There should be no need for a schema definition, a custom type
  annotated with UDAs should be enough
    struct Person
    {
        @protoID(1) string name;
        @protoID(2) int age;
    }
    serialize(Person("Walter", 42), stdout);
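
How a library could read such annotations can be sketched with `std.traits`. This is only an illustration under assumptions: `protoID` is defined here as a plain struct, and `fieldNumbers` is a hypothetical helper, not part of any existing library.

```d
import std.traits : getUDAs, hasUDA;

/// Hypothetical UDA carrying the field number.
struct protoID { uint id; }

struct Person
{
    @protoID(1) string name;
    @protoID(2) int age;
}

/// Collects the field numbers of T in declaration order.
/// Fields without a protoID attribute are skipped.
uint[] fieldNumbers(T)()
{
    uint[] ids;
    static foreach (i; 0 .. T.tupleof.length)
    {
        static if (hasUDA!(T.tupleof[i], protoID))
            ids ~= getUDAs!(T.tupleof[i], protoID)[0].id;
    }
    return ids;
}

void main()
{
    // Everything below is evaluated at compile time.
    static assert(fieldNumbers!Person() == [1u, 2u]);
    static assert(__traits(identifier, Person.tupleof[1]) == "age");
}
```

The same introspection loop is what a `serialize` function would use to pair each field value with its number.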

- Simple things should be simple
It should be dead simple to do basic stuff:
auto obj = deserialize!SomeType(stdin);
serialize(obj, stdout);

- Complex things should be possible
The library should be flexible and extensible without modification

- Support for library and tool based usage
It should be usable as a library without any additional setup but
also usable as a schema compiler.

- Support for common Phobos types
Nullable, tuples, std.datetime, std.complex, std.bigint, containers...

Existing work:
https://github.com/huntlabs/flatbuffers
https://github.com/dcarp/protobuf-d
https://github.com/msoucy/dproto

I'm personally not happy with any of the existing libraries but they
will likely be a valuable resource regardless.

Questions:
- How much work would be ideal for GSoC? Should I be working on
  flatbuffers only or protobuf too? (Seems like flatbuffers need more
  love)
- Should I tackle the std.serialization [3] idea?
- Any other serialization related suggestions?
- Anything I'm missing?

I'm still not entirely sure about my project (probably gonna write a
few proposals) so if you have other suggestions do not hesitate. All
kinds of constructive feedback is welcome!

[0] https://wiki.dlang.org/GSOC_2018_Ideas#FlatBuffers_Support_and.2For_Improved_Protocol_Buffer_Support
[1] https://github.com/ahmetsait/LF2.IDE
[2] https://github.com/ahmetsait/IDL
[3] https://wiki.dlang.org/GSOC_2019_Ideas#std.serialization

Mar 28 2019

Dragos Carp <dragoscarp gmail.com> writes:

Hi Ahmet,

welcome to the D forum.

As the author of protobuf-d I'll try to give you some feedback on
the points you made. I couldn't find the time to also do the
flatbuffers implementation, so my comments relate just to
protobuf. If you are interested in doing the Flatbuffers work, I'll
be more than happy to play the mentor role for you - I have some
ideas there. But let's get to the existing, real stuff.

On Friday, 29 March 2019 at 00:18:40 UTC, Ahmet Sait wrote:
- It should be possible to parse schema and output mixable D
code at
compile time
const schema = `message Person
{
required string name = 1;
required int32 id = 2;
}`;
mixin(fromProtoSchema(schema));

I don't think that it is worth the effort.
1. A complete implementation for .proto file parsing is
complicated
(https://developers.google.com/protocol-buffers/docs/reference/proto3-spec).
2. Theoretically, protobuf definitions do not change often, and
considering that compile time parsing is somewhat slow, parsing
them at every compilation is actually a drawback rather than a
benefit.
3. A protoc plugin is the way Protobuf recommends for parsing
.proto definitions:
https://developers.google.com/protocol-buffers/docs/proto3#generating

protobuf-d does that already, see the unittest for toProtobuf:
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/encoding.d#L193

- Simple things should be simple
It should be dead simple to do basic stuff:
auto obj = deserialize!SomeType(stdin);
serialize(obj, stdout);

Again, protobuf-d has that:
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/decoding.d#L214

- Complex things should be possible
The library should be flexible and extensible without
modification

toProtobuf, fromProtobuf, toJSONValue, fromJSONValue methods are
protobuf customization points in protobuf-d. For an example see
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/wrappers.d#L27-L54

- Support for library and tool based usage
It should be usable as a library without any additional setup
but also usable
as a schema compiler.

protobuf-d is usable as library, see
https://github.com/huntlabs/grpc-dlang/blob/57c8fe9808f8e860c4b0668a83cdabd78b296ce5/dub.json#L9
Regarding the usage as schema compiler, review the first comment.

- Support for common Phobos types
Nullable, tuples, std.datetime, std.complex, std.bigint,
containers...

Protobuf is a language agnostic serialization format. Having
.proto definitions for common Phobos types will just shift the
problem somewhere else (i.e. to other programming languages).

Nevertheless, Protobuf probably addresses the same problem by
defining the "well-known" types
(https://developers.google.com/protocol-buffers/docs/reference/google.protobuf).
protobuf-d also supports those, so that std.datetime.SysTime is
mapped to google.protobuf.Timestamp and std.datetime.Duration to
google.protobuf.Duration.
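
To illustrate what such a mapping involves, here is a minimal sketch. It assumes a hand-written `Timestamp` struct that mirrors the two fields of google.protobuf.Timestamp (seconds and nanoseconds since the Unix epoch) and a hypothetical `toTimestamp` helper; it is not protobuf-d's actual code, and it only considers times at or after 1970.

```d
import core.time : msecs;
import std.datetime : DateTime, SysTime, UTC;

/// Hypothetical mirror of the google.protobuf.Timestamp message.
struct Timestamp
{
    long seconds; // whole seconds since 1970-01-01T00:00:00Z
    int nanos;    // fractional part, 0 .. 999_999_999
}

/// Splits a SysTime into whole seconds and a nanosecond remainder.
/// SysTime has 100 ns resolution, so nanos is always a multiple of 100.
Timestamp toTimestamp(SysTime t)
{
    return Timestamp(t.toUnixTime!long,
                     cast(int) t.fracSecs.total!"nsecs");
}

void main()
{
    auto t = SysTime(DateTime(1970, 1, 1, 0, 0, 1), msecs(500), UTC());
    auto ts = toTimestamp(t);
    assert(ts.seconds == 1);
    assert(ts.nanos == 500_000_000);
}
```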

I'm personally not happy with any of the existing libraries but
they will
likely be a valuable resource regardless.

The existing protobuf libraries are quite mature and probably
improving those will be time better spent than starting once
again from scratch.

Questions:
- How much work would be ideal for GSoC? Should I be working on
flatbuffers
only or protobuf too? (Seems like flatbuffers need more love)

I'm quite satisfied with the protobuf-d implementation: it is small
(approx. 4k LOC), clean and quite feature complete - 26 failing
conformance tests vs. 27 and 41 respectively for the official C++
and Java counterparts. Of course there is still room for
improvement, but at least in the case of protobuf-d not enough for
a GSoC application.

On the other hand, Flatbuffers is a very good candidate: it has
its own specialties, but is also somewhat similar to protobuf.
This would reduce the planning risks considerably.

- Should I tackle the std.serialization [3] idea?

I see std.serialization as a high level API. Probably this will
be a long term std.experimental.serialization that will require
quite some time until multiple serialization formats implement
it. Only after that, if it ever happens, can we remove the
"experimental" part. I don't see this as a suitable GSoC project.

- Any other serialization related suggestions?

https://arrow.apache.org/

Cheers, Dragos

Mar 29 2019

Ahmet Sait <nightmarex1337 hotmail.com> writes:

On Friday, 29 March 2019 at 23:19:10 UTC, Dragos Carp wrote:
Hi Ahmet,

welcome to the D forum.

As the author of protobuf-d I'll try to give you some feedback
to the points you made. I couldn't find the time to also do the
flatbuffers implementation, so my comments are related just to
protobuf. If you are interested to do the Flatbuffers work,
I'll be more than happy to play the mentor role for you - I
have some ideas there. But let's get to the existing, real
stuff.

Glad to hear, thanks!

I don't think that it is worth the effort.
1. A complete implementation for .proto file parsing is
complicated
(https://developers.google.com/protocol-buffers/docs/reference/proto3-spec).
2. Theoretically, protobuf definitions does not change often,
and considering that compile time parsing is somehow slow, the
benefit of parsing them at every compilation is actually a
drawback.
3. protoc plugin is the Protobuf recommended way of parsing
.proto definitions:
https://developers.google.com/protocol-buffers/docs/proto3#generating

It doesn't immediately strike me as complicated and
https://github.com/msoucy/dproto apparently has this feature so
I'm guessing it can be used as a reference. Compile times are of
course not expected to be good with this approach but it's
promising if Stefan's New CTFE gets completed in the future. Then
again you likely have more experience about this so I should
probably defer this to when New CTFE is ready.

protobuf-d does that already, see the unittest for toProtobuf:
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/encoding.d#L193

- Simple things should be simple
It should be dead simple to do basic stuff:
auto obj = deserialize!SomeType(stdin);
serialize(obj, stdout);

Again, protobuf-d has that:
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/prot

I assumed it wasn't the case since the examples folder didn't have
such code, thanks for pointing it out.

- Complex things should be possible
The library should be flexible and extensible without
modification

toProtobuf, fromProtobuf, toJSONValue, fromJSONValue methods
are protobuf customization points in protobuf-d. For an example
see
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/wrappers.d#L27-L54

- Support for library and tool based usage
It should be usable as a library without any additional setup
but also usable
as a schema compiler.

protobuf-d is usable as library, see
https://github.com/huntlabs/grpc-dlang/blob/57c8fe9808f8e860c4b0668a83cdabd78b296ce5/dub.json#L9
Regarding the usage as schema compiler, review the first
comment.

These are basically a checklist that I want to satisfy, whether or
not the features already exist. Say, if I were to write
flatbuffers-d, I would want to implement them.

- Support for common Phobos types
Nullable, tuples, std.datetime, std.complex, std.bigint,
containers...

Protobuf is a language agnostic serialization format. Having
.protobuf definitions for common Phobos types will just shift
the problem somewhere else (i.e. other programming languages).

Makes sense. I'm of the opinion that the API should support common
types if there is a direct correspondence or a well-established
convention for the type in question.

I'm personally not happy with any of the existing libraries
but they will
likely be a valuable resource regardless.

The existing protobuf libraries are quite mature and probably
improving those will be time better spent than starting once
again from scratch.

I feel like there is some lack of documentation since none of
those things you mentioned are obvious looking at the repo.
Nevertheless, I'm happy to hear that protobuf-d is mature &
feature complete.

Questions:
- How much work would be ideal for GSoC? Should I be working
on flatbuffers
only or protobuf too? (Seems like flatbuffers need more love)

On the other hand Flatbuffers is a very good candidate: it has
its own specialties, but is also somehow similar to protobuf.
This would reduce the planning risks considerably.

Agreed, I'm going to focus on flatbuffers in my proposal then.

- Should I tackle the std.serialization [3] idea?

I see std.serialization as a high level API. Probably this will
be a long term std.experimental.serialization, that will
require quite some time till multiple serialization formats
implements it. Just after that, if it will ever happen, we can
remove the "experimental" part. I don't see this as a suited
GSoC project.

I see, thanks for the feedback.

- Any other serialization related suggestions?

https://arrow.apache.org/

Thanks, I'll take a look.

Apr 01 2019

Ahmet Sait <nightmarex1337 hotmail.com> writes:

https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing

Seeking some feedback, thanks in advance..!

Apr 04 2019

Jacob Carlborg <doob me.com> writes:

On 2019-04-04 18:43, Ahmet Sait wrote:
 https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing
 
 Seeking some feedback, thanks in advance..!

I think "Contributing D support to the upstream repositories" might be a 
hurdle. You never know how much time someone else will have to review 
pull requests.

"Using D traits, UDAs and static introspection, it is possible to 
generate flatbuffer accessors without a schema file"

I don't know how flatbuffer works, but are accessors necessary?

It might be interesting to specify if you have any requirements that it 
should work with any of the attributes: "nothrow", "@safe", "pure", 
"@nogc" and the betterC subset.

-- 
/Jacob Carlborg

Apr 04 2019

Ahmet Sait <nightmarex1337 hotmail.com> writes:

On Thursday, 4 April 2019 at 18:27:05 UTC, Jacob Carlborg wrote:
 On 2019-04-04 18:43, Ahmet Sait wrote:
 https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing
 
 Seeking some feedback, thanks in advance..!

 I think "Contributing D support to the upstream repositories" 
 might be a hurdle. You never know how much time someone else 
 will have to review pull requests.

That's what I thought too, but at least I want the project in a 
state where I can make a PR to the upstream, although that is not 
a clear, measurable criterion.

 "Using D traits, UDAs and static introspection, it is possible 
 to generate flatbuffer accessors without a schema file"

 I don't know how flatbuffer works, but are accessors necessary?

AFAIU, accessors make vector (array) fields and backward/forward 
compatibility possible. I'm still learning so don't count on me.
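
As a rough illustration of why accessors help with compatibility, here is a toy model. It is not the real FlatBuffers wire layout or any library's API: each table simply carries a list of per-field offsets, and an accessor returns a default when a field is absent. `TableView`, `getInt` and the simplified layout are assumptions made for this sketch.

```d
/// Toy read-only view over a table. `vtable[i]` is the byte offset of
/// field i inside `data`, or 0 when the writer did not store that field.
struct TableView
{
    const(ubyte)[] data;
    const(ushort)[] vtable;

    /// Reads a little-endian 32-bit integer field, or returns the
    /// default when the field is unknown to the writer or omitted.
    int getInt(size_t field, int defaultValue) const
    {
        if (field >= vtable.length || vtable[field] == 0)
            return defaultValue;
        const off = vtable[field];
        return data[off] | (data[off + 1] << 8)
             | (data[off + 2] << 16) | (data[off + 3] << 24);
    }
}

void main()
{
    // A buffer written against an old schema that only knew field 0.
    const(ubyte)[] data = [0, 0, 0, 0, 42, 0, 0, 0];
    const(ushort)[] vtable = [4];
    auto view = TableView(data, vtable);

    assert(view.getInt(0, -1) == 42); // stored field is read in place
    assert(view.getInt(1, 7) == 7);   // field added later: default is used
}
```

Because readers go through the accessor instead of a fixed struct layout, a newer reader can open an older buffer and still get sensible values.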

 It might be interesting to specify if you have any requirements 
 that it should work with any of the attributes: "nothrow", 
 "@safe", "pure", "@nogc" and the betterC subset.

This is something that came to my mind after the fact (since I 
don't bother with attributes much), and I still haven't decided. 
It makes a lot of sense to provide @nogc functionality for 
potential RPC protocol usage (not high priority right now), not 
sure about the others.
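
To make the attribute question concrete, here is a minimal sketch of a low-level routine that satisfies all four attributes by writing into a caller-provided buffer instead of allocating. The Protobuf-style base-128 varint is used purely as a convenient example; `encodeVarint` is a hypothetical name, not an existing library function.

```d
/// Encodes `value` as a base-128 varint into `buf` and returns the
/// slice that was written. The caller owns the buffer, so no GC
/// allocation is needed; a ulong needs at most 10 bytes.
ubyte[] encodeVarint(ubyte[] buf, ulong value) @safe pure nothrow @nogc
{
    size_t n = 0;
    while (value >= 0x80)
    {
        // Low 7 bits plus a continuation flag in the top bit.
        buf[n++] = cast(ubyte)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    buf[n++] = cast(ubyte) value;
    return buf[0 .. n];
}

void main()
{
    ubyte[10] storage;
    auto encoded = encodeVarint(storage[], 300);
    assert(encoded.length == 2);
    assert(encoded[0] == 0xAC && encoded[1] == 0x02);
}
```

The compiler rejects the function if its body allocates, throws an exception or does anything unsafe, so the attributes double as a checked contract for callers such as an RPC layer.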

Apr 04 2019

Dragos Carp <dragoscarp gmail.com> writes:

On Thursday, 4 April 2019 at 16:43:44 UTC, Ahmet Sait wrote:
 https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing

 Seeking some feedback, thanks in advance..!

I added some comments directly in the document.

Apr 04 2019

Ahmet Sait <nightmarex1337 hotmail.com> writes:

On Thursday, 4 April 2019 at 19:54:03 UTC, Dragos Carp wrote:
 On Thursday, 4 April 2019 at 16:43:44 UTC, Ahmet Sait wrote:
 https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing

 Seeking some feedback, thanks in advance..!

 I added some comments directly in the document.

Unless some final touches are necessary, I'm about to submit 
my final proposal.

Apr 09 2019

Jacob Carlborg <doob me.com> writes:

On 2019-03-29 01:18, Ahmet Sait wrote:

 - Should I tackle the std.serialization [3] idea?

I would love that. FlatBuffers or Protobuf could be one of the backends. 
Although you might need to implement more than one backend to make sure 
the frontend API actually is general enough to support multiple 
backends. Ideally two completely different kinds of backend, like 
FlatBuffers and JSON, for example.
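
The frontend/backend split could look roughly like the sketch below. All names (`TextBackend`, `BinaryBackend`, `put`, `serialize`) are hypothetical and the two formats are toys, not JSON or FlatBuffers; the point is only that one generic frontend drives two very different backends.

```d
import std.conv : to;
import std.traits : FieldNameTuple;

/// Toy text backend: emits name=value pairs.
struct TextBackend
{
    string output;
    void put(string name, string value) { output ~= name ~ "=\"" ~ value ~ "\";"; }
    void put(string name, int value) { output ~= name ~ "=" ~ value.to!string ~ ";"; }
}

/// Toy binary backend: ignores names, writes raw bytes.
struct BinaryBackend
{
    ubyte[] output;
    void put(string name, string value)
    {
        output ~= cast(ubyte) value.length; // toy: length must fit in one byte
        output ~= cast(const(ubyte)[]) value;
    }
    void put(string name, int value)
    {
        foreach (shift; [0, 8, 16, 24])     // little-endian 32-bit integer
            output ~= cast(ubyte)(value >> shift);
    }
}

/// Frontend: walks the fields of T and hands each one to the backend.
void serialize(Backend, T)(ref Backend backend, T value)
{
    static foreach (name; FieldNameTuple!T)
        backend.put(name, __traits(getMember, value, name));
}

struct Person { string name; int age; }

void main()
{
    TextBackend text;
    serialize(text, Person("Walter", 42));
    assert(text.output == `name="Walter";age=42;`);

    BinaryBackend bin;
    serialize(bin, Person("Walter", 42));
    assert(bin.output.length == 1 + 6 + 4);
}
```

If the frontend needed a change to accommodate the second backend, that would be a sign the API was not yet general enough.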

-- 
/Jacob Carlborg

Apr 01 2019

Ahmet Sait <nightmarex1337 hotmail.com> writes:

On Monday, 1 April 2019 at 09:57:08 UTC, Jacob Carlborg wrote:
 On 2019-03-29 01:18, Ahmet Sait wrote:

 - Should I tackle the std.serialization [3] idea?

 I would love that. FlatBuffers or Protobuf could be one of the 
 backends. Although you might need to implement more than one 
 backend to make sure the frontend API actually is general 
 enough to implement multiple backend. Ideally two completely 
 different kind of backend, like FlatBuffers and JSON, for 
 example.

Thanks for the feedback! I decided I should gather some 
experience building a serialization library first before thinking 
about designing std.serialization.

Also, I want to know if I can ask you questions while working on 
my project (since you're the author of the orange lib and have 
experience)?

Apr 01 2019

Jacob Carlborg <doob me.com> writes:

On 2019-04-02 02:05, Ahmet Sait wrote:

 Also, I want to know if I can ask you questions when working on my 
 project (since you're the author of orange lib and have experience) ?

Sure, please do.

-- 
/Jacob Carlborg

Apr 02 2019
