digitalmars.D - Databases and the D Standard Library

Adam Wilson (107/107) Dec 31 2016 Hi Everyone,

rikki cattermole (7/7) Dec 31 2016 We do indeed need a good database abstraction.

Adam Wilson (6/13) Dec 31 2016 Or, alternatively, use existing drivers that have this capability built ...

rikki cattermole (4/15) Dec 31 2016 Which is fine if all you use is c's sockets or only that database

Adam D. Ruppe (4/6) Dec 31 2016 The C drivers typically offer handles of some sort (Windows

rikki cattermole (4/10) Dec 31 2016 That's fine and all, but you've still got to deal with it on D's side so...

Adam Wilson (8/19) Jan 01 2017 Vibe.d is working on a native D event loop. We would probably want to

rikki cattermole (6/22) Jan 01 2017 No, it isn't generic enough.

Adam Wilson (9/35) Jan 01 2017 How much effort to make the changes? Their plans seem to indicate that

rikki cattermole (31/68) Jan 01 2017 Read and compare the code.

Adam D. Ruppe (8/10) Dec 31 2016 Eh, I count it as would-be-nice just because it isn't that hard

Adam Wilson (10/19) Jan 01 2017 On that I beg to differ. The C libraries are not @safe, they have wildly...

Chris Wright (4/7) Jan 02 2017 If we have the database interface defined, there's no reason we couldn't...

Adam Wilson (6/13) Jan 02 2017 Precisely, I would love to enable this!

Chris Wright (17/29) Dec 31 2016 This is doable; SQL is an ANSI and ISO standard, and it strongly

Adam Wilson (11/40) Jan 01 2017 Given that the Cassandra folks wrote an ADO.NET provider for it, I would...

Jacob Carlborg (7/14) Jan 01 2017 I don't think we should try to make implementations different just

Chris Wright (8/24) Jan 01 2017 Those both limit your ability to use the underlying database to its full...

Adam Wilson (9/33) Jan 01 2017 I cannot state my agreement with this paragraph enough. Every ORM I've
Jacob Carlborg (12/15) Jan 02 2017 I'm perfectly aware of the limitations and capabilities of ORM's. I'm

Adam Wilson (9/22) Jan 02 2017 Is there a assumption here that there are no classes? Because and ORM

Chris Wright (8/15) Jan 01 2017 You *can* use classes and interfaces and type hierarchies. They do use

Adam Wilson (10/25) Jan 01 2017 That was my intention, the knee-jerk reaction that class and interfaces

Jacob Carlborg (8/11) Jan 02 2017 I think that the design should try to avoid classes as much as possible

Adam Wilson (20/29) Jan 02 2017 Ok. How would you design a database API for D?

Jacob Carlborg (14/15) Jan 02 2017 I don't know. I think it's difficult to design something upfront without...

Chris Wright (24/33) Jan 03 2017 You are unable to interact with two different databases in the same

Jacob Carlborg (10/19) Jan 03 2017 That's true. And that's why I said it's difficult to design an API

Chris Wright (29/47) Jan 03 2017 The returned row data is mandatory, and its size can be much larger than...

Jacob Carlborg (14/39) Jan 03 2017 Look, I didn't say that using the GC should be completely forbidden. I

Chris Wright (5/7) Jan 03 2017 Templates. Templates everywhere.

Nicholas Wilson (37/45) Jan 03 2017 That limits you to one DB per compilation or craploads of

Mark (11/22) Jan 01 2017 I think that these can all be seen as special cases of a

Adam Wilson (11/33) Jan 01 2017 My experience with graph DB's is that the reality has never been

Dejan Lekic (10/22) Jan 04 2017 The only thing I want, database related, in the standard library

Adam Wilson <flyboynw gmail.com> writes:

Hi Everyone,

I've seen a lot of talk on the forums over the past year about the need 
for database support in the D Standard Library and I completely agree. 
At the end of the day the purpose of any programming language and its 
attendant libraries is to allow the developer to solve their problems 
quickly and efficiently; and a large subset of those solutions require 
some form of structured data store. To my mind, this makes some form of 
interface(s) to a data-store an essential component of the D Standard 
Library. And since this is something that my particular problem spaces 
also need, I thought it would be useful to attempt to do something about it.

First, I've seen a couple of promising projects, the most complete, and 
recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb) 
hasn't seen a commit since June. An additional setback came when I tried 
to use it and was greeted with a litany of compiler errors.

This is *not* a problem, it's the natural course of a volunteer 
community such as ours; and I want to thank Erik Smith profusely for his 
work. Priorities and circumstances change and that means that valuable 
projects are inexplicably dropped.

But we still lack a critical component, and to get the conversation 
started, I'd like to break down the issues I've seen brought up in past 
threads on this subject and encourage you to bring your own. I may have 
ideas, but I can't possibly know the entire problem space.

1. Isn't this an enormous amount of work?

My answer: Absolutely, depending on your preferred scope of work.

In general, I've seen two distinct camps on this issue. One says that we 
should implement everything in D from the ground up, including 
re-implementation of the database drivers themselves in D. If this is 
your preferred scope of work then you will invariably become 
disheartened at the truly stupendous amount of work you face and give up.

The other camp says that we should make use of existing drivers and 
include them in the D Standard Library. This is difficult path to follow 
as the vanilla build of the D Standard Library now requires a 
significant number of foreign libraries, all with differing licenses, be 
built and distributed to everyone; regardless of whether or not they use 
them in their project. This is more-or-less than path the dstddb is/was on.

My idea: Focus on defining the interface, not the individual driver 
implementations.

If instead we focused on defining an interface that a "conforming 
implementation" had to follow, we would allow developers to only pull in 
the library they need or build a from-scratch library if they so desire.

Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow 
and I think we would be well advised to follow their lead here. Not only 
is the methodology battle-proven, it is also well understood by a 
significant portion of D's potential user-base. By way of example, 
Npgsql is the ADO.NET implementation of a driver for PostgreSQL.

2. There are so many different types of data storage systems, how do you 
design a system generic enough for all of them?

My answer: You don't. Nobody else has bothered trying, and I believe 
that our worry over that question is a large part of why we don't have 
anything substantive today.

My idea: Split the data storage systems out by category of data-store.
For example:
	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
	- Document: std.database.document (Mongo, CouchDB, etc.)
	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

If you want something that doesn't fit into a category above, you're own 
your own, but you were also on your own in other languages.

3. We need to provide a single interface for all data-stores in the 
SQL/Document/Key-Value category.

My answer: Are you sure? The problem is that each underlying data-store 
has it's own dialect. For example, PostgreSQL and MSSQL are both 
ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores, 
even in the same category, is never going to be as simple as changing a 
connection string. And additionally, you will have to implement a 
super-set of features in the interface to support all the variations and 
throw exceptions where the chosen implementation does not support a 
specific feature.

My idea: Each data store has it's own implementation with it's own 
naming convention. For example (ADO.NET):
	- SqlConnection (MSSQL)
	- NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you switch 
data-stores, but since you are already changing your queries, which is a 
much more difficult change, this isn't a significant additional cost. 
Also, the code becomes clearer to those who take over maintenance duties 
from the original author, especially when you are mixing data-stores. 
But in all honest, most developers will pick on technology and stick 
with it for the entirety of the software's lifespan.

4. We should hide querying from the developer because they are bad at 
it, security flaws, etc.

My answer: While agree in principal, especially with the security 
concerns, in reality what you are asking for is an ORM. In my opinion, 
that is a separate concern from a database interface, and is typically 
implemented as layer over the DB interface.

My idea: Don't do it. Save it for a different project.

5. D has so many useful features for data access, we should use as many 
as possible!

My answer: D absolutely has many useful features for data access and 
manipulation. But that doesn't mean that a good interface has to use any 
of them. The first job of a Database Interface, and indeed any library, 
is to get the job done with a minimum of overhead. Let's worry about 
that before going crazy adding in all the D goodness. Ranges have been a 
particular target for abuse here, and while I love ranges, I think the 
mechanics of data-store manipulation don't lend themselves well to 
working with ranges. I'd love to hear your ideas on this though.

My idea: Focus on a more conservative implementation in the style of JDO 
or ADO.NET. This will allow us to ship something that works in a 
reasonable time frame. I'm not saying that we can't use any of D's 
unique talents, but using those talents should be subordinate to 
designing an interface that works efficiently.

That is all I have for now. I am looking forward to hear your thoughts 
on this topic! Until then, I am going to go close out 2016 (PST) with 
family and friends and I wish you all a Happy New Year!

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Dec 31 2016

rikki cattermole <rikki cattermole.co.nz> writes:

We do indeed need a good database abstraction.
But a core requirement for any implementation has yet to be met.

There has to be a standard way for asynchronous sockets. To implement 
this we need to take into consideration the event loop that it uses and 
more importantly allow it to be integrated for e.g. windowing.

So here is a dependency before we get a database abstraction into 
Phobos, a nice fast event loop manager that is generic.

Dec 31 2016

Adam Wilson <flyboynw gmail.com> writes:

On 12/31/16 7:31 PM, rikki cattermole wrote:
 We do indeed need a good database abstraction.
 But a core requirement for any implementation has yet to be met.

 There has to be a standard way for asynchronous sockets. To implement
 this we need to take into consideration the event loop that it uses and
 more importantly allow it to be integrated for e.g. windowing.

 So here is a dependency before we get a database abstraction into
 Phobos, a nice fast event loop manager that is generic.

Or, alternatively, use existing drivers that have this capability built in?

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Dec 31 2016

rikki cattermole <rikki cattermole.co.nz> writes:

On 01/01/2017 4:46 PM, Adam Wilson wrote:
 On 12/31/16 7:31 PM, rikki cattermole wrote:
 We do indeed need a good database abstraction.
 But a core requirement for any implementation has yet to be met.

 There has to be a standard way for asynchronous sockets. To implement
 this we need to take into consideration the event loop that it uses and
 more importantly allow it to be integrated for e.g. windowing.

 So here is a dependency before we get a database abstraction into
 Phobos, a nice fast event loop manager that is generic.

 Or, alternatively, use existing drivers that have this capability built in?

Which is fine if all you use is c's sockets or only that database 
connection for a thread.

Which is not very realistic for game or web development.

Dec 31 2016

Adam D. Ruppe <destructionator gmail.com> writes:

On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that 
 database connection for a thread.

The C drivers typically offer handles of some sort (Windows 
HANDLE, *nix file descriptor, that kind of thing) that you can 
integrate into other event loops.

Dec 31 2016

rikki cattermole <rikki cattermole.co.nz> writes:

On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.

 The C drivers typically offer handles of some sort (Windows HANDLE, *nix
 file descriptor, that kind of thing) that you can integrate into other
 event loops.

That's fine and all, but you've still got to deal with it on D's side so 
you can mix and match libraries that require access to the same event 
loop (such as Windows).

Dec 31 2016

Adam Wilson <flyboynw gmail.com> writes:

rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.

 The C drivers typically offer handles of some sort (Windows HANDLE, *nix
 file descriptor, that kind of thing) that you can integrate into other
 event loops.

 That's fine and all, but you've still got to deal with it on D's side so
 you can mix and match libraries that require access to the same event
 loop (such as Windows).

Vibe.d is working on a native D event loop. We would probably want to 
integrate with that.

EventCore: https://code.dlang.org/packages/eventcore

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 01 2017

rikki cattermole <rikki cattermole.co.nz> writes:

On 02/01/2017 3:03 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.

 The C drivers typically offer handles of some sort (Windows HANDLE, *nix
 file descriptor, that kind of thing) that you can integrate into other
 event loops.

 That's fine and all, but you've still got to deal with it on D's side so
 you can mix and match libraries that require access to the same event
 loop (such as Windows).

 Vibe.d is working on a native D event loop. We would probably want to
 integrate with that.

 EventCore: https://code.dlang.org/packages/eventcore

No, it isn't generic enough.
Nor can it handle windowing without a good bit of modifications.

Mine in SPEW[0] is however ready for this task.

[0] 
https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop

Jan 01 2017

Adam Wilson <flyboynw gmail.com> writes:

rikki cattermole wrote:
 On 02/01/2017 3:03 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.

 The C drivers typically offer handles of some sort (Windows HANDLE,
 *nix
 file descriptor, that kind of thing) that you can integrate into other
 event loops.

 That's fine and all, but you've still got to deal with it on D's side so
 you can mix and match libraries that require access to the same event
 loop (such as Windows).

 Vibe.d is working on a native D event loop. We would probably want to
 integrate with that.

 EventCore: https://code.dlang.org/packages/eventcore

 No, it isn't generic enough.
 Nor can it handle windowing without a good bit of modifications.

 Mine in SPEW[0] is however ready for this task.

 [0]
 https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop

How much effort to make the changes? Their plans seem to indicate that 
they want to support UI integration. I ask because I need to use the 
library that is going to get the most support over time and that is 
vibe.d right now.

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 01 2017

rikki cattermole <rikki cattermole.co.nz> writes:

On 02/01/2017 4:44 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 02/01/2017 3:03 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.

 The C drivers typically offer handles of some sort (Windows HANDLE,
 *nix
 file descriptor, that kind of thing) that you can integrate into other
 event loops.

 That's fine and all, but you've still got to deal with it on D's
 side so
 you can mix and match libraries that require access to the same event
 loop (such as Windows).

 Vibe.d is working on a native D event loop. We would probably want to
 integrate with that.

 EventCore: https://code.dlang.org/packages/eventcore

 No, it isn't generic enough.
 Nor can it handle windowing without a good bit of modifications.

 Mine in SPEW[0] is however ready for this task.

 [0]
 https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop

 How much effort to make the changes? Their plans seem to indicate that
 they want to support UI integration. I ask because I need to use the
 library that is going to get the most support over time and that is
 vibe.d right now.

Read and compare the code.
To add anything into the Vibe.d one requires direct modification which 
is unacceptable for Phobos IMO.

My aim for SPEW was to be as close as glib's[0] as reasonably possible. 
A well tested set of features.

I'm sorry but this is just not acceptable[1] for an event loop:

interface EventDriver {
 safe: /* nogc:*/ nothrow:
	 property EventDriverCore core();
	 property EventDriverTimers timers();
	 property EventDriverEvents events();
	 property EventDriverSignals signals();
	 property EventDriverSockets sockets();
	 property EventDriverDNS dns();
	 property EventDriverFiles files();
	 property EventDriverWatchers watchers();

	/// Releases all resources associated with the driver
	void dispose();
}

Its fine for a web framework, aka a specific task but not when its generic.

When we're discussing a generic event loop it shouldn't care about the 
different usage of it. All it knows is that there are events that come 
from sources and then mapped to a consumer. It isn't the most performant 
but that is ok. You would use a different implementation depending on 
your use case e.g. 1:1 is easy enough to do without a event loop manager.

Just so you're aware, windowing is extremely hard to get right. Sockets, 
DNS, signals and timers ext. are easy to implement compared.

[0] https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html
[1] 
https://github.com/vibe-d/eventcore/blob/master/source/eventcore/driver.d

Jan 01 2017

Adam D. Ruppe <destructionator gmail.com> writes:

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 interface(s) to a data-store an essential component of the D 
 Standard Library.

Eh, I count it as would-be-nice just because it isn't that hard 
to just use the C ones, or another third party lib; it doesn't 
have to be Phobos itself.

That said though, a basic db interface is quite simple and would 
be a nice batteries included bit - it is one of the things IMO 
that PHP did quite successfully (even if its interface sucked, it 
still just worked)

Dec 31 2016

Adam Wilson <flyboynw gmail.com> writes:

Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 interface(s) to a data-store an essential component of the D Standard
 Library.

 Eh, I count it as would-be-nice just because it isn't that hard to just
 use the C ones, or another third party lib; it doesn't have to be Phobos
 itself.

On that I beg to differ. The C libraries are not  safe, they have wildly 
different API's, and they have high-complexity, which is a large 
risk-factor for bugs and/or security flaws. Any place where we can

 That said though, a basic db interface is quite simple and would be a
 nice batteries included bit - it is one of the things IMO that PHP did
 quite successfully (even if its interface sucked, it still just worked)

I'll agree that it isn't hard, and I think including it would help boost 
D's usability in the web service space.

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 01 2017

Chris Wright <dhasenan gmail.com> writes:

On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:
 On that I beg to differ. The C libraries are not  safe, they have wildly
 different API's, and they have high-complexity, which is a large
 risk-factor for bugs and/or security flaws.

If we have the database interface defined, there's no reason we couldn't 
provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d 
driver.

Jan 02 2017

Adam Wilson <flyboynw gmail.com> writes:

On 1/2/17 8:33 AM, Chris Wright wrote:
 On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:
 On that I beg to differ. The C libraries are not  safe, they have wildly
 different API's, and they have high-complexity, which is a large
 risk-factor for bugs and/or security flaws.

 If we have the database interface defined, there's no reason we couldn't
 provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d
 driver.

Precisely, I would love to enable this!

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 02 2017

Chris Wright <dhasenan gmail.com> writes:

On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Split the data storage systems out by category of data-store.
 For example:
 	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)

This is doable; SQL is an ANSI and ISO standard, and it strongly 
constrains what you can do with your data.

 	- Document: std.database.document (Mongo, CouchDB, etc.)
 	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

I'm not so certain about this. CouchDB has a rather different approach to 
things than MongoDB -- possibly not as divergent as Mongo from MySQL, but 
far more than Postgres from MySQL.

Likewise, there are many key/value stores in existence, and they support 
many different operations. For instance, it looks like etcd2 has a notion 
of directories, where you can list items in a directory. Redis just lets 
you list keys with a given prefix. Redis lets you modify values in-place; 
etcd2 doesn't.

We could define a common subset of operations for document databases and 
key/value stores, but most people probably wouldn't be satisfied with it.

There's also a question of where you'd put Cassandra in that, since it's 
decidedly not a SQL database but tries to pretend it is.

 4. We should hide querying from the developer because they are bad at
 it, security flaws, etc.
 
 My answer: While agree in principal, especially with the security
 concerns, in reality what you are asking for is an ORM. In my opinion,
 that is a separate concern from a database interface, and is typically
 implemented as layer over the DB interface.

We can encourage people to use prepared queries with documentation and 
naming.

Dec 31 2016

Adam Wilson <flyboynw gmail.com> writes:

Chris Wright wrote:
 On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Split the data storage systems out by category of data-store.
 For example:
 	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)

 This is doable; SQL is an ANSI and ISO standard, and it strongly
 constrains what you can do with your data.

 	- Document: std.database.document (Mongo, CouchDB, etc.)
 	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

 I'm not so certain about this. CouchDB has a rather different approach to
 things than MongoDB -- possibly not as divergent as Mongo from MySQL, but
 far more than Postgres from MySQL.

 Likewise, there are many key/value stores in existence, and they support
 many different operations. For instance, it looks like etcd2 has a notion
 of directories, where you can list items in a directory. Redis just lets
 you list keys with a given prefix. Redis lets you modify values in-place;
 etcd2 doesn't.

 We could define a common subset of operations for document databases and
 key/value stores, but most people probably wouldn't be satisfied with it.

 There's also a question of where you'd put Cassandra in that, since it's
 decidedly not a SQL database but tries to pretend it is.

Given that the Cassandra folks wrote an ADO.NET provider for it, I would 
suggest that it is easiest to treat it as a SQL database from an 
interface standpoint.

 4. We should hide querying from the developer because they are bad at
 it, security flaws, etc.

 My answer: While agree in principal, especially with the security
 concerns, in reality what you are asking for is an ORM. In my opinion,
 that is a separate concern from a database interface, and is typically
 implemented as layer over the DB interface.

 We can encourage people to use prepared queries with documentation and
 naming.

Precisely, my focus would be an making the API as easy as possible to 
use with Parameterized Queries, and if that makes it harder to write 
non-parameterized queries, oh well. :D

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 01 2017

Jacob Carlborg <doob me.com> writes:

On 2017-01-01 04:24, Adam Wilson wrote:

 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
      - SqlConnection (MSSQL)
      - NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you switch
 data-stores, but since you are already changing your queries, which is a
 much more difficult change, this isn't a significant additional cost.

I don't think we should try to make implementations different just 
because. If you have an SQL builder or an ORM on top of the interface 
that abstract the differences in the SQL syntax, it's possible to switch 
driver, within reason.

-- 
/Jacob Carlborg

Jan 01 2017

Chris Wright <dhasenan gmail.com> writes:

On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:

 On 2017-01-01 04:24, Adam Wilson wrote:
 
 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
      - SqlConnection (MSSQL)
      - NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you
 switch data-stores, but since you are already changing your queries,
 which is a much more difficult change, this isn't a significant
 additional cost.

 
 I don't think we should try to make implementations different just
 because. If you have an SQL builder or an ORM on top of the interface
 that abstract the differences in the SQL syntax, it's possible to switch
 driver, within reason.

Those both limit your ability to use the underlying database to its full 
potential. They offer a chance for queries that seem simple and efficient 
to become horribly inefficient.

I ran across a problem in NHibernate about a decade ago. We had a 
straightforward HQL query involving joins. It took over a minute to run. 
We wrote the simple equivalent in SQL and it completed in milliseconds. 
Fortunately, NHibernate had the ability to run raw SQL queries.

Jan 01 2017

Adam Wilson <flyboynw gmail.com> writes:

Chris Wright wrote:
 On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:

 On 2017-01-01 04:24, Adam Wilson wrote:

 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
       - SqlConnection (MSSQL)
       - NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you
 switch data-stores, but since you are already changing your queries,
 which is a much more difficult change, this isn't a significant
 additional cost.

 I don't think we should try to make implementations different just
 because. If you have an SQL builder or an ORM on top of the interface
 that abstract the differences in the SQL syntax, it's possible to switch
 driver, within reason.

 Those both limit your ability to use the underlying database to its full
 potential. They offer a chance for queries that seem simple and efficient
 to become horribly inefficient.

I cannot state my agreement with this paragraph enough. Every ORM I've 
worked with generates some inexplicably horrific SQL in seemingly simple 
situations.

 I ran across a problem in NHibernate about a decade ago. We had a
 straightforward HQL query involving joins. It took over a minute to run.
 We wrote the simple equivalent in SQL and it completed in milliseconds.
 Fortunately, NHibernate had the ability to run raw SQL queries.

I've also seen Entity Framework 6/7 do the same thing.

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 01 2017

Jacob Carlborg <doob me.com> writes:

On 2017-01-01 17:50, Chris Wright wrote:

 Those both limit your ability to use the underlying database to its full
 potential. They offer a chance for queries that seem simple and efficient
 to become horribly inefficient.

I'm perfectly aware of the limitations and capabilities of ORM's. I'm 
just saying that making the interface/names different just to make it 
different is not a good idea. It should be up to the user to choose if 
an ORM is used or not and this interface should try to, as much as 
possible, to make it possible to use an ORM just as well as not using an 
ORM.

This whole idea seems fail even before it's barely stared. If this idea 
is going to work then all the layers need to be designed correctly and 
the lower layers should not know anything about the higher layers.

-- 
/Jacob Carlborg

Jan 02 2017

Adam Wilson <flyboynw gmail.com> writes:

On 1/2/17 12:05 AM, Jacob Carlborg wrote:
 On 2017-01-01 17:50, Chris Wright wrote:

 Those both limit your ability to use the underlying database to its full
 potential. They offer a chance for queries that seem simple and efficient
 to become horribly inefficient.

 I'm perfectly aware of the limitations and capabilities of ORM's. I'm
 just saying that making the interface/names different just to make it
 different is not a good idea. It should be up to the user to choose if
 an ORM is used or not and this interface should try to, as much as
 possible, to make it possible to use an ORM just as well as not using an
 ORM.

Is there a assumption here that there are no classes? Because and ORM 
could quite easily work with base classes, and indeed both NHibernate 
and EntityFramework function exactly this way.

 This whole idea seems fail even before it's barely stared. If this idea
 is going to work then all the layers need to be designed correctly and
 the lower layers should not know anything about the higher layers.

I absolutely agree, which, ironically, is why I am having this conversation.

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 02 2017

Chris Wright <dhasenan gmail.com> writes:

On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
 	- SqlConnection (MSSQL)
 	- NpgsqlConnection (Npgsql)
 
 Yes, this means that you have to change names in your code if you switch
 data-stores

You *can* use classes and interfaces and type hierarchies. They do use 
the GC by default, unlike structs, but they're kind of handy, especially 
here.

Then you have a SqlConnection interface that most people use all the time 
and all people use most of the time. If you explicitly need some 
connection properties that are specific to Postgres, you cast to a 
PostgresConnection.

Jan 01 2017

Adam Wilson <flyboynw gmail.com> writes:

Chris Wright wrote:
 On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
 	- SqlConnection (MSSQL)
 	- NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you switch
 data-stores

 You *can* use classes and interfaces and type hierarchies. They do use
 the GC by default, unlike structs, but they're kind of handy, especially
 here.

That was my intention, the knee-jerk reaction that class and interfaces 
get here sometimes strikes me as a bit histrionic sometimes. They are a 
tool with a use case. :)

 Then you have a SqlConnection interface that most people use all the time
 and all people use most of the time. If you explicitly need some
 connection properties that are specific to Postgres, you cast to a
 PostgresConnection.

That is pretty much how it works in ADO.NET and JDO. And I think it 
works well.

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 01 2017

Jacob Carlborg <doob me.com> writes:

On 2017-01-02 02:34, Adam Wilson wrote:

 That was my intention, the knee-jerk reaction that class and interfaces
 get here sometimes strikes me as a bit histrionic sometimes. They are a
 tool with a use case. :)

I think that the design should try to avoid classes as much as possible 
for things that will be frequently created. It's always possible to wrap 
a struct in a class, the other way around is a bit more difficult. But 
when it comes to the connection object I think it's fine to use classes 
since it will most likely only be created once per thread.

-- 
/Jacob Carlborg

Jan 02 2017

Adam Wilson <flyboynw gmail.com> writes:

On 1/2/17 12:09 AM, Jacob Carlborg wrote:
 On 2017-01-02 02:34, Adam Wilson wrote:

 That was my intention, the knee-jerk reaction that class and interfaces
 get here sometimes strikes me as a bit histrionic sometimes. They are a
 tool with a use case. :)

 I think that the design should try to avoid classes as much as possible
 for things that will be frequently created. It's always possible to wrap
 a struct in a class, the other way around is a bit more difficult. But
 when it comes to the connection object I think it's fine to use classes
 since it will most likely only be created once per thread.

Ok. How would you design a database API for D?

The requirements I am operating under are:

1. Individual data-store driver implementations are not included in the 
D Standard Library. Driver licensing and implementation details vary. 
For example libpq5 uses it's own mix of licenses that is not Boost 
compatible.

2. The D Standard Library provides a common API and implementation of 
shared components, but leaves the data-store specific implementation up 
to the implementer. We don't care how the implementation is constructed 
or licensed, only that the API is followed.

3. In order to support higher level abstractions like ORM's we need a 
base class model that can be extended by implementers but still consumed 
by the ORM without knowing implementation specific details.

As far as I am aware, the only way to meet those requirements is to use 
a base-class model. Is there something I am missing?

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

Jan 02 2017

Jacob Carlborg <doob me.com> writes:

On 2017-01-03 06:25, Adam Wilson wrote:

 Ok. How would you design a database API for D?

I don't know. I think it's difficult to design something upfront without 
trying out different API's to see what's possible to implement in code.

Structs and functions, with or without templates. Could something like 
this work:

module db_interface;

version (Postgres)
     public import pg.db_interface;
else version (MySQL)
     public import mysql.db_interface;

static assert(isInterfaceImplemented, "The DB interface is not 
implemented");


-- 
/Jacob Carlborg

Jan 02 2017

Chris Wright <dhasenan gmail.com> writes:

On Tue, 03 Jan 2017 08:25:55 +0100, Jacob Carlborg wrote:

 Structs and functions, with or without templates. Could something like
 this work:
 
 module db_interface;
 
 version (Postgres)
      public import pg.db_interface;
 else version (MySQL)
      public import mysql.db_interface;

You are unable to interact with two different databases in the same 
executable using the same library. For instance, if you're using 
hibernated, either you compiled it to connect to mysql, or you compiled 
it to connect to oracle.

This means you can't, for instance, use mysql for the CI server (because 
it's open source and doesn't have licensing fees), then use oracle for 
production (because it's faster for your workflow), because then you're 
testing with a different binary. You can't have some data in postgres and 
some in SQL Server because you're in the middle of a migration.

You can still use both if you are using the database interface directly. 
But if you're connecting via a library, you're SOL.

You have to recompile everything whenever you switch databases. That's a 
barrier to proprietary libraries that interact with databases. They're 
not impossible, but they have to release separate binaries for every 
database the maintainer thinks you might want to connect to.

Every library that lets you access a database must maintain a list of db 
drivers that it supports. If you have a new or private driver you want to 
use, you need to modify any library you use that talks to a database.

In exchange, you get...slightly less GC usage. It's not *no* GC usage -- 
you'll see a bunch of buffers allocated to hold incoming and outgoing 
messages. You'll just peel back one layer of it.

You'd be much better off asking that we encourage the use of 
std.experimental.allocator in the driver interface.

Jan 03 2017

Jacob Carlborg <doob me.com> writes:

On 2017-01-03 09:38, Chris Wright wrote:

 You are unable to interact with two different databases in the same
 executable using the same library. For instance, if you're using
 hibernated, either you compiled it to connect to mysql, or you compiled
 it to connect to oracle.

That's true. And that's why I said it's difficult to design an API 
without trying it in code :)

 In exchange, you get...slightly less GC usage. It's not *no* GC usage --
 you'll see a bunch of buffers allocated to hold incoming and outgoing
 messages. You'll just peel back one layer of it.

1. I hope there won't be that many buffers in the API, at least not in 
the user facing API

2. Buffers say nothing how they're allocated. With classes on the other 
hand, you're basically forced to allocate with the GC

 You'd be much better off asking that we encourage the use of
 std.experimental.allocator in the driver interface.

Then I'll ask for that as well :)

-- 
/Jacob Carlborg

Jan 03 2017

Chris Wright <dhasenan gmail.com> writes:

On Tue, 03 Jan 2017 13:23:55 +0100, Jacob Carlborg wrote:

 On 2017-01-03 09:38, Chris Wright wrote:
 
 You are unable to interact with two different databases in the same
 executable using the same library. For instance, if you're using
 hibernated, either you compiled it to connect to mysql, or you compiled
 it to connect to oracle.

 
 That's true. And that's why I said it's difficult to design an API
 without trying it in code :)

I didn't try it in code.

 In exchange, you get...slightly less GC usage. It's not *no* GC usage
 --
 you'll see a bunch of buffers allocated to hold incoming and outgoing
 messages. You'll just peel back one layer of it.

 
 1. I hope there won't be that many buffers in the API, at least not in
 the user facing API

The returned row data is mandatory, and its size can be much larger than 
the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack 
limit.)

I suppose you could have a streaming API for row data, one that has a 
stack-allocated buffer and returns slices of that:

  string fieldName;
  ubyte[] data;
  ubyte[][string] fields;
  db.query("SELECT * FROM USERS")
    // have to revisit this if a db allows large names
    .onFieldStart((fieldName) => field = fieldName)
    .onFieldData((fragment) => data ~= fragment)
    .onFieldEnd(() { fields[field] = data; data = null; })
    .onRowEnd(() => process(fields))
    .onResultsEnd!(() => writeln("done"))
    .exec();

This looks pretty terrible, to be honest. I get this sort of thing from 
nodejs because it doesn't want to potentially block and also doesn't want 
to delay letting me process things, but the worst I get there is usually 
two callbacks.

This would also result in more GC use for the majority of people who use 
the GC.

 2. Buffers say nothing how they're allocated. With classes on the other
 hand, you're basically forced to allocate with the GC

You haven't looked at std.experimental.allocator, have you?

http://dpldocs.info/experimental-docs/std.conv.emplace.3.html
http://dpldocs.info/experimental-docs/std.experimental.allocator.make.html
http://dpldocs.info/experimental-docs/
std.experimental.allocator.dispose.2.html

Jan 03 2017

Jacob Carlborg <doob me.com> writes:

On 2017-01-03 18:13, Chris Wright wrote:

 The returned row data is mandatory, and its size can be much larger than
 the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack
 limit.)

 I suppose you could have a streaming API for row data, one that has a
 stack-allocated buffer and returns slices of that:

   string fieldName;
   ubyte[] data;
   ubyte[][string] fields;
   db.query("SELECT * FROM USERS")
     // have to revisit this if a db allows large names
     .onFieldStart((fieldName) => field = fieldName)
     .onFieldData((fragment) => data ~= fragment)
     .onFieldEnd(() { fields[field] = data; data = null; })
     .onRowEnd(() => process(fields))
     .onResultsEnd!(() => writeln("done"))
     .exec();

 This looks pretty terrible, to be honest. I get this sort of thing from
 nodejs because it doesn't want to potentially block and also doesn't want
 to delay letting me process things, but the worst I get there is usually
 two callbacks.

 This would also result in more GC use for the majority of people who use
 the GC.

Look, I didn't say that using the GC should be completely forbidden. I 
just said we should try to avoid it. For example, I've been using the 
ddb Postgres driver [1]. It uses classes for most of its types, even if 
it might not be necessary. Here's one example [2], unless there some 
intention to have some form of higher level, DB independent, API on top 
of this, I don't see a reason why that type needs to be a class.

 2. Buffers say nothing how they're allocated. With classes on the other
 hand, you're basically forced to allocate with the GC

 You haven't looked at std.experimental.allocator, have you?

I know it's possible to allocate a class without the GC, hence the 
"basically". I'm not sure how other write their code but at least I make 
the assumption that all objects are allocated with the GC.

[1] https://github.com/pszturmaj/ddb
[2] https://github.com/pszturmaj/ddb/blob/master/source/ddb/postgres.d#L904

-- 
/Jacob Carlborg

Jan 03 2017

Chris Wright <dhasenan gmail.com> writes:

On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:
 As far as I am aware, the only way to meet those requirements is to use
 a base-class model. Is there something I am missing?

Templates. Templates everywhere.

Every method in your application that might possibly touch a database, or 
touch anything that touches a database, and so on, needs to be templated 
according to what type of database might be used.

Jan 03 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Tuesday, 3 January 2017 at 08:09:54 UTC, Chris Wright wrote:
 On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:
 As far as I am aware, the only way to meet those requirements 
 is to use a base-class model. Is there something I am missing?

 Templates. Templates everywhere.

 Every method in your application that might possibly touch a 
 database, or touch anything that touches a database, and so on, 
 needs to be templated according to what type of database might 
 be used.

That limits you to one DB per compilation or craploads of 
template bloat.

There are a number variables here: the number of DB backends you 
wish to support (b), the number of DB backends you actually use 
at runtime (r), the number of symbols (not quite the word I'm 
looking for but, oh well) you need to represent an abstract 
backend API (s),the number of class types you use to abstract the 
backend (c) and the number of template you use to abstract the 
back end (t).

b is ideally fixed at "all the backends"
r is variable and dependent on the application (e.g. I may only 
care for Postgres, but someone else may wish to support many SQL 
DBs). If r == 1 then a template approach is acceptable.

s is a function of the dissimilarity of the backends you wish to 
support. Breaking the problem up into SQL like, graph-like and 
KV-store is a tradeoff somewhere between having "one DB 
(interface) to rule them all" and one interface for each backend.

c + t = s

What this represents is a tradeoff between compile time dispatch 
and runtime dispatch. As s moves from being all classes to more 
templates + structs (from the "bottom up"), the last layer of 
dynamic dispatch before the static dispatch of the templates 
becomes an algebraic type selection (i.e. check the tag, choose 
the type, and then static dispatch).

I believe the sweet spot for this lies at the point where the 
dissimilarity of similar backends becomes apparent after the 
start of a logical operation. Or put another way the point where 
I know the result that I want and no longer care about any 
implementation details.

As an example using a compute API (sorry I don't know much about 
DBs): launching a kernel represents a single logical operation 
but is in fact many driver calls. If one wishes to abstract the 
compute API then this point becomes the point I would choose.

Finding those points will probably not be easy and may be 
different for different people, but it is worth considering.

</ramble>

Jan 03 2017

Mark <smarksc gmail.com> writes:

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 2. There are so many different types of data storage systems, 
 how do you design a system generic enough for all of them?

 My answer: You don't. Nobody else has bothered trying, and I 
 believe that our worry over that question is a large part of 
 why we don't have anything substantive today.

 My idea: Split the data storage systems out by category of 
 data-store.
 For example:
 	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
 	- Document: std.database.document (Mongo, CouchDB, etc.)
 	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

I think that these can all be seen as special cases of a 
hypegraph database. So on the face of it you probably can build 
some common interface (apparently the fellows in hypergraphdb.org 
are trying to do something in this spirit). You can then have 
specialized interfaces inheriting from it. But given how general 
hypergraph dbs are,  I'm not sure if this is a worthwhile 
abstraction.

By the way, what about XML? The documentaion on std.xml says that 
the module will be replaced at some point in the future. I wonder 
when and with what...

Jan 01 2017

Adam Wilson <flyboynw gmail.com> writes:

Mark wrote:
 On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 2. There are so many different types of data storage systems, how do
 you design a system generic enough for all of them?

 My answer: You don't. Nobody else has bothered trying, and I believe
 that our worry over that question is a large part of why we don't have
 anything substantive today.

 My idea: Split the data storage systems out by category of data-store.
 For example:
     - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
     - Document: std.database.document (Mongo, CouchDB, etc.)
     - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

 I think that these can all be seen as special cases of a hypegraph
 database. So on the face of it you probably can build some common
 interface (apparently the fellows in hypergraphdb.org are trying to do
 something in this spirit). You can then have specialized interfaces
 inheriting from it. But given how general hypergraph dbs are,  I'm not
 sure if this is a worthwhile abstraction.

My experience with graph DB's is that the reality has never been 
anywhere close to the hype. I don't think it's a worthwhile abstraction. 
But that is my opinion. And if it is, we should be able to add it over 
the top of this layer if we really want too.

 By the way, what about XML? The documentaion on std.xml says that the
 module will be replaced at some point in the future. I wonder when and
 with what...

I have no idea. It's a great question! That said it's a bit outside the 
scope of this topic. :)

-- 
Adam Wilson
IRC: LightBender
//quiet.dlang.dev

Jan 01 2017

Dejan Lekic <dejan.lekic gmail.com> writes:

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 Hi Everyone,

 I've seen a lot of talk on the forums over the past year about 
 the need for database support in the D Standard Library and I 
 completely agree. At the end of the day the purpose of any 
 programming language and its attendant libraries is to allow 
 the developer to solve their problems quickly and efficiently; 
 and a large subset of those solutions require some form of 
 structured data store. To my mind, this makes some form of 
 interface(s) to a data-store an essential component of the D 
 Standard Library. And since this is something that my 
 particular problem spaces also need, I thought it would be 
 useful to attempt to do something about it.


The only thing I want, database related, in the standard library 
is the API! - Nothing else! There should be a standard 
implementation of that API (libd-db.so for an example), but it 
should be separated from Phobos. In general, Phobos should only 
contain the APIs in my humble opinion.
We should handle XML processing the same way (API in Phobos, 
libd-xml.so for the reference implementation), Image processing 
the same way, GUI, etc...

Why? Phobos is enormous already!

Jan 04 2017

D Programming

C/C++ Programming

Other

digitalmars.D - Databases and the D Standard Library