digitalmars.D - Databases and the D Standard Library
- Adam Wilson (107/107) Dec 31 2016 Hi Everyone,
- rikki cattermole (7/7) Dec 31 2016 We do indeed need a good database abstraction.
- Adam Wilson (6/13) Dec 31 2016 Or, alternatively, use existing drivers that have this capability built ...
- rikki cattermole (4/15) Dec 31 2016 Which is fine if all you use is c's sockets or only that database
- Adam D. Ruppe (4/6) Dec 31 2016 The C drivers typically offer handles of some sort (Windows
- rikki cattermole (4/10) Dec 31 2016 That's fine and all, but you've still got to deal with it on D's side so...
- Adam Wilson (8/19) Jan 01 2017 Vibe.d is working on a native D event loop. We would probably want to
- rikki cattermole (6/22) Jan 01 2017 No, it isn't generic enough.
- Adam Wilson (9/35) Jan 01 2017 How much effort to make the changes? Their plans seem to indicate that
- rikki cattermole (31/68) Jan 01 2017 Read and compare the code.
- Adam D. Ruppe (8/10) Dec 31 2016 Eh, I count it as would-be-nice just because it isn't that hard
- Adam Wilson (10/19) Jan 01 2017 On that I beg to differ. The C libraries are not @safe, they have wildly...
- Chris Wright (4/7) Jan 02 2017 If we have the database interface defined, there's no reason we couldn't...
- Adam Wilson (6/13) Jan 02 2017 Precisely, I would love to enable this!
- Chris Wright (17/29) Dec 31 2016 This is doable; SQL is an ANSI and ISO standard, and it strongly
- Adam Wilson (11/40) Jan 01 2017 Given that the Cassandra folks wrote an ADO.NET provider for it, I would...
- Jacob Carlborg (7/14) Jan 01 2017 I don't think we should try to make implementations different just
- Chris Wright (8/24) Jan 01 2017 Those both limit your ability to use the underlying database to its full...
- Adam Wilson (9/33) Jan 01 2017 I cannot state my agreement with this paragraph enough. Every ORM I've
- Jacob Carlborg (12/15) Jan 02 2017 I'm perfectly aware of the limitations and capabilities of ORM's. I'm
- Adam Wilson (9/22) Jan 02 2017 Is there a assumption here that there are no classes? Because and ORM
- Chris Wright (8/15) Jan 01 2017 You *can* use classes and interfaces and type hierarchies. They do use
- Adam Wilson (10/25) Jan 01 2017 That was my intention, the knee-jerk reaction that class and interfaces
- Jacob Carlborg (8/11) Jan 02 2017 I think that the design should try to avoid classes as much as possible
- Adam Wilson (20/29) Jan 02 2017 Ok. How would you design a database API for D?
- Jacob Carlborg (14/15) Jan 02 2017 I don't know. I think it's difficult to design something upfront without...
- Chris Wright (24/33) Jan 03 2017 You are unable to interact with two different databases in the same
- Jacob Carlborg (10/19) Jan 03 2017 That's true. And that's why I said it's difficult to design an API
- Chris Wright (29/47) Jan 03 2017 The returned row data is mandatory, and its size can be much larger than...
- Jacob Carlborg (14/39) Jan 03 2017 Look, I didn't say that using the GC should be completely forbidden. I
- Chris Wright (5/7) Jan 03 2017 Templates. Templates everywhere.
- Nicholas Wilson (37/45) Jan 03 2017 That limits you to one DB per compilation or craploads of
- Mark (11/22) Jan 01 2017 I think that these can all be seen as special cases of a
- Adam Wilson (11/33) Jan 01 2017 My experience with graph DB's is that the reality has never been
- Dejan Lekic (10/22) Jan 04 2017 The only thing I want, database related, in the standard library
Hi Everyone, I've seen a lot of talk on the forums over the past year about the need for database support in the D Standard Library and I completely agree. At the end of the day the purpose of any programming language and its attendant libraries is to allow the developer to solve their problems quickly and efficiently; and a large subset of those solutions require some form of structured data store. To my mind, this makes some form of interface(s) to a data-store an essential component of the D Standard Library. And since this is something that my particular problem spaces also need, I thought it would be useful to attempt to do something about it. First, I've seen a couple of promising projects, the most complete, and recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb) hasn't seen a commit since June. An additional setback came when I tried to use it and was greeted with a litany of compiler errors. This is *not* a problem, it's the natural course of a volunteer community such as ours; and I want to thank Erik Smith profusely for his work. Priorities and circumstances change and that means that valuable projects are inexplicably dropped. But we still lack a critical component, and to get the conversation started, I'd like to break down the issues I've seen brought up in past threads on this subject and encourage you to bring your own. I may have ideas, but I can't possibly know the entire problem space. 1. Isn't this an enormous amount of work? My answer: Absolutely, depending on your preferred scope of work. In general, I've seen two distinct camps on this issue. One says that we should implement everything in D from the ground up, including re-implementation of the database drivers themselves in D. If this is your preferred scope of work then you will invariably become disheartened at the truly stupendous amount of work you face and give up. The other camp says that we should make use of existing drivers and include them in the D Standard Library. This is difficult path to follow as the vanilla build of the D Standard Library now requires a significant number of foreign libraries, all with differing licenses, be built and distributed to everyone; regardless of whether or not they use them in their project. This is more-or-less than path the dstddb is/was on. My idea: Focus on defining the interface, not the individual driver implementations. If instead we focused on defining an interface that a "conforming implementation" had to follow, we would allow developers to only pull in the library they need or build a from-scratch library if they so desire. Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow and I think we would be well advised to follow their lead here. Not only is the methodology battle-proven, it is also well understood by a significant portion of D's potential user-base. By way of example, Npgsql is the ADO.NET implementation of a driver for PostgreSQL. 2. There are so many different types of data storage systems, how do you design a system generic enough for all of them? My answer: You don't. Nobody else has bothered trying, and I believe that our worry over that question is a large part of why we don't have anything substantive today. My idea: Split the data storage systems out by category of data-store. For example: - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.) - Document: std.database.document (Mongo, CouchDB, etc.) - Key-Value: std.database.keyvalue (Redis, etcd2, etc.) If you want something that doesn't fit into a category above, you're own your own, but you were also on your own in other languages. 3. We need to provide a single interface for all data-stores in the SQL/Document/Key-Value category. My answer: Are you sure? The problem is that each underlying data-store has it's own dialect. For example, PostgreSQL and MSSQL are both ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores, even in the same category, is never going to be as simple as changing a connection string. And additionally, you will have to implement a super-set of features in the interface to support all the variations and throw exceptions where the chosen implementation does not support a specific feature. My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET): - SqlConnection (MSSQL) - NpgsqlConnection (Npgsql) Yes, this means that you have to change names in your code if you switch data-stores, but since you are already changing your queries, which is a much more difficult change, this isn't a significant additional cost. Also, the code becomes clearer to those who take over maintenance duties from the original author, especially when you are mixing data-stores. But in all honest, most developers will pick on technology and stick with it for the entirety of the software's lifespan. 4. We should hide querying from the developer because they are bad at it, security flaws, etc. My answer: While agree in principal, especially with the security concerns, in reality what you are asking for is an ORM. In my opinion, that is a separate concern from a database interface, and is typically implemented as layer over the DB interface. My idea: Don't do it. Save it for a different project. 5. D has so many useful features for data access, we should use as many as possible! My answer: D absolutely has many useful features for data access and manipulation. But that doesn't mean that a good interface has to use any of them. The first job of a Database Interface, and indeed any library, is to get the job done with a minimum of overhead. Let's worry about that before going crazy adding in all the D goodness. Ranges have been a particular target for abuse here, and while I love ranges, I think the mechanics of data-store manipulation don't lend themselves well to working with ranges. I'd love to hear your ideas on this though. My idea: Focus on a more conservative implementation in the style of JDO or ADO.NET. This will allow us to ship something that works in a reasonable time frame. I'm not saying that we can't use any of D's unique talents, but using those talents should be subordinate to designing an interface that works efficiently. That is all I have for now. I am looking forward to hear your thoughts on this topic! Until then, I am going to go close out 2016 (PST) with family and friends and I wish you all a Happy New Year! -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Dec 31 2016
We do indeed need a good database abstraction. But a core requirement for any implementation has yet to be met. There has to be a standard way for asynchronous sockets. To implement this we need to take into consideration the event loop that it uses and more importantly allow it to be integrated for e.g. windowing. So here is a dependency before we get a database abstraction into Phobos, a nice fast event loop manager that is generic.
Dec 31 2016
On 12/31/16 7:31 PM, rikki cattermole wrote:We do indeed need a good database abstraction. But a core requirement for any implementation has yet to be met. There has to be a standard way for asynchronous sockets. To implement this we need to take into consideration the event loop that it uses and more importantly allow it to be integrated for e.g. windowing. So here is a dependency before we get a database abstraction into Phobos, a nice fast event loop manager that is generic.Or, alternatively, use existing drivers that have this capability built in? -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Dec 31 2016
On 01/01/2017 4:46 PM, Adam Wilson wrote:On 12/31/16 7:31 PM, rikki cattermole wrote:Which is fine if all you use is c's sockets or only that database connection for a thread. Which is not very realistic for game or web development.We do indeed need a good database abstraction. But a core requirement for any implementation has yet to be met. There has to be a standard way for asynchronous sockets. To implement this we need to take into consideration the event loop that it uses and more importantly allow it to be integrated for e.g. windowing. So here is a dependency before we get a database abstraction into Phobos, a nice fast event loop manager that is generic.Or, alternatively, use existing drivers that have this capability built in?
Dec 31 2016
On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:Which is fine if all you use is c's sockets or only that database connection for a thread.The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
Dec 31 2016
On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).Which is fine if all you use is c's sockets or only that database connection for a thread.The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
Dec 31 2016
rikki cattermole wrote:On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcore -- Adam Wilson IRC: LightBender import quiet.dlang.dev;On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).Which is fine if all you use is c's sockets or only that database connection for a thread.The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
Jan 01 2017
On 02/01/2017 3:03 PM, Adam Wilson wrote:rikki cattermole wrote:No, it isn't generic enough. Nor can it handle windowing without a good bit of modifications. Mine in SPEW[0] is however ready for this task. [0] https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loopOn 01/01/2017 5:19 PM, Adam D. Ruppe wrote:Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcoreOn Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).Which is fine if all you use is c's sockets or only that database connection for a thread.The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
Jan 01 2017
rikki cattermole wrote:On 02/01/2017 3:03 PM, Adam Wilson wrote:How much effort to make the changes? Their plans seem to indicate that they want to support UI integration. I ask because I need to use the library that is going to get the most support over time and that is vibe.d right now. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;rikki cattermole wrote:No, it isn't generic enough. Nor can it handle windowing without a good bit of modifications. Mine in SPEW[0] is however ready for this task. [0] https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loopOn 01/01/2017 5:19 PM, Adam D. Ruppe wrote:Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcoreOn Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).Which is fine if all you use is c's sockets or only that database connection for a thread.The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
Jan 01 2017
On 02/01/2017 4:44 PM, Adam Wilson wrote:rikki cattermole wrote:Read and compare the code. To add anything into the Vibe.d one requires direct modification which is unacceptable for Phobos IMO. My aim for SPEW was to be as close as glib's[0] as reasonably possible. A well tested set of features. I'm sorry but this is just not acceptable[1] for an event loop: interface EventDriver { safe: /* nogc:*/ nothrow: property EventDriverCore core(); property EventDriverTimers timers(); property EventDriverEvents events(); property EventDriverSignals signals(); property EventDriverSockets sockets(); property EventDriverDNS dns(); property EventDriverFiles files(); property EventDriverWatchers watchers(); /// Releases all resources associated with the driver void dispose(); } Its fine for a web framework, aka a specific task but not when its generic. When we're discussing a generic event loop it shouldn't care about the different usage of it. All it knows is that there are events that come from sources and then mapped to a consumer. It isn't the most performant but that is ok. You would use a different implementation depending on your use case e.g. 1:1 is easy enough to do without a event loop manager. Just so you're aware, windowing is extremely hard to get right. Sockets, DNS, signals and timers ext. are easy to implement compared. [0] https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html [1] https://github.com/vibe-d/eventcore/blob/master/source/eventcore/driver.dOn 02/01/2017 3:03 PM, Adam Wilson wrote:How much effort to make the changes? Their plans seem to indicate that they want to support UI integration. I ask because I need to use the library that is going to get the most support over time and that is vibe.d right now.rikki cattermole wrote:No, it isn't generic enough. Nor can it handle windowing without a good bit of modifications. Mine in SPEW[0] is however ready for this task. [0] https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loopOn 01/01/2017 5:19 PM, Adam D. Ruppe wrote:Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcoreOn Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).Which is fine if all you use is c's sockets or only that database connection for a thread.The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
Jan 01 2017
On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:interface(s) to a data-store an essential component of the D Standard Library.Eh, I count it as would-be-nice just because it isn't that hard to just use the C ones, or another third party lib; it doesn't have to be Phobos itself. That said though, a basic db interface is quite simple and would be a nice batteries included bit - it is one of the things IMO that PHP did quite successfully (even if its interface sucked, it still just worked)
Dec 31 2016
Adam D. Ruppe wrote:On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:On that I beg to differ. The C libraries are not safe, they have wildly different API's, and they have high-complexity, which is a large risk-factor for bugs and/or security flaws. Any place where we caninterface(s) to a data-store an essential component of the D Standard Library.Eh, I count it as would-be-nice just because it isn't that hard to just use the C ones, or another third party lib; it doesn't have to be Phobos itself.That said though, a basic db interface is quite simple and would be a nice batteries included bit - it is one of the things IMO that PHP did quite successfully (even if its interface sucked, it still just worked)I'll agree that it isn't hard, and I think including it would help boost D's usability in the web service space. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01 2017
On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:On that I beg to differ. The C libraries are not safe, they have wildly different API's, and they have high-complexity, which is a large risk-factor for bugs and/or security flaws.If we have the database interface defined, there's no reason we couldn't provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d driver.
Jan 02 2017
On 1/2/17 8:33 AM, Chris Wright wrote:On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:Precisely, I would love to enable this! -- Adam Wilson IRC: LightBender import quiet.dlang.dev;On that I beg to differ. The C libraries are not safe, they have wildly different API's, and they have high-complexity, which is a large risk-factor for bugs and/or security flaws.If we have the database interface defined, there's no reason we couldn't provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d driver.
Jan 02 2017
On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:My idea: Split the data storage systems out by category of data-store. For example: - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)This is doable; SQL is an ANSI and ISO standard, and it strongly constrains what you can do with your data.- Document: std.database.document (Mongo, CouchDB, etc.) - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)I'm not so certain about this. CouchDB has a rather different approach to things than MongoDB -- possibly not as divergent as Mongo from MySQL, but far more than Postgres from MySQL. Likewise, there are many key/value stores in existence, and they support many different operations. For instance, it looks like etcd2 has a notion of directories, where you can list items in a directory. Redis just lets you list keys with a given prefix. Redis lets you modify values in-place; etcd2 doesn't. We could define a common subset of operations for document databases and key/value stores, but most people probably wouldn't be satisfied with it. There's also a question of where you'd put Cassandra in that, since it's decidedly not a SQL database but tries to pretend it is.4. We should hide querying from the developer because they are bad at it, security flaws, etc. My answer: While agree in principal, especially with the security concerns, in reality what you are asking for is an ORM. In my opinion, that is a separate concern from a database interface, and is typically implemented as layer over the DB interface.We can encourage people to use prepared queries with documentation and naming.
Dec 31 2016
Chris Wright wrote:On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:Given that the Cassandra folks wrote an ADO.NET provider for it, I would suggest that it is easiest to treat it as a SQL database from an interface standpoint.My idea: Split the data storage systems out by category of data-store. For example: - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)This is doable; SQL is an ANSI and ISO standard, and it strongly constrains what you can do with your data.- Document: std.database.document (Mongo, CouchDB, etc.) - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)I'm not so certain about this. CouchDB has a rather different approach to things than MongoDB -- possibly not as divergent as Mongo from MySQL, but far more than Postgres from MySQL. Likewise, there are many key/value stores in existence, and they support many different operations. For instance, it looks like etcd2 has a notion of directories, where you can list items in a directory. Redis just lets you list keys with a given prefix. Redis lets you modify values in-place; etcd2 doesn't. We could define a common subset of operations for document databases and key/value stores, but most people probably wouldn't be satisfied with it. There's also a question of where you'd put Cassandra in that, since it's decidedly not a SQL database but tries to pretend it is.Precisely, my focus would be an making the API as easy as possible to use with Parameterized Queries, and if that makes it harder to write non-parameterized queries, oh well. :D -- Adam Wilson IRC: LightBender import quiet.dlang.dev;4. We should hide querying from the developer because they are bad at it, security flaws, etc. My answer: While agree in principal, especially with the security concerns, in reality what you are asking for is an ORM. In my opinion, that is a separate concern from a database interface, and is typically implemented as layer over the DB interface.We can encourage people to use prepared queries with documentation and naming.
Jan 01 2017
On 2017-01-01 04:24, Adam Wilson wrote:My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET): - SqlConnection (MSSQL) - NpgsqlConnection (Npgsql) Yes, this means that you have to change names in your code if you switch data-stores, but since you are already changing your queries, which is a much more difficult change, this isn't a significant additional cost.I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason. -- /Jacob Carlborg
Jan 01 2017
On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:On 2017-01-01 04:24, Adam Wilson wrote:Those both limit your ability to use the underlying database to its full potential. They offer a chance for queries that seem simple and efficient to become horribly inefficient. I ran across a problem in NHibernate about a decade ago. We had a straightforward HQL query involving joins. It took over a minute to run. We wrote the simple equivalent in SQL and it completed in milliseconds. Fortunately, NHibernate had the ability to run raw SQL queries.My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET): - SqlConnection (MSSQL) - NpgsqlConnection (Npgsql) Yes, this means that you have to change names in your code if you switch data-stores, but since you are already changing your queries, which is a much more difficult change, this isn't a significant additional cost.I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason.
Jan 01 2017
Chris Wright wrote:On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:I cannot state my agreement with this paragraph enough. Every ORM I've worked with generates some inexplicably horrific SQL in seemingly simple situations.On 2017-01-01 04:24, Adam Wilson wrote:Those both limit your ability to use the underlying database to its full potential. They offer a chance for queries that seem simple and efficient to become horribly inefficient.My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET): - SqlConnection (MSSQL) - NpgsqlConnection (Npgsql) Yes, this means that you have to change names in your code if you switch data-stores, but since you are already changing your queries, which is a much more difficult change, this isn't a significant additional cost.I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason.I ran across a problem in NHibernate about a decade ago. We had a straightforward HQL query involving joins. It took over a minute to run. We wrote the simple equivalent in SQL and it completed in milliseconds. Fortunately, NHibernate had the ability to run raw SQL queries.I've also seen Entity Framework 6/7 do the same thing. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01 2017
On 2017-01-01 17:50, Chris Wright wrote:Those both limit your ability to use the underlying database to its full potential. They offer a chance for queries that seem simple and efficient to become horribly inefficient.I'm perfectly aware of the limitations and capabilities of ORM's. I'm just saying that making the interface/names different just to make it different is not a good idea. It should be up to the user to choose if an ORM is used or not and this interface should try to, as much as possible, to make it possible to use an ORM just as well as not using an ORM. This whole idea seems fail even before it's barely stared. If this idea is going to work then all the layers need to be designed correctly and the lower layers should not know anything about the higher layers. -- /Jacob Carlborg
Jan 02 2017
On 1/2/17 12:05 AM, Jacob Carlborg wrote:On 2017-01-01 17:50, Chris Wright wrote:Is there a assumption here that there are no classes? Because and ORM could quite easily work with base classes, and indeed both NHibernate and EntityFramework function exactly this way.Those both limit your ability to use the underlying database to its full potential. They offer a chance for queries that seem simple and efficient to become horribly inefficient.I'm perfectly aware of the limitations and capabilities of ORM's. I'm just saying that making the interface/names different just to make it different is not a good idea. It should be up to the user to choose if an ORM is used or not and this interface should try to, as much as possible, to make it possible to use an ORM just as well as not using an ORM.This whole idea seems fail even before it's barely stared. If this idea is going to work then all the layers need to be designed correctly and the lower layers should not know anything about the higher layers.I absolutely agree, which, ironically, is why I am having this conversation. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 02 2017
On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET): - SqlConnection (MSSQL) - NpgsqlConnection (Npgsql) Yes, this means that you have to change names in your code if you switch data-storesYou *can* use classes and interfaces and type hierarchies. They do use the GC by default, unlike structs, but they're kind of handy, especially here. Then you have a SqlConnection interface that most people use all the time and all people use most of the time. If you explicitly need some connection properties that are specific to Postgres, you cast to a PostgresConnection.
Jan 01 2017
Chris Wright wrote:On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:That was my intention, the knee-jerk reaction that class and interfaces get here sometimes strikes me as a bit histrionic sometimes. They are a tool with a use case. :)My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET): - SqlConnection (MSSQL) - NpgsqlConnection (Npgsql) Yes, this means that you have to change names in your code if you switch data-storesYou *can* use classes and interfaces and type hierarchies. They do use the GC by default, unlike structs, but they're kind of handy, especially here.Then you have a SqlConnection interface that most people use all the time and all people use most of the time. If you explicitly need some connection properties that are specific to Postgres, you cast to a PostgresConnection.That is pretty much how it works in ADO.NET and JDO. And I think it works well. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01 2017
On 2017-01-02 02:34, Adam Wilson wrote:That was my intention, the knee-jerk reaction that class and interfaces get here sometimes strikes me as a bit histrionic sometimes. They are a tool with a use case. :)I think that the design should try to avoid classes as much as possible for things that will be frequently created. It's always possible to wrap a struct in a class, the other way around is a bit more difficult. But when it comes to the connection object I think it's fine to use classes since it will most likely only be created once per thread. -- /Jacob Carlborg
Jan 02 2017
On 1/2/17 12:09 AM, Jacob Carlborg wrote:On 2017-01-02 02:34, Adam Wilson wrote:Ok. How would you design a database API for D? The requirements I am operating under are: 1. Individual data-store driver implementations are not included in the D Standard Library. Driver licensing and implementation details vary. For example libpq5 uses it's own mix of licenses that is not Boost compatible. 2. The D Standard Library provides a common API and implementation of shared components, but leaves the data-store specific implementation up to the implementer. We don't care how the implementation is constructed or licensed, only that the API is followed. 3. In order to support higher level abstractions like ORM's we need a base class model that can be extended by implementers but still consumed by the ORM without knowing implementation specific details. As far as I am aware, the only way to meet those requirements is to use a base-class model. Is there something I am missing? -- Adam Wilson IRC: LightBender import quiet.dlang.dev;That was my intention, the knee-jerk reaction that class and interfaces get here sometimes strikes me as a bit histrionic sometimes. They are a tool with a use case. :)I think that the design should try to avoid classes as much as possible for things that will be frequently created. It's always possible to wrap a struct in a class, the other way around is a bit more difficult. But when it comes to the connection object I think it's fine to use classes since it will most likely only be created once per thread.
Jan 02 2017
On 2017-01-03 06:25, Adam Wilson wrote:Ok. How would you design a database API for D?I don't know. I think it's difficult to design something upfront without trying out different API's to see what's possible to implement in code. Structs and functions, with or without templates. Could something like this work: module db_interface; version (Postgres) public import pg.db_interface; else version (MySQL) public import mysql.db_interface; static assert(isInterfaceImplemented, "The DB interface is not implemented"); -- /Jacob Carlborg
Jan 02 2017
On Tue, 03 Jan 2017 08:25:55 +0100, Jacob Carlborg wrote:Structs and functions, with or without templates. Could something like this work: module db_interface; version (Postgres) public import pg.db_interface; else version (MySQL) public import mysql.db_interface;You are unable to interact with two different databases in the same executable using the same library. For instance, if you're using hibernated, either you compiled it to connect to mysql, or you compiled it to connect to oracle. This means you can't, for instance, use mysql for the CI server (because it's open source and doesn't have licensing fees), then use oracle for production (because it's faster for your workflow), because then you're testing with a different binary. You can't have some data in postgres and some in SQL Server because you're in the middle of a migration. You can still use both if you are using the database interface directly. But if you're connecting via a library, you're SOL. You have to recompile everything whenever you switch databases. That's a barrier to proprietary libraries that interact with databases. They're not impossible, but they have to release separate binaries for every database the maintainer thinks you might want to connect to. Every library that lets you access a database must maintain a list of db drivers that it supports. If you have a new or private driver you want to use, you need to modify any library you use that talks to a database. In exchange, you get...slightly less GC usage. It's not *no* GC usage -- you'll see a bunch of buffers allocated to hold incoming and outgoing messages. You'll just peel back one layer of it. You'd be much better off asking that we encourage the use of std.experimental.allocator in the driver interface.
Jan 03 2017
On 2017-01-03 09:38, Chris Wright wrote:You are unable to interact with two different databases in the same executable using the same library. For instance, if you're using hibernated, either you compiled it to connect to mysql, or you compiled it to connect to oracle.That's true. And that's why I said it's difficult to design an API without trying it in code :)In exchange, you get...slightly less GC usage. It's not *no* GC usage -- you'll see a bunch of buffers allocated to hold incoming and outgoing messages. You'll just peel back one layer of it.1. I hope there won't be that many buffers in the API, at least not in the user facing API 2. Buffers say nothing how they're allocated. With classes on the other hand, you're basically forced to allocate with the GCYou'd be much better off asking that we encourage the use of std.experimental.allocator in the driver interface.Then I'll ask for that as well :) -- /Jacob Carlborg
Jan 03 2017
On Tue, 03 Jan 2017 13:23:55 +0100, Jacob Carlborg wrote:On 2017-01-03 09:38, Chris Wright wrote:I didn't try it in code.You are unable to interact with two different databases in the same executable using the same library. For instance, if you're using hibernated, either you compiled it to connect to mysql, or you compiled it to connect to oracle.That's true. And that's why I said it's difficult to design an API without trying it in code :)The returned row data is mandatory, and its size can be much larger than the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack limit.) I suppose you could have a streaming API for row data, one that has a stack-allocated buffer and returns slices of that: string fieldName; ubyte[] data; ubyte[][string] fields; db.query("SELECT * FROM USERS") // have to revisit this if a db allows large names .onFieldStart((fieldName) => field = fieldName) .onFieldData((fragment) => data ~= fragment) .onFieldEnd(() { fields[field] = data; data = null; }) .onRowEnd(() => process(fields)) .onResultsEnd!(() => writeln("done")) .exec(); This looks pretty terrible, to be honest. I get this sort of thing from nodejs because it doesn't want to potentially block and also doesn't want to delay letting me process things, but the worst I get there is usually two callbacks. This would also result in more GC use for the majority of people who use the GC.In exchange, you get...slightly less GC usage. It's not *no* GC usage -- you'll see a bunch of buffers allocated to hold incoming and outgoing messages. You'll just peel back one layer of it.1. I hope there won't be that many buffers in the API, at least not in the user facing API2. Buffers say nothing how they're allocated. With classes on the other hand, you're basically forced to allocate with the GCYou haven't looked at std.experimental.allocator, have you? http://dpldocs.info/experimental-docs/std.conv.emplace.3.html http://dpldocs.info/experimental-docs/std.experimental.allocator.make.html http://dpldocs.info/experimental-docs/ std.experimental.allocator.dispose.2.html
Jan 03 2017
On 2017-01-03 18:13, Chris Wright wrote:The returned row data is mandatory, and its size can be much larger than the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack limit.) I suppose you could have a streaming API for row data, one that has a stack-allocated buffer and returns slices of that: string fieldName; ubyte[] data; ubyte[][string] fields; db.query("SELECT * FROM USERS") // have to revisit this if a db allows large names .onFieldStart((fieldName) => field = fieldName) .onFieldData((fragment) => data ~= fragment) .onFieldEnd(() { fields[field] = data; data = null; }) .onRowEnd(() => process(fields)) .onResultsEnd!(() => writeln("done")) .exec(); This looks pretty terrible, to be honest. I get this sort of thing from nodejs because it doesn't want to potentially block and also doesn't want to delay letting me process things, but the worst I get there is usually two callbacks. This would also result in more GC use for the majority of people who use the GC.Look, I didn't say that using the GC should be completely forbidden. I just said we should try to avoid it. For example, I've been using the ddb Postgres driver [1]. It uses classes for most of its types, even if it might not be necessary. Here's one example [2], unless there some intention to have some form of higher level, DB independent, API on top of this, I don't see a reason why that type needs to be a class.I know it's possible to allocate a class without the GC, hence the "basically". I'm not sure how other write their code but at least I make the assumption that all objects are allocated with the GC. [1] https://github.com/pszturmaj/ddb [2] https://github.com/pszturmaj/ddb/blob/master/source/ddb/postgres.d#L904 -- /Jacob Carlborg2. Buffers say nothing how they're allocated. With classes on the other hand, you're basically forced to allocate with the GCYou haven't looked at std.experimental.allocator, have you?
Jan 03 2017
On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:As far as I am aware, the only way to meet those requirements is to use a base-class model. Is there something I am missing?Templates. Templates everywhere. Every method in your application that might possibly touch a database, or touch anything that touches a database, and so on, needs to be templated according to what type of database might be used.
Jan 03 2017
On Tuesday, 3 January 2017 at 08:09:54 UTC, Chris Wright wrote:On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:That limits you to one DB per compilation or craploads of template bloat. There are a number variables here: the number of DB backends you wish to support (b), the number of DB backends you actually use at runtime (r), the number of symbols (not quite the word I'm looking for but, oh well) you need to represent an abstract backend API (s),the number of class types you use to abstract the backend (c) and the number of template you use to abstract the back end (t). b is ideally fixed at "all the backends" r is variable and dependent on the application (e.g. I may only care for Postgres, but someone else may wish to support many SQL DBs). If r == 1 then a template approach is acceptable. s is a function of the dissimilarity of the backends you wish to support. Breaking the problem up into SQL like, graph-like and KV-store is a tradeoff somewhere between having "one DB (interface) to rule them all" and one interface for each backend. c + t = s What this represents is a tradeoff between compile time dispatch and runtime dispatch. As s moves from being all classes to more templates + structs (from the "bottom up"), the last layer of dynamic dispatch before the static dispatch of the templates becomes an algebraic type selection (i.e. check the tag, choose the type, and then static dispatch). I believe the sweet spot for this lies at the point where the dissimilarity of similar backends becomes apparent after the start of a logical operation. Or put another way the point where I know the result that I want and no longer care about any implementation details. As an example using a compute API (sorry I don't know much about DBs): launching a kernel represents a single logical operation but is in fact many driver calls. If one wishes to abstract the compute API then this point becomes the point I would choose. Finding those points will probably not be easy and may be different for different people, but it is worth considering. </ramble>As far as I am aware, the only way to meet those requirements is to use a base-class model. Is there something I am missing?Templates. Templates everywhere. Every method in your application that might possibly touch a database, or touch anything that touches a database, and so on, needs to be templated according to what type of database might be used.
Jan 03 2017
On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:2. There are so many different types of data storage systems, how do you design a system generic enough for all of them? My answer: You don't. Nobody else has bothered trying, and I believe that our worry over that question is a large part of why we don't have anything substantive today. My idea: Split the data storage systems out by category of data-store. For example: - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.) - Document: std.database.document (Mongo, CouchDB, etc.) - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)I think that these can all be seen as special cases of a hypegraph database. So on the face of it you probably can build some common interface (apparently the fellows in hypergraphdb.org are trying to do something in this spirit). You can then have specialized interfaces inheriting from it. But given how general hypergraph dbs are, I'm not sure if this is a worthwhile abstraction. By the way, what about XML? The documentaion on std.xml says that the module will be replaced at some point in the future. I wonder when and with what...
Jan 01 2017
Mark wrote:On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:My experience with graph DB's is that the reality has never been anywhere close to the hype. I don't think it's a worthwhile abstraction. But that is my opinion. And if it is, we should be able to add it over the top of this layer if we really want too.2. There are so many different types of data storage systems, how do you design a system generic enough for all of them? My answer: You don't. Nobody else has bothered trying, and I believe that our worry over that question is a large part of why we don't have anything substantive today. My idea: Split the data storage systems out by category of data-store. For example: - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.) - Document: std.database.document (Mongo, CouchDB, etc.) - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)I think that these can all be seen as special cases of a hypegraph database. So on the face of it you probably can build some common interface (apparently the fellows in hypergraphdb.org are trying to do something in this spirit). You can then have specialized interfaces inheriting from it. But given how general hypergraph dbs are, I'm not sure if this is a worthwhile abstraction.By the way, what about XML? The documentaion on std.xml says that the module will be replaced at some point in the future. I wonder when and with what...I have no idea. It's a great question! That said it's a bit outside the scope of this topic. :) -- Adam Wilson IRC: LightBender //quiet.dlang.dev
Jan 01 2017
On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:Hi Everyone, I've seen a lot of talk on the forums over the past year about the need for database support in the D Standard Library and I completely agree. At the end of the day the purpose of any programming language and its attendant libraries is to allow the developer to solve their problems quickly and efficiently; and a large subset of those solutions require some form of structured data store. To my mind, this makes some form of interface(s) to a data-store an essential component of the D Standard Library. And since this is something that my particular problem spaces also need, I thought it would be useful to attempt to do something about it.The only thing I want, database related, in the standard library is the API! - Nothing else! There should be a standard implementation of that API (libd-db.so for an example), but it should be separated from Phobos. In general, Phobos should only contain the APIs in my humble opinion. We should handle XML processing the same way (API in Phobos, libd-xml.so for the reference implementation), Image processing the same way, GUI, etc... Why? Phobos is enormous already!
Jan 04 2017