digitalmars.D - std.database
- Erik Smith (33/33) Mar 01 2016 I'm back to actively working on a std.database specification &
- jmh530 (6/8) Mar 01 2016 Minor typo in the fluent style select section of readme.md.
- Erik Smith (15/15) Mar 01 2016 Typo fixed - thanks. Incidentally, I'm not 100% content with
- Stefan Koch (6/11) Mar 01 2016 Hi Erik,
- Erik Smith (23/23) Mar 01 2016 Hi Stefan,
- Rikki Cattermole (13/13) Mar 01 2016 Okay I've found a problem.
- Erik Smith (15/29) Mar 01 2016 Yes agree that the poly Database is broken - it isn't reference
- Rikki Cattermole (14/45) Mar 01 2016 Currently its use after free. E.g. destructor gets called but there is
- Erik Smith (12/70) Mar 01 2016 I will look at your managed approach to understand it better.
- landaire (3/17) Mar 02 2016 How is this a UAF? Isn't the struct copied?
- landaire (3/4) Mar 02 2016 Ah I think I misunderstood. You mean in the database, not the
- Rikki Cattermole (2/5) Mar 02 2016 Correct. My code was just to showcase the problem.
- Piotrek (20/30) Mar 02 2016 My quick comments:
- Erik Smith (26/40) Mar 02 2016 I agree that std.dbc would be more accurate, although I like
- Piotrek (12/21) Mar 03 2016 I agree with you we need database manipulation in Phobos. However
- Erik Smith (14/24) Mar 03 2016 Your process proposal (DIP73) was helpful and gives me a better
- Kagamin (4/8) Mar 04 2016 BTW in the oracle driver you use that ODBC idiom of passing
- Erik Smith (3/7) Mar 04 2016 That will be fixed in the next push.
- Stefan Koch (5/16) Mar 02 2016 Looks good.
- Kagamin (5/9) Mar 02 2016 I believe that section is about PHP forums being locked to mysql,
- Chris Wright (25/29) Mar 02 2016 It looks like you're trying to write a LevelDB analogue that implements
- Piotrek (31/60) Mar 03 2016 The thing is I strongly encourage to not reserve std.database for
- Chris Wright (79/143) Mar 03 2016 You were a bit vague before. I interpreted you as saying "just offer a
- Piotrek (46/135) Mar 03 2016 I agree I could be better in describing the concept. But I just
- Chris Wright (6/8) Mar 04 2016 Which is all I'm saying. Something in std.database sounds like it should...
- Piotrek (8/11) Mar 04 2016 The one written from scratch specially for D (I'm talking in
- Dejan Lekic (9/13) Mar 03 2016 I suggest you call the package stdx.db - it is not (and may not
- Erik Smith (10/18) Mar 03 2016 I totally agree that it must be collaborative and community
- Kagamin (3/3) Mar 03 2016 Also member names: methods are named after verbs, you use nouns.
- Erik Smith (16/19) Mar 03 2016 Those are actually internal methods not intended for the
- Kagamin (9/13) Mar 03 2016 db.execute("select from t").range();
- Kagamin (4/4) Mar 03 2016 Other options:
- Kagamin (4/4) Mar 03 2016 On the other hand execute can simply return the reader with extra
- Erik Smith (11/31) Mar 03 2016 More good options (the 3rd one is there). Also at the value
- Kagamin (13/29) Mar 04 2016 Can you elaborate?
- Erik Smith (11/39) Mar 04 2016 Actually I like this and I think it can work. I'm trying to keep
- Kagamin (10/25) Mar 04 2016 Another idea from previous attempts: rename `execute` to `query`.
- Erik Smith (17/44) Mar 04 2016 On further thought, execute() should definitely return something
- Bubbasaur (7/13) Mar 04 2016 Why not only:
- Erik Smith (7/22) Mar 04 2016 Agree connection execute is simpler and it is supported, although
- Sebastiaan Koppe (10/14) Mar 04 2016 You'll want to have some types in there. As in
- Erik Smith (18/33) Mar 04 2016 I think some basic object serialization capabilities would be
- Sebastiaan Koppe (26/29) Mar 05 2016 I did a project a while back using mysql-native (see
- John Colvin (4/17) Mar 03 2016 std.experimental, ugly or not, is what is in phobos.
- Saurabh Das (6/11) Mar 05 2016 A little late to the party, nevertheless: Thanks for doing this,
- Jacob Carlborg (5/9) Mar 05 2016 Yeah, that's really important. Unfortunately it looks like a
- Erik Smith (21/31) Mar 05 2016 I'm definitely going to start working in async capability (or
- Jacob Carlborg (5/23) Mar 05 2016 Just for the record, there's already a native Postgres client that is
- Kagamin (6/10) Mar 06 2016 Huh? Non-blocking operation is not reflected in interface. Only
- Erik Smith (10/18) Mar 06 2016 I think a basic non-blocking interface should be similar to the
- Erik Smith (15/15) Mar 10 2016 I've made a few updates based on some of the feedback in this
I'm back to actively working on a std.database specification & implementation. It's still unstable, minimally tested, and there is plenty of work to do, but I wanted to share an update on my progress.

The main focus of this project is to bring a standard interface for database clients. This is similar to the purpose of JDBC (Java) and DBI (Perl). While there is some existing work in place (ddbc, etc.c.odbc.sql, vibe.d, and other newer projects), my goal, after a lengthy period of community review/revision, is to achieve Phobos inclusion for the interface standard and also some of the implementations. There is debate on whether the implementations belong there, but I have made the implementation Phobos compatible (by using templates) while this issue is sorted out.

The initial focus of this project is to first cover the synchronous interface details and get it to a refined state. Asynchronous I/O is also important, and it will take some time to work that into the model, with a number of complicated aspects to consider. Also, this is not an ORM tool, although it does provide a foundational layer for building one. This is also aimed at SQL databases, although I will consider how no-sql targets might fit in.

There are now implementations for 4 initial targets (sqlite, mysql, Oracle, and ODBC) and they are in a basic working state with a common test suite. For usage examples and other limited project details, see the github page:

https://github.com/cruisercoder/dstddb

Dub link coming soon (dealing with dub zombie name) along with more progress updates. Please feel free to comment or question on the design as it evolves.

erik
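A sketch of the fluent style from the project README (the module path, createDatabase, and the chained accessors are taken from the repository and may change as the design evolves):

```d
import std.database.sqlite; // module layout assumed from the repository

void main() {
    // create a database, open a connection, run a statement, walk the rows
    auto db = createDatabase("file:///demo.sqlite");
    foreach (row; db.connection().statement("select name, score from score").range()) {
        // column accessors are illustrative; see the README for the real ones
    }
}
```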
Mar 01 2016
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
> Please feel free to comment or question on the design as it evolves.

Minor typo in the fluent style select section of readme.md. The line

createDatabase("file:///demo.sqlite");

should be

createDatabase("file:///demo.sqlite")
Mar 01 2016
Typo fixed - thanks. Incidentally, I'm not 100% content with createDatabase. With the library being template based, the types are no longer as easy to work with directly:

auto database = Database!DefaultPolicy();

alias can't be used because it instantiates. The template argument can be defaulted, but the best I can do is this:

auto database = Database!()();

It would be nice if the compiler invoked the default type without the extra !(). I settled on using a no-parameter template function named createDatabase. I would have liked just database() to match the other member function style in the package, but I think that might be injecting a name that is too common as a variable name. Project minutiae.

erik
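The workaround described here, a factory function that hides the defaulted template instantiation, can be sketched like this (DefaultPolicy is a stand-in name; only the pattern matters):

```d
struct DefaultPolicy {}

struct Database(Policy = DefaultPolicy) {
    // ... client state would live here ...
}

// a no-parameter factory spares callers the Database!()() noise
auto createDatabase(Policy = DefaultPolicy)() {
    return Database!Policy();
}

void main() {
    auto db = createDatabase();                   // default policy
    auto custom = createDatabase!DefaultPolicy(); // explicit policy
}
```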
Mar 01 2016
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
> I'm back to actively working on a std.database specification & implementation. It's still unstable, minimally tested, and there is plenty of work to do, but I wanted to share an update on my progress.
> [...]

Hi Erik,

As soon as sqlite-d is at the point that it is usable, I will see if I can support the API proposed by you. Have you ever used it in production or semi-production code? What is the rationale for your design?
Mar 01 2016
Hi Stefan,

It might be a challenge for CTFE compatibility in the API, but it would be interesting to see how much of it is workable. There do need to be some optional API elements in cases where a feature is not supported by the underlying database client (array binding, for example), and these compile-time policies could also be used to define a restricted subset if needed for your design.

I don't have any D code in production unfortunately, but I have a similar design in C++ that has been in production for years.

As far as the design rationale, the primary aspect of the design is to ensure deterministic resource lifetimes for each of the resources commonly exposed by clients (connections, statements, rowsets, buffer allocations, etc.), which is why structs are used. This could be done with simple non-copyable structs for a more limited design. I went with value-type structs (using reference counting), which provide better composability overall. This can be seen in the call-chaining examples, which I think significantly increase ease of use. It also covers use cases in application code where, for example, lazy results are returned through code layers and need to retain their underlying connections.

Thanks for the questions and the interest.

erik
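The value-type, reference-counted handle approach described here can be sketched with std.typecons.RefCounted; this illustrates the pattern only, not the project's actual implementation:

```d
import std.typecons : RefCounted;
import std.stdio : writeln;

// the payload owns the underlying client resource; it is destroyed
// deterministically when the last copy of the handle goes away
struct ConnectionPayload {
    int handle = -1; // stand-in for a native client connection handle
    ~this() { if (handle != -1) writeln("closing connection ", handle); }
}

// the public type is a cheap-to-copy value; copies share one payload
struct Connection {
    RefCounted!ConnectionPayload data;
}

void main() {
    Connection c1;
    c1.data.handle = 7;
    auto c2 = c1;  // copy shares the payload; no double-free
}   // payload destructor runs exactly once, when both copies leave scope
```

This is what makes the call-chaining style safe: a lazy result range can hold a copy of its connection handle, keeping the connection alive for exactly as long as it is needed.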
Mar 01 2016
Okay, I've found a problem. Here is some code demonstrating it:

http://dpaste.dzfl.pl/022c9e610a18

Now take a look again at Database:

https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37

Do you see the problem? The solution is simple. The client database type can be a struct or a class; it doesn't matter too much (assuming you're going the way of ranges). But the intermediary representation must be on the heap and should probably use the constructor, not a static create method, to get it. This way people can use other memory management solutions and construct it however they like.
Mar 01 2016
Yes, I agree that the poly Database is broken - it isn't reference counted and I will fix that. Your sample code had me wondering if I am missing something else, but I can't see another issue yet. I think the use of classes would definitely lead to problems with resources being freed out of order or too late.

As far as memory management options, my plan is to work allocators into the design, and that would seem to provide a lot of options. I'm having a problem at the moment with MallocAllocator's shared interface. I'm not sure why it's shared, since malloc/free are thread safe, and I can't seem to cast away the shared. I'm sure there is a reason.

erik

On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole wrote:
> Okay I've found a problem. Here is some code demonstrating it.
> http://dpaste.dzfl.pl/022c9e610a18
> Now take a look again at Database
> https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37
> Do you see the problem? The solution is simple. The client database type can be a struct or a class. It doesn't matter too much (assuming you're going the way of ranges). But the intermediary representation must be on the heap and should probably use the constructor and not a static create method to get it. This way people can use other memory management solutions and construct it however they like.
Mar 01 2016
On 02/03/16 4:48 PM, Erik Smith wrote:
> Yes agree that the poly Database is broken - it isn't reference counted and I will fix that.

My point was, you shouldn't handle that.

> Your sample code had me wondering if I am missing something else, but I can't see another issue yet. I think the use of classes would definitely lead to problems with resources being freed out of order or too late.

Currently it's use after free. E.g. the destructor gets called but there is still a copy around. At the very least it should be an explicit call. If classes lead to problems, so will structs.

> As far as memory management options, my plan is to work allocators into the design and that would seem to provide a lot of options. I'm having a problem at the moment with MallocAllocator's shared interface. I'm not sure why it's shared since malloc/free are thread safe and I can't seem to cast away the shared. I'm sure there is a reason.

Use IAllocator. Don't touch the structs unless you want pain. Oh, and btw, final class is your friend.

As an FYI, here is my managed memory concept:
https://github.com/rikkimax/alphaPhobos/blob/master/source/std/experimental/memory/managed.d
It's not finished, but it mostly works. Of course I would want to go the more OOP way; sure, that's more allocations, but overall I think there are enough wins that it's worth it. So if you feel it doesn't fit well with your goal, say so :)

> erik
>
> On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole wrote:
>> Okay I've found a problem. Here is some code demonstrating it.
>> http://dpaste.dzfl.pl/022c9e610a18
>> Now take a look again at Database
>> https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37
>> Do you see the problem? The solution is simple. The client database type can be a struct or a class. It doesn't matter too much (assuming you're going the way of ranges). But the intermediary representation must be on the heap and should probably use the constructor and not a static create method to get it. This way people can use other memory management solutions and construct it however they like.
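The IAllocator suggestion refers to std.experimental.allocator; a minimal sketch of allocating through the type-erased default allocator (the Statement struct is a made-up stand-in) looks like:

```d
import std.experimental.allocator : theAllocator, make, dispose;

struct Statement { int handle; }

void main() {
    // theAllocator is an IAllocator; callers can swap in malloc-backed,
    // region, or GC allocators without changing this code
    auto stmt = theAllocator.make!Statement(42);
    scope (exit) theAllocator.dispose(stmt);
    assert(stmt.handle == 42);
}
```

Because the interface is type-erased, a library accepting an IAllocator needs no template plumbing to support user-supplied memory management.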
Mar 01 2016
I will look at your managed approach to understand it better. One drawback I can think of is that destruction might not occur immediately. This can be an issue for shared libraries when the GC hasn't run by the time the library call returns to the host application.

Maybe it's a C++ bias, but the RefCounted approach seems like such a natural fit for this use case, and so far it's working well. There is the race condition issue, but I think there is an allocator solution for that coming.

erik

On Wednesday, 2 March 2016 at 04:11:10 UTC, Rikki Cattermole wrote:
> On 02/03/16 4:48 PM, Erik Smith wrote:
>> Yes agree that the poly Database is broken - it isn't reference counted and I will fix that.
> My point was, you shouldn't handle that.
>> Your sample code had me wondering if I am missing something else, but I can't see another issue yet. I think the use of classes would definitely lead to problems with resources being freed out of order or too late.
> Currently its use after free. E.g. destructor gets called but there is still a copy around. At the very least it should be an explicit call. If classes lead to problems, so will structs.
>> As far as memory management options, my plan is to work allocators into the design and that would seem to provide a lot of options. I'm having a problem at the moment with MallocAllocator's shared interface. I'm not sure why it's shared since malloc/free are thread safe and I can't seem to cast away the shared. I'm sure there is a reason.
> Use IAllocator. Don't touch the structs unless you want pain. Oh and btw final class is your friend. As an FYI here is my managed memory concept https://github.com/rikkimax/alphaPhobos/blob/master/source/std/experimental/memory/managed.d its not finished but it mostly works. Of course I would want to go the more OOP way, sure thats more allocations but over all I think there are enough wins that its worth it. So if you feel it doesn't fit well with your goal, say so :)
>> erik
>> On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole wrote:
>>> Okay I've found a problem. Here is some code demonstrating it.
>>> http://dpaste.dzfl.pl/022c9e610a18
>>> Now take a look again at Database
>>> https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37
>>> Do you see the problem? The solution is simple. The client database type can be a struct or a class. It doesn't matter too much (assuming you're going the way of ranges). But the intermediary representation must be on the heap and should probably use the constructor and not a static create method to get it. This way people can use other memory management solutions and construct it however they like.
Mar 01 2016
On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole wrote:
> Okay I've found a problem. Here is some code demonstrating it.
> http://dpaste.dzfl.pl/022c9e610a18
> Now take a look again at Database
> https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37
> Do you see the problem? The solution is simple. The client database type can be a struct or a class. It doesn't matter too much (assuming you're going the way of ranges). But the intermediary representation must be on the heap and should probably use the constructor and not a static create method to get it. This way people can use other memory management solutions and construct it however they like.

How is this a UAF? Isn't the struct copied?
Mar 02 2016
On Wednesday, 2 March 2016 at 18:28:34 UTC, landaire wrote:
> How is this a UAF? Isn't the struct copied?

Ah, I think I misunderstood. You mean in the database, not the dpaste?
Mar 02 2016
On 03/03/16 8:36 AM, landaire wrote:
> On Wednesday, 2 March 2016 at 18:28:34 UTC, landaire wrote:
>> How is this a UAF? Isn't the struct copied?
> Ah I think I misunderstood. You mean in the database, not the dpaste?

Correct. My code was just to showcase the problem.
Mar 02 2016
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
> The main focus of this project is to bring a standard interface for database clients. This is similar to the purpose of JDBC (java) and DBI (perl). While there is some existing work in place (ddbc, etc.c.odbc.sql, vibe.d and other newer projects), my goal, after a lengthy period of community review/revision, is to achieve Phobos inclusion for the interface standard and also some of the implementations. There is debate on whether the implementations belong there, but I have made the implementation Phobos compatible (by using templates) while this issue is sorted out.

My quick comments:

1. In my opinion it should not be called std.database, but let's say "std.dbc". This is because it looks like a wrapper over db clients. Moreover, one may say it's almost impossible to make a common and effective interface which would work with all databases; e.g. someone on Wikipedia argues ODBC is obsolete (check "ODBC today").

2. I'm not against such functionality in Phobos. However, your project seems to be a duplication of the DDBC project. And it was not proposed for inclusion so often.

3. What I call a D Database API could be described as:
- Database object
- DbCollection object (equivalent to array)
- ranges + std.algorithm

And here is my implementation (experimental) of this API:
https://github.com/PiotrekDlang/AirLock/tree/master/docs/database/design.md
https://github.com/PiotrekDlang/AirLock/tree/master/src

Piotrek
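The outline in point 3 can be pictured like this; everything here (Database, DbCollection, Person) is a hypothetical stand-in, with an in-memory array playing the role of storage, just to show how "ranges + std.algorithm" would feel:

```d
import std.algorithm.iteration : filter, map;
import std.array : array;

struct Person { string name; int age; }

// a DbCollection exposes its items as a range; a real engine would
// stream them from storage instead of holding an array
struct DbCollection(T) {
    T[] items;
    auto opSlice() { return items; }
}

// the Database object owns the collections
struct Database {
    DbCollection!Person people;
}

void main() {
    auto db = Database(DbCollection!Person(
        [Person("Ann", 34), Person("Bob", 19)]));
    // querying is plain range composition
    auto adults = db.people[].filter!(p => p.age > 21).map!(p => p.name).array;
    assert(adults == ["Ann"]);
}
```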
Mar 02 2016
On Wednesday, 2 March 2016 at 15:41:56 UTC, Piotrek wrote:
> 1. In my opinion it should not be called std.database, but let's say "std.dbc". This is because it looks like a wrapper over db clients. Moreover one may say it's almost impossible to make a common and effective interface which would work with all databases. e.g. someone on Wikipedia argues ODBC is obsolete (check "ODBC today")

I agree that std.dbc would be more accurate, although I like std.database better because it is more recognizable. Another option could be std.database.client.

ODBC is less important now, as there are fewer desktop applications that connect directly to databases. There is at least one example related to D: the ODBC driver for Presto, which is used in Excel. Server-side code is generally written using the underlying native client directly. However, ODBC support is useful in that it represents a catch-all covering all databases for which a native client is not yet supported. At least for the SQL databases, a standard interface is definitely achievable, and I believe that having it would allow D to get more traction as an application platform.

> 2. I'm not against such a functionality in Phobos. However your project seems to be a duplication of the DDBC project. And it was not proposed for inclusion so often.

There are a number of areas where this design is an improvement over DDBC: ease of use, better resource management (no scope, no GC), and Phobos compatibility, to name a few. There is a lot more that needs to be added to make it standards grade.

> 3. What I call a D Database API could be described as:
> - Database object
> - DbCollection object (equivalent to array)
> - ranges + std.algorithm

My design is indeed range based: stmt.range() returns a forward range proxy for the query results (the accessor name might change, however). Your engine project is interesting. I think there is common ground in the interfaces of the two projects, particularly in how the interface for the results might work. I will look more closely at the details to see what might be workable.

erik
Mar 02 2016
On Wednesday, 2 March 2016 at 17:13:32 UTC, Erik Smith wrote:
> There are a number of areas where this design is an improvement over DDBC: ease-of-use, better resource management (no scope, no GC), phobos compatibility, to name a few. There is a lot more that needs to be added to make it standards grade.

I agree with you that we need database manipulation in Phobos. However, modules like db, gui, xml or similar are too much work for one developer. And as you can see, from time to time someone appears with their own vision. That's why, a long time ago, I suggested DIP73 (http://wiki.dlang.org/DIP73) so that the collaborative work would be controlled by the D community (or the D Foundation). But I am aware that there is no agreement nor resources for that.

> Your engine project is interesting. I think there is common ground in the interfaces in the two projects, particularly in how the interface for the results might work. I will look more closely at the details to see what might be workable.
> erik

I agree that we (as a community) should work on common and effective APIs. Maybe when the D Foundation is big enough...

Piotrek
Mar 03 2016
On Thursday, 3 March 2016 at 16:08:03 UTC, Piotrek wrote:
> I agree with you we need database manipulation in Phobos. However modules like db, gui, xml or similar are too much work for one developer. And as you can see from time to time someone appears with their own vision. That's why, long time ago, I suggested DIP73 (http://wiki.dlang.org/DIP73) so the collaborative work would be controlled by the D community (or the D foundation). But I am aware that there is no agreement nor resources for that.
> I agree that we (as a community) should work on common and effective APIs. Maybe when D foundation is big enough...

Your process proposal (DIP73) was helpful and gives me a better perspective on the standardization process. Thanks for referencing that, Piotrek. You are right that areas like this are too much work for one developer. The only leverage I might have is a lot of familiarity with many of the native C client interfaces and experience implementing higher-level interfaces on top. That, and there is also a lot of existing work to draw on that can inform the design.

Also, my sense is that while there is less process for standardization in D at present, the rate at which progress can occur should be much higher (compared to ISO C++, for example). The last thing I want, however, is to get something accepted into std.experimental that is highly contentious or of subpar quality.
Mar 03 2016
On Thursday, 3 March 2016 at 17:46:02 UTC, Erik Smith wrote:
> The only leverage I might have is a lot of familiarity working with many of the native C client interfaces and experience implementing higher level interfaces on top. That, and there is also a lot of existing work to draw on that can inform the design.

BTW in the oracle driver you use that ODBC idiom of passing strings as pointer+length pairs. Why don't you use it in the ODBC driver?
Mar 04 2016
On Friday, 4 March 2016 at 16:54:33 UTC, Kagamin wrote:
> BTW in the oracle driver you use that ODBC idiom of passing strings as pointer+length pairs. Why don't you use it in ODBC driver?

That will be fixed in the next push.

erik
Mar 04 2016
On Wednesday, 2 March 2016 at 15:41:56 UTC, Piotrek wrote:
> On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
>> [...]
> My quick comments:
> 1. In my opinion it should not be called std.database, but let's say "std.dbc". This is because it looks like a wrapper over db clients. Moreover one may say it's almost impossible to make a common and effective interface which would work with all databases. e.g. someone on Wikipedia argues ODBC is obsolete (check "ODBC today")
> [...]

Looks good. At least at first glance it seems pretty similar to SQLite. I'd like to take this API for sqlite-d, because it seems to fit quite naturally.
Mar 02 2016
On Wednesday, 2 March 2016 at 15:41:56 UTC, Piotrek wrote:
> Moreover one may say it's almost impossible to make a common and effective interface which would work with all databases. e.g. someone on Wikipedia argues ODBC is obsolete (check "ODBC today")

I believe that section is about PHP forums being locked to mysql, and performance is not the reason for that. It's mostly because mysql support is built in, works for most things, and is provided by hosters.
Mar 02 2016
On Wed, 02 Mar 2016 15:41:56 +0000, Piotrek wrote:
> 3. What I call a D Database API could be described as:
> - Database object
> - DbCollection object (equivalent to array)
> - ranges + std.algorithm

It looks like you're trying to write a LevelDB analogue that implements an array rather than a key/value collection -- and that's a decent start. If you're trying to connect to a SQL database or a document database, as I'd expect for something called "std.database", it's a pretty terrible API:

* No index scans, lookups, or range queries.
* No support for complex queries.
* No support for joins.
* No support for projections.
* No support for transactions.
* If you add support for transactions, you'll crash all the time because the transactions got too large, thanks to the full-table-scan mentality.
* In your implementation, updates must bring every affected row over the wire, then send back the modified row.
* Updates affect an entire row. If one process updates one field in a row and another one updates a different field, one of those writes gets clobbered.
* The API assumes a total ordering for each DbCollection. This is not valid.
* If there are multiple rows that compare as equal, there's no way to update only one of them in your implementation.
* In your implementation, updating one row is a ϴ(N) operation. It still costs ϴ(N) when the row you want to update is the first one in the collection.
Mar 02 2016
On Thursday, 3 March 2016 at 01:49:22 UTC, Chris Wright wrote:
> If you're trying to connect to a SQL database or a document database, as I'd expect for something called "std.database"

The thing is, I strongly encourage not reserving std.database for external database clients and, what is even more limiting, for SQL ones only.

> , it's a pretty terrible API:
> * No index scans, lookups, or range queries.

Indexes can be supported by strings and CTFE, can't they? e.g.

filter!q{item.elements.length < 10 && item.model == "Sport"}

> * No support for complex queries.

Not sure what you mean by complex queries. Also, I think the API allows arbitrarily complex queries.

> * No support for joins.

Can be done by attributes or other linking functionality between DbCollections.

> * No support for projections.

You mean something like referring to part of the item's fields? I see no problem here.

> * No support for transactions.
> * If you add support for transactions, you'll crash all the time because the transactions got too large, thanks to the full table scan mentality.

Isn't it just the "index support" case?

> * In your implementation, updates must bring every affected row over the wire, then send back the modified row.

In my implementation there is no wire (that's why I call it embedded). However, I thought we were talking about the API and not a particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in the old-fashioned way.

> * Updates affect an entire row. If one process updates one field in a row and another one updates a different field, one of those writes gets clobbered.

I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API, other than that any implementation of a db engine has to handle it properly. When I say DbCollection should behave similarly to an ordinary array, I don't mean it should be an ordinary array.

> * The API assumes a total ordering for each DbCollection. This is not valid.

I don't know what you mean here. An example would be good.

> * If there are multiple rows that compare as equal, there's no way to update only one of them in your implementation.
> * In your implementation, updating one row is a ϴ(N) operation. It still costs ϴ(N) when the row you want to update is the first one in the collection.

I'm still not sure if you are referring to my implementation or the hypothetical API. To be clear: my current implementation is still a proof of concept and surely *unfinished*. And in case you refer to my implementation, I plan to support O(1), O(log n) and O(n) access patterns with their "rights and duties".

Cheers,
Piotrek
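The filter!q{...} idea works today with std.algorithm, whose string predicates are compiled into lambdas at compile time; note that the element is conventionally named "a" in the string, and the Item type here is made up for illustration:

```d
import std.algorithm.iteration : filter;
import std.array : array;

struct Item { int[] elements; string model; }

void main() {
    auto items = [
        Item([1, 2, 3], "Sport"),
        Item(new int[20], "Sport"),
        Item([4, 5], "Family"),
    ];
    // the q{} token string becomes a compile-time predicate; "a" is the element
    auto hits = items.filter!q{ a.elements.length < 10 && a.model == "Sport" }.array;
    assert(hits.length == 1 && hits[0].elements == [1, 2, 3]);
}
```

Turning such a predicate into an index scan on a storage engine, rather than a full pass over the range, is the hard part Chris alludes to.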
Mar 03 2016
On Thu, 03 Mar 2016 15:50:04 +0000, Piotrek wrote:On Thursday, 3 March 2016 at 01:49:22 UTC, Chris Wright wrote:You were a bit vague before. I interpreted you as saying "just offer a range and an array-like API, and then you can use it with std.algorithm". But if you meant to offer an API that is similar to std.algorithm and also array-like, that's more feasible. You're still left with the task of transpiling D to SQL. This model does not work with CouchDB. You must avoid using std.algorithm and std.range functions assiduously because they would offer terrible performance.If you're trying to connect to a SQL database or a document database, as I'd expect for something called "std.database"The thing is I strongly encourage to not reserve std.database for external database clients and even what is more limiting to SQL ones only., it's a pretty terrible API: * No index scans, lookups, or range queries.Indexes can be supported by strings and CTFE, can't they? e.g. filter!q{item.elements.length < 10 && item.model == "Sport"}Aggregates, especially with joins. Computed fields.* No support for complex queries.Not sure what you mean by complex queries. Also I think the API allows arbitrary complex queries.With attributes, you need users to define aggregate types instead of just using Row and the like. That's ORM territory. At a previous job I maintained an internal BI site that exposed 50-100 different queries, each with their own set of result fields. We didn't want to use ORM there; it would have been cumbersome and inappropriate. Also, that assumes that you will always want a join when querying a table. I maintained an application once, using ORM, in which we sometimes wanted an eager join and sometimes wanted a lazy one. This posed a nontrivial performance impact. 
I'm not sure ORM would be a candidate for phobos.* No support for joins.Can be done by attributes or other linking functionality between DbCollections.Let me point you to the existence of the TEXT and BLOB datatypes. They can each hold 2**32 bytes of data in MySQL. I'm not splitting those off into a separate table to port my legacy database to your API. I'm not dragging in multiple megabytes of data in every query. If you're going full ORM, you can add lazy fields. That adds complexity. It's also inefficient when I know in advance that I need those fields.* No support for projections.You mean something like referring to part of the item's fields? I see no problem here.You didn't mention transactions at all in the initial outline. After that, yes, in large portion index support addresses this. DB-side aggregation also helps.* No support for transactions. * If you add support for transactions, you'll crash all the time because the transactions got too large, thanks to the full table scan mentality.Isn't it just the "index support" case?I'm running a website and decide that, with the latest changes, existing users need to get the new user email. So I write: UPDATE users SET sent_join_email = FALSE; -- ok; 1,377,212 rows affected Or I'm using your database system. If it uses std.algorithm, I have to iterate through the users list, pulling each row into my process's memory from the database server, and then I have to write everything back to the database server. Depending on the implementation, it's using a database cursor or issuing a new query for every K results. If it's using a database cursor, those might not be valid across transaction boundaries. I'm not sure. If they aren't, you get a large transaction, which causes problems. 
If your database system instead offers a string-based API similar to std.algorithm, you might be able to turn this into a single query, but it's going to be a lot of work for you.* In your implementation, updates must bring every affected row over the wire, then send back the modified row.In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.Without transactions, MySQL supports writing to two different columns in two different queries without those writes clobbering each other. That's handling it properly.* Updates affect an entire row. If one process updates one field in a row and another one updates a different field, one of those writes gets clobbered.I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly.When I say DbCollection should behave similar to an ordinal array I don't mean it should be an ordinal array.opIndex(size_t offset) assumes the database supports a one-to-one mapping between offsets and rows. SQLite, for one, does not guarantee query result ordering if the query does not include an ORDER BY clause. So offering an opIndex(size_t offset) operation is a problem -- either you load the entire table into memory in advance, or you might get the same row returned for every index. Alternatively, you must examine the table and generate a sufficiently unique ordering for it. Any mutation to the table's contents can change this ordering. Some collection iterators throw an exception if you modify the collection during iteration. But, to avoid silent errors you can't defend against, you have to dump the whole table into a transaction or have the database somehow tell you when someone else has modified it. 
Speaking of modifications, let's say I write a simple loop like: for (size_t i = 0; i < dbTable.length; i++) { writeln(dbTable[i].id); } This can go wrong in about four ways: * Someone inserts a row whose index is less than i. This prints out the same id twice. * Someone deletes a row whose index is less than i. This skips a different row. * Someone deletes a row when i == dbTable.length - 1, just before opIndex executes. I get an index out of bounds error. * This is issuing two queries per iteration. It's going to take probably fifty times longer than using a cursor. This is a terrible usage pattern, but by offering opIndex and length operations, you are recommending it.* The API assumes a total ordering for each DbCollection. This is not valid.I don't know what you mean here. Example would be good.I specifically said "in your implementation" for these two because I was referring to your implementation rather than your proposal in general. The rest refers to the basic idea.* If there are multiple rows that compare as equals, there's no way to update only one of them in your implementation. * In your implementation, updating one row is a ϴ(N) operation. It still costs ϴ(N) when the row you want to update is the first one in the collection.I'm still not sure if you are referring to my implementation or a hypothetical API. To be clear: my current implementation is still proof of concept and surely *unfinished*. And in case you refer to my implementation I plan to support O(1), O(log n) and O(n) access patterns with their "rights and duties".
Mar 03 2016
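[Editor's note] The string-predicate idea quoted in the exchange above (filter!q{...}) already works today against in-memory ranges; whether a database layer could translate such strings into index scans is the open question. A minimal sketch, with a hypothetical Item type invented for illustration:

```d
import std.algorithm.iteration : filter;
import std.array : array;

// Hypothetical item type, only to exercise the quoted predicate style.
struct Item { string model; int[] elements; }

void main()
{
    auto items = [
        Item("Sport", [1, 2, 3]),
        Item("Touring", new int[12]),
        Item("Sport", new int[20]),
    ];

    // The string predicate becomes an ordinary lambda at compile time
    // (the implicit parameter is named "a"); a database backend would
    // instead have to parse it to drive an index lookup.
    auto hits = items.filter!q{ a.elements.length < 10 && a.model == "Sport" }
                     .array;
    assert(hits.length == 1);
    assert(hits[0].elements == [1, 2, 3]);
}
```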
On Thursday, 3 March 2016 at 18:48:08 UTC, Chris Wright wrote:You were a bit vague before. I interpreted you as saying "just offer a range and an array-like API, and then you can use it with std.algorithm". But if you meant to offer an API that is similar to std.algorithm and also array-like, that's more feasible.I agree I could be better in describing the concept. But I just sketched the idea.You're still left with the task of transpiling D to SQL.If someone wants to use SQL in its *full power*, no D API nor any other language will suffice, mainly because it will always be a translation layer. The only thing we can do is provide aids like those suggested by Andrei (sql parser, value binding, etc).This model does not work with CouchDB.I don't know CouchDB so I can't comment.You must avoid using std.algorithm and std.range functions assiduously because they would offer terrible performance.For big data in db, plain vanilla std.algorithm won't be sufficient. I agree.Regarding computed fields and other database vendor specific features you are right. But on the other hand aggregations and joins can be represented as objects and proxies of objects.Aggregates, especially with joins. Computed fields.* No support for complex queries.Not sure what you mean by complex queries. Also I think the API allows arbitrarily complex queries.I don't like ORM with respect to SQL. But a quasi object database which can look similar to ORM is not a problem for me.With attributes, you need users to define aggregate types instead of just using Row and the like. That's ORM territory.* No support for joins.Can be done by attributes or other linking functionality between DbCollections.At a previous job I maintained an internal BI site that exposed 50-100 different queries, each with their own set of result fields. We didn't want to use ORM there; it would have been cumbersome and inappropriate.I can see your point. 
But the problem can be solved by not using SQL.Also, that assumes that you will always want a join when querying a table. I maintained an application once, using ORM, in which we sometimes wanted an eager join and sometimes wanted a lazy one. This posed a nontrivial performance impact.Something like DbProxy would handle lazy "joins".I'm not sure ORM would be a candidate for phobos.As I don't plan to use a (traditional) ORM I'm not involved. However if other people would find it worthy I don't object.This is something a DbProxy would handle. Eventually: struct OriginalObject { int id; string bigString; } struct StrippedObject { int id; } then auto collA = db.collection!OriginalObject("Big"); auto collB = db.collection!StrippedObject("Big"); In the second line the string is not fetched.Let me point you to the existence of the TEXT and BLOB datatypes. They can each hold 2**32 bytes of data in MySQL.* No support for projections.You mean something like referring to part of the item's fields? I see no problem here.I'm not splitting those off into a separate table to port my legacy database to your API. I'm not dragging in multiple megabytes of data in every query. If you're going full ORM, you can add lazy fields. That adds complexity. It's also inefficient when I know in advance that I need those fields.For client-server approach I agree with the above. For embedded design (as in my project) this is not the case.I'm running a website and decide that, with the latest changes, existing users need to get the new user email. So I write: UPDATE users SET sent_join_email = FALSE; -- ok; 1,377,212 rows affected Or I'm using your database system. If it uses std.algorithm, I have to iterate through the users list, pulling each row into my process's memory from the database server, and then I have to write everything back to the database server. Depending on the implementation, it's using a database cursor or issuing a new query for every K results. 
If it's using a database cursor, those might not be valid across transaction boundaries. I'm not sure. If they aren't, you get a large transaction, which causes problems. If your database system instead offers a string-based API similar to std.algorithm, you might be able to turn this into a single query, but it's going to be a lot of work for you.* In your implementation, updates must bring every affected row over the wire, then send back the modified row.In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.Designing a good locking mechanism will be a challenging task, of that I'm sure :)Without transactions, MySQL supports writing to two different columns in two different queries without those writes clobbering each other. That's handling it properly.* Updates affect an entire row. If one process updates one field in a row and another one updates a different field, one of those writes gets clobbered.I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly.[...]When I say DbCollection should behave similar to an ordinal array I don't mean it should be an ordinal array.opIndex(size_t offset) assumes the database supports a one-to-one mapping between offsets and rows.* The API assumes a total ordering for each DbCollection. This is not valid.I don't know what you mean here. Example would be good.This is a terrible usage pattern, but by offering opIndex and length operations, you are recommending it.I don't recommend it. I just added it for evaluation. I'm aware it only works when the collection is not mutated. I think the same goes for all shared collections (also those in memory). Finally, IMO any DB API will be biased toward one solution. Cheers Piotrek
Mar 03 2016
On Fri, 04 Mar 2016 06:16:59 +0000, Piotrek wrote:For client-server approach I agree with the above. For embedded design (as in my project) this is not a case.Which is all I'm saying. Something in std.database sounds like it should allow you to interact with databases, like JDBC. With embedded databases, there's a lot of variety out there, probably a decent selection of tradeoffs, so I'm not sure any one would be appropriate to phobos.
Mar 04 2016
On Friday, 4 March 2016 at 16:41:35 UTC, Chris Wright wrote:With embedded databases, there's a lot of variety out there, probably a decent selection of tradeoffs, so I'm not sure any one would be appropriate to phobos.The one written from scratch specially for D (I'm talking in general, nothing particular in mind). This is my idea of the optimal solution. And I can be wrong of course. And there is a need for a module with interface to other databases, something like Erik's project. Piotrek
Mar 04 2016
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:I'm back to actively working on a std.database specification & implementation. It's still unstable, minimally tested, and there is plenty of work to do, but I wanted to share an update on my progress.I suggest you call the package stdx.db - it is not (and may not become) a standard package, so `std` is out of the question. If it is supposed to be *proposed* as a standard package, then `stdx` is good because that is what some people have used in the past (while others used the ugly std.experimental. for the same purpose). I humbly believe that this effort **must** be collaborative, as such a package is doomed to fail if done wrong.
Mar 03 2016
I suggest you call the package stdx.db - it is not (and may not become) a standard package, so `std` is out of question. If it is supposed to be *proposed* as standard package, then `stdx` is good because that is what some people have used in the past (while others used the ugly std.experimental. for the same purpose). I humbly believe that this effort **must** be collaborative as such package is doomed to fail if done wrong.I totally agree that it must be collaborative and community driven. I failed to add the proposed qualifier in this thread - sorry about that. Right now I'm just trying to put together enough of a substantive design to be worthy of discussion. While I'm presenting a design the way I think it should work, I'm definitely asking for feedback and especially opposition to any aspect of it. My package name choice is just an example, the package name would have to be changed to whatever the "D committee" thinks is appropriate. erik
Mar 03 2016
Also member names: methods are named after verbs, you use nouns. Method `next` is ambiguous: is the first row the next row? `fetch` or `fetchRow` would be better.
Mar 03 2016
On Thursday, 3 March 2016 at 15:07:43 UTC, Kagamin wrote:Also member names: methods are named after verbs, you use nouns. Method `next` is ambiguous: is the first row the next row? `fetch` or `fetchRow` would be better.Those are actually internal methods not intended for the interface user. They might get exposed in some way to the implementer, so I'll have to consider this issue. However, if I apply what you are suggesting to the interface methods, it would look like this: auto r = db.connection().statement("select from t").range(); // nouns auto r = db.getConnection().getStatement("select from t").getRange(); // verbs Maybe there is a more creative way to change the names rather than prepending with get (let me know if you have suggestions), but the nouns seem cleaner to me. I'll have to look at more phobos code to get a better sense of the conventions. Also range will be replaced by an opSlice operator. erik
Mar 03 2016
On Thursday, 3 March 2016 at 15:53:28 UTC, Erik Smith wrote:auto r = db.connection().statement("select from t").range(); // nounsdb.execute("select from t").range(); `range` is probably ok. Or auto connection = db.createConnection(); connection.execute("select from t").range();auto r = db.getConnection().getStatement("select from t").getRange(); // verbsGetters must be actually nouns, but these are not getters, but factory methods: createConnection, createStatement, probably low level and/or private.
Mar 03 2016
Other options: db.execute("select from t").reader; //row range db.execute("select from t").get!long; //scalar db.execute("select from t"); //non-query
Mar 03 2016
On the other hand execute can simply return the reader with extra getter for scalar result. Just don't do stuff until it's iterated. Is it possible? auto rows = db.execute("select * from t");
Mar 03 2016
On Thursday, 3 March 2016 at 17:03:58 UTC, Kagamin wrote:On Thursday, 3 March 2016 at 15:53:28 UTC, Erik Smith wrote:Good point. I will track this as a design option to debate.auto r = db.connection().statement("select from t").range(); // nounsdb.execute("select from t").range(); `range` is probably ok. Or auto connection = db.createConnection(); connection.execute("select from t").range();auto r = db.getConnection().getStatement("select from t").getRange(); // verbsGetters must be actually nouns, but these are not getters, but factory methods: createConnection, createStatement, probably low level and/or private.db.execute("select from t").reader; //row range db.execute("select from t").get!long; //scalar db.execute("select from t"); //non-queryMore good options (the 3rd one is there). Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.On the other hand execute can simply return the reader with extra getter for scalar result. Just don't do stuff until it's iterated. Is it possible? auto rows = db.execute("select * from t");I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem. Another issue is that this might conflict with the notion of a container and the use of opSlice. Great feedback though and I'm tracking it. erik
Mar 03 2016
On Thursday, 3 March 2016 at 18:08:26 UTC, Erik Smith wrote:to!long looks ok.db.execute("select from t").reader; //row range db.execute("select from t").get!long; //scalar db.execute("select from t"); //non-queryMore good options (the 3rd one is there). Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.Can you elaborate? On the other hand these helper methods are built on top of abstraction API and you have to duplicate them in all drivers. Maybe better have them as extension methods in a single module that will work with all drivers? They are effectively a sort of minimal frontend already.On the other hand execute can simply return the reader with extra getter for scalar result. Just don't do stuff until it's iterated. Is it possible? auto rows = db.execute("select * from t");I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem.Another issue is that this might conflict with the notion of a container and the use of opSlice. Great feedback though and I'm tracking it.The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.
Mar 04 2016
On Friday, 4 March 2016 at 11:57:49 UTC, Kagamin wrote:On Thursday, 3 March 2016 at 18:08:26 UTC, Erik Smith wrote:Actually I like this and I think it can work. I'm trying to keep a single execute function name for both row/no-row queries. I can still return the range proxy for no-row queries that is either empty or throws on access.to!long looks ok.db.execute("select from t").reader; //row range db.execute("select from t").get!long; //scalar db.execute("select from t"); //non-queryMore good options (the 3rd one is there). Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.Can you elaborate?On the other hand execute can simply return the reader with extra getter for scalar result. Just don't do stuff until it's iterated. Is it possible? auto rows = db.execute("select * from t");I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem.On the other hand these helper methods are built on top of abstraction API and you have to duplicate them in all drivers. Maybe better have them as extension methods in a single module that will work with all drivers? They are effectively a sort of minimal frontend already.I'm just waiting for the code to settle a bit in a basic working state before I refactor into a two layer design.The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.The range itself is an InputRange. The statement is acting as a container only in that it is the source of the range and holds the data. I agree that it is confusing to use the term container although it seems to fit the definition of one.
Mar 04 2016
On Friday, 4 March 2016 at 14:44:48 UTC, Erik Smith wrote:Actually I like this and I think it can work. I'm trying to keep a single execute function name for both row/no-row queries. I can still return the range proxy for no-row queries that is either empty or throws on access.Yes, that's the idea.I'm just waiting for the code to settle a bit in a basic working state before I refactor into a two layer design.Another idea from previous attempts: rename `execute` to `query`. foreach(row; db.query("select * from t")) { ... } And name for package: std.sql`execute` should not return a statement, but a result set reader, that would be a range. Yeah, the result set is indeed a range conceptually: it has a definite number of ordered rows, a beginning and an end.The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.The range itself is an InputRange. The statement is acting as a container only in that it is the source of the range and holds the data. I agree that it is confusing to use the term container although it seems to fit the definition of one.
Mar 04 2016
On Friday, 4 March 2016 at 16:43:00 UTC, Kagamin wrote:On Friday, 4 March 2016 at 14:44:48 UTC, Erik Smith wrote:On further thought, execute() should definitely return something, but I think it needs to return a Result (alternative name RowSet), the actual container, rather than the range. This more cleanly separates out the post-execute state management into a separate object from Statement. The user will often want other metadata (column names/types) describing the returned table, and RowSet provides convenient access to it. This also suggests that execute should be explicit & once-only, which I think is better. Here is an updated example to illustrate: auto db = createDatabase("file:///testdb"); auto rowSet = db.connection().statement("select name,score from score").execute; foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int); I'll track query as an alternative name for execute() and std.sql as an alternative for std.database.Actually I like this and I think it can work. I'm trying to keep a single execute function name for both row/no-row queries. I can still return the range proxy for no-row queries that is either empty or throws on access.Yes, that's the idea.I'm just waiting for the code to settle a bit in a basic working state before I refactor into a two layer design.Another idea from previous attempts: rename `execute` to `query`. foreach(row; db.query("select * from t")) { ... } And name for package: std.sql`execute` should not return a statement, but a result set reader, that would be a range. Yeah, the result set is indeed a range conceptually: it has a definite number of ordered rows, a beginning and an end.The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.The range itself is an InputRange. 
The statement is acting as a container only in that it is the source of the range and holds the data. I agree that it is confusing to use the term container although it seems to fit the definition of one.
Mar 04 2016
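[Editor's note] The container-vs-range distinction debated above can be sketched with a toy in-memory model. All names here (RowSet, Row, Value, as) are assumptions modeled on the thread's examples, not the actual std.database proposal; a real driver would fetch rows lazily from the server:

```d
import std.conv : to;
import std.range.primitives : isInputRange;

// Hypothetical cell type: as!T converts via std.conv, as in the thread's
// r[0].as!string example.
struct Value
{
    string data;
    T as(T)() { return data.to!T; }
}

// Hypothetical row with indexed column access.
struct Row
{
    Value[] cells;
    Value opIndex(size_t i) { return cells[i]; }
}

// The "container" returned by execute(); iterating it yields rows,
// so it is simultaneously a valid InputRange.
struct RowSet
{
    Row[] rows; // a real driver would buffer/fetch these lazily
    bool empty() const { return rows.length == 0; }
    Row front() { return rows[0]; }
    void popFront() { rows = rows[1 .. $]; }
}

static assert(isInputRange!RowSet);

void main()
{
    auto rs = RowSet([Row([Value("alice"), Value("42")])]);
    foreach (r; rs)
    {
        assert(r[0].as!string == "alice");
        assert(r[1].as!int == 42);
    }
}
```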
On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:auto db = createDatabase("file:///testdb"); auto rowSet = db.connection().statement("select name,score from score").execute; foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int); I'll track query as an alternative name for execute() and std.sql as alternative for std.database.Why not only: auto rowSet = db.connection().query("select name,score from score"); Or instead of query maybe "returnRows" or just "rows", I think the other way is too verbose. Bubba.
Mar 04 2016
On Friday, 4 March 2016 at 19:27:47 UTC, Bubbasaur wrote:On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:Agreed, connection execute is simpler and it is supported, although I need to add the bind version to connection as well. The statement should only get involved when you need more specific control over the input binding. I'll add returnRows/rows to the execute/query alternative name list. erikauto db = createDatabase("file:///testdb"); auto rowSet = db.connection().statement("select name,score from score").execute; foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int); I'll track query as an alternative name for execute() and std.sql as alternative for std.database.Why not only: auto rowSet = db.connection().query("select name,score from score"); Or instead of query maybe "returnRows" or just "rows", I think the other way is too verbose. Bubba.
Mar 04 2016
On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:auto db = createDatabase("file:///testdb"); auto rowSet = db.connection().statement("select name,score from score").execute; foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int);You'll want to have some types in there. As in struct S { string name; int age; } auto rowSet = db.connection().statement("select name,score from score").executeAs!S; foreach (r; rowSet) writeln(r.name,",",r.age);
Mar 04 2016
On Friday, 4 March 2016 at 22:44:24 UTC, Sebastiaan Koppe wrote:On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:I think some basic object serialization capabilities would be great although I'm not sure how the bare names can be accessed like that through the rowSet. How would that work? I can see this: S s; auto rowSet = db.connection().statement("select name,score from score").into(s); writeln(s.name,",",s.age); Two other options (not really serialization): // 1 string name; int age; auto rowSet = db.connection().execute("select name,age from score").into(name, age); foreach (r; rowSet) writeln(name,",",age); // 2 foreach (r; rowSet) writeln(r["name"],",",r["age"]);auto db = createDatabase("file:///testdb"); auto rowSet = db.connection().statement("select name,score from score").execute; foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int);You'll want to have some types in there. As in struct S { string name; int age; } auto rowSet = db.connection().statement("select name,score from score").executeAs!S; foreach (r; rowSet) writeln(r.name,",",r.age);
Mar 04 2016
On Friday, 4 March 2016 at 23:55:55 UTC, Erik Smith wrote:I think some basic object serialization capabilities would be great although I'm not sure how the bare names can be accessed like that through the rowSet How would that work?I did a project a while back using mysql-native (see code.dlang.org) and it has a `toStruct` function on its Row type. Maybe have a look there. To access the bare names, RowSet would have to be templated on the struct. Follows code from said project: template executeAs(Type) { struct TypedSQLSequence { ResultSequence rs; alias rs this; @property Type front() { Row r = rs.front; Type temp; r.toStruct!Type(temp); return temp; } } auto executeAs(ref mysql.connection.ResultSequence rs) { return TypedSQLSequence(rs); } }
Mar 05 2016
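[Editor's note] The toStruct-style mapping mentioned above can be sketched generically with compile-time reflection. This is a standalone illustration over a hypothetical row represented as string[string]; mysql-native's actual Row type works differently:

```d
import std.conv : to;

struct S { string name; int age; }

// Hypothetical row shape for the sketch: column name -> string value.
alias Row = string[string];

// Map a row onto any plain struct by matching field names to columns.
T toStruct(T)(Row r)
{
    T result;
    // result.tupleof unrolls at compile time, one iteration per field.
    foreach (i, ref field; result.tupleof)
    {
        enum fieldName = __traits(identifier, T.tupleof[i]);
        field = r[fieldName].to!(typeof(field));
    }
    return result;
}

void main()
{
    Row r = ["name": "bob", "age": "27"];
    auto s = r.toStruct!S;
    assert(s.name == "bob");
    assert(s.age == 27);
}
```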
On Thursday, 3 March 2016 at 11:16:03 UTC, Dejan Lekic wrote:On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:std.experimental, ugly or not, is what is in phobos. See std.experimental.allocator, std.experimental.logger and std.experimental.ndsliceI'm back to actively working on a std.database specification & implementation. It's still unstable, minimally tested, and there is plenty of work to do, but I wanted to share an update on my progress.I suggest you call the package stdx.db - it is not (and may not become) a standard package, so `std` is out of question. If it is supposed to be *proposed* as standard package, then `stdx` is good because that is what some people have used in the past (while others used the ugly std.experimental. for the same purpose). I humbly believe that this effort **must** be collaborative as such package is doomed to fail if done wrong.
Mar 03 2016
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:I'm back to actively working on a std.database specification & implementation. It's still unstable, minimally tested, and there is plenty of work to do, but I wanted to share an update on my progress. [...]A little late to the party, nevertheless: Thanks for doing this, it will be super-helpful! My only feature request will be: please make it work with minimal effort with Vibe.D! :) ~ sd
Mar 05 2016
On 2016-03-05 11:23, Saurabh Das wrote:A little late to the party, nevertheless: Thanks for doing this, it will be super-helpful! My only feature request will be: please make it work with minimal effort with Vibe.D! :)Yeah, that's really important. Unfortunately it looks like a synchronous interface is prioritized :(. -- /Jacob Carlborg
Mar 05 2016
On Saturday, 5 March 2016 at 13:13:09 UTC, Jacob Carlborg wrote:On 2016-03-05 11:23, Saurabh Das wrote:I'm definitely going to start working in async capability (or more accurately, non-blocking) into the interface. Both models are essential and there are strong use cases for both, but I know there is a lot of interest in NBIO for vibe.d compatibility. This is challenging and, as many are aware, the C clients for SQL databases have traditionally been synchronous only. One suggested approach is to fork the driver source and change the I/O calls to be non-blocking. Another is to implement a non-blocking driver directly against the wire protocol (a "native" driver). These are limited options, but to the extent this approach is viable in some cases (mysql-native for example), they could be adapted to a standard interface. The good news is that some databases do have (or are working on) non-blocking support in their C clients. The webscalesql mysql fork, for example, has a non-blocking client that is in production and their client works with regular mysql databases. That fork should eventually be merged back into mysql (or may have been already). Postgres also provides non-blocking support. These are two cases that I'm targeting for initial reference implementations.A little late to the party, nevertheless: Thanks for doing this, it will be super-helpful! My only feature request will be: please make it work with minimal effort with Vibe.D! :)Yeah, that's really important. Unfortunately it looks like a synchronous interface is prioritized :(.
Mar 05 2016
On 2016-03-05 19:00, Erik Smith wrote:I'm definitely going to start working in async capability (or more accurately, non-blocking) into the interface. Both models are essential and there are strong use cases for both, but I know there is a lot of interest in NBIO for vibe.d compatibility. This is challenging and, as many are aware, the C clients for SQL databases have traditionally been synchronous only. One suggested approach is to fork the driver source and change the I/O calls to be non-blocking. Another is to implement a non-blocking driver directly against the wire protocol (a "native" driver). These are limited options, but to the extent this approach is viable in some cases (mysql-native for example), they could be adapted to a standard interface. The good news is that some databases do have (or are working on) non-blocking support in their C clients. The webscalesql mysql fork, for example, has a non-blocking client that is in production and their client works with regular mysql databases. That fork should eventually be merged back into mysql (or may have been already). Postgres also provides non-blocking support. These are two cases that I'm targeting for initial reference implementations.Just for the record, there's already a native Postgres client that is non-blocking and vibe.d compatible. https://github.com/pszturmaj/ddb -- /Jacob Carlborg
Mar 05 2016
On Saturday, 5 March 2016 at 18:00:56 UTC, Erik Smith wrote:I'm definitely going to start working in async capability (or more accurately, non-blocking) into the interface.Huh? Non-blocking operation is not reflected in interface. Only asynchronous requires special interface. Vibe interfaces are synchronous.Postgres also provides non-blocking support. These are two cases that I'm targeting for initial reference implementations.I've looked into OCI headers, they say something about asynchronous operations, do you know something about that?
Mar 06 2016
On Sunday, 6 March 2016 at 08:53:05 UTC, Kagamin wrote:On Saturday, 5 March 2016 at 18:00:56 UTC, Erik Smith wrote:I think a basic non-blocking interface should be similar to the synchronous one, but also exposes a file/socket descriptor to pass to the event system and also provide status method for whether the R/W operation is waiting.I'm definitely going to start working in async capability (or more accurately, non-blocking) into the interface.Huh? Non-blocking operation is not reflected in interface. Only asynchronous requires special interface. Vibe interfaces are synchronous.I've looked into OCI headers, they say something about asynchronous operations, do you know something about that?I need to look into this more, but the OCI non-blocking model is problematic (a missing file descriptor maybe). It can probably be made to work since Node now has a driver directly from Oracle, but I'm not sure if it has the same scaling profile as other non-blocking clients.
Mar 06 2016
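[Editor's note] The non-blocking interface idea sketched above (expose a descriptor for the event loop plus a wait-status query) might look roughly like the following. Every name here (NonBlockingConnection, IOStatus, socketHandle, tick, MockConnection) is an assumption for illustration, not part of any existing driver:

```d
// Status the event loop asks for: is the driver done, or waiting on I/O?
enum IOStatus { ready, waitRead, waitWrite }

interface NonBlockingConnection
{
    int socketHandle();   // descriptor to register with the event system
    IOStatus status();    // which readiness condition the driver needs next
    bool tick();          // advance the driver's state machine;
                          // returns true when the operation has completed
}

// Mock driver that "completes" after two read-ready events.
class MockConnection : NonBlockingConnection
{
    private int pendingReads = 2;
    int socketHandle() { return 3; } // dummy descriptor
    IOStatus status() { return pendingReads > 0 ? IOStatus.waitRead
                                                : IOStatus.ready; }
    bool tick()
    {
        if (pendingReads > 0) --pendingReads;
        return pendingReads == 0;
    }
}

void main()
{
    NonBlockingConnection c = new MockConnection;
    // A real event loop would block on c.socketHandle() between ticks.
    while (!c.tick()) { }
    assert(c.status() == IOStatus.ready);
}
```

A vibe.d integration would hand socketHandle() to its event core and resume the fiber when the socket becomes readable/writable, re-invoking tick() until it reports completion.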
I've made a few updates based on some of the feedback in this thread. - execute() renamed to query() - query with input binds directly from connection - query() returns result for chaining The design is still early stage. I've got a growing list of design options which I'll bring into discussion once the interface becomes more stable. Other updates: - postgres reference implementation added (synchronous only at moment) - DUB package now available as dstddb - DMD "cyclic structs" bug fix may land soon (PR appeared) with design improvements to follow. erik
Mar 10 2016