www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - [WIP] Native SQLite Database reader (works at CTFE)

reply Stefan Koch <uplink.coder googlemail.com> writes:
Hello,

It is not quite ready to post in Announce,
but I would like to inform you that I am planing to open-source 
my native-sqlite database driver. (well currently it just reads 
them).

However it works fully at CTFE.
so if you want to you can extract sourceCode out of a 
sql-database and make a mixin with it :D
given you string imported the database file.
Feb 18 2016
next sibling parent =?UTF-8?B?TcOhcmNpbw==?= Martins <marcioapm gmail.com> writes:
On Thursday, 18 February 2016 at 21:09:15 UTC, Stefan Koch wrote:
 Hello,

 It is not quite ready to post in Announce,
 but I would like to inform you that I am planing to open-source 
 my native-sqlite database driver. (well currently it just reads 
 them).

 However it works fully at CTFE.
 so if you want to you can extract sourceCode out of a 
 sql-database and make a mixin with it :D
 given you string imported the database file.
Mother of god! Looking forward to it!
Feb 18 2016
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 02/18/2016 04:09 PM, Stefan Koch wrote:
 Hello,

 It is not quite ready to post in Announce,
 but I would like to inform you that I am planing to open-source my
 native-sqlite database driver. (well currently it just reads them).

 However it works fully at CTFE.
 so if you want to you can extract sourceCode out of a sql-database and
 make a mixin with it :D
 given you string imported the database file.
That's very exciting! Are you planning a DConf talk on the topic? -- Andrei
Feb 18 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 19 February 2016 at 01:28:11 UTC, Andrei Alexandrescu 
wrote:

 That's very exciting! Are you planning a DConf talk on the 
 topic? -- Andrei
I would love to give a talk on this.
Feb 19 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/19/16 11:03 AM, Stefan Koch wrote:
 On Friday, 19 February 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote:

 That's very exciting! Are you planning a DConf talk on the topic? --
 Andrei
I would love to give a talk on this.
Submission deadline is Feb 29. -- Andrei
Feb 19 2016
prev sibling next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 18 February 2016 at 21:09:15 UTC, Stefan Koch wrote:
 I am planing to open-source my native-sqlite database driver. 
 (well currently it just reads them).

 However it works fully at CTFE.
Interesting... I'd almost want a page with the ddoc information to glance over the API, and a couple code snippets of how you'd see it working. <snip> I'll look forward to seeing this when it's out :)
Feb 18 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 19 February 2016 at 06:40:08 UTC, Era Scarecrow wrote:

  Interesting... I'd almost want a page with the ddoc 
 information to glance over the API, and a couple code snippets 
 of how you'd see it working. <snip>

  I'll look forward to seeing this when it's out :)
Well I can do that :) http://dpaste.dzfl.pl/d8ef67f4b22b here is a taste of the internal API please feel free to post improvements or suggestions
Feb 19 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 19 February 2016 at 16:20:50 UTC, Stefan Koch wrote:
 On Friday, 19 February 2016 at 06:40:08 UTC, Era Scarecrow 
 wrote:

  Interesting... I'd almost want a page with the ddoc 
 information to glance over the API, and a couple code snippets 
 of how you'd see it working. <snip>

  I'll look forward to seeing this when it's out :)
Well I can do that :) http://dpaste.dzfl.pl/d8ef67f4b22b here is a taste of the internal API please feel free to post improvements or suggestions
... oh crap is misspelled algorithm ...
Feb 19 2016
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
just a small update on the API
It could look something like this

auto table = db.tables("personal");
auto results = table.select("name","surname").where!("age","sex", 
(age, sex) => sex.as!Sex == Sex.female, age.as!uint < 40));
auto names = results[0].as!string;
auto surnames = results[0].as!string;

auto tbl_names = 
db.tables("master_table_copy").select("name").where!((type) => 
type.as!string == "table")("type").as!string;

please comment!
Feb 21 2016
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 21 Feb 2016 16:05:49 +0000, Stefan Koch wrote:

 just a small update on the API It could look something like this
 
 auto results = table.select("name","surname").where!("age","sex",
 (age, sex) => sex.as!Sex == Sex.female, age.as!uint < 40));
That means never using an index.
Feb 21 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 21 February 2016 at 18:08:30 UTC, Chris Wright wrote:
 That means never using an index.
How so ?
Feb 21 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 21 Feb 2016 18:12:27 +0000, Stefan Koch wrote:

 On Sunday, 21 February 2016 at 18:08:30 UTC, Chris Wright wrote:
 That means never using an index.
How so ?
Or rather, parsing a query to use an index becomes exponential unless you parse X86 instructions. So in order to evaluate the lambda, you have to invoke it. You can't override <= separately from <. You only have an opCmp invocation and return value. So with a query filter like: query.where!("age", (age) => age >= 18); You know it's being compared to 10, but you don't know what comparison. So the first time, you have opCmp return -1, and the lambda yields false. Try again, have opCmp return 0, and the lambda yields true. Try a third time, have opCmp return 1, and it yields true. That's three invocations for one variable. What about two variables? query.where!("age,parental_supervision", (age, parentalSupervision) => age >= 18 || parentalSupervision == true); Now I have to try three different values for the 'age' expression and two for the 'parentalSupervision' variable, and there's a combinatorial expansion. That yields a giant truth table, and that's probably good enough. You look at which columns have an index and, of those, which have the best correlation with the lambda's result. That even lets you reduce the amount of memory you use -- you don't need to hold the whole truth table, just a series of counters. But you're still dealing with O(2**n) evaluations of the lambda.
Feb 21 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 21 February 2016 at 18:32:29 UTC, Chris Wright wrote:
 Or rather, parsing a query to use an index becomes exponential 
 unless you parse X86 instructions.

 D doesn't let you access the contents of a lambda expression. 

 invoke it.

 You can't override <= separately from <. You only have an opCmp 
 invocation and return value. So with a query filter like:

 query.where!("age", (age) => age >= 18);

 You know it's being compared to 10, but you don't know what 
 comparison. So the first time, you have opCmp return -1, and 
 the lambda yields false. Try again, have opCmp return 0, and 
 the lambda yields true. Try a third time, have opCmp return 1, 
 and it yields true.

 That's three invocations for one variable. What about two 
 variables?

 query.where!("age,parental_supervision",
   (age, parentalSupervision) => age >= 18 || 
 parentalSupervision == true);

 Now I have to try three different values for the 'age' 
 expression and two for the 'parentalSupervision' variable, and 
 there's a combinatorial expansion.

 That yields a giant truth table, and that's probably good 
 enough. You look at which columns have an index and, of those, 
 which have the best correlation with the lambda's result. That 
 even lets you reduce the amount of memory you use -- you don't 
 need to hold the whole truth table, just a series of counters. 
 But you're still dealing with O(2**n) evaluations of the lambda.
I don't parse anything. the string "age" gets evaluated once so I know which columns your lambda is referring to then the lambda gets evaluated exactly n times where n is the number of rows. I your example above you should have written : query.where!("age","parentalSupervision", (age,pa) => age.as!uint
= 18 || pa == true)
Feb 21 2016
next sibling parent reply Any <hot_katjans hotmail.com> writes:
On Sunday, 21 February 2016 at 19:21:31 UTC, Stefan Koch wrote:
 where n is the number of rows.
That means your doing a full table scan. When the table gets large enough, this gets problematic.
Feb 21 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 21 February 2016 at 19:55:38 UTC, Any wrote:
 On Sunday, 21 February 2016 at 19:21:31 UTC, Stefan Koch wrote:
 where n is the number of rows.
That means your doing a full table scan. When the table gets large enough, this gets problematic.
When the table get's large enough you probably don't worry about CTFE'ing it anymore.
Feb 21 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 21 Feb 2016 21:15:01 +0000, Stefan Koch wrote:

 On Sunday, 21 February 2016 at 19:55:38 UTC, Any wrote:
 On Sunday, 21 February 2016 at 19:21:31 UTC, Stefan Koch wrote:
 where n is the number of rows.
That means your doing a full table scan. When the table gets large enough, this gets problematic.
When the table get's large enough you probably don't worry about CTFE'ing it anymore.
So you intend this to work *only* at compile time? Or would you supply a different API for querying at runtime?
Feb 21 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 22 February 2016 at 03:33:27 UTC, Chris Wright wrote:
 On Sun, 21 Feb 2016 21:15:01 +0000, Stefan Koch wrote:

 On Sunday, 21 February 2016 at 19:55:38 UTC, Any wrote:
 On Sunday, 21 February 2016 at 19:21:31 UTC, Stefan Koch 
 wrote:
 where n is the number of rows.
That means your doing a full table scan. When the table gets large enough, this gets problematic.
When the table get's large enough you probably don't worry about CTFE'ing it anymore.
So you intend this to work *only* at compile time? Or would you supply a different API for querying at runtime?
I intend to have a D-style SQLite compatible Database solution. However I will not reinvent MySql or postgres. Performance is important to me. Scaling to bigger Data however is secondary at this point.
Feb 21 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
Another small update.

So far ldc has the lead in my performance test
ldc2 produces significantly faster results then gdc or dmd.
I will have a look at the IR and see how I can "port" the 
optimisations it does to the sourceCode.
Feb 22 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 22 February 2016 at 14:11:46 UTC, Stefan Koch wrote:
 Another small update.

 So far ldc has the lead in my performance test
 ldc2 produces significantly faster results then gdc or dmd.
 I will have a look at the IR and see how I can "port" the 
 optimisations it does to the sourceCode.
After rewriting the engine to allocate less gc memory, gdc is faster then ldc again. - In a slightly unfair benchmark I had an 3.5x performance advantage over sqlite. (unfair, because I don't parse SQL and just extract the colums raw) yet at the and of the day, this is how SQLite is meant to be used. As a convenient data-storage. more news : Write support is coming! After that is finished let's see by how much I can beat sqlite dong bulk inserts.
Feb 26 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 26 February 2016 at 09:53:46 UTC, Stefan Koch wrote:

 Write support is coming!
 After that is finished let's see by how much I can beat sqlite 
 dong bulk inserts.
dong => doing. - anyone who wants to help bench-marking please write me. uplink DOT coder AT gmail C O M
Feb 26 2016
prev sibling parent reply Laeeth Isharc <laeethnospam nospam.laeeth.com> writes:
On Monday, 22 February 2016 at 03:39:57 UTC, Stefan Koch wrote:
 On Monday, 22 February 2016 at 03:33:27 UTC, Chris Wright wrote:
 On Sun, 21 Feb 2016 21:15:01 +0000, Stefan Koch wrote:

 On Sunday, 21 February 2016 at 19:55:38 UTC, Any wrote:
 On Sunday, 21 February 2016 at 19:21:31 UTC, Stefan Koch 
 wrote:
 where n is the number of rows.
That means your doing a full table scan. When the table gets large enough, this gets problematic.
When the table get's large enough you probably don't worry about CTFE'ing it anymore.
So you intend this to work *only* at compile time? Or would you supply a different API for querying at runtime?
I intend to have a D-style SQLite compatible Database solution. However I will not reinvent MySql or postgres. Performance is important to me. Scaling to bigger Data however is secondary at this point.
Great project, Stefan. Any idea what kind of maximum database size will be feasible ? I realise it is early days so far and not your main focus. Laeeth
Feb 27 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 28 February 2016 at 01:45:48 UTC, Laeeth Isharc wrote:
 Great project, Stefan.  Any idea what kind of maximum database 
 size will be feasible ?  I realise it is early days so far and 
 not your main focus.

 Laeeth
The limits will be almost the same as in sqlite. So a few 100TB
Feb 27 2016
parent Laeeth Isharc <nospamlaeeth nospamlaeeth.com> writes:
On Sunday, 28 February 2016 at 01:55:50 UTC, Stefan Koch wrote:
 On Sunday, 28 February 2016 at 01:45:48 UTC, Laeeth Isharc 
 wrote:
 Great project, Stefan.  Any idea what kind of maximum database 
 size will be feasible ?  I realise it is early days so far and 
 not your main focus.

 Laeeth
The limits will be almost the same as in sqlite. So a few 100TB
nice. and thanks. Laeeth.
Feb 28 2016
prev sibling parent Chris Wright <dhasenan gmail.com> writes:
On Sun, 21 Feb 2016 19:21:31 +0000, Stefan Koch wrote:
 I don't parse anything.
Right, which prevents you from using indexes.
Feb 21 2016
prev sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 21 February 2016 at 16:05:49 UTC, Stefan Koch wrote:
 just a small update on the API
 It could look something like this

 auto table = db.tables("personal");
 auto results = 
 table.select("name","surname").where!("age","sex", (age, sex) 
 => sex.as!Sex == Sex.female, age.as!uint < 40));
 auto names = results[0].as!string;
 auto surnames = results[0].as!string;
Looks good! Although took me a little bit to notice the Lambda :P The result from names, surnames, once you access the entry it consumes it? That seems wrong. Hmmm is there a short example of how the left/right/natural joins would be done? I almost worry if there will be heavy uses for lambdas for that as well. Also the heavy template/explicit type use of _as_ makes me wonder if those can be avoided somehow. Then again SQLite might not care enough from what i recall so it may be unavoidable.
Feb 21 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 22 February 2016 at 02:52:41 UTC, Era Scarecrow wrote:
  Looks good! Although took me a little bit to notice the Lambda 
 :P

  The result from names, surnames, once you access the entry it 
 consumes it? That seems wrong.

  Hmmm is there a short example of how the left/right/natural 
 joins would be done? I almost worry if there will be heavy uses 
 for lambdas for that as well.

  Also the heavy template/explicit type use of _as_ makes me 
 wonder if those can be avoided somehow. Then again SQLite might 
 not care enough from what i recall so it may be unavoidable.
The Intention is not to be another SQLite but rather to provide a compatible D-style solution. As for the conversions they could be avoided. However SQLite does not gurantee a type for a colum... And you cannot use costum types when you don't convert explicitly.
Feb 21 2016
prev sibling next sibling parent WebFreak001 <janju007 web.de> writes:
On Thursday, 18 February 2016 at 21:09:15 UTC, Stefan Koch wrote:
 Hello,

 It is not quite ready to post in Announce,
 but I would like to inform you that I am planing to open-source 
 my native-sqlite database driver. (well currently it just reads 
 them).

 However it works fully at CTFE.
 so if you want to you can extract sourceCode out of a 
 sql-database and make a mixin with it :D
 given you string imported the database file.
Awesome! Really looking forward to it. It will probably change the way I store data forever. I currently do it in json files, which suck for simply storing data :D
Feb 21 2016
prev sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
Another update :
Performance is now 3 times better then with my initial version.

Tip of the Day  : constantly reevaluate your design-decisions.
Feb 27 2016