digitalmars.D - Interesting performance data-point

Don Allen <donaldcallen gmail.com> writes:
As I've mentioned in previous messages, I've ported my personal 
finance package from C to D, having first ported some of it to 
Rust until I just couldn't stand it anymore.

One of the utilities that exists in both the D and Rust versions 
reads .csv files downloaded from American Express and loads the 
transactions into the Sqlite database that contains my financial 
data, trying to assign an expense account to each incoming 
transaction by fuzzy-comparing the transaction's description to 
existing transactions, using an algorithm based on Levenshtein 
distance. The Levenshtein calculation is done using a 
user-defined Sqlite function that is loaded as an extension.
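
A minimal sketch of the idea, in Python for brevity (its standard-library sqlite3 module wraps the same C library). This is not the actual extension: the table layout, sample rows, and the threshold of 3 are invented for illustration.

```python
import sqlite3

def levenshtein(a, b):
    """Edit distance by dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop a character
                               current[j - 1] + 1,       # add a character
                               previous[j - 1] + cost))  # keep or substitute
        previous = current
    return previous[-1]

conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function.
conn.create_function("levenshtein", 2, levenshtein)

# Hypothetical schema and rows, for illustration only.
conn.execute("CREATE TABLE transactions "
             "(post_date TEXT, description TEXT, account TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    ("2024-11-02", "WHOLE FOODS MKT 1023", "Groceries"),
    ("2024-11-15", "SHELL OIL 5744", "Auto:Fuel"),
    ("2024-12-01", "WHOLE FOODS MKT 1187", "Groceries"),
])

# Find existing transactions whose description is close to the incoming one.
matches = conn.execute(
    "SELECT post_date, account FROM transactions "
    "WHERE levenshtein(description, ?) <= ? ORDER BY post_date DESC",
    ("WHOLE FOODS MKT 1190", 3),
).fetchall()
print(matches)
```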

What I've found is that the D version of this utility is about 
twice as fast (compiled with DMD) as the Rust version to get 
identical results. While I haven't done detailed enough 
measurements to explain the performance disparity with certainty, 
I've done enough to know that both versions spend most of their 
time in the Levenshtein distance function.

But I have a theory that I think is the likely explanation. And 
if I'm correct, it highlights one of D's strongest points -- the 
ability to call C libraries directly, without the need for an 
elaborate interface layer.

What I think is going on is that rusqlite, the crate that is 
Rust's primary Sqlite interface package, does not provide a way 
to step through the results of a select query, as the Sqlite 
library itself does, stopping when you are happy. Instead, you 
run the 'query' method (or one of its variants) on a prepared 
statement, which either returns an iterator for you to access all 
the returned rows or calls a closure to process each row. This 
difference matters when each row involves an expensive 
calculation.

In my case, I want the most recent transaction that meets the 
Levenshtein distance criterion, which will be the first row in 
the result set, since I order them by post-date descending. In D, 
I am able to step the match query and either I get a row or I 
don't. If I do, I stop, use that transaction's expense account 
and I'm done. The entire result set is not computed. In Rust, 
rusqlite computes the entire result set, which is expensive due 
to the Levenshtein calculation, and then hands it to me row by 
row.
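
One way to check a theory like this is to count how often the user-defined function runs when only the first row is taken versus when the whole result set is drained. The sketch below does that in Python's standard-library sqlite3 module. Everything in it is an assumption for illustration: `is_close_match` is a cheap stand-in for the Levenshtein test, and the schema, sample rows, and the index on `post_date` are invented. Whether stopping early saves work depends on the query plan, and some Python versions read one row ahead, so the counts are measured rather than assumed.

```python
import sqlite3

counter = {"calls": 0}

def is_close_match(description, target):
    # Hypothetical stand-in for the expensive Levenshtein test; counts its calls.
    counter["calls"] += 1
    return int(description.startswith(target))

conn = sqlite3.connect(":memory:")
conn.create_function("is_close_match", 2, is_close_match)
conn.execute("CREATE TABLE transactions "
             "(post_date TEXT, description TEXT, account TEXT)")
# Assumption: an index on post_date, so rows can come out newest-first
# without SQLite having to sort the full result.
conn.execute("CREATE INDEX idx_post_date ON transactions(post_date)")
for month in range(1, 13):
    merchant, account = (("GROCER", "Groceries") if month % 2
                         else ("FUEL", "Auto:Fuel"))
    conn.execute("INSERT INTO transactions VALUES (?, ?, ?)",
                 ("2024-%02d-01" % month,
                  "%s STORE %d" % (merchant, month),
                  account))

QUERY = ("SELECT post_date, account FROM transactions "
         "WHERE is_close_match(description, ?) ORDER BY post_date DESC")

# Take the first row only, then abandon the statement.
counter["calls"] = 0
cursor = conn.execute(QUERY, ("GROCER",))
first_row = cursor.fetchone()
cursor.close()
calls_first_row = counter["calls"]

# Drain the whole result set for comparison.
counter["calls"] = 0
all_rows = conn.execute(QUERY, ("GROCER",)).fetchall()
calls_all_rows = counter["calls"]

print("first row:", first_row)
print("calls (first row only):", calls_first_row)
print("calls (all rows):", calls_all_rows)
```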

It is not a simple matter to convince Sqlite to restrict the 
result set to the most recent row. 'limit 1' makes no difference 
in the Rust application's performance (I tried it). Apparently 
Sqlite applies 'limit' after computing the result set. There 
*may* be a way to do this using Sqlite's windowing capability, 
but that's a bit of a research project that I have no inclination 
to take on.
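
A hedged way to investigate the 'limit 1' behavior is to ask SQLite for its query plan and count function calls with and without an index on the ordering column. The sketch below uses Python's standard-library sqlite3 module. The helper `run`, the stand-in function `is_close_match`, the schema, and the index name are all hypothetical, and the result says nothing definite about the real database, whose schema and indexes are not shown here.

```python
import sqlite3

def run(with_index):
    """Run a LIMIT 1 query; return (rows, function calls, query plan details)."""
    counter = {"calls": 0}

    def is_close_match(description, target):
        # Hypothetical stand-in for the expensive Levenshtein test.
        counter["calls"] += 1
        return int(description.startswith(target))

    conn = sqlite3.connect(":memory:")
    conn.create_function("is_close_match", 2, is_close_match)
    conn.execute("CREATE TABLE transactions (post_date TEXT, description TEXT)")
    if with_index:
        conn.execute("CREATE INDEX idx_post_date ON transactions(post_date)")
    for month in range(1, 13):
        merchant = "GROCER" if month % 2 else "FUEL"
        conn.execute("INSERT INTO transactions VALUES (?, ?)",
                     ("2024-%02d-01" % month, "%s STORE %d" % (merchant, month)))

    query = ("SELECT post_date FROM transactions "
             "WHERE is_close_match(description, ?) "
             "ORDER BY post_date DESC LIMIT 1")
    # The fourth column of EXPLAIN QUERY PLAN output is the readable detail.
    plan = [row[3] for row in
            conn.execute("EXPLAIN QUERY PLAN " + query, ("GROCER",))]
    counter["calls"] = 0
    rows = conn.execute(query, ("GROCER",)).fetchall()
    return rows, counter["calls"], plan

rows_plain, calls_plain, plan_plain = run(with_index=False)
rows_indexed, calls_indexed, plan_indexed = run(with_index=True)

print("no index:  ", calls_plain, "calls; plan:", plan_plain)
print("with index:", calls_indexed, "calls; plan:", plan_indexed)
```

Without an index on the ordering column, SQLite has to test every row before it can know which match is the newest, so 'limit 1' cannot reduce the number of function calls in that case.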

I have also not found a Rust crate that provides step-level 
control over Sqlite *and* lets you load extensions.

I think this illustrates a strength of D that I don't think 
enough people understand -- the ability to talk directly and 
easily to the C world. People complain that D doesn't have a rich 
set of libraries. It doesn't need one; all the C libraries are 
almost as easily accessible from D as they are from C or C++. And 
this has gotten even easier with the advent of ImportC, which I 
think is a very important addition to D and worth continued 
development to hide the craziness in C header files.

In my case, in D, I can use a straightforward query and have the 
same simple interaction with Sqlite that I would have in C. There 
may be a way to match D's performance in this case with Rust, but 
it would require effort, perhaps a lot. This is typical of the 
Rust experience compared to D. Things are just more difficult, 
mainly because the user plays a bigger role in memory management 
in Rust than in languages, like D, that provide a GC (I simply do 
not understand the anti-GC religious fanatics, especially when we 
are talking about ordinary applications on today's multi-GHz 
hardware with huge amounts of memory). D's performance is 
comparable (except in the case of the AMEX utility, where it is a 
lot better) and the code is more readable. Unfortunately, people 
jump on band-wagons mindlessly.
Dec 31 2024
Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 31 December 2024 at 15:36:31 UTC, Don Allen wrote:
 As I've mentioned in previous messages, I've ported my personal 
 finance package from C to D, having first ported some of it to 
 Rust until I just couldn't stand it anymore.

 One of the utilities that exists in both the D and Rust 
 versions reads .csv files downloaded from American Express and 
 loads the transactions into the Sqlite database that contains 
 my financial data, trying to assign an expense account to each 
 incoming transaction by fuzzy-comparing the transaction's 
 description to existing transactions, using an algorithm based 
 on Levenshtein distance. The Levenshtein calculation is done 
 using a user-defined Sqlite function that is loaded as an 
 extension.

 What I've found is that the D version of this utility is about 
 twice as fast (compiled with DMD) as the Rust version to get 
 identical results. While I haven't done detailed enough 
 measurements to explain the performance disparity with 
 certainty, I've done enough to know that both versions spend 
 most of their time in the Levenshtein distance function.
Great read! Yeah, I think the interoperability with C is very much a super-power of D.

-Steve
Dec 31 2024
Chris Piker <chris hoopjump.com> writes:
On Tuesday, 31 December 2024 at 15:36:31 UTC, Don Allen wrote:
 But I have a theory that I think is the likely explanation. And 
 if I'm correct, it highlights one of D's strongest points -- 
 the ability to call C libraries directly, without the need for 
 an elaborate interface layer.
Exactly this (plus a good standard library). In space physics we have decades of C (and Fortran) libraries whose output has been thoroughly vetted. When a measurement platform is traveling over 7.5 km/second it's wise to use codes with well-characterized round-off, and that usually means using old software. Since Python is the de facto language of science, there are students whose entire master's thesis is making useful interfaces to these old libraries. From D, I can just call them. It's great!
Jan 04
Mike Shah <mshah.475 gmail.com> writes:
Nice post -- thanks for sharing this case study!

On Tuesday, 31 December 2024 at 15:36:31 UTC, Don Allen wrote:
 I think this illustrates a strength of D that I don't think 
 enough people understand -- the ability to talk directly and 
 easily to the C world. People complain that D doesn't have a 
 rich set of libraries. It doesn't need one; all the C libraries 
 are almost as easily accessible from D as they are from C or 
 C++. And this has gotten even easier with the advent of 
 ImportC, which I think is a very important addition to D and 
 worth continued development to hide the craziness in C header 
 files.
Agreed, the effectively 100% interop with C is one of DLang's awesome superpowers :) The interop with C++ is also quite good -- I think D and Swift are the only languages I've seen that have interop in a nearly perfect or good state with all three of C, C++, and Objective-C.

Some of my favorite other superpowers (that you probably already know, but in case anyone new is lurking here):

1. Metaprogramming and templates
2. CTFE
3. Slicing
4. Most of the defaults match my preference (default-on bounds checking, initialized variables)
5. Module system (i.e. no messing around distinguishing between source and header files)
6. Ranges
7. Tooling: Dub, profiler, gc-profiler, cov (might not be perfect tools, but having a package manager and default build system is really nice for those just getting started)
8. ...my list could go on -- but I'll just say I have fun doing real software engineering in D :)
 [...] Things are just more difficult, mainly because the user 
 plays a bigger role in memory management in Rust than in 
 languages, like D, that provide a GC (I simply do not 
 understand the anti-GC religious fanatics, especially when we 
 are talking about ordinary applications on today's multi-GHz 
 hardware with huge amounts of memory). D's performance is 
 comparable (except in the case of the AMEX utility, where it is 
 a lot better) and the code is more readable. Unfortunately, 
 people jump on band-wagons mindlessly.
I think the funny thing is that D provides several memory management strategies, and somehow the anti-GC crowd got stuck on the default. For applications like games where control over memory is needed, it's all possible (whether that means simply preallocating, explicitly choosing when to run the GC, not using the GC at all and using malloc, frame allocation, using double-stack buffers, etc.) :)
Jan 04
ryuukk_ <ryuukk.dev gmail.com> writes:
On Sunday, 5 January 2025 at 02:00:17 UTC, Mike Shah wrote:
 I think the funny thing is that D provides several memory 
 management strategies, and somehow the anti-GC crowd got stuck 
 on the default. Applications like games where control over 
 memory is needed (whether that means simply preallocating, 
 explicitly calling when to GC, not using GC at all and using 
 malloc, frame allocation, using double-stack buffers, etc.) is 
 all possible :)
There is no "anti-gc" crowd.

There is an "anti tell me to use the gc" crowd.

When we ask for an Allocator API in the runtime, we get told to "just use the GC bro".

It's really not hard to understand.
Jan 05
monkyyy <crazymonkyyy gmail.com> writes:
On Sunday, 5 January 2025 at 19:11:50 UTC, ryuukk_ wrote:
 There is a "anti tell me to use the gc crowd"

 When we ask for an Allocator api in the runtime, we get told to 
 "just use the GC bro"
Who? I feel you're conflating a few people; the allocator debate, the betterC-go-away debate, and the merge-data-structures debate are, I feel, different people. If you're talking about adr-ish people, I feel they couldn't give a rat's ass if someone adopts an allocator API; they want to delete the betterC flag in general. Whereas I have grown to hate the allocator debate: I tried to do wasm, which is a betterC hell.
Jan 05
Serg Gini <kornburn yandex.ru> writes:
On Tuesday, 31 December 2024 at 15:36:31 UTC, Don Allen wrote:
 mindlessly.
Did you try asking ChatGPT or the Rust forums for help? Rust has several libraries for SQLite, including async ones and full Rust rewrites (Limbo/Turso). So I would be very surprised if they don't have such a simple query implementation detail. And afaik Rust has zero issues with creating C bindings.
Jan 05