digitalmars.D.announce - Simple DataFrames library

Aravinda VK <mail aravindavk.in> writes:
Hello everyone,

I am happy to share my DataFrame library for D. My primary focus 
was to make it simple to use and I haven't spent a lot of time 
optimizing the code for memory and performance.

Example:

```d
import std.stdio;
import std.algorithm;
import std.array;
import std.range;

import dataframes;

struct Product
{
     string name;
     double unitPrice;
     int quantity;
     double discount;
     double totalPrice;
}

const DISCOUNTS = [
     "WELCOME": 5,
     "HAPPY": 2
];

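// Returns the per-unit discount for each price; all zeros for an unknown coupon.
double[] applyDiscounts(Column!double values, string coupon = "")
{
     // `in` yields a pointer to the percentage, or null if the coupon is not listed.
     auto pct = coupon in DISCOUNTS;
     if (pct is null)
         return repeat(0.0, values.length).array;

     return values.map!(v => v * (*pct / 100.0)).array;
}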

void main(string[] args)
{
     auto coupon = args.length > 1 ? args[1] : "";
     auto df = new DataFrame!Product(
         name: ["p1", "p2", "p3", "p4"],
         unitPrice: [10.0, 15.0, 5.0, 20.0],
         quantity: [3, 1, 5, 2]
     );

     df.discount = df.unitPrice.applyDiscounts(coupon);
     df.totalPrice = (df.unitPrice - df.discount) * df.quantity;

     // Preview
     df.writeln;

     auto total = df.rows
         .map!(r => r.totalPrice)
         .sum;

     writeln("Total: ", total);
}
```

Highlights:

- Creates a new class with all the fields of the given struct as 
arrays.
- Supports column operations such as adding two columns or 
multiplying them element by element.
- `df.rows` returns a list of `Row` values that only hold references 
to the main data.
- Easy to use with `std.algorithm` goodies (see the README).
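
As a rough illustration of the first highlight, here is a minimal sketch of how a class with one array per struct field could be generated at compile time. It is not the library's actual implementation: `Frame` and `row` are hypothetical names used only for this example, and it relies on nothing but `std.traits` and string mixins.

```d
import std.algorithm : sum;
import std.stdio : writeln;
import std.traits : FieldNameTuple;

struct Product
{
    string name;
    double unitPrice;
    int quantity;
}

// Hypothetical stand-in for a generated DataFrame class:
// declares one dynamic array per field of T, reusing the field's name.
class Frame(T)
{
    static foreach (fieldName; FieldNameTuple!T)
    {
        // For `double unitPrice` this mixes in `double[] unitPrice;`.
        mixin("typeof(T." ~ fieldName ~ ")[] " ~ fieldName ~ ";");
    }

    // Rebuild a single T from element `idx` of every column.
    T row(size_t idx)
    {
        T result;
        static foreach (fieldName; FieldNameTuple!T)
        {
            mixin("result." ~ fieldName ~ " = this." ~ fieldName ~ "[idx];");
        }
        return result;
    }
}

void main()
{
    auto df = new Frame!Product;
    df.name = ["p1", "p2"];
    df.unitPrice = [10.0, 15.0];
    df.quantity = [3, 1];

    // Each column is an ordinary array, so std.algorithm works on it directly.
    writeln(df.unitPrice.sum);
    writeln(df.row(1));
}
```

Because every column is a plain slice, no wrapper code is needed for the standard range algorithms in this sketch; the real library wraps columns in its own `Column!T` type.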

Add `dataframes` to your project by running:

```
dub add dataframes
```

The code and the documentation are available on GitHub 
https://github.com/aravindavk/dataframes-d and 
https://code.dlang.org/packages/dataframes

Please feel free to use it and let me know your experience and 
suggestions.

Thanks
Aravinda
Oct 29
jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 29 October 2024 at 16:11:08 UTC, Aravinda VK wrote:
 [snip]

 Please feel free to use it and let me know your experience and 
 suggestions.

 Thanks
 Aravinda
Thanks for working on this. I'm a big fan of using dataframes in R or pandas in Python. That being said, I think there's more value in building dataframes either on top of or as a part of mir's ndslices. There was previously a project (magpie, I believe) that built dataframes on top of ndslices. My recollection is that the main obstacle to support within mir is that labels aren't fully implemented (also, Ilya is less involved with the project these days). Some of this support is complicated to implement, and there is a concern that it could lead to breaking changes. Here's a list of tasks: https://github.com/libmir/mir-algorithm/issues/426
Oct 31
Aravinda VK <mail aravindavk.in> writes:
On Thursday, 31 October 2024 at 12:29:14 UTC, jmh530 wrote:
 [snip]
Thanks for the feedback. I haven't used mir-algorithm; I will check the GitHub issue you shared. I added more details about this library here (https://forum.dlang.org/post/qaeeeytofqydxnikthqi@forum.dlang.org). I have yet to look into the label support mentioned in the GitHub link. My DataFrame columns or rows can be accessed by name, like `df.amount` or `df["amount"].get!double`. Does this solve your use case? Refer to the README for examples: https://github.com/aravindavk/dataframes-d?tab=readme-ov-file#access-rows-and-columns
Nov 23
tastyminerals <letian fastmail.com> writes:
On Tuesday, 29 October 2024 at 16:11:08 UTC, Aravinda VK wrote:
 Hello everyone,

 I am happy to share my DataFrame library for D. My primary 
 focus was to make it simple to use and I haven't spent a lot of 
 time optimizing the code for memory and performance.

 [...]
Cool, a native data frame library is definitely something that is missing in D. I have a question that is also a suggestion: why don't you use the **mir-algorithm** library instead?
Nov 22
Aravinda VK <mail aravindavk.in> writes:
On Friday, 22 November 2024 at 09:53:04 UTC, tastyminerals wrote:
 [snip]
Thanks for the feedback. Nothing against mir-algorithm, but I haven't used it.

For one of my use cases, I started storing each field as a separate array, which really improved the experience of columnar operations. Later, I searched for a DataFrame library but couldn't find one. I found the idea of storing each field of a struct as an independent array, instead of storing an array of structs, very interesting, so I started the DataFrame project.

The API is very simple to use: given a struct, a new DataFrame class is created with each field of the input struct as an array of the same type. For example, `double amount` in the input struct becomes `double[] amount` in the DataFrame class. All column-based operations are equivalent to array operations using the D standard library, for example `df.amount.maxElement` or `df.amount.sum`. Please check the project README (https://github.com/aravindavk/dataframes-d) for usage details.

I have made new releases with a couple of enhancements:

- `Column!T.data` is renamed to `Column!T.values` (`1.0.1`)
- Added support for creating a new DataFrame from a list of rows or from another DataFrame (`1.0.1`)
- `Column` and `Row` types are available after `import dataframes` (`1.0.2`)
- Added support for adding, subtracting, multiplying and dividing a Column by a number (`1.0.3`)
- Added `head` and `tail` methods and updated the documentation (`1.0.3`)
- Added support for accessing a column by label and by index (`1.0.3`)
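
To make the layout idea above concrete, here is a small sketch that uses only the D standard library and does not depend on `dataframes` at all. The `Order` and `OrderColumns` types are hypothetical and written by hand for this example; the library generates the column layout from the struct instead.

```d
import std.algorithm : map, maxElement, sum;
import std.array : array;
import std.stdio : writeln;

// Array-of-structs layout: one struct value per row.
struct Order
{
    string item;
    double amount;
}

// Struct-of-arrays layout, written by hand for this sketch.
struct OrderColumns
{
    string[] item;
    double[] amount;
}

void main()
{
    auto rows = [Order("a", 10.0), Order("b", 25.0), Order("c", 5.0)];

    // Row layout: each column operation first has to project the field out.
    auto totalFromRows = rows.map!(r => r.amount).sum;

    // Column layout: the field is already a plain double[].
    auto cols = OrderColumns(
        rows.map!(r => r.item).array,
        rows.map!(r => r.amount).array);

    auto total = cols.amount.sum;
    auto largest = cols.amount.maxElement;

    // Element-wise arithmetic with D's built-in array operations.
    auto halved = new double[](cols.amount.length);
    halved[] = cols.amount[] * 0.5;

    writeln(totalFromRows, " ", total, " ", largest, " ", halved);
}
```

With the column layout, aggregate calls such as `sum` and `maxElement` apply directly to the field's array, which is the same shape as the `df.amount.sum` calls mentioned above.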
Nov 23
tastyminerals <letian fastmail.com> writes:
On Tuesday, 29 October 2024 at 16:11:08 UTC, Aravinda VK wrote:
 Hello everyone,

 I am happy to share my DataFrame library for D. My primary 
 focus was to make it simple to use and I haven't spent a lot of 
 time optimizing the code for memory and performance.

 [...]
* (update) What I meant is that you could build your library on top of **mir-algorithm**.
Nov 22