digitalmars.D - Some (probably inaccurate) stats about Dub packages
- SealabJaster (37/37) Oct 21 2021 (I know I'm posting a lot lately, but I'm simply curious :D)
- drug (3/6) Oct 21 2021 Of course it is. It is not so easy to interpret statistics properly,
- SealabJaster (3/6) Oct 21 2021 Yeah, you're right.
- Imperatorn (3/7) Oct 21 2021 Don't forget. Adding it to dub makes it discoverable in one
- SealabJaster (12/13) Oct 21 2021 Didn't want to spam the forum with yet another thread, so I'll
- Imperatorn (3/17) Oct 22 2021 Maybe possible to integrate with
- SealabJaster (3/5) Oct 22 2021 I don't have any motivation to work on Dub, so someone else would
- WebFreak001 (20/25) Oct 22 2021 that PR is completely different and removes the internal MongoDB
- SealabJaster (10/15) Oct 22 2021 Meilisearch has acceptable results, and attempts to fix typos as
- Imperatorn (3/10) Oct 22 2021 It was mainly done as the current search is unusable as everyone
- WebFreak001 (13/25) Oct 22 2021 I think it's ok for searching for functionality - if you start
- Imperatorn (3/17) Oct 22 2021 Yeah, it's a proof of concept for someone to continue on. More to
- Abdulhaq (6/17) Oct 22 2021 It's interesting, I suspect that the general profile is very
- drug (6/27) Oct 22 2021 Me too. One package can have 1000 downloads/month but it can be
- Adam D Ruppe (7/9) Oct 22 2021 Yeah, most the top ones are just dependencies of each other.
(I know I'm posting a lot lately, but I'm simply curious :D)

To pass the time I've been writing a thingy to show the downloads of dub packages over time. I thought to myself "how many dub packages are actually used?" So here's a few stats I've compiled. The Weekly and Monthly snapshots were taken 1 day ago:

```
TOTAL      PACKAGES   PERCENTAGE
1000000    5          0.24%
100000     41         2.01%
10000      165        7.90%
1000       375        17.96%
100        1058       50.67%
10         1619       77.54%
1          2018       96.65%

MONTHLY    PACKAGES   PERCENTAGE
10000      13         0.63%
1000       48         2.30%
100        113        5.41%
10         287        13.75%
5          367        17.58%
1          657        31.47%

WEEKLY     PACKAGES   PERCENTAGE
1000       20         0.98%
100        70         3.35%
10         149        7.14%
5          191        9.15%
1          309        14.80%
```

This is just off of data that I've scraped from dub, hence this likely isn't too accurate. But when I look at these numbers I personally think "is it even worth wasting time making a dub library?"

I wonder what the average age of the most used/median used/least used packages are, and what categories they'd fall under.

Hopefully others find this of interest, even if it's not very enlightening on its own.
Oct 21 2021
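A minimal sketch of how a table like the one above could be computed, assuming each row counts the packages with at least that many downloads (the post does not say how the rows were derived). The function name `threshold_table` and the sample numbers are hypothetical and are not real registry data.

```python
from typing import Iterable


def threshold_table(downloads: Iterable[int], thresholds: Iterable[int]):
    """Return (threshold, package count, percentage) rows, highest threshold first.

    Assumption: a package is counted in a row when its download count
    is greater than or equal to that row's threshold.
    """
    counts = list(downloads)
    total = len(counts)
    rows = []
    for t in sorted(thresholds, reverse=True):
        n = sum(1 for d in counts if d >= t)
        pct = 100.0 * n / total if total else 0.0
        rows.append((t, n, round(pct, 2)))
    return rows


# Hypothetical download counts for ten packages (not scraped data).
sample = [1_500_000, 120_000, 12_000, 1_200, 150, 15, 3, 0, 0, 0]

for threshold, n, pct in threshold_table(sample, [1, 10, 100, 1000]):
    print(f"{threshold:>8} {n:>8} {pct:>9.2f}%")
```

Because the rows are cumulative, the counts can only grow as the threshold drops, which matches the shape of the figures in the post.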
21.10.2021 12:00, SealabJaster wrote:
> But when I look at these numbers I personally think "is it even worth wasting time making a dub library?"

Of course it is. It is not so easy to interpret statistics properly; without a proper model they are just numbers that don't make much sense.
Oct 21 2021
On Thursday, 21 October 2021 at 09:09:00 UTC, drug wrote:
> Of course it is. It is not so easy to interpret statistics properly; without a proper model they are just numbers that don't make much sense.

Yeah, you're right. I feel I'm just projecting my own demoralisation, sorry.
Oct 21 2021
On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
> (I know I'm posting a lot lately, but I'm simply curious :D) To pass the time I've been writing a thingy to show the downloads of dub packages over time. [...]

Don't forget: adding it to dub makes it discoverable in one place. That's a huge advantage in any case.
Oct 21 2021
On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
> ...

Didn't want to spam the forum with yet another thread, so I'll just post this here since it's at least slightly on topic:

"TL;DR I need a search engine for dub packages for a small thing I'm working on, and wanted a side-by-side comparison of Postgres (the database I'm using) and Meilisearch (the dedicated search engine I've been eyeing up)."

It's a repo that sets up meilisearch and postgres with package data from dub, in order to see which one gives me better results from queries. Thought someone might find it interesting enough to look at: https://github.com/BradleyChatha/dubsearchtest
Oct 21 2021
On Thursday, 21 October 2021 at 13:28:50 UTC, SealabJaster wrote:
>> On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
>> ...
> Didn't want to spam the forum with yet another thread, so I'll just post this here since it's at least slightly on topic: "TL;DR I need a search engine for dub packages for a small thing I'm working on, and wanted a side-by-side comparison of Postgres (the database I'm using) and Meilisearch (the dedicated search engine I've been eyeing up)." It's a repo that sets up meilisearch and postgres with package data from dub, in order to see which one gives me better results from queries. Thought someone might find it interesting enough to look at: https://github.com/BradleyChatha/dubsearchtest

Maybe possible to integrate with https://github.com/dlang/dub-registry/pull/497
Oct 22 2021
On Friday, 22 October 2021 at 07:07:01 UTC, Imperatorn wrote:
> Maybe possible to integrate with https://github.com/dlang/dub-registry/pull/497

I don't have any motivation to work on Dub, so someone else would have to champion something like this through.
Oct 22 2021
On Friday, 22 October 2021 at 11:17:11 UTC, SealabJaster wrote:
>> On Friday, 22 October 2021 at 07:07:01 UTC, Imperatorn wrote:
>> Maybe possible to integrate with https://github.com/dlang/dub-registry/pull/497
> I don't have any motivation to work on Dub, so someone else would have to champion something like this through.

That PR is completely different and removes the internal MongoDB search, replacing it with "packageName.canFind(query)" - that is not the place you would want to extend this search into.

Additionally, that PR fails to address problems that were brought up in review. It also mixes the search changes (which I majorly disapprove of as they are right now) with bug fixes and deprecation fixes, which should really get into the code base, but which I won't merge as long as they are bundled with the search changes.

As I have already commented in that PR, it should really _extend_ the MongoDB text search, not replace it. Make it an aggregate query with a MongoDB $regex match (that makes it a simple string contains; escape regex characters with std.regex) that merges with the text search, giving the regex results higher text scores.

If you, the person reading this, are motivated to push through a search improvement in DUB, you can check that PR for hints on where to start, but you should really create a new PR instead and let MongoDB do the work or introduce some other indexed search framework.
Oct 22 2021
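A rough sketch of the idea described above: an escaped-regex "contains" match on the package name, merged with text-search hits so that name matches rank higher. This is an in-memory illustration in Python, not the dub-registry code and not an actual MongoDB aggregate; `re.escape` stands in for the std.regex escaping mentioned in the post, and the function names, the `name` field, the `boost` value and the package names are all assumptions.

```python
import re


def name_regex_filter(query: str) -> dict:
    """Build a MongoDB-style filter acting as a case-insensitive 'contains'.

    The field name "name" is an assumption about the document layout.
    """
    return {"name": {"$regex": re.escape(query), "$options": "i"}}


def merge_results(text_hits: dict, package_names: list, query: str,
                  boost: float = 10.0) -> list:
    """Merge text-search scores with substring matches on the package name.

    text_hits maps package name -> text score (as a text index might return).
    Names that contain the query get `boost` added, so they sort first.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    scores = dict(text_hits)
    for name in package_names:
        if pattern.search(name):
            scores[name] = scores.get(name, 0.0) + boost
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda n: (-scores[n], n))


# Hypothetical package names and text scores.
names = ["alpha.json", "alphaxjson", "json-tools"]
hits = {"json-tools": 2.0, "alpha.json": 0.5}
print(name_regex_filter("alpha.json"))
print(merge_results(hits, names, "alpha.json"))
```

Escaping matters here: without it, the `.` in the query would match any character and "alphaxjson" would be boosted as well. In the approach the post describes, the matching and scoring would be done by MongoDB itself rather than in application memory.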
On Friday, 22 October 2021 at 11:47:09 UTC, WebFreak001 wrote:
> If you, the person reading this, are motivated in pushing through a search improvement in DUB, you can check that PR for hints where to start, but you should really create a new PR instead and let MongoDB do the work or introduce some other indexed search framework.

Meilisearch has acceptable results, and attempts to fix typos as well. Seems lightweight as well.

I kind of liked the results postgres was giving me as well, but I doubt it does anything about typos. And you have to be a bit specific. And of course dub doesn't use postgres >x3

I wanted to add Elasticsearch/Opensearch into the test as well, but couldn't be bothered since it seemed redundant.

In other words, if anyone's gonna do this, a dedicated search engine would likely be the way forward?
Oct 22 2021
On Friday, 22 October 2021 at 11:47:09 UTC, WebFreak001 wrote:
>> On Friday, 22 October 2021 at 11:17:11 UTC, SealabJaster wrote:
>> [...]
> that PR is completely different and removes the internal MongoDB search, replacing it with "packageName.canFind(query)" - that is not the place you would want to extend this search into. [...]

It was mainly done as the current search is unusable as everyone knows.
Oct 22 2021
On Friday, 22 October 2021 at 12:48:19 UTC, Imperatorn wrote:
>> On Friday, 22 October 2021 at 11:47:09 UTC, WebFreak001 wrote:
>>> On Friday, 22 October 2021 at 11:17:11 UTC, SealabJaster wrote:
>>> [...]
>> that PR is completely different and removes the internal MongoDB search, replacing it with "packageName.canFind(query)" - that is not the place you would want to extend this search into. [...]
> It was mainly done as the current search is unusable as everyone knows.

I think it's ok for searching for functionality - if you start searching for package names it becomes worse, especially if they are made-up words or contain punctuation.

Your improvement is great and I would love to put it in, but the implementation is not good as it is right now:
- it gets worse results if you make a typo or use a different tense or form of a word
- it first fetches all the documents into memory and then filters on them (which will take longer and longer the more packages we have)
- the PR is mixed with unrelated deprecation fixes which should really be done separately (but should definitely be done!)
Oct 22 2021
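To illustrate the first point in the list above, here is a small sketch contrasting a plain substring search with a typo-tolerant one. It uses Python's standard `difflib` purely as a stand-in; it is not how Meilisearch, MongoDB or the PR rank results, and the package names, function names and `cutoff` value are hypothetical.

```python
import difflib


def substring_search(names, query):
    """Case-insensitive 'contains' search, similar in spirit to canFind."""
    q = query.lower()
    return [n for n in names if q in n.lower()]


def fuzzy_search(names, query, cutoff=0.7):
    """Typo-tolerant search based on difflib's similarity ratio."""
    lowered = {n.lower(): n for n in names}
    close = difflib.get_close_matches(query.lower(), list(lowered),
                                      n=5, cutoff=cutoff)
    return [lowered[c] for c in close]


# Hypothetical package names; the query contains a typo.
names = ["jsonparser", "httpclient", "mathlib"]
print(substring_search(names, "jsonparsr"))
print(fuzzy_search(names, "jsonparsr"))
```

The substring search finds nothing for the misspelled query, while the fuzzy search still returns the intended package. Note that this sketch also scans every name in memory, which is exactly the scaling concern raised in the second point; an indexed search engine avoids that.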
On Friday, 22 October 2021 at 13:43:17 UTC, WebFreak001 wrote:
>> On Friday, 22 October 2021 at 12:48:19 UTC, Imperatorn wrote:
>> [...]
> I think it's ok for searching for functionality - if you start searching for package names it becomes worse, especially if they are made up words or contain punctuation. Your improvement is great and I would love to put it in, but the implementation is not good as it is right now:
> - it's getting worse results if you typo or have different tense or form of words
> - it first fetches all the documents into memory and then filters on them (which will take longer and longer the more packages we have)
> - the PR is mixed with unrelated deprecation fixes which should really be done separately (but should definitely be done!)

Yeah, it's a proof of concept for someone to continue on. More to break the status quo.
Oct 22 2021
On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
> ```
> TOTAL      PACKAGES   PERCENTAGE
> 1000000    5          0.24%
> 100000     41         2.01%
> 10000      165        7.90%
> 1000       375        17.96%
> 100        1058       50.67%
> 10         1619       77.54%
> 1          2018       96.65%
> ```
> Hopefully others find this of interest, even if it's not very enlightening on its own.

It's interesting, I suspect that the general profile is very typical of open source projects. In terms of absolute figures I'm not sure what to make of it. Personally if I had 10 different users of my project I'd consider it a great success, so....
Oct 22 2021
22.10.2021 14:35, Abdulhaq wrote:
>> On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
>> ```
>> TOTAL      PACKAGES   PERCENTAGE
>> 1000000    5          0.24%
>> 100000     41         2.01%
>> 10000      165        7.90%
>> 1000       375        17.96%
>> 100        1058       50.67%
>> 10         1619       77.54%
>> 1          2018       96.65%
>> ```
>> Hopefully others find this of interest, even if it's not very enlightening on its own.
> It's interesting, I suspect that the general profile is very typical of open source projects. In terms of absolute figures I'm not sure what to make of it. Personally if I had 10 different users of my project I'd consider it a great success, so....

Me too. One package can have 1000 downloads/month but be a mechanical dependency performing trivial work. Another package can have been downloaded only 100 times in total but be directly and actively used (with PRs/issues) by different people/teams and perform complex tasks. I believe these figures don't in fact mean anything.
Oct 22 2021
On Friday, 22 October 2021 at 11:54:44 UTC, drug wrote:
> Me too. One package can have 1000 downloads/month but be a mechanical dependency performing trivial work.

Yeah, most of the top ones are just dependencies of each other.

Also note that "downloads" here is just any time a thing asks for the download url. CI things for certain packages account for 90%+ of these requests.

I kinda wanna track the dependency thing, gonna make a local database....
Oct 22 2021
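One way the "dependency thing" could be tracked is by counting how many packages directly depend on each package, which gives a usage signal that is independent of raw download requests. This is only a sketch under assumptions: the `deps` mapping, its package names and the function `dependents_count` are hypothetical, and it says nothing about how the registry stores dependency data.

```python
from collections import Counter


def dependents_count(deps: dict) -> Counter:
    """Count direct dependents for every package.

    deps maps a package name to the list of packages it depends on.
    """
    counts = Counter()
    for package, requires in deps.items():
        # set() guards against a dependency being listed twice.
        for dep in set(requires):
            counts[dep] += 1
    return counts


# Hypothetical dependency data.
deps = {
    "app-a": ["web", "log"],
    "app-b": ["web"],
    "web": ["log", "net"],
    "log": [],
    "net": [],
}

counts = dependents_count(deps)
for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name}: {n} direct dependent(s)")
```

A package with many dependents but few direct users would show up high in the download figures for the reason given above, so comparing the two numbers could help separate "mechanical" dependencies from actively chosen ones.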