digitalmars.D - language feature usage statistics
- aliak (40/40) Oct 18 2019 So this is something I've been wondering about for a while. And I
- Les De Ridder (6/11) Oct 18 2019 I vaguely remember there being a tool that generated such
- Adam D. Ruppe (7/7) Oct 18 2019 I've actually considered doing this with dpldocs.info before. It
- Paolo Invernizzi (3/8) Oct 18 2019 Hear, hear!
- Dennis (16/19) Oct 18 2019 Totally. I've started toying with this by cloning all packages on
- aliak (4/21) Oct 20 2019 That is great! Which APIs did you use to get all d project links?
- Dennis (28/32) Oct 20 2019 There might be an API, but I simply parsed the html pages.
- matheus (4/8) Oct 20 2019 I pretty sure Visual Studio does this:
- aliak (4/14) Oct 20 2019 Aye, VSCode does this (i mentioned it actually in my original
So this is something I've been wondering about for a while. I don't believe I've seen any compiler do this, but have people ever thought about putting telemetry into compilers? How do people feel about having feature usage stats from dmd?

I can understand that networking from a compiler just sounds bad, but there are other ways around it, e.g. writing a file instead and asking the dev to email it, asking for permission before turning it on and sending anything, or only doing it in debug mode. I dunno, just spitballing here.

But having actual usage statistics would take away so many assumptions people have about how features are used, how often they're used, which features are not used, etc. (Of course, if there are no statistics on a feature, that doesn't mean it's never used.) Data like this is very actionable, and it's how any (probably non-enterprise) product is built these days (even VS Code, for example, has an option to send usage stats).

Crash reporting is another thing. When the compiler crashes, a report could be sent somewhere (again, with user permission).

Things that could be answered:

* which features are not used and can be cut
* which features are used the most and should be enhanced, fixed, or polished
* which combinations of features are used together => can they be unified?

The next time someone says they don't think lazy is useful, we can point to actual data.

And then, for the features that are hardly used, we can start asking why they are not used. If we know why, we can avoid future features that rest on the same assumptions that led to the unused ones. Figuring out why features are unused, or hardly used, can also help us make them more usable.

These kinds of stats could also be collected on any symbols loaded from std, for example, and then we'd also get a feel for which functions and modules from Phobos are actually used.

Anyway, I'm not sure about others, but if it made D a better language than the competition, I'd gladly trust dmd to send stats to a place the D Language Foundation controls.

Cheers
- ali
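One way the offline, opt-in variant could look, as a rough sketch in D: nothing is counted until permission is granted, and the output is a plain text file the developer can read before emailing it. `FeatureStats`, `record`, `report` and `save` are made-up names for illustration; dmd has no such hook, and where the front end would call `record` is left open.

```d
import std.algorithm : sort;
import std.conv : to;
import std.file : write;

// Hypothetical opt-in usage recorder (not part of dmd): nothing is collected
// unless the user enabled it, and nothing is sent over the network.
struct FeatureStats
{
    bool enabled;          // set only after the user grants permission
    size_t[string] counts; // feature name -> number of times it was seen

    // Would be called by the compiler whenever it encounters a feature
    void record(string feature)
    {
        if (!enabled)
            return;
        counts[feature]++;
    }

    // One "feature count" line per feature, most used first, ties by name
    string report()
    {
        auto c = counts;
        string[] names = c.keys;
        names.sort!((a, b) => c[a] > c[b] || (c[a] == c[b] && a < b));
        string result;
        foreach (name; names)
            result ~= name ~ " " ~ c[name].to!string ~ "\n";
        return result;
    }

    // Write the report to a local file the developer can inspect and email
    void save(string path)
    {
        write(path, report());
    }
}
```

Keeping it file-based means the user sees exactly what would be shared, which addresses the "networking from a compiler" concern directly.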
Oct 18 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> So this is something I've been wondering about for a while. And I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers? How do people feel about having feature usage stats from dmd? [...]

I vaguely remember there being a tool that generated such statistics from the source code of packages registered on code.dlang.org, but I might be mistaken.
Oct 18 2019
I've actually considered doing this with dpldocs.info before. It would only hit public code, but... I have copies of basically the whole dub repository, plus code that already custom-parses it, so I could possibly pull info like this when it does its updates. Though my parser doesn't always keep up with new features (I often skip function bodies, since they aren't super important for documentation purposes), it still mostly works.
Oct 18 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> So this is something I've been wondering about for a while. And I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers? How do people feel about having feature usage stats from dmd? [...]

Hear, hear! +1
Oct 18 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> But, having actual usage statistics will take away so many assumptions people have about how features are used, how often they're used, which features are not used, etc

Totally. I've started toying with this by cloning all packages on Dub and running libdparse over them. It turns out a shallow clone takes only ~4 GB in total, and a deep clone ~7 GB, I believe. I've already used it a few times to back up my arguments.

I made this: https://gist.github.com/dkorpel/10cc13d0740c50a8aab30588f392950f
For this: https://github.com/dlang/DIPs/blob/9ca12cc89dadc10f2abfb8a98bf4d52ed8679c2a/DIPs/DIP1NNN-DK.md

I made this: https://gist.github.com/dkorpel/df2c2f567588bb8ee59e293146e52723
For this: https://github.com/dlang/dmd/pull/10236

These were bodged together, but I plan to make something more general and polished once I allocate some time for it. Building telemetry options into DMD is something I don't plan to do, but if someone else champions that, I'd be in favor!
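The scripts linked above use libdparse, which actually parses the code. As a much cruder stand-in that needs nothing beyond Phobos, something like the following sketch counts whole-word occurrences of a keyword across every `.d` file in a directory of cloned packages. The name `countKeyword` is made up, and unlike a real parser it also counts matches inside comments and string literals, so it will tend to over-count.

```d
import std.file : dirEntries, readText, SpanMode;
import std.range : walkLength;
import std.regex : matchAll, regex;
import std.utf : UTFException;

// Crude stand-in for a libdparse pass: counts whole-word occurrences of
// `keyword` in every .d file under `root`. The keyword is used as a regex
// as-is, and matches in comments and string literals are counted too.
size_t countKeyword(string root, string keyword)
{
    auto re = regex(`\b` ~ keyword ~ `\b`);
    size_t total;
    foreach (entry; dirEntries(root, "*.d", SpanMode.depth))
    {
        if (!entry.isFile)
            continue;
        try
        {
            total += readText(entry.name).matchAll(re).walkLength;
        }
        catch (UTFException)
        {
            // skip files that are not valid UTF-8
        }
    }
    return total;
}
```

For answering questions like "is lazy used at all?", a rough count like this is often enough to decide whether a proper libdparse-based scan is worth writing.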
Oct 18 2019
On Friday, 18 October 2019 at 20:53:50 UTC, Dennis wrote:
> On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
>> [...]
> Totally. I've started toying with this by cloning all packages on Dub and running libdparse over it. [...]

That is great! Which APIs did you use to get all D project links? Does dub provide something? And I'm curious, were you rate-limited by GitHub (I'm assuming this was the work of a for loop)?
Oct 20 2019
On Sunday, 20 October 2019 at 21:29:24 UTC, aliak wrote:
> Which APIs did you use to get all d project links? Does dub provide something?

There might be an API, but I simply parsed the HTML pages. First I get the identifiers of all packages:

```d
import std.net.curl : get;
import std.regex : matchAll, regex;

void main()
{
    // Fetch the package list (limit=2000 to get everything on one page)
    string page = get("http://code.dlang.org/?sort=added&category=&skip=0&limit=2000").idup;
    string[] result;
    // Package links look like "packages/<name>"; collect the <name> part
    foreach (m; page.matchAll(regex(`packages/([a-zA-Z0-9_-]+)`)))
        result ~= m[1];
}
```

Then I parse the package pages for the repository link with htmld:

```d
import html; // http://code.dlang.org/packages/htmld
import std.conv : text;
import std.net.curl : get;
import std.regex : matchFirst;

// Return the repository URL shown on a package's page, or null if there is none
string getRepo(string packageName)
{
    string page = get("http://code.dlang.org/packages/" ~ packageName).idup;
    auto doc = createDocument(page);
    if (auto p = doc.querySelector("#repository"))
    {
        if (auto m = p.html.matchFirst(`href="([^"]+)`))
            return m[1].text;
    }
    return null; // no repository link found
}
```

> And curious, were you rate limited by github (i'm assuming this was the work of a for loop?).

I wouldn't have been surprised if I had gotten a timeout for cloning 1600 repositories in succession, but I didn't. (I suppose the same happens when installing your average NPM package, lol.)
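Cloning the repositories one after another could be sketched like this, with the URLs coming from something like `getRepo` above. The helper names `cloneArgs` and `cloneAll`, the `owner_repo` directory layout, and the use of `git clone --depth 1` (a shallow clone, matching the ~4 GB figure mentioned earlier) are assumptions, not necessarily what was actually run.

```d
import std.path : baseName, buildPath, dirName;
import std.process : execute;
import std.stdio : stderr;

// Arguments for a shallow clone of e.g. "https://github.com/owner/repo"
// into destRoot/owner_repo, so same-named repos of different owners don't clash
string[] cloneArgs(string repoUrl, string destRoot)
{
    auto dir = baseName(dirName(repoUrl)) ~ "_" ~ baseName(repoUrl);
    return ["git", "clone", "--depth", "1", repoUrl, buildPath(destRoot, dir)];
}

// Clone every repository in succession, logging failures instead of stopping
void cloneAll(string[] repoUrls, string destRoot)
{
    foreach (url; repoUrls)
    {
        auto r = execute(cloneArgs(url, destRoot));
        if (r.status != 0)
            stderr.writeln("clone failed: ", url, "\n", r.output);
    }
}
```

Running the clones strictly one at a time is the simplest choice, and at that pace no throttling was hit for ~1600 repositories.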
Oct 20 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> ... I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers? ...

I'm pretty sure Visual Studio does this: https://code.visualstudio.com/docs/getstarted/telemetry

Matheus.
Oct 20 2019
On Sunday, 20 October 2019 at 22:08:46 UTC, matheus wrote:
> On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
>> [...]
> I pretty sure Visual Studio does this: https://code.visualstudio.com/docs/getstarted/telemetry
> Matheus.

Aye, VSCode does this (I actually mentioned it in my original post):

On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> (probably non-enterprise) product is built these days (even vscode for example has an option to send usage stats).
Oct 20 2019