
digitalmars.D - Is this project possible?

reply Benji Smith <dlanguage benjismith.net> writes:
I'm currently working on a desktop analytics project, kind of like 
Google Analytics or Omniture, but for desktop software rather than for 
web apps (if you're interested, you can read more here: 
http://benjismith.net/index.php/2008/06/02/business-intelligence-f
r-desktop-software/ 
)

I've just finished writing the server, and now I need to write an 
embeddable client library that anyone can include in their own projects. 
The library will create its own thread and make periodic HTTP requests 
to the server, reporting various environment variables (client OS, CPU, 
memory, etc) as well as certain events (install, uninstall, session 
start & stop, etc). When the application terminates, this library's 
Thread will need to perform a few final cleanup actions (either invoking 
a remote HTTP method to report the end of the session, or saving the 
session data in the local filesystem, to be reported later).

The client library needs to expose a C interface, so that it can be 
embedded into any application (with thin wrappers for Java, .Net, 
python, etc), and it'll need to be targeted to Windows and Linux (and, 
eventually, to Mac OSX).

Ideally, I'd like to write this client library in D, but there seem to 
be some blocking issues. For example, I've read that the garbage 
collectors for D and Java conflict with one another (in that they listen 
for the same OS signals) and that any native JNI code developed with D 
needs to avoid using the garbage collector. But is it possible to create 
Threads and HTTP connections without using the GC? (Incidentally, I'd 
planned on using Tango with D 1.x, if that makes any difference.)

Does the same conflict exist with the .Net GC? Does it make any 
difference if I plan on using the Tango GC rather than the one in Phobos?

What would you guys recommend? Is it possible to develop this kind of 
library in D, or am I going to need to use C instead?

Thanks!

--benji
Aug 05 2008
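A rough sketch of the in-process reporting thread described in the post above, written in C with POSIX threads rather than D. Everything here is an assumption made for illustration: the `Reporter` type, the `reporter_start`/`reporter_stop` names, and the counters that stand in for real HTTP requests. The thread sleeps on a condition variable between reports so that shutdown can wake it immediately and let it run its final cleanup step before the host application exits. A Windows build would need the Win32 thread API or a portability layer instead.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names throughout; the post does not define an API. */
typedef struct {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    bool            stopping;      /* set by reporter_stop() */
    int             interval_sec;  /* assumed seconds between reports */
    int             reports_sent;  /* stand-in for periodic HTTP requests */
    bool            final_sent;    /* stand-in for the end-of-session report */
} Reporter;

static void *reporter_main(void *arg) {
    Reporter *r = arg;
    pthread_mutex_lock(&r->lock);
    while (!r->stopping) {
        /* Absolute deadline for the next periodic report. */
        struct timespec deadline = { .tv_sec = time(NULL) + r->interval_sec,
                                     .tv_nsec = 0 };
        int rc = 0;
        /* Sleep until the deadline passes or reporter_stop() signals us;
           the loop also absorbs spurious wakeups. */
        while (!r->stopping && rc == 0)
            rc = pthread_cond_timedwait(&r->wake, &r->lock, &deadline);
        if (!r->stopping && rc == ETIMEDOUT)
            r->reports_sent++;     /* a real library would send a report here */
    }
    r->final_sent = true;          /* final cleanup: report or save the session */
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Start the background thread. Returns 0 on success. */
int reporter_start(Reporter *r, int interval_sec) {
    r->stopping = false;
    r->interval_sec = interval_sec;
    r->reports_sent = 0;
    r->final_sent = false;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->wake, NULL);
    return pthread_create(&r->thread, NULL, reporter_main, r);
}

/* Ask the thread to finish, then wait for its final cleanup to complete. */
void reporter_stop(Reporter *r) {
    pthread_mutex_lock(&r->lock);
    r->stopping = true;
    pthread_cond_signal(&r->wake);
    pthread_mutex_unlock(&r->lock);
    pthread_join(r->thread, NULL);
    pthread_mutex_destroy(&r->lock);
    pthread_cond_destroy(&r->wake);
}
```

A host application would call `reporter_start` once at launch and `reporter_stop` during its own shutdown. Because the stop flag is checked under the lock, a stop request that arrives before the thread first sleeps is not lost.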
next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
 I'm currently working on a desktop analytics project, kind of like 
 Google Analytics or Omniture, but for desktop software rather than for 
 web apps (if you're interested, you can read more here: 
 http://benjismith.net/index.php/2008/06/02/business-intelligence-f
r-desktop-software/ 
 )
Awesome! Maximum respect to you.
 Ideally, I'd like to write this client library in D, but there seem to 
 be some blocking issues. For example, I've read that the garbage 
 collectors for D and Java conflict with one another (in that they listen 
 for the same OS signals) and that any native JNI code developed with D 
 needs to avoid using the garbage collector.
Only in Linux. This, of course, can be worked around using IPC, but it's a bit of a hassle & takes more system resources.
 But is it possible to create 
 Threads and HTTP connections without using the GC? (Incidentally, I'd 
 planned on using Tango with D 1.x, if that makes any difference.)
Yes... as long as you're not doing too much allocation & never need to collect, you don't need a GC. Tango is a lot better than Phobos at this (look at Mango; it does very little allocation [= none] once the server has been set up).
 Does the same conflict exist with the .Net GC? Does it make any 
 difference if I plan on using the Tango GC rather than the one in Phobos?
Not the .NET Windows one, since the problem is only on Linux. Not sure about Mono.
Aug 05 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Robert Fraser wrote:
 Ideally, I'd like to write this client library in D, but there seem to 
 be some blocking issues. For example, I've read that the garbage 
 collectors for D and Java conflict with one another (in that they listen 
 for the same OS signals) and that any native JNI code developed with D 
 needs to avoid using the garbage collector.
Only in Linux. This, of course, can be worked around using IPC, but it's a bit of a hassle & takes more system resources.
That's great news! Since Windows will be my dominant platform, that makes me feel much better. But since I want to also build on Linux & OSX, I'll have to be careful. Though I'm glad to hear that the problem scenario is isolated to Java/Linux. I think I can work around that... I don't think IPC is an option though, because I want my library to run in the same process as the host application (just within its own thread). When an application developer embeds the library in another product, it will make HTTP requests, and if a firewall reports those requests to the user, I think it'd look fishy if the requests come from a separate process.
 But is it possible to create 
 Threads and HTTP connections without using the GC? (Incidentally, I'd 
 planned on using Tango with D 1.x, if that makes any difference.)
Yes... as long as you're not doing too much allocation & never need to collect, you don't need a GC. Tango is a lot better than Phobos at this (look at Mango; it does very little allocation [= none] once the server has been set up).
Cool. I'll take a look at the Mango project for inspiration. It's possible that the Thread will be long-lived, since some desktop applications run for days or weeks without restarting, and my library will make occasional status reports to the server throughout the lifetime of the application. So, I'll have to come up with clever strategies for avoiding allocation.
 Does the same conflict exist with the .Net GC? Does it make any 
 difference if I plan on using the Tango GC rather than the one in Phobos?
Not the .NET Windows one, since the problem is only on Linux. Not sure about Mono.
Thanks for the quick reply, and for all the helpful info!

Also: In the process of building my product, I'll probably develop a few handy bits of code for auto-generating wrappers (e.g., JNI, .NET, etc) from a D codebase. If other people are interested in those kinds of wrappers, I'd be happy to share them with the community.

--benji
Aug 05 2008
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Benji Smith Wrote:

 But is it possible to create 
 Threads and HTTP connections without using the GC? (Incidentally, I'd 
 planned on using Tango with D 1.x, if that makes any difference.)
Yes... as long as you're not doing too much allocation & never need to collect, you don't need a GC. Tango is a lot better than Phobos at this (look at Mango; it does very little allocation [= none] once the server has been set up).
Cool. I'll take a look at the Mango project for inspiration. It's possible that the Thread will be long-lived, since some desktop applications run for days or weeks without restarting, and my library will make occasional status reports to the server throughout the lifetime of the application. So, I'll have to come up with clever strategies for avoiding allocation.
Just to be clear, you don't need to avoid allocation -- just allocation using the D GC if you don't manually "delete" the memory later. Basically, just make sure all your memory management is manual, and try to avoid implicit allocations like AA use, array concatenation, etc. As long as you delete anything you new and free anything you malloc, you'll be fine.
Aug 05 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Robert Fraser wrote:
 Just to be clear, you don't need to avoid allocation -- just allocation using 
 the D GC if you don't manually "delete" the memory later. Basically, just 
 make sure all your memory management is manual, and try to avoid implicit 
 allocations like AA use, array concatenation, etc. As long as you delete 
 anything you new and free anything you malloc, you'll be fine.
I should have clarified... Of course, I'll allocate memory, but I'll do it all up-front, allocating a pool of objects, and then I'll use object factories to draw from that pool, recycling objects myself when I'm finished with them rather than letting the GC reclaim the memory.

It's not a programming paradigm I'm used to, since I typically program in garbage-collected environments, but I'll get used to it. Once I've written the core data structures, it shouldn't be too different.

--benji
Aug 05 2008
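The pool-and-recycle approach described above can be sketched roughly as follows. This is a minimal illustration in C, not code from Mango or Tango: the `Event` record, its fields, the `POOL_SIZE` capacity, and the `pool_*` function names are all hypothetical. All storage is reserved once, and unused slots are chained into a free list so that acquiring and releasing an object never touches a general-purpose allocator or a garbage collector.

```c
#include <stddef.h>

/* Hypothetical record type; the fields are placeholders for illustration. */
typedef struct Event {
    int  kind;
    long timestamp;
    struct Event *next_free;   /* links unused slots into a free list */
} Event;

#define POOL_SIZE 64           /* assumed capacity, chosen arbitrarily */

typedef struct {
    Event  slots[POOL_SIZE];   /* all storage reserved up front */
    Event *free_list;          /* head of the list of unused slots */
} EventPool;

/* Link every slot into the free list. Call once at startup. */
void pool_init(EventPool *p) {
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take a slot from the pool; returns NULL when the pool is exhausted. */
Event *pool_acquire(EventPool *p) {
    Event *e = p->free_list;
    if (e != NULL) {
        p->free_list = e->next_free;
        e->next_free = NULL;
    }
    return e;
}

/* Return a slot to the pool so it can be reused. */
void pool_release(EventPool *p, Event *e) {
    e->next_free = p->free_list;
    p->free_list = e;
}
```

This sketch is not thread-safe. If only the library's own reporting thread touches the pool, that is enough; otherwise the acquire and release calls would need a lock around them. The caller also has to decide what to do when `pool_acquire` returns NULL, for example dropping the event.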
parent Lars Ivar Igesund <larsivar igesund.net> writes:
Benji Smith wrote:

 Robert Fraser wrote:
 Just to be clear, you don't need to avoid allocation -- just allocation
 using the D GC if you don't manually "delete" the memory later.
 Basically, just make sure all your memory management is manual, and try
 to avoid implicit allocations like AA use, array concatenation, etc. As
 long as you delete anything you new and free anything you malloc, you'll
 be fine.
I should have clarified... Of course, I'll allocate memory, but I'll do it all up-front, allocating a pool of objects, and then I'll use object factories to draw from that pool, recycling objects myself when I'm finished with them rather than letting the GC reclaim the memory.
I should note that this is typically how the Mango servers operate (and the stuff in tango.net.cluster) - they should never allocate after startup. They also use the stack where possible, making the total memory usage extremely low.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Aug 06 2008
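As a small illustration of the "use the stack where possible" point, a status report can be formatted into a buffer that the caller owns (typically a local array) instead of into a freshly allocated string. The sketch below is in C; the `format_report` name, the field names, and the query-string layout are assumptions, and it does no URL escaping.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a status report into a caller-supplied buffer.
   Returns the number of characters written (excluding the terminator),
   or -1 if the buffer is too small or formatting fails. */
int format_report(char *buf, size_t cap,
                  const char *os, int cpu_count, long mem_mb) {
    int n = snprintf(buf, cap, "os=%s&cpus=%d&mem_mb=%ld",
                     os, cpu_count, mem_mb);
    if (n < 0 || (size_t)n >= cap)
        return -1;             /* output was truncated or failed */
    return n;
}
```

Called with a local `char buf[64]`, this keeps the whole report on the stack, so nothing needs to be freed or collected afterwards.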
prev sibling next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 05 Aug 2008 20:59:11 +0300, Benji Smith <dlanguage benjismith.net>  
wrote:

 The client library needs to expose a C interface, so that it can be  
 embedded into any application (with thin wrappers for Java, .Net,  
 python, etc), and it'll need to be targeted to Windows and Linux (and,  
 eventually, to Mac OSX).
Have you considered placing the bulk of the code in an external process, and writing a simple C library to launch and communicate with it? Although this adds bulk, it does have several advantages - it takes away the GC problems, and also allows your framework to finalize successfully in the event of a crash or unexpected termination of the host application.

-- 
Best regards,
 Vladimir mailto:thecybershadow gmail.com
Aug 05 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Vladimir Panteleev wrote:
 On Tue, 05 Aug 2008 20:59:11 +0300, Benji Smith 
 <dlanguage benjismith.net> wrote:
 
 The client library needs to expose a C interface, so that it can be 
 embedded into any application (with thin wrappers for Java, .Net, 
 python, etc), and it'll need to be targeted to Windows and Linux 
 (and, eventually, to Mac OSX).
Have you considered placing the bulk of the code in an external process, and writing a simple C library to launch and communicate with it? Although this adds bulk, it does have several advantages - it takes away the GC problems, and also allows your framework to finalize successfully in the event of a crash or unexpected termination of the host application.
I've thought about it. But consider yourself the application consumer. You've just purchased a new piece of software, and the first time you launch it, your firewall notifies you that "StatisticalCollectionAgent.exe is requesting access to the internet".

You're not happy. Not because you necessarily mind the reporting of some anonymous stats, especially if your software vendor shows you a disclaimer and lets you opt out. But having a 3rd party process do the reporting looks fishy, under any circumstances. It's not something you yourself installed, or have even ever heard of. Is it malware? Is it a virus? Much better if "MyFeedReader.exe" communicates directly with "myfeedreader.com".

That's why I'm pretty adamant that the code must run in-process. It'll make the code a little trickier, but I think the improved user experience will make it worthwhile in the long run.

--benji
Aug 05 2008
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Wed, 06 Aug 2008 09:35:14 +0300, Benji Smith <dlanguage benjismith.net>  
wrote:

 But consider yourself the application consumer. You've just purchased a  
 new piece of software, and the first time you launch it, your firewall  
 notifies you that "StatisticalCollectionAgent.exe is requesting access  
 to the internet".
Well, IMHO before the first Internet connection your application should ask the consumer if they wish to participate in anonymous statistical data collection which will help improve the software in the future, etc. At least, that's how the big guys (Microsoft etc.) do it. I don't think anyone would be unhappy if they got a firewall warning after approving that :)

-- 
Best regards,
 Vladimir mailto:thecybershadow gmail.com
Aug 06 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Vladimir Panteleev wrote:
 On Wed, 06 Aug 2008 09:35:14 +0300, Benji Smith 
 <dlanguage benjismith.net> wrote:
 
 But consider yourself the application consumer. You've just purchased 
 a new piece of software, and the first time you launch it, your 
 firewall notifies you that "StatisticalCollectionAgent.exe is 
 requesting access to the internet".
Well, IMHO before the first Internet connection your application should ask the consumer if they wish to participate in anonymous statistical data collection which will help improve the software in the future, etc. At least, that's how the big guys (Microsoft etc.) do it. I don't think anyone would be unhappy if they got a firewall warning after approving that :)
I agree 100%. But since I'm just providing the technology for the collection, I can't enforce anyone asking the user for permission. And I think the user experience is at least somewhat better if the process making the HTTP request is the same process that's being monitored.

It's no big deal, really. The in-process solution, accounting for Java/Linux, will be slightly trickier to code, but it's not impossible, so I think it'll be okay.

--benji
Aug 06 2008
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Benji Smith wrote:
 Does the same conflict exist with the .Net GC? Does it make any 
 difference if I plan on using the Tango GC rather than the one in Phobos?
The Mono GC is a slightly modified version of the Hans Boehm collector, and that conflicts with the D GC.
Aug 09 2008