digitalmars.D.learn - Tips on TCP socket to postgresql middleware
- Chris Piker (21/21) Feb 19 2022 Hi D
- eugene (23/24) Feb 20 2022 Most people will probably say this is crazy,
- Chris Piker (11/23) Feb 20 2022 Very interesting. I need to stand-up this program and two others
- eugene (16/29) Feb 20 2022 I thougt about implementing my engine using fibers but...
- Chris Piker (7/10) Feb 20 2022 The code is terse and clean, thanks for sharing :) I'm adverse
- eugene (7/17) Feb 20 2022 Nice to hear, thnx! Actually I wrote D variant just to demonstrate
- Chris Piker (7/12) Feb 22 2022 If you want to add this file (or similar) to your sources,
- =?UTF-8?Q?Ali_=c3=87ehreli?= (18/20) Feb 20 2022 I use the exact scenario that you describe: Multiple threads process
- Chris Piker (7/11) Feb 20 2022 Message sizes and rates are relatively well know so it will be
- eugene (20/23) Feb 20 2022 If I get it right you want to restore connection
- Chris Piker (4/8) Feb 20 2022 Ah, a very handy tip. It would be convoluted to multiplex
- eugene (5/13) Feb 20 2022 I am remembering psql client behavior - it sees notifications
- eugene (50/51) Feb 24 2022 I've remembered one not so obvious feature of TCP sockets
Hi D I'm about to start a small program to whose job is: 1. Connect to a server over a TCP socket 2. Read a packetized real-time data stream 3. Update/insert to a postgresql database as data arrive. In general it should buffer data in RAM to avoid exerting back pressure on the input socket and to allow for dropped connections to the PG database. Since the data stream is at most 1.5 megabits/sec (not bytes) I can buffer for quite some time before running out of space. So far, I've never written a multi-threaded D program where one thread writes a FIFO and the other reads it so I'm reading the last few chapters of Ali Cehreli's book as background. On top of that preparation, I'm looking for: * general tips on which libraries to examine * gotchas you've run into in your multi-threaded (or just concurrent) programs, * other resources to consult etc. Thanks for any advice you want to share. Best,
Feb 19 2022
On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:* general tips on which libraries to examineMost people will probably say this is crazy, but as to PG, one can do without libraries. I am doing so during years (in C, not D) and did not expierienced extremely complex troubles. I mean I do not use libpq - instead I implement some subset of the protocol, which is needed for particular program. What I do not like in all these libs for working with widely used services (postgres, redis etc) is the fact that they all hide inside them i/o stuff, including TCP-connect. Why have connect() in each library? It is universal thing, as well as read() and write(). If I want several connection to DBMS in a program, libraries like libpq compel me to use multithreading. But what if I want to do many-many-many things concurrently in a single thread? Usually I design more or less complex (network) programs using event-driven paradigm (reactor pattern) plus state machines. In other words programs designed this way are, so to say, hierarchical team of state machines, interacting with each other as well as with outer world (signals, timers, events from sockets etc)
Feb 20 2022
On Sunday, 20 February 2022 at 15:20:17 UTC, eugene wrote:Most people will probably say this is crazy, but as to PG, one can do without libraries. I am doing so during years (in C, not D) and did not expierienced extremely complex troubles. I mean I do not use libpq - instead I implement some subset of the protocol, which is needed for particular program.Very interesting. I need to stand-up this program and two others in one week, so it looks like dpq2 and message passing is the good short term solution to reduce implementation effort. But I would like to return to your idea in a couple months so that I can try a fiber based implementation instead.Usually I design more or less complex (network) programs using event-driven paradigm (reactor pattern) plus state machines. In other words programs designed this way are, so to say, hierarchical team of state machines, interacting with each other as well as with outer world (signals, timers, events from sockets etc)It sounds like you might have a rigorous way of defining and keeping track of your state machines. I could probably learn quite a bit from reading your source code, or the source for similarly implemented programs. Are there examples you would recommend?
Feb 20 2022
On Sunday, 20 February 2022 at 16:55:44 UTC, Chris Piker wrote:But I would like to return to your idea in a couple months so that I can try a fiber based implementation instead.I thougt about implementing my engine using fibers but... it seemed to me they are not very convinient because coroutines yield returns to the caller, but I want to return to a single event loop (after processing an event).Yes, here is my engine with example (echo client/server pair): - [In C (for Linux)](http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz) - [In D (for Linux & FreeBSD)](http://zed.karelia.ru/0/e/edsm-2022-02-20.tar.gz) edsm = 'event driven state machines' As to the program you are writing - I wrote a couple of dozens of programs more or less similar to what you are going to do (data acqusition) using the engine above (C) for production systems and they all serve very well.Usually I design more or less complex (network) programs using event-driven paradigm (reactor pattern) plus state machines. In other words programs designed this way are, so to say, hierarchical team of state machines, interacting with each other as well as with outer world (signals, timers, events from sockets etc)It sounds like you might have a rigorous way of defining and keeping track of your state machines. I could probably learn quite a bit from reading your source code, or the source for similarly implemented programs. Are there examples you would recommend?
Feb 20 2022
On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:Yes, here is my engine with example (echo client/server pair): - [In D (for Linux & FreeBSD)](http://zed.karelia.ru/0/e/edsm-2022-02-20.tar.gz)The code is terse and clean, thanks for sharing :) I'm adverse to reading it closely since there was no license file and don't want to accidentally violate copyright. I noticed there were no dub files in the package. Not surprised. Dub is such a restrictive tool compared to say, setup.py/.cfg in python.
Feb 20 2022
On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:Nice to hear, thnx! Actually I wrote D variant just to demonstrate the idea to my collegue who is OOP/py guy and for whom it was hard to understand C code.Yes, here is my engine with example (echo client/server pair): - [In D (for Linux & FreeBSD)]The code is terse and clean, thanks for sharing :)I'm adverse to reading it closely since there was no license file and don't want to accidentally violate copyright.:) I think WTFPL will do :)I noticed there were no dub files in the package. Not surprised. Dub is such a restrictive tool compared to say, setup.py/.cfg in python.I have very little experience in D and did not think about dub at all.
Feb 20 2022
On Monday, 21 February 2022 at 07:00:52 UTC, eugene wrote:On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:If you want to add this file (or similar) to your sources, http://www.wtfpl.net/txt/copying/, but with your name in the copyright line I'd be happy to put the D version through it's paces and credit you for the basic ideas. I would not have thought of this formalism on my own (at least not right away) and want to give credit where it's due.On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote: I'm adverse to reading it closely since there was no license file and don't want to accidentally violate copyright.:) I think WTFPL will do :)
Feb 22 2022
On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:On Monday, 21 February 2022 at 07:00:52 UTC, eugene wrote:ok http://zed.karelia.ru/0/e/edsm-2022-02-23.tar.gz added 2 files, AUTHOR & LICENSEOn Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:If you want to add this file (or similar) to your sources, http://www.wtfpl.net/txt/copying/, but with your name in the copyright line I'd be happy to put the D version through it's paces and credit you for the basic ideas. I would not have thought of this formalism on my own (at least not right away) and want to give credit where it's due.On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote: I'm adverse to reading it closely since there was no license file and don't want to accidentally violate copyright.:) I think WTFPL will do :)
Feb 23 2022
On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:credit you for the basic ideasAs you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) Switch-based implementation of SMs is probably normal for text/token driven SMs (parsers), but is not good for event/message driven SMs (i/o and alike). Remember main OOP principle, as it is understood by Alan Key? It is message exchanging, not class hierarchy. In this sense this craft is OOP twice, both classes and message exchanging. Another important point is SM decomposition & hierarchy (I mean master/slave (or customer/provider) relation here, not 'A is B' relation) - instead of having one huge SM I decompose the task into subtasks and construct various SMs. See for example that echo-server - it has 3 level hierarchy of states machines: LISTENER <-> {WORKERS} <-> {RX,TX} When I wrote the very first version of EDSM engine (more than 5 years ago, may be 7 or so), I... I was just stuck - how to design machines themselves?!? Well, the engine is simple, but what next? How do I choose states? After a while I came to a strong rule - as long as I want to send/recv some "atomic" portion of data, it is a state (called stage in D version)! Then as a result of elimination of code duplcation I "invented" kinda universal RX and TX machine and so on... In general I've found SM very nice way of designing (networking) software. Enjoy! :)
Feb 23 2022
On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:I remember you once mentioned a book that you used to learn this particular software design technique Could you please name it again? Thank you in advance[...]As you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) [...]
Feb 23 2022
On Thursday, 24 February 2022 at 06:30:51 UTC, Tejas wrote:On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:Wagner F. et al Modeling Software with Finite State Machines: A Practical Approach I've adopted some ideas from this book to POSIX/Linux API. see also http://www.stateworks.com/On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:I remember you once mentioned a book that you used to learn this particular software design technique Could you please name it again? Thank you in advance[...]As you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) [...]
Feb 23 2022
On Thursday, 24 February 2022 at 06:54:07 UTC, eugene wrote:Wagner F. et al Modeling Software with Finite State Machines: A Practical Approach I've adopted some ideas from this book to POSIX/Linux API.Ah! I also have EDSM for bare metal (AVR8 to be exact) There is [some description](http://zed.karelia.ru/go.to/for.all/software/avr8-edsm) (in Russian), but you can look to the [C-source](http://zed.karelia.ru/mmedia/bin/avr8-edsm-r0.tar.gz)
Feb 23 2022
On Thursday, 24 February 2022 at 06:54:07 UTC, eugene wrote:On Thursday, 24 February 2022 at 06:30:51 UTC, Tejas wrote:Thank you so much!On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:Wagner F. et al Modeling Software with Finite State Machines: A Practical Approach I've adopted some ideas from this book to POSIX/Linux API. see also http://www.stateworks.com/On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:I remember you once mentioned a book that you used to learn this particular software design technique Could you please name it again? Thank you in advance[...]As you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) [...]
Feb 24 2022
On Thursday, 24 February 2022 at 11:27:56 UTC, Tejas wrote:Also there is [very nice discussion](https://embeddedgurus.com/state-space/2011/06/protothreads-versus-state-machines/) I'll just cite: " Pure event-driven programming (without blocking) naturally partitions the code into small chunks that handle events. State machines partition the code even finer, because you have small chunks that are called only for a specific state-event combination. " This is exactly about what I've mentioned: event-driven + state-machines = fine-graned-concurrencyWagner F. et al Modeling Software with Finite State Machines: A Practical Approach I've adopted some ideas from this book to POSIX/Linux API. see also http://www.stateworks.com/Thank you so much!
Feb 24 2022
On 2/19/22 12:13, Chris Piker wrote:* gotchas you've run into in your multi-threaded (or just concurrent) programs,I use the exact scenario that you describe: Multiple threads process data and pass the results to a "writer" thread that persist it in a file. The main gotcha is your thread disappearing without a trace. The most common reason is it throws an exception and dies. Another one is to set the message box sizes to throttle. Otherwise, producers could produce more than the available memory before the consumer could consume it. Unlike the main thread, there is nobody to catch an report this "uncaught" exception. https://dlang.org/phobos/std_concurrency.html#.setMaxMailboxSize You need to experiment with different number of threads, the buffer size that you mention, different lengths of message boxes, etc. For example, I could not gain more benefit in my program beyond 3 threads (but still set the number to 4 :p Humans are crazy.). In case you haven't seen yet, the recipe for std.concurrency that works for me is summarized here: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735s Ali
Feb 20 2022
On Sunday, 20 February 2022 at 17:58:41 UTC, Ali Çehreli wrote:Another one is to set the message box sizes to throttle.Message sizes and rates are relatively well know so it will be easy to pick a throttle point that's unlikely to backup the source yet provide for some quick DB maintenance in the middle of a testing session.In case you haven't seen yet, the recipe for std.concurrency that works for me is summarized here: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735sThanks! I like your simple exception wrapping pattern, will use that.
Feb 20 2022
On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:In general it should buffer data in RAM to avoid exerting back pressure on the input socket and to allow for dropped connections to the PG database.If I get it right you want to restore connection if it was closed by server for some reason. I use special SM for that purpose, see [this picture](http://zed.karelia.ru/0/e/db-link.jpg) In each state where this SM has to send/recv data, it takes sending/receiving SM from a pool and commands them to perform the task. Upon reaching IDLE state this machine send some messsge to the user (another SM) of the connection and seats in this state until the user detects connection lost (in which case it sends M2 to DB-LINK SM). Then DB-LINK goes to WAIT state, where it starts a timer and when it expires, it goes to CONN state, where it tries to reconnect (using sending SM - when connection is ready we get POLLOUT on socket). You can have as many such connectors as you want, so you have multiple connections within single thread. I often use two connections, one for perform main task (upload some data and alike) and the second for getting notifications from PG, 'cause it very incovinient to do both in a single connection.
Feb 20 2022
On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:I often use two connections, one for perform main task (upload some data and alike) and the second for getting notifications from PG, 'cause it very incovinient to do both in a single connection.Ah, a very handy tip. It would be convoluted to multiplex notifications on the data connection.
Feb 20 2022
On Monday, 21 February 2022 at 04:48:56 UTC, Chris Piker wrote:On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:I am remembering psql client behavior - it sees notifications only after some request. It is really inconvinient to perform regular tasks and be ready to peek up notifications at any moment in one connection.I often use two connections, one for perform main task (upload some data and alike) and the second for getting notifications from PG, 'cause it very incovinient to do both in a single connection.Ah, a very handy tip. It would be convoluted to multiplex notifications on the data connection.
Feb 20 2022
On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:3. Update/insert to a postgresql database as data arrive.I've remembered one not so obvious feature of TCP sockets behavour. If the connection is closed on the server side (i.e. on the client side the socket is in CLOSE_WAIT state), **write() will not fail**, and data will go to no nowhere. For this reason I have this (commented) code: ```d bool connOk() { /* tcp_info tcpi; // linux specific, see /usr/include/linux/tcp.h socklen_t tcpi_len = tcpi.sizeof; getsockopt(id, SOL_TCP, TCP_INFO, &tcpi, &tcpi_len); if (tcpi.tcpi_state != TCP_ESTABLISHED) return false; */ return true; // TODO } ``` I've dig up this in one of my progs (uploader to pg db): ```c /* check conn */ int dbu_psgr_ckcn_enter(struct edsm *me) { struct mexprxtx_data *ctx = me->data; struct databuffer *ob = &ctx->obuf; struct databuffer *sb = &ctx->sbuf; int est = 0; if (ctx->sock > 0) est = csock_check_state(ctx->sock); if (!est) { /* save request to spare buffer */ __dbu_make_request_copy(sb, ob); if (ctx->sock > 0) log_inf_msg( "%s()/DBU_%.3d - server has closed connection (fd %d)\n", __func__, me->number, ctx->sock ); edsm_put_event(me, ECM1_CONN); } else { edsm_put_event(me, ECM0_SEND); } return 0; } ``` I definitely did not want to loose data and so I am checking socket state before doing an INSERT request. I do not think it is 100% reliable, but I could not invent anything else.
Feb 24 2022
On Thursday, 24 February 2022 at 08:46:35 UTC, eugene wrote:On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:https://stackoverflow.com/questions/26130010/avoiding-dataloss-in-go-when-writing-with-close-wait-socket In my case (I was working with REDIS KVS at the moment) exact scenario was as follows: * prog gets EPOLLOUT (write() won't block) * prog writes()'s data to REDIS ("successfully") * prog gets EPOLLERR|EPOLLHUP After this I see that I need to reconnect, but I've already send some data which had gone to nowhere. People sometimes recommend using usual non-blocking sockets, but it is not the case.3. Update/insert to a postgresql database as data arrive.I've remembered one not so obvious feature of TCP sockets behavour. If the connection is closed on the server side (i.e. on the client side the socket is in CLOSE_WAIT state), **write() will not fail**, and data will go to no nowhere.
Feb 24 2022
On Thursday, 24 February 2022 at 09:11:01 UTC, eugene wrote:In my case (I was working with REDIS KVS at the moment) exact scenario was as follows: * prog gets EPOLLOUT (write() won't block) * prog writes()'s data to REDIS ("successfully") * prog gets EPOLLERR|EPOLLHUP After this I see that I need to reconnect, but I've already send some data which had gone to nowhere. People sometimes recommend using usual non-blocking sockets, but it is not the case.errhh... usual **blocking** sockets, of course. however then I have to deal with multiprocess/multithread architechture, but my goal was *fine-graned* concurrency within single process.
Feb 24 2022