Hi all
Is it possible to send and receive key,value pairs at runtime? I would like to broadcast values at runtime so they are available on every node. This somehow seems essential for an implementation of Asynchronous batch iterations. What I would like is to have two functions in the RuntimeContext, broadcast(key,value) and receive(key). If such a thing doesn't exist now, how should I approach implementing it? Cheers! Sachin -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 |
Hi Sachin,
Do you know about Broadcast Variables? They allow you to transfer a DataSet to all nodes. https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#broadcast-variables Let us know if that fits your needs. Cheers, Max On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <[hidden email]> wrote: > Hi all > Is it possible to send and receive key,value pairs at runtime? I would like > to broadcast values at runtime so they are available on every node. This > somehow seems essential for an implementation of Asynchronous batch > iterations. > > What I would like is to have two functions in the RuntimeContext, > broadcast(key,value) and receive(key). > If such a thing doesn't exist now, how should I approach implementing it? > > Cheers! > Sachin > -- Sachin Goel > Computer Science, IIT Delhi > m. +91-9871457685 > |
In reply to this post by Sachin Goel
You are probably looking for a parameter server tool.
How about setting up one of these memory grids to use that? Apache Ignite, or Apache Geode, or one of those. On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <[hidden email]> wrote: > Hi all > Is it possible to send and receive key,value pairs at runtime? I would like > to broadcast values at runtime so they are available on every node. This > somehow seems essential for an implementation of Asynchronous batch > iterations. > > What I would like is to have two functions in the RuntimeContext, > broadcast(key,value) and receive(key). > If such a thing doesn't exist now, how should I approach implementing it? > > Cheers! > Sachin > -- Sachin Goel > Computer Science, IIT Delhi > m. +91-9871457685 > |
@Max, broadcast variables have to be declared before the program is
executed. I want to be able to do something whereby I can send data inside a map operation to all nodes. This would perhaps have an effect similar to the recent discussion on making Accumulators available before Job completion. [I don't know if it has been merged yet.] @Stephan, I was thinking more in the direction of adding a task at every worker which works as an Event handler to receive data from other nodes. I got the idea after going through the iterative runtime module which has a separate task for handling synchronization. I followed it up with going through the optimizer routine wherein it is added to some *auxiliary vertices *list. Can you elaborate on that? Further, is it possible to add an event handler to the runtime context itself? I wrote the boilerplate stuff for this but I haven't defined the channels where the event would be written to the network so other workers can also listen to it. [ https://github.com/sachingoel0101/flink/tree/async_iter]. I want to be able to call broadcast and receive as getRuntimeContext.broadcast and getRuntimeContext.receive, which ensures complete data transmission across nodes, even at runtime. I'm not sure how to add a separate task which handles event over the network like the IterationSynchronization task [which is the best use I could find of using event handlers]. Let me know your thoughts. Or if you can think of a better way to achieve sharing of data which is *only *evaluated at runtime. Cheers! Sachin -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Mon, Jul 20, 2015 at 3:34 PM, Stephan Ewen <[hidden email]> wrote: > You are probably looking for a parameter server tool. > > How about setting up one of these memory grids to use that? Apache Ignite, > or Apache Geode, or one of those. > > On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <[hidden email]> > wrote: > > > Hi all > > Is it possible to send and receive key,value pairs at runtime? I would > like > > to broadcast values at runtime so they are available on every node. This > > somehow seems essential for an implementation of Asynchronous batch > > iterations. > > > > What I would like is to have two functions in the RuntimeContext, > > broadcast(key,value) and receive(key). > > If such a thing doesn't exist now, how should I approach implementing it? > > > > Cheers! > > Sachin > > -- Sachin Goel > > Computer Science, IIT Delhi > > m. +91-9871457685 > > > |
I see the use case for that, but I don't think that this should be realized
through Flink's network stack. The network stack is designed for continuous streams, backpressure, and configurable throughput/latency. I think it is not the best solution for asynchronous messaging. Also, for the ML field, you will want to be able to send "fire and forget" messages, as exact and reliable updates are usually necessary for stochastic algorithms. To me, this calls for a different communication fabric. On Mon, Jul 20, 2015 at 5:22 PM, Sachin Goel <[hidden email]> wrote: > @Max, broadcast variables have to be declared before the program is > executed. I want to be able to do something whereby I can send data inside > a map operation to all nodes. This would perhaps have an effect similar to > the recent discussion on making Accumulators available before Job > completion. [I don't know if it has been merged yet.] > > @Stephan, I was thinking more in the direction of adding a task at every > worker which works as an Event handler to receive data from other nodes. I > got the idea after going through the iterative runtime module which has a > separate task for handling synchronization. I followed it up with going > through the optimizer routine wherein it is added to some *auxiliary > vertices *list. Can you elaborate on that? > > Further, is it possible to add an event handler to the runtime context > itself? I wrote the boilerplate stuff for this but I haven't defined the > channels where the event would be written to the network so other workers > can also listen to it. [ > https://github.com/sachingoel0101/flink/tree/async_iter]. I want to be > able > to call broadcast and receive as getRuntimeContext.broadcast and > getRuntimeContext.receive, which ensures complete data transmission across > nodes, even at runtime. > I'm not sure how to add a separate task which handles event over the > network like the IterationSynchronization task [which is the best use I > could find of using event handlers]. > > Let me know your thoughts. Or if you can think of a better way to achieve > sharing of data which is *only *evaluated at runtime. > > Cheers! > Sachin > > -- Sachin Goel > Computer Science, IIT Delhi > m. +91-9871457685 > > On Mon, Jul 20, 2015 at 3:34 PM, Stephan Ewen <[hidden email]> wrote: > > > You are probably looking for a parameter server tool. > > > > How about setting up one of these memory grids to use that? Apache > Ignite, > > or Apache Geode, or one of those. > > > > On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <[hidden email]> > > wrote: > > > > > Hi all > > > Is it possible to send and receive key,value pairs at runtime? I would > > like > > > to broadcast values at runtime so they are available on every node. > This > > > somehow seems essential for an implementation of Asynchronous batch > > > iterations. > > > > > > What I would like is to have two functions in the RuntimeContext, > > > broadcast(key,value) and receive(key). > > > If such a thing doesn't exist now, how should I approach implementing > it? > > > > > > Cheers! > > > Sachin > > > -- Sachin Goel > > > Computer Science, IIT Delhi > > > m. +91-9871457685 > > > > > > |
@Stephan, why would there be a problem with the reliability of updates?
Also, what would be the best way to achieve this functionality? Preferably using the current API. Cheers! Sachin -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Tue, Jul 21, 2015 at 4:17 PM, Stephan Ewen <[hidden email]> wrote: > I see the use case for that, but I don't think that this should be realized > through Flink's network stack. The network stack is designed for continuous > streams, backpressure, and configurable throughput/latency. I think it is > not the best solution for asynchronous messaging. > > Also, for the ML field, you will want to be able to send "fire and forget" > messages, as exact and reliable updates are usually necessary for > stochastic algorithms. > > To me, this calls for a different communication fabric. > > > On Mon, Jul 20, 2015 at 5:22 PM, Sachin Goel <[hidden email]> > wrote: > > > @Max, broadcast variables have to be declared before the program is > > executed. I want to be able to do something whereby I can send data > inside > > a map operation to all nodes. This would perhaps have an effect similar > to > > the recent discussion on making Accumulators available before Job > > completion. [I don't know if it has been merged yet.] > > > > @Stephan, I was thinking more in the direction of adding a task at every > > worker which works as an Event handler to receive data from other nodes. > I > > got the idea after going through the iterative runtime module which has a > > separate task for handling synchronization. I followed it up with going > > through the optimizer routine wherein it is added to some *auxiliary > > vertices *list. Can you elaborate on that? > > > > Further, is it possible to add an event handler to the runtime context > > itself? I wrote the boilerplate stuff for this but I haven't defined the > > channels where the event would be written to the network so other workers > > can also listen to it. [ > > https://github.com/sachingoel0101/flink/tree/async_iter]. I want to be > > able > > to call broadcast and receive as getRuntimeContext.broadcast and > > getRuntimeContext.receive, which ensures complete data transmission > across > > nodes, even at runtime. > > I'm not sure how to add a separate task which handles event over the > > network like the IterationSynchronization task [which is the best use I > > could find of using event handlers]. > > > > Let me know your thoughts. Or if you can think of a better way to achieve > > sharing of data which is *only *evaluated at runtime. > > > > Cheers! > > Sachin > > > > -- Sachin Goel > > Computer Science, IIT Delhi > > m. +91-9871457685 > > > > On Mon, Jul 20, 2015 at 3:34 PM, Stephan Ewen <[hidden email]> wrote: > > > > > You are probably looking for a parameter server tool. > > > > > > How about setting up one of these memory grids to use that? Apache > > Ignite, > > > or Apache Geode, or one of those. > > > > > > On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <[hidden email] > > > > > wrote: > > > > > > > Hi all > > > > Is it possible to send and receive key,value pairs at runtime? I > would > > > like > > > > to broadcast values at runtime so they are available on every node. > > This > > > > somehow seems essential for an implementation of Asynchronous batch > > > > iterations. > > > > > > > > What I would like is to have two functions in the RuntimeContext, > > > > broadcast(key,value) and receive(key). > > > > If such a thing doesn't exist now, how should I approach implementing > > it? > > > > > > > > Cheers! > > > > Sachin > > > > -- Sachin Goel > > > > Computer Science, IIT Delhi > > > > m. +91-9871457685 > > > > > > > > > > |
Free forum by Nabble | Edit this page |