Hey all,
I had a post a while ago about needing neural networks. We specifically need a very special type that is good for time series/sensor data, called the LSTM. We had a talk about the pros and cons of using deeplearning4j for this use case and eventually decided it made more sense to implement it natively in Flink for our use case.

So, this is somewhat relevant to what Theodore just said, but different enough that I wanted a separate thread:

"Focusing on what Flink does well and implementing algorithms built around its inherent advantages..."

One thing that jumps to mind is online learning. The batch nature of all of the other 'big boys' means that they are, by definition, always going to be offline models.

Also, even though LSTMs are somewhat of a corner case in the NN world, the streaming nature of Flink (a sequence of data) makes them fairly relevant to the people who would be using Flink in the first place (IMHO).

Finally, there should be some positive externalities that come from this, such as a backpropagation algorithm, which should then be reusable for things like HMMs.

So at any rate, the research spike started for me earlier this week. I hope to start cutting some Scala code over the weekend or at the beginning of next week. I'm also asking to check out FLINK-2259, because I need some functionality like that before I get started, and I could use the git practice.

I don't know if there is any interest in adding this, or if you want to make a JIRA for LSTM neural nets (or if I should write one, with appropriate papers cited, as seems to be the fashion), or maybe wait and see what I end up with?

Also, I'll probably be blowing you up with questions.

Best,

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
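For anyone following along who hasn't met the LSTM before, the cell itself is only a handful of gated updates. Here is a minimal scalar sketch of one forward step; all the weight values and names are hypothetical toys, written in plain Java rather than anything Flink-specific, and this is not the planned implementation, just the arithmetic under discussion:

```java
// Toy scalar LSTM cell: one forward step with shared hypothetical weights
// (wx = wh = 0.5, b = 0 for every gate). Not the planned Flink code.
public class LstmSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Pre-activation for one gate: input weight, recurrent weight, bias.
    static double pre(double wx, double wh, double b, double x, double h) {
        return wx * x + wh * h + b;
    }

    // One step: consume input x plus the previous hidden state h and
    // cell state c; return {newHidden, newCell}.
    static double[] step(double x, double h, double c) {
        double i = sigmoid(pre(0.5, 0.5, 0.0, x, h));  // input gate: admit new info
        double f = sigmoid(pre(0.5, 0.5, 0.0, x, h));  // forget gate: keep old cell state
        double o = sigmoid(pre(0.5, 0.5, 0.0, x, h));  // output gate: expose cell state
        double g = Math.tanh(pre(0.5, 0.5, 0.0, x, h)); // candidate cell update
        double cNew = f * c + i * g;
        double hNew = o * Math.tanh(cNew);
        return new double[] { hNew, cNew };
    }

    public static void main(String[] args) {
        double h = 0.0, c = 0.0;
        // Feed a short "sensor" sequence one reading at a time,
        // carrying (h, c) forward, as a stream naturally would.
        for (double x : new double[] { 1.0, 0.5, -0.25 }) {
            double[] hc = step(x, h, c);
            h = hc[0];
            c = hc[1];
        }
        System.out.println(h + " " + c);
    }
}
```

Because the state `(h, c)` is threaded through one input at a time, the step maps naturally onto a stream of sensor readings, which is the intuition behind the Flink fit above.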
On Fri, Feb 12, 2016 at 8:45 AM, Trevor Grant <[hidden email]> wrote:

> [snip]

It would be good if we also supported bidirectional LSTMs:

http://www.cs.toronto.edu/~graves/asru_2013.pdf
http://www.cs.toronto.edu/~graves/phd.pdf
Agreed. Our reasoning for contributing straight to Flink was that we plan on doing a lot of weird monkeying around with these things, and we were going to have to get our hands dirty with some code eventually anyway. The LSTM isn't *that* difficult to implement, and it seems easier to write our own than to understand someone else's insanity.

The plan is to get a 'basic' version going, then start tweaking the special cases. We have a use case for bidirectional, but it's not our primary motivation. I have no problem exposing new flavors as we make them.

tg

On Fri, Feb 12, 2016 at 7:51 AM, Suneel Marthi <[hidden email]> wrote:

> [snip]
Asking as someone who has never done NNs on Flink: would you implement it using JCuda? And would you implement it with model parallelization? Is there any theoretical limit to implementing "model and data parallelism" in Flink? If you don't use GPUs and you don't parallelize models and data at the same time, what is your motivation for doing such a thing on Flink instead of in a local environment, which would probably be more performant to a certain degree?

2016-02-12 14:58 GMT+01:00 Trevor Grant <[hidden email]>:

> [snip]
JCuda: No. I'm not willing to rely on servers having NVIDIA cards (someone more familiar with server hardware may correct me, in which case I'll say, "No, because *my* servers don't have NVIDIA cards; someone else can add that").

Parallelization: Yes. Admittedly, very clever use of Python could probably be used to solve this problem, depending on how we cut it up (I anticipate cursing myself several times in the weeks to come for not going that route). The motivation for Flink over Python is the hope for a more general and reusable solution; neural networks in general are solvable as long as you have some decent linear algebra backing you up. (However, I'm also toying with the idea of additionally putting in an evolutionary-algorithm approach as an alternative to backpropagation through time.)

The thought guiding this, to borrow a term from American auto racing, is "there is no replacement for displacement": a reasonably functional 7-liter engine will be more powerful than a performance-tuned 1.6-liter engine. In this case, an OK implementation in Flink spread over lots and lots of processors should be more powerful than a local 'sport-tuned' implementation with clever algorithms, GPUs, etc.

(The argument against evolutionary algorithms for training neural networks normally revolves around efficiency; however, running several generations on each node, reporting the best parameter sets to be 'bred', and then re-broadcasting the parameter sets is a natural fit for distributed systems. This is more of an academic exercise, but it's interesting conceptually; I know there are some grad students reading this who are itching for thesis projects. Olcay Akman and I did something similar for an implementation in R; see my github repo IRENE for a very ugly implementation.)

The motivation for Flink over an alternative big-data platform (see SPARK-2352) is: A) online learning and sequences intuitively seem to be a better fit for Flink's streaming architecture; B) I don't know much about the Spark ML code base, so there would be an additional learning curve; and C) I'd have to spend the rest of my life looking over my shoulder to make sure Slim wasn't going to jump out and get me (we live in the same city; the fear is real).

tg

On Fri, Feb 12, 2016 at 8:04 AM, Simone Robutti <[hidden email]> wrote:

> [snip]
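The island-style scheme described above (several local generations per node, champions 'bred', parameters re-broadcast) can be sketched in a single process. Everything here is a hypothetical stand-in: the sum-of-squares fitness plays the role of a network loss, the islands stand in for cluster nodes, and plain Java loops stand in for Flink operators:

```java
// Island-model evolutionary algorithm sketch (hypothetical, single-process
// stand-in for the distributed scheme): each "node" runs several local
// generations, and the per-node champions are bred pairwise.
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class IslandEa {
    static final Random RNG = new Random(42);

    // Toy fitness to minimize: sum of squares (stand-in for network loss).
    static double fitness(double[] p) {
        return Arrays.stream(p).map(v -> v * v).sum();
    }

    // Gaussian perturbation of a parameter set.
    static double[] mutate(double[] p) {
        double[] q = p.clone();
        for (int i = 0; i < q.length; i++) q[i] += RNG.nextGaussian() * 0.1;
        return q;
    }

    // Run `gens` elitist generations on one island; return its champion.
    static double[] localGenerations(double[][] pop, int gens) {
        double[][] ps = pop;
        for (int g = 0; g < gens; g++) {
            double[] best = Arrays.stream(ps)
                .min(Comparator.comparingDouble(IslandEa::fitness)).get();
            double[][] next = new double[ps.length][];
            next[0] = best;                              // elitism: champion survives
            for (int i = 1; i < ps.length; i++) next[i] = mutate(best);
            ps = next;
        }
        return Arrays.stream(ps)
            .min(Comparator.comparingDouble(IslandEa::fitness)).get();
    }

    // "Breed" two champions by averaging parameters (one simple crossover).
    static double[] breed(double[] a, double[] b) {
        double[] c = new double[a.length];
        for (int i = 0; i < a.length; i++) c[i] = (a[i] + b[i]) / 2.0;
        return c;
    }

    public static void main(String[] args) {
        int islands = 4, popSize = 8, dims = 3;
        double[] champion = null;
        for (int n = 0; n < islands; n++) {              // one round across all "nodes"
            double[][] pop = new double[popSize][dims];
            for (double[] p : pop)
                for (int d = 0; d < dims; d++) p[d] = RNG.nextGaussian();
            double[] local = localGenerations(pop, 10);
            champion = (champion == null) ? local : breed(champion, local);
        }
        System.out.println(fitness(champion));           // bred champion's loss
    }
}
```

In a Flink setting, the natural mapping would be islands as parallel partitions running `localGenerations` independently, with the breeding and re-broadcast of champions done once per iteration; only the local step is compute-heavy, which is what makes the communication pattern attractive.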