Hey,
I have a working prototype of a multi-layer perceptron implementation working in Flink.

I made every effort to reuse existing code where possible. In the process there were some hacks I want/need, and I think this should be broken up into multiple PRs and possibly abstracted out as a whole, because the MLP implementation I came up with is itself designed to be extendable to Long Short-Term Memory networks.

At the top level, here are some of the sub-PRs:

- Expand SGD to allow for predicting vectors instead of just Doubles. This allows the same NN code (and other algorithms) to be used for classification, transformations, and regressions.

- Allow for 'warm starts'. This requires adding a parameter to IterativeSolver that basically starts on iteration N. It is somewhat akin to the idea of partial fits in sklearn, or to giving the iterative solver an internal counter so that each call to 'fit' runs another N iterations (set by setIterations) instead of assuming it starts back at zero. This might seem trivial but has a significant impact on step-size calculations.

- A library of model grading metrics. Having 'calculate RSquare' as a built-in method on every regressor doesn't seem like an efficient way to do this long term.

- BLAS for matrix ops (this was talked about earlier).

- A neural net has arrays of matrices of weights (instead of just a vector). Currently I flatten the array of matrices into a weight vector and reassemble it into an array of matrices, though this is probably not very efficient.

- The linear regression implementation currently presumes it will be using SGD, but I think that should be settable as a parameter, because if not, why do we have all of those other nice SGD methods just hanging out? Similarly the loss function / partial loss is hard-coded. I recommend making the current setup the defaults of a 'setOptimizer' method. I.e. if you just want to run an MLR you can do it based on the examples, but if you want to use a fancy optimizer you can create one from existing methods, or make your own, and then call something like `mlr.setOptimizer(myOptimizer)` (a rough sketch of what I mean follows this message).

- and more.

At any rate, if some people could weigh in / direct me on how to proceed, that would be swell.

Thanks!
tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
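A minimal sketch of the proposed `setOptimizer` pattern. All names here (`Optimizer`, `SimpleSGD`, `MultipleLinearRegression`) are illustrative stand-ins, not the actual FlinkML classes:

```scala
// Illustrative sketch only: stand-in classes, not the FlinkML API.
trait Optimizer {
  def optimize(data: Seq[(Double, Array[Double])],
               initialWeights: Array[Double]): Array[Double]
}

class SimpleSGD(stepSize: Double = 0.1, iterations: Int = 100) extends Optimizer {
  def optimize(data: Seq[(Double, Array[Double])],
               initialWeights: Array[Double]): Array[Double] = {
    var w = initialWeights.clone()
    for (_ <- 1 to iterations; (label, x) <- data) {
      // Squared-loss gradient step: w <- w - stepSize * (w.x - y) * x
      val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum - label
      w = w.zip(x).map { case (wi, xi) => wi - stepSize * err * xi }
    }
    w
  }
}

class MultipleLinearRegression {
  // The current behaviour stays the default; setOptimizer only overrides it.
  private var optimizer: Optimizer = new SimpleSGD()

  def setOptimizer(opt: Optimizer): this.type = {
    optimizer = opt
    this
  }

  def fit(data: Seq[(Double, Array[Double])],
          initialWeights: Array[Double]): Array[Double] =
    optimizer.optimize(data, initialWeights)
}

// Usage: the defaults just work, or swap in a differently configured optimizer.
val mlr = new MultipleLinearRegression()
mlr.setOptimizer(new SimpleSGD(stepSize = 0.01, iterations = 500))
```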
Hello Trevor,
These are indeed a lot of issues; let's see if we can fit the discussion for all of them in one thread. I'll add some comments inline.

> - Expand SGD to allow for predicting vectors instead of just Doubles.

We have discussed this in the past and at that point decided that it didn't make sense to change the base SGD implementation to accommodate vectors. The alternatives presented at the time were to abstract away the type of the input/output in the Optimizer (allowing for both Vectors and Doubles), or to create specialized classes for each case. That also gives us greater flexibility in terms of optimizing performance.

In terms of the ANN, I think you can hide the Vectors away in the implementation of the ANN model and use the Optimizer interface as is, like A. Ulanov did with the Spark ANN implementation <https://github.com/apache/spark/pull/7621/files>.

> - Allow for 'warm starts'

I like the idea of having a partialFit-like function; could you present a couple of use cases where we might use it? I'm wondering if savepoints already cover this functionality.

> - A library of model grading metrics.

We have a (perpetually) open PR <https://github.com/apache/flink/pull/871> for an evaluation framework. Could you expand on "Having 'calculate RSquare' as a built in method for every regressor doesn't seem like an efficient way to do this long term"?

> - BLAS for matrix ops (this was talked about earlier)

This will be a good addition. If they are specific to the ANN implementation, however, I would hide them away from the rest of the code (and include them in that PR only) until another use case comes up.

> - A neural net has Arrays of matrices of weights (instead of just a vector).

Yes, this is probably not the most efficient way to do it, but it's the "least API-breaking" one, I'm afraid.

> - The linear regression implementation currently presumes it will be using SGD but I think that should be 'settable' as a parameter

The original Optimizer was written the way you described, but we changed it later, IIRC, to make it more accessible (e.g. for users who don't know that you can't match L1 regularization with L-BFGS). Maybe Till can say more about the other reasons this was changed.
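A rough sketch of the "abstract away the label type" alternative mentioned above, using hypothetical names rather than the existing FlinkML Optimizer interface:

```scala
// Hypothetical sketch: a label-type-parameterized optimizer, so regression
// (Double labels) and multi-output models such as an ANN (vector labels)
// can share one interface. Not the existing FlinkML API.
case class Example[L](label: L, features: Array[Double])

trait Optimizer[L] {
  def optimize(data: Seq[Example[L]], initialWeights: Array[Double]): Array[Double]
}

class ScalarSGD extends Optimizer[Double] {
  def optimize(data: Seq[Example[Double]], initialWeights: Array[Double]): Array[Double] = {
    // gradient updates for a scalar-label loss would go here (omitted in this sketch)
    initialWeights
  }
}

class VectorSGD extends Optimizer[Array[Double]] {
  def optimize(data: Seq[Example[Array[Double]]], initialWeights: Array[Double]): Array[Double] = {
    // gradient updates for a vector-label loss (e.g. an MLP) would go here (omitted)
    initialWeights
  }
}
```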
In reply to this post by Trevor Grant
Hi Trevor,
Great to hear that you have a working prototype :-) And it is also good that you shared the insights you gained while implementing it. Flink's ML library is far from perfect and, thus, all kinds of feedback are highly valuable. In general it is always good to contribute code back if you think it makes a valuable addition. I'll try to give some comments on your points inline.

> - Expand SGD to allow for predicting vectors instead of just Doubles. This allows the same NN code (and other algos) to be used for classification, transformations, and regressions.

I agree that we could extend the LabeledVector to support a Vector[Double] as the label instead of a single Double. Initially we implemented it with a single label value for the sake of simplicity. I remember that we also had a discussion about it, but somehow we didn't derive any action points from that. If you have code for that, then feel free to open a PR.

> - Allow for 'warm starts' -> this requires adding a parameter to IterativeSolver that basically starts on iteration N. This might seem trivial but has significant impact on step size calculations.

That is a good point and should not be too hard to add, I would assume.

> - A library of model grading metrics. Having 'calculate RSquare' as a built in method for every regressor doesn't seem like an efficient way to do this long term.

Agreed. The squaredResidualSum method of MLR is just a convenience method to get at least one accuracy metric back. There is a PR open by Theo which adds an evaluation framework to FlinkML [1]. If I'm not mistaken, it should add a more generalized means of calculating grading metrics.

> - BLAS for matrix ops (this was talked about earlier)

Here the recommended way is to convert your matrix to a Breeze matrix and then use the BLAS operations from there.

> - A neural net has Arrays of matrices of weights (instead of just a vector). Currently I flatten the array of matrices out into a weight vector and reassemble it into an array of matrices, though this is probably not super efficient.

I would assume that you could simply operate on the flattened vector without converting from one representation to the other.

> - The linear regression implementation currently presumes it will be using SGD but I think that should be 'settable' as a parameter. I recommend making the current setup the 'defaults' of a 'setOptimizer' method.

Agreed. That was actually also our plan, but we haven't come to it so far.

Cheers,
Till

[1] https://github.com/apache/flink/pull/871
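Building on the flattened-vector point above, a minimal sketch of winding an array of Breeze weight matrices into a single weight vector and back. The `shapes` argument and the function names are made up for illustration, and the flatten step assumes each matrix owns its full backing array (no views):

```scala
import breeze.linalg.DenseMatrix

// Illustrative only: flatten per-layer weight matrices into one weight vector
// and rebuild them from the per-layer (rows, cols) shapes. Both the DenseMatrix
// constructor and .data are column-major, so the round trip is consistent.
def flatten(layerWeights: Array[DenseMatrix[Double]]): Array[Double] =
  layerWeights.flatMap(_.data)

def unflatten(flat: Array[Double], shapes: Array[(Int, Int)]): Array[DenseMatrix[Double]] = {
  var offset = 0
  shapes.map { case (rows, cols) =>
    val size = rows * cols
    val m = new DenseMatrix(rows, cols, flat.slice(offset, offset + size))
    offset += size
    m
  }
}

// Example: the two weight matrices of a 3-2-1 MLP (bias terms omitted for brevity).
val layers = Array(DenseMatrix.rand(2, 3), DenseMatrix.rand(1, 2))
val w      = flatten(layers)
val back   = unflatten(w, Array((2, 3), (1, 2)))
```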
In reply to this post by Theodore Vasiloudis
OK, I'm trying to respond to you and Till in one thread, so someone call me out if I missed a point, but here goes:

SGD predicting vectors: There was discussion in the past regarding this; at the time it was decided to go with only Doubles for simplicity. I feel strongly that there is cause now for predicting vectors. This should be a separate PR. I'll open an issue, and we can refer to the earlier mailing list thread and reopen the discussion on the best way to proceed.

Warm starts: Basically all that needs to be done here is for the iterative solver to keep track of what iteration it is on, start from that iteration if WarmStart == True, and then go another N iterations. I don't think savepoints solve this because of the way step sizes are calculated in SGD, though I don't know enough about savepoints to say for sure. As Till said, and I agree, a very simple fix (a sketch of the step-size interaction follows this message). Use cases: testing how new features (e.g. step sizes) speed up or slow down convergence, e.g. fit a model in 1000-data-point bursts, measure the error, and see how it decreases over time. Also, model updates: e.g. I have a huge model that gets trained on a year of data and takes a day or two to do so, but after that I just want to update it nightly with the data from the last 24 hours, or at the extreme, online learning, where every new data point updates the model.

Model grading metrics: I'll chime in on the PR you mentioned.

Weight arrays vs. weight vectors: The consensus seems to be that winding/unwinding arrays of matrices into vectors is best done inside the methods that need such functionality. I'm OK with that, as I have such things working rather elegantly, but wanted to throw it out there anyway.

BLAS ops for matrices: I'll take care of this in my code.

Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred to Till, Till said open a PR. I'll make the default SimpleSGD to maintain backwards compatibility.

New issues to create:
[ ] Optimizer to predict Vectors or Doubles and maintain backwards compatibility.
[ ] Warm start functionality.
[ ] setOptimizer on IterativeSolver, with default SimpleSGD.
[ ] Add a neuralnets package to FlinkML (multilayer perceptron is the first iteration, other flavors to follow).

Let me know if I missed anything. I'm guessing you guys are done for the day, so I'll wait until tomorrow night my time (Chicago) before I move ahead on anything, to give you a chance to respond.

Thanks!
tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
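A small sketch of why a persistent iteration counter matters for step sizes, assuming the common stepSize / sqrt(iteration) decay (illustrative only; `WarmStartableSolver` is a made-up name, not the actual IterativeSolver code):

```scala
import scala.math.sqrt

// Illustrative: the counter survives across fit() calls, so a warm start
// continues the step-size decay instead of restarting it.
class WarmStartableSolver(initialStepSize: Double, iterationsPerFit: Int) {
  private var globalIteration = 0  // kept across fit() calls for warm starts

  def fit(runStep: Double => Unit): Unit = {
    for (_ <- 1 to iterationsPerFit) {
      globalIteration += 1
      // A cold restart would reset this to 1 and take overly large steps
      // on an already partially trained model.
      val stepSize = initialStepSize / sqrt(globalIteration.toDouble)
      runStep(stepSize)
    }
  }
}

// Usage: the second fit continues the decay instead of restarting it.
val solver = new WarmStartableSolver(initialStepSize = 1.0, iterationsPerFit = 1000)
solver.fit(step => ())  // iterations 1..1000
solver.fit(step => ())  // iterations 1001..2000, with smaller steps
```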
> Adding a setOptimizer to IterativeSolver.
Do you mean MLR here? IterativeSolver is implemented by different solvers; I don't think adding a method like this makes sense there.

In the case of MLR, a better alternative that involves a bit more work is to create a Generalized Linear Model framework that provides implementations for the most common linear models (ridge, lasso, etc.). I had already started work on this here <https://github.com/thvasilo/flink/commits/glm>, but never got around to opening a PR. The relevant JIRA is here <https://issues.apache.org/jira/browse/FLINK-2013>. Having a setOptimizer method in GeneralizedLinearModel (with some restrictions/warnings regarding the choice of optimizer and regularization) would be the preferred option, for me at least (a rough sketch of that idea follows this message).

Other than that, the list looks fine :)
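A hypothetical sketch of a setOptimizer on a GLM base class with a regularization/optimizer restriction. The names are illustrative and do not necessarily match the code in the linked branch:

```scala
// Illustrative sketch, not the code in the linked glm branch.
sealed trait Optimizer
case object SimpleSGD extends Optimizer
case object LBFGS extends Optimizer

sealed trait Regularization
case object NoReg extends Regularization
case object L1 extends Regularization
case object L2 extends Regularization

abstract class GeneralizedLinearModel {
  protected def regularization: Regularization
  protected var optimizer: Optimizer = SimpleSGD  // sensible default per model

  def setOptimizer(opt: Optimizer): this.type = {
    // Guard against unsupported combinations, e.g. L1 regularization with L-BFGS.
    require(!(opt == LBFGS && regularization == L1),
      "L-BFGS cannot be combined with L1 regularization")
    optimizer = opt
    this
  }
}

class RidgeRegression extends GeneralizedLinearModel {
  protected val regularization: Regularization = L2
}

class Lasso extends GeneralizedLinearModel {
  protected val regularization: Regularization = L1
}
```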
I was thinking that all IterativeSolvers would benefit from a setOptimizer method. I didn't realize you had been working on GLM. If that is the case (which I think is wise), then feel free to put a setOptimizer in GLM, I'll leave it in my NeuralNetworks, and let's just try to have some consistency in the APIs. Specifically: setOptimizer is a method that takes... an optimizer. We can default to whatever is most appropriate for each learning algorithm.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*