A whole bag of ML issues

A whole bag of ML issues

Trevor Grant
Hey,

I have a working prototype of a multilayer perceptron (MLP) implementation
running in Flink.

I made every effort to reuse existing code where possible.

In the process of doing this there were some hacks I want/need. I think
this should be broken up into multiple PRs, and possibly abstracted as a
whole, because the MLP implementation I came up with is itself designed to
be extensible to Long Short-Term Memory networks.

At a top level, here are some of the candidate sub-PRs:

- Expand SGD to allow for predicting vectors instead of just Doubles. This
allows the same NN code (and other algos) to be used for classification,
transformation, and regression.

- Allow for 'warm starts' -> this requires adding a parameter to
IterativeSolver that basically starts on iteration N. This is somewhat
akin to the idea of partial fits in sklearn, OR making the iterative
solver keep an internal counter so that calling 'fit' runs another N
iterations (as set by setIterations) instead of assuming it is starting
from zero. This might seem trivial but it has a significant impact on step
size calculations (see the sketch after this list).

- A library of model grading metrics. Having 'calculate RSquare' as a
built-in method on every regressor doesn't seem like an efficient way to
do this long term.

- BLAS for matrix ops (this was talked about earlier)

- A neural net has Arrays of matrices of weights (instead of just a
vector). Currently I flatten the array of matrices into a weight vector
and reassemble it into an array of matrices, though this is probably not
super efficient.

- The linear regression implementation currently presumes it will be using
SGD, but I think that should be 'settable' as a parameter; if not, why do
we have all of those other nice SGD methods just hanging out? Similarly,
the loss function / partial loss is hard-coded. I recommend making the
current setup the defaults of a 'setOptimizer' method. I.e., if you want
to just run an MLR you can do it based on the examples, but if you want a
fancy optimizer you can create it from existing methods, or make your own,
then call something like `mlr.setOptimizer( myOptimizer )`.

- and more
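
Regarding the warm-start point above, here is a minimal sketch of the
internal-counter idea. This is only an illustration; the class and method
names are mine, not the current FlinkML IterativeSolver API:

    // Hypothetical solver sketch, not the existing IterativeSolver.
    class CountingSolver(initialStepSize: Double) {
      private var completedIterations: Int = 0
      private var iterations: Int = 100

      def setIterations(n: Int): this.type = { iterations = n; this }

      // The step size decays with the *global* iteration count; resetting
      // the counter to zero on every fit() would restart the decay schedule.
      private def stepSize(local: Int): Double =
        initialStepSize / math.sqrt((completedIterations + local).toDouble)

      def fit(): Unit = {
        for (i <- 1 to iterations) {
          val eta = stepSize(i)
          // ... perform one SGD update with learning rate eta ...
        }
        completedIterations += iterations // warm start: remember where we stopped
      }
    }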

At any rate- if some people could weigh in / direct me on how to proceed,
that would be swell.

Thanks!
tg




Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*
Re: A whole bag of ML issues

Theodore Vasiloudis
Hello Trevor,

These are indeed a lot of issues; let's see if we can fit the discussion
for all of them in one thread.

I'll add some comments inline.

> - Expand SGD to allow for predicting vectors instead of just Doubles.


We have discussed this in the past, and at that point decided that it
didn't make sense to change the base SGD implementation to accommodate
vectors. The alternatives that were presented at the time were to abstract
away the type of the input/output in the Optimizer (allowing for both
Vectors and Doubles), or to create specialized classes for each case. That
also gives us greater flexibility in terms of optimizing performance.
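
To make the type-abstraction option concrete, a hedged sketch (the names
here are illustrative only, not the existing FlinkML Optimizer):

    // One interface, parameterized on the label type (illustrative).
    trait Optimizer[Label] {
      def optimize(data: Seq[(Label, Array[Double])],
                   initialWeights: Array[Double]): Array[Double]
    }

    // The two cases under discussion become instances of one interface:
    //   Optimizer[Double]        -- scalar labels (current behavior)
    //   Optimizer[Array[Double]] -- vector labels (e.g. ANN targets)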

In terms of the ANN, I think you can hide away the Vectors in the
implementation of the ANN model, and use the Optimizer interface as-is,
like A. Ulanov did with the Spark ANN implementation
<https://github.com/apache/spark/pull/7621/files>.

> - Allow for 'warm starts'


I like the idea of having a partial_fit-like function. Could you present a
couple of use cases where we might use it? I'm wondering if savepoints
already cover this functionality.

> - A library of model grading metrics.

We have a (perpetually) open PR for an evaluation framework
<https://github.com/apache/flink/pull/871>. Could you expand on "Having
'calculate RSquare' as a built-in method on every regressor doesn't seem
like an efficient way to do this long term"?

> - BLAS for matrix ops (this was talked about earlier)


This will be a good addition. If the ops are specific to the ANN
implementation, however, I would hide them away from the rest of the code
(and include them in that PR only) until another use case comes up.

> - A neural net has Arrays of matrices of weights (instead of just a vector).

Yes, this is probably not the most efficient way to do this, but it's the
"least API-breaking" one, I'm afraid.

> - The linear regression implementation currently presumes it will be
> using SGD, but I think that should be 'settable' as a parameter

The original Optimizer was written the way you describe, but we changed it
later, IIRC, to make it more accessible (e.g. for users who don't know
that you can't match L1 regularization with L-BFGS, since L-BFGS assumes a
smooth objective and the L1 penalty isn't differentiable at zero). Maybe
Till can say more about the other reasons this was changed.


Re: A whole bag of ML issues

Till Rohrmann
In reply to this post by Trevor Grant
Hi Trevor,

great to hear that you have a working prototype :-) And it is also good
that you shared the insights you gained while implementing it. Flink's ML
library is far from perfect and, thus, all kinds of feedback are highly
valuable. In general it is always good to contribute code back if you
think it makes a valuable addition. I'll give some comments on your points
inline.

On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <[hidden email]>
wrote:

> - Expand SGD to allow for predicting vectors instead of just Doubles.
> This allows the same NN code (and other algos) to be used for
> classification, transformation, and regression.
>
I agree that we could extend the LabeledVector to support a Vector[Double]
as the label instead of a single Double. Initially we implemented it with
a single label value for the sake of simplicity, but I remember that we
also had a discussion about this; somehow we didn't derive any action
points from it. If you have code for that, then feel free to open a PR.
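
As a rough sketch of the shape that could take (hypothetical, not the
current org.apache.flink.ml.common.LabeledVector):

    import org.apache.flink.ml.math.Vector

    // Hypothetical generalization: the label becomes a type parameter
    // instead of a fixed Double.
    case class LabeledVector[L](label: L, vector: Vector)

    // Scalar regression keeps the old shape; ANN targets become vectors:
    //   LabeledVector[Double](1.0, features)
    //   LabeledVector[Vector](oneHotTarget, features)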


> - Allow for 'warm starts' -> this requires adding a parameter to
> IterativeSolver that basically starts on iteration N. This is somewhat
> akin to the idea of partial fits in sklearn, OR making the iterative
> solver keep an internal counter so that calling 'fit' runs another N
> iterations (as set by setIterations) instead of assuming it is starting
> from zero. This might seem trivial but it has a significant impact on
> step size calculations.
>
That is a good point and should not be too hard to add, I would assume.


> - A library of model grading metrics. Having 'calculate RSquare' as a
> built-in method on every regressor doesn't seem like an efficient way to
> do this long term.
>
Agreed. The squaredResidualSum method of MLR is just a convenience method
to get at least one accuracy metric back. There is an open PR by Theo
which adds an evaluation framework to FlinkML [1]. If I'm not mistaken it
adds a more generalized means of calculating grading metrics.


> - BLAS for matrix ops (this was talked about earlier)
>
Here the recommended way is to convert your matrix to a Breeze matrix and
then use the BLAS operations from there.
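
For example, in plain Breeze (FlinkML also ships implicit converters,
asBreeze/fromBreeze, in org.apache.flink.ml.math.Breeze, if I remember the
package correctly):

    import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}

    val weights = BDM((0.1, 0.4), (0.2, 0.3)) // 2x2 layer weights
    val input   = BDV(0.5, 1.0)               // previous layer's activations

    // Dense matrix-vector product; Breeze dispatches to netlib BLAS (gemv)
    // when a native implementation is on the classpath.
    val output = weights * input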


> - A neural net has Arrays of matrices of weights (instead of just a
> vector). Currently I flatten the array of matrices into a weight vector
> and reassemble it into an array of matrices, though this is probably not
> super efficient.
>
I would assume that you should simply operate on the flattened vector
without converting from one representation to the other.
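
For instance, one way to read a layer's weights straight out of the flat
vector, with offsets computed from the layer sizes (all names here are
illustrative, not existing API):

    // Layer l maps layerSizes(l) inputs to layerSizes(l + 1) outputs, so
    // its block occupies layerSizes(l) * layerSizes(l + 1) entries.
    val layerSizes = Array(4, 8, 3) // e.g. 4 inputs, 8 hidden, 3 outputs

    def layerOffset(l: Int): Int =
      (0 until l).map(i => layerSizes(i) * layerSizes(i + 1)).sum

    // Weight (row, col) of layer l, read without reassembling any
    // matrices (column-major layout assumed):
    def weight(flat: Array[Double], l: Int, row: Int, col: Int): Double =
      flat(layerOffset(l) + col * layerSizes(l + 1) + row)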


> - The linear regression implementation currently presumes it will be
> using SGD, but I think that should be 'settable' as a parameter; if not,
> why do we have all of those other nice SGD methods just hanging out?
> Similarly, the loss function / partial loss is hard-coded. I recommend
> making the current setup the defaults of a 'setOptimizer' method. I.e.,
> if you want to just run an MLR you can do it based on the examples, but
> if you want a fancy optimizer you can create it from existing methods,
> or make your own, then call something like `mlr.setOptimizer( myOptimizer )`
>
Agreed. That was actually also our plan but we haven't come to it so far.
If you have the code available, then please open a PR for it.

>
[1] https://github.com/apache/flink/pull/871

Cheers,
Till

Re: A whole bag of ML issues

Trevor Grant
In reply to this post by Theodore Vasiloudis
OK, I'm trying to respond to you and Till in one thread, so someone call
me out if I missed a point, but here goes:

SGD Predicting Vectors: There was discussion in the past regarding this;
at the time it was decided to go with only Doubles for simplicity. I feel
strongly that there is cause now for predicting vectors. This should be a
separate PR. I'll open an issue; we can refer to the earlier mailing list
discussion and reopen the question of the best way to proceed.

Warm Starts: Basically all that needs to be done here is for the iterative
solver to keep track of what iteration it is on, start from that iteration
if WarmStart == True, then go another N iterations. I don't think
savepoints solve this, because of the way step sizes are calculated in
SGD, though I don't know enough about savepoints to say for sure. As Till
said, and I agree, a very simple fix. Use cases: testing how new features
(e.g. step sizes) speed up or slow down convergence, e.g. fit a model in
1000-data-point bursts and measure the error to see how it decreases as
time goes on. Also, model updates: e.g. I have a huge model that gets
trained on a year of data and takes a day or two to do so, but after that
I just want to update it nightly with the data from the last 24 hours, or
at the extreme, online learning, where every new data point updates the
model.
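
To make the nightly-update case concrete, a hedged sketch (setWarmStart is
the hypothetical switch I'm proposing, and yearOfData / lastNight stand in
for real DataSets; only setIterations exists today):

    val mlr = MultipleLinearRegression().setIterations(10000)
    mlr.fit(yearOfData)  // expensive initial fit: iterations 1..10000

    mlr.setWarmStart(true).setIterations(100)
    mlr.fit(lastNight)   // resumes at iteration 10001; the step size
                         // schedule keeps decaying instead of restarting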

Model Grading Metrics:  I'll chime in on the PR you mentioned.

Weight Arrays vs. Weight Vectors: The consensus seems to be that
winding/unwinding arrays of matrices into vectors is best done inside the
methods that need such functionality. I'm OK with that, as I have such
things working rather elegantly, but wanted to throw it out there anyway.

BLAS ops for matrices:  I'll take care of this in my code.

Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred to
Till; Till said open a PR. I'll make the default SimpleSGD to maintain
backwards compatibility.

New issues to create:
[  ] Optimizer to predict Vectors or Doubles, maintaining backwards
compatibility.
[  ] Warm-start functionality.
[  ] setOptimizer on IterativeSolver, with default of SimpleSGD.
[  ] Add neuralnets package to FlinkML (Multilayer perceptron is first
iteration, other flavors to follow).

Let me know if I missed anything. I'm guessing you guys are done for the
day, so I'll wait until tomorrow night my time (Chicago) before I move
ahead on anything, to give you a chance to respond.

Thanks!
tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Re: A whole bag of ML issues

Theodore Vasiloudis
> Adding a setOptimizer to IterativeSolver.

Do you mean MLR here? IterativeSolver is implemented by different solvers;
I don't think adding a method like this makes sense there.

In the case of MLR, a better alternative that involves a bit more work is
to create a Generalized Linear Model framework that provides
implementations for the most common linear models (ridge, lasso, etc.). I
had already started work on this
<https://github.com/thvasilo/flink/commits/glm>, but never got around to
opening a PR. The relevant JIRA is FLINK-2013
<https://issues.apache.org/jira/browse/FLINK-2013>. Having a setOptimizer
method in GeneralizedLinearModel (with some restrictions/warnings
regarding the choice of optimizer and regularization) would be the
preferred option, for me at least.
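
A hedged sketch of how that could look (GeneralizedLinearModel and
setOptimizer are proposed names from this thread, not existing API):

    // Illustrative optimizer interface for the sketch below.
    trait Optimizer {
      def optimize(data: Seq[(Double, Array[Double])],
                   init: Array[Double]): Array[Double]
    }

    abstract class GeneralizedLinearModel {
      // Each concrete model (ridge, lasso, ...) picks a sensible default.
      protected def defaultOptimizer: Optimizer

      private var userOptimizer: Option[Optimizer] = None

      // The user may override it; this is where restrictions/warnings for
      // known-bad combinations (e.g. L1 with plain L-BFGS) would live.
      def setOptimizer(opt: Optimizer): this.type = {
        userOptimizer = Some(opt)
        this
      }

      protected def optimizer: Optimizer =
        userOptimizer.getOrElse(defaultOptimizer)
    }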

Other than that the list looks fine :)

Re: A whole bag of ML issues

Trevor Grant
I was thinking that all IterativeSolvers would benefit from a setOptimizer
method. I didn't realize you had been working on a GLM. If that is the
case (which I think is wise), then feel free to put a setOptimizer in GLM;
I'll leave it in my NeuralNetworks, and let's just try to have some
consistency in the APIs. Specifically: setOptimizer is a method that
takes... an optimizer. We can default to whatever is most appropriate for
each learning algorithm.
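
Hedged illustration of the call shape I mean (the model and optimizer
names below are placeholders, not committed API):

    val glm = RidgeRegression()      // defaults to its usual solver
    val net = MultilayerPerceptron() // defaults to SimpleSGD

    // Same shape everywhere: setOptimizer takes... an optimizer.
    glm.setOptimizer(LBFGS())
    net.setOptimizer(GradientDescentL2())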



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*

