MultipleLinearRegression - Strange results

Felix Neutatz
Hi,

I want to use MultipleLinearRegression, but I get really strange results.
So I tested it with the housing price dataset:
http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data

Here I get negative house prices, even when I predict on the training
set itself:
LabeledVector(-1.1901998613214253E78, DenseVector(1500.0, 2197.0, 2978.0,
1369.0, 1451.0))
LabeledVector(-2.7411218018254747E78, DenseVector(4445.0, 4522.0, 4038.0,
4223.0, 4868.0))
LabeledVector(-2.688526857613956E78, DenseVector(4522.0, 4038.0, 4351.0,
4129.0, 4617.0))
LabeledVector(-1.3075960386971714E78, DenseVector(2001.0, 2059.0, 1992.0,
2008.0, 2504.0))
LabeledVector(-1.476238770814297E78, DenseVector(1992.0, 1965.0, 1983.0,
2300.0, 3811.0))
LabeledVector(-1.4298128754759792E78, DenseVector(2059.0, 1992.0, 1965.0,
2425.0, 3178.0))
...

and a huge squared error:
Squared error: 4.799184832395361E159

You can find my code here:
https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/test/io/sanfran/wikiTrends/extraction/flink/Regression.scala

Can you help me? What did I do wrong?

Thank you for your help,
Felix

Re: MultipleLinearRegression - Strange results

Till Rohrmann
Since MLR uses stochastic gradient descent (SGD), you probably have to
configure the step size correctly. SGD is very sensitive to this choice:
if the step size is too large, the algorithm does not converge. You can
find the parameter description here [1].
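
For illustration, configuring it explicitly looks roughly like this (a
minimal sketch using the parameter names from the documentation; the
concrete values are only a starting point, not a recommendation):

import org.apache.flink.api.scala._
import org.apache.flink.ml.regression.MultipleLinearRegression

// Assumes training: DataSet[LabeledVector] has already been read.
val mlr = MultipleLinearRegression()
  .setIterations(100)
  .setStepsize(0.0001) // the critical knob: too large and SGD diverges
  .setConvergenceThreshold(0.001)

mlr.fit(training)
val predictions = mlr.predict(training.map(_.vector))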

Cheers,
Till

[1]
http://ci.apache.org/projects/flink/flink-docs-master/libs/ml/multiple_linear_regression.html

Re: MultipleLinearRegression - Strange results

aalexandrov
I've seen some work on adaptive learning rates in the past few days.

Maybe we can think about extending the base algorithm and comparing it
on this use case in the IMPRO-3 project.

@Felix, you can discuss this with the others on Wednesday. Manu will
also be there and can give some feedback. I'll try to send a link
tomorrow morning...

Re: MultipleLinearRegression - Strange results

Sachin Goel
You can set the learning rate to be 1/sqrt(iteration number). This usually
works.

Regards
Sachin Goel

Re: MultipleLinearRegression - Strange results

till.rohrmann
The SGD algorithm already adapts the learning rate that way. However,
this does not help if you choose the initial learning rate too large,
because then the first iterations produce a weight vector from which it
takes really long to recover.

Cheers,
Till

Re: MultipleLinearRegression - Strange results

Felix Neutatz
Yes, grid search solved the problem :)
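
For the record, the search itself is tiny (a sketch, where trainAndScore
is a hypothetical helper that fits MLR with the given step size and
returns the squared error on a held-out split):

// Hypothetical helper: fit MLR with this step size, return the error.
def trainAndScore(stepsize: Double): Double = ???

val candidates = Seq(1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1)
val (bestStepsize, bestError) =
  candidates.map(s => (s, trainAndScore(s))).minBy(_._2)
println(s"best stepsize: $bestStepsize, squared error: $bestError")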

Re: MultipleLinearRegression - Strange results

till.rohrmann
Great to hear. This should no longer be a pain point once we support proper
cross validation.

Re: MultipleLinearRegression - Strange results

Mikio Braun
We should probably look into this nevertheless. Requiring a full grid
search for a simple algorithm like MLR sounds like overkill.

Have you written down the math of your implementation somewhere?

-M

Re: MultipleLinearRegression - Strange results

Ted Dunning
Any form of generalized linear regression should use adaptive learning
rates rather than simple SGD. One of the current best methods is Adagrad,
although there are variants such as RMSProp and Adadelta. All are pretty
easy to implement.

Here is a visualization of various methods that provides some insight:
http://imgur.com/a/Hqolp

Vowpal Wabbit has some tricks that allow very large initial learning
rates to be used without divergence. I don't know the details.
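
For concreteness, the Adagrad update itself is only a few lines (a plain
Scala sketch, independent of any particular library):

// One in-place Adagrad step. accum holds the running sum of squared
// gradients per coordinate; eps avoids division by zero at the start.
// Coordinates that have seen large gradients cool down quickly.
def adagradStep(weights: Array[Double], gradient: Array[Double],
                accum: Array[Double], eta: Double = 0.1,
                eps: Double = 1e-8): Unit = {
  for (i <- weights.indices) {
    accum(i) += gradient(i) * gradient(i)
    weights(i) -= eta / math.sqrt(accum(i) + eps) * gradient(i)
  }
}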

Re: MultipleLinearRegression - Strange results

Till Rohrmann
At the moment the current SGD implementation works as follows (modulo
regularization):

newWeights = oldWeights - adaptedStepsize * sumOfGradients / numberOfGradients

where adaptedStepsize = initialStepsize / sqrt(iterationNumber) and
sumOfGradients is the simple sum of the gradients of all points in the
batch.

Thanks for the pointer, Ted. These methods look really promising. We
definitely have to update our SGD implementation to use a better adaptive
learning rate strategy. I'll open a JIRA for that.

Maybe also the default learning rate of 0.1 is set too high.
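
In code, the rule described above amounts to something like this (a plain
Scala sketch of the update, not the actual FlinkML implementation):

// One iteration: average the per-point gradients of the batch and step
// with the decayed step size eta_t = eta_0 / sqrt(t).
def sgdIteration(weights: Array[Double], gradients: Seq[Array[Double]],
                 initialStepsize: Double, iteration: Int): Array[Double] = {
  val adaptedStepsize = initialStepsize / math.sqrt(iteration.toDouble)
  weights.indices.map { i =>
    val avgGradient = gradients.map(_(i)).sum / gradients.size
    weights(i) - adaptedStepsize * avgGradient
  }.toArray
}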

Re: MultipleLinearRegression - Strange results

Mikio Braun
It's true that we can and should look into methods to make SGD more
resilient. However, especially for linear regression, which even has a
closed-form solution, all of this seems excessive.

In the end, if the number of features is small (let's say less than
2000), the best way is to compute the covariance matrix and then just
solve the problem directly. Even for larger problems, we could use
something like conjugate gradients to compute the result. All of this
will be much faster and has no additional parameters to tune.
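
With Breeze, the direct solve is essentially a one-liner (a sketch, where
x is the n x d feature matrix and y the vector of labels):

import breeze.linalg._

// Normal equations: solve (X^T X) w = X^T y. The system is only d x d,
// so this stays cheap as long as the number of features d is small.
def normalEquations(x: DenseMatrix[Double],
                    y: DenseVector[Double]): DenseVector[Double] =
  (x.t * x) \ (x.t * y)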

--
Mikio Braun - http://blog.mikiobraun.de, http://twitter.com/mikiobraun

Re: MultipleLinearRegression - Strange results

Ted Dunning
On Thu, Jun 4, 2015 at 1:26 PM, Till Rohrmann <[hidden email]> wrote:

> Maybe also the default learning rate of 0.1 is set too high.
>

Could be.

But grid search on learning rate is pretty standard practice. Running
multiple learning engines at the same time with different learning rates is
pretty plausible.

Also, using something like Adagrad will knock down high learning rates
very quickly if you get a nearly divergent step. This can make initially
high learning rates quite workable.

Re: MultipleLinearRegression - Strange results

Ted Dunning
+1 for simple learning for simple cases.

Where normal equations have a reasonable condition number, using them is
good.

For large sparse systems, however, SGD with Adagrad will crush direct
solutions, even for linear problems.

Re: MultipleLinearRegression - Strange results

Till Rohrmann
I agree that given a small data set it's probably better to solve the
linear regression problem directly. However, I'm not so sure how well
this performs when the data gets really big (in terms of the number of
data points). But maybe we can find something like a sweet spot at which
to switch between the two methods. And maybe a distributed conjugate
gradient method can also beat SGD if the data is too large to be
processed on a single machine.

Until we have Adagrad or another more robust learning rate strategy, we
could also remove the default value for the step size of plain SGD. That
would make users aware that they have to tune this parameter.

Re: MultipleLinearRegression - Strange results

Mikio Braun
For linear regression, the main tasks are computing the covariance
matrix and X^T * y, which can both be parallelized well; after that you
need to solve a linear system whose dimension is the number of features.
So if the number of features is small, it actually makes sense to do the
setup in Flink but then solve the system directly.

Working on some example code for this one...
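
Roughly the shape I have in mind (an untested outline: Flink sums up
X^T X and X^T y over the data, and the driver then solves the small
d x d system with Breeze):

import org.apache.flink.api.scala._
import breeze.linalg._

// data: (features, label) pairs. Each point contributes a rank-one
// update x * x^T to X^T X and a scaled vector x * label to X^T y.
def fitDirect(data: DataSet[(Array[Double], Double)]): DenseVector[Double] = {
  val (xtx, xty) = data
    .map { case (features, label) =>
      val x = DenseVector(features)
      (x * x.t, x * label)
    }
    .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
    .collect()
    .head
  xtx \ xty // solve the small system locally
}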

--
Mikio Braun - http://blog.mikiobraun.de, http://twitter.com/mikiobraun