Is it just me or does MSE tend to increase with more iterations of Linear Regression?

Using 1.0.2 (or 1.1)

%flink
import org.apache.flink.ml.optimization.SimpleGradientDescent
import org.apache.flink.ml.optimization.LearningRateMethod
import org.apache.flink.ml.regression.MultipleLinearRegression
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.math.DenseVector

val survival = env.readCsvFile[(String, String, String, String)]("file:///home/trevor/gits/datasets/haberman/haberman.data")
val survivalLV = survival
  .map{tuple =>
    val list = tuple.productIterator.toList
    val numList = list.map(_.asInstanceOf[String].toDouble)
    LabeledVector(numList(3), DenseVector(numList.take(3).toArray))
  }

val mlr_default = MultipleLinearRegression()
  .setIterations(5)

mlr_default.fit(survivalLV)

val mse1 = mlr_default.squaredResidualSum(survivalLV).collect()

val mlr_default = MultipleLinearRegression()
  .setIterations(10)

mlr_default.fit(survivalLV)

val mse2 = mlr_default.squaredResidualSum(survivalLV).collect()
println(mse1 , mse2 )

Results in:

(Buffer(4.047910100612734E28),Buffer(2.6223205846507677E52))

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
Hi Trevor,

the multiple linear regression implementation is quite sensitive to the initial learning rate. If the value is too large, each gradient step can overshoot the minimum, so the algorithm alternates between ever-increasing values to the left and right of it and the error grows with every iteration. Could you try setting a smaller initial learning rate? If the error still persists, we should file a JIRA issue for it.

Cheers,
Till

On Tue, May 3, 2016 at 4:58 PM, Trevor Grant <[hidden email]> wrote:

> Is it just me or does MSE tend to increase with more iterations of Linear
> Regression?
>
> Using 1.0.2 (or 1.1)
>
> %flink
> import org.apache.flink.ml.optimization.SimpleGradientDescent
> import org.apache.flink.ml.optimization.LearningRateMethod
> import org.apache.flink.ml.regression.MultipleLinearRegression
> import org.apache.flink.ml.common.LabeledVector
> import org.apache.flink.ml.math.DenseVector
>
> val survival = env.readCsvFile[(String, String, String,
> String)]("file:///home/trevor/gits/datasets/haberman/haberman.data")
> val survivalLV = survival
>   .map{tuple =>
>     val list = tuple.productIterator.toList
>     val numList = list.map(_.asInstanceOf[String].toDouble)
>     LabeledVector(numList(3), DenseVector(numList.take(3).toArray))
>   }
>
> val mlr_default = MultipleLinearRegression()
>   .setIterations(5)
>
> mlr_default.fit(survivalLV)
>
> val mse1 = mlr_default.squaredResidualSum(survivalLV).collect()
>
> val mlr_default = MultipleLinearRegression()
>   .setIterations(10)
>
> mlr_default.fit(survivalLV)
>
> val mse2 = mlr_default.squaredResidualSum(survivalLV).collect()
> println(mse1 , mse2 )
>
> Results in:
>
> (Buffer(4.047910100612734E28),Buffer(2.6223205846507677E52))
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
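
To make the learning-rate point in Till's reply concrete, here is a minimal sketch in plain Scala (no Flink involved). It runs gradient descent on the one-dimensional loss f(w) = 0.5 * a * w^2, whose minimum is at w = 0. The object name `StepSizeDemo`, the function `descend`, and the values of `a` and the step sizes are all made up for illustration; `a` stands in for the curvature of the loss, which grows with the scale of the features. This is not FlinkML's optimizer, only the same failure mode in miniature.

```scala
// Illustrative only: plain gradient descent on f(w) = 0.5 * a * w * w.
// The gradient is a * w, so each update is w - stepSize * a * w.
object StepSizeDemo {

  // Returns the starting weight followed by the weight after each iteration.
  def descend(a: Double, stepSize: Double, w0: Double, iterations: Int): List[Double] =
    Iterator.iterate(w0)(w => w - stepSize * a * w).take(iterations + 1).toList

  def main(args: Array[String]): Unit = {
    val a = 100.0 // hypothetical curvature

    // stepSize * a > 2: every step overshoots the minimum, the sign flips,
    // and the distance from the minimum grows.
    println(descend(a, stepSize = 0.1, w0 = 1.0, iterations = 5))

    // stepSize * a < 1: the weight shrinks steadily toward the minimum at 0.
    println(descend(a, stepSize = 0.001, w0 = 1.0, iterations = 5))
  }
}
```

The first run jumps from one side of the minimum to the other with growing magnitude, which matches an error that gets larger with more iterations; the second run converges. In FlinkML the corresponding setting should be `setStepsize` on `MultipleLinearRegression`, but please check that against the documentation for your version.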