[jira] [Created] (FLINK-2162) Implement adaptive learning rate strategies for SGD


Shang Yuanchun (Jira)
Till Rohrmann created FLINK-2162:
------------------------------------

             Summary: Implement adaptive learning rate strategies for SGD
                 Key: FLINK-2162
                 URL: https://issues.apache.org/jira/browse/FLINK-2162
             Project: Flink
          Issue Type: Improvement
          Components: Machine Learning Library
            Reporter: Till Rohrmann
            Priority: Minor


At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate / sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD may become unstable.
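For illustration, the current decay scheme can be sketched as follows (a minimal standalone Java sketch; the method name mirrors the formula above and is not the actual Flink API):

```java
public class DecayedLearningRate {

    // Current Flink SGD scheme: eta_t = eta_0 / sqrt(t).
    // The whole schedule is a single curve scaled by eta_0, so a bad
    // choice of eta_0 affects every iteration uniformly.
    static double adaptedLearningRate(double initialLearningRate, int iterationNumber) {
        return initialLearningRate / Math.sqrt(iterationNumber);
    }

    public static void main(String[] args) {
        double eta0 = 0.1;
        for (int t = 1; t <= 4; t++) {
            // t = 4 gives 0.1 / sqrt(4) = 0.05
            System.out.println("t=" + t + " eta=" + adaptedLearningRate(eta0, t));
        }
    }
}
```

Note that the decay depends only on the iteration counter, not on the observed gradients, which is exactly what the adaptive methods below change.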

There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable optimization algorithms that require less hyperparameter tuning. It might be worthwhile to investigate these approaches.
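As a rough sketch of what such a strategy looks like, here are the per-coordinate Adagrad update from [3] and the momentum update from [5] in plain Java (names and signatures are illustrative, not a proposed Flink interface):

```java
public class AdaptiveUpdates {

    // Adagrad [3]: per-coordinate rate eta / sqrt(G_i + eps), where G_i
    // accumulates the squared gradients seen so far for coordinate i.
    // Frequently updated coordinates automatically get smaller steps.
    static void adagradStep(double[] weights, double[] gradient,
                            double[] sumSqGrad, double eta, double eps) {
        for (int i = 0; i < weights.length; i++) {
            sumSqGrad[i] += gradient[i] * gradient[i];
            weights[i] -= eta / Math.sqrt(sumSqGrad[i] + eps) * gradient[i];
        }
    }

    // SGD with momentum [5]: the velocity is an exponential average of past
    // gradients, damping oscillations across iterations.
    static void momentumStep(double[] weights, double[] gradient,
                             double[] velocity, double eta, double mu) {
        for (int i = 0; i < weights.length; i++) {
            velocity[i] = mu * velocity[i] - eta * gradient[i];
            weights[i] += velocity[i];
        }
    }
}
```

Both keep a small amount of per-weight state (accumulated squared gradients, or a velocity vector), which is the main implementation cost compared to the current stateless decay.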

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)