Till Rohrmann created FLINK-2162:
------------------------------------
Summary: Implement adaptive learning rate strategies for SGD
Key: FLINK-2162
URL: https://issues.apache.org/jira/browse/FLINK-2162
Project: Flink
Issue Type: Improvement
Components: Machine Learning Library
Reporter: Till Rohrmann
Priority: Minor
At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate / sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the choice of the {{initialLearningRate}}. If this value is chosen badly, the SGD might become unstable.
There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5] and others [2]. They promise more stable optimization and require less hyperparameter tuning. It might be worthwhile to investigate these approaches.
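To illustrate the difference, here is a minimal, framework-independent sketch (in Python, not FlinkML code) of the current square-root decay next to an Adagrad-style update; the function names and the plain-list representation of weights are illustrative assumptions, not part of any existing API:

```python
import math

def decayed_rate(initial_rate, iteration):
    """Current strategy: divide the initial rate by sqrt of the iteration number."""
    return initial_rate / math.sqrt(iteration)

def adagrad_step(weights, gradient, accum, initial_rate, eps=1e-8):
    """Adagrad-style update: each coordinate is scaled by the square root of
    its accumulated squared gradients, so frequently updated coordinates get
    smaller effective learning rates automatically."""
    new_accum = [a + g * g for a, g in zip(accum, gradient)]
    new_weights = [
        w - (initial_rate / (math.sqrt(a) + eps)) * g
        for w, a, g in zip(weights, new_accum, gradient)
    ]
    return new_weights, new_accum

# Toy example: minimize f(w) = w^2 (gradient 2w) starting from w = 5.0.
w, acc = [5.0], [0.0]
for _ in range(200):
    grad = [2.0 * w[0]]
    w, acc = adagrad_step(w, grad, acc, initial_rate=0.5)
```

The per-coordinate scaling is what makes Adagrad less sensitive to the initial learning rate than the global square-root decay.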
Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)