[jira] [Created] (FLINK-2162) Implement adaptive learning rate strategies for SGD


Shang Yuanchun (Jira)
Till Rohrmann created FLINK-2162:
------------------------------------

             Summary: Implement adaptive learning rate strategies for SGD
                 Key: FLINK-2162
                 URL: https://issues.apache.org/jira/browse/FLINK-2162
             Project: Flink
          Issue Type: Improvement
          Components: Machine Learning Library
            Reporter: Till Rohrmann
            Priority: Minor


At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate / sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD may become unstable.
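For illustration, the current decay scheme can be sketched as follows (a minimal standalone Java sketch; the method name mirrors the formula above and is not the actual Flink API):

```java
public class DecayedLearningRate {

    // Current Flink SGD scheme: eta_t = eta_0 / sqrt(t).
    // The whole schedule is a single curve scaled by eta_0, so a bad
    // choice of eta_0 affects every iteration uniformly.
    static double adaptedLearningRate(double initialLearningRate, int iterationNumber) {
        return initialLearningRate / Math.sqrt(iterationNumber);
    }

    public static void main(String[] args) {
        double eta0 = 0.1;
        for (int t = 1; t <= 4; t++) {
            // t = 4 gives 0.1 / sqrt(4) = 0.05
            System.out.println("t=" + t + " eta=" + adaptedLearningRate(eta0, t));
        }
    }
}
```

Note that the decay depends only on the iteration counter, not on the observed gradients, which is exactly what the adaptive methods below change.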

There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable optimization algorithms that require less hyperparameter tuning. It might be worthwhile to investigate these approaches.
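As a rough sketch of what such a strategy looks like, here are the per-coordinate Adagrad update from [3] and the momentum update from [5] in plain Java (names and signatures are illustrative, not a proposed Flink interface):

```java
public class AdaptiveUpdates {

    // Adagrad [3]: per-coordinate rate eta / sqrt(G_i + eps), where G_i
    // accumulates the squared gradients seen so far for coordinate i.
    // Frequently updated coordinates automatically get smaller steps.
    static void adagradStep(double[] weights, double[] gradient,
                            double[] sumSqGrad, double eta, double eps) {
        for (int i = 0; i < weights.length; i++) {
            sumSqGrad[i] += gradient[i] * gradient[i];
            weights[i] -= eta / Math.sqrt(sumSqGrad[i] + eps) * gradient[i];
        }
    }

    // SGD with momentum [5]: the velocity is an exponential average of past
    // gradients, damping oscillations across iterations.
    static void momentumStep(double[] weights, double[] gradient,
                             double[] velocity, double eta, double mu) {
        for (int i = 0; i < weights.length; i++) {
            velocity[i] = mu * velocity[i] - eta * gradient[i];
            weights[i] += velocity[i];
        }
    }
}
```

Both keep a small amount of per-weight state (accumulated squared gradients, or a velocity vector), which is the main implementation cost compared to the current stateless decay.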

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)