Hi, community.
For streaming applications, auto scaling is critical to guarantee our resource provision, not big or small. i.e. it is cost efficiency requirements from public cloud (AWS) players. I read the document of FLIP-159 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode>, and it is claimed that it aims at the requirements of elastic scale. Unfortunately, it only supports standalone clusters. So I'd like to figure out whether the community has already tried out this feature on Yarn, and which kinds of issues will be the blocker to support this feature from Yarn perspective? I appreciate your insights! Thanks Bing -- |
Hey Bing,
The idea of reactive mode is that it reacts to changing resource availability, where an outside service is adding or removing machines to a Flink cluster. Reactive Mode doesn't work on an active resource manager such as YARN, because you can not tell YARN from the outside to remove resources from a Flink Application. The Flink application has to do this by itself. For reactive mode, we introduced Adaptive Scheduler (FLIP-160) and declarative resource management (FLIP-138). Both of these changes are very important building blocks for adding autoscaling to active resource managers -- I would say we've completed most of the heavy lifting for autoscaling already. The main blocker for autoscaling in my opinion is figuring out a good API for determining the scale of a running job: Do we expose a REST API where users can adjust the parallelism of individual operators? Do we expose an API where some code runs somewhere adjusting the parallelism of operators? Do we provide a set of configuration parameters defining the "min", "target", "max" parallelism, or some scaling based on some metrics (Kafka consumer lag, latency, throughput, backpressure, cpu load, ...). I've personally started playing a bit with a small prototype for autoscaling, but it's not ready to share yet. From the topics the main contributors are currently working on, I'm not aware of anybody working on autoscaling in the Flink 1.14 release cycle, hence I believe it is unlikely to be part of 1.14. Let me know what you think, I'm in particular interested in how you would like to use autoscaling! Best, Robert On Thu, May 20, 2021 at 9:11 AM Bing Jiang <[hidden email]> wrote: > Hi, community. > > For streaming applications, auto scaling is critical to guarantee our > resource provision, not big or small. i.e. it is cost efficiency > requirements from public cloud (AWS) players. > > I read the document of FLIP-159 > < > https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode > >, > and it is claimed that it aims at the requirements of elastic scale. > Unfortunately, it only supports standalone clusters. So I'd like to figure > out whether the community has already tried out this feature on Yarn, and > which kinds of issues will be the blocker to support this feature from Yarn > perspective? > > I appreciate your insights! > > Thanks > Bing > -- > |
Free forum by Nabble | Edit this page |