[DISCUSS] FLIP-56: Dynamic Slot Allocation


[DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
Hi everyone,

We would like to start a discussion thread on "FLIP-56: Dynamic Slot
Allocation" [1]. This was originally part of the discussion thread for
"FLIP-53: Fine Grained Resource Management" [2]. As Till suggested, we
would like to split the original discussion into two topics and start a
separate discussion thread, as well as a separate FLIP process, for this one.

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

tison
We suddenly skipped FLIP-55 lol.


On Mon, Aug 19, 2019 at 10:23 PM Xintong Song <[hidden email]> wrote:

> Hi everyone,
>
> We would like to start a discussion thread on "FLIP-56: Dynamic Slot
> Allocation" [1]. This is originally part of the discussion thread for
> "FLIP-53: Fine Grained Resource Management" [2]. As Till suggested, we
> would like split the original discussion into two topics, and start a
> separate new discussion thread as well as FLIP process for this one.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html
>

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
@Zili

As far as I know, Timo is drafting a FLIP that has taken the number 55.
The FLIP wiki page [1] maintains a running number that shows which number
the next new FLIP should use. Whoever takes that number for a new FLIP
should increase it.

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Tue, Aug 20, 2019 at 3:28 AM Zili Chen <[hidden email]> wrote:

> We suddenly skipped FLIP-55 lol.

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
Added implementation steps for this FLIP on the wiki page [1].


Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation



On Tue, Aug 20, 2019 at 3:43 PM Xintong Song <[hidden email]> wrote:

> @Zili
>
> As far as I know, Timo is drafting a FLIP that has taken the number 55.
> There is a round-up number maintained on the FLIP wiki page [1] shows
> which number should be used for the new FLIP, which should be increased by
> whoever takes the number for a new FLIP.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Till Rohrmann
Thanks for the update Xintong. From a high level perspective the
implementation plan looks good to me.

Cheers,
Till

On Thu, Sep 12, 2019 at 11:04 AM Xintong Song <[hidden email]> wrote:

> Added implementation steps for this FLIP on the wiki page [1].
>
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

wenlong.lwl
Hi Xintong, thanks for the great proposal. Big +1 for the feature! It is
something like the move from MapReduce 1.0 to MapReduce 2.0.

I like the design on the whole. One point may need to be included in the
proposal: how do we deal with slot sharing groups under dynamic slot
allocation? This can be quite different with dynamic slot allocation.

On Fri, 13 Sep 2019 at 16:42, Till Rohrmann <[hidden email]> wrote:

> Thanks for the update Xintong. From a high level perspective the
> implementation plan looks good to me.
>
> Cheers,
> Till

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
Thanks for the comments, Till and Wenlong.

@Wenlong
Regarding slot sharing, the general idea is to request a slot with
resources for tasks of the entire slot sharing group. Details can be found
in FLIP-53 [1], regarding how to decide the slot sharing groups and how to
manage task resources within the shared slots.
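
A minimal sketch of this idea, assuming the requirement of a slot sharing
group is simply the sum of the requirements of its tasks. The classes
Resources and SlotSharingSketch are hypothetical and simplified; they are
not Flink's ResourceProfile or any class from the FLIP.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a resource profile (not Flink's ResourceProfile).
final class Resources {
    final double cpuCores;
    final int memoryMb;

    Resources(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    // Returns a new profile holding the sum of both profiles.
    Resources plus(Resources other) {
        return new Resources(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
    }
}

final class SlotSharingSketch {
    // Sizes one slot request so that it covers every task in the sharing group.
    static Resources slotRequestFor(List<Resources> tasksInGroup) {
        Resources total = new Resources(0.0, 0);
        for (Resources task : tasksInGroup) {
            total = total.plus(task);
        }
        return total;
    }

    public static void main(String[] args) {
        Resources request = slotRequestFor(Arrays.asList(
                new Resources(0.5, 256),   // e.g. a source task
                new Resources(1.0, 512))); // e.g. a map task
        System.out.println(request.cpuCores + " cores, " + request.memoryMb + " MB");
    }
}
```

With dynamic slot allocation, a request of this size could then be cut out
of a TM's free resources instead of being matched to a fixed-size slot.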

Thank you~

Xintong Song



On Mon, Sep 16, 2019 at 10:42 AM wenlong.lwl <[hidden email]>
wrote:

> Hi, Xintong, thanks for the great proposal. big +1 for the feature! It is
> something like mapreduce-1.0 to mapreduce-2.0.
>
> I like the design on the whole. One point may need to be included in the
> proposal:How we deal with slot share group and dynamic slot allocation? It
> can be quite different with dynamic slot allocation.

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Andrey Zagrebin-3
Hi Xintong,

Thanks for sharing the implementation steps. I also think they make sense
with the feature option.

I was wondering if we could order the steps so that each change does not
affect other components too much and we always have a working system.
Then maybe the feature option does not always need to split the code. Here
are some thoughts.

- We could do the default slot profile first and include it in the TM
  registration. I would suggest adding it to
  ResourceManagerGateway#registerTaskExecutor, not sendSlotReport.
  This way the RM knows about it but does not use it at this point (parts
  of steps 4 and 6).

- We could try to do step 3 first in a way that also supports the
  current way of allocation in TaskExecutorGateway#requestSlot with the
  default slot profile, and sends reports both with the available resources
  and with the free default slots which correspond to the available
  resources. We can just remove the free default slots later.
  The new way, TaskExecutorGateway#requestResource, could also be
  implemented here but not used yet.

- Then step 5 can use the new TaskExecutorGateway#requestResource and the
  default slot profile.

- I am not sure whether steps 5 and 7 can be implemented independently
  without regressing what we have. Maybe if we do step 7 first, it will
  have only default slots at first, which will simplify step 5 later.
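
The first point above could be sketched roughly as follows. This is only an
illustration: DefaultSlotProfile and ToyResourceManager are hypothetical
names, and the method shown is a simplified stand-in whose signature does
not match the real ResourceManagerGateway#registerTaskExecutor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified default slot profile (not Flink's ResourceProfile).
final class DefaultSlotProfile {
    final double cpuCores;
    final int memoryMb;

    DefaultSlotProfile(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }
}

// Toy resource manager; it only mirrors the idea, not the real gateway.
final class ToyResourceManager {
    private final Map<String, DefaultSlotProfile> defaultProfiles = new HashMap<>();

    // Registration carries the default slot profile, so no slot report is needed for it.
    boolean registerTaskExecutor(String taskExecutorId, DefaultSlotProfile profile) {
        if (taskExecutorId == null || profile == null) {
            return false;
        }
        defaultProfiles.put(taskExecutorId, profile);
        return true;
    }

    // The profile is recorded here, but no allocation logic consults it yet.
    DefaultSlotProfile defaultProfileOf(String taskExecutorId) {
        return defaultProfiles.get(taskExecutorId);
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.registerTaskExecutor("tm-1", new DefaultSlotProfile(1.0, 1024));
        System.out.println(rm.defaultProfileOf("tm-1").memoryMb + " MB per default slot");
    }
}
```

Because the profile is only stored, a change of this kind would not alter
the existing allocation behaviour on its own.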

Best,
Andrey

On Mon, Sep 16, 2019 at 5:53 AM Xintong Song <[hidden email]> wrote:

> Thanks for the comments, Till and Wenlong.
>
> @Wenlong
> Regarding slot sharing, the general idea is to request a slot with
> resources for tasks of the entire slot sharing group. Details can be found
> in FLIP-53 [1], regarding how to decide the slot sharing groups and how to
> manage task resources within the shared slots.
>
> Thank you~
>
> Xintong Song

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
Thanks for the comments, Andrey.

- I agree that instead of ResourceManagerGateway#sendSlotReport, we should
add the default slot resource profile to
ResourceManagerGateway#registerTaskExecutor.

- If I understand correctly, the reason you suggest doing the default slot
resource profile first, and then doing step 3 in a way that supports both
TaskExecutorGateway#requestSlot and TaskExecutorGateway#requestResource, is
to avoid splitting code paths with the feature option? I think we
can do that, but I also want to point out that this can only reduce the
code split by the feature option (which is good), not eliminate it. We
still need the feature option for the fundamental differences, e.g.
creating new SlotIDs on allocation vs. allocating to free slots with existing
SlotIDs.

- I don't really think we can do steps 5, 6 and 7 independently. Basically
they all make changes to the same component. We can probably do steps
6 and 7 independently of each other, but I think they both depend on step 5.
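
The fundamental difference mentioned in the second point (new SlotIDs
created on allocation vs. allocation to free slots with existing SlotIDs)
can be illustrated with a toy slot table. ToySlotTable and its fields are
hypothetical and assume a single memory dimension and at least one default
slot; this is not how Flink's task executor is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy task executor slot table; all names are hypothetical, not Flink classes.
final class ToySlotTable {
    private final boolean dynamicAllocation; // stands in for the feature option
    private final int defaultSlotMemoryMb;
    private final List<Integer> freeStaticSlotIds = new ArrayList<>();
    private int freeMemoryMb;
    private int nextDynamicSlotId;

    // Assumes numberOfDefaultSlots > 0.
    ToySlotTable(boolean dynamicAllocation, int totalMemoryMb, int numberOfDefaultSlots) {
        this.dynamicAllocation = dynamicAllocation;
        this.freeMemoryMb = totalMemoryMb;
        this.defaultSlotMemoryMb = totalMemoryMb / numberOfDefaultSlots;
        this.nextDynamicSlotId = numberOfDefaultSlots; // keep dynamic ids distinct
        for (int id = 0; id < numberOfDefaultSlots; id++) {
            freeStaticSlotIds.add(id);
        }
    }

    // Returns the id of the allocated slot, or -1 if the request cannot be served.
    int allocate(int requestedMemoryMb) {
        if (dynamicAllocation) {
            // New path: carve out a slot of the requested size and mint a fresh slot id.
            if (requestedMemoryMb > freeMemoryMb) {
                return -1;
            }
            freeMemoryMb -= requestedMemoryMb;
            return nextDynamicSlotId++;
        }
        // Old path: hand out one of the pre-existing, equally sized slots.
        if (freeStaticSlotIds.isEmpty() || requestedMemoryMb > defaultSlotMemoryMb) {
            return -1;
        }
        freeMemoryMb -= defaultSlotMemoryMb;
        return freeStaticSlotIds.remove(0);
    }

    public static void main(String[] args) {
        ToySlotTable dynamicTable = new ToySlotTable(true, 4096, 4);
        System.out.println("dynamic slot id: " + dynamicTable.allocate(2048));
        ToySlotTable staticTable = new ToySlotTable(false, 4096, 4);
        System.out.println("static slot id: " + staticTable.allocate(512));
    }
}
```

Even in this toy version the two paths share little beyond the bookkeeping
of free memory, which is why a switch between them remains necessary.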

In general, I would say it's good to have as little code as possible split
by the feature option, which makes the later clean-up easier. But if that
cannot be done easily, I would rather not put too much effort into
a good abstraction and deduplication between the new code path and the
original one, which we are removing soon.

What do you think?

Thank you~

Xintong Song



On Mon, Sep 16, 2019 at 5:59 PM Andrey Zagrebin <[hidden email]>
wrote:

> Hi Xintong,
>
> Thanks for sharing the implementation steps. I also think they makes sense
> with the feature option.
>
> I was wondering if we could order the steps in a way that each change does
> not affect other components too much, always having a working system
> then maybe the feature option does not always need to split the code. Here
> are some thoughts.
>
> - We could do default slot profile firstly and include it into the TM
> registration. I would suggest to add
> to ResourceManagerGateway#registerTaskExecutor, not sendSlotReport.
>   This way RM knows about it but does not use at this point. (parts of step
> 4,6)
>
> - We could try to do step 3 firstly in a way that it also supports the
> current way of allocation in TaskExecutorGateway#requestSlot with the
> default slot profile
>   and sends reports both with available resources and with free default
> slots which correspond to the available resources. We can just remove free
> default slots later.
>   The new way of TaskExecutorGateway#requestResource could be also
> implemented here but not used yet.
>
> - Then step 5 can use the new TaskExecutorGateway#requestResource and the
> default slot profile
>
> - Not sure, step 5 and 7 can be implemented independently without
> regression of what we have. Maybe if we do step 7 firstly it will have only
> default slots firstly and it will simplify step 5 later.
>
> Best,
> Andrey

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Till Rohrmann
One thing which was briefly mentioned in the FLIP but not in the
implementation plan is the update of the web UI. I think it is worth
adding an extra item for updating the web UI to properly display the
resources a TM still has to offer with dynamic slot allocation. I guess we
need to pull in some JavaScript help in order to implement this step.

Cheers,
Till

On Mon, Sep 16, 2019 at 2:15 PM Xintong Song <[hidden email]> wrote:

> Thanks for the comments, Andrey.
>
> - I agree that instead of ResourceManagerGateway#sendSlotReport, we should
> add the default slot resource profile to
> ResourceManagerGateway#registerTaskExecutor.
>
> - If I understand correctly, the reason you suggest do default slot
> resource profile first and then do step 3 in a way that support both
> TaskExecutorGateway#requestSlot and TaskExecutorGateway#requestResource, is
> to try to avoid splitting code paths with the feature option? I think we
> can do that, but I also want to bring it up that this can only reduce the
> code split by the feature option (which is good) but not eliminate it. We
> still need the feature option for the fundamental differences, e.g.
> creating new SlotIDs on allocation vs. allocate to free slots with existing
> SlotIDs.
>
> - I don't really think we can do step 5, 6 and 7 independently. Basically
> they are all making changes to the same component. We probably can do step
> 6 and 7 independently, but I think they both depends on step 5.
>
> In general, I would say it's good to have as less as possible codes split
> by the feature option, which makes the later clean-up easier. But if it
> cannot be easily done, I would rather not to put too much efforts on having
> a good abstraction and deduplication between the new code path and the
> original one that we are removing soon.
>
> What do you think?
>
> Thank you~
>
> Xintong Song

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Andrey Zagrebin-3
@Xintong

Thanks for the feedback.

Just to clarify step 6:
If the first point is done before step 5 (e.g. as part of step 4), then it
just means keeping the info about the default slot in the RM's data
structure associated with the TM, with no real change in behaviour.
When this info is available, I think it can be used straightforwardly
during step 5, where we get either a concrete slot requirement
or an unknown one (step 6, point 2), which simply grabs one of the
concrete default ones (btw, it is not clear which one; it seems to be just
a random one?)
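
To make the step 6 idea a bit more concrete, here is a minimal sketch of how
the RM could keep the default slot profile per TM and fall back to it for an
unknown requirement. All names here (Profile, DefaultSlotRegistry, the method
names) are hypothetical and only for illustration; this is not Flink's actual
ResourceProfile or SlotManager API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a resource profile; not Flink's ResourceProfile. */
final class Profile {
    /** Marker for a slot request that does not specify its resources. */
    static final Profile UNKNOWN = new Profile(-1.0, -1);

    final double cpuCores;
    final int memoryMb;

    Profile(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }
}

/** Hypothetical RM-side bookkeeping of the default slot profile per TM. */
final class DefaultSlotRegistry {
    private final Map<String, Profile> defaultProfileByTm = new HashMap<>();

    /** Called at TM registration: the profile is only stored, not used yet. */
    void onTaskExecutorRegistered(String tmId, Profile defaultSlotProfile) {
        defaultProfileByTm.put(tmId, defaultSlotProfile);
    }

    /**
     * A concrete requirement is kept as-is; an UNKNOWN one is replaced by the
     * default slot profile of the given TM (empty if the TM is not registered).
     */
    Optional<Profile> resolve(String tmId, Profile requested) {
        if (requested != Profile.UNKNOWN) {
            return Optional.of(requested);
        }
        return Optional.ofNullable(defaultProfileByTm.get(tmId));
    }
}
```

Storing the profile at registration changes nothing by itself; only the
resolve call, which would belong to step 5, actually consumes it.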

For steps 5 and 7, true, it is not quite clear whether we can avoid some
split, e.g. after step 5 and before doing step 7.
I agree that we should introduce the feature flag if we clearly see that it
would be a bigger effort without the flag.

Best,
Andrey


Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
@Till
Thanks for the reminder. I'll add a step for updating the web UI. I'll try
to involve Lining to help us with this step.

@Andrey
I was thinking that after we define the RM-TM interfaces in step 2, it
would be good to work on the RM and TM sides concurrently. But yes, if we
finish step 4 early, it would make step 6 easier. We could also start adding
some IT/E2E tests once the default slot resource profiles are available.

Thank you~

Xintong Song




Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
The implementation plan [1] is updated, with the following changes:

   - Add the default slot resource profile to
   ResourceManagerGateway#registerTaskExecutor rather than #sendSlotReport.
   - Swap the order of 'TaskExecutor derive and register with default slot
   resource profile' and 'Extend TaskExecutor to support dynamic slot
   allocation'.
   - Add a step for updating the REST API / Web UI.
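
For readers less familiar with the 'dynamic slot allocation' step, a rough
sketch of the TM-side idea follows: slots are carved out of the remaining
resources on request and get a fresh index, instead of being picked from a
fixed set of pre-defined slots. The class DynamicSlotPool and its methods are
hypothetical, and tracking only memory is a simplification; none of this is
the actual TaskExecutor code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical TM-side pool that carves slots out of its free memory. */
final class DynamicSlotPool {
    private final Map<Integer, Integer> memoryMbBySlot = new HashMap<>();
    private int freeMemoryMb;
    private int nextSlotIndex = 0;

    DynamicSlotPool(int totalMemoryMb) {
        this.freeMemoryMb = totalMemoryMb;
    }

    /** Allocates a slot with a fresh index, or returns -1 if it does not fit. */
    int allocate(int requestedMemoryMb) {
        if (requestedMemoryMb <= 0 || requestedMemoryMb > freeMemoryMb) {
            return -1;
        }
        freeMemoryMb -= requestedMemoryMb;
        int slotIndex = nextSlotIndex++;
        memoryMbBySlot.put(slotIndex, requestedMemoryMb);
        return slotIndex;
    }

    /** Frees a slot and returns its memory to the pool; unknown indices are ignored. */
    void free(int slotIndex) {
        Integer released = memoryMbBySlot.remove(slotIndex);
        if (released != null) {
            freeMemoryMb += released;
        }
    }

    /** Remaining memory the TM could still offer, e.g. for reporting to the RM. */
    int freeMemoryMb() {
        return freeMemoryMb;
    }
}
```

The remaining free memory is the kind of value the REST API / Web UI step
would need to expose, since the number of free slots is no longer fixed.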

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation


Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Andrey Zagrebin-4
Thanks for the update @Xintong.
I would be ok with starting the vote.

Best,
Andrey

On Tue, Sep 17, 2019 at 6:12 AM Xintong Song <[hidden email]> wrote:

> The implementation plan [1] is updated, with the following changes:
>
>    - Add default slot resource profile to
>    ResourceManagerGateway#registerTaskExecutor rather than #sendSlotReport.
>    - Swap 'TaskExecutor derive and register with default slot resource
>    profile' and 'Extend TaskExecutor to support dynamic slot allocation'
>    - Add step for updating RestAPI / Web UI
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
Thanks for the feedback, Andrey.

I'll start the vote.

Thank you~

Xintong Song



On Tue, Sep 17, 2019 at 10:09 PM Andrey Zagrebin <[hidden email]>
wrote:

> Thanks for the update @Xintong.
> I would be ok with starting the vote.
>
> Best,
> Andrey
>

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

tao xiao
Sorry if this question has been addressed before; if so, please point me
to the reference.

How do we limit the CPU usage of a slot? Does the thread that executes the
slot get paused when it uses more CPU cycles than it requested?

On Tue, Sep 17, 2019 at 10:23 PM Xintong Song <[hidden email]> wrote:

> Thanks for the feedback, Andrey.
>
> I'll start the vote.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Sep 17, 2019 at 10:09 PM Andrey Zagrebin <[hidden email]>
> wrote:
>
> > Thanks for the update @Xintong.
> > I would be ok with starting the vote.
> >
> > Best,
> > Andrey
> >
> > On Tue, Sep 17, 2019 at 6:12 AM Xintong Song <[hidden email]>
> > wrote:
> >
> > > The implementation plan [1] is updated, with the following changes:
> > >
> > >    - Add default slot resource profile to
> > >    ResourceManagerGateway#registerTaskExecutor rather than
> > #sendSlotReport.
> > >    - Swap 'TaskExecutor derive and register with default slot resource
> > >    profile' and 'Extend TaskExecutor to support dynamic slot
> allocation'
> > >    - Add step for updating RestAPI / Web UI
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > >
> > > On Tue, Sep 17, 2019 at 11:49 AM Xintong Song <[hidden email]>
> > > wrote:
> > >
> > > > @Till
> > > > Thanks for the reminding. I'll add a step for updating the web ui.
> I'll
> > > > try to involve Lining to help us with this step.
> > > >
> > > > @Andrey
> > > > I was thinking that after we define the RM-TM interfaces in step 2,
> it
> > > > would be good to concurrently work on both RM and TM side. But yes,
> if
> > we
> > > > finish Step 4 early, then it would make step 6 easier. We can start
> to
> > > have
> > > > some IT/E2E tests, with the default slot resource profiles being
> > > available.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Mon, Sep 16, 2019 at 9:50 PM Andrey Zagrebin <
> [hidden email]>
> > > > wrote:
> > > >
> > > >> @Xintong
> > > >>
> > > >> Thanks for the feedback.
> > > >>
> > > >> Just to clarify step 6:
> > > >> If the first point is done before step 5 (e.g. as part of 4) then it
> > is
> > > >> just keeping the info about the default slot in RM's data structure
> > > >> associated the TM and no real change in the behaviour.
> > > >> When this info is available, I think it can be straightforwardly
> used
> > > >> during step 5 where we get either concrete slot requirement
> > > >> or the unknown one (step 6, point 2) which simply grabs some of the
> > > >> concrete default ones (btw not clear which one, seems just some
> > random?)
> > > >>
> > > >> For steps 5,7, true, it is not quite clear whether we can avoid some
> > > >> split,
> > > >> e.g. after step 5 before doing step 7.
> > > >> I agree that we should introduce the feature flag if we clearly see
> > that
> > > >> it
> > > >> would be a bigger effort without the flag.
> > > >>
> > > >> Best,
> > > >> Andrey
> > > >>
> > > >> On Mon, Sep 16, 2019 at 3:21 PM Till Rohrmann <[hidden email]
> >
> > > >> wrote:
> > > >>
> > > >> > One thing which was briefly mentioned in the Flip but not in the
> > > >> > implementation plan is the update of the web UI. I think it is
> worth
> > > >> > putting an extra item for updating the web UI to properly display
> > the
> > > >> > resources a TM has still to offer with dynamic slot allocation. I
> > > guess
> > > >> we
> > > >> > need to pull in some JavaScript help in order to implement this
> > step.
> > > >> >
> > > >> > Cheers,
> > > >> > Till
> > > >> >
> > > >> > On Mon, Sep 16, 2019 at 2:15 PM Xintong Song <
> [hidden email]
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Thanks for the comments, Andrey.
> > > >> > >
> > > >> > > - I agree that instead of ResourceManagerGateway#sendSlotReport,
> > we
> > > >> > should
> > > >> > > add the default slot resource profile to
> > > >> > > ResourceManagerGateway#registerTaskExecutor.
> > > >> > >
> > > >> > > - If I understand correctly, the reason you suggest do default
> > slot
> > > >> > > resource profile first and then do step 3 in a way that support
> > both
> > > >> > > TaskExecutorGateway#requestSlot and
> > > >> TaskExecutorGateway#requestResource,
> > > >> > is
> > > >> > > to try to avoid splitting code paths with the feature option? I
> > > think
> > > >> we
> > > >> > > can do that, but I also want to bring it up that this can only
> > > reduce
> > > >> the
> > > >> > > code split by the feature option (which is good) but not
> eliminate
> > > >> it. We
> > > >> > > still need the feature option for the fundamental differences,
> > e.g.
> > > >> > > creating new SlotIDs on allocation vs. allocate to free slots
> with
> > > >> > existing
> > > >> > > SlotIDs.
> > > >> > >
> > > >> > > - I don't really think we can do step 5, 6 and 7 independently.
> > > >> Basically
> > > >> > > they are all making changes to the same component. We probably
> can
> > > do
> > > >> > step
> > > >> > > 6 and 7 independently, but I think they both depends on step 5.
> > > >> > >
> > > >> > > In general, I would say it's good to have as less as possible
> > codes
> > > >> split
> > > >> > > by the feature option, which makes the later clean-up easier.
> But
> > if
> > > >> it
> > > >> > > cannot be easily done, I would rather not to put too much
> efforts
> > on
> > > >> > having
> > > >> > > a good abstraction and deduplication between the new code path
> and
> > > the
> > > >> > > original one that we are removing soon.
> > > >> > >
> > > >> > > What do you think?
> > > >> > >
> > > >> > > Thank you~
> > > >> > >
> > > >> > > Xintong Song
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Mon, Sep 16, 2019 at 5:59 PM Andrey Zagrebin <
> > > [hidden email]
> > > >> >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi Xintong,
> > > >> > > >
> > > >> > > > Thanks for sharing the implementation steps. I also think they
> > > makes
> > > >> > > sense
> > > >> > > > with the feature option.
> > > >> > > >
> > > >> > > > I was wondering if we could order the steps in a way that each
> > > >> change
> > > >> > > does
> > > >> > > > not affect other components too much, always having a working
> > > system
> > > >> > > > then maybe the feature option does not always need to split
> the
> > > >> code.
> > > >> > > Here
> > > >> > > > are some thoughts.
> > > >> > > >
> > > >> > > > - We could do default slot profile firstly and include it into
> > the
> > > >> TM
> > > >> > > > registration. I would suggest to add
> > > >> > > > to ResourceManagerGateway#registerTaskExecutor, not
> > > sendSlotReport.
> > > >> > > >   This way RM knows about it but does not use at this point.
> > > (parts
> > > >> of
> > > >> > > step
> > > >> > > > 4,6)
> > > >> > > >
> > > >> > > > - We could try to do step 3 firstly in a way that it also
> > supports
> > > >> the
> > > >> > > > current way of allocation in TaskExecutorGateway#requestSlot
> > with
> > > >> the
> > > >> > > > default slot profile
> > > >> > > >   and sends reports both with available resources and with
> free
> > > >> default
> > > >> > > > slots which correspond to the available resources. We can just
> > > >> remove
> > > >> > > free
> > > >> > > > default slots later.
> > > >> > > >   The new way of TaskExecutorGateway#requestResource could be
> > also
> > > >> > > > implemented here but not used yet.
> > > >> > > >
> > > >> > > > - Then step 5 can use the new
> > TaskExecutorGateway#requestResource
> > > >> and
> > > >> > the
> > > >> > > > default slot profile
> > > >> > > >
> > > >> > > > - Not sure, step 5 and 7 can be implemented independently
> > without
> > > >> > > > regression of what we have. Maybe if we do step 7 firstly it
> > will
> > > >> have
> > > >> > > only
> > > >> > > > default slots firstly and it will simplify step 5 later.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Andrey
> > > >> > > >


--
Regards,
Tao

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
@tao

I think we cannot limit the CPU usage of a slot, nor isolate CPU usage
between slots. We do have CPU limits for the task executor as a whole in
some scenarios, such as on YARN with strict cgroup mode.

The purpose of bookkeeping and dynamically allocating CPU cores is to avoid
scheduling tasks with more computation load than the task executor can
handle, rather than to limit the CPU usage of each slot.
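
To illustrate the point, here is a minimal sketch of what such bookkeeping
could look like. It is not the FLIP's actual code: the class CpuBookkeeper and
its methods are hypothetical names made up for this example. It only tracks
how many cores have been promised to slots and rejects requests that would
over-commit the task executor; nothing in it throttles or pauses a running
task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical illustration only (not a Flink class): bookkeeping of CPU
 * cores on a single task executor. It enforces nothing at runtime.
 */
final class CpuBookkeeper {
    private final double totalCpuCores;
    private double allocatedCpuCores = 0.0;
    private final Map<Integer, Double> cpuBySlot = new HashMap<>();
    private int nextSlotId = 0;

    CpuBookkeeper(double totalCpuCores) {
        if (totalCpuCores <= 0) {
            throw new IllegalArgumentException("totalCpuCores must be positive");
        }
        this.totalCpuCores = totalCpuCores;
    }

    /** Creates a new slot if the requested cores still fit, else returns empty. */
    Optional<Integer> allocateSlot(double cpuCores) {
        if (cpuCores <= 0 || allocatedCpuCores + cpuCores > totalCpuCores) {
            return Optional.empty(); // would over-commit the task executor
        }
        int slotId = nextSlotId++;
        cpuBySlot.put(slotId, cpuCores);
        allocatedCpuCores += cpuCores;
        return Optional.of(slotId);
    }

    /** Returns the slot's cores to the pool; false if the slot is unknown. */
    boolean freeSlot(int slotId) {
        Double cpuCores = cpuBySlot.remove(slotId);
        if (cpuCores == null) {
            return false;
        }
        allocatedCpuCores -= cpuCores;
        return true;
    }

    double availableCpuCores() {
        return totalCpuCores - allocatedCpuCores;
    }

    public static void main(String[] args) {
        CpuBookkeeper taskExecutor = new CpuBookkeeper(4.0);
        System.out.println(taskExecutor.allocateSlot(2.5).isPresent());
        System.out.println(taskExecutor.allocateSlot(1.0).isPresent());
        // This request does not fit into the remaining cores and is rejected.
        System.out.println(taskExecutor.allocateSlot(1.0).isPresent());
        System.out.println(taskExecutor.availableCpuCores());
    }
}
```

In this sketch a slot that was granted 1.0 core can still use more CPU than
that at runtime; the bookkeeping only influences where tasks get scheduled.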

Thank you~

Xintong Song



On Wed, Sep 18, 2019 at 12:18 AM tao xiao <[hidden email]> wrote:

> Sorry if I ask a question that has been addressed before. please point me
> to the reference.
>
> How do we limit the cpu usage to a slot?  Does the thread that executes the
> slot get paused when it uses CPU cycles more than it requests?
>
> --
> Regards,
> Tao
>

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

tao xiao
That makes sense. I suggest we add a note to the FLIP to avoid confusion.

On Wed, Sep 18, 2019 at 9:51 AM Xintong Song <[hidden email]> wrote:

> @tao
>
> I think we cannot limit the cpu usage of a slot, nor isolate the usages
> between slots. We do have cpu limits for the task executor in some
> scenarios, such as on yarn with strict cgroup mode.
>
> The purpose of bookkeep and dynamic allocation of cpu cores is to prevent
> scheduling tasks with too many computation loads to the task executor,
> rather than limit the cpu usage of each slot.
>
> Thank you~
>
> Xintong Song
>
>
>
> > > > > >> > > > > > > > On Tue, Aug 20, 2019 at 3:43 PM Xintong Song <
> > > > > >> > > > [hidden email]>
> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > @Zili
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > As far as I know, Timo is drafting a FLIP that
> has
> > > > taken
> > > > > >> the
> > > > > >> > > > number
> > > > > >> > > > > > 55.
> > > > > >> > > > > > > > > There is a round-up number maintained on the
> FLIP
> > > wiki
> > > > > >> page
> > > > > >> > [1]
> > > > > >> > > > > shows
> > > > > >> > > > > > > > > which number should be used for the new FLIP,
> > which
> > > > > >> should be
> > > > > >> > > > > > increased
> > > > > >> > > > > > > > by
> > > > > >> > > > > > > > > whoever takes the number for a new FLIP.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thank you~
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Xintong Song
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > [1]
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Tue, Aug 20, 2019 at 3:28 AM Zili Chen <
> > > > > >> > > [hidden email]>
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >> We suddenly skipped FLIP-55 lol.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> Xintong Song <[hidden email]>
> > wrote on Mon, Aug 19, 2019
> > > > > >> > at 10:23 PM:
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> > Hi everyone,
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > We would like to start a discussion thread on
> > > > > "FLIP-56:
> > > > > >> > > > Dynamic
> > > > > >> > > > > > Slot
> > > > > >> > > > > > > > >> > Allocation" [1]. This is originally part of
> the
> > > > > >> discussion
> > > > > >> > > > > thread
> > > > > >> > > > > > > for
> > > > > >> > > > > > > > >> > "FLIP-53: Fine Grained Resource Management"
> > [2].
> > > As
> > > > > >> Till
> > > > > >> > > > > > suggested,
> > > > > >> > > > > > > we
> > > > > >> > > > > > > > >> > would like split the original discussion into
> > two
> > > > > >> topics,
> > > > > >> > > and
> > > > > >> > > > > > start
> > > > > >> > > > > > > a
> > > > > >> > > > > > > > >> > separate new discussion thread as well as
> FLIP
> > > > > process
> > > > > >> for
> > > > > >> > > > this
> > > > > >> > > > > > one.
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Thank you~
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Xintong Song
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > [1]
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > [2]
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > Tao
> >
>


--
Regards,
Tao

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

shaoxun
In reply to this post by Xintong Song
Hi Xintong, it is a huge plan to carry out, and I have a few questions about
the details.

First, does "specific request" for the slots mean that the requested slot
profile contains detailed information about memory and CPU? And how does the
job manager determine how much memory to ask for? Is that done when
it schedules the execution graph? Or maybe I am missing something here.

Second, will dynamic allocation create fragments? For example, a
task executor may have 100 MB of memory left while all other tasks ask for a
larger memory size.



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song
Hi Shaoxun,

You're right that supporting end-to-end fine-grained resource management
is a huge plan, and FLIP-56 is only one step towards it.

Regarding your questions:

> First, does "specific request" for the slots mean the requesting slot
> profile contains detailed information about memory and cpu? Is it done when
> it scheduled the execution graph?


Yes, that means slot requests contain detailed information about how much
CPU and memory is needed.

> And how does a job manager determine to ask how much memory?
>

The job graph should contain how many resources each vertex/task needs, and
the JobMaster knows how many resources to request for each slot by adding up
the resources of the tasks it plans to deploy in the slot.
Regarding how to initially set the resources in the job graph, there could
be various ways.

   - We can expose an interface that lets the user decide how many resources
   each operator needs, as is currently possible in the DataStream API. But we
   probably want to change that later for better usability.
   - The compiler can set it automatically, according to the operator type
   and some configured default values for each type.

Anyway, fine-grained resource management is an advanced feature,
targeting expert users who know well how many resources their jobs/tasks
need. There are also various efforts to make task-level fine-grained
resource configuration automatic, which are not in the scope of
this FLIP.
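
To make the "adding up" step concrete, here is a minimal sketch of how a slot
request could be derived from the tasks planned for that slot. The class and
method names (TaskResources, SlotRequestSketch, slotRequestFor) are made up
for this illustration and are not Flink's actual classes; it also assumes that
only CPU cores and a single memory figure are tracked.

```java
// Hypothetical illustration only: these are not Flink's real resource classes.
final class TaskResources {
    final double cpuCores;
    final int memoryMb;

    TaskResources(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    // Combine the requirements of two tasks that will share one slot.
    TaskResources plus(TaskResources other) {
        return new TaskResources(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
    }
}

final class SlotRequestSketch {
    // The slot request asks for the sum of the resources of all tasks
    // that are planned to be deployed into that slot.
    static TaskResources slotRequestFor(TaskResources... tasksInSlot) {
        TaskResources total = new TaskResources(0.0, 0);
        for (TaskResources task : tasksInSlot) {
            total = total.plus(task);
        }
        return total;
    }

    public static void main(String[] args) {
        TaskResources source = new TaskResources(0.5, 128);
        TaskResources map = new TaskResources(1.0, 256);
        TaskResources sink = new TaskResources(0.5, 64);
        TaskResources request = slotRequestFor(source, map, sink);
        System.out.println(request.cpuCores + " cores, " + request.memoryMb + " MB");
    }
}
```

The per-task numbers would come from the job graph, however they were set
there (by the user or by configured defaults).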

> Second, will the dynamic allocation create the fragments?


Yes, it will. You can also look at FLINK-14106, where we try to make the
slot allocation strategy pluggable, so we can have different strategies for
different use cases. E.g., we can have a strategy that starts TMs only when
slot requests are received, with the exact resources requested by the
slots. That avoids fragments, at the cost of longer scheduling time due to
starting TMs late, which should be suitable for long-running streaming
jobs. We can also have another strategy that starts a configured number of
TMs before receiving any slot request, with predefined resources. The
benefit is that jobs get scheduled immediately, and the cost is potential
fragments, which I believe is more suitable for short batch queries.
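
As a rough illustration of the trade-off between the two strategies, the
sketch below counts the memory left unused in each case. Everything in it
(TaskManagerSketch, AllocationStrategySketch, the method names, and the
memory-only model) is a hypothetical simplification and not the interface
discussed in FLINK-14106.

```java
// Hypothetical illustration only: not Flink's SlotManager or any real strategy API.
final class TaskManagerSketch {
    private int freeMemoryMb;

    TaskManagerSketch(int totalMemoryMb) {
        this.freeMemoryMb = totalMemoryMb;
    }

    // Carve a slot of the requested size out of the free memory, if it fits.
    boolean tryAllocate(int requestedMb) {
        if (requestedMb > freeMemoryMb) {
            return false;
        }
        freeMemoryMb -= requestedMb;
        return true;
    }

    int freeMemoryMb() {
        return freeMemoryMb;
    }
}

final class AllocationStrategySketch {
    // Strategy 1: start one TM per slot request, sized exactly to the request.
    // Returns the total memory left unused across those TMs.
    static int unusedWithOnDemandTms(int... requestsMb) {
        int unused = 0;
        for (int request : requestsMb) {
            TaskManagerSketch tm = new TaskManagerSketch(request);
            tm.tryAllocate(request);
            unused += tm.freeMemoryMb();
        }
        return unused;
    }

    // Strategy 2: a TM with predefined resources is started before any request
    // arrives. Requests that do not fit are skipped here (they would need
    // another TM). Returns the memory left over in the pre-started TM.
    static int unusedWithPrestartedTm(int tmMemoryMb, int... requestsMb) {
        TaskManagerSketch tm = new TaskManagerSketch(tmMemoryMb);
        for (int request : requestsMb) {
            tm.tryAllocate(request);
        }
        return tm.freeMemoryMb();
    }

    public static void main(String[] args) {
        System.out.println(unusedWithOnDemandTms(300, 300, 300, 300));
        System.out.println(unusedWithPrestartedTm(1000, 300, 300, 300, 300));
    }
}
```

With four 300 MB requests, the on-demand variant leaves nothing unused, while
a pre-started 1000 MB TM fits three of them and keeps a 100 MB fragment that
the fourth request cannot use, which is the situation from the question.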

Thank you~

Xintong Song



On Tue, Mar 3, 2020 at 3:00 PM shaoxun <[hidden email]> wrote:

> Hi Xintong, it it a huge plan to carry on. And I get a few questions about
> the details.
>
> First, does "specific request" for the slots mean the requesting slot
> profile contains detailed information about memory and cpu? And how does a
> job manager determine to ask how much memory? Is it done when
>  it scheduled the execution graph? Or maybe I miss something here.
>
> Second, will the dynamic allocation create the fragments? For example, if a
> task executor has 100mb memory left and maybe other tasks all ask for a
> larger memory size.
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>