[DISCUSS] FLIP-108: Add GPU support in Flink

[DISCUSS] FLIP-108: Add GPU support in Flink

Yangze Guo
Hi everyone,

We would like to start a discussion thread on "FLIP-108: Add GPU
support in Flink"[1].

This FLIP mainly discusses the following issues:

- Enable users to configure the number of GPUs per task executor and
forward such requirements to the external resource managers (for
Kubernetes/Yarn/Mesos setups).
- Provide information about the available GPU resources to operators.

Key changes proposed in the FLIP are as follows:

- Forward GPU resource requirements to Yarn/Kubernetes.
- Introduce GPUManager as one of the task manager services, to discover
and expose GPU resource information to the context of functions.
- Introduce a default script for GPU discovery, which provides a
privilege mode to help users achieve worker-level isolation in
standalone mode.
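
For illustration, a discovery script of this kind would typically enumerate the GPU indices visible on the worker (e.g. via `nvidia-smi --query-gpu=index --format=csv,noheader`), and the task executor would parse its output. A minimal sketch of that parsing step, assuming one index per line; the function name is hypothetical, not the proposed API:

```python
def parse_gpu_indices(script_output: str) -> list[int]:
    """Parse GPU indices printed one per line, e.g. the output of
    `nvidia-smi --query-gpu=index --format=csv,noheader`."""
    return [int(line) for line in script_output.splitlines() if line.strip()]

# Two GPUs discovered on a hypothetical worker:
print(parse_gpu_indices("0\n1\n"))  # [0, 1]
```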

Please find more details in the FLIP wiki document [1]. Looking forward to
your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink

Best,
Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Xintong Song
Thanks for drafting the FLIP and kicking off the discussion, Yangze.

Big +1 for this feature. Supporting the use of GPUs in Flink is significant,
especially for ML scenarios.
I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
very good first step for Flink's GPU support.

Thank you~

Xintong Song



On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <[hidden email]> wrote:


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Becket Qin
Thanks for the FLIP, Yangze. GPU resource management support is a must-have
for machine learning use cases. In fact, it is one of the most frequently
asked questions from users who are interested in using Flink for ML.

Some quick comments / questions on the wiki.
1. The WebUI / REST API should probably also be mentioned in the public
interface section.
2. Is the data structure that holds GPU info also a public API?

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <[hidden email]> wrote:


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Stephan Ewen
Thank you for writing this FLIP.

I cannot really give much input into the mechanics of GPU-aware scheduling
and GPU allocation, as I have no experience with that.

One thought I had when reading the proposal is whether it makes sense to
look at the "GPU Manager" as an "External Resource Manager", with GPU being
one such resource.
The way I understand ResourceProfile and ResourceSpec, that is how it
is done there.
This has the advantage of looking more extensible: maybe there is a GPU
Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba
TPU Resource, etc.
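
A rough sketch of that extensibility idea, written in Python purely for illustration (these are hypothetical classes, not actual Flink types):

```python
class ExternalResourceInfo:
    """Base class for information about one unit of an external resource."""
    def __init__(self, resource_name: str):
        self.resource_name = resource_name

class GPUInfo(ExternalResourceInfo):
    """A specialized resource: one GPU, identified by its device index."""
    def __init__(self, index: int, vendor: str = "nvidia"):
        super().__init__("gpu")
        self.index = index
        self.vendor = vendor

class FPGAInfo(ExternalResourceInfo):
    """Another hypothetical specialization, for FPGA devices."""
    def __init__(self, slot: int):
        super().__init__("fpga")
        self.slot = slot

gpu = GPUInfo(0)
print(gpu.resource_name, gpu.index)  # gpu 0
```

New resource types would then slot in as further subclasses, without changing the code that consumes the base type.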

Best,
Stephan


On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <[hidden email]> wrote:


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Becket Qin
That's a good point, Stephan. It makes total sense to generalize the
resource management to support custom resources. Having that allows users
to add new resources by themselves. The general resource management may
involve two different aspects:

1. The custom resource type definition. It is supported by the extended
resources in ResourceProfile and ResourceSpec. This will likely cover the
majority of cases.

2. The custom resource allocation logic, i.e. how to assign the resources
to different tasks, operators, and so on. This may require two levels /
steps:
    a. Subtask level - make sure the subtasks are put into suitable slots.
This is done by the global RM and is not customizable right now.
    b. Operator level - map the exact resources to the operators in the TM,
e.g. GPU 1 for operator A, GPU 2 for operator B. This step is needed,
assuming the global RM does not distinguish individual resources of the same
type. That is fine for memory, but not for GPUs.

The GPU manager is designed to do 2.b here. So it should discover the
physical GPU information and bind/match the GPUs to each operator. Making
this general would fill in the missing piece of supporting custom resource
types. But I'd avoid calling it an "External Resource Manager", to avoid
confusion with the RM; maybe something like "Operator Resource Assigner"
would be more accurate. So for each resource type, users could have an
optional "Operator Resource Assigner" in the TM. For memory, users don't
need this, but for other extended resources they might.
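
To make step 2.b concrete, the core of such an assigner is just a mapping from discovered resource units to operators. A minimal sketch, assuming a round-robin policy; all names here are hypothetical, not the proposed Flink API:

```python
from itertools import cycle

def assign_gpus(gpu_indices, operators):
    """Round-robin mapping of discovered GPU indices to operators.

    Operators without an available GPU get None."""
    if not gpu_indices:
        return {op: None for op in operators}
    pool = cycle(gpu_indices)
    return {op: next(pool) for op in operators}

# Two discovered GPUs shared among three operators on one TM:
print(assign_gpus([0, 1], ["A", "B", "C"]))  # {'A': 0, 'B': 1, 'C': 0}
```

A pluggable assigner would let users swap in other policies (exclusive assignment, vendor-aware placement, etc.) per resource type.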

Personally, I think a pluggable "Operator Resource Assigner" is achievable
in this FLIP. But I am also OK with having it in a separate FLIP, because
the interface between the "Operator Resource Assigner" and the operator may
take a while to settle down if we want to make it generic. Either way, our
implementation should take this future work into consideration, so that we
don't need to break backwards compatibility once we have it.

Thanks,

Jiangjie (Becket) Qin

On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <[hidden email]> wrote:


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Xintong Song
@Stephan, @Becket,

Actually, Yangze, Yang and I also had an offline discussion about turning
the "GPU Support" into some general "Extended Resource Support". We believe
supporting extended resources through a general mechanism is definitely a
good and extensible way to go. The reason we propose narrowing this FLIP's
scope down to GPU alone is mainly the concern about the extra effort and
review capacity needed for a general mechanism.

To come up with a good design for a general extended resource management
mechanism, we would need to investigate more how people use different
kinds of resources in practice. For GPU, we learnt such knowledge from the
experts, Becket and his team members. But for FPGA, or other potential
extended resources, we don't have such convenient information sources,
making the investigation require more effort, which I tend to think is
not necessary atm.

On the other hand, we also looked into how Spark supports general "Custom
Resource Scheduling". Assuming we want a similar general extended resource
mechanism in the future, we believe that the current GPU support design can
be easily extended in an incremental way, without too much rework.

   - The most important part is probably the user interfaces. Spark offers
   configuration options to define the amount, discovery script and vendor (on
   k8s) on a per-resource-type basis [1], which is very similar to what we
   propose in this FLIP. I think it's not necessary to expose the config
   options in the general way atm, since we do not support other resource
   types yet. If we later decide to have per-resource-type config options, we
   can keep backwards compatibility with the currently proposed options via
   simple key mapping.
   - For the GPU Manager, if later needed we can change it into an "Extended
   Resource Manager" (or whatever it gets called). That should be a pure
   component-internal refactoring.
   - For ResourceProfile and ResourceSpec, there are already fields for
   general extended resources. We can of course leverage them when supporting
   fine-grained GPU scheduling. That is also not in the scope of this
   first-step proposal, and would require FLIP-56 to be finished first.
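
The "simple key mapping" mentioned in the first point could look roughly like this sketch (the legacy keys shown are hypothetical; only the migration idea is the point):

```python
# Map hypothetical legacy GPU-specific config keys onto a generalized
# per-resource-type schema; unrecognized keys pass through unchanged.
LEGACY_KEY_MAPPING = {
    "taskmanager.gpu.amount": "taskmanager.resource.gpu.amount",
    "taskmanager.gpu.discovery-script.path":
        "taskmanager.resource.gpu.discovery-script.path",
}

def migrate_config(conf: dict) -> dict:
    """Rewrite deprecated keys to their generalized equivalents."""
    return {LEGACY_KEY_MAPPING.get(k, k): v for k, v in conf.items()}

print(migrate_config({"taskmanager.gpu.amount": "2"}))
# {'taskmanager.resource.gpu.amount': '2'}
```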

To sum up, I agree with Becket that we should have a separate FLIP for the
general extended resource mechanism, and keep it in mind when discussing
and implementing the current one.

Thank you~

Xintong Song


[1]
https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview

On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <[hidden email]> wrote:


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Xingbo Huang
Thanks a lot for the FLIP, Yangze.

There is no doubt that GPU resource management support will greatly
facilitate the development of AI-related applications by PyFlink users.

I have only one comment about this wiki:

Regarding the names of the several GPU configuration options, I think it is
better to delete the "resource" field, to make them consistent with the
names of other resource-related configuration options in TaskManagerOptions.

e.g. taskmanager.resource.gpu.discovery-script.path ->
taskmanager.gpu.discovery-script.path

Best,

Xingbo


Xintong Song <[hidden email]> wrote on Wed, Mar 4, 2020 at 10:39 AM:


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Yangze Guo
Thanks for all the feedback.

@Becket
Regarding the WebUI and GPUInfo, you're right. I'll add them to the
Public API section.


@Stephan @Becket
Regarding the general extended resource mechanism, I second Xintong's
suggestion.
- It's better to leverage ResourceProfile and ResourceSpec once we
support fine-grained GPU scheduling. As a first-step proposal, I
prefer not to include that in the scope of this FLIP.
- Regarding the "Extended Resource Manager": if I understand
correctly, it is just a code refactoring atm; we could extract the
open/close/allocateExtendResources of GPUManager into that interface. If
that is the case, +1 to do it during implementation.

@Xingbo
As Xintong said, we looked into how Spark supports general "Custom
Resource Scheduling" before and decided to introduce a common resource
configuration schema
(taskmanager.resource.{resourceName}.amount/discovery-script)
to make it more extensible. I think "resource" is the proper level
to contain all the configs of extended resources.
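
Under that schema, a GPU setup might be configured like this (a sketch following the naming pattern above; the exact keys are still subject to this discussion):

```yaml
# number of GPUs per task executor, and the script used to discover them
taskmanager.resource.gpu.amount: 2
taskmanager.resource.gpu.discovery-script.path: scripts/gpu-discovery.sh
# the same pattern could later extend to other resource types, e.g.:
# taskmanager.resource.fpga.amount: 1
```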

Best,
Yangze Guo

On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <[hidden email]> wrote:

> >    general extended resource. We can of course leverage them when
> > supporting
> >    fine grained GPU scheduling. That is also not in the scope of this first
> >    step proposal, and would require FLIP-56 to be finished first.
> >
> > To summary up, I agree with Becket that have a separate FLIP for the
> > general extended resource mechanism, and keep it in mind when discussing
> > and implementing the current one.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> >
> > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> >
> > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <[hidden email]> wrote:
> >
> > > That's a good point, Stephan. It makes total sense to generalize the
> > > resource management to support custom resources. Having that allows users
> > > to add new resources by themselves. The general resource management may
> > > involve two different aspects:
> > >
> > > 1. The custom resource type definition. It is supported by the extended
> > > resources in ResourceProfile and ResourceSpec. This will likely cover
> > > majority of the cases.
> > >
> > > 2. The custom resource allocation logic, i.e. how to assign the resources
> > > to different tasks, operators, and so on. This may require two levels /
> > > steps:
> > >     a. Subtask level - make sure the subtasks are put into suitable
> > slots.
> > > It is done by the global RM and is not customizable right now.
> > >     b. Operator level - map the exact resource to the operators in TM.
> > e.g.
> > > GPU 1 for operator A, GPU 2 for operator B. This step is needed assuming
> > > the global RM does not distinguish individual resources of the same type.
> > > It is true for memory, but not for GPU.
> > >
> > > The GPU manager is designed to do 2.b here. So it should discover the
> > > physical GPU information and bind/match them to each operators. Making
> > this
> > > general will fill in the missing piece to support custom resource type
> > > definition. But I'd avoid calling it a "External Resource Manager" to
> > avoid
> > > confusion with RM, maybe something like "Operator Resource Assigner"
> > would
> > > be more accurate. So for each resource type users can have an optional
> > > "Operator Resource Assigner" in the TM. For memory, users don't need
> > this,
> > > but for other extended resources, users may need that.
> > >
> > > Personally I think a pluggable "Operator Resource Assigner" is achievable
> > > in this FLIP. But I am also OK with having that in a separate FLIP
> > because
> > > the interface between the "Operator Resource Assigner" and operator may
> > > take a while to settle down if we want to make it generic. But I think
> > our
> > > implementation should take this future work into consideration so that we
> > > don't need to break backwards compatibility once we have that.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <[hidden email]> wrote:
> > >
> > > > Thank you for writing this FLIP.
> > > >
> > > > I cannot really give much input into the mechanics of GPU-aware
> > > scheduling
> > > > and GPU allocation, as I have no experience with that.
> > > >
> > > > One thought I had when reading the proposal is if it makes sense to
> > look
> > > at
> > > > the "GPU Manager" as an "External Resource Manager", and GPU is one
> > such
> > > > resource.
> > > > The way I understand the ResourceProfile and ResourceSpec, that is how
> > it
> > > > is done there.
> > > > It has the advantage that it looks more extensible. Maybe there is a
> > GPU
> > > > Resource, a specialized NVIDIA GPU Resource, and FPGA Resource, a
> > Alibaba
> > > > TPU Resource, etc.
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > > >
> > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <[hidden email]>
> > wrote:
> > > >
> > > > > Thanks for the FLIP Yangze. GPU resource management support is a
> > > > must-have
> > > > > for machine learning use cases. Actually it is one of the mostly
> > asked
> > > > > question from the users who are interested in using Flink for ML.
> > > > >
> > > > > Some quick comments / questions to the wiki.
> > > > > 1. The WebUI / REST API should probably also be mentioned in the
> > public
> > > > > interface section.
> > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <[hidden email]>
> > > > > wrote:
> > > > >
> > > > > > Thanks for drafting the FLIP and kicking off the discussion,
> > Yangze.
> > > > > >
> > > > > > Big +1 for this feature. Supporting using of GPU in Flink is
> > > > significant,
> > > > > > especially for the ML scenarios.
> > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think
> > > it's a
> > > > > > very good first step for Flink's GPU supports.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <[hidden email]>
> > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > support in Flink"[1].
> > > > > > >
> > > > > > > This FLIP mainly discusses the following issues:
> > > > > > >
> > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > forward such requirements to the external resource managers (for
> > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > - Provide information of available GPU resources to operators.
> > > > > > >
> > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > >
> > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > - Introduce GPUManager as one of the task manager services to
> > > > discover
> > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > - Introduce the default script for GPU discovery, in which we
> > > provide
> > > > > > > the privilege mode to help user to achieve worker-level isolation
> > > in
> > > > > > > standalone mode.
> > > > > > >
> > > > > > > Please find more details in the FLIP wiki document [1]. Looking
> > > > forward
> > > > > > to
> > > > > > > your feedbacks.
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Stephan Ewen
It sounds fine to initially start with GPU specific support and think about
generalizing this once we better understand the space.

About the implementation suggested in FLIP-108:
  - Can we somehow keep this out of the TaskManager services? Anything we
have to pull through all layers of the TM makes the TM components yet more
complex and harder to maintain.

  - What parts need information about this?
    -> do the slot profiles need information about the GPU?
    -> Can the GPU Manager be a "self contained" thing that simply takes
the configuration, and then abstracts everything internally? Operators can
access it via "GPUManager.get()" or so?




Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Yangze Guo
Thanks for the feedback, Stephan.

> Can we somehow keep this out of the TaskManager services
I fear that we could not. IMO, the GPUManager (or an
ExternalResourceManagers collection in the future) is conceptually one of
the task manager services, just like the MemoryManager before 1.10.
- It maintains/holds the GPU resources at the TM level, and all of the
operators allocate GPU resources from it. So, it should be
exclusive to a single TaskExecutor.
- We could add a collection called ExternalResourceManagers to hold
the managers of all other external resources in the future.

> What parts need information about this?
In this FLIP, operators need the information. Thus, we expose GPU
information through the RuntimeContext/FunctionContext. The slot profiles
are not aware of GPU resources, as GPU is a TM-level resource for now.

> Can the GPU Manager be a "self contained" thing that simply takes the configuration, and then abstracts everything internally?
Yes, we just pass it the path/args of the discovery script and how many
GPUs per TM. It is responsible for obtaining the GPU
information and exposing it through the RuntimeContext/FunctionContext of
operators. Meanwhile, we'd better not allow operators to directly
access the GPUManager; they should get what they need from the Context. We
could then decouple the interface/implementation of GPUManager from the
public API.
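
A minimal sketch of that decoupling (all names here — GpuInfo,
GpuManager, FunctionContextSketch — are hypothetical placeholders for
illustration, not the FLIP's actual classes): the manager discovers the
devices once at TM start-up, and operators only ever see an immutable
view through their context:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Placeholder for the per-device information a discovery script would report.
final class GpuInfo {
    final int index; // e.g. a CUDA device index
    GpuInfo(int index) { this.index = index; }
}

// TM-level service: owns discovery and is exclusive to one TaskExecutor.
final class GpuManager {
    private final List<GpuInfo> discovered;

    GpuManager(int amount) {
        // In the FLIP this would run the configured discovery script;
        // here we simply fabricate device indices 0..amount-1.
        List<GpuInfo> gpus = new ArrayList<>();
        for (int i = 0; i < amount; i++) {
            gpus.add(new GpuInfo(i));
        }
        this.discovered = Collections.unmodifiableList(gpus);
    }

    List<GpuInfo> getGpuInfos() { return discovered; }
}

// What a RuntimeContext/FunctionContext might expose: operators talk to
// the context, never to the GpuManager directly.
final class FunctionContextSketch {
    private final GpuManager manager;
    FunctionContextSketch(GpuManager manager) { this.manager = manager; }
    List<GpuInfo> getGpuInfos() { return manager.getGpuInfos(); }
}

public class GpuContextDemo {
    public static void main(String[] args) {
        GpuManager manager = new GpuManager(2);
        FunctionContextSketch ctx = new FunctionContextSketch(manager);
        System.out.println(ctx.getGpuInfos().size()); // prints 2
    }
}
```

Since operators only depend on the context accessor, the GPUManager's
interface and implementation stay free to change behind it.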

Best,
Yangze Guo

> > > > > provide
> > > > > > > > > the privilege mode to help user to achieve worker-level
> > isolation
> > > > > in
> > > > > > > > > standalone mode.
> > > > > > > > >
> > > > > > > > > Please find more details in the FLIP wiki document [1].
> > Looking
> > > > > > forward
> > > > > > > > to
> > > > > > > > > your feedbacks.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
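Becket's "Operator Resource Assigner" idea in the message above can be sketched roughly as follows. Everything here is hypothetical — the FLIP defines no such interface, and the class and method names are invented for illustration. The sketch only shows the core point of step 2.b: a TM-local component that hands out concrete resource instances (here, GPU indexes) to operators, because the global RM does not distinguish individual resources of the same type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a per-resource-type "Operator Resource Assigner"
 * as described in the thread. Not part of the FLIP; names are illustrative.
 */
public class GpuAssigner {
    private final List<Integer> freeGpus; // concrete GPU indexes discovered on this TM
    private final Map<String, List<Integer>> assignments = new HashMap<>();

    public GpuAssigner(List<Integer> discoveredGpus) {
        this.freeGpus = new ArrayList<>(discoveredGpus);
    }

    /** Maps {@code amount} concrete GPUs to one operator, first-fit. */
    public List<Integer> assign(String operatorId, int amount) {
        if (amount > freeGpus.size()) {
            throw new IllegalStateException("Not enough free GPUs for " + operatorId);
        }
        List<Integer> granted = new ArrayList<>(freeGpus.subList(0, amount));
        freeGpus.subList(0, amount).clear();
        assignments.put(operatorId, granted);
        return granted;
    }
}
```

For memory such an assigner is unnecessary (any byte is as good as any other), but for GPUs the mapping "GPU 1 for operator A, GPU 2 for operator B" has to happen somewhere inside the TM, which is the gap this component would fill.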

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Stephan Ewen
> > Can we somehow keep this out of the TaskManager services
> I fear that we could not. IMO, the GPUManager(or
> ExternalServicesManagers in future) is conceptually one of the task
> manager services, just like MemoryManager before 1.10.
> - It maintains/holds the GPU resource at TM level and all of the
> operators allocate the GPU resources from it. So, it should be
> exclusive to a single TaskExecutor.
> - We could add a collection called ExternalResourceManagers to hold
> all managers of other external resources in the future.
>

Can you help me understand why this needs an addition to TaskManagerServices
or to the RuntimeContext?
Are you worried about the case when multiple Task Executors run in the same
JVM? That's not common, but wouldn't it actually be good in that case to
share the GPU Manager, given that the GPU is shared?

Thanks,
Stephan

---------------------------


> What parts need information about this?
> In this FLIP, operators need the information. Thus, we expose GPU
> information to the RuntimeContext/FunctionContext. The slot profile is
> not aware of GPU resources as GPU is TM level resource now.
>
> > Can the GPU Manager be a "self contained" thing that simply takes the
> configuration, and then abstracts everything internally?
> Yes, we just pass the path/args of the discover script and how many
> GPUs per TM to it. It takes the responsibility to get the GPU
> information and expose them to the RuntimeContext/FunctionContext of
> Operators. Meanwhile, we'd better not allow operators to directly
> access GPUManager, it should get what they want from Context. We could
> then decouple the interface/implementation of GPUManager and Public
> API.
>
> Best,
> Yangze Guo
>
> On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <[hidden email]> wrote:
> >
> > It sounds fine to initially start with GPU specific support and think
> about
> > generalizing this once we better understand the space.
> >
> > About the implementation suggested in FLIP-108:
> >   - Can we somehow keep this out of the TaskManager services? Anything we
> > have to pull through all layers of the TM makes the TM components yet
> more
> > complex and harder to maintain.
> >
> >   - What parts need information about this?
> >     -> do the slot profiles need information about the GPU?
> >     -> Can the GPU Manager be a "self contained" thing that simply takes
> > the configuration, and then abstracts everything internally? Operators
> can
> > access it via "GPUManager.get()" or so?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <[hidden email]> wrote:
> >
> > > Thanks for all the feedbacks.
> > >
> > > @Becket
> > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > Public API section.
> > >
> > >
> > > @Stephan @Becket
> > > Regarding the general extended resource mechanism, I second Xintong's
> > > suggestion.
> > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > supporting fine-grained GPU scheduling. As a first step proposal, I
> > > prefer to not include it in the scope of this FLIP.
> > > - Regarding the "Extended Resource Manager", if I understand
> > > correctly, it just a code refactoring atm, we could extract the
> > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > that is the case, +1 to do it during implementation.
> > >
> > > @Xingbo
> > > As Xintong said, we looked into how Spark supports a general "Custom
> > > Resource Scheduling" before and decided to introduce a common resource
> > > configuration
> > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > to make it more extensible. I think the "resource" is a proper level
> > > to contain all the configs of extended resources.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <[hidden email]>
> wrote:
> > > >
> > > > Thanks a lot for the FLIP, Yangze.
> > > >
> > > > There is no doubt that GPU resource management support will greatly
> > > > facilitate the development of AI-related applications by PyFlink
> users.
> > > >
> > > > I have only one comment about this wiki:
> > > >
> > > > Regarding the names of several GPU configurations, I think it is
> better
> > > to
> > > > delete the resource field makes it consistent with the names of other
> > > > resource-related configurations in TaskManagerOption.
> > > >
> > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > taskmanager.gpu.discovery-script.path
> > > >
> > > > Best,
> > > >
> > > > Xingbo
> > > >
> > > >
> > > > Xintong Song <[hidden email]> 于2020年3月4日周三 上午10:39写道:
> > > >
> > > > > @Stephan, @Becket,
> > > > >
> > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > making
> > > > > the "GPU Support" as some general "Extended Resource Support". We
> > > believe
> > > > > supporting extended resources in a general mechanism is definitely
> a
> > > good
> > > > > and extensible way. The reason we propose this FLIP narrowing its
> scope
> > > > > down to GPU alone, is mainly for the concern on extra efforts and
> > > review
> > > > > capacity needed for a general mechanism.
> > > > >
> > > > > To come up with a well design on a general extended resource
> management
> > > > > mechanism, we would need to investigate more on how people use
> > > different
> > > > > kind of resources in practice. For GPU, we learnt such knowledge
> from
> > > the
> > > > > experts, Becket and his team members. But for FPGA, or other
> potential
> > > > > extended resources, we don't have such convenient information
> sources,
> > > > > making the investigation requires more efforts, which I tend to
> think
> > > is
> > > > > not necessary atm.
> > > > >
> > > > > On the other hand, we also looked into how Spark supports a general
> > > "Custom
> > > > > Resource Scheduling". Assuming we want to have a similar general
> > > extended
> > > > > resource mechanism in the future, we believe that the current GPU
> > > support
> > > > > design can be easily extended, in an incremental way without too
> many
> > > > > reworks.
> > > > >
> > > > >    - The most important part is probably user interfaces. Spark
> offers
> > > > >    configuration options to define the amount, discovery script and
> > > vendor
> > > > > (on
> > > > >    k8s) in a per resource type bias [1], which is very similar to
> what
> > > we
> > > > >    proposed in this FLIP. I think it's not necessary to expose
> config
> > > > > options
> > > > >    in the general way atm, since we do not have supports for other
> > > resource
> > > > >    types now. If later we decided to have per resource type config
> > > > > options, we
> > > > >    can have backwards compatibility on the current proposed options
> > > with
> > > > >    simple key mapping.
> > > > >    - For the GPU Manager, if later needed we can change it to a
> > > "Extended
> > > > >    Resource Manager" (or whatever it is called). That should be a
> pure
> > > > >    component-internal refactoring.
> > > > >    - For ResourceProfile and ResourceSpec, there are already
> fields for
> > > > >    general extended resource. We can of course leverage them when
> > > > > supporting
> > > > >    fine grained GPU scheduling. That is also not in the scope of
> this
> > > first
> > > > >    step proposal, and would require FLIP-56 to be finished first.
> > > > >
> > > > > To summary up, I agree with Becket that have a separate FLIP for
> the
> > > > > general extended resource mechanism, and keep it in mind when
> > > discussing
> > > > > and implementing the current one.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > >
> > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <[hidden email]>
> > > wrote:
> > > > >
> > > > > > That's a good point, Stephan. It makes total sense to generalize
> the
> > > > > > resource management to support custom resources. Having that
> allows
> > > users
> > > > > > to add new resources by themselves. The general resource
> management
> > > may
> > > > > > involve two different aspects:
> > > > > >
> > > > > > 1. The custom resource type definition. It is supported by the
> > > extended
> > > > > > resources in ResourceProfile and ResourceSpec. This will likely
> cover
> > > > > > majority of the cases.
> > > > > >
> > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > resources
> > > > > > to different tasks, operators, and so on. This may require two
> > > levels /
> > > > > > steps:
> > > > > >     a. Subtask level - make sure the subtasks are put into
> suitable
> > > > > slots.
> > > > > > It is done by the global RM and is not customizable right now.
> > > > > >     b. Operator level - map the exact resource to the operators
> in
> > > TM.
> > > > > e.g.
> > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > assuming
> > > > > > the global RM does not distinguish individual resources of the
> same
> > > type.
> > > > > > It is true for memory, but not for GPU.
> > > > > >
> > > > > > The GPU manager is designed to do 2.b here. So it should
> discover the
> > > > > > physical GPU information and bind/match them to each operators.
> > > Making
> > > > > this
> > > > > > general will fill in the missing piece to support custom resource
> > > type
> > > > > > definition. But I'd avoid calling it a "External Resource
> Manager" to
> > > > > avoid
> > > > > > confusion with RM, maybe something like "Operator Resource
> Assigner"
> > > > > would
> > > > > > be more accurate. So for each resource type users can have an
> > > optional
> > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> need
> > > > > this,
> > > > > > but for other extended resources, users may need that.
> > > > > >
> > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > achievable
> > > > > > in this FLIP. But I am also OK with having that in a separate
> FLIP
> > > > > because
> > > > > > the interface between the "Operator Resource Assigner" and
> operator
> > > may
> > > > > > take a while to settle down if we want to make it generic. But I
> > > think
> > > > > our
> > > > > > implementation should take this future work into consideration so
> > > that we
> > > > > > don't need to break backwards compatibility once we have that.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <[hidden email]>
> > > wrote:
> > > > > >
> > > > > > > Thank you for writing this FLIP.
> > > > > > >
> > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > scheduling
> > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > >
> > > > > > > One thought I had when reading the proposal is if it makes
> sense to
> > > > > look
> > > > > > at
> > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU
> is one
> > > > > such
> > > > > > > resource.
> > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> that is
> > > how
> > > > > it
> > > > > > > is done there.
> > > > > > > It has the advantage that it looks more extensible. Maybe
> there is
> > > a
> > > > > GPU
> > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> Resource, a
> > > > > Alibaba
> > > > > > > TPU Resource, etc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> [hidden email]>
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the FLIP Yangze. GPU resource management support
> is a
> > > > > > > must-have
> > > > > > > > for machine learning use cases. Actually it is one of the
> mostly
> > > > > asked
> > > > > > > > question from the users who are interested in using Flink
> for ML.
> > > > > > > >
> > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> the
> > > > > public
> > > > > > > > interface section.
> > > > > > > > 2. Is the data structure that holds GPU info also a public
> API?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > [hidden email]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for drafting the FLIP and kicking off the
> discussion,
> > > > > Yangze.
> > > > > > > > >
> > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink
> is
> > > > > > > significant,
> > > > > > > > > especially for the ML scenarios.
> > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > think
> > > > > > it's a
> > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> [hidden email]
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > We would like to start a discussion thread on "FLIP-108:
> Add
> > > GPU
> > > > > > > > > > support in Flink"[1].
> > > > > > > > > >
> > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > >
> > > > > > > > > > - Enable user to configure how many GPUs in a task
> executor
> > > and
> > > > > > > > > > forward such requirements to the external resource
> managers
> > > (for
> > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > - Provide information of available GPU resources to
> > > operators.
> > > > > > > > > >
> > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > >
> > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > - Introduce GPUManager as one of the task manager
> services to
> > > > > > > discover
> > > > > > > > > > and expose GPU resource information to the context of
> > > functions.
> > > > > > > > > > - Introduce the default script for GPU discovery, in
> which we
> > > > > > provide
> > > > > > > > > > the privilege mode to help user to achieve worker-level
> > > isolation
> > > > > > in
> > > > > > > > > > standalone mode.
> > > > > > > > > >
> > > > > > > > > > Please find more details in the FLIP wiki document [1].
> > > Looking
> > > > > > > forward
> > > > > > > > > to
> > > > > > > > > > your feedbacks.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>
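The discovery-script contract and the common configuration schema discussed in the thread (`taskmanager.resource.{resourceName}.amount` / `discovery-script`) could look roughly like the sketch below. This is a model for illustration only: the option keys in the comments follow the FLIP draft, while the assumption that the script prints comma-separated GPU indexes is invented here, not something the FLIP specifies.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of how a GPUManager might parse discovery-script output. */
public class GpuDiscovery {
    // Hypothetical config keys following the schema proposed in the thread:
    //   taskmanager.resource.gpu.amount: 2
    //   taskmanager.resource.gpu.discovery-script.path: gpu-discovery.sh

    /** Parses the script's stdout, assumed to be comma-separated GPU indexes, e.g. "0,1". */
    public static List<Integer> parseIndexes(String scriptOutput) {
        List<Integer> indexes = new ArrayList<>();
        for (String token : scriptOutput.trim().split(",")) {
            if (!token.trim().isEmpty()) {
                indexes.add(Integer.parseInt(token.trim()));
            }
        }
        return indexes;
    }
}
```

Under this model the GPUManager stays "self contained" in the sense Stephan asks about: it takes the script path and amount from the configuration, runs the script once, and exposes the resulting indexes to operators via the RuntimeContext/FunctionContext rather than letting operators touch the manager directly.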

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Isaac Godfried




---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote ----



Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Isaac Godfried
In reply to this post by Stephan Ewen




---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote ----

> > Can we somehow keep this out of the TaskManager services
> I fear that we could not. IMO, the GPUManager(or
> ExternalServicesManagers in future) is conceptually one of the task
> manager services, just like MemoryManager before 1.10.
> - It maintains/holds the GPU resource at TM level and all of the
> operators allocate the GPU resources from it. So, it should be
> exclusive to a single TaskExecutor.
> - We could add a collection called ExternalResourceManagers to hold
> all managers of other external resources in the future.
>

Can you help me understand why this needs the addition in TaskManagerServices
or in the RuntimeContext?
Are you worried about the case when multiple Task Executors run in the same
JVM? That's not common, but wouldn't it actually be good in that case to
share the GPU Manager, given that the GPU is shared?

Thanks,
Stephan

---------------------------


> What parts need information about this?
> In this FLIP, operators need the information. Thus, we expose GPU
> information to the RuntimeContext/FunctionContext. The slot profile is
> not aware of GPU resources as GPU is TM level resource now.
>
> > Can the GPU Manager be a "self contained" thing that simply takes the
> configuration, and then abstracts everything internally?
> Yes, we just pass the path/args of the discover script and how many
> GPUs per TM to it. It takes the responsibility to get the GPU
> information and expose them to the RuntimeContext/FunctionContext of
> Operators. Meanwhile, we'd better not allow operators to directly
> access GPUManager, it should get what they want from Context. We could
> then decouple the interface/implementation of GPUManager and Public
> API.
>
> Best,
> Yangze Guo
>
> On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <[hidden email]> wrote:
> >
> > It sounds fine to initially start with GPU specific support and think
> about
> > generalizing this once we better understand the space.
> >
> > About the implementation suggested in FLIP-108:
> > - Can we somehow keep this out of the TaskManager services? Anything we
> > have to pull through all layers of the TM makes the TM components yet
> more
> > complex and harder to maintain.
> >
> > - What parts need information about this?
> > -> do the slot profiles need information about the GPU?
> > -> Can the GPU Manager be a "self contained" thing that simply takes
> > the configuration, and then abstracts everything internally? Operators
> can
> > access it via "GPUManager.get()" or so?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <[hidden email]> wrote:
> >
> > > Thanks for all the feedbacks.
> > >
> > > @Becket
> > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > Public API section.
> > >
> > >
> > > @Stephan @Becket
> > > Regarding the general extended resource mechanism, I second Xintong's
> > > suggestion.
> > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > supporting fine-grained GPU scheduling. As a first step proposal, I
> > > prefer to not include it in the scope of this FLIP.
> > > - Regarding the "Extended Resource Manager", if I understand
> > > correctly, it just a code refactoring atm, we could extract the
> > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > that is the case, +1 to do it during implementation.
> > >
> > > @Xingbo
> > > As Xintong said, we looked into how Spark supports a general "Custom
> > > Resource Scheduling" before and decided to introduce a common resource
> > > configuration
> > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > to make it more extensible. I think the "resource" is a proper level
> > > to contain all the configs of extended resources.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <[hidden email]>
> wrote:
> > > >
> > > > Thanks a lot for the FLIP, Yangze.
> > > >
> > > > There is no doubt that GPU resource management support will greatly
> > > > facilitate the development of AI-related applications by PyFlink
> users.
> > > >
> > > > I have only one comment about this wiki:
> > > >
> > > > Regarding the names of several GPU configurations, I think it is
> better
> > > to
> > > > delete the resource field makes it consistent with the names of other
> > > > resource-related configurations in TaskManagerOption.
> > > >
> > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > taskmanager.gpu.discovery-script.path
> > > >
> > > > Best,
> > > >
> > > > Xingbo
> > > >
> > > >
> > > > Xintong Song <[hidden email]> wrote on Wed, Mar 4, 2020, at 10:39 AM:
> > > >
> > > > > @Stephan, @Becket,
> > > > >
> > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > making
> > > > > the "GPU Support" as some general "Extended Resource Support". We
> > > believe
> > > > > supporting extended resources in a general mechanism is definitely
> a
> > > good
> > > > > and extensible way. The reason we propose this FLIP narrowing its
> scope
> > > > > down to GPU alone, is mainly for the concern on extra efforts and
> > > review
> > > > > capacity needed for a general mechanism.
> > > > >
> > > > > To come up with a well design on a general extended resource
> management
> > > > > mechanism, we would need to investigate more on how people use
> > > different
> > > > > kind of resources in practice. For GPU, we learnt such knowledge
> from
> > > the
> > > > > experts, Becket and his team members. But for FPGA, or other
> potential
> > > > > extended resources, we don't have such convenient information
> sources,
> > > > > making the investigation requires more efforts, which I tend to
> think
> > > is
> > > > > not necessary atm.
> > > > >
> > > > > On the other hand, we also looked into how Spark supports a general
> > > "Custom
> > > > > Resource Scheduling". Assuming we want to have a similar general
> > > extended
> > > > > resource mechanism in the future, we believe that the current GPU
> > > support
> > > > > design can be easily extended, in an incremental way without too
> many
> > > > > reworks.
> > > > >
> > > > > - The most important part is probably user interfaces. Spark
> offers
> > > > > configuration options to define the amount, discovery script and
> > > vendor
> > > > > (on
> > > > > k8s) in a per resource type bias [1], which is very similar to
> what
> > > we
> > > > > proposed in this FLIP. I think it's not necessary to expose
> config
> > > > > options
> > > > > in the general way atm, since we do not have supports for other
> > > resource
> > > > > types now. If later we decided to have per resource type config
> > > > > options, we
> > > > > can have backwards compatibility on the current proposed options
> > > with
> > > > > simple key mapping.
> > > > > - For the GPU Manager, if later needed we can change it to a
> > > "Extended
> > > > > Resource Manager" (or whatever it is called). That should be a
> pure
> > > > > component-internal refactoring.
> > > > > - For ResourceProfile and ResourceSpec, there are already
> fields for
> > > > > general extended resource. We can of course leverage them when
> > > > > supporting
> > > > > fine grained GPU scheduling. That is also not in the scope of
> this
> > > first
> > > > > step proposal, and would require FLIP-56 to be finished first.
> > > > >
> > > > > To sum up, I agree with Becket that we should have a separate FLIP for
> the
> > > > > general extended resource mechanism, and keep it in mind when
> > > discussing
> > > > > and implementing the current one.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > >
> > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <[hidden email]>
> > > wrote:
> > > > >
> > > > > > That's a good point, Stephan. It makes total sense to generalize
> the
> > > > > > resource management to support custom resources. Having that
> allows
> > > users
> > > > > > to add new resources by themselves. The general resource
> management
> > > may
> > > > > > involve two different aspects:
> > > > > >
> > > > > > 1. The custom resource type definition. It is supported by the
> > > extended
> > > > > > resources in ResourceProfile and ResourceSpec. This will likely
> cover
> > > > > > majority of the cases.
> > > > > >
> > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > resources
> > > > > > to different tasks, operators, and so on. This may require two
> > > levels /
> > > > > > steps:
> > > > > > a. Subtask level - make sure the subtasks are put into
> suitable
> > > > > slots.
> > > > > > It is done by the global RM and is not customizable right now.
> > > > > > b. Operator level - map the exact resource to the operators
> in
> > > TM.
> > > > > e.g.
> > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > assuming
> > > > > > the global RM does not distinguish individual resources of the
> same
> > > type.
> > > > > > It is true for memory, but not for GPU.
> > > > > >
> > > > > > The GPU manager is designed to do 2.b here. So it should
> discover the
> > > > > > physical GPU information and bind/match them to each operators.
> > > Making
> > > > > this
> > > > > > general will fill in the missing piece to support custom resource
> > > type
> > > > > > definition. But I'd avoid calling it a "External Resource
> Manager" to
> > > > > avoid
> > > > > > confusion with RM, maybe something like "Operator Resource
> Assigner"
> > > > > would
> > > > > > be more accurate. So for each resource type users can have an
> > > optional
> > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> need
> > > > > this,
> > > > > > but for other extended resources, users may need that.
> > > > > >
> > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > achievable
> > > > > > in this FLIP. But I am also OK with having that in a separate
> FLIP
> > > > > because
> > > > > > the interface between the "Operator Resource Assigner" and
> operator
> > > may
> > > > > > take a while to settle down if we want to make it generic. But I
> > > think
> > > > > our
> > > > > > implementation should take this future work into consideration so
> > > that we
> > > > > > don't need to break backwards compatibility once we have that.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <[hidden email]>
> > > wrote:
> > > > > >
> > > > > > > Thank you for writing this FLIP.
> > > > > > >
> > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > scheduling
> > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > >
> > > > > > > One thought I had when reading the proposal is if it makes
> sense to
> > > > > look
> > > > > > at
> > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU
> is one
> > > > > such
> > > > > > > resource.
> > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> that is
> > > how
> > > > > it
> > > > > > > is done there.
> > > > > > > It has the advantage that it looks more extensible. Maybe
> there is
> > > a
> > > > > GPU
> > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> Resource, a
> > > > > Alibaba
> > > > > > > TPU Resource, etc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> [hidden email]>
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the FLIP Yangze. GPU resource management support
> is a
> > > > > > > must-have
> > > > > > > > for machine learning use cases. Actually it is one of the
> mostly
> > > > > asked
> > > > > > > > question from the users who are interested in using Flink
> for ML.
> > > > > > > >
> > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> the
> > > > > public
> > > > > > > > interface section.
> > > > > > > > 2. Is the data structure that holds GPU info also a public
> API?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > [hidden email]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for drafting the FLIP and kicking off the
> discussion,
> > > > > Yangze.
> > > > > > > > >
> > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink
> is
> > > > > > > significant,
> > > > > > > > > especially for the ML scenarios.
> > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > think
> > > > > > it's a
> > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> [hidden email]
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > We would like to start a discussion thread on "FLIP-108:
> Add
> > > GPU
> > > > > > > > > > support in Flink"[1].
> > > > > > > > > >
> > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > >
> > > > > > > > > > - Enable user to configure how many GPUs in a task
> executor
> > > and
> > > > > > > > > > forward such requirements to the external resource
> managers
> > > (for
> > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > - Provide information of available GPU resources to
> > > operators.
> > > > > > > > > >
> > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > >
> > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > - Introduce GPUManager as one of the task manager
> services to
> > > > > > > discover
> > > > > > > > > > and expose GPU resource information to the context of
> > > functions.
> > > > > > > > > > - Introduce the default script for GPU discovery, in
> which we
> > > > > > provide
> > > > > > > > > > the privilege mode to help user to achieve worker-level
> > > isolation
> > > > > > in
> > > > > > > > > > standalone mode.
> > > > > > > > > >
> > > > > > > > > > Please find more details in the FLIP wiki document [1].
> > > Looking
> > > > > > > forward
> > > > > > > > > to
> > > > > > > > > > your feedbacks.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Yangze Guo
@Stephan
Do you mean the MiniCluster? Yes, it makes sense to share the GPU Manager
in such a scenario.
If that is your concern, I'm +1 for holding the
GPUManager (ExternalResourceManagers) in the TaskExecutor instead of
TaskManagerServices.

Regarding the RuntimeContext/FunctionContext, it only holds the GPU
info, not the GPU Manager itself. AFAIK, it is the only place where we
can pass GPU info to a RichFunction/UserDefinedFunction.
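To make the distinction concrete, here is a minimal, self-contained sketch of that idea (plain Java, not Flink's actual API — the `GpuInfo` and `FunctionContext` names are illustrative stand-ins for whatever the FLIP finally ships): the context carries only an immutable view of the discovered GPUs, never the GPUManager itself.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a function context exposes immutable GPU info,
// while the GPUManager that produced it stays inaccessible to operators.
public class GpuContextSketch {

    // Plain value object describing one discovered GPU.
    static final class GpuInfo {
        final String index;
        GpuInfo(String index) { this.index = index; }
    }

    // Stand-in for a RuntimeContext/FunctionContext carrying GPU info.
    static final class FunctionContext {
        private final List<GpuInfo> gpus;
        FunctionContext(List<GpuInfo> gpus) {
            // Unmodifiable view: operators can read the TM-level GPU
            // information but cannot mutate or reallocate it.
            this.gpus = Collections.unmodifiableList(gpus);
        }
        List<GpuInfo> getGpuInfos() { return gpus; }
    }

    public static void main(String[] args) {
        // The manager (not modeled here) would discover GPUs and build
        // this context; a user-defined function only ever sees the info.
        FunctionContext ctx = new FunctionContext(
                List.of(new GpuInfo("0"), new GpuInfo("1")));
        System.out.println(ctx.getGpuInfos().size());
    }
}
```

This keeps the GPUManager's interface free to change without touching the public API, which is the decoupling described above.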

Best,
Yangze Guo

On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <[hidden email]> wrote:

>
>
>
>
>
> ---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote ----
>
> > > Can we somehow keep this out of the TaskManager services
> > I fear that we could not. IMO, the GPUManager(or
> > ExternalServicesManagers in future) is conceptually one of the task
> > manager services, just like MemoryManager before 1.10.
> > - It maintains/holds the GPU resource at TM level and all of the
> > operators allocate the GPU resources from it. So, it should be
> > exclusive to a single TaskExecutor.
> > - We could add a collection called ExternalResourceManagers to hold
> > all managers of other external resources in the future.
> >
>
> Can you help me understand why this needs the addition in TaskManagerServices
> or in the RuntimeContext?
> Are you worried about the case when multiple Task Executors run in the same
> JVM? That's not common, but wouldn't it actually be good in that case to
> share the GPU Manager, given that the GPU is shared?
>
> Thanks,
> Stephan
>
> ---------------------------
>
>
> > What parts need information about this?
> > In this FLIP, operators need the information. Thus, we expose GPU
> > information to the RuntimeContext/FunctionContext. The slot profile is
> > not aware of GPU resources as GPU is TM level resource now.
> >
> > > Can the GPU Manager be a "self contained" thing that simply takes the
> > configuration, and then abstracts everything internally?
> > Yes, we just pass the path/args of the discover script and how many
> > GPUs per TM to it. It takes the responsibility to get the GPU
> > information and expose them to the RuntimeContext/FunctionContext of
> > Operators. Meanwhile, we'd better not allow operators to directly
> > access GPUManager, it should get what they want from Context. We could
> > then decouple the interface/implementation of GPUManager and Public
> > API.
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <[hidden email]> wrote:
> > >
> > > It sounds fine to initially start with GPU specific support and think
> > about
> > > generalizing this once we better understand the space.
> > >
> > > About the implementation suggested in FLIP-108:
> > > - Can we somehow keep this out of the TaskManager services? Anything we
> > > have to pull through all layers of the TM makes the TM components yet
> > more
> > > complex and harder to maintain.
> > >
> > > - What parts need information about this?
> > > -> do the slot profiles need information about the GPU?
> > > -> Can the GPU Manager be a "self contained" thing that simply takes
> > > the configuration, and then abstracts everything internally? Operators
> > can
> > > access it via "GPUManager.get()" or so?
> > >
> > >
> > >
> > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <[hidden email]> wrote:
> > >
> > > > Thanks for all the feedbacks.
> > > >
> > > > @Becket
> > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > > Public API section.
> > > >
> > > >
> > > > @Stephan @Becket
> > > > Regarding the general extended resource mechanism, I second Xintong's
> > > > suggestion.
> > > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > > supporting fine-grained GPU scheduling. As a first step proposal, I
> > > > prefer to not include it in the scope of this FLIP.
> > > > - Regarding the "Extended Resource Manager", if I understand
> > > > correctly, it just a code refactoring atm, we could extract the
> > > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > > that is the case, +1 to do it during implementation.
> > > >
> > > > @Xingbo
> > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > Resource Scheduling" before and decided to introduce a common resource
> > > > configuration
> > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > to make it more extensible. I think the "resource" is a proper level
> > > > to contain all the configs of extended resources.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <[hidden email]>
> > wrote:
> > > > >
> > > > > Thanks a lot for the FLIP, Yangze.
> > > > >
> > > > > There is no doubt that GPU resource management support will greatly
> > > > > facilitate the development of AI-related applications by PyFlink
> > users.
> > > > >
> > > > > I have only one comment about this wiki:
> > > > >
> > > > > Regarding the names of several GPU configurations, I think it is
> > better
> > > > to
> > > > > delete the resource field makes it consistent with the names of other
> > > > > resource-related configurations in TaskManagerOption.
> > > > >
> > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > taskmanager.gpu.discovery-script.path
> > > > >
> > > > > Best,
> > > > >
> > > > > Xingbo
> > > > >
> > > > >
> > > > > Xintong Song <[hidden email]> wrote on Wed, Mar 4, 2020, at 10:39 AM:
> > > > >
> > > > > > @Stephan, @Becket,
> > > > > >
> > > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > > making
> > > > > > the "GPU Support" as some general "Extended Resource Support". We
> > > > believe
> > > > > > supporting extended resources in a general mechanism is definitely
> > a
> > > > good
> > > > > > and extensible way. The reason we propose this FLIP narrowing its
> > scope
> > > > > > down to GPU alone, is mainly for the concern on extra efforts and
> > > > review
> > > > > > capacity needed for a general mechanism.
> > > > > >
> > > > > > To come up with a well design on a general extended resource
> > management
> > > > > > mechanism, we would need to investigate more on how people use
> > > > different
> > > > > > kind of resources in practice. For GPU, we learnt such knowledge
> > from
> > > > the
> > > > > > experts, Becket and his team members. But for FPGA, or other
> > potential
> > > > > > extended resources, we don't have such convenient information
> > sources,
> > > > > > making the investigation requires more efforts, which I tend to
> > think
> > > > is
> > > > > > not necessary atm.
> > > > > >
> > > > > > On the other hand, we also looked into how Spark supports a general
> > > > "Custom
> > > > > > Resource Scheduling". Assuming we want to have a similar general
> > > > extended
> > > > > > resource mechanism in the future, we believe that the current GPU
> > > > support
> > > > > > design can be easily extended, in an incremental way without too
> > many
> > > > > > reworks.
> > > > > >
> > > > > > - The most important part is probably user interfaces. Spark
> > offers
> > > > > > configuration options to define the amount, discovery script and
> > > > vendor
> > > > > > (on
> > > > > > k8s) in a per resource type bias [1], which is very similar to
> > what
> > > > we
> > > > > > proposed in this FLIP. I think it's not necessary to expose
> > config
> > > > > > options
> > > > > > in the general way atm, since we do not have supports for other
> > > > resource
> > > > > > types now. If later we decided to have per resource type config
> > > > > > options, we
> > > > > > can have backwards compatibility on the current proposed options
> > > > with
> > > > > > simple key mapping.
> > > > > > - For the GPU Manager, if later needed we can change it to a
> > > > "Extended
> > > > > > Resource Manager" (or whatever it is called). That should be a
> > pure
> > > > > > component-internal refactoring.
> > > > > > - For ResourceProfile and ResourceSpec, there are already
> > fields for
> > > > > > general extended resource. We can of course leverage them when
> > > > > > supporting
> > > > > > fine grained GPU scheduling. That is also not in the scope of
> > this
> > > > first
> > > > > > step proposal, and would require FLIP-56 to be finished first.
> > > > > >
> > > > > > To sum up, I agree with Becket that we should have a separate FLIP for
> > the
> > > > > > general extended resource mechanism, and keep it in mind when
> > > > discussing
> > > > > > and implementing the current one.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > >
> > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <[hidden email]>
> > > > wrote:
> > > > > >
> > > > > > > That's a good point, Stephan. It makes total sense to generalize
> > the
> > > > > > > resource management to support custom resources. Having that
> > allows
> > > > users
> > > > > > > to add new resources by themselves. The general resource
> > management
> > > > may
> > > > > > > involve two different aspects:
> > > > > > >
> > > > > > > 1. The custom resource type definition. It is supported by the
> > > > extended
> > > > > > > resources in ResourceProfile and ResourceSpec. This will likely
> > cover
> > > > > > > majority of the cases.
> > > > > > >
> > > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > > resources
> > > > > > > to different tasks, operators, and so on. This may require two
> > > > levels /
> > > > > > > steps:
> > > > > > > a. Subtask level - make sure the subtasks are put into
> > suitable
> > > > > > slots.
> > > > > > > It is done by the global RM and is not customizable right now.
> > > > > > > b. Operator level - map the exact resource to the operators
> > in
> > > > TM.
> > > > > > e.g.
> > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > > assuming
> > > > > > > the global RM does not distinguish individual resources of the
> > same
> > > > type.
> > > > > > > It is true for memory, but not for GPU.
> > > > > > >
> > > > > > > The GPU manager is designed to do 2.b here. So it should
> > discover the
> > > > > > > physical GPU information and bind/match them to each operators.
> > > > Making
> > > > > > this
> > > > > > > general will fill in the missing piece to support custom resource
> > > > type
> > > > > > > definition. But I'd avoid calling it a "External Resource
> > Manager" to
> > > > > > avoid
> > > > > > > confusion with RM, maybe something like "Operator Resource
> > Assigner"
> > > > > > would
> > > > > > > be more accurate. So for each resource type users can have an
> > > > optional
> > > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> > need
> > > > > > this,
> > > > > > > but for other extended resources, users may need that.
> > > > > > >
> > > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > > achievable
> > > > > > > in this FLIP. But I am also OK with having that in a separate
> > FLIP
> > > > > > because
> > > > > > > the interface between the "Operator Resource Assigner" and
> > operator
> > > > may
> > > > > > > take a while to settle down if we want to make it generic. But I
> > > > think
> > > > > > our
> > > > > > > implementation should take this future work into consideration so
> > > > that we
> > > > > > > don't need to break backwards compatibility once we have that.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <[hidden email]>
> > > > wrote:
> > > > > > >
> > > > > > > > Thank you for writing this FLIP.
> > > > > > > >
> > > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > > scheduling
> > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > >
> > > > > > > > One thought I had when reading the proposal is if it makes
> > sense to
> > > > > > look
> > > > > > > at
> > > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU
> > is one
> > > > > > such
> > > > > > > > resource.
> > > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> > that is
> > > > how
> > > > > > it
> > > > > > > > is done there.
> > > > > > > > It has the advantage that it looks more extensible. Maybe
> > there is
> > > > a
> > > > > > GPU
> > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > Resource, a
> > > > > > Alibaba
> > > > > > > > TPU Resource, etc.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > [hidden email]>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support
> > is a
> > > > > > > > must-have
> > > > > > > > > for machine learning use cases. Actually it is one of the
> > mostly
> > > > > > asked
> > > > > > > > > question from the users who are interested in using Flink
> > for ML.
> > > > > > > > >
> > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> > the
> > > > > > public
> > > > > > > > > interface section.
> > > > > > > > > 2. Is the data structure that holds GPU info also a public
> > API?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > [hidden email]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > discussion,
> > > > > > Yangze.
> > > > > > > > > >
> > > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink
> > is
> > > > > > > > significant,
> > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > > think
> > > > > > > it's a
> > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > [hidden email]
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > >
> > > > > > > > > > > We would like to start a discussion thread on "FLIP-108:
> > Add
> > > > GPU
> > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > >
> > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > >
> > > > > > > > > > > - Enable user to configure how many GPUs in a task
> > executor
> > > > and
> > > > > > > > > > > forward such requirements to the external resource
> > managers
> > > > (for
> > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > - Provide information of available GPU resources to
> > > > operators.
> > > > > > > > > > >
> > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > >
> > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > - Introduce GPUManager as one of the task manager
> > services to
> > > > > > > > discover
> > > > > > > > > > > and expose GPU resource information to the context of
> > > > functions.
> > > > > > > > > > > - Introduce the default script for GPU discovery, in
> > which we
> > > > > > > provide
> > > > > > > > > > > the privilege mode to help user to achieve worker-level
> > > > isolation
> > > > > > > in
> > > > > > > > > > > standalone mode.
> > > > > > > > > > >
> > > > > > > > > > > Please find more details in the FLIP wiki document [1].
> > > > Looking
> > > > > > > > forward
> > > > > > > > > > to
> > > > > > > > > > > your feedbacks.
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
>
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Xintong Song
@Yangze,
I think what Stephan means (@Stephan, please correct me if I'm wrong) is
that, we might not need to hold and maintain the GPUManager as a service in
TaskManagerServices or RuntimeContext. An alternative is to create /
retrieve the GPUManager only in the operators that need it, e.g., with a
static method `GPUManager.get()`.

@Stephan,
I agree with you on excluding GPUManager from TaskManagerServices.

   - For the first step, where we provide unified TM-level GPU information
   to all operators, it should be fine to have operators access /
   lazily initialize the GPUManager by themselves.
   - In the future, we might have some more fine-grained GPU management, where
   we need to maintain GPUManager as a service and put GPU info in slot
   profiles. But at least for now it's not necessary to introduce such
   complexity.

However, I have some concerns about excluding GPUManager from RuntimeContext
and letting operators access it directly.

   - The configuration needed for creating the GPUManager is not always
   available to operators.
   - If we later want fine-grained control over GPUs (e.g.,
   operators in each slot can only see the GPUs reserved for that slot), this
   approach cannot be easily extended.

I would suggest wrapping the GPUManager behind RuntimeContext and only
exposing the GPUInfo to users. For now, we can declare a method
`getGPUInfo()` in RuntimeContext, with a default implementation that calls
`GPUManager.get()` to obtain the lazily-created GPUManager. If we later want
to create / retrieve the GPUManager in a different way, we can simply change
how `getGPUInfo` is implemented, without needing to change any public
interfaces.
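
A rough sketch of what I mean, in plain Java. All names, the shape of
GPUInfo, and the hard-coded discovery result are illustrative assumptions
for this discussion, not the final API:

```java
import java.util.Collections;
import java.util.Set;

/** Immutable description of the GPU resources visible to one task manager. */
final class GPUInfo {
    private final Set<String> indexes;

    GPUInfo(Set<String> indexes) {
        this.indexes = Collections.unmodifiableSet(indexes);
    }

    Set<String> getIndexes() {
        return indexes;
    }
}

/** Lazily created manager, hidden behind RuntimeContext. */
final class GPUManager {
    private static volatile GPUManager instance;
    private final GPUInfo info;

    private GPUManager(GPUInfo info) {
        this.info = info;
    }

    static GPUManager get() {
        if (instance == null) {
            synchronized (GPUManager.class) {
                if (instance == null) {
                    // The real implementation would run the discovery script
                    // here; two fake GPU indexes stand in for its output.
                    instance = new GPUManager(new GPUInfo(Set.of("0", "1")));
                }
            }
        }
        return instance;
    }

    GPUInfo getGPUInfo() {
        return info;
    }
}

interface RuntimeContext {
    /**
     * The default implementation delegates to the lazily created GPUManager;
     * it can later be replaced (e.g. with a per-slot view) without touching
     * this public interface.
     */
    default GPUInfo getGPUInfo() {
        return GPUManager.get().getGPUInfo();
    }
}
```

The point of the wrapper is that user code only ever sees
`RuntimeContext#getGPUInfo()`, so the lazy singleton behind it is an
implementation detail we are free to change.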

Thank you~

Xintong Song



On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <[hidden email]> wrote:

> @Stephan
> Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
> in such scenario.
> If that's what you worry about, I'm +1 for holding
> GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> TaskManagerServices.
>
> Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> info instead of the GPU Manager. AFAIK, it's the only place we could
> pass GPU info to the RichFunction/UserDefinedFunction.
>
> Best,
> Yangze Guo
>
> On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <[hidden email]>
> wrote:
> >
> >
> >
> >
> >
> > ---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote ----
> >
> > > > Can we somehow keep this out of the TaskManager services
> > > I fear that we could not. IMO, the GPUManager(or
> > > ExternalServicesManagers in future) is conceptually one of the task
> > > manager services, just like MemoryManager before 1.10.
> > > - It maintains/holds the GPU resource at TM level and all of the
> > > operators allocate the GPU resources from it. So, it should be
> > > exclusive to a single TaskExecutor.
> > > - We could add a collection called ExternalResourceManagers to hold
> > > all managers of other external resources in the future.
> > >
> >
> > Can you help me understand why this needs the addition in
> TaskManagerServices
> > or in the RuntimeContext?
> > Are you worried about the case when multiple Task Executors run in the
> same
> > JVM? That's not common, but wouldn't it actually be good in that case to
> > share the GPU Manager, given that the GPU is shared?
> >
> > Thanks,
> > Stephan
> >
> > ---------------------------
> >
> >
> > > What parts need information about this?
> > > In this FLIP, operators need the information. Thus, we expose GPU
> > > information to the RuntimeContext/FunctionContext. The slot profile is
> > > not aware of GPU resources as GPU is TM level resource now.
> > >
> > > > Can the GPU Manager be a "self contained" thing that simply takes the
> > > configuration, and then abstracts everything internally?
> > > Yes, we just pass the path/args of the discover script and how many
> > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > information and expose them to the RuntimeContext/FunctionContext of
> > > Operators. Meanwhile, we'd better not allow operators to directly
> > > access GPUManager, it should get what they want from Context. We could
> > > then decouple the interface/implementation of GPUManager and Public
> > > API.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <[hidden email]> wrote:
> > > >
> > > > It sounds fine to initially start with GPU specific support and think
> > > about
> > > > generalizing this once we better understand the space.
> > > >
> > > > About the implementation suggested in FLIP-108:
> > > > - Can we somehow keep this out of the TaskManager services? Anything
> we
> > > > have to pull through all layers of the TM makes the TM components yet
> > > more
> > > > complex and harder to maintain.
> > > >
> > > > - What parts need information about this?
> > > > -> do the slot profiles need information about the GPU?
> > > > -> Can the GPU Manager be a "self contained" thing that simply takes
> > > > the configuration, and then abstracts everything internally?
> Operators
> > > can
> > > > access it via "GPUManager.get()" or so?
> > > >
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <[hidden email]>
> wrote:
> > > >
> > > > > Thanks for all the feedbacks.
> > > > >
> > > > > @Becket
> > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > > > Public API section.
> > > > >
> > > > >
> > > > > @Stephan @Becket
> > > > > Regarding the general extended resource mechanism, I second
> Xintong's
> > > > > suggestion.
> > > > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > > > supporting fine-grained GPU scheduling. As a first step proposal, I
> > > > > prefer to not include it in the scope of this FLIP.
> > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > correctly, it just a code refactoring atm, we could extract the
> > > > > open/close/allocateExtendResources of GPUManager to that
> interface. If
> > > > > that is the case, +1 to do it during implementation.
> > > > >
> > > > > @Xingbo
> > > > > As Xintong said, we looked into how Spark supports a general
> "Custom
> > > > > Resource Scheduling" before and decided to introduce a common
> resource
> > > > > configuration
> > > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > to make it more extensible. I think the "resource" is a proper
> level
> > > > > to contain all the configs of extended resources.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <[hidden email]>
> > > wrote:
> > > > > >
> > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > >
> > > > > > There is no doubt that GPU resource management support will
> greatly
> > > > > > facilitate the development of AI-related applications by PyFlink
> > > users.
> > > > > >
> > > > > > I have only one comment about this wiki:
> > > > > >
> > > > > > Regarding the names of several GPU configurations, I think it is
> > > better
> > > > > to
> > > > > > delete the resource field makes it consistent with the names of
> other
> > > > > > resource-related configurations in TaskManagerOption.
> > > > > >
> > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > taskmanager.gpu.discovery-script.path
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xingbo
> > > > > >
> > > > > >
> > > > > > Xintong Song <[hidden email]> 于2020年3月4日周三 上午10:39写道:
> > > > > >
> > > > > > > @Stephan, @Becket,
> > > > > > >
> > > > > > > Actually, Yangze, Yang and I also had an offline discussion
> about
> > > > > making
> > > > > > > the "GPU Support" as some general "Extended Resource Support".
> We
> > > > > believe
> > > > > > > supporting extended resources in a general mechanism is
> definitely
> > > a
> > > > > good
> > > > > > > and extensible way. The reason we propose this FLIP narrowing
> its
> > > scope
> > > > > > > down to GPU alone, is mainly for the concern on extra efforts
> and
> > > > > review
> > > > > > > capacity needed for a general mechanism.
> > > > > > >
> > > > > > > To come up with a well design on a general extended resource
> > > management
> > > > > > > mechanism, we would need to investigate more on how people use
> > > > > different
> > > > > > > kind of resources in practice. For GPU, we learnt such
> knowledge
> > > from
> > > > > the
> > > > > > > experts, Becket and his team members. But for FPGA, or other
> > > potential
> > > > > > > extended resources, we don't have such convenient information
> > > sources,
> > > > > > > making the investigation requires more efforts, which I tend to
> > > think
> > > > > is
> > > > > > > not necessary atm.
> > > > > > >
> > > > > > > On the other hand, we also looked into how Spark supports a
> general
> > > > > "Custom
> > > > > > > Resource Scheduling". Assuming we want to have a similar
> general
> > > > > extended
> > > > > > > resource mechanism in the future, we believe that the current
> GPU
> > > > > support
> > > > > > > design can be easily extended, in an incremental way without
> too
> > > many
> > > > > > > reworks.
> > > > > > >
> > > > > > > - The most important part is probably user interfaces. Spark
> > > offers
> > > > > > > configuration options to define the amount, discovery script
> and
> > > > > vendor
> > > > > > > (on
> > > > > > > k8s) in a per resource type bias [1], which is very similar to
> > > what
> > > > > we
> > > > > > > proposed in this FLIP. I think it's not necessary to expose
> > > config
> > > > > > > options
> > > > > > > in the general way atm, since we do not have supports for other
> > > > > resource
> > > > > > > types now. If later we decided to have per resource type config
> > > > > > > options, we
> > > > > > > can have backwards compatibility on the current proposed
> options
> > > > > with
> > > > > > > simple key mapping.
> > > > > > > - For the GPU Manager, if later needed we can change it to a
> > > > > "Extended
> > > > > > > Resource Manager" (or whatever it is called). That should be a
> > > pure
> > > > > > > component-internal refactoring.
> > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > fields for
> > > > > > > general extended resource. We can of course leverage them when
> > > > > > > supporting
> > > > > > > fine grained GPU scheduling. That is also not in the scope of
> > > this
> > > > > first
> > > > > > > step proposal, and would require FLIP-56 to be finished first.
> > > > > > >
> > > > > > > To summary up, I agree with Becket that have a separate FLIP
> for
> > > the
> > > > > > > general extended resource mechanism, and keep it in mind when
> > > > > discussing
> > > > > > > and implementing the current one.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> [hidden email]>
> > > > > wrote:
> > > > > > >
> > > > > > > > That's a good point, Stephan. It makes total sense to
> generalize
> > > the
> > > > > > > > resource management to support custom resources. Having that
> > > allows
> > > > > users
> > > > > > > > to add new resources by themselves. The general resource
> > > management
> > > > > may
> > > > > > > > involve two different aspects:
> > > > > > > >
> > > > > > > > 1. The custom resource type definition. It is supported by
> the
> > > > > extended
> > > > > > > > resources in ResourceProfile and ResourceSpec. This will
> likely
> > > cover
> > > > > > > > majority of the cases.
> > > > > > > >
> > > > > > > > 2. The custom resource allocation logic, i.e. how to assign
> the
> > > > > resources
> > > > > > > > to different tasks, operators, and so on. This may require
> two
> > > > > levels /
> > > > > > > > steps:
> > > > > > > > a. Subtask level - make sure the subtasks are put into
> > > suitable
> > > > > > > slots.
> > > > > > > > It is done by the global RM and is not customizable right
> now.
> > > > > > > > b. Operator level - map the exact resource to the operators
> > > in
> > > > > TM.
> > > > > > > e.g.
> > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is
> needed
> > > > > assuming
> > > > > > > > the global RM does not distinguish individual resources of
> the
> > > same
> > > > > type.
> > > > > > > > It is true for memory, but not for GPU.
> > > > > > > >
> > > > > > > > The GPU manager is designed to do 2.b here. So it should
> > > discover the
> > > > > > > > physical GPU information and bind/match them to each
> operators.
> > > > > Making
> > > > > > > this
> > > > > > > > general will fill in the missing piece to support custom
> resource
> > > > > type
> > > > > > > > definition. But I'd avoid calling it a "External Resource
> > > Manager" to
> > > > > > > avoid
> > > > > > > > confusion with RM, maybe something like "Operator Resource
> > > Assigner"
> > > > > > > would
> > > > > > > > be more accurate. So for each resource type users can have an
> > > > > optional
> > > > > > > > "Operator Resource Assigner" in the TM. For memory, users
> don't
> > > need
> > > > > > > this,
> > > > > > > > but for other extended resources, users may need that.
> > > > > > > >
> > > > > > > > Personally I think a pluggable "Operator Resource Assigner"
> is
> > > > > achievable
> > > > > > > > in this FLIP. But I am also OK with having that in a separate
> > > FLIP
> > > > > > > because
> > > > > > > > the interface between the "Operator Resource Assigner" and
> > > operator
> > > > > may
> > > > > > > > take a while to settle down if we want to make it generic.
> But I
> > > > > think
> > > > > > > our
> > > > > > > > implementation should take this future work into
> consideration so
> > > > > that we
> > > > > > > > don't need to break backwards compatibility once we have
> that.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> [hidden email]>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > >
> > > > > > > > > I cannot really give much input into the mechanics of
> GPU-aware
> > > > > > > > scheduling
> > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > >
> > > > > > > > > One thought I had when reading the proposal is if it makes
> > > sense to
> > > > > > > look
> > > > > > > > at
> > > > > > > > > the "GPU Manager" as an "External Resource Manager", and
> GPU
> > > is one
> > > > > > > such
> > > > > > > > > resource.
> > > > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> > > that is
> > > > > how
> > > > > > > it
> > > > > > > > > is done there.
> > > > > > > > > It has the advantage that it looks more extensible. Maybe
> > > there is
> > > > > a
> > > > > > > GPU
> > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > > Resource, a
> > > > > > > Alibaba
> > > > > > > > > TPU Resource, etc.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stephan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > [hidden email]>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management
> support
> > > is a
> > > > > > > > > must-have
> > > > > > > > > > for machine learning use cases. Actually it is one of the
> > > mostly
> > > > > > > asked
> > > > > > > > > > question from the users who are interested in using Flink
> > > for ML.
> > > > > > > > > >
> > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > 1. The WebUI / REST API should probably also be
> mentioned in
> > > the
> > > > > > > public
> > > > > > > > > > interface section.
> > > > > > > > > > 2. Is the data structure that holds GPU info also a
> public
> > > API?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > [hidden email]>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > > discussion,
> > > > > > > Yangze.
> > > > > > > > > > >
> > > > > > > > > > > Big +1 for this feature. Supporting using of GPU in
> Flink
> > > is
> > > > > > > > > significant,
> > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to
> me. I
> > > > > think
> > > > > > > > it's a
> > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > [hidden email]
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > We would like to start a discussion thread on
> "FLIP-108:
> > > Add
> > > > > GPU
> > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > >
> > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > >
> > > > > > > > > > > > - Enable user to configure how many GPUs in a task
> > > executor
> > > > > and
> > > > > > > > > > > > forward such requirements to the external resource
> > > managers
> > > > > (for
> > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > - Provide information of available GPU resources to
> > > > > operators.
> > > > > > > > > > > >
> > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > - Forward GPU resource requirements to
> Yarn/Kubernetes.
> > > > > > > > > > > > - Introduce GPUManager as one of the task manager
> > > services to
> > > > > > > > > discover
> > > > > > > > > > > > and expose GPU resource information to the context of
> > > > > functions.
> > > > > > > > > > > > - Introduce the default script for GPU discovery, in
> > > which we
> > > > > > > > provide
> > > > > > > > > > > > the privilege mode to help user to achieve
> worker-level
> > > > > isolation
> > > > > > > > in
> > > > > > > > > > > > standalone mode.
> > > > > > > > > > > >
> > > > > > > > > > > > Please find more details in the FLIP wiki document
> [1].
> > > > > Looking
> > > > > > > > > forward
> > > > > > > > > > > to
> > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Becket Qin
It probably makes sense for us to first agree on the final state. More
specifically, will the resource info eventually be exposed through the
runtime context?

If that is the final state and we have a seamless migration story from this
FLIP to that final state, I personally think it is OK to expose the GPU
info in the runtime context.

Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <[hidden email]> wrote:

> @Yangze,
> I think what Stephan means (@Stephan, please correct me if I'm wrong) is
> that, we might not need to hold and maintain the GPUManager as a service in
> TaskManagerServices or RuntimeContext. An alternative is to create /
> retrieve the GPUManager only in the operators that need it, e.g., with a
> static method `GPUManager.get()`.
>
> @Stephan,
> I agree with you on excluding GPUManager from TaskManagerServices.
>
>    - For the first step, where we provide unified TM-level GPU information
>    to all operators, it should be fine to have operators access /
>    lazy-initiate GPUManager by themselves.
>    - In future, we might have some more fine-grained GPU management, where
>    we need to maintain GPUManager as a service and put GPU info in slot
>    profiles. But at least for now it's not necessary to introduce such
>    complexity.
>
> However, I have some concerns on excluding GPUManager from RuntimeContext
> and let operators access it directly.
>
>    - Configurations needed for creating the GPUManager is not always
>    available for operators.
>    - If later we want to have fine-grained control over GPU (e.g.,
>    operators in each slot can only see GPUs reserved for that slot), the
>    approach cannot be easily extended.
>
> I would suggest to wrap the GPUManager behind RuntimeContext and only
> expose the GPUInfo to users. For now, we can declare a method
> `getGPUInfo()` in RuntimeContext, with a default definition that calls
> `GPUManager.get()` to get the lazily-created GPUManager. If later we want
> to create / retrieve GPUManager in a different way, we can simply change
> how `getGPUInfo` is implemented, without needing to change any public
> interfaces.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <[hidden email]> wrote:
>
> > @Stephan
> > Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
> > in such scenario.
> > If that's what you worry about, I'm +1 for holding
> > GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> > TaskManagerServices.
> >
> > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > info instead of the GPU Manager. AFAIK, it's the only place we could
> > pass GPU info to the RichFunction/UserDefinedFunction.
> >
> > Best,
> > Yangze Guo
> >
> > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <[hidden email]>
> > wrote:
> > >
> > >
> > >
> > >
> > >
> > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote ----
> > >
> > > > > Can we somehow keep this out of the TaskManager services
> > > > I fear that we could not. IMO, the GPUManager(or
> > > > ExternalServicesManagers in future) is conceptually one of the task
> > > > manager services, just like MemoryManager before 1.10.
> > > > - It maintains/holds the GPU resource at TM level and all of the
> > > > operators allocate the GPU resources from it. So, it should be
> > > > exclusive to a single TaskExecutor.
> > > > - We could add a collection called ExternalResourceManagers to hold
> > > > all managers of other external resources in the future.
> > > >
> > >
> > > Can you help me understand why this needs the addition in
> > TaskManagerServices
> > > or in the RuntimeContext?
> > > Are you worried about the case when multiple Task Executors run in the
> > same
> > > JVM? That's not common, but wouldn't it actually be good in that case
> to
> > > share the GPU Manager, given that the GPU is shared?
> > >
> > > Thanks,
> > > Stephan
> > >
> > > ---------------------------
> > >
> > >
> > > > What parts need information about this?
> > > > In this FLIP, operators need the information. Thus, we expose GPU
> > > > information to the RuntimeContext/FunctionContext. The slot profile
> is
> > > > not aware of GPU resources as GPU is TM level resource now.
> > > >
> > > > > Can the GPU Manager be a "self contained" thing that simply takes
> the
> > > > configuration, and then abstracts everything internally?
> > > > Yes, we just pass the path/args of the discover script and how many
> > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > information and expose them to the RuntimeContext/FunctionContext of
> > > > Operators. Meanwhile, we'd better not allow operators to directly
> > > > access GPUManager, it should get what they want from Context. We
> could
> > > > then decouple the interface/implementation of GPUManager and Public
> > > > API.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <[hidden email]>
> wrote:
> > > > >
> > > > > It sounds fine to initially start with GPU specific support and
> think
> > > > about
> > > > > generalizing this once we better understand the space.
> > > > >
> > > > > About the implementation suggested in FLIP-108:
> > > > > - Can we somehow keep this out of the TaskManager services?
> Anything
> > we
> > > > > have to pull through all layers of the TM makes the TM components
> yet
> > > > more
> > > > > complex and harder to maintain.
> > > > >
> > > > > - What parts need information about this?
> > > > > -> do the slot profiles need information about the GPU?
> > > > > -> Can the GPU Manager be a "self contained" thing that simply
> takes
> > > > > the configuration, and then abstracts everything internally?
> > Operators
> > > > can
> > > > > access it via "GPUManager.get()" or so?
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <[hidden email]>
> > wrote:
> > > > >
> > > > > > Thanks for all the feedbacks.
> > > > > >
> > > > > > @Becket
> > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to
> the
> > > > > > Public API section.
> > > > > >
> > > > > >
> > > > > > @Stephan @Becket
> > > > > > Regarding the general extended resource mechanism, I second
> > Xintong's
> > > > > > suggestion.
> > > > > > - It's better to leverage ResourceProfile and ResourceSpec after
> we
> > > > > > supporting fine-grained GPU scheduling. As a first step
> proposal, I
> > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > correctly, it just a code refactoring atm, we could extract the
> > > > > > open/close/allocateExtendResources of GPUManager to that
> > interface. If
> > > > > > that is the case, +1 to do it during implementation.
> > > > > >
> > > > > > @Xingbo
> > > > > > As Xintong said, we looked into how Spark supports a general
> > "Custom
> > > > > > Resource Scheduling" before and decided to introduce a common
> > resource
> > > > > > configuration
> > > > > >
> schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > to make it more extensible. I think the "resource" is a proper
> > level
> > > > > > to contain all the configs of extended resources.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <[hidden email]
> >
> > > > wrote:
> > > > > > >
> > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > >
> > > > > > > There is no doubt that GPU resource management support will
> > greatly
> > > > > > > facilitate the development of AI-related applications by
> PyFlink
> > > > users.
> > > > > > >
> > > > > > > I have only one comment about this wiki:
> > > > > > >
> > > > > > > Regarding the names of several GPU configurations, I think it
> is
> > > > better
> > > > > > to
> > > > > > > delete the resource field makes it consistent with the names of
> > other
> > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > >
> > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xingbo
> > > > > > >
> > > > > > >
> > > > > > > Xintong Song <[hidden email]> 于2020年3月4日周三 上午10:39写道:
> > > > > > >
> > > > > > > > @Stephan, @Becket,
> > > > > > > >
> > > > > > > > Actually, Yangze, Yang and I also had an offline discussion
> > about
> > > > > > making
> > > > > > > > the "GPU Support" as some general "Extended Resource
> Support".
> > We
> > > > > > believe
> > > > > > > > supporting extended resources in a general mechanism is
> > definitely
> > > > a
> > > > > > good
> > > > > > > > and extensible way. The reason we propose this FLIP narrowing
> > its
> > > > scope
> > > > > > > > down to GPU alone, is mainly for the concern on extra efforts
> > and
> > > > > > review
> > > > > > > > capacity needed for a general mechanism.
> > > > > > > >
> > > > > > > > To come up with a well design on a general extended resource
> > > > management
> > > > > > > > mechanism, we would need to investigate more on how people
> use
> > > > > > different
> > > > > > > > kind of resources in practice. For GPU, we learnt such
> > knowledge
> > > > from
> > > > > > the
> > > > > > > > experts, Becket and his team members. But for FPGA, or other
> > > > potential
> > > > > > > > extended resources, we don't have such convenient information
> > > > sources,
> > > > > > > > making the investigation requires more efforts, which I tend
> to
> > > > think
> > > > > > is
> > > > > > > > not necessary atm.
> > > > > > > >
> > > > > > > > On the other hand, we also looked into how Spark supports a
> > general
> > > > > > "Custom
> > > > > > > > Resource Scheduling". Assuming we want to have a similar
> > general
> > > > > > extended
> > > > > > > > resource mechanism in the future, we believe that the current
> > GPU
> > > > > > support
> > > > > > > > design can be easily extended, in an incremental way without
> > too
> > > > many
> > > > > > > > reworks.
> > > > > > > >
> > > > > > > > - The most important part is probably user interfaces. Spark
> > > > offers
> > > > > > > > configuration options to define the amount, discovery script
> > and
> > > > > > vendor
> > > > > > > > (on
> > > > > > > > k8s) in a per resource type bias [1], which is very similar
> to
> > > > what
> > > > > > we
> > > > > > > > proposed in this FLIP. I think it's not necessary to expose
> > > > config
> > > > > > > > options
> > > > > > > > in the general way atm, since we do not have supports for
> other
> > > > > > resource
> > > > > > > > types now. If later we decided to have per resource type
> config
> > > > > > > > options, we
> > > > > > > > can have backwards compatibility on the current proposed
> > options
> > > > > > with
> > > > > > > > simple key mapping.
> > > > > > > > - For the GPU Manager, if later needed we can change it to a
> > > > > > "Extended
> > > > > > > > Resource Manager" (or whatever it is called). That should be
> a
> > > > pure
> > > > > > > > component-internal refactoring.
> > > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > > fields for
> > > > > > > > general extended resource. We can of course leverage them
> when
> > > > > > > > supporting
> > > > > > > > fine grained GPU scheduling. That is also not in the scope of
> > > > this
> > > > > > first
> > > > > > > > step proposal, and would require FLIP-56 to be finished
> first.
> > > > > > > >
> > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP
> > for
> > > > the
> > > > > > > > general extended resource mechanism, and keep it in mind when
> > > > > > discussing
> > > > > > > > and implementing the current one.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > >
> > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > [hidden email]>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > That's a good point, Stephan. It makes total sense to
> > generalize
> > > > the
> > > > > > > > > resource management to support custom resources. Having
> that
> > > > allows
> > > > > > users
> > > > > > > > > to add new resources by themselves. The general resource
> > > > management
> > > > > > may
> > > > > > > > > involve two different aspects:
> > > > > > > > >
> > > > > > > > > 1. The custom resource type definition. It is supported by
> > the
> > > > > > extended
> > > > > > > > > resources in ResourceProfile and ResourceSpec. This will
> > likely
> > > > cover
> > > > > > > > > majority of the cases.
> > > > > > > > >
> > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign
> > the
> > > > > > resources
> > > > > > > > > to different tasks, operators, and so on. This may require
> > two
> > > > > > levels /
> > > > > > > > > steps:
> > > > > > > > > a. Subtask level - make sure the subtasks are put into
> > > > suitable
> > > > > > > > slots.
> > > > > > > > > It is done by the global RM and is not customizable right
> > now.
> > > > > > > > > b. Operator level - map the exact resource to the operators
> > > > in
> > > > > > TM.
> > > > > > > > e.g.
> > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is
> > needed
> > > > > > assuming
> > > > > > > > > the global RM does not distinguish individual resources of
> > the
> > > > same
> > > > > > type.
> > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > >
> > > > > > > > > The GPU manager is designed to do 2.b here. So it should
> > > > discover the
> > > > > > > > > physical GPU information and bind/match them to each
> > operators.
> > > > > > Making
> > > > > > > > this
> > > > > > > > > general will fill in the missing piece to support custom
> > resource
> > > > > > type
> > > > > > > > > definition. But I'd avoid calling it an "External Resource
> > > > Manager" to
> > > > > > > > avoid
> > > > > > > > > confusion with RM, maybe something like "Operator Resource
> > > > Assigner"
> > > > > > > > would
> > > > > > > > > be more accurate. So for each resource type users can have
> an
> > > > > > optional
> > > > > > > > > "Operator Resource Assigner" in the TM. For memory, users
> > don't
> > > > need
> > > > > > > > this,
> > > > > > > > > but for other extended resources, users may need that.
> > > > > > > > >
> > > > > > > > > Personally I think a pluggable "Operator Resource Assigner"
> > is
> > > > > > achievable
> > > > > > > > > in this FLIP. But I am also OK with having that in a
> separate
> > > > FLIP
> > > > > > > > because
> > > > > > > > > the interface between the "Operator Resource Assigner" and
> > > > operator
> > > > > > may
> > > > > > > > > take a while to settle down if we want to make it generic.
> > But I
> > > > > > think
> > > > > > > > our
> > > > > > > > > implementation should take this future work into
> > consideration so
> > > > > > that we
> > > > > > > > > don't need to break backwards compatibility once we have
> > that.
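The "Operator Resource Assigner" idea quoted above (step 2.b) could, purely as an illustration with hypothetical names, look like the following sketch; nothing here is an actual Flink interface:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a pluggable, per-resource-type assigner that maps
// concrete resource units (e.g. GPU indices) to operators within one TM.
interface OperatorResourceAssigner<R> {
    Map<String, List<R>> assign(List<String> operatorIds, List<R> discoveredUnits);
}

// One possible strategy: distribute the units round-robin over the operators.
final class RoundRobinAssigner<R> implements OperatorResourceAssigner<R> {
    @Override
    public Map<String, List<R>> assign(List<String> operatorIds, List<R> units) {
        Map<String, List<R>> result = new HashMap<>();
        for (String id : operatorIds) {
            result.put(id, new ArrayList<>());
        }
        for (int i = 0; i < units.size(); i++) {
            result.get(operatorIds.get(i % operatorIds.size())).add(units.get(i));
        }
        return result;
    }
}

public class AssignerSketch {
    public static void main(String[] args) {
        OperatorResourceAssigner<Integer> assigner = new RoundRobinAssigner<>();
        // Three GPUs, two operators: A gets GPUs 0 and 2, B gets GPU 1.
        System.out.println(
                assigner.assign(Arrays.asList("A", "B"), Arrays.asList(0, 1, 2)));
    }
}
```

Memory needs no such assigner because the RM does not distinguish individual units of it, which is exactly the asymmetry the quoted message points out.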
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > [hidden email]>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > >
> > > > > > > > > > I cannot really give much input into the mechanics of
> > GPU-aware
> > > > > > > > > scheduling
> > > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > > >
> > > > > > > > > > One thought I had when reading the proposal is if it
> makes
> > > > sense to
> > > > > > > > look
> > > > > > > > > at
> > > > > > > > > > the "GPU Manager" as an "External Resource Manager", and
> > GPU
> > > > is one
> > > > > > > > such
> > > > > > > > > > resource.
> > > > > > > > > > The way I understand the ResourceProfile and
> ResourceSpec,
> > > > that is
> > > > > > how
> > > > > > > > it
> > > > > > > > > > is done there.
> > > > > > > > > > It has the advantage that it looks more extensible. Maybe
> > > > there is
> > > > > > a
> > > > > > > > GPU
> > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > > > Resource, a
> > > > > > > > Alibaba
> > > > > > > > > > TPU Resource, etc.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stephan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > [hidden email]>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management
> > support
> > > > is a
> > > > > > > > > > must-have
> > > > > > > > > > > for machine learning use cases. Actually it is one of
> the
> > > > mostly
> > > > > > > > asked
> > > > > > > > > > > question from the users who are interested in using
> Flink
> > > > for ML.
> > > > > > > > > > >
> > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > mentioned in
> > > > the
> > > > > > > > public
> > > > > > > > > > > interface section.
> > > > > > > > > > > 2. Is the data structure that holds GPU info also a
> > public
> > > > API?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > >

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Xintong Song
Thanks for the feedback, Becket.

IMO, eventually an operator should only see info of the GPUs that are
dedicated to it, instead of all GPUs on the machine/container as in the
current design. It does not make sense to make the user who writes a UDF
worry about coordination among multiple operators running on the same
machine. And if we want to limit the GPU info an operator sees, we should
not let the operator instantiate the GPUManager, which means we have to
expose something through the runtime context: either GPU info or some kind
of limited access to the GPUManager.
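As a rough illustration of this approach (exposing only GPU info through the runtime context while keeping the manager internal), here is a minimal, self-contained sketch. All class and method names are hypothetical stand-ins, not the actual Flink `RuntimeContext` or `GPUManager` APIs:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: the operator-facing surface is limited to GPU
// info, while the manager itself stays internal to the runtime.
final class GPUInfo {
    private final int index; // physical GPU index on the machine

    GPUInfo(int index) {
        this.index = index;
    }

    int getIndex() {
        return index;
    }
}

final class GPUManager {
    private static GPUManager instance;
    private final List<GPUInfo> gpus;

    private GPUManager(List<GPUInfo> gpus) {
        this.gpus = gpus;
    }

    // Lazily created on first access, as suggested in the thread. A real
    // implementation would run the discovery script here instead of
    // stubbing a single GPU.
    static synchronized GPUManager get() {
        if (instance == null) {
            instance = new GPUManager(Collections.singletonList(new GPUInfo(0)));
        }
        return instance;
    }

    List<GPUInfo> getGpuInfos() {
        return gpus;
    }
}

interface RuntimeContext {
    // Default implementation delegates to the lazily created manager; if the
    // creation strategy changes later, only this default needs to change.
    default List<GPUInfo> getGPUInfo() {
        return GPUManager.get().getGpuInfos();
    }
}

public class GpuContextSketch {
    public static void main(String[] args) {
        RuntimeContext ctx = new RuntimeContext() {
        };
        System.out.println(ctx.getGPUInfo().size()); // prints 1 for the stubbed single GPU
    }
}
```

With this shape, operators never touch the GPUManager directly, so limiting what an operator sees later (e.g. only the GPUs of its slot) only requires changing the default method, not any public interface.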

Thank you~

Xintong Song



On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <[hidden email]> wrote:

> It probably makes sense for us to first agree on the final state. More
> specifically, will the resource info be exposed through the runtime context
> eventually?
>
> If that is the final state and we have a seamless migration story from this
> FLIP to that final state, personally I think it is OK to expose the GPU
> info in the runtime context.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <[hidden email]>
> wrote:
>
> > @Yangze,
> > I think what Stephan means (@Stephan, please correct me if I'm wrong) is
> > that, we might not need to hold and maintain the GPUManager as a service
> in
> > TaskManagerServices or RuntimeContext. An alternative is to create /
> > retrieve the GPUManager only in the operators that need it, e.g., with a
> > static method `GPUManager.get()`.
> >
> > @Stephan,
> > I agree with you on excluding GPUManager from TaskManagerServices.
> >
> >    - For the first step, where we provide unified TM-level GPU
> information
> >    to all operators, it should be fine to have operators access /
> >    lazy-initiate GPUManager by themselves.
> >    - In future, we might have some more fine-grained GPU management,
> where
> >    we need to maintain GPUManager as a service and put GPU info in slot
> >    profiles. But at least for now it's not necessary to introduce such
> >    complexity.
> >
> > However, I have some concerns on excluding GPUManager from RuntimeContext
> > and let operators access it directly.
> >
> >    - Configurations needed for creating the GPUManager is not always
> >    available for operators.
> >    - If later we want to have fine-grained control over GPU (e.g.,
> >    operators in each slot can only see GPUs reserved for that slot), the
> >    approach cannot be easily extended.
> >
> > I would suggest wrapping the GPUManager behind the RuntimeContext and only
> > exposing the GPUInfo to users. For now, we can declare a method
> > `getGPUInfo()` in RuntimeContext, with a default implementation that calls
> > `GPUManager.get()` to get the lazily-created GPUManager. If later we want
> > to create / retrieve the GPUManager in a different way, we can simply change
> > how `getGPUInfo` is implemented, without needing to change any public
> > interfaces.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <[hidden email]> wrote:
> >
> > > @Stephan
> > > Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
> > > in such scenario.
> > > If that's what you worry about, I'm +1 for holding
> > > GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> > > TaskManagerServices.
> > >
> > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > info instead of the GPU Manager. AFAIK, it's the only place we could
> > > pass GPU info to the RichFunction/UserDefinedFunction.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <[hidden email]>
> > > wrote:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote ----
> > > >
> > > > > > Can we somehow keep this out of the TaskManager services
> > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > ExternalServicesManagers in future) is conceptually one of the task
> > > > > manager services, just like MemoryManager before 1.10.
> > > > > - It maintains/holds the GPU resource at TM level and all of the
> > > > > operators allocate the GPU resources from it. So, it should be
> > > > > exclusive to a single TaskExecutor.
> > > > > - We could add a collection called ExternalResourceManagers to hold
> > > > > all managers of other external resources in the future.
> > > > >
> > > >
> > > > Can you help me understand why this needs the addition in
> > > TaskMagerServices
> > > > or in the RuntimeContext?
> > > > Are you worried about the case when multiple Task Executors run in
> the
> > > same
> > > > JVM? That's not common, but wouldn't it actually be good in that case
> > to
> > > > share the GPU Manager, given that the GPU is shared?
> > > >
> > > > Thanks,
> > > > Stephan
> > > >
> > > > ---------------------------
> > > >
> > > >
> > > > > What parts need information about this?
> > > > > In this FLIP, operators need the information. Thus, we expose GPU
> > > > > information to the RuntimeContext/FunctionContext. The slot profile
> > is
> > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > >
> > > > > > Can the GPU Manager be a "self contained" thing that simply takes
> > the
> > > > > configuration, and then abstracts everything internally?
> > > > > Yes, we just pass the path/args of the discover script and how many
> > > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > > information and expose them to the RuntimeContext/FunctionContext
> of
> > > > > Operators. Meanwhile, we'd better not allow operators to directly
> > > > > access the GPUManager; they should get what they want from the Context. We
> > could
> > > > > then decouple the interface/implementation of GPUManager and Public
> > > > > API.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <[hidden email]>
> > wrote:
> > > > > >
> > > > > > It sounds fine to initially start with GPU specific support and
> > think
> > > > > about
> > > > > > generalizing this once we better understand the space.
> > > > > >
> > > > > > About the implementation suggested in FLIP-108:
> > > > > > - Can we somehow keep this out of the TaskManager services?
> > Anything
> > > we
> > > > > > have to pull through all layers of the TM makes the TM components
> > yet
> > > > > more
> > > > > > complex and harder to maintain.
> > > > > >
> > > > > > - What parts need information about this?
> > > > > > -> do the slot profiles need information about the GPU?
> > > > > > -> Can the GPU Manager be a "self contained" thing that simply
> > takes
> > > > > > the configuration, and then abstracts everything internally?
> > > Operators
> > > > > can
> > > > > > access it via "GPUManager.get()" or so?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <[hidden email]>
> > > wrote:
> > > > > >
> > > > > > > Thanks for all the feedbacks.
> > > > > > >
> > > > > > > @Becket
> > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to
> > the
> > > > > > > Public API section.
> > > > > > >
> > > > > > >
> > > > > > > @Stephan @Becket
> > > > > > > Regarding the general extended resource mechanism, I second
> > > Xintong's
> > > > > > > suggestion.
> > > > > > > - It's better to leverage ResourceProfile and ResourceSpec
> after
> > we
> > > > > > > supporting fine-grained GPU scheduling. As a first step
> > proposal, I
> > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > correctly, it is just a code refactoring atm; we could extract the
> > > > > > > open/close/allocateExtendResources of GPUManager to that
> > > interface. If
> > > > > > > that is the case, +1 to do it during implementation.
> > > > > > >
> > > > > > > @Xingbo
> > > > > > > As Xintong said, we looked into how Spark supports a general
> > > "Custom
> > > > > > > Resource Scheduling" before and decided to introduce a common
> > > resource
> > > > > > > configuration
> > > > > > >
> > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > to make it more extensible. I think the "resource" is a proper
> > > level
> > > > > > > to contain all the configs of extended resources.
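Under the schema Yangze describes, the GPU-related entries in `flink-conf.yaml` might look roughly like the following. These keys and values are illustrative only (the exact option names, and the `args` key in particular, are assumptions; the options were still under discussion in this thread):

```yaml
taskmanager.resource.gpu.amount: 2
taskmanager.resource.gpu.discovery-script.path: /opt/flink/bin/gpu-discovery.sh
taskmanager.resource.gpu.discovery-script.args: --enable-privileged-mode
```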
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> [hidden email]
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > >
> > > > > > > > There is no doubt that GPU resource management support will
> > > greatly
> > > > > > > > facilitate the development of AI-related applications by
> > PyFlink
> > > > > users.
> > > > > > > >
> > > > > > > > I have only one comment about this wiki:
> > > > > > > >
> > > > > > > > Regarding the names of several GPU configurations, I think it
> > is
> > > > > better
> > > > > > > to
> > > > > > > > delete the resource field, making it consistent with the names
> of
> > > other
> > > > > > > > resource-related configurations in TaskManagerOptions.
> > > > > > > >
> > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Xingbo
> > > > > > > >
> > > > > > > >
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Stephan Ewen
Hi all!

The main point I wanted to throw into the discussion is the following:
  - With more and more use cases, more and more tools go into Flink
  - If everything becomes a "core feature", it will make the project hard
to develop in the future. Thinking "library" / "plugin" / "extension" style
where possible helps.

  - A good thought experiment is always: how many future developers will have
to interact with this code (and possibly understand it partially), even if
the features they touch have nothing to do with GPU support? If many
contributors to unrelated features will have to touch and understand it,
then let's think about whether there is a different solution. Maybe there is
not, but then we should be sure why.

  - That led me to raise this issue: if the GPU manager becomes a core
service in the TaskManager, Environment, RuntimeContext, etc., then everyone
developing TM and streaming tasks needs to understand the GPU manager. That
seems oddly specific, is my impression.

Access to configuration does not seem like the right reason to do that. We
should expose the Flink configuration from the RuntimeContext anyway.

If GPUs are sliced and assigned during scheduling, there may be a reason,
although it looks like it would belong to the slot then. Is that what we
are doing here?
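One way to keep such a manager out of the core TaskManager services, in the
"library" / "plugin" style mentioned above, is a self-contained, lazily
initialized component that only the operators needing it ever touch. Below
is a minimal sketch of that idea; the class name, the static accessor, and
the "gpu.ids" configuration key are all hypothetical, not actual Flink API:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not actual Flink API: a self-contained GPU manager
// that operators obtain through a static accessor, so nothing needs to be
// threaded through TaskManagerServices or the RuntimeContext.
public final class GPUManager {

    private static volatile GPUManager instance;

    private final List<String> gpuIds;

    private GPUManager(Map<String, String> config) {
        // A real implementation would run the configured discovery script;
        // this sketch just reads a comma-separated "gpu.ids" entry.
        String ids = config.getOrDefault("gpu.ids", "");
        this.gpuIds = ids.isEmpty()
                ? Collections.emptyList()
                : List.of(ids.split(","));
    }

    /** Lazily creates the manager; later calls ignore the configuration. */
    public static GPUManager get(Map<String, String> config) {
        if (instance == null) {
            synchronized (GPUManager.class) {
                if (instance == null) {
                    instance = new GPUManager(config);
                }
            }
        }
        return instance;
    }

    public List<String> getAvailableGpuIds() {
        return gpuIds;
    }

    public static void main(String[] args) {
        GPUManager manager = GPUManager.get(Map.of("gpu.ids", "0,1"));
        System.out.println(manager.getAvailableGpuIds()); // prints [0, 1]
    }
}
```

With this shape, components that never use GPUs never see the manager; only
code that calls the accessor depends on it.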

Best,
Stephan



Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Becket Qin
Thanks for the comment, Stephan.

  - If everything becomes a "core feature", it will make the project hard
> to develop in the future. Thinking "library" / "plugin" / "extension" style
> where possible helps.


Completely agree. It is much more important to design a mechanism than to
focus on a specific case. Here is what I am thinking for fully supporting
custom resource management:
1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
resources and the amounts required. They will be used to find suitable TM
slots to run the tasks. At this point, the resources are only measured by
amount, i.e. they do not have individual IDs.

2. On the TM side, have something like a *"ResourceInfoProvider"* to identify
and provide the detailed information of the individual resources, e.g. the
GPU IDs. It is important because the operator may have to explicitly interact
with the physical resource it uses. The ResourceInfoProvider might look
like something below.

interface ResourceInfoProvider<INFO> {
    Map<AbstractID, INFO> retrieveResourceInfo(OperatorId opId, ResourceProfile resourceProfile);
}

- There could be several *ResourceInfoProviders* configured on the TM to
retrieve the information for different resources.
- The TM will be responsible for assigning those individual resources to each
operator according to the requested amount.
- The operators will be able to get the ResourceInfo from their
RuntimeContext.

If we agree this is a reasonable final state, we can adapt the current FLIP
to it. In fact, it does not sound like a big change to me. All the proposed
configuration can stay as is; it is just that Flink itself won't care about
it. Instead, a GPUInfoProvider implementing the ResourceInfoProvider will
use it.
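To make the idea concrete, here is a minimal, self-contained sketch of such
a provider in the spirit of the interface above. All names (GpuInfo, the
String-based operator id, the plain int amount standing in for a
ResourceProfile) are simplifications for illustration, not actual Flink
types:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not actual Flink types: the String operator id and
// the int amount stand in for the OperatorId / ResourceProfile arguments
// of the proposed interface.
interface ResourceInfoProvider<INFO> {
    Map<String, INFO> retrieveResourceInfo(String operatorId, int amount);
}

class GpuInfo {
    final String gpuId;

    GpuInfo(String gpuId) {
        this.gpuId = gpuId;
    }
}

// Hands out each discovered GPU at most once, so an operator only sees the
// GPUs dedicated to it, never the whole machine's set.
class GPUInfoProvider implements ResourceInfoProvider<GpuInfo> {
    private final Iterator<String> freeGpus;

    GPUInfoProvider(List<String> discoveredGpuIds) {
        this.freeGpus = discoveredGpuIds.iterator();
    }

    @Override
    public Map<String, GpuInfo> retrieveResourceInfo(String operatorId, int amount) {
        Map<String, GpuInfo> assigned = new LinkedHashMap<>();
        for (int i = 0; i < amount && freeGpus.hasNext(); i++) {
            String id = freeGpus.next();
            assigned.put(id, new GpuInfo(id));
        }
        return assigned;
    }
}

public class ProviderDemo {
    public static void main(String[] args) {
        GPUInfoProvider provider = new GPUInfoProvider(List.of("0", "1", "2"));
        // Operator A asks for 2 GPUs, operator B for 1; assignments never overlap.
        System.out.println(provider.retrieveResourceInfo("opA", 2).keySet()); // [0, 1]
        System.out.println(provider.retrieveResourceInfo("opB", 1).keySet()); // [2]
    }
}
```

Because the provider assigns from a shared pool, two operators on the same
TM never see overlapping GPU IDs, which matches the point above that a UDF
author should not have to coordinate GPU usage with other operators.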

Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <[hidden email]> wrote:

> Hi all!
>
> The main point I wanted to throw into the discussion is the following:
>   - With more and more use cases, more and more tools go into Flink
>   - If everything becomes a "core feature", it will make the project hard
> to develop in the future. Thinking "library" / "plugin" / "extension" style
> where possible helps.
>
>   - A good thought experiment is always: How many future developers have to
> interact with this code (and possibly understand it partially), even if the
> features they touch have nothing to do with GPU support. If many
> contributors to unrelated features will have to touch it and understand it,
> then let's think if there is a different solution. Maybe there is not, but
> then we should be sure why.
>
>   - That led me to raising this issue: If the GPU manager becomes a core
> service in the TaskManager, Environment, RuntimeContext, etc. then everyone
> developing TM and streaming tasks need to understand the GPU manager. That
> seems oddly specific, is my impression.
>
> Access to configuration seems not the right reason to do that. We should
> expose the Flink configuration from the RuntimeContext anyways.
>
> If GPUs are sliced and assigned during scheduling, there may be reason,
> although it looks that it would belong to the slot then. Is that what we
> are doing here?
>
> Best,
> Stephan
>
>
> On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <[hidden email]>
> wrote:
>
> >  Thanks for the feedback, Becket.
> >
> > IMO, eventually an operator should only see info of GPUs that are
> dedicated
> > for it, instead of all GPUs on the machine/container in the current
> design.
> > It does not make sense to let the user who writes a UDF to worry about
> > coordination among multiple operators running on the same machine. And if
> > we want to limit the GPU info an operator sees, we should not let the
> > operator to instantiate GPUManager, which means we have to expose
> something
> > through runtime context, either GPU info or some kind of limited access
> to
> > the GPUManager.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <[hidden email]> wrote:
> >
> > > It probably make sense for us to first agree on the final state. More
> > > specifically, will the resource info be exposed through runtime context
> > > eventually?
> > >
> > > If that is the final state and we have a seamless migration story from
> > this
> > > FLIP to that final state, Personally I think it is OK to expose the GPU
> > > info in the runtime context.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <[hidden email]>
> > > wrote:
> > >
> > > > @Yangze,
> > > > I think what Stephan means (@Stephan, please correct me if I'm wrong)
> > is
> > > > that, we might not need to hold and maintain the GPUManager as a
> > service
> > > in
> > > > TaskManagerServices or RuntimeContext. An alternative is to create /
> > > > retrieve the GPUManager only in the operators that need it, e.g.,
> with
> > a
> > > > static method `GPUManager.get()`.
> > > >
> > > > @Stephan,
> > > > I agree with you on excluding GPUManager from TaskManagerServices.
> > > >
> > > >    - For the first step, where we provide unified TM-level GPU
> > > information
> > > >    to all operators, it should be fine to have operators access /
> > > >    lazy-initiate GPUManager by themselves.
> > > >    - In future, we might have some more fine-grained GPU management,
> > > where
> > > >    we need to maintain GPUManager as a service and put GPU info in
> slot
> > > >    profiles. But at least for now it's not necessary to introduce
> such
> > > >    complexity.
> > > >
> > > > However, I have some concerns on excluding GPUManager from
> > RuntimeContext
> > > > and let operators access it directly.
> > > >
> > > >    - Configurations needed for creating the GPUManager is not always
> > > >    available for operators.
> > > >    - If later we want to have fine-grained control over GPU (e.g.,
> > > >    operators in each slot can only see GPUs reserved for that slot),
> > the
> > > >    approach cannot be easily extended.
> > > >
> > > > I would suggest to wrap the GPUManager behind RuntimeContext and only
> > > > expose the GPUInfo to users. For now, we can declare a method
> > > > `getGPUInfo()` in RuntimeContext, with a default definition that
> calls
> > > > `GPUManager.get()` to get the lazily-created GPUManager. If later we
> > want
> > > > to create / retrieve GPUManager in a different way, we can simply
> > change
> > > > how `getGPUInfo` is implemented, without needing to change any public
> > > > interfaces.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <[hidden email]>
> > wrote:
> > > >
> > > > > @Stephan
> > > > > Do you mean Minicluster? Yes, it makes sense to share the GPU
> Manager
> > > > > in such scenario.
> > > > > If that's what you worry about, I'm +1 for holding
> > > > > GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> > > > > TaskManagerServices.
> > > > >
> > > > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > > > info instead of the GPU Manager. AFAIK, it's the only place we
> could
> > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> [hidden email]
> > >
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote
> > ----
> > > > > >
> > > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > ExternalServicesManagers in future) is conceptually one of the
> > task
> > > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > > - It maintains/holds the GPU resource at TM level and all of
> the
> > > > > > > operators allocate the GPU resources from it. So, it should be
> > > > > > > exclusive to a single TaskExecutor.
> > > > > > > - We could add a collection called ExternalResourceManagers to
> > hold
> > > > > > > all managers of other external resources in the future.
> > > > > > >
> > > > > >
> > > > > > Can you help me understand why this needs the addition in
> > > > > TaskMagerServices
> > > > > > or in the RuntimeContext?
> > > > > > Are you worried about the case when multiple Task Executors run
> in
> > > the
> > > > > same
> > > > > > JVM? That's not common, but wouldn't it actually be good in that
> > case
> > > > to
> > > > > > share the GPU Manager, given that the GPU is shared?
> > > > > >
> > > > > > Thanks,
> > > > > > Stephan
> > > > > >
> > > > > > ---------------------------
> > > > > >
> > > > > >
> > > > > > > What parts need information about this?
> > > > > > > In this FLIP, operators need the information. Thus, we expose
> GPU
> > > > > > > information to the RuntimeContext/FunctionContext. The slot
> > profile
> > > > is
> > > > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > > > >
> > > > > > > > Can the GPU Manager be a "self contained" thing that simply
> > takes
> > > > the
> > > > > > > configuration, and then abstracts everything internally?
> > > > > > > Yes, we just pass the path/args of the discover script and how
> > many
> > > > > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > > > > information and expose them to the
> RuntimeContext/FunctionContext
> > > of
> > > > > > > Operators. Meanwhile, we'd better not allow operators to
> directly
> > > > > > > access GPUManager, it should get what they want from Context.
> We
> > > > could
> > > > > > > then decouple the interface/implementation of GPUManager and
> > Public
> > > > > > > API.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <[hidden email]
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > It sounds fine to initially start with GPU specific support
> and
> > > > think
> > > > > > > about
> > > > > > > > generalizing this once we better understand the space.
> > > > > > > >
> > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > - Can we somehow keep this out of the TaskManager services?
> > > > Anything
> > > > > we
> > > > > > > > have to pull through all layers of the TM makes the TM
> > components
> > > > yet
> > > > > > > more
> > > > > > > > complex and harder to maintain.
> > > > > > > >
> > > > > > > > - What parts need information about this?
> > > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > > -> Can the GPU Manager be a "self contained" thing that
> simply
> > > > takes
> > > > > > > > the configuration, and then abstracts everything internally?
> > > > > Operators
> > > > > > > can
> > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> [hidden email]>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > >
> > > > > > > > > @Becket
> > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add
> them
> > to
> > > > the
> > > > > > > > > Public API section.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > @Stephan @Becket
> > > > > > > > > Regarding the general extended resource mechanism, I second
> > > > > Xintong's
> > > > > > > > > suggestion.
> > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec
> > > after
> > > > we
> > > > > > > > > supporting fine-grained GPU scheduling. As a first step
> > > > proposal, I
> > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > - Regarding the "Extended Resource Manager", if I
> understand
> > > > > > > > > correctly, it just a code refactoring atm, we could extract
> > the
> > > > > > > > > open/close/allocateExtendResources of GPUManager to that
> > > > > interface. If
> > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > >
> > > > > > > > > @Xingbo
> > > > > > > > > As Xintong said, we looked into how Spark supports a
> general
> > > > > "Custom
> > > > > > > > > Resource Scheduling" before and decided to introduce a
> common
> > > > > resource
> > > > > > > > > configuration
> > > > > > > > >
> > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > to make it more extensible. I think the "resource" is a
> > proper
> > > > > level
> > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > [hidden email]
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > >
> > > > > > > > > > There is no doubt that GPU resource management support
> will
> > > > > greatly
> > > > > > > > > > facilitate the development of AI-related applications by
> > > > PyFlink
> > > > > > > users.
> > > > > > > > > >
> > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > >
> > > > > > > > > > Regarding the names of several GPU configurations, I
> think
> > it
> > > > is
> > > > > > > better
> > > > > > > > > to
> > > > > > > > > > delete the resource field makes it consistent with the
> > names
> > > of
> > > > > other
> > > > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > > > >
> > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Xingbo
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Xintong Song <[hidden email]> 于2020年3月4日周三
> > 上午10:39写道:
> > > > > > > > > >
> > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > >
> > > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> > discussion
> > > > > about
> > > > > > > > > making
> > > > > > > > > > > the "GPU Support" as some general "Extended Resource
> > > > Support".
> > > > > We
> > > > > > > > > believe
> > > > > > > > > > > supporting extended resources in a general mechanism is
> > > > > definitely
> > > > > > > a
> > > > > > > > > good
> > > > > > > > > > > and extensible way. The reason we propose this FLIP
> > > narrowing
> > > > > its
> > > > > > > scope
> > > > > > > > > > > down to GPU alone, is mainly for the concern on extra
> > > efforts
> > > > > and
> > > > > > > > > review
> > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > >
> > > > > > > > > > > To come up with a well design on a general extended
> > > resource
> > > > > > > management
> > > > > > > > > > > mechanism, we would need to investigate more on how
> > people
> > > > use
> > > > > > > > > different
> > > > > > > > > > > kind of resources in practice. For GPU, we learnt such
> > > > > knowledge
> > > > > > > from
> > > > > > > > > the
> > > > > > > > > > > experts, Becket and his team members. But for FPGA, or
> > > other
> > > > > > > potential
> > > > > > > > > > > extended resources, we don't have such convenient
> > > information
> > > > > > > sources,
> > > > > > > > > > > making the investigation requires more efforts, which I
> > > tend
> > > > to
> > > > > > > think
> > > > > > > > > is
> > > > > > > > > > > not necessary atm.
> > > > > > > > > > >
> > > > > > > > > > > On the other hand, we also looked into how Spark
> > supports a
> > > > > general
> > > > > > > > > "Custom
> > > > > > > > > > > Resource Scheduling". Assuming we want to have a
> similar
> > > > > general
> > > > > > > > > extended
> > > > > > > > > > > resource mechanism in the future, we believe that the
> > > current
> > > > > GPU
> > > > > > > > > support
> > > > > > > > > > > design can be easily extended, in an incremental way
> > > without
> > > > > too
> > > > > > > many
> > > > > > > > > > > reworks.
> > > > > > > > > > >
> > > > > > > > > > > - The most important part is probably user interfaces.
> > > Spark
> > > > > > > offers
> > > > > > > > > > > configuration options to define the amount, discovery
> > > script
> > > > > and
> > > > > > > > > vendor
> > > > > > > > > > > (on
> > > > > > > > > > > k8s) in a per resource type bias [1], which is very
> > similar
> > > > to
> > > > > > > what
> > > > > > > > > we
> > > > > > > > > > > proposed in this FLIP. I think it's not necessary to
> > expose
> > > > > > > config
> > > > > > > > > > > options
> > > > > > > > > > > in the general way atm, since we do not have supports
> for
> > > > other
> > > > > > > > > resource
> > > > > > > > > > > types now. If later we decided to have per resource
> type
> > > > config
> > > > > > > > > > > options, we
> > > > > > > > > > > can have backwards compatibility on the current
> proposed
> > > > > options
> > > > > > > > > with
> > > > > > > > > > > simple key mapping.
> > > > > > > > > > > - For the GPU Manager, if later needed we can change it
> > to
> > > a
> > > > > > > > > "Extended
> > > > > > > > > > > Resource Manager" (or whatever it is called). That
> should
> > > be
> > > > a
> > > > > > > pure
> > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are
> already
> > > > > > > fields for
> > > > > > > > > > > general extended resource. We can of course leverage
> them
> > > > when
> > > > > > > > > > > supporting
> > > > > > > > > > > fine grained GPU scheduling. That is also not in the
> > scope
> > > of
> > > > > > > this
> > > > > > > > > first
> > > > > > > > > > > step proposal, and would require FLIP-56 to be finished
> > > > first.
> > > > > > > > > > >
> > > > > > > > > > > To summary up, I agree with Becket that have a separate
> > > FLIP
> > > > > for
> > > > > > > the
> > > > > > > > > > > general extended resource mechanism, and keep it in
> mind
> > > when
> > > > > > > > > discussing
> > > > > > > > > > > and implementing the current one.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > [hidden email]>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > That's a good point, Stephan. It makes total sense to
> > > > > generalize
> > > > > > > the
> > > > > > > > > > > > resource management to support custom resources.
> Having
> > > > that
> > > > > > > allows
> > > > > > > > > users
> > > > > > > > > > > > to add new resources by themselves. The general
> > resource
> > > > > > > management
> > > > > > > > > may
> > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. The custom resource type definition. It is
> supported
> > > by
> > > > > the
> > > > > > > > > extended
> > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This
> > will
> > > > > likely
> > > > > > > cover
> > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to
> > > assign
> > > > > the
> > > > > > > > > resources
> > > > > > > > > > > > to different tasks, operators, and so on. This may
> > > require
> > > > > two
> > > > > > > > > levels /
> > > > > > > > > > > > steps:
> > > > > > > > > > > > a. Subtask level - make sure the subtasks are put
> into
> > > > > > > suitable
> > > > > > > > > > > slots.
> > > > > > > > > > > > It is done by the global RM and is not customizable
> > right
> > > > > now.
> > > > > > > > > > > > b. Operator level - map the exact resource to the
> > > operators
> > > > > > > in
> > > > > > > > > TM.
> > > > > > > > > > > e.g.
> > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step
> > is
> > > > > needed
> > > > > > > > > assuming
> > > > > > > > > > > > the global RM does not distinguish individual
> resources
> > > of
> > > > > the
> > > > > > > same
> > > > > > > > > type.
> > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > >
> > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it
> > should
> > > > > > > discover the
> > > > > > > > > > > > physical GPU information and bind/match them to each
> > > > > operators.
> > > > > > > > > Making
> > > > > > > > > > > this
> > > > > > > > > > > > general will fill in the missing piece to support
> > custom
> > > > > resource
> > > > > > > > > type
> > > > > > > > > > > > definition. But I'd avoid calling it a "External
> > Resource
> > > > > > > Manager" to
> > > > > > > > > > > avoid
> > > > > > > > > > > > confusion with RM, maybe something like "Operator
> > > Resource
> > > > > > > Assigner"
> > > > > > > > > > > would
> > > > > > > > > > > > be more accurate. So for each resource type users can
> > > have
> > > > an
> > > > > > > > > optional
> > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory,
> > users
> > > > > don't
> > > > > > > need
> > > > > > > > > > > this,
> > > > > > > > > > > > but for other extended resources, users may need
> that.
> > > > > > > > > > > >
> > > > > > > > > > > > Personally I think a pluggable "Operator Resource
> > > Assigner"
> > > > > is
> > > > > > > > > achievable
> > > > > > > > > > > > in this FLIP. But I am also OK with having that in a
> > > > separate
> > > > > > > FLIP
> > > > > > > > > > > because
> > > > > > > > > > > > the interface between the "Operator Resource
> Assigner"
> > > and
> > > > > > > operator
> > > > > > > > > may
> > > > > > > > > > > > take a while to settle down if we want to make it
> > > generic.
> > > > > But I
> > > > > > > > > think
> > > > > > > > > > > our
> > > > > > > > > > > > implementation should take this future work into
> > > > > consideration so
> > > > > > > > > that we
> > > > > > > > > > > > don't need to break backwards compatibility once we
> > have
> > > > > that.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > > > [hidden email]>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I cannot really give much input into the mechanics
> of
> > > > > GPU-aware
> > > > > > > > > > > > scheduling
> > > > > > > > > > > > > and GPU allocation, as I have no experience with
> > that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > One thought I had when reading the proposal is if
> it
> > > > makes
> > > > > > > sense to
> > > > > > > > > > > look
> > > > > > > > > > > > at
> > > > > > > > > > > > > the "GPU Manager" as an "External Resource
> Manager",
> > > and
> > > > > GPU
> > > > > > > is one
> > > > > > > > > > > such
> > > > > > > > > > > > > resource.
> > > > > > > > > > > > > The way I understand the ResourceProfile and
> > > > ResourceSpec,
> > > > > > > that is
> > > > > > > > > how
> > > > > > > > > > > it
> > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > It has the advantage that it looks more extensible.
> > > Maybe
> > > > > > > there is
> > > > > > > > > a
> > > > > > > > > > > GPU
> > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and
> FPGA
> > > > > > > Resource, a
> > > > > > > > > > > Alibaba
> > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Stephan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > > [hidden email]>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource
> management
> > > > > support
> > > > > > > is a
> > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > for machine learning use cases. Actually it is
> one
> > of
> > > > the
> > > > > > > mostly
> > > > > > > > > > > asked
> > > > > > > > > > > > > > question from the users who are interested in
> using
> > > > Flink
> > > > > > > for ML.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > > > > mentioned in
> > > > > > > the
> > > > > > > > > > > public
> > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > 2. Is the data structure that holds GPU info
> also a
> > > > > public
> > > > > > > API?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > > > > > [hidden email]>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off
> the
> > > > > > > discussion,
> > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Big +1 for this feature. Supporting using of
> GPU
> > in
> > > > > Flink
> > > > > > > is
> > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks
> good
> > > to
> > > > > me. I
> > > > > > > > > think
> > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > > > > > [hidden email]
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We would like to start a discussion thread on
> > > > > "FLIP-108:
> > > > > > > Add
> > > > > > > > > GPU
> > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This FLIP mainly discusses the following
> > issues:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a
> > > task
> > > > > > > executor
> > > > > > > > > and
> > > > > > > > > > > > > > > > forward such requirements to the external
> > > resource
> > > > > > > managers
> > > > > > > > > (for
> > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > - Provide information of available GPU
> > resources
> > > to
> > > > > > > > > operators.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as
> > follows:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task
> > manager
> > > > > > > services to
> > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > and expose GPU resource information to the
> > > context
> > > > of
> > > > > > > > > functions.
> > > > > > > > > > > > > > > > - Introduce the default script for GPU
> > discovery,
> > > > in
> > > > > > > which we
> > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > the privilege mode to help user to achieve
> > > > > worker-level
> > > > > > > > > isolation
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > > document
> > > > > [1].
> > > > > > > > > Looking
> > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Xintong Song
Thanks for the comments, Stephan & Becket.

@Stephan

I see your concern, and I completely agree with you that we should first
think about the "library" / "plugin" / "extension" style if possible.

If GPUs are sliced and assigned during scheduling, there may be reason,
> although it looks that it would belong to the slot then. Is that what we
> are doing here?


In the current proposal, we do not have the GPUs sliced and assigned to
slots, because it could be problematic without dynamic slot allocation.
E.g., the number of GPUs might not be evenly divisible by the number of
slots.

I think it makes sense to eventually have the GPUs assigned to slots. Even
then, we might still need a TM-level GPUManager (or ResourceProvider, as
Becket suggested). For memory, each slot can simply request an amount of
memory, leaving it to the JVM / OS to decide which memory (addresses) gets
assigned. For GPU, and potentially other resources like FPGA, we need to
explicitly specify which GPU (index) should be used. Therefore, we need
some component at the TM level to coordinate which slot uses which GPU.
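As a rough illustration of such a TM-level coordinator, here is a minimal sketch. All class and method names below are hypothetical, invented for illustration only; they are not part of Flink. The point is just that some component must hand out exclusive GPU indices to slots and reclaim them on release:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a TM-level coordinator that assigns exclusive
// GPU indices to slots and reclaims them when a slot is freed.
class GpuCoordinator {
    private final Deque<Integer> freeGpuIndices = new ArrayDeque<>();
    private final Map<String, List<Integer>> gpusPerSlot = new HashMap<>();

    GpuCoordinator(int numGpus) {
        for (int i = 0; i < numGpus; i++) {
            freeGpuIndices.add(i);
        }
    }

    // Reserve `amount` GPUs for the given slot; fail if not enough are free.
    synchronized List<Integer> allocate(String slotId, int amount) {
        if (freeGpuIndices.size() < amount) {
            throw new IllegalStateException("Not enough free GPUs");
        }
        List<Integer> assigned = new ArrayList<>();
        for (int i = 0; i < amount; i++) {
            assigned.add(freeGpuIndices.poll());
        }
        gpusPerSlot.put(slotId, assigned);
        return assigned;
    }

    // Return the slot's GPUs to the free pool.
    synchronized void release(String slotId) {
        List<Integer> assigned = gpusPerSlot.remove(slotId);
        if (assigned != null) {
            freeGpuIndices.addAll(assigned);
        }
    }
}
```

Whether this lives in TaskManagerServices or is created lazily is exactly the open question in this thread; the sketch is agnostic to that.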

IMO, unless we say Flink will not support slot-level GPU slicing at least
in the foreseeable future, I don't see a good way to avoid touching the TM
core. To that end, I think Becket's suggestion points to a good direction,
that supports more features (GPU, FPGA, etc.) with less coupling to the TM
core (only needs to understand the general interfaces). The detailed
implementation for specific resource types can even be encapsulated as a
library.

@Becket

Thanks for sharing your thoughts on the final state. Setting aside the
details of how the interfaces should look, I think this is a really good
abstraction for supporting general resource types.

I'd like to further clarify that the following three things are all that
the "Flink core" needs to understand.

   - The *amount* of resource, for scheduling. Actually, we already have
   the Resource class in ResourceProfile and ResourceSpec for extended
   resources; it's just not really used yet.
   - The *info* that Flink provides to the operators / user code.
   - The *provider*, which generates the info based on the amount.

The "core" does not need to understand the specific implementation details
of the above three. They can even be implemented in a 3rd-party library,
similar to how we allow users to define their custom MetricReporter.
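A minimal sketch of these three pieces might look as follows. All names here are assumptions made up for illustration; none of them are actual Flink interfaces:

```java
import java.util.List;

// Hypothetical sketch of the three concepts the "core" would understand.
// Only the shapes matter; these are not real Flink classes.

// 1. The amount: the core schedules by resource name + quantity only.
class ExternalResourceAmount {
    final String name;     // e.g. "gpu"
    final double amount;   // e.g. 2.0

    ExternalResourceAmount(String name, double amount) {
        this.name = name;
        this.amount = amount;
    }
}

// 2. The info: an opaque handle the operator / user code receives.
interface ExternalResourceInfo {
    String getProperty(String key); // e.g. "index" -> "0"
}

// 3. The provider: turns the requested amount into concrete infos,
//    e.g. by running a discovery script. Could live in a 3rd-party jar,
//    much like a custom MetricReporter.
interface ExternalResourceProvider {
    List<ExternalResourceInfo> retrieveResourceInfo(ExternalResourceAmount amount);
}
```

The core would only invoke the provider and forward the resulting infos to the runtime context; everything GPU- or FPGA-specific stays behind the interface.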

Thank you~

Xintong Song



On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <[hidden email]> wrote:

> Thanks for the comment, Stephan.
>
>   - If everything becomes a "core feature", it will make the project hard
> > to develop in the future. Thinking "library" / "plugin" / "extension"
> style
> > where possible helps.
>
>
> Completely agree. It is much more important to design a mechanism than to
> focus on a specific case. Here is what I have in mind for fully supporting
> custom resource management:
> 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> resource and the amount required. They will be used to find suitable TMs
> slots to run the tasks. At this point, the resources are only measured by
> amount, i.e. they do not have individual ID.
>
> 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> and provide the detailed information of the individual resources, e.g. GPU
> IDs. It is important because the operator may have to explicitly interact
> with the physical resource it uses. The ResourceInfoProvider might look
> something like below:
> interface ResourceInfoProvider<INFO> {
>     Map<AbstractID, INFO> retrieveResourceInfo(
>             OperatorId opId, ResourceProfile resourceProfile);
> }
>
> - There could be several "*ResourceInfoProvider*" configured on the TM to
> retrieve the information for different resources.
> - The TM will be responsible to assign those individual resources to each
> operator according to their requested amount.
> - The operators will be able to get the ResourceInfo from their
> RuntimeContext.
>
> If we agree this is a reasonable final state, we can adapt the current FLIP
> to it. In fact, it does not sound like a big change to me. All the proposed
> configuration can stay as is; it is just that Flink itself won't care about
> it. Instead, a GPUInfoProvider implementing the ResourceInfoProvider will
> use it.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <[hidden email]> wrote:
>
> > Hi all!
> >
> > The main point I wanted to throw into the discussion is the following:
> >   - With more and more use cases, more and more tools go into Flink
> >   - If everything becomes a "core feature", it will make the project hard
> > to develop in the future. Thinking "library" / "plugin" / "extension"
> style
> > where possible helps.
> >
> >   - A good thought experiment is always: How many future developers have
> to
> > interact with this code (and possibly understand it partially), even if
> the
> > features they touch have nothing to do with GPU support. If many
> > contributors to unrelated features will have to touch it and understand
> it,
> > then let's think if there is a different solution. Maybe there is not,
> but
> > then we should be sure why.
> >
> >   - That led me to raising this issue: If the GPU manager becomes a core
> > service in the TaskManager, Environment, RuntimeContext, etc. then
> everyone
> > developing TM and streaming tasks need to understand the GPU manager.
> That
> > seems oddly specific, is my impression.
> >
> > Access to configuration seems not the right reason to do that. We should
> > expose the Flink configuration from the RuntimeContext anyways.
> >
> > If GPUs are sliced and assigned during scheduling, there may be reason,
> > although it looks that it would belong to the slot then. Is that what we
> > are doing here?
> >
> > Best,
> > Stephan
> >
> >
> > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <[hidden email]>
> > wrote:
> >
> > >  Thanks for the feedback, Becket.
> > >
> > > IMO, eventually an operator should only see info of GPUs that are
> > dedicated
> > > for it, instead of all GPUs on the machine/container in the current
> > design.
> > > It does not make sense to let the user who writes a UDF to worry about
> > > coordination among multiple operators running on the same machine. And
> if
> > > we want to limit the GPU info an operator sees, we should not let the
> > > operator to instantiate GPUManager, which means we have to expose
> > something
> > > through runtime context, either GPU info or some kind of limited access
> > to
> > > the GPUManager.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <[hidden email]>
> wrote:
> > >
> > > > It probably make sense for us to first agree on the final state. More
> > > > specifically, will the resource info be exposed through runtime
> context
> > > > eventually?
> > > >
> > > > If that is the final state and we have a seamless migration story
> from
> > > this
> > > > FLIP to that final state, Personally I think it is OK to expose the
> GPU
> > > > info in the runtime context.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <[hidden email]
> >
> > > > wrote:
> > > >
> > > > > @Yangze,
> > > > > I think what Stephan means (@Stephan, please correct me if I'm
> wrong)
> > > is
> > > > > that, we might not need to hold and maintain the GPUManager as a
> > > service
> > > > in
> > > > > TaskManagerServices or RuntimeContext. An alternative is to create
> /
> > > > > retrieve the GPUManager only in the operators that need it, e.g.,
> > with
> > > a
> > > > > static method `GPUManager.get()`.
> > > > >
> > > > > @Stephan,
> > > > > I agree with you on excluding GPUManager from TaskManagerServices.
> > > > >
> > > > >    - For the first step, where we provide unified TM-level GPU
> > > > >    information to all operators, it should be fine to have operators
> > > > >    access / lazily initiate the GPUManager by themselves.
> > > > >    - In future, we might have some more fine-grained GPU
> management,
> > > > where
> > > > >    we need to maintain GPUManager as a service and put GPU info in
> > slot
> > > > >    profiles. But at least for now it's not necessary to introduce
> > such
> > > > >    complexity.
> > > > >
> > > > > However, I have some concerns on excluding GPUManager from
> > > RuntimeContext
> > > > > and let operators access it directly.
> > > > >
> > > > >    - Configurations needed for creating the GPUManager are not
> > > > >    always available to operators.
> > > > >    - If later we want to have fine-grained control over GPU (e.g.,
> > > > >    operators in each slot can only see GPUs reserved for that
> slot),
> > > the
> > > > >    approach cannot be easily extended.
> > > > >
> > > > > I would suggest wrapping the GPUManager behind RuntimeContext and
> only
> > > > > expose the GPUInfo to users. For now, we can declare a method
> > > > > `getGPUInfo()` in RuntimeContext, with a default definition that
> > calls
> > > > > `GPUManager.get()` to get the lazily-created GPUManager. If later
> we
> > > want
> > > > > to create / retrieve GPUManager in a different way, we can simply
> > > change
> > > > > how `getGPUInfo` is implemented, without needing to change any
> public
> > > > > interfaces.
> > > > >
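A minimal sketch of the wrapper idea described above, assuming the lazily-created singleton approach; all class and method names (GPUInfo, GPUManager, the RuntimeContext default method) are illustrative, not the final Flink interfaces:

```java
import java.util.Collections;
import java.util.List;

// Illustrative GPU info record; the real FLIP-108 data structure may differ.
final class GPUInfo {
    final int index; // logical GPU index visible to the operator
    GPUInfo(int index) { this.index = index; }
}

final class GPUManager {
    private static volatile GPUManager instance;
    private final List<GPUInfo> gpus;

    private GPUManager(List<GPUInfo> gpus) { this.gpus = gpus; }

    // Lazily create the manager on first access, as suggested for step one.
    static GPUManager get() {
        if (instance == null) {
            synchronized (GPUManager.class) {
                if (instance == null) {
                    // Stand-in for running the discovery script; here we
                    // pretend one GPU with index 0 was discovered.
                    instance = new GPUManager(
                            Collections.singletonList(new GPUInfo(0)));
                }
            }
        }
        return instance;
    }

    List<GPUInfo> getGPUInfo() { return gpus; }
}

interface RuntimeContextSketch {
    // The default definition delegates to GPUManager.get(); if the retrieval
    // strategy changes later, only this default changes, not the interface.
    default List<GPUInfo> getGPUInfo() {
        return GPUManager.get().getGPUInfo();
    }
}
```

This keeps the GPUManager out of the public API: operators depend only on `getGPUInfo()`, and the manager stays swappable behind it.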
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <[hidden email]>
> > > wrote:
> > > > >
> > > > > > @Stephan
> > > > > > Do you mean MiniCluster? Yes, it makes sense to share the GPU
> > > > > > Manager in such a scenario.
> > > > > > If that's what you're worried about, I'm +1 for holding the
> > > > > > GPUManager (ExternalResourceManagers) in the TaskExecutor instead
> > > > > > of TaskManagerServices.
> > > > > >
> > > > > > Regarding the RuntimeContext/FunctionContext, it just holds the
> GPU
> > > > > > info instead of the GPU Manager. AFAIK, it's the only place we
> > could
> > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > [hidden email]
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 [hidden email] wrote
> > > ----
> > > > > > >
> > > > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > ExternalServicesManagers in future) is conceptually one of
> the
> > > task
> > > > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > > > - It maintains/holds the GPU resource at TM level and all of
> > the
> > > > > > > > operators allocate the GPU resources from it. So, it should
> be
> > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > - We could add a collection called ExternalResourceManagers
> to
> > > hold
> > > > > > > > all managers of other external resources in the future.
> > > > > > > >
> > > > > > >
> > > > > > > Can you help me understand why this needs the addition in
> > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > Are you worried about the case when multiple Task Executors run
> > in
> > > > the
> > > > > > same
> > > > > > > JVM? That's not common, but wouldn't it actually be good in
> that
> > > case
> > > > > to
> > > > > > > share the GPU Manager, given that the GPU is shared?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stephan
> > > > > > >
> > > > > > > ---------------------------
> > > > > > >
> > > > > > >
> > > > > > > > What parts need information about this?
> > > > > > > > In this FLIP, operators need the information. Thus, we expose
> > GPU
> > > > > > > > information to the RuntimeContext/FunctionContext. The slot
> > > profile
> > > > > is
> > > > > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > > > > >
> > > > > > > > > Can the GPU Manager be a "self contained" thing that simply
> > > takes
> > > > > the
> > > > > > > > configuration, and then abstracts everything internally?
> > > > > > > > Yes, we just pass the path/args of the discovery script and
> > > > > > > > how many GPUs per TM to it. It takes the responsibility of
> > > > > > > > getting the GPU information and exposing it to the
> > > > > > > > RuntimeContext/FunctionContext of operators. Meanwhile, we'd
> > > > > > > > better not allow operators to directly access the GPUManager;
> > > > > > > > they should get what they want from the Context. We could then
> > > > > > > > decouple the interface/implementation of GPUManager from the
> > > > > > > > public API.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> [hidden email]
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > It sounds fine to initially start with GPU-specific support
> > and
> > > > > think
> > > > > > > > about
> > > > > > > > > generalizing this once we better understand the space.
> > > > > > > > >
> > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > - Can we somehow keep this out of the TaskManager services?
> > > > > Anything
> > > > > > we
> > > > > > > > > have to pull through all layers of the TM makes the TM
> > > components
> > > > > yet
> > > > > > > > more
> > > > > > > > > complex and harder to maintain.
> > > > > > > > >
> > > > > > > > > - What parts need information about this?
> > > > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > > > -> Can the GPU Manager be a "self contained" thing that
> > simply
> > > > > takes
> > > > > > > > > the configuration, and then abstracts everything
> internally?
> > > > > > Operators
> > > > > > > > can
> > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > [hidden email]>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > > >
> > > > > > > > > > @Becket
> > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add
> > them
> > > to
> > > > > the
> > > > > > > > > > Public API section.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > @Stephan @Becket
> > > > > > > > > > Regarding the general extended resource mechanism, I
> second
> > > > > > Xintong's
> > > > > > > > > > suggestion.
> > > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec
> > > > > > > > > > after we support fine-grained GPU scheduling. As a first
> > > > > > > > > > step proposal, I prefer not to include it in the scope of
> > > > > > > > > > this FLIP.
> > > > > > > > > > - Regarding the "Extended Resource Manager", if I
> > > > > > > > > > understand correctly, it's just a code refactoring atm; we
> > > > > > > > > > could extract the open/close/allocateExtendResources of
> > > > > > > > > > GPUManager to that interface. If that is the case, +1 to do
> > > > > > > > > > it during implementation.
> > > > > > > > > >
> > > > > > > > > > @Xingbo
> > > > > > > > > > As Xintong said, we looked into how Spark supports a
> > > > > > > > > > general "Custom Resource Scheduling" before and decided to
> > > > > > > > > > introduce a common resource configuration schema
> > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > to make it more extensible. I think "resource" is a proper
> > > > > > > > > > level to contain all the configs of extended resources.
> > > > > > > > > >
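As a concrete illustration of that schema, a flink-conf.yaml fragment could look like the following; the amount value and the script path are placeholders, not part of the FLIP text:

```yaml
# Illustrative only: keys follow the
# taskmanager.resource.{resourceName}.amount/discovery-script pattern
# discussed above, with "gpu" as the resource name.
taskmanager.resource.gpu.amount: 2
taskmanager.resource.gpu.discovery-script.path: /opt/flink/scripts/gpu-discovery.sh
```

Adding a second resource type would then only mean adding keys under a new `{resourceName}`, without new configuration machinery.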
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > [hidden email]
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > >
> > > > > > > > > > > There is no doubt that GPU resource management support
> > will
> > > > > > greatly
> > > > > > > > > > > facilitate the development of AI-related applications
> by
> > > > > PyFlink
> > > > > > > > users.
> > > > > > > > > > >
> > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > >
> > > > > > > > > > > Regarding the names of several GPU configurations, I
> > > > > > > > > > > think it is better to delete the "resource" field to make
> > > > > > > > > > > them consistent with the names of other resource-related
> > > > > > > > > > > configurations in TaskManagerOptions.
> > > > > > > > > > >
> > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > >
> > > > > > > > > > > Xingbo
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song <[hidden email]> 于2020年3月4日周三
> > > 上午10:39写道:
> > > > > > > > > > >
> > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > >
> > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> > > > > > > > > > > > discussion about making the "GPU Support" into some
> > > > > > > > > > > > general "Extended Resource Support".
> > > > > > We
> > > > > > > > > > believe
> > > > > > > > > > > > supporting extended resources in a general mechanism
> is
> > > > > > definitely
> > > > > > > > a
> > > > > > > > > > good
> > > > > > > > > > > > and extensible way. The reason we propose narrowing
> > > > > > > > > > > > this FLIP's scope down to GPU alone is mainly the
> > > > > > > > > > > > concern about the extra effort and review capacity
> > > > > > > > > > > > needed for a general mechanism.
> > > > > > > > > > > >
> > > > > > > > > > > > To come up with a well design on a general extended
> > > > resource
> > > > > > > > management
> > > > > > > > > > > > mechanism, we would need to investigate more on how
> > > people
> > > > > use
> > > > > > > > > > different
> > > > > > > > > > > > kind of resources in practice. For GPU, we learnt
> such
> > > > > > knowledge
> > > > > > > > from
> > > > > > > > > > the
> > > > > > > > > > > > experts, Becket and his team members. But for FPGA,
> or
> > > > other
> > > > > > > > potential
> > > > > > > > > > > > extended resources, we don't have such convenient
> > > > information
> > > > > > > > sources,
> > > > > > > > > > > > making the investigation require more effort, which I
> > > > > > > > > > > > tend to think is not necessary atm.
> > > > > > > > > > > >
> > > > > > > > > > > > On the other hand, we also looked into how Spark
> > > supports a
> > > > > > general
> > > > > > > > > > "Custom
> > > > > > > > > > > > Resource Scheduling". Assuming we want to have a
> > similar
> > > > > > general
> > > > > > > > > > extended
> > > > > > > > > > > > resource mechanism in the future, we believe that the
> > > > current
> > > > > > GPU
> > > > > > > > > > support
> > > > > > > > > > > > design can be easily extended, in an incremental way
> > > > without
> > > > > > too
> > > > > > > > many
> > > > > > > > > > > > reworks.
> > > > > > > > > > > >
> > > > > > > > > > > > - The most important part is probably user
> interfaces.
> > > > Spark
> > > > > > > > offers
> > > > > > > > > > > > configuration options to define the amount, discovery
> > > > script
> > > > > > and
> > > > > > > > > > vendor
> > > > > > > > > > > > (on
> > > > > > > > > > > > k8s) in a per resource type bias [1], which is very
> > > similar
> > > > > to
> > > > > > > > what
> > > > > > > > > > we
> > > > > > > > > > > > proposed in this FLIP. I think it's not necessary to
> > > > > > > > > > > > expose config options in the general way atm, since we
> > > > > > > > > > > > do not have support for other resource types now. If
> > > > > > > > > > > > later we decide to have per-resource-type config
> > > > > > > > > > > > options, we can keep backwards compatibility with the
> > > > > > > > > > > > currently proposed options via simple key mapping.
> > > > > > > > > > > > - For the GPU Manager, if later needed we can change
> it
> > > to
> > > > a
> > > > > > > > > > "Extended
> > > > > > > > > > > > Resource Manager" (or whatever it is called). That
> > should
> > > > be
> > > > > a
> > > > > > > > pure
> > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are
> > already
> > > > > > > > fields for
> > > > > > > > > > > > general extended resource. We can of course leverage
> > them
> > > > > when
> > > > > > > > > > > > supporting
> > > > > > > > > > > > fine grained GPU scheduling. That is also not in the
> > > scope
> > > > of
> > > > > > > > this
> > > > > > > > > > first
> > > > > > > > > > > > step proposal, and would require FLIP-56 to be
> finished
> > > > > first.
> > > > > > > > > > > >
> > > > > > > > > > > > To sum up, I agree with Becket that we should have a
> > > > > > > > > > > > separate FLIP for the general extended resource
> > > > > > > > > > > > mechanism, and keep it in mind when discussing and
> > > > > > > > > > > > implementing the current one.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > [hidden email]>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > That's a good point, Stephan. It makes total sense
> to
> > > > > > generalize
> > > > > > > > the
> > > > > > > > > > > > > resource management to support custom resources.
> > Having
> > > > > that
> > > > > > > > allows
> > > > > > > > > > users
> > > > > > > > > > > > > to add new resources by themselves. The general
> > > resource
> > > > > > > > management
> > > > > > > > > > may
> > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. The custom resource type definition. It is
> > supported
> > > > by
> > > > > > the
> > > > > > > > > > extended
> > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This
> > > will
> > > > > > likely
> > > > > > > > cover
> > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how
> to
> > > > assign
> > > > > > the
> > > > > > > > > > resources
> > > > > > > > > > > > > to different tasks, operators, and so on. This may
> > > > require
> > > > > > two
> > > > > > > > > > levels /
> > > > > > > > > > > > > steps:
> > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put
> > into
> > > > > > > > suitable
> > > > > > > > > > > > slots.
> > > > > > > > > > > > > It is done by the global RM and is not customizable
> > > right
> > > > > > now.
> > > > > > > > > > > > > b. Operator level - map the exact resource to the
> > > > operators
> > > > > > > > in
> > > > > > > > > > TM.
> > > > > > > > > > > > e.g.
> > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This
> step
> > > is
> > > > > > needed
> > > > > > > > > > assuming
> > > > > > > > > > > > > the global RM does not distinguish individual
> > resources
> > > > of
> > > > > > the
> > > > > > > > same
> > > > > > > > > > type.
> > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it
> > > should
> > > > > > > > discover the
> > > > > > > > > > > > > physical GPU information and bind/match them to each
> > > > > > > > > > > > > operator.
> > > > > > > > > > Making
> > > > > > > > > > > > this
> > > > > > > > > > > > > general will fill in the missing piece to support
> > > custom
> > > > > > resource
> > > > > > > > > > type
> > > > > > > > > > > > > definition. But I'd avoid calling it an "External
> > > Resource
> > > > > > > > Manager" to
> > > > > > > > > > > > avoid
> > > > > > > > > > > > > confusion with RM, maybe something like "Operator
> > > > Resource
> > > > > > > > Assigner"
> > > > > > > > > > > > would
> > > > > > > > > > > > > be more accurate. So for each resource type users
> can
> > > > have
> > > > > an
> > > > > > > > > > optional
> > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory,
> > > users
> > > > > > don't
> > > > > > > > need
> > > > > > > > > > > > this,
> > > > > > > > > > > > > but for other extended resources, users may need
> > that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Personally I think a pluggable "Operator Resource
> > > > Assigner"
> > > > > > is
> > > > > > > > > > achievable
> > > > > > > > > > > > > in this FLIP. But I am also OK with having that in
> a
> > > > > separate
> > > > > > > > FLIP
> > > > > > > > > > > > because
> > > > > > > > > > > > > the interface between the "Operator Resource
> > Assigner"
> > > > and
> > > > > > > > operator
> > > > > > > > > > may
> > > > > > > > > > > > > take a while to settle down if we want to make it
> > > > generic.
> > > > > > But I
> > > > > > > > > > think
> > > > > > > > > > > > our
> > > > > > > > > > > > > implementation should take this future work into
> > > > > > consideration so
> > > > > > > > > > that we
> > > > > > > > > > > > > don't need to break backwards compatibility once we
> > > have
> > > > > > that.
> > > > > > > > > > > > >
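A rough sketch of what such a pluggable "Operator Resource Assigner" might look like; the interface name and signature are purely illustrative, assuming a TM-level pool of discovered resource units per extended resource type:

```java
import java.util.List;
import java.util.Set;

// Hypothetical per-resource-type assigner, one registered per extended
// resource type (e.g. "gpu"), mapping concrete resource units from the
// TM-level pool to individual operators. Not an actual Flink interface.
interface OperatorResourceAssigner<R> {
    /** The extended resource type this assigner handles, e.g. "gpu". */
    String resourceType();

    /** Pick the concrete units an operator should use from the TM-level pool. */
    Set<R> assign(String operatorId, List<R> available, int amount);
}
```

A trivial implementation could hand out the first `amount` units; a smarter one could consider NUMA topology or operator affinity, which is why the interface is kept pluggable in this sketch.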
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > > > > [hidden email]>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I cannot really give much input into the
> mechanics
> > of
> > > > > > GPU-aware
> > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > and GPU allocation, as I have no experience with
> > > that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One thought I had when reading the proposal is if
> > it
> > > > > makes
> > > > > > > > sense to
> > > > > > > > > > > > look
> > > > > > > > > > > > > at
> > > > > > > > > > > > > > the "GPU Manager" as an "External Resource
> > Manager",
> > > > and
> > > > > > GPU
> > > > > > > > is one
> > > > > > > > > > > > such
> > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > The way I understand the ResourceProfile and
> > > > > ResourceSpec,
> > > > > > > > that is
> > > > > > > > > > how
> > > > > > > > > > > > it
> > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > It has the advantage that it looks more
> extensible.
> > > > Maybe
> > > > > > > > there is
> > > > > > > > > > a
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, an
> > > > > > > > > > > > > > FPGA Resource, an Alibaba TPU Resource, etc.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > > > [hidden email]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource
> > management
> > > > > > support
> > > > > > > > is a
> > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > for machine learning use cases. Actually it is
> > one
> > > of
> > > > > the
> > > > > > > > mostly
> > > > > > > > > > > > asked
> > > > > > > > > > > > > > > question from the users who are interested in
> > using
> > > > > Flink
> > > > > > > > for ML.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > > > > > mentioned in
> > > > > > > > the
> > > > > > > > > > > > public
> > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info
> > also a
> > > > > > public
> > > > > > > > API?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > > > > > > [hidden email]>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off
> > the
> > > > > > > > discussion,
> > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Big +1 for this feature. Supporting using of
> > GPU
> > > in
> > > > > > Flink
> > > > > > > > is
> > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks
> > good
> > > > to
> > > > > > me. I
> > > > > > > > > > think
> > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > very good first step for Flink's GPU
> supports.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > > > > > > [hidden email]
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We would like to start a discussion thread
> on
> > > > > > "FLIP-108:
> > > > > > > > Add
> > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This FLIP mainly discusses the following
> > > issues:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs
> in a
> > > > task
> > > > > > > > executor
> > > > > > > > > > and
> > > > > > > > > > > > > > > > > forward such requirements to the external
> > > > resource
> > > > > > > > managers
> > > > > > > > > > (for
> > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > - Provide information of available GPU
> > > resources
> > > > to
> > > > > > > > > > operators.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as
> > > follows:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task
> > > manager
> > > > > > > > services to
> > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > and expose GPU resource information to the
> > > > context
> > > > > of
> > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > - Introduce the default script for GPU
> > > discovery,
> > > > > in
> > > > > > > > which we
> > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > the privilege mode to help user to achieve
> > > > > > worker-level
> > > > > > > > > > isolation
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > > > document
> > > > > > [1].
> > > > > > > > > > Looking
> > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >