[DISCUSS] Support configure remote flink jar

tison
Hi folks,

Recently, our customers asked for a feature to configure a remote flink jar.
I'd like to reach out to you to see whether or not it is a general need.

ATM Flink only supports configuring a local file as the flink jar via the
`-yj` option. If we pass an HDFS file path, it fails with an
IllegalArgumentException due to an implementation detail. With support for
configuring a remote flink jar, this limitation is eliminated. We can also
make use of YARN locality to reduce the uploading overhead: instead of
uploading the jar, we ask YARN to localize it when the AM container starts.

Besides, it possibly overlaps with FLINK-13938. I'd like to put the
discussion on our mailing list first.

Are you looking forward to such a feature?

@Yang Wang: this feature is different from the one we discussed offline; it
only focuses on the flink jar, not all ship files.
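
The distinction described above can be sketched as a small decision on the scheme of the jar path. This is only an illustrative sketch, not Flink's actual implementation: the function name `plan_flink_jar`, the set of schemes treated as remote, and the returned fields are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Schemes treated as "remote" in this sketch; the exact set is an assumption.
REMOTE_SCHEMES = {"hdfs", "viewfs", "s3", "s3a"}


def plan_flink_jar(path: str) -> dict:
    """Decide how a flink jar path would be handled at submission time.

    Hypothetical helper: local files must be uploaded by the client,
    remote files are only registered so the cluster can localize them.
    """
    parsed = urlparse(path)
    scheme = parsed.scheme
    if scheme in ("", "file"):
        # Local file: the client has to upload it before the AM starts.
        return {"action": "upload", "source": parsed.path}
    if scheme in REMOTE_SCHEMES:
        # Remote file: skip the upload and let YARN localize the jar.
        return {"action": "register", "source": path}
    raise ValueError(f"unsupported scheme for flink jar: {scheme!r}")
```

With this shape, a path such as `/opt/flink/lib/flink-dist.jar` is planned as an upload, while an `hdfs://` path is only registered, which is where the saving in upload overhead would come from.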

Best,
tison.

Re: [DISCUSS] Support configure remote flink jar

Yang Wang
Hi tison,

Thanks for starting this discussion.
* For a user-customized flink-dist jar, it is a useful feature, since it
avoids uploading the flink-dist jar every time. Especially in a production
environment, it could accelerate the submission process.
* For the standard flink-dist jar, FLINK-13938[1] could solve the problem.
Upload an official Flink release binary to distributed storage (HDFS) first,
and then all submissions could benefit from it. Users could also upload a
customized flink-dist jar to accelerate their submissions.

If the flink-dist jar could be specified as a remote path, maybe the user
jar is in the same situation.

[1]. https://issues.apache.org/jira/browse/FLINK-13938

On Tue, Nov 19, 2019 at 11:17 AM tison <[hidden email]> wrote:

> Hi folks,
>
> Recently, our customers asked for a feature to configure a remote flink jar.
> I'd like to reach out to you to see whether or not it is a general need.
>
> ATM Flink only supports configuring a local file as the flink jar via the
> `-yj` option. If we pass an HDFS file path, it fails with an
> IllegalArgumentException due to an implementation detail. With support for
> configuring a remote flink jar, this limitation is eliminated. We can also
> make use of YARN locality to reduce the uploading overhead: instead of
> uploading the jar, we ask YARN to localize it when the AM container starts.
>
> Besides, it possibly overlaps with FLINK-13938. I'd like to put the
> discussion on our mailing list first.
>
> Are you looking forward to such a feature?
>
> @Yang Wang: this feature is different from the one we discussed offline; it
> only focuses on the flink jar, not all ship files.
>
> Best,
> tison.
>

Re: [DISCUSS] Support configure remote flink jar

Thomas Weise
There is a related use case (not specific to HDFS) that I came across:

It would be nice if the jar upload endpoint could accept the URL of a jar
file as an alternative to the jar file itself. Such a URL could point to an
artifactory or distributed file system.
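
One way to picture such an endpoint is a request that carries either the jar content or a URL, but not both. The sketch below is purely hypothetical and does not describe the existing REST API: the `JarSubmission` type, its field names, and the `resolve_jar_source` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class JarSubmission:
    """Hypothetical request body: exactly one of the two fields is set."""
    jar_bytes: Optional[bytes] = None
    jar_url: Optional[str] = None


def resolve_jar_source(req: JarSubmission) -> str:
    """Return how the server side would obtain the jar for this request."""
    # Reject requests that set both fields or neither of them.
    if (req.jar_bytes is None) == (req.jar_url is None):
        raise ValueError("provide exactly one of jar_bytes or jar_url")
    if req.jar_bytes is not None:
        # Classic upload: the jar content arrives with the request.
        return "inline"
    parsed = urlparse(req.jar_url)
    if not parsed.scheme or not parsed.path.endswith(".jar"):
        raise ValueError(f"not a usable jar URL: {req.jar_url!r}")
    # URL variant: the server fetches the jar from the given storage.
    return f"fetch:{parsed.scheme}"
```

Keeping the two variants in one request type means existing clients that upload the file keep working, while the URL variant leaves fetching (and any authentication against the storage system) to the server side.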

Thomas


On Mon, Nov 18, 2019 at 7:40 PM Yang Wang <[hidden email]> wrote:

> Hi tison,
>
> Thanks for starting this discussion.
> * For a user-customized flink-dist jar, it is a useful feature, since it
> avoids uploading the flink-dist jar every time. Especially in a production
> environment, it could accelerate the submission process.
> * For the standard flink-dist jar, FLINK-13938[1] could solve the problem.
> Upload an official Flink release binary to distributed storage (HDFS) first,
> and then all submissions could benefit from it. Users could also upload a
> customized flink-dist jar to accelerate their submissions.
>
> If the flink-dist jar could be specified as a remote path, maybe the user
> jar is in the same situation.
>
> [1]. https://issues.apache.org/jira/browse/FLINK-13938

Re: [DISCUSS] Support configure remote flink jar

Stephan Ewen
Would that be a feature specific to Yarn? (and maybe standalone sessions)

For containerized setups, an init container seems like a nice way to solve
this. It is also more flexible when it comes to supporting authentication
mechanisms for the target storage system, etc.

On Tue, Nov 19, 2019 at 5:29 PM ouywl <[hidden email]> wrote:

> I have implemented this feature in our environment, using an 'Init
> Container' of Docker to fetch the jar file from a URL. It seems like a
> good idea.
>
> ouywl
> [hidden email]
>
> On 11/19/2019 12:11, Thomas Weise <[hidden email]> wrote:
>
> There is a related use case (not specific to HDFS) that I came across:
>
> It would be nice if the jar upload endpoint could accept the URL of a jar
> file as an alternative to the jar file itself. Such a URL could point to an
> artifactory or distributed file system.
>
> Thomas

Re: [DISCUSS] Support configure remote flink jar

tison
Thanks for your participation!

@Yang: Great to hear. I'd like to know whether or not a remote flink jar
path conflicts with FLINK-13938. IIRC FLINK-13938 automatically excludes the
local flink jar from shipping, which possibly does not work for a remote one.

@Thomas: It is inspiring that a URL becomes the unified representation of a
resource. I'm thinking about how to provide a single, unified process for
getting a resource from a URL that points to an artifact or a distributed
file system.

@ouywl & Stephan: Yes, this improvement can be migrated to environments like
k8s. IIRC the k8s proposal already discussed improvements using an "init
container" and other technologies. However, so far I regard it as an
improvement that differs from one storage to another, so we achieve them
individually.


Best,
tison.


On Wed, Nov 20, 2019 at 12:34 AM Stephan Ewen <[hidden email]> wrote:

> Would that be a feature specific to Yarn? (and maybe standalone sessions)
>
> For containerized setups, an init container seems like a nice way to
> solve this. It is also more flexible when it comes to supporting
> authentication mechanisms for the target storage system, etc.

Re: [DISCUSS] Support configure remote flink jar

Rong Rong
Thanks @Tison for starting the discussion, and sorry for joining so late.

Yes, I think this is a very good idea. We already tweak the flink-yarn
package internally to support something similar to what @Thomas mentioned:
registering a jar that has already been uploaded to some DFS (not
necessarily the YARN public cache discussed in FLINK-13938).
The reason is that we provide our internally packaged extension libraries
to our customers, and we've seen a good performance improvement in our YARN
cluster during the container localization phase after our customers switched
to using pre-uploaded JARs instead of having to upload them every time
during deployment.

Looking forward to this feature!

--
Rong


On Tue, Nov 19, 2019 at 10:19 PM tison <[hidden email]> wrote:

> Thanks for your participation!
>
> @Yang: Great to hear. I'd like to know whether or not a remote flink jar
> path conflicts with FLINK-13938. IIRC FLINK-13938 automatically excludes the
> local flink jar from shipping, which possibly does not work for a remote one.
>
> @Thomas: It is inspiring that a URL becomes the unified representation of a
> resource. I'm thinking about how to provide a single, unified process for
> getting a resource from a URL that points to an artifact or a distributed
> file system.
>
> @ouywl & Stephan: Yes, this improvement can be migrated to environments like
> k8s. IIRC the k8s proposal already discussed improvements using an "init
> container" and other technologies. However, so far I regard it as an
> improvement that differs from one storage to another, so we achieve them
> individually.
>
>
> Best,
> tison.