Hi folks,

Recently, our customers asked for a feature to configure a remote Flink jar. I'd like to reach out to you to see whether it is a general need.

At the moment, Flink only supports configuring a local file as the Flink jar via the `-yj` option. If we pass an HDFS file path, it fails with an IllegalArgumentException due to an implementation detail. Once we support configuring a remote Flink jar, this limitation is eliminated. We can also make use of YARN locality to reduce the upload overhead: instead of uploading the jar, we ask YARN to localize it when the AM container starts.

Besides, this possibly overlaps with FLINK-13938, so I'd like to put the discussion on our mailing list first.

Are you looking forward to such a feature?

@Yang Wang: this feature is different from the one we discussed offline; it only focuses on the Flink jar, not all ship files.

Best,
tison.
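To make the local-versus-remote distinction concrete, here is a minimal sketch in plain Java of how a client could decide, from the URI scheme alone, whether the configured jar has to be uploaded or could be left for YARN to localize. The class `FlinkJarLocation`, the enum `Handling`, and the example paths are hypothetical and are not Flink's actual implementation; the sketch also assumes the path is a syntactically valid URI.

```java
import java.net.URI;
import java.util.Locale;

public class FlinkJarLocation {

    /** Hypothetical outcome: how the client would treat the configured jar. */
    public enum Handling { UPLOAD_FROM_CLIENT, LOCALIZE_BY_YARN }

    /**
     * Classifies a jar path by its URI scheme.
     * No scheme or "file" is treated as local; anything else as remote.
     * Assumption: the path parses as a URI (no spaces or backslashes).
     */
    public static Handling classify(String jarPath) {
        String scheme = URI.create(jarPath).getScheme();
        if (scheme == null || scheme.toLowerCase(Locale.ROOT).equals("file")) {
            return Handling.UPLOAD_FROM_CLIENT;
        }
        return Handling.LOCALIZE_BY_YARN;
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        System.out.println(classify("/opt/flink/lib/flink-dist.jar"));
        System.out.println(classify("hdfs://namenode:8020/flink/flink-dist.jar"));
    }
}
```

In a real client, the remote branch would presumably register the file with YARN as a local resource rather than copy it, which is where the saving in upload time would come from.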
Hi tison,
Thanks for starting this discussion.

* For a user-customized flink-dist jar, it is a useful feature, since it avoids uploading the flink-dist jar every time. Especially in production environments, it could accelerate the submission process.
* For the standard flink-dist jar, FLINK-13938 [1] could solve the problem: upload an official Flink release binary to distributed storage (HDFS) first, and then all submissions benefit from it. Users could also upload a customized flink-dist jar to accelerate their submissions.

If the flink-dist jar can be specified as a remote path, maybe the user jar is in the same situation.

[1] https://issues.apache.org/jira/browse/FLINK-13938

tison <[hidden email]> wrote on Tue, Nov 19, 2019 at 11:17 AM:
> [...]
There is a related use case (not specific to HDFS) that I came across:
It would be nice if the jar upload endpoint could accept the URL of a jar file as an alternative to the jar file itself. Such a URL could point to an artifact repository or a distributed file system.

Thomas

On Mon, Nov 18, 2019 at 7:40 PM Yang Wang <[hidden email]> wrote:
> [...]
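As a rough illustration of the URL idea, the following sketch shows what the receiving side could do with a jar URL: resolve it and copy the bytes into a local directory. `JarUrlFetcher` and `fetch` are hypothetical names, not part of Flink's REST API. The sketch relies on `java.net.URL`, so it only covers schemes the JVM has a handler for (such as `http`, `https`, and `file`); an `hdfs://` URL would need Hadoop's file system API instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class JarUrlFetcher {

    /** Copies the jar behind jarUrl into targetDir and returns the local path. */
    public static Path fetch(URI jarUrl, Path targetDir) throws IOException {
        String path = jarUrl.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        if (!fileName.endsWith(".jar")) {
            throw new IllegalArgumentException("Not a jar URL: " + jarUrl);
        }
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileName);
        // Stream the content so the jar is never held fully in memory.
        try (InputStream in = jarUrl.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a remote artifact: a temp file addressed by a file: URL.
        Path source = Files.createTempFile("example", ".jar");
        Files.write(source, new byte[] {1, 2, 3});
        Path copy = fetch(source.toUri(), Files.createTempDirectory("jars"));
        System.out.println(copy.getFileName() + " has " + Files.size(copy) + " bytes");
    }
}
```

Authentication against the artifact store is deliberately left out here; it would have to be handled per storage system.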
Would that be a feature specific to Yarn? (and maybe standalone sessions)
For containerized setups, an init container seems like a nice way to solve this. It is also more flexible when it comes to supporting authentication mechanisms for the target storage system, etc.

On Tue, Nov 19, 2019 at 5:29 PM ouywl <[hidden email]> wrote:
> I have implemented this feature in our environment, using a Docker 'Init Container' to fetch the jar file from its URL. It seems like a good idea.
>
> ouywl
>
> On 11/19/2019 12:11, Thomas Weise <[hidden email]> wrote:
> > [...]
Thanks for your participation!
@Yang: Great to hear. I'd like to know whether a remote Flink jar path conflicts with FLINK-13938. IIRC, FLINK-13938 automatically excludes the local Flink jar from shipping, which possibly does not work for a remote one.

@Thomas: It is inspiring that a URL becomes the unified representation of a resource. I'm thinking about how to provide a single process for getting a resource from a URL that points to an artifact or a distributed file system.

@ouywl & Stephan: Yes, this improvement can be carried over to environments like k8s. IIRC, the k8s proposal already discussed improvements using an "init container" and other technologies. However, so far I regard it as an improvement that differs from one storage system to another, so we should achieve them individually.

Best,
tison.

Stephan Ewen <[hidden email]> wrote on Wed, Nov 20, 2019 at 12:34 AM:
> [...]
Thanks @Tison for starting the discussion and sorry for joining so late.
Yes, I think this is a very good idea. We already tweak the flink-yarn package internally to support something similar to what @Thomas mentioned: registering a jar that has already been uploaded to some DFS (which need not be the YARN public cache discussed in FLINK-13938).

The reason is that we provide internally packaged extension libraries for our customers, and we've seen a good performance improvement in our YARN cluster during the container localization phase after our customers switched to pre-uploaded jars instead of uploading every time during deployment.

Looking forward to this feature!

--
Rong

On Tue, Nov 19, 2019 at 10:19 PM tison <[hidden email]> wrote:
> [...]