Hi all,
I would like to bring the discussion in https://issues.apache.org/jira/browse/FLINK-17745 to the dev mailing list, just to hear the opinions of the community.

In a nutshell, in the early days of Flink, users could submit their jobs as fat-jars that had a specific structure. More concretely, the user could put the dependencies of the submitted job in a lib/ folder within his/her jar. Flink would search the user's jar for such a folder and, if it existed, would extract the nested jars, ship them independently and add them to the classpath. Finally, it would also ship the fat-jar itself so that the user code is available at the cluster (for details see [1]).

This way of submission was NOT documented anywhere, and it has the obvious shortcoming that the "nested" jars are shipped twice. In addition, it makes the codebase a bit more difficult to maintain, as it constitutes yet another way of submitting stuff.

Given the above, I would like to propose removing this codepath. But given that there are users relying on the hidden feature, I would like to discuss 1) how many such users exist, 2) how difficult it is for them to "migrate" to a different way of submitting jobs, and 3) whether the rest of the community agrees on removing it.

I am posting this on both the dev and user MLs so that we have better coverage.

Looking forward to a fruitful discussion,
Kostas

[1] https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java#L222
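To make the described behavior concrete, here is a minimal Java sketch of what such a codepath does, assuming only the JDK's java.util.jar API. The class NestedLibJars and its method extractNestedJars are hypothetical; this is not Flink's PackagedProgram code (see [1] for that), only an approximation of the idea: every lib/*.jar entry is copied out of the fat-jar so it can be added to the classpath, while the same bytes stay inside the fat-jar, which is shipped as well. That is why the nested jars travel twice.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical illustration of the undocumented "lib/ inside the fat-jar" submission style.
 * This is NOT Flink's implementation; it only mimics the idea described in the thread.
 */
public final class NestedLibJars {

    private NestedLibJars() {}

    /**
     * Copies every jar found under lib/ in the given fat-jar into a fresh temp directory
     * and returns the URLs of the extracted copies (e.g. for a classpath).
     */
    public static List<URL> extractNestedJars(File fatJar) throws IOException {
        List<URL> extracted = new ArrayList<>();
        Path targetDir = Files.createTempDirectory("nested-lib-");
        try (JarFile jar = new JarFile(fatJar)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Keep only jar files located under lib/ (subdirectories included).
                if (entry.isDirectory() || !name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                // Flatten to a plain file name; this sketch ignores possible name clashes.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                Path target = targetDir.resolve(fileName);
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                // The entry is still inside the fat-jar too, so shipping both
                // the fat-jar and this copy sends the same bytes twice.
                extracted.add(target.toUri().toURL());
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: NestedLibJars <fat-jar>");
            return;
        }
        for (URL url : extractNestedJars(new File(args[0]))) {
            System.out.println(url);
        }
    }
}
```

Migrating away from this style would, under this reading, mean shipping the dependency jars once, either shaded into the fat-jar or passed separately, instead of nesting them under lib/.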
Hi,
AFAIK, this feature was added because Hadoop MapReduce has it as well (https://blog.cloudera.com/how-to-include-third-party-libraries-in-your-map-reduce-job/, point 2). I don't remember having seen it used anywhere in the wild, and I believe it is a good idea to simplify our codebase here. If there are concerns, we could at least add a big WARN log message in Flink 1.11+ saying that this feature is deprecated and will be removed in the future.

On Wed, May 20, 2020 at 10:39 AM Kostas Kloudas <[hidden email]> wrote: