Hi everyone,
as you may know, a first minimal version of FLIP-24 [1] for the upcoming Flink SQL Client has been merged to master. We also merged the ability to discover and configure table sources without a single line of code, using string-based properties [2] and Java service provider discovery.

We are now facing the question of how to manage dependencies in this new environment. It is different from how regular Flink projects are created (by setting up a new Maven project and building a JAR or fat JAR). Ideally, a user should be able to select from a set of prepared connectors, catalogs, and formats. E.g., if a Kafka connector and the Avro format are needed, all that should be required is to move a "flink-kafka.jar" and "flink-avro.jar" into the "sql_lib" directory that is shipped to a Flink cluster together with the SQL query.

The question is how we want to offer those JAR files in the future. We see two options:

1) We prepare Maven build profiles for all offered modules and provide a shell script for building fat JARs. A script call could look like "./sql-client-dependency.sh kafka 0.10". It would automatically download what is needed and place the JAR file in the library folder. This approach would keep our development effort low but would require Maven to be present and builds to pass on different environments (e.g. Windows).

2) We build fat JARs for these modules with every Flink release and host them somewhere (e.g. on Apache infrastructure, but not Maven Central). This would make it very easy to add a dependency by downloading the prepared JAR files. However, it would require building and hosting large fat JARs for every connector (and version) with every Flink major and minor release. The size of such a repository might grow quickly.

What do you think? Do you see other options to make adding dependencies as easy as possible?

Regards,

Timo

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
[2] https://issues.apache.org/jira/browse/FLINK-8240
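For readers new to the discovery mechanism mentioned above, here is a minimal sketch of how Java service provider discovery can resolve a table source from string-based properties. The interface, method names, and property keys are hypothetical placeholders for illustration, not Flink's actual API:

import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI interface for illustration only; Flink's actual
// factory interface and property keys differ.
interface TableSourceFactory {
    // Whether this factory can handle the given string properties.
    boolean matches(Map<String, String> properties);
    // Builds the table source (Object here for brevity).
    Object create(Map<String, String> properties);
}

public class FactoryDiscovery {

    // Scans every JAR on the classpath for implementations registered
    // under META-INF/services and returns the first factory that
    // claims the given properties.
    static Optional<Object> discover(Map<String, String> properties) {
        for (TableSourceFactory factory : ServiceLoader.load(TableSourceFactory.class)) {
            if (factory.matches(properties)) {
                return Optional.of(factory.create(properties));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Dropping a connector JAR into the library directory adds its
        // factory to the classpath; no user code has to change.
        Map<String, String> props =
                Map.of("connector.type", "kafka", "connector.version", "0.10");
        System.out.println(discover(props)
                .map(Object::toString)
                .orElse("No matching factory on the classpath."));
    }
}

This is why dropping a JAR into "sql_lib" is enough: the new JAR simply contributes another registered factory.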
My first intuition would be to go for approach #2, for the following reasons:

- I expect that in the long run, the scripts will not be that simple to maintain. We saw that with all shell scripts thus far: they start simple, and then grow with many special cases for this and that setup.

- Not all users have Maven. Automatically downloading and configuring Maven could be an option, but that makes the scripts yet more tricky.

- Download-and-drop-in is probably still easier for users to understand than the syntax of a script with its parameters.

- I think it may actually be even simpler to maintain for us, because all it does is add a profile or build target to each connector to also create the fat jar.

- Storage space is no longer really a problem. Worst case, we host the fat jars in an S3 bucket.

On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <[hidden email]> wrote:
> [...]
Hi Timo,
thanks for your efforts. Personally, I think the second option would be better; here are my thoughts:

(1) The SQL client is designed to offer a convenient way for users to manipulate data with Flink. Obviously, the second option would be easier to use.

(2) The script would help to manage the dependencies automatically, but with less flexibility. Once the script cannot meet a need, users have to modify it themselves.

(3) I wonder whether we could package all these built-in connectors and formats into a single JAR. With this all-in-one solution, users would not need to think much about dependencies.

Best,
Xingcan

> On 27 Feb 2018, at 6:38 PM, Stephan Ewen <[hidden email]> wrote:
> [...]
Hi Xingcan,
thank you for your feedback. Regarding (3), we also thought about that, but this approach would not scale very well. Given that we might have fat jars for multiple versions (Kafka 0.8, Kafka 0.9, etc.), such an all-in-one JAR file might easily go beyond 1 or 2 GB. I don't know if users want to download that just for one combination of connector and format.

Timo

On 2/27/18 at 2:16 PM, Xingcan Cui wrote:
> [...]
I think one problem with the "one fat jar for all" approach is that some dependencies clash in their class names across versions:

- Kafka 0.9, 0.10, 0.11, 1.0
- Elasticsearch 2 and 5

There are probably others as well...

On Tue, Feb 27, 2018 at 2:57 PM, Timo Walther <[hidden email]> wrote:
> [...]
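To make the clash concrete, here is a small diagnostic sketch. The Kafka class name is only an example; the point is that a classloader resolves each class name to exactly one resource, so when two library versions are bundled into one fat jar without relocation, one copy silently shadows the other:

import java.net.URL;

public class ClasspathProbe {
    public static void main(String[] args) {
        // Example class that exists in several Kafka client versions;
        // substitute any class duplicated across bundled libraries.
        String name = "org.apache.kafka.clients.consumer.KafkaConsumer";

        // getResource() returns the first matching .class entry on the
        // classpath; any second bundled version is silently shadowed.
        URL winner = ClasspathProbe.class.getClassLoader()
                .getResource(name.replace('.', '/') + ".class");

        System.out.println(winner == null
                ? "Class not on the classpath."
                : "Loaded from: " + winner);
    }
}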
Hi Timo,
Thanks for initiating the SQL client effort. I agree with Xingcan's points, and would add that (1) most users of the SQL client will very likely have little Maven / build-tool knowledge, and (2) the build script would most likely grow more complex over time, making it increasingly hard for users to modify themselves.

On (3), the single "fat" jar idea: in addition to the dependency-conflict issue, a very common pattern I see is that users want to maintain a list of individual jars, such as a set of relatively constant, handy UDFs, every time they use the SQL client. They will probably need to package and ship those separately anyway. I was wondering if "download-and-drop-in" might be the more straightforward approach?

Best,
Rong

On Tue, Feb 27, 2018 at 8:23 AM, Stephan Ewen <[hidden email]> wrote:
> [...]
I agree, option (2) would be the easiest approach for the users.
On 2018-03-01 at 0:00 GMT+01:00, Rong Rong <[hidden email]> wrote:
> [...]
Hi everyone,
thanks for your opinions. The majority voted for option (2): fat jars that are ready to be used. I will create a Jira issue and prepare the infrastructure for the first connector and the first format.

Regards,
Timo

On 3/1/18 at 11:38 AM, Fabian Hueske wrote:
> [...]