[DISCUSS] Putting Table API jars in /lib by default

Stephan Ewen
Hi all!

I want to discuss making the Table API jars part of the "flink uber jar" in
"/lib" by default.

So far, the Table API has been an optional dependency in "/opt".
With the current effort to make it a first-class API in Flink, it would be
a good experience if the Table API was available by default.

There are a few open questions, though:

(1) Java only, or Scala as well?
   ==> Ideally the Scala Table API is not a runtime dependency, but only a
client-side (pre-flight) dependency and simply part of the user program,
not part of Flink's "/lib".

(2) Which runner? Flink or Blink?
   ==> Ideally both
   ==> Do we need to rename the Blink classes to have ".blink." in the
package to avoid class name clashes?
   ==> Do we need to shade/relocate much to avoid dependency clashes?

(3) What should happen with the legacy BatchTableEnvironment (DataSet
based) module?
   ==> Should be possible to simply add this as well, if the Flink runner
is included anyways.

Note that this does not mean users add a dependency to everything when they
develop, so they should not see all environments.
It should simply make table programs executable without moving stuff from
/opt to /lib.

Best,
Stephan
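
For context, the manual step that this proposal would remove can be sketched as follows. This is a minimal sketch and not part of the original message: the helper name `promote_table_jars` and the glob pattern `flink-table*.jar` are assumptions, since the actual jar names depend on the Flink and Scala versions of the distribution.

```python
import shutil
from pathlib import Path


def promote_table_jars(flink_home, pattern="flink-table*.jar"):
    """Copy matching jars from <flink_home>/opt to <flink_home>/lib.

    `pattern` is an assumed naming scheme; check the opt/ directory of
    your distribution for the real file names.
    """
    opt_dir = Path(flink_home) / "opt"
    lib_dir = Path(flink_home) / "lib"
    copied = []
    for jar in sorted(opt_dir.glob(pattern)):
        target = lib_dir / jar.name
        if not target.exists():  # leave jars that are already in lib/ alone
            shutil.copy2(jar, target)
            copied.append(jar.name)
    return copied
```

Calling `promote_table_jars("/path/to/flink")` returns the names of the jars it copied, so a second call on the same directory returns an empty list.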

Re: [DISCUSS] Putting Table API jars in /lib by default

Till Rohrmann
Given that the Table API is going to be Flink's main API for analytical
workloads, it makes sense to make it as easy as possible for users to
actually use it.

The question I would have is which other transitive dependencies are we
gonna add to Flink's system class path by adding the Table API jars to
/lib. I would assume that most if not all should be filtered out by the
child-first class loader. However, if some unshaded dependencies come with
the Table API jars and users have disabled child-first class loading,
then we might break setups with this change. A release note could mitigate
this problem.

Cheers,
Till
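
One way to approach the transitive-dependency question is to list which packages a jar bundles outside Flink's own namespace. The following is a minimal sketch, not something proposed in the thread. It assumes that any class outside `org/apache/flink/` is a candidate unshaded dependency, so classes relocated under a Flink-owned prefix are not reported. The function name `foreign_packages` and the default depth of three package segments are illustrative choices.

```python
import zipfile


def foreign_packages(jar_path, own_prefix="org/apache/flink/", depth=3):
    """Return package prefixes of classes in a jar that lie outside own_prefix."""
    found = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Only class files matter; skip Flink's own classes and metadata.
            if not name.endswith(".class"):
                continue
            if name.startswith(own_prefix) or name.startswith("META-INF/"):
                continue
            package_parts = name.split("/")[:-1]  # drop the class file name
            found.add(".".join(package_parts[:depth]))
    return sorted(found)
```

Any prefix this reports would end up on the system class path once the jar sits in /lib, which is the case that matters for users who run with parent-first class loading.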

On Tue, Jun 11, 2019 at 5:29 PM Stephan Ewen <[hidden email]> wrote:

> Hi all!
>
> I want to discuss making the Table API jars part of the "flink uber jar" in
> "/lib" by default.
>
> So far, the Table API has been an optional dependency in "/opt".
> With the current effort to make it a first-class API in Flink, it would be
> a good experience if the Table API was available by default.
>
> There are a few open questions, though:
>
> (1) Java only, or Scala as well?
>    ==> Ideally the Scala Table API is not a runtime dependency, but only a
> client-side (pre-flight) dependency and simply part of the user program,
> not part of Flink's "/lib".
>
> (2) Which runner? Flink or Blink?
>    ==> Ideally both
>    ==> Do we need to rename the Blink classes to have ".blink." in the
> package to avoid class name clashes?
>    ==> Do we need to shade/relocate much to avoid dependency clashes?
>
> (3) What should happen with the legacy BatchTableEnvironment (DataSet
> based) module?
>    ==> Should be possible to simply add this as well, if the Flink runner
> is included anyways.
>
> Note that this does not mean users add a dependency to everything when they
> develop, so they should not see all environments.
> It should simply make table programs executable without moving stuff from
> /opt to /lib.
>
> Best,
> Stephan
>

Re: [DISCUSS] Putting Table API jars in /lib by default

Aljoscha Krettek-2
+1, I agree that the Table API should be in lib because it will become a first-class citizen.

Currently, both the classic Flink Table Runner and the new Blink-based Table Runner share the same package structure, i.e., they are both rooted in org.apache.flink.table. We have to resolve this before we can have both runners in lib/. Before we do this, however, we should first get both runners wired up with the new Planner interface. Then we can do one clean rename.

Aljoscha
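
The clash described above can be checked mechanically by comparing the class entries of the two runner jars. This is a minimal sketch under the assumption that both runners are available as ordinary jar files. The function names are hypothetical and the jar paths are left to the caller.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names contained in a jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def clashing_classes(jar_a, jar_b):
    """Return fully qualified class names that appear in both jars."""
    shared = class_entries(jar_a) & class_entries(jar_b)
    return sorted(name[: -len(".class")].replace("/", ".") for name in shared)
```

An empty result after the package rename would indicate that both runners can sit side by side in lib/ without shadowing each other's classes.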

> On 12. Jun 2019, at 11:50, Till Rohrmann <[hidden email]> wrote:
>
> Given that the Table API is going to be Flink's main API for analytical
> workloads, it makes sense to make it as easy as possible for users to
> actually use it.
>
> The question I would have is which other transitive dependencies are we
> gonna add to Flink's system class path by adding the Table API jars to
> /lib. I would assume that most if not all should be filtered out by the
> child-first class loader. However, if some unshaded dependencies come with
> the Table API jars and users have disabled child-first class loading,
> then we might break setups with this change. A release note could mitigate
> this problem.
>
> Cheers,
> Till

Re: [DISCUSS] Putting Table API jars in /lib by default

Stephan Ewen
Thanks, then let's revisit this in a bit when the Blink/Flink runners have
been separated in their package structure.

On Wed, Jun 12, 2019 at 2:04 PM Aljoscha Krettek <[hidden email]>
wrote:

> +1, I agree that the Table API should be in lib because it will become a
> first-class citizen.
>
> Currently, both the classic Flink Table Runner and the new Blink-based
> Table Runner share the same package structure, i.e., they are both rooted in
> org.apache.flink.table. We have to resolve this before we can have both
> runners in lib/. Before we do this, however, we should first get both runners
> wired up with the new Planner interface. Then we can do one clean rename.
>
> Aljoscha