[DISCUSS][TABLE] Issue with package structure in the Table API

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS][TABLE] Issue with package structure in the Table API

dwysakowicz
Hi devs,

I wanted to bring up a problem that we have in our package structure.

As a result of https://issues.apache.org/jira/browse/FLINK-13045 we
started advertising importing two packages in the scala API:
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._

The intention was that the first package (org.apache.flink.table.api)
would contain all api classes that are required to work with the unified
TableEnvironment. Such as TableEnvironment, Table, Session, Slide and
expressionDsl. The second package (org.apache.flink.table.api.scala._)
would've been an optional package that contain bridging conversions
between Table and DataStream/DataSet APIs including the more specific
StreamTableEnvironment and BatchTableEnvironment.

The part missing in the original plan was to move all expressions
implicit conversions to the org.apache.flink.table.api package. Without
that step users of pure table program (that do not use the
table-api-scala-bridge module) cannot use the Expression DSL. Therefore
we should try to move those expressions as soon as possible.

The problem with this approach is that it clashes with common imports of
classes from java.* and scala.* packages. Users are forced to write:

import org.apache.flink.table.api._
import org.apache.flink.table.api.scala_
import _root_.scala.collection.mutable.ArrayBuffer
import _root_.java.lang.Integer

Besides being cumbersome, it also messes up the macro based type
extraction (org.apache.flink.api.scala#createTypeInformation) for all
classes from scala.* packages. I don't fully understand the reasons for
it, but the createTypeInformation somehow drops the _root_ for
WeakTypeTags. So e.g. for a call:
createTypeInformation[_root_.scala.collection.mutable.ArrayBuffer] it
actually tries to construct a TypeInformation for
org.apache.flink.table.api.scala.collection.mutable.ArrayBuffer, which
obviously fails.



What I would suggest for a target solution is to have:

1. for users of unified Table API with Scala ExpressionDSL

import org.apache.flink.table.api._ (for TableEnvironment, Tumble etc.
and expressions)

2. for users of Table API with scala's bridging conversions

import org.apache.flink.table.api._ (for Tumble etc. and expressions)
import org.apache.flink.table.api.bridge.scala._ (for bridging
conversions and StreamTableEnvironment)

3. for users of unified Table API with Java ExpressionDSL

import org.apache.flink.table.api.* (for TableEnvironment, Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)

4. for users of Table API with java's bridging conversions

import org.apache.flink.table.api.* (for Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)
import org.apache.flink.table.api.bridge.java.*

To have that working we need to:
* move the scala expression DSL to org.apache.flink.table.api package in
table-api-scala module
* move all classes from org.apache.flink.table.api.scala and
org.apache.flink.table.api.java packages to
org.apache.flink.table.api.bridge.scala and
org.apache.flink.table.api.bridge.java accordingly and drop the former
packages

The biggest question I have is how do we want to perform that
transition. If we do it in one go we will break existing user programs
that uses classes from org.apache.flink.table.api.java/scala. Those
packages were present from day one of Table API. Nevertheless this would
be my preffered way to move forward as we annoy users only once, even if
one big time :(

Different option would be to make that transition gradually in 3 releases.
 1. In the first we introduce the
org.apache.flink.table.api.bridge.java/scala, and we have
StreamTableEnvironment etc. as well as expression DSL in both. We ask
users to migrate to the new package.
 2. We drop the org.apache.flink.table.api.java/scala and ask users to
import additionally org.apache.flink.table.api.* for expressions (this
is the same as we did in 1.9.0, the thing though it was extremely hard
to do it then)
 3. We finally move the expression DSL from
org.apache.flink.table.api.bridge.scala to org.apache.flink.table.api
 
What do you think about it?

Best,

Dawid



signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS][TABLE] Issue with package structure in the Table API

Timo Walther-2
Hi everyone,

thanks for bringing our offline discussion to the mailing list, Dawid.
This is a very bad mistake that has been made in the past. In general,
we should discourage putting the terms "java" and "scala" in package
names as this has side effects on Scala imports.

I really don't like forcing users to put a "_root_" in their imports. It
also happended to me a couple of times while developing Flink code that
I was sitting in front of my IDE wondering why the code doesn't compile.

I'm also in favor of peforming this big change as early as possible. I'm
sure Table API users are already quite annoyed by all the
changes/refactorings happening. Changing the imports twice or three
times is even more cumbersome.

Having to import just "org.apache.flink.table.api._" is a big usability
plus for new users and especially interactive shell/notebook users.

Regards,
Timo


On 13.02.20 14:39, Dawid Wysakowicz wrote:

> Hi devs,
>
> I wanted to bring up a problem that we have in our package structure.
>
> As a result of https://issues.apache.org/jira/browse/FLINK-13045 we
> started advertising importing two packages in the scala API:
> import org.apache.flink.table.api._
> import org.apache.flink.table.api.scala._
>
> The intention was that the first package (org.apache.flink.table.api)
> would contain all api classes that are required to work with the unified
> TableEnvironment. Such as TableEnvironment, Table, Session, Slide and
> expressionDsl. The second package (org.apache.flink.table.api.scala._)
> would've been an optional package that contain bridging conversions
> between Table and DataStream/DataSet APIs including the more specific
> StreamTableEnvironment and BatchTableEnvironment.
>
> The part missing in the original plan was to move all expressions
> implicit conversions to the org.apache.flink.table.api package. Without
> that step users of pure table program (that do not use the
> table-api-scala-bridge module) cannot use the Expression DSL. Therefore
> we should try to move those expressions as soon as possible.
>
> The problem with this approach is that it clashes with common imports of
> classes from java.* and scala.* packages. Users are forced to write:
>
> import org.apache.flink.table.api._
> import org.apache.flink.table.api.scala_
> import _root_.scala.collection.mutable.ArrayBuffer
> import _root_.java.lang.Integer
>
> Besides being cumbersome, it also messes up the macro based type
> extraction (org.apache.flink.api.scala#createTypeInformation) for all
> classes from scala.* packages. I don't fully understand the reasons for
> it, but the createTypeInformation somehow drops the _root_ for
> WeakTypeTags. So e.g. for a call:
> createTypeInformation[_root_.scala.collection.mutable.ArrayBuffer] it
> actually tries to construct a TypeInformation for
> org.apache.flink.table.api.scala.collection.mutable.ArrayBuffer, which
> obviously fails.
>
>
>
> What I would suggest for a target solution is to have:
>
> 1. for users of unified Table API with Scala ExpressionDSL
>
> import org.apache.flink.table.api._ (for TableEnvironment, Tumble etc.
> and expressions)
>
> 2. for users of Table API with scala's bridging conversions
>
> import org.apache.flink.table.api._ (for Tumble etc. and expressions)
> import org.apache.flink.table.api.bridge.scala._ (for bridging
> conversions and StreamTableEnvironment)
>
> 3. for users of unified Table API with Java ExpressionDSL
>
> import org.apache.flink.table.api.* (for TableEnvironment, Tumble etc.)
> import org.apache.flink.table.api.Expressions.* (for Expression dsl)
>
> 4. for users of Table API with java's bridging conversions
>
> import org.apache.flink.table.api.* (for Tumble etc.)
> import org.apache.flink.table.api.Expressions.* (for Expression dsl)
> import org.apache.flink.table.api.bridge.java.*
>
> To have that working we need to:
> * move the scala expression DSL to org.apache.flink.table.api package in
> table-api-scala module
> * move all classes from org.apache.flink.table.api.scala and
> org.apache.flink.table.api.java packages to
> org.apache.flink.table.api.bridge.scala and
> org.apache.flink.table.api.bridge.java accordingly and drop the former
> packages
>
> The biggest question I have is how do we want to perform that
> transition. If we do it in one go we will break existing user programs
> that uses classes from org.apache.flink.table.api.java/scala. Those
> packages were present from day one of Table API. Nevertheless this would
> be my preffered way to move forward as we annoy users only once, even if
> one big time :(
>
> Different option would be to make that transition gradually in 3 releases.
>   1. In the first we introduce the
> org.apache.flink.table.api.bridge.java/scala, and we have
> StreamTableEnvironment etc. as well as expression DSL in both. We ask
> users to migrate to the new package.
>   2. We drop the org.apache.flink.table.api.java/scala and ask users to
> import additionally org.apache.flink.table.api.* for expressions (this
> is the same as we did in 1.9.0, the thing though it was extremely hard
> to do it then)
>   3. We finally move the expression DSL from
> org.apache.flink.table.api.bridge.scala to org.apache.flink.table.api
>  
> What do you think about it?
>
> Best,
>
> Dawid
>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS][TABLE] Issue with package structure in the Table API

Jingsong Li
Thanks for bringing this discussion.

+1 to peforming this big change as early as possible.

You solved my question, why we need "_root_". Yes, I don't like this import
too.

And it is very strange that expressionDsl is in api, but can only work in
api.scala. (Because scala extends ImplicitExpressionConversions)

About transition gradually in 3 releases. Can users rewrite to the right
way in one release? Or they must update and update?

Best,
Jingsong Lee

On Fri, Feb 14, 2020 at 12:12 AM Timo Walther <[hidden email]> wrote:

> Hi everyone,
>
> thanks for bringing our offline discussion to the mailing list, Dawid.
> This is a very bad mistake that has been made in the past. In general,
> we should discourage putting the terms "java" and "scala" in package
> names as this has side effects on Scala imports.
>
> I really don't like forcing users to put a "_root_" in their imports. It
> also happended to me a couple of times while developing Flink code that
> I was sitting in front of my IDE wondering why the code doesn't compile.
>
> I'm also in favor of peforming this big change as early as possible. I'm
> sure Table API users are already quite annoyed by all the
> changes/refactorings happening. Changing the imports twice or three
> times is even more cumbersome.
>
> Having to import just "org.apache.flink.table.api._" is a big usability
> plus for new users and especially interactive shell/notebook users.
>
> Regards,
> Timo
>
>
> On 13.02.20 14:39, Dawid Wysakowicz wrote:
> > Hi devs,
> >
> > I wanted to bring up a problem that we have in our package structure.
> >
> > As a result of https://issues.apache.org/jira/browse/FLINK-13045 we
> > started advertising importing two packages in the scala API:
> > import org.apache.flink.table.api._
> > import org.apache.flink.table.api.scala._
> >
> > The intention was that the first package (org.apache.flink.table.api)
> > would contain all api classes that are required to work with the unified
> > TableEnvironment. Such as TableEnvironment, Table, Session, Slide and
> > expressionDsl. The second package (org.apache.flink.table.api.scala._)
> > would've been an optional package that contain bridging conversions
> > between Table and DataStream/DataSet APIs including the more specific
> > StreamTableEnvironment and BatchTableEnvironment.
> >
> > The part missing in the original plan was to move all expressions
> > implicit conversions to the org.apache.flink.table.api package. Without
> > that step users of pure table program (that do not use the
> > table-api-scala-bridge module) cannot use the Expression DSL. Therefore
> > we should try to move those expressions as soon as possible.
> >
> > The problem with this approach is that it clashes with common imports of
> > classes from java.* and scala.* packages. Users are forced to write:
> >
> > import org.apache.flink.table.api._
> > import org.apache.flink.table.api.scala_
> > import _root_.scala.collection.mutable.ArrayBuffer
> > import _root_.java.lang.Integer
> >
> > Besides being cumbersome, it also messes up the macro based type
> > extraction (org.apache.flink.api.scala#createTypeInformation) for all
> > classes from scala.* packages. I don't fully understand the reasons for
> > it, but the createTypeInformation somehow drops the _root_ for
> > WeakTypeTags. So e.g. for a call:
> > createTypeInformation[_root_.scala.collection.mutable.ArrayBuffer] it
> > actually tries to construct a TypeInformation for
> > org.apache.flink.table.api.scala.collection.mutable.ArrayBuffer, which
> > obviously fails.
> >
> >
> >
> > What I would suggest for a target solution is to have:
> >
> > 1. for users of unified Table API with Scala ExpressionDSL
> >
> > import org.apache.flink.table.api._ (for TableEnvironment, Tumble etc.
> > and expressions)
> >
> > 2. for users of Table API with scala's bridging conversions
> >
> > import org.apache.flink.table.api._ (for Tumble etc. and expressions)
> > import org.apache.flink.table.api.bridge.scala._ (for bridging
> > conversions and StreamTableEnvironment)
> >
> > 3. for users of unified Table API with Java ExpressionDSL
> >
> > import org.apache.flink.table.api.* (for TableEnvironment, Tumble etc.)
> > import org.apache.flink.table.api.Expressions.* (for Expression dsl)
> >
> > 4. for users of Table API with java's bridging conversions
> >
> > import org.apache.flink.table.api.* (for Tumble etc.)
> > import org.apache.flink.table.api.Expressions.* (for Expression dsl)
> > import org.apache.flink.table.api.bridge.java.*
> >
> > To have that working we need to:
> > * move the scala expression DSL to org.apache.flink.table.api package in
> > table-api-scala module
> > * move all classes from org.apache.flink.table.api.scala and
> > org.apache.flink.table.api.java packages to
> > org.apache.flink.table.api.bridge.scala and
> > org.apache.flink.table.api.bridge.java accordingly and drop the former
> > packages
> >
> > The biggest question I have is how do we want to perform that
> > transition. If we do it in one go we will break existing user programs
> > that uses classes from org.apache.flink.table.api.java/scala. Those
> > packages were present from day one of Table API. Nevertheless this would
> > be my preffered way to move forward as we annoy users only once, even if
> > one big time :(
> >
> > Different option would be to make that transition gradually in 3
> releases.
> >   1. In the first we introduce the
> > org.apache.flink.table.api.bridge.java/scala, and we have
> > StreamTableEnvironment etc. as well as expression DSL in both. We ask
> > users to migrate to the new package.
> >   2. We drop the org.apache.flink.table.api.java/scala and ask users to
> > import additionally org.apache.flink.table.api.* for expressions (this
> > is the same as we did in 1.9.0, the thing though it was extremely hard
> > to do it then)
> >   3. We finally move the expression DSL from
> > org.apache.flink.table.api.bridge.scala to org.apache.flink.table.api
> >
> > What do you think about it?
> >
> > Best,
> >
> > Dawid
> >
> >
>
>

--
Best, Jingsong Lee