[DISCUSS] Introduce flink-connector-hive-xx modules


Jingsong Lee
Hi all,

I'd like to propose introducing flink-connector-hive-xx modules.

We have documented the dependencies in detail [2], but some inconveniences
remain:
- Too many versions: users need to pick one version out of 8.
- Too many versions is not friendly to our developers either: when there is
a problem/exception, we need to look at eight different versions of Hive
client code, which often differ from each other.
- Too many jars: for example, users need to download 4+ jars for Hive 1.x
from various places.

We have discussed this in [1] and [2], but unfortunately we could not reach
an agreement.

To improve this, I'd like to introduce a few flink-connector-hive-xx
modules in flink-connectors. Each module contains all the Hive-related
dependencies, relies on one Hive client version, and only supports the
Hive metastore versions at or below that version:
- "flink-connector-hive-1.2" to support Hive 1.0.0 - 1.2.2
- "flink-connector-hive-2.0" to support Hive 2.0.0 - 2.0.1
- "flink-connector-hive-2.2" to support Hive 2.1.0 - 2.2.0
- "flink-connector-hive-2.3" to support Hive 2.3.0 - 2.3.6
- "flink-connector-hive-3.1" to support Hive 3.0.0 - 3.1.2

Users can choose one and download it to flink/lib; it includes everything
needed for Hive.
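
To make the choice concrete, here is a minimal Python sketch of how a user
(or a documentation page) might pick a module for a given Hive metastore
version. The module names and inclusive ranges are copied from the list
above; the helpers `parse_version` and `pick_module` are hypothetical and
not part of Flink.

```python
# Hypothetical helper, not part of Flink: maps a Hive metastore version to
# one of the proposed flink-connector-hive-xx modules (5-range proposal).
PROPOSED_MODULES = [
    # (module name, lowest supported version, highest supported version)
    ("flink-connector-hive-1.2", (1, 0, 0), (1, 2, 2)),
    ("flink-connector-hive-2.0", (2, 0, 0), (2, 0, 1)),
    ("flink-connector-hive-2.2", (2, 1, 0), (2, 2, 0)),
    ("flink-connector-hive-2.3", (2, 3, 0), (2, 3, 6)),
    ("flink-connector-hive-3.1", (3, 0, 0), (3, 1, 2)),
]


def parse_version(version):
    """Turn "major.minor.patch" into a tuple of ints so ranges compare correctly."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def pick_module(metastore_version):
    """Return the module whose inclusive version range contains metastore_version."""
    v = parse_version(metastore_version)
    for module, low, high in PROPOSED_MODULES:
        if low <= v <= high:
            return module
    raise ValueError(f"Hive {metastore_version} is not covered by any proposed module")


print(pick_module("1.1.0"))  # flink-connector-hive-1.2
print(pick_module("2.3.4"))  # flink-connector-hive-2.3
```

Versions are parsed into integer tuples because Python compares tuples
element-wise; compared as plain strings, "1.10.0" would sort before "1.2.0".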

I tried to use a single module to deploy multiple versions, but I could not
find a suitable way, because different modules require different Hive
versions and different dependencies.

What do you think?

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-td35918.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-109-Improve-Hive-dependencies-out-of-box-experience-td38290.html

Best,
Jingsong Lee

Re: [DISCUSS] Introduce flink-connector-hive-xx modules

bowen.li
Thanks, Jingsong, for bringing this up. We've received lots of feedback in
the past few months that the complexity involved in different Hive versions
has been quite painful for users getting started, so it's great to step
forward and deal with this issue.

Before reaching a decision, can you please explain:

1) why you proposed segregating Hive versions into the 5 ranges above?
2) what different Hive features are supported in the 5 ranges?
3) have you tested whether each proposed Flink module will be fully
compatible with its Hive version range?

Thanks,
Bowen



In reply to: Jingsong Lee, Wed, Mar 4, 2020 at 1:00 AM


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

Jingsong Li
Thanks, Bowen, for joining the discussion.

> 1) why you proposed segregating hive versions into the 5 ranges above?
> 2) what different Hive features are supported in the 5 ranges?

Only a higher-version Hive client can support lower-version Hive
metastores, so the ranges follow the metastore changes between versions:
- Hive 1.0.0 - 1.2.2: the Thrift changes are compatible; only date column
stats differ, and we can throw an exception for that unsupported feature.
- Hive 2.0 and Hive 2.1: primary key support and an alter_partition API
change.
- Hive 2.2: no Thrift changes.
- Hive 2.3: many changes, including lots of Thrift changes.
- Hive 3+: NOT NULL and UNIQUE constraints, timestamp, and many other
things.

All of these changes can be found in hive_metastore.thrift.

I think I can put more effort into the implementation so that the Hive 2.2
module also supports Hive 2.0. Then there will be 4 ranges.

> 3) have you tested that whether the proposed corresponding Flink module
> will be fully compatible with each Hive version range?

Yes, I have done some tests. They are not exhaustive, so I can't claim
"fully" compatible, but the ranges are based on a technical judgment.

Best,
Jingsong Lee

In reply to: Bowen Li, Thu, Mar 5, 2020 at 1:17 PM


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

bowen.li
Thanks, Jingsong, for your explanation! I'm +1 for this initiative.

Based on your description, I think it makes sense to merge the Hive 2.0
range into the 2.1 - 2.2 range and reduce the number of ranges to 4.

A couple of minor follow-up questions:
1) Will there be a base module like "flink-connector-hive-base" that holds
all the common logic of these proposed modules and is compiled into the
uber jar of each "flink-connector-hive-xxx"?
2) In my observation, it's more common for the version in a module name to
be the lowest version the module supports, e.g. for Hive 1.0.0 - 1.2.2, the
module could be named "flink-connector-hive-1.0" rather than
"flink-connector-hive-1.2".


In reply to: Jingsong Li, Wed, Mar 4, 2020 at 10:20 PM


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

Jingsong Li
Hi Bowen, thanks for your reply.

> will there be a base module like "flink-connector-hive-base" which holds
> all the common logic of these proposed modules

We may not need one: the implementation of these modules is only a
"pom.xml", and different versions have different dependencies.

> it's more common to set the version in module name to be the lowest
> version that this module supports

I have some hesitation, because the actual version number better reflects
the actual dependency. For example, if the user also knows about the
hiveVersion field [1], they may enter the wrong hiveVersion because of the
module name, or have the wrong expectations for the Hive built-in functions.

[1] https://github.com/apache/flink/pull/11304

Best,
Jingsong Lee

In reply to: Bowen Li, Thu, Mar 5, 2020 at 2:34 PM


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

bowen.li
> I have some hesitation, because the actual version number can better
> reflect the actual dependency. For example, if the user also knows the
> field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> name, or he may have the wrong expectation for the hive built-in functions.

Sorry, I'm not sure my proposal was understood correctly.

What I'm saying is: taking one example, your original proposal suggested
naming the module "flink-connector-hive-1.2" to support Hive 1.0.0 - 1.2.2,
a name that includes the highest Hive version it supports. I'm suggesting
naming it "flink-connector-hive-1.0", a name that includes the lowest Hive
version it supports.

What do you think?



In reply to: Jingsong Li, Wed, Mar 4, 2020 at 11:14 PM


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

Jingsong Li
Hi Bowen,

My idea is to directly expose the version we really depend on: for Hive
1.2.2, for example, our jar name carries Hive 1.2.2, so that users can
directly and clearly see the version. Which metastore versions are
supported can be explained in the documentation. Otherwise, if we write 1.0
while the actual version is 1.2.2, users will get wrong expectations.
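
The naming rule argued for here could be sketched in Python as follows.
This assumes that each module bundles the highest Hive version of its range,
which is stated above for 1.2.2 and only assumed for the other modules. The
module name is derived from the bundled version, while the supported
metastore range appears only in documentation text. `HiveModule`, its fields
and `doc_line` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiveModule:
    """Hypothetical description of one proposed flink-connector-hive-xx module."""
    bundled_hive_version: str  # Hive version the module really depends on
    lowest_supported: str      # lowest metastore version, stated only in the docs

    @property
    def name(self) -> str:
        # Name after the bundled (really dependent) version, truncated to
        # major.minor, rather than after the lowest supported version.
        major, minor, _patch = self.bundled_hive_version.split(".")
        return f"flink-connector-hive-{major}.{minor}"

    def doc_line(self) -> str:
        # The supported metastore range is documented instead of encoded in the name.
        return (f"{self.name} bundles Hive {self.bundled_hive_version} and supports "
                f"Hive metastore {self.lowest_supported} - {self.bundled_hive_version}")


module = HiveModule(bundled_hive_version="1.2.2", lowest_supported="1.0.0")
print(module.name)  # flink-connector-hive-1.2
print(module.doc_line())
```

With this split, the name never suggests a Hive version that is not actually
on the classpath, and the lower bound of the range lives in one place.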

In addition, maybe Hive 2.3.6 can support 2.0 - 2.2 after some effort.

Best,
Jingsong Lee

In reply to: Bowen Li, Fri, Mar 6, 2020 at 1:00 AM

> > I have some hesitation, because the actual version number can better
> reflect the actual dependency. For example, if the user also knows the
> field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> name, or he may have the wrong expectation for the hive built-in functions.
>
> Sorry, I'm not sure if my proposal is understood correctly.
>
> What I'm saying is, in your original proposal, taking an example, suggested
> naming the module as "flink-connector-hive-1.2" to support hive 1.0.0 -
> 1.2.2, a name including the highest Hive version it supports. I'm
> suggesting to name it "flink-connector-hive-1.0", a name including the
> lowest Hive version it supports.
>
> What do you think?
>
>
>
> On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li <[hidden email]>
> wrote:
>
> > Hi Bowen, thanks for your reply.
> >
> > > will there be a base module like "flink-connector-hive-base" which
> holds
> > all the common logic of these proposed modules
> >
> > Maybe we don't need, their implementation is only "pom.xml". Different
> > versions have different dependencies.
> >
> > > it's more common to set the version in module name to be the lowest
> > version that this module supports
> >
> > I have some hesitation, because the actual version number can better
> > reflect the actual dependency. For example, if the user also knows the
> > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > name, or he may have the wrong expectation for the hive built-in
> functions.
> >
> > [1] https://github.com/apache/flink/pull/11304
> >
> > Best,
> > Jingsong Lee
> >
> > On Thu, Mar 5, 2020 at 2:34 PM Bowen Li <[hidden email]> wrote:
> >
> > > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> > >
> > > According to your description, I think it makes sense to incorporate
> > > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of
> ranges
> > to
> > > 4.
> > >
> > > A couple minor followup questions:
> > > 1) will there be a base module like "flink-connector-hive-base" which
> > holds
> > > all the common logic of these proposed modules and is compiled into the
> > > uber jar of "flink-connector-hive-xxx"?
> > > 2) according to my observation, it's more common to set the version in
> > > module name to be the lowest version that this module supports, e.g.
> for
> > > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > > rather than "flink-connector-hive-1.2"
> > >
> > >
> > > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li <[hidden email]>
> > > wrote:
> > >
> > > > Thanks Bowen for involving.
> > > >
> > > > > why you proposed segregating hive versions into the 5 ranges
> above? &
> > > > what different Hive features are supported in the 5 ranges?
> > > >
> > > > For only higher client dependencies version support lower hive
> > metastore
> > > > versions:
> > > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column
> stats,
> > > we
> > > > can throw exception for the unsupported feature.
> > > > - Hive 2.0 and Hive 2.1, primary key support and alter_partition api
> > > > change.
> > > > - Hive 2.2 no thrift change.
> > > > - Hive 2.3 change many things, lots of thrift change.
> > > > - Hive 3+, not null. unique, timestamp, so many things.
> > > >
> > > > All these things can be found in hive_metastore.thrift.
> > > >
> > > > I think I can try do more effort in implementation to use Hive 2.2 to
> > > > support Hive 2.0. So the range size will be 4.
> > > >
> > > > > have you tested that whether the proposed corresponding Flink
> module
> > > will
> > > > be fully compatible with each Hive version range?
> > > >
> > > > Yes, I have done some tests, not really for "fully", but it is a
> > > technical
> > > > judgment.
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Thu, Mar 5, 2020 at 1:17 PM Bowen Li <[hidden email]> wrote:
> > > >
> > > > > Thanks, Jingsong, for bringing this up. We've received lots of
> > > feedbacks
> > > > in
> > > > > the past few months that the complexity involved in different Hive
> > > > versions
> > > > > has been quite painful for users to start with. So it's great to
> step
> > > > > forward and deal with such issue.
> > > > >
> > > > > Before getting on a decision, can you please explain:
> > > > >
> > > > > 1) why you proposed segregating hive versions into the 5 ranges
> > above?
> > > > > 2) what different Hive features are supported in the 5 ranges?
> > > > > 3) have you tested that whether the proposed corresponding Flink
> > module
> > > > > will be fully compatible with each Hive version range?
> > > > >
> > > > > Thanks,
> > > > > Bowen
> > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > > >
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>


--
Best, Jingsong Lee

Re: [DISCUSS] Introduce flink-connector-hive-xx modules

bowen.li
Hi Jingsong,

I think I misunderstood you. So your argument is that, to support Hive
1.0.0 - 1.2.2, we are actually using Hive 1.2.2, and thus we name the Flink
module "flink-connector-hive-1.2", right? That makes sense to me now.

+1 for this change.

Cheers,
Bowen
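The naming rule restated above (each module is named after the Hive version it actually bundles, i.e. the upper end of its range, while the supported metastore range is documented separately) can be sketched as a small lookup. This is a hypothetical illustration, not Flink code: the five ranges come from the original proposal, while `HiveRange`, `pick_module`, and the assumption that every module bundles the highest version of its range (stated in the thread only for the 1.x module) are illustrative assumptions.

```python
# Hypothetical sketch: choose one of the proposed flink-connector-hive-xx
# modules for a given Hive metastore version. Not part of Flink; the ranges
# are taken from the proposal, everything else is illustrative.
from typing import NamedTuple, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.patch' string such as '2.3.6' into a tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


class HiveRange(NamedTuple):
    lowest: Version   # lowest metastore version the module claims to support
    highest: Version  # assumed: the Hive version the module actually bundles

    def module_name(self) -> str:
        # Name the module after the bundled (highest) version, not the lowest.
        major, minor, _ = self.highest
        return f"flink-connector-hive-{major}.{minor}"

    def supports(self, version: Version) -> bool:
        # Tuples compare element by element, so this is an inclusive range check.
        return self.lowest <= version <= self.highest


PROPOSED_RANGES = [
    HiveRange(parse_version("1.0.0"), parse_version("1.2.2")),
    HiveRange(parse_version("2.0.0"), parse_version("2.0.1")),
    HiveRange(parse_version("2.1.0"), parse_version("2.2.0")),
    HiveRange(parse_version("2.3.0"), parse_version("2.3.6")),
    HiveRange(parse_version("3.0.0"), parse_version("3.1.2")),
]


def pick_module(metastore_version: str) -> Optional[str]:
    """Return the module to put into flink/lib, or None if no range covers it."""
    version = parse_version(metastore_version)
    for hive_range in PROPOSED_RANGES:
        if hive_range.supports(version):
            return hive_range.module_name()
    return None
```

For example, `pick_module("1.1.0")` returns `"flink-connector-hive-1.2"`, and a version outside every range, such as `"2.3.7"`, returns `None`. Under the earlier suggestion of naming modules after the lowest supported version, `module_name` would read `self.lowest` instead and yield `flink-connector-hive-1.0`, which is exactly the mismatch Jingsong worried would give users the wrong expectation about the bundled Hive. If the ranges are later consolidated, as the thread suggests may happen for 2.0 - 2.2, only `PROPOSED_RANGES` needs to change.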

On Thu, Mar 5, 2020 at 6:53 PM Jingsong Li <[hidden email]> wrote:

> Hi Bowen,
>
> My idea is to name the module after the Hive version it really depends on.
> For example, if it depends on Hive 1.2.2, the jar name contains 1.2.2, so
> users can know the version directly and clearly. Which metastore versions
> are supported can be explained in the documentation. Otherwise, if we write
> 1.0 while the actual version is 1.2.2, users will have the wrong
> expectations.
>
> Also, maybe 2.3.6 can support 2.0 - 2.2 after some effort.
>
> Best,
> Jingsong Lee
>
>
> --
> Best, Jingsong Lee
>