Hi all,
I'd like to propose introducing flink-connector-hive-xx modules.

We have documented the detailed dependency information [2], but some inconveniences remain:
- Too many versions: users need to pick one version out of 8.
- Too many versions for our developers as well: when there is a problem or exception, we need to look at eight different versions of the Hive client code, which often differ from one another.
- Too many jars: for example, users need to download 4+ jars for Hive 1.x from various places.

We have discussed this in [1] and [2], but unfortunately we could not reach an agreement.

To improve this, I'd like to introduce a few flink-connector-hive-xx modules in flink-connectors. Each module contains all the dependencies related to Hive, and each one only supports Hive metastore versions up to its own version:
- "flink-connector-hive-1.2" to support hive 1.0.0 - 1.2.2
- "flink-connector-hive-2.0" to support hive 2.0.0 - 2.0.1
- "flink-connector-hive-2.2" to support hive 2.1.0 - 2.2.0
- "flink-connector-hive-2.3" to support hive 2.3.0 - 2.3.6
- "flink-connector-hive-3.1" to support hive 3.0.0 - 3.1.2

Users can choose one and download it to flink/lib. It includes everything Hive-related.

I tried to use a single module to deploy multiple versions, but I could not find a suitable way, because different modules require different versions and different dependencies.

What do you think?

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-td35918.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-109-Improve-Hive-dependencies-out-of-box-experience-td38290.html

Best,
Jingsong Lee
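To make the proposed ranges concrete, here is a minimal sketch of how a user's Hive version would map to one of the modules listed above. The class and method names (HiveModuleSelector, moduleFor) are hypothetical and exist only for illustration; nothing like this is part of the proposal, which only defines the modules themselves.

```java
/** Illustration only: looks up the module proposed above for a given Hive version. */
final class HiveModuleSelector {

    // {lowest supported version, highest supported version, proposed module name},
    // copied from the list in the proposal.
    private static final String[][] RANGES = {
        {"1.0.0", "1.2.2", "flink-connector-hive-1.2"},
        {"2.0.0", "2.0.1", "flink-connector-hive-2.0"},
        {"2.1.0", "2.2.0", "flink-connector-hive-2.2"},
        {"2.3.0", "2.3.6", "flink-connector-hive-2.3"},
        {"3.0.0", "3.1.2", "flink-connector-hive-3.1"},
    };

    /** Parses "major.minor.patch" into three integers. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected major.minor.patch, got: " + version);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Compares two versions component by component. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    /** Returns the proposed module whose range contains the given Hive version. */
    static String moduleFor(String hiveVersion) {
        for (String[] range : RANGES) {
            if (compare(hiveVersion, range[0]) >= 0 && compare(hiveVersion, range[1]) <= 0) {
                return range[2];
            }
        }
        throw new IllegalArgumentException("No proposed module covers Hive " + hiveVersion);
    }

    public static void main(String[] args) {
        System.out.println(moduleFor("1.1.0"));
        System.out.println(moduleFor("2.1.1"));
    }
}
```

Versions outside every range, such as anything before 1.0.0, are rejected rather than silently mapped to the nearest module.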
Thanks, Jingsong, for bringing this up. We've received lots of feedback in the past few months that the complexity involved in different Hive versions has been quite painful for users getting started. So it's great to step forward and deal with this issue.

Before we reach a decision, can you please explain:

1) why did you propose segregating Hive versions into the 5 ranges above?
2) what different Hive features are supported in the 5 ranges?
3) have you tested whether each proposed Flink module will be fully compatible with its Hive version range?

Thanks,
Bowen

On Wed, Mar 4, 2020 at 1:00 AM Jingsong Lee <[hidden email]> wrote:
> [...]
Thanks Bowen for getting involved.

> why you proposed segregating hive versions into the 5 ranges above? & what different Hive features are supported in the 5 ranges?

Only a higher client dependency version can support lower Hive metastore versions:
- Hive 1.0.0 - 1.2.2: the thrift change is OK; the only difference is Hive date column stats, and we can throw an exception for the unsupported feature.
- Hive 2.0 and Hive 2.1: primary key support and the alter_partition API change.
- Hive 2.2: no thrift change.
- Hive 2.3: changes many things, lots of thrift changes.
- Hive 3+: not null, unique, timestamp, so many things.

All of these can be found in hive_metastore.thrift.

I think I can put more effort into the implementation to use Hive 2.2 to support Hive 2.0, so the number of ranges will be 4.

> have you tested that whether the proposed corresponding Flink module will be fully compatible with each Hive version range?

Yes, I have done some tests. Not really "fully", but it is a technical judgment.

Best,
Jingsong Lee

On Thu, Mar 5, 2020 at 1:17 PM Bowen Li <[hidden email]> wrote:
> [...]
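The "throw an exception for the unsupported feature" idea for the Hive 1.x range could look roughly like the sketch below. This is only an illustration: the class name, the method names, and in particular the 1.2.0 threshold are assumptions, since the thread does not say which Hive release introduced date column statistics. The real boundary would have to be read from hive_metastore.thrift.

```java
/** Illustration only: reject a feature that an older metastore cannot serve. */
final class MetastoreFeatureGuard {

    // ASSUMPTION: placeholder threshold. The thread does not state which Hive release
    // introduced date column statistics; check hive_metastore.thrift for the real one.
    static final int[] DATE_COLUMN_STATS_SINCE = {1, 2, 0};

    /** Returns true if the given "major.minor.patch" version is at or above the threshold. */
    static boolean supportsDateColumnStats(String metastoreVersion) {
        String[] parts = metastoreVersion.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected major.minor.patch, got: " + metastoreVersion);
        }
        for (int i = 0; i < 3; i++) {
            int component = Integer.parseInt(parts[i]);
            if (component != DATE_COLUMN_STATS_SINCE[i]) {
                return component > DATE_COLUMN_STATS_SINCE[i];
            }
        }
        return true; // exactly the threshold version
    }

    /** Fails fast instead of sending a request the metastore cannot understand. */
    static void requireDateColumnStats(String metastoreVersion) {
        if (!supportsDateColumnStats(metastoreVersion)) {
            throw new UnsupportedOperationException(
                "Date column statistics are not supported by Hive metastore " + metastoreVersion);
        }
    }
}
```

With a guard like this, one module built against the newest client in its range can still talk to the older metastores, and only the one missing feature fails, with a clear message.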
Thanks Jingsong for your explanation! I'm +1 for this initiative.
According to your description, I think it makes sense to fold support for Hive 2.2 into that of 2.0/2.1, reducing the number of ranges to 4.

A couple of minor follow-up questions:

1) will there be a base module like "flink-connector-hive-base" which holds all the common logic of these proposed modules and is compiled into the uber jar of "flink-connector-hive-xxx"?
2) in my observation, it's more common for the version in a module name to be the lowest version that the module supports, e.g. for Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0" rather than "flink-connector-hive-1.2".

On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li <[hidden email]> wrote:
> [...]
Hi Bowen, thanks for your reply.
> will there be a base module like "flink-connector-hive-base" which holds all the common logic of these proposed modules

Maybe we don't need one; the implementation of each module is only a "pom.xml". Different versions have different dependencies.

> it's more common to set the version in module name to be the lowest version that this module supports

I have some hesitation, because the actual version number better reflects the actual dependency. For example, the user may also know about the hiveVersion field [1]. They may enter the wrong hiveVersion because of the name, or they may have the wrong expectations for the Hive built-in functions.

[1] https://github.com/apache/flink/pull/11304

Best,
Jingsong Lee

On Thu, Mar 5, 2020 at 2:34 PM Bowen Li <[hidden email]> wrote:
> [...]
> I have some hesitation, because the actual version number can better reflect the actual dependency. For example, if the user also knows the field hiveVersion[1]. He may enter the wrong hiveVersion because of the name, or he may have the wrong expectation for the hive built-in functions.

Sorry, I'm not sure my proposal was understood correctly.

What I'm saying is: your original proposal suggested, for example, naming the module "flink-connector-hive-1.2" to support hive 1.0.0 - 1.2.2, a name that includes the highest Hive version it supports. I'm suggesting naming it "flink-connector-hive-1.0", a name that includes the lowest Hive version it supports.

What do you think?

On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li <[hidden email]> wrote:
> [...]
Hi Bowen,
My idea is to expose the version we actually depend on. For example, if we depend on Hive 1.2.2, our jar is named after Hive 1.2.2, so that users directly and clearly know the version. As for which metastore versions are supported, we can explain that in the documentation. Otherwise, if we write 1.0 while the actual version is 1.2.2, users will have wrong expectations.

Another thing: maybe 2.3.6 can support 2.0 - 2.2 after some effort.

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 1:00 AM Bowen Li <[hidden email]> wrote:
> [...]
Hi Jingsong,

I think I misunderstood you. So your argument is that, to support hive 1.0.0 - 1.2.2, we are actually using Hive 1.2.2 and thus we name the flink module as "flink-connector-hive-1.2", right? It makes sense to me now.

+1 for this change.

Cheers,
Bowen

On Thu, Mar 5, 2020 at 6:53 PM Jingsong Li <[hidden email]> wrote:

> Hi Bowen,
>
> My idea is to directly provide the really dependent version, such as hive 1.2.2, our jar name is hive 1.2.2, so that users can directly and clearly know the version. As for which metastore is supported, we can guide it in the document, otherwise, write 1.0, and the result version is indeed 1.2.2, which will make users have wrong expectations.
>
> Another, maybe 2.3.6 can support 2.0-2.2 after some efforts.
>
> Best,
> Jingsong Lee
>
> On Fri, Mar 6, 2020 at 1:00 AM Bowen Li <[hidden email]> wrote:
>
> > > I have some hesitation, because the actual version number can better reflect the actual dependency. For example, if the user also knows the field hiveVersion[1]. He may enter the wrong hiveVersion because of the name, or he may have the wrong expectation for the hive built-in functions.
> >
> > Sorry, I'm not sure if my proposal is understood correctly.
> >
> > What I'm saying is, in your original proposal, taking an example, suggested naming the module as "flink-connector-hive-1.2" to support hive 1.0.0 - 1.2.2, a name including the highest Hive version it supports. I'm suggesting to name it "flink-connector-hive-1.0", a name including the lowest Hive version it supports.
> >
> > What do you think?
> >
> > On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li <[hidden email]> wrote:
> >
> > > Hi Bowen, thanks for your reply.
> > >
> > > > will there be a base module like "flink-connector-hive-base" which holds all the common logic of these proposed modules
> > >
> > > Maybe we don't need, their implementation is only "pom.xml". Different versions have different dependencies.
> > >
> > > > it's more common to set the version in module name to be the lowest version that this module supports
> > >
> > > I have some hesitation, because the actual version number can better reflect the actual dependency. For example, if the user also knows the field hiveVersion[1]. He may enter the wrong hiveVersion because of the name, or he may have the wrong expectation for the hive built-in functions.
> > >
> > > [1] https://github.com/apache/flink/pull/11304
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 2:34 PM Bowen Li <[hidden email]> wrote:
> > >
> > > > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> > > >
> > > > According to your description, I think it makes sense to incorporate support of Hive 2.2 to that of 2.0/2.1 and reducing the number of ranges to 4.
> > > >
> > > > A couple minor followup questions:
> > > > 1) will there be a base module like "flink-connector-hive-base" which holds all the common logic of these proposed modules and is compiled into the uber jar of "flink-connector-hive-xxx"?
> > > > 2) according to my observation, it's more common to set the version in module name to be the lowest version that this module supports, e.g. for Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0" rather than "flink-connector-hive-1.2"
> > > >
> > > > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li <[hidden email]> wrote:
> > > >
> > > > > Thanks Bowen for involving.
> > > > >
> > > > > > why you proposed segregating hive versions into the 5 ranges above? & what different Hive features are supported in the 5 ranges?
> > > > >
> > > > > For only higher client dependencies version support lower hive metastore versions:
> > > > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column stats, we can throw exception for the unsupported feature.
> > > > > - Hive 2.0 and Hive 2.1, primary key support and alter_partition api change.
> > > > > - Hive 2.2 no thrift change.
> > > > > - Hive 2.3 change many things, lots of thrift change.
> > > > > - Hive 3+, not null. unique, timestamp, so many things.
> > > > >
> > > > > All these things can be found in hive_metastore.thrift.
> > > > >
> > > > > I think I can try do more effort in implementation to use Hive 2.2 to support Hive 2.0. So the range size will be 4.
> > > > >
> > > > > > have you tested that whether the proposed corresponding Flink module will be fully compatible with each Hive version range?
> > > > >
> > > > > Yes, I have done some tests, not really for "fully", but it is a technical judgment.
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
|
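For readers following the naming discussion, the rule agreed on in this thread (each module is named after the Hive version it bundles and covers a range of lower metastore versions) can be sketched as a simple lookup. This is only an illustration under stated assumptions: the five ranges are copied from the original proposal (the thread also discusses reducing them to four), and `PROPOSED_MODULES` and `pick_module` are hypothetical names, not part of Flink.

```python
# Hypothetical helper, not part of Flink: maps a Hive version string to the
# connector module proposed in this thread. The ranges are copied from the
# original proposal; names and error handling are illustrative assumptions.

# (lowest supported version, highest supported version, proposed module name)
PROPOSED_MODULES = [
    ((1, 0, 0), (1, 2, 2), "flink-connector-hive-1.2"),
    ((2, 0, 0), (2, 0, 1), "flink-connector-hive-2.0"),
    ((2, 1, 0), (2, 2, 0), "flink-connector-hive-2.2"),
    ((2, 3, 0), (2, 3, 6), "flink-connector-hive-2.3"),
    ((3, 0, 0), (3, 1, 2), "flink-connector-hive-3.1"),
]


def pick_module(hive_version: str) -> str:
    """Return the proposed module whose version range contains hive_version."""
    parts = tuple(int(p) for p in hive_version.split("."))
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {hive_version!r}")
    # Tuples compare element by element, so this is a plain range check.
    for lowest, highest, module in PROPOSED_MODULES:
        if lowest <= parts <= highest:
            return module
    raise ValueError(f"no proposed module covers Hive {hive_version}")
```

Under this sketch a user running Hive 1.1.0 would pick "flink-connector-hive-1.2", which matches the point made above: the module name reflects the bundled client version, not the lowest metastore version it can talk to.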