Hi everybody,
I'd like to start a discussion about blocking issues and outstanding
features of the Table API and SQL for the 1.1.0 release. As you probably
know, the Table API was completely reworked and ported to Apache Calcite.
Moreover, we added initial support for SQL on batch and streaming tables.

We have come quite far, but there are still a couple of issues that need
to be resolved before we can release a new version of Flink. I would like
to start collecting and prioritizing issues so that we can work towards a
feature set that we would like to include in the next release. To prepare
this list, I tried to execute the TPC-H query set using the currently
supported SQL feature set. Only one (Q18) out of the 22 queries could be
executed. The others failed due to unsupported features or bugs.

In the following, I list the issues, ordered by priority, that I think
need to be resolved for the release.

- FLINK-3728: Detect unsupported operators and improve error messages.
  While we can effectively prevent unsupported operations in the Table
  API, this is not easily possible with SQL queries. At the moment,
  unsupported operations are either not detected and translated into
  invalid plans, or they throw hard-to-understand exceptions.
- FLINK-3859: Add support for DECIMAL. Without this feature, it is not
  possible to use floating point literals in SQL queries.
- FLINK-3152 / FLINK-3580: Add support for date types and date functions.
- FLINK-3586: Prevent AVG(LONG) overflow by using BigInteger as
  intermediate data type.
- FLINK-2971: Add support for outer joins (a PR for this issue exists:
  #1981).
- FLINK-3936: Add MIN / MAX aggregation functions for BOOLEAN types.
- FLINK-3916: Add support for generic types which are handled by
- FLINK-3723: This is a proposal to split the Table API select() method
  into select() for projection and aggregate() for aggregations. At the
  moment, both are handled by select() (as in SQL) and internally
  separated by the Table API. We should decide for Flink 1.1.0 whether
  to implement the proposal or not.
- FLINK-3871 / FLINK-3873: Add TableSource and TableSink for Avro-encoded
  Kafka sources.
- FLINK-3872 / FLINK-3874: Add TableSource and TableSink for JSON-encoded
  Kafka sources.
- More TableSources / TableSinks.

Please review this list, add issues that you think should go in as well,
and discuss the priorities of the features.
Also, if you would like to get involved in improving the Table API / SQL,
send a mail to the mailing list or comment on a JIRA issue.

I think it would be good if somebody coordinated these efforts. I would
be happy to do it. However, I will leave in one month for a two-month
parental leave, and I don't know how much I can contribute during that
time. So if somebody would like to step up and help coordinate, please
let me and the others know.

Cheers, Fabian
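To make the AVG(LONG) item (FLINK-3586) more concrete, here is a minimal sketch in plain Java of why a long running sum is a problem and how a BigInteger intermediate value avoids it. This is only an illustration and not the Flink implementation: the class name AvgLongSketch and both method names are made up for this example, and the sketch assumes a non-empty input array.

```java
import java.math.BigInteger;

public class AvgLongSketch {

    // Naive average: the running sum is a long and silently wraps around
    // when it exceeds Long.MAX_VALUE.
    static long naiveAvg(long[] values) {
        long sum = 0L;
        for (long v : values) {
            sum += v; // may overflow without throwing an exception
        }
        return sum / values.length;
    }

    // Average with a BigInteger running sum, which cannot overflow.
    static long bigIntegerAvg(long[] values) {
        BigInteger sum = BigInteger.ZERO;
        for (long v : values) {
            sum = sum.add(BigInteger.valueOf(v));
        }
        // The (truncated) average of long values always fits into a long.
        return sum.divide(BigInteger.valueOf(values.length)).longValueExact();
    }

    public static void main(String[] args) {
        long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
        System.out.println(naiveAvg(values));      // prints -1
        System.out.println(bigIntegerAvg(values)); // prints 9223372036854775807
    }
}
```

With two inputs of Long.MAX_VALUE, the naive sum wraps around to -2, so the naive average is -1, while the BigInteger variant returns the expected Long.MAX_VALUE.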
Hi Fabian,
The priorities seem reasonable to me. I skimmed through your TPC-H branch
and found that data types and the related functions would be quite
important for enabling most of the TPC-H queries.

I will try out your TPC-H branch and pick up some of the issues you filed
in the last two days for the 1.1.0 release.

Hope this helps :)

Best,
Yijie

On Thu, May 19, 2016 at 11:59 PM, Fabian Hueske <[hidden email]> wrote:
> [...]
Hi Fabian,
thank you for summarizing the most important issues. I already worked on
FLINK-3152 / FLINK-3580 but stopped in favor of FLINK-3859. I will open a
PR for FLINK-3859 very soon; I just need to rebase it onto the latest
validation layer and do some testing. Unfortunately, I'm on vacation next
week.

I would like to take care of the above issues. I can also help coordinate
over the next weeks. What are the plans for the 1.1.0 release so far?

Regards,
Timo

On 20.05.2016 14:49, Yijie Shen wrote:
> [...]

--
Freundliche Grüße / Kind Regards
Timo Walther
Follow me: @twalthr
https://www.linkedin.com/in/twalthr
On Fri, May 20, 2016 at 3:32 PM, Timo Walther <[hidden email]> wrote:
> What are the plans for 1.1.0 release so far?

Hey Timo,

following the three-month release schedule, a release would be due soon
(June 8th). My personal opinion is that this exact date is probably a
little optimistic, but June sounds about right.

We should definitely start the general release discussion. Stephan
already suggested this in another thread, so you can expect this to
happen soon.

It's very good timing that you started preparing SQL for 1.1. That
will probably be one of the major features of 1.1. :-)
Hey all,
@Fabian: thanks for compiling the list of issues and trying out TPC-H. I
think it would be nice to include the first 5 from your list in 1.1.0.
What about FLINK-3656 (re-working the tests)? Do we want to do this
before the release as well?

Great to see Timo willing to coordinate and Yijie willing to help!
Unfortunately, I won't have much time to offer in the following months,
but I'll try to keep myself up-to-date :)

Cheers,
-Vasia.

On 20 May 2016 at 15:59, Ufuk Celebi <[hidden email]> wrote:
> [...]
Hi everybody,
thanks for the feedback so far.

I just went over JIRA and increased the priority of some of the issues I
listed (the top 6, out of which 1 is fixed, plus FLINK-3723) to CRITICAL
to distinguish them from the rest.

I found two more critical issues that should be fixed:
- FLINK-3971: Incorrect handling of null values in aggregation functions.
- FLINK-3944: Reordering Cartesian products and joins to resolve the
  Cartesian products into equi-joins.

@Vasia: I think that FLINK-3656 is important to resolve, but I see this
as independent of the release.

Most of these issues are currently work in progress, which is very good.
It would be great if those who are looking for an issue to work on would
consider picking a critical one.

Please also open new issues if you find bugs or important improvements
that should be fixed for 1.1.0.

Thanks, Fabian

2016-05-20 16:09 GMT+02:00 Vasiliki Kalavri <[hidden email]>:
> [...]
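As background for the null-handling item (FLINK-3971): standard SQL aggregates such as SUM skip NULL inputs and return NULL when there is no non-NULL input, while COUNT(column) counts only the non-NULL inputs. The plain-Java sketch below illustrates these semantics. It is not taken from the Flink code base and makes no claim about what exactly was wrong in FLINK-3971; the class name NullAwareAggSketch and its methods are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

public class NullAwareAggSketch {

    // SQL-style SUM: NULL inputs are skipped. The result is NULL (Java null)
    // if there was no non-NULL input at all.
    static Long sum(List<Long> values) {
        Long sum = null;
        for (Long v : values) {
            if (v == null) {
                continue; // ignore NULL inputs
            }
            if (sum == null) {
                sum = v;
            } else {
                sum = sum + v;
            }
        }
        return sum;
    }

    // SQL-style COUNT(column): counts only the non-NULL inputs.
    static long count(List<Long> values) {
        long count = 0L;
        for (Long v : values) {
            if (v != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Long> mixed = Arrays.asList(1L, null, 3L);
        List<Long> onlyNulls = Arrays.asList((Long) null, (Long) null);

        System.out.println(sum(mixed));       // prints 4
        System.out.println(count(mixed));     // prints 2
        System.out.println(sum(onlyNulls));   // prints null
        System.out.println(count(onlyNulls)); // prints 0
    }
}
```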
If it's OK with you, I'd also need FLINK-3901 [1] and FLINK-3908 [2] to be
merged.

[1] https://github.com/apache/flink/pull/1989
[2] https://github.com/apache/flink/pull/2007

Best,
Flavio

On Wed, May 25, 2016 at 5:04 PM, Fabian Hueske <[hidden email]> wrote:
> [...]
Hi Flavio,
it shouldn't be a problem to include these issues in the release.

Cheers, Fabian

2016-05-25 17:42 GMT+02:00 Flavio Pompermaier <[hidden email]>:
> [...]