[DISCUSS] FLIP-63: Rework table partition support

JingsongLee-2
Hi everyone, thank you for your comments. The mail subject has been updated
and streaming-related concepts have been added.

We would like to start a discussion thread on "FLIP-63: Rework table
partition support" (design doc: [1]), where we describe how to support
partitions in Flink and how to integrate with Hive partitions.

This FLIP addresses:
   - Introduce the whole story of partition support.
   - Introduce and discuss the DDL for partition support.
   - Introduce static and dynamic partition insert.
   - Introduce partition pruning.
   - Introduce the dynamic partition implementation.
   - Introduce FileFormatSink to deal with streaming exactly-once and
     partition-related logic.
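
As a rough illustration of the static/dynamic partition insert and partition
pruning items above, here is a minimal sketch. It is not the API proposed in
the design document: the function names (partition_path, insert_rows, prune)
and the in-memory "table" are hypothetical, and the only assumption borrowed
from practice is the Hive-style key=value directory layout.

```python
from collections import defaultdict


def partition_path(spec):
    """Render a partition spec as a Hive-style directory, e.g. dt=2019-09-06/hr=13."""
    return "/".join(f"{key}={value}" for key, value in spec.items())


def insert_rows(table, rows, partition_keys, static_spec=None):
    """Append rows to the table, grouped by partition (hypothetical helper).

    Values in static_spec are fixed by the statement (static partition insert);
    any partition key missing from it is read from each row (dynamic insert).
    """
    static_spec = static_spec or {}
    for row in rows:
        spec = {key: static_spec.get(key, row.get(key)) for key in partition_keys}
        # Partition columns live in the path, not in the stored record.
        data = {k: v for k, v in row.items() if k not in partition_keys}
        table[partition_path(spec)].append(data)


def prune(table, predicate):
    """Partition pruning: keep only partitions whose spec satisfies the predicate."""
    kept = {}
    for path, rows in table.items():
        spec = dict(part.split("=", 1) for part in path.split("/"))
        if predicate(spec):
            kept[path] = rows
    return kept


table = defaultdict(list)
keys = ["dt", "hr"]

# Static insert: every partition value is given up front.
insert_rows(table, [{"id": 1}, {"id": 2}], keys, {"dt": "2019-09-06", "hr": "13"})

# Dynamic insert: dt is static, hr is taken from each row.
insert_rows(table, [{"id": 3, "hr": "13"}, {"id": 4, "hr": "14"}], keys, {"dt": "2019-09-07"})

# Pruning: a filter on dt alone decides which partitions are read at all.
pruned = prune(table, lambda spec: spec["dt"] == "2019-09-07")
print(sorted(pruned))
```

The point of the sketch is that pruning works purely on partition metadata
(the path), so unselected partitions are never opened.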

Details can be found in the design document.
Looking forward to your feedback. Thank you.

[1] https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing

Best,
Jingsong Lee

Re: [DISCUSS] FLIP-63: Rework table partition support

Biao Liu
Hi Jingsong,

Thank you for bringing up this discussion. Since I don't have much experience
with Flink Table/SQL, I'll ask some questions from the runtime or engine
perspective.

> ... where we describe how to partition support in flink and how to
> integrate to hive partition.

FLIP-27 [1] officially introduces the "partition" concept. The changes in
FLIP-27 are not only about the source interface but also about the whole
infrastructure.
Have you thought about how to integrate your proposal with these changes? Or
do you just want to support "partition" in the table layer, with no
requirements on the underlying infrastructure?

I have seen a discussion [2] that seems to be an infrastructure requirement
for supporting your proposal. So I am concerned that there might be some
conflicts between this proposal and FLIP-27.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html

Thanks,
Biao /'bɪ.aʊ/



On Fri, 6 Sep 2019 at 13:22, JingsongLee <[hidden email]> wrote:

> Hi everyone, thank you for your comments. The mail subject has been updated
> and streaming-related concepts have been added.
>
> We would like to start a discussion thread on "FLIP-63: Rework table
> partition support" (design doc: [1]), where we describe how to support
> partitions in Flink and how to integrate with Hive partitions.
>
> This FLIP addresses:
>    - Introduce the whole story of partition support.
>    - Introduce and discuss the DDL for partition support.
>    - Introduce static and dynamic partition insert.
>    - Introduce partition pruning.
>    - Introduce the dynamic partition implementation.
>    - Introduce FileFormatSink to deal with streaming exactly-once and
>      partition-related logic.
>
> Details can be found in the design document.
> Looking forward to your feedback. Thank you.
>
> [1] https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>
> Best,
> Jingsong Lee

Re: [DISCUSS] FLIP-63: Rework table partition support

JingsongLee-2
Hi Biao, thanks for your feedback.

Actually, the runtime source partition is similar to a split: it concerns data reading, parallelism, and fault tolerance, which are all runtime concepts.
A table partition, by contrast, is only a virtual concept. Users care about which partition to read and which partition to write, and they can manage their partitions.
One relates to the physical implementation; the other relates to a logical concept.
So I think they are two completely different things.
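
A minimal sketch of this distinction, under stated assumptions: the names
(make_splits, assign), the file sizes, and the round-robin assignment are
hypothetical and are not taken from Flink's runtime or from FLIP-27. The user
picks a table partition by its logical name; only afterwards is the chosen
data cut into splits and spread over parallel readers.

```python
def make_splits(files, split_size):
    """Cut each (path, length) file into (path, start, length) byte ranges.

    A split is a runtime read unit; it carries no table-level meaning.
    """
    splits = []
    for path, length in files:
        for start in range(0, length, split_size):
            splits.append((path, start, min(split_size, length - start)))
    return splits


def assign(splits, parallelism):
    """Distribute splits round-robin over parallel reader subtasks."""
    tasks = [[] for _ in range(parallelism)]
    for index, split in enumerate(splits):
        tasks[index % parallelism].append(split)
    return tasks


# Logical level: partitions are what users name, select, and manage.
partitions = {
    "dt=2019-09-10": [("part-0", 250), ("part-1", 100)],
    "dt=2019-09-11": [("part-0", 80)],
}
chosen = partitions["dt=2019-09-10"]

# Physical level: the runtime decides how the chosen data is read in parallel.
splits = make_splits(chosen, split_size=100)
tasks = assign(splits, parallelism=2)
print(len(splits), [len(t) for t in tasks])
```

Changing the parallelism or the split size changes how the data is read, but
not which partition the user asked for.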

About [2]: the main problem there is how to write data to a catalog file system in streaming mode. It is a general problem and has little to do with partitions.

[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html

Best,
Jingsong Lee


------------------------------------------------------------------
From: Biao Liu <[hidden email]>
Send Time: Tuesday, 10 September 2019, 14:57
To: dev <[hidden email]>; JingsongLee <[hidden email]>
Subject: Re: [DISCUSS] FLIP-63: Rework table partition support

Hi Jingsong,

Thank you for bringing up this discussion. Since I don't have much experience with Flink Table/SQL, I'll ask some questions from the runtime or engine perspective.

> ... where we describe how to partition support in flink and how to integrate to hive partition.

FLIP-27 [1] officially introduces the "partition" concept. The changes in FLIP-27 are not only about the source interface but also about the whole infrastructure.
Have you thought about how to integrate your proposal with these changes? Or do you just want to support "partition" in the table layer, with no requirements on the underlying infrastructure?

I have seen a discussion [2] that seems to be an infrastructure requirement for supporting your proposal. So I am concerned that there might be some conflicts between this proposal and FLIP-27.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html

Thanks,
Biao /'bɪ.aʊ/




Re: [DISCUSS] FLIP-63: Rework table partition support

Biao Liu
Hi Jingsong,

Thanks for explaining. It looks cool!

Thanks,
Biao /'bɪ.aʊ/



On Wed, 11 Sep 2019 at 11:37, JingsongLee <[hidden email]> wrote:

> Hi Biao, thanks for your feedback.
>
> Actually, the runtime source partition is similar to a split: it concerns
> data reading, parallelism, and fault tolerance, which are all runtime
> concepts.
> A table partition, by contrast, is only a virtual concept. Users care about
> which partition to read and which partition to write, and they can manage
> their partitions.
> One relates to the physical implementation; the other relates to a logical
> concept.
> So I think they are two completely different things.
>
> About [2]: the main problem there is how to write data to a catalog file
> system in streaming mode. It is a general problem and has little to do with
> partitions.
>
> [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html
>
> Best,
> Jingsong Lee

Re: [DISCUSS] FLIP-63: Rework table partition support

Kurt Young
+1 to this feature. I left some comments on the Google doc.

Another comment: I think we should reorganize the content when you convert
this to a cwiki page. I will have an offline discussion with you.

Since this feature seems to be a fairly big effort, I suggest we settle
the design doc ASAP and start the vote process.

Best,
Kurt


On Thu, Sep 12, 2019 at 12:43 PM Biao Liu <[hidden email]> wrote:

> Hi Jingsong,
>
> Thanks for explaining. It looks cool!
>
> Thanks,
> Biao /'bɪ.aʊ/

Re: [DISCUSS] FLIP-63: Rework table partition support

JingsongLee-2
Thanks for your reply and Google doc comments. It has been discussed
for two weeks now. I will start a vote thread.

Best,
Jingsong Lee


------------------------------------------------------------------
From: Kurt Young <[hidden email]>
Send Time: Monday, 16 September 2019, 15:55
To: dev <[hidden email]>
Cc: JingsongLee <[hidden email]>
Subject: Re: [DISCUSS] FLIP-63: Rework table partition support

+1 to this feature. I left some comments on the Google doc.

Another comment: I think we should reorganize the content when you convert
this to a cwiki page. I will have an offline discussion with you.

Since this feature seems to be a fairly big effort, I suggest we settle
the design doc ASAP and start the vote process.

Best,
Kurt



Re: [DISCUSS] FLIP-63: Rework table partition support

JingsongLee-2
Thanks for your discussion on the Google document.
I have addressed the comments, added a FileSystem connector chapter, and introduced a code prototype of the file system connector that unifies the Flink file system and Hive connectors.

Looking forward to your feedback. Thank you.

Best,
Jingsong Lee


------------------------------------------------------------------
From: JingsongLee <[hidden email]>
Send Time: Wednesday, 18 September 2019, 09:45
To: Kurt Young <[hidden email]>; dev <[hidden email]>
Subject: Re: [DISCUSS] FLIP-63: Rework table partition support

Thanks for your reply and Google doc comments. It has been discussed
for two weeks now. I will start a vote thread.

Best,
Jingsong Lee



Re: [DISCUSS] FLIP-63: Rework table partition support

bowen.li
Hi Jingsong,

Thanks for driving this effort!

Besides a few further comments on Catalog APIs that I just left, it LGTM.

Not sure why, but in Gmail the voting thread shows up in the same thread as
the discussion. After addressing all the comments, could you start a new,
separate thread so that other people are aware of it?

Thanks,
Bowen

On Mon, Sep 23, 2019 at 1:25 AM JingsongLee <[hidden email]> wrote:

> Thanks for your discussion on the Google document.
> I have addressed the comments, added a FileSystem connector chapter, and
> introduced a code prototype of the file system connector that unifies the
> Flink file system and Hive connectors.
>
> Looking forward to your feedback. Thank you.
>
> Best,
> Jingsong Lee

Re: [DISCUSS] FLIP-63: Rework table partition support

JingsongLee-2
Thanks for your review. I will send another vote thread from my Apache email.

Best,
Jingsong Lee


------------------------------------------------------------------
From: Bowen Li <[hidden email]>
Send Time: Tuesday, 24 September 2019, 03:06
To: JingsongLee <[hidden email]>
Cc: dev <[hidden email]>
Subject: Re: [DISCUSS] FLIP-63: Rework table partition support

Hi Jingsong,

Thanks for driving this effort!

Besides a few further comments on Catalog APIs that I just left, it LGTM.

Not sure why, but in Gmail the voting thread shows up in the same thread as
the discussion. After addressing all the comments, could you start a new,
separate thread so that other people are aware of it?

Thanks,
Bowen

On Mon, Sep 23, 2019 at 1:25 AM JingsongLee <[hidden email]>
wrote:

>  Thanks for your discussion on google document.
> Comments addressed and added FileSystem connector chapter, and introduce
> code prototype for file system connector to unify flink file system and
> hive connectors.
>
> Looking forward to your feedbacks. Thank you.
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:JingsongLee <[hidden email]>
> Send Time:2019年9月18日(星期三) 09:45
> To:Kurt Young <[hidden email]>; dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-63: Rework table partition support
>
> Thanks for your reply and google doc comments. It has been discussed
>  for two weeks now. I will start a vote thread.
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Kurt Young <[hidden email]>
> Send Time:2019年9月16日(星期一) 15:55
> To:dev <[hidden email]>
> Cc:JingsongLee <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-63: Rework table partition support
>
> +1 to this feature, I left some comments on google doc.
>
> Another comment is I think we should do some reorganize about the content
> when you converting this to a cwiki page. I will have some offline
> discussion
> with you.
>
> Since this feature seems to be a fairly big efforts, so I suggest we can
> settle
> down the design doc ASAP and start vote process.
> Best,
> Kurt
>
>
> On Thu, Sep 12, 2019 at 12:43 PM Biao Liu <[hidden email]> wrote:
> Hi Jingsong,
>
>  Thanks for explaining. It looks cool!
>
>  Thanks,
>  Biao /'bɪ.aʊ/
>
>
>
>  On Wed, 11 Sep 2019 at 11:37, JingsongLee <[hidden email]
> .invalid>
>  wrote:
>
>  > Hi Biao, thanks for your feedback:
>  >
>  > Actually, the runtime source partition is similar to a split: it
>  > concerns data reading, parallelism and fault tolerance, which are all
>  > runtime concepts.
>  > A table partition, by contrast, is only a virtual concept. Users are
>  > more concerned with choosing which partition to read and which partition
>  > to write, and they can manage their partitions.
>  > One relates to the physical implementation, the other to a logical
>  > concept.
>  > So I think they are two completely different things.
>  >
>  > About [2], the main problem is how to write data to a catalog file
>  > system in stream mode. It is a general problem and has little to do with
>  > partitions.
>  >
>  > [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html
>  >
>  > Best,
>  > Jingsong Lee
>  >
>  >
>  > ------------------------------------------------------------------
>  > From:Biao Liu <[hidden email]>
>  > Send Time:Sep 10, 2019 (Tuesday) 14:57
>  > To:dev <[hidden email]>; JingsongLee <[hidden email]>
>  > Subject:Re: [DISCUSS] FLIP-63: Rework table partition support
>  >
>  > Hi Jingsong,
>  >
>  > Thank you for bringing up this discussion. Since I don't have much
>  > experience with Flink table/SQL, I'll ask some questions from the
>  > runtime or engine perspective.
>  >
>  > > ... where we describe how to partition support in flink and how to
>  > > integrate to hive partition.
>  >
>  > FLIP-27 [1] officially introduces the "partition" concept. The changes in
>  > FLIP-27 are not only about the source interface but also about the whole
>  > infrastructure.
>  > Have you thought about how to integrate your proposal with these changes?
>  > Or do you just want to support "partition" in the table layer, with no
>  > requirements on the underlying infrastructure?
>  >
>  > I have seen a discussion [2] that seems to be an infrastructure
>  > requirement for supporting your proposal. So I am concerned that there
>  > might be some conflicts between this proposal and FLIP-27.
>  >
>  > 1. https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
>  > 2. http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html
>  >
>  > Thanks,
>  > Biao /'bɪ.aʊ/
>  >
>  >
>  >
>  > On Fri, 6 Sep 2019 at 13:22, JingsongLee <[hidden email].invalid> wrote:
>  > Hi everyone, thank you for your comments. The mail subject was updated
>  >  and streaming-related concepts were added.
>  >
>  >  We would like to start a discussion thread on "FLIP-63: Rework table
>  >  partition support" (Design doc: [1]), where we describe how to support
>  >  partitions in Flink and how to integrate with Hive partitions.
>  >
>  >  This FLIP will:
>  >     - Introduce the whole story of partition support.
>  >     - Introduce and discuss the DDL for partition support.
>  >     - Introduce static and dynamic partition insert.
>  >     - Introduce partition pruning.
>  >     - Introduce the dynamic partition implementation.
>  >     - Introduce FileFormatSink to deal with streaming exactly-once and
>  >   partition-related logic.
>  >
>  >  Details can be found in the design document.
>  >  Looking forward to your feedback. Thank you.
>  >
>  >  [1] https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>  >
>  >  Best,
>  >  Jingsong Lee
>  >
>  >
>
>


Re: [DISCUSS] FLIP-63: Rework table partition support

JingsongLee-2
After an offline discussion with Jark, the current grammar for creating partitioned tables is limited to the Hive dialect.
The Flink built-in grammar for creating partitioned tables is left for further discussion and will be
decided by a vote after a period of time (it needs more thought).

Best,
Jingsong Lee


------------------------------------------------------------------
From:JingsongLee <[hidden email]>
Send Time:Sep 24, 2019 (Tuesday) 10:19
To:dev <[hidden email]>
Cc:dev <[hidden email]>
Subject:Re: [DISCUSS] FLIP-63: Rework table partition support

Thanks for your review. I will send another vote thread from my Apache email.

Best,
Jingsong Lee


------------------------------------------------------------------
From:Bowen Li <[hidden email]>
Send Time:Sep 24, 2019 (Tuesday) 03:06
To:JingsongLee <[hidden email]>
Cc:dev <[hidden email]>
Subject:Re: [DISCUSS] FLIP-63: Rework table partition support

Hi Jingsong,

Thanks for driving this effort!

Besides a few further comments on Catalog APIs that I just left, it LGTM.

Not sure why, but in Gmail the voting thread shows up in the same thread as
the discussion. After addressing all the comments, could you start a new,
separate thread so that other people are aware of it?

Thanks,
Bowen
