Hi folks,
We would like to start the discussion thread for FLIP-36, which adds support for interactive programming to the Flink Table API.

https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink

There has been an extended discussion [1] on the mailing list. To recap quickly, we propose adding the capability to cache intermediate results in user applications for later use.

Feedback and comments are welcome!

Thanks,

Jiangjie (Becket) Qin

[1] http://mail-archives.apache.org/mod_mbox/flink-dev/201811.mbox/%3CCABtAgwERNR8otaMdT4f-mFZR5s956K530+NXt2s7iEH4i4gd7g@...%3E
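To make the recap above concrete, here is a minimal sketch of what "caching an intermediate result for later use" could look like in an interactive session. It is not the Flink Table API: `LazyTable`, `cache()`, `collect()`, `map()` and the `evaluations` counter are hypothetical names invented for this illustration, and the data is a plain Python list.

```python
from typing import Callable, List, Optional


class LazyTable:
    """Hypothetical stand-in for a lazily evaluated table (not the Flink API)."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute
        self._cache_enabled = False
        self._cached: Optional[List[int]] = None
        self.evaluations = 0  # how many times this table's plan actually ran

    def cache(self) -> "LazyTable":
        # Only marks the result for reuse; nothing runs until first use.
        self._cache_enabled = True
        return self

    def collect(self) -> List[int]:
        # Serve from the cache if a previous run already stored the rows.
        if self._cached is not None:
            return list(self._cached)
        self.evaluations += 1
        rows = self._compute()
        if self._cache_enabled:
            self._cached = list(rows)
        return rows

    def map(self, fn: Callable[[int], int]) -> "LazyTable":
        # Derive a new lazy table whose plan reads from this one.
        return LazyTable(lambda: [fn(r) for r in self.collect()])


source = LazyTable(lambda: [1, 2, 3])
cleaned = source.map(lambda x: x * 10).cache()

# Two separate "jobs" reuse the same intermediate result.
doubled = cleaned.map(lambda x: x * 2).collect()
plus_one = cleaned.map(lambda x: x + 1).collect()
```

In this sketch the second job reads `cleaned` from the cache, so the upstream plan runs only once even though two downstream results are computed.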
Hi Becket,
Thank you for driving the effort and writing down the detailed proposal. To me this FLIP looks good, and it has a +1 from me.

Piotr Nowojski

> On 12 Mar 2019, at 13:21, Becket Qin <[hidden email]> wrote:
Thanks Piotr, for the +1 and all the patient discussion :)
On Wed, Mar 13, 2019 at 3:53 PM Piotr Nowojski <[hidden email]> wrote:
Hi folks,
Just want to revive this discussion thread. A few of us had some offline discussions around the implementation details of this FLIP.

Here I briefly summarize the offline discussion:

--
Some concerns were raised about the default implementation of the cache service.
1. The default cache service introduces a separate service in the Flink runtime, which seems complicated, especially when things like colocation are needed.
2. Using a Flink job to run the default cache service may expose unnecessary implementation details to users (e.g. it may take some slots and resources).
3. Sharing the persistent shuffle in the network stack may need additional work in the runtime.

To address the above concerns, we would like to make some changes to the current FLIP proposal.

In general we agreed that our primary goal is to unify the storage tier of the default shuffle service and the default intermediate result storage.

Stephan gave some valuable suggestions on how to improve the current FLIP design and align it with the efforts of FLIP-31. Some highlights are:
1. Unify the storage tier of the default shuffle service and the default intermediate result storage in the network stack.
2. We need both internal (default) and external services for Shuffle and Intermediate Result. The internal (default) implementation is for the out-of-the-box user experience. The external service is for more sophisticated use cases.
3. Have two interfaces, *ShuffleService* and *IntermediateResultStorage* (for explicit cache handling). The internal default network-stack-based solution implements both interfaces.
--

As a result of these discussions, we would like to add a few more things to the current FLIP-36. More specifically:
1. A pluggable IntermediateResultStorage interface (for explicit cache handling).
2. A mechanism to enable referencing intermediate results (persisted shuffle and explicit cache) across jobs.
3. A stack to manage intermediate result metadata (persisted shuffle and explicit cache) in the runtime.
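As a rough illustration of the two-interface split summarized above, the sketch below shows one default implementation backing both interfaces with a single storage tier, and a second "job" referencing a persisted result by id. Only the names ShuffleService and IntermediateResultStorage come from this thread. Every method name, the `DefaultNetworkStackStorage` class, and the in-memory dictionaries are assumptions made for this example and are not the interfaces proposed in the FLIP.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ShuffleService(ABC):
    """Hypothetical interface: exchanges partitions between producers and consumers."""

    @abstractmethod
    def write_partition(self, partition_id: str, records: List[str]) -> None: ...

    @abstractmethod
    def read_partition(self, partition_id: str) -> List[str]: ...


class IntermediateResultStorage(ABC):
    """Hypothetical interface: explicit cache handling for results that outlive a job."""

    @abstractmethod
    def persist(self, result_id: str, partition_ids: List[str]) -> None: ...

    @abstractmethod
    def lookup(self, result_id: str) -> List[str]: ...

    @abstractmethod
    def release(self, result_id: str) -> None: ...


class DefaultNetworkStackStorage(ShuffleService, IntermediateResultStorage):
    """Assumed default: one in-memory storage tier implements both interfaces."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[str]] = {}
        self._results: Dict[str, List[str]] = {}  # result id -> partition ids

    def write_partition(self, partition_id: str, records: List[str]) -> None:
        self._partitions[partition_id] = list(records)

    def read_partition(self, partition_id: str) -> List[str]:
        return list(self._partitions[partition_id])

    def persist(self, result_id: str, partition_ids: List[str]) -> None:
        self._results[result_id] = list(partition_ids)

    def lookup(self, result_id: str) -> List[str]:
        return list(self._results[result_id])

    def release(self, result_id: str) -> None:
        # Drop the metadata entry and the partitions it pointed to.
        for pid in self._results.pop(result_id, []):
            self._partitions.pop(pid, None)


storage = DefaultNetworkStackStorage()

# "Job 1" produces two partitions and registers them as a cached result.
storage.write_partition("p0", ["a", "b"])
storage.write_partition("p1", ["c"])
storage.persist("cached-table-1", ["p0", "p1"])

# "Job 2" references the result by id instead of recomputing it.
rows = [r for pid in storage.lookup("cached-table-1")
        for r in storage.read_partition(pid)]

# Explicit cleanup once the cached result is no longer needed.
storage.release("cached-table-1")
```

An external service could be plugged in by providing another class that implements `IntermediateResultStorage`, without touching the code that calls `persist`, `lookup` and `release`.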

The detailed design is explained in the following doc. The doc is mostly about the implementation of the default intermediate result storage. API-wise, it is an addition to the existing Table API change proposed in the FLIP.

https://docs.google.com/document/d/17twjcQn70rJnVCXcr74AL44HY3jLeT1leC9rAFsluFg/edit#

I'll update the FLIP-36 wiki to reflect the new proposal. In the meantime, we can use the Google Doc for discussion while I am updating the FLIP wiki.

Thanks,

Jiangjie (Becket) Qin

On Thu, Mar 14, 2019 at 9:28 PM Becket Qin <[hidden email]> wrote:
Hi Flink devs,
We have gone through some more discussion of the design proposed in the last email and made some further modifications to the design of the default intermediate result storage. I have just updated the FLIP-36 wiki page to reflect the latest design.

https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink#FLIP-36:SupportInteractiveProgramminginFlink-ImplementationDetails

To summarize briefly, the default intermediate result storage relies on the network stack to store the intermediate results and maintains all the intermediate result metadata on the client side. We avoided introducing additional services in the runtime and instead tried to integrate the design with existing components as much as possible.

Looking forward to your feedback.

Thanks,

Jiangjie (Becket) Qin

On Wed, Apr 10, 2019 at 9:36 PM Becket Qin <[hidden email]> wrote:
The FLIP looks good and is quite detailed, thanks!
I think we should proceed to a vote on whether to accept this FLIP.

If the feature and design are accepted, the next step would be to have an implementation breakdown.

Best,
Stephan

On Mon, May 6, 2019 at 4:18 AM Becket Qin <[hidden email]> wrote: