Dear community,
The "weekly" community update is back after a short summer break! This time I've tried to cover most of what happened during the last four weeks, but I might pick up some older topics in the next weeks' updates, too.

Activity on the dev@ mailing list has picked up quite a bit as feature development & design for the next releases of Apache Flink and Apache Flink Stateful Functions is going at full steam. In detail:

Flink Development
==============

* [releases] [Flink 1.12] The work on Flink 1.12 is well underway with feature freeze planned for end of October [1]. Our release managers Robert & Dian are periodically reminding the developer community of current blockers to reduce the time spent on release testing [2].

* [releases] [Stateful Functions 2.2] Igal has started a discussion on releasing Stateful Functions 2.2 soon (proposed feature freeze: September 10). The most notable feature is probably the option to embed a Stateful Functions module in a DataStream program via DataStream Ingress/Egress. Check out [3] for a full list of the planned features.

* [releases] [Flink 1.10] Flink 1.10.2 was released. [4]

* [apis] Besides the Stateful Functions API, Flink currently has three top-level APIs: DataStream (streaming), DataSet (batch) and Table API/SQL (unified). A major step towards the goal of a truly unified batch and stream processing engine is the unification of the DataStream/DataSet APIs. This is one of the main topics of the upcoming release(s), specifically:
    * Aljoscha has published FLIP-131 [5] proposing to deprecate and eventually drop the DataSet API. In order to still support the same breadth of use cases, we need to make sure that all its use cases are covered by the two remaining APIs: a unified DataStream API and the Table API. These changes are not part of FLIP-131 itself, but are covered in other FLIPs, which already exist (like FLIP-27 [6] or FLIP-129 [7]) or will be published over the next few weeks like FLIP-134 (see below).
[8]
    * Most importantly, FLIP-134 [9] discusses how the DataStream API could be used to efficiently execute batch workloads in the future. In essence the FLIP proposes to introduce a BATCH and a STREAMING execution mode for DataStream programs. The STREAMING mode corresponds to the current behavior, while the BATCH mode adjusts the behavior in various areas to fit the requirements of batch processing, e.g. pipelined scheduling with region failover, blocking shuffles, no checkpointing, no watermarks, ... [10]

* [apis] Timo proposes FLIP-136 to improve the interoperability between the DataStream and Table API. The FLIP covers the conversion between DataStream <-> Table (incl. changelog streams, watermarks, etc.) as well as additional support for working with the Row type in the DataStream API. [11]

* [datastream api] Dawid proposes to remove a set of deprecated methods from the DataStream API. [12]

* [runtime] Yuan Mei has started a discussion on FLIP-135 to introduce approximate task-local recovery. The FLIP introduces a new failover/recovery strategy for Flink jobs that trades consistency for availability. Specifically, with approximate task-local recovery the failure of some tasks would not trigger a restart of the rest of the job, but in return you can expect data loss or duplication. [13]

* [python] Xingbo Huang proposes to extend the support of Pandas/vectorized functions from scalar functions to aggregate functions. For more details on Pandas support in PyFlink see the blog post linked below. [14]

* [connectors] Aljoscha has started a discussion on dropping support for Kafka 0.10/0.11 in Flink 1.12+. [15]

* [connectors] Robert has revived the discussion on adding support for HBase 2.3.x. There is a consensus to add the HBase 2.x connector to Apache Flink, but no consensus yet on whether to move the existing HBase 1.x connector from the Flink project to Apache Bahir, too.
[16]

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Planning-Flink-1-12-tp43348.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Release-1-12-Stale-blockers-and-build-instabilities-tp43477.html
[3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Next-Stateful-Functions-Release-tp44063.html
[4] https://flink.apache.org/news/2020/08/25/release-1.10.2.html
[5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
[6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface?src=contextnavpagetreemode
[7] https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API
[8] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-131-Consolidate-the-user-facing-Dataflow-SDKs-APIs-and-deprecate-the-DataSet-API-tp43521.html
[9] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158871522
[10] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-134-DataStream-Semantics-for-Bounded-Input-tp43839p43965.html
[11] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-136-Improve-interoperability-between-DataStream-and-Table-API-tp43993.html
[12] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Removing-deprecated-methods-from-DataStream-API-tp43938.html
[13] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-135-Approximate-Task-Local-Recovery-tp43930.html
[14] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-137-Support-Pandas-UDAF-in-PyFlink-tp44060.html
[15] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-Kafka-0-10-x-connector-and-possibly-0-11-x-tp44087.html
[16]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Upgrade-HBase-connector-to-2-2-x-tp42657.html

flink-packages.org
==============

Jark has recently published a set of Flink connectors (DataStream & Table API/SQL) that allow ingesting the changelog of MySQL and Postgres without additional tools like Kafka or Debezium. [17]

[17] https://flink-packages.org/packages/cdc-connectors

Notable Bugs
==========

To be honest, I did not search through every bug ticket created over the last four weeks, only those from the last seven days, and I did not find anything particularly notable. So, I'll leave you without any bug reports this time.

Events, Blog Posts, Misc
===================

* David Anderson is now an Apache Flink committer. Congrats! [18]

* There have been a couple of blog posts on the Flink blog recently that highlight some of the features added in the latest release:
    * PyFlink: The Integration of Pandas into PyFlink [19]
    * Accelerating your workload with GPU and other external resources [20]
    * Monitoring and Controlling Networks of IoT Devices with Flink Stateful Functions [21]
    * The State of Flink on Docker [22]

* The schedule for Flink Forward Global is live [23]. The event is free and you can already register at [24].

[18] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Flink-Committer-David-Anderson-tp43814p43847.html
[19] https://flink.apache.org/2020/08/04/pyflink-pandas-udf-support-flink.html
[20] https://flink.apache.org/news/2020/08/06/external-resource.html
[21] https://flink.apache.org/2020/08/19/statefun.html
[22] https://flink.apache.org/news/2020/08/20/flink-docker.html
[23] https://www.flink-forward.org/global-2020/conference-program
[24] https://www.eventbrite.com/e/flink-forward-global-virtual-2020-tickets-113775477516#tickets

Cheers,

Konstantin

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk

Thanks a lot for doing these updates!
On Tue, Aug 25, 2020 at 10:23 PM Konstantin Knauf <[hidden email]> wrote: