[DISCUSS] Features for Apache Flink 1.10

[DISCUSS] Features for Apache Flink 1.10

Gary Yao-4
Hi community,

Since Apache Flink 1.9.0 was released more than two weeks ago, I would like to
kick off the discussion about what we want to achieve for the 1.10 release.

Based on discussions with various people as well as observations from mailing
list threads, Yu Li and I have compiled a list of features that we deem
important to include in the next release. Note that the features presented
here are not meant to be exhaustive. As always, I am sure that there will be
other contributions that make it into the next release. This email thread is
merely meant to kick off the discussion and to give users and contributors an
understanding of where the focus of the next release lies. If we have missed
anything that somebody is working on, please reply to this thread.


** Proposed features and focus

Following the contribution of Blink to Apache Flink, the community released a
preview of the Blink SQL Query Processor, which offers better SQL coverage and
improved performance for batch queries, in Flink 1.9.0. However, the
integration of the Blink query processor is not fully complete yet, as there
are still pending tasks such as implementing full TPC-DS support. With the
next Flink release, we aim to finish the Blink integration.
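
For readers who want to try the current state of the integration before 1.10,
below is a minimal sketch of how the Blink planner preview can be selected
through the 1.9 Table API. It assumes the flink-table-planner-blink dependency
is on the classpath; treat it as an illustration, not a recommendation for
production use.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class BlinkPlannerPreview {
        public static void main(String[] args) {
            // Explicitly opt in to the Blink query processor; in 1.9 the
            // legacy planner is still the default.
            EnvironmentSettings settings = EnvironmentSettings.newInstance()
                    .useBlinkPlanner()
                    .inBatchMode()
                    .build();
            TableEnvironment tableEnv = TableEnvironment.create(settings);

            // Any Table API / SQL program now runs on the Blink planner,
            // e.g. a trivial inline query:
            tableEnv.sqlQuery("SELECT 1 AS one").printSchema();
        }
    }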

Furthermore, there are several ongoing work threads addressing long-standing
issues reported by users, such as improving checkpointing under backpressure
and limiting RocksDB's native memory usage, which can be especially
problematic in containerized Flink deployments.

Notable features surrounding Flink’s ecosystem that are planned for the next
release include active Kubernetes support (i.e., enabling Flink’s
ResourceManager to launch new pods), improved Hive integration, Java 11
support, and new algorithms for the Flink ML library.

Below is the list of features that we compiled, ordered by priority. Some of
them already have ongoing mailing list threads, JIRAs, or FLIPs.

- Improving Flink’s build system & CI [1] [2]
- Support Java 11 [3]
- Table API improvements
    - Configuration Evolution [4] [5]
    - Finish type system: Expression Re-design [6] and UDF refactor
    - Streaming DDL: Time attribute (watermark) and Changelog support
    - Full SQL partition support for both batch & streaming [7]
    - New Java Expression DSL [8] (see the sketch after this list)
    - SQL CLI with DDL and DML support
- Hive compatibility completion (DDL/UDF) to support full Hive integration
    - Partition/Function/View support
- Remaining Blink planner/runtime merge
    - Support all TPC-DS queries [9]
- Finer grained resource management
    - Unified TaskExecutor Memory Configuration [10]
    - Fine Grained Operator Resource Management [11]
    - Dynamic Slots Allocation [12]
- Finish scheduler re-architecture [13]
    - Allows implementing more sophisticated scheduling strategies such as
a better batch scheduler or speculative execution.
- New DataStream Source Interface [14]
    - A new source connector architecture to unify the implementation of
source connectors and make it simpler to implement custom source connectors.
- Add more source/system metrics
    - For better Flink job monitoring and to facilitate customized solutions
such as auto-scaling.
- Executor Interface / Client API [15]
    - Allow Flink downstream projects to more easily monitor and control
Flink jobs.
- Interactive Programming [16] (see the sketch after this list)
    - Allow users to cache intermediate results in the Table API for later
usage, avoiding redundant computation when a Flink application contains
multiple jobs.
- Python User Defined Function [17]
    - Support native user-defined functions in Flink Python, including
UDF/UDAF/UDTF in the Table API and mixed Python-Java UDFs.
- Spillable heap backend [18]
    - A new state backend supporting automatic data spill and load when
memory is exhausted or regained.
- RocksDB backend memory control [19]
    - Prevent excessive memory usage by RocksDB, especially in containerized
environments.
- Unaligned checkpoints [20]
    - Resolve the checkpoint timeout issue under backpressure.
- Separate framework and user class loader in per-job mode
- Active Kubernetes Integration [21]
    - Allow the ResourceManager to talk to Kubernetes to launch new pods,
similar to Flink's YARN/Mesos integration.
- ML pipeline/library
    - Aims at delivering several core algorithms, including Logistic
Regression, Naive Bayes, Random Forest, KMeans, etc.
- Add vertex subtask log URL in the WebUI [22]
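
To make two of the Table API items above a bit more concrete, here are two
short sketches. First, the proposed Java Expression DSL [8]: today,
expressions are embedded in strings, while FLIP-55 proposes type-checked Java
expressions. The $ and lit helpers below follow the FLIP draft and are
assumptions; the final names may change before the release.

    import org.apache.flink.table.api.Table;
    // Static helpers proposed in FLIP-55 (names taken from the draft, not final).
    import static org.apache.flink.table.api.Expressions.$;
    import static org.apache.flink.table.api.Expressions.lit;

    public class ExpressionDslSketch {
        // 1.9 style, string-based:
        //     orders.filter("amount > 100").select("id, amount");
        // Proposed DSL style, the same query as plain Java calls:
        static Table largeOrders(Table orders) {
            return orders
                    .filter($("amount").isGreater(lit(100)))
                    .select($("id"), $("amount"));
        }
    }

Second, Interactive Programming [16]: FLIP-36 proposes a cache() call on Table
so that later jobs of the same application reuse an intermediate result
instead of recomputing it. The cache() method below is the proposed API and
does not exist in 1.9; this is only a sketch of the intended usage.

    import org.apache.flink.table.api.Table;

    public class InteractiveProgrammingSketch {
        static void reuse(Table expensiveResult) {
            // Proposed in FLIP-36 (hypothetical here): pin the intermediate result.
            Table cached = expensiveResult.cache();

            // Both downstream jobs would read the cached result rather than
            // re-running the upstream pipeline.
            cached.groupBy("region").select("region, amount.sum");
            cached.filter("amount > 1000").select("id");
        }
    }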


** Suggested release timeline

Based on our usual time-based release schedule [23], and considering that
several events, such as Flink Forward Europe and Asia, are overlapping with
the current release cycle, we should aim at releasing 1.10 around the
beginning of January 2020. To give the community enough testing time, I
propose the feature freeze to be at the end of November. We should announce
an
exact date later in the release cycle.

Lastly, I would like to use this opportunity to propose Yu Li and myself as
release managers for the upcoming release.

What do you think?


Best,
Gary

[1] https://lists.apache.org/thread.html/775447a187410727f5ba6f9cefd6406c58ca5cc5c580aecf30cf213e@%3Cdev.flink.apache.org%3E
[2] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
[3] https://issues.apache.org/jira/browse/FLINK-10725
[4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
[5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
[6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
[7] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
[8] https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
[9] https://issues.apache.org/jira/browse/FLINK-11491
[10] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[11] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
[12] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
[13] https://issues.apache.org/jira/browse/FLINK-10429
[14] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[15] https://lists.apache.org/thread.html/498dd3e0277681cda356029582c1490299ae01df912e15942e11ae8e@%3Cdev.flink.apache.org%3E
[16] https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
[17] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
[18] https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend
[19] https://issues.apache.org/jira/browse/FLINK-7289
[20] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Checkpointing-under-backpressure-td31616.html
[21] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Best-practice-to-run-flink-on-kubernetes-td31532.html
[22] https://issues.apache.org/jira/browse/FLINK-13894
[23] https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases

Re: [DISCUSS] Features for Apache Flink 1.10

Kostas Kloudas-4
Hi Gary,

Thanks for kicking off the feature discussion.

+1 for Gary and Yu as release managers.

Cheers,
Kostas

Re: [DISCUSS] Features for Apache Flink 1.10

Zhijiang(wangzhijiang999)
Hi Gary,

Thanks for kicking off the feature discussion for the next release, 1.10. I am very supportive of you and Yu Li being the release managers.

I would like to mention two more improvements that we want to cover in Flink 1.10; I have already discussed them with Piotr and we reached an agreement.

1. Serialize and copy data only once for broadcast partitions [1]: This would greatly improve throughput in broadcast mode and was actually proposed for Flink 1.8. Most of the work was already done before, and only the last critical JIRA/PR is left, so it will not take much effort to make it ready.

2. Let Netty use Flink's buffers directly in credit-based mode [2]: This could avoid a memory copy from the Netty stack to Flink's managed network buffers. The obvious benefit is a much lower direct memory overhead in large-scale jobs. I have also heard of user cases that encountered direct-memory OOMs caused by Netty's memory overhead. This improvement was actually proposed by Nico for Flink 1.7, but there was never time to focus on it. Yun Gao submitted a PR half a year ago, but it has not been reviewed yet. I could help review the design and the PR code to make it ready.

You could assign these two items the lowest priority if possible.

[1] https://issues.apache.org/jira/browse/FLINK-10745
[2] https://issues.apache.org/jira/browse/FLINK-10742

Best,
Zhijiang

Re: [DISCUSS] Features for Apache Flink 1.10

Jark Wu-2
Thanks Gary for kicking off the discussion for 1.10 release.

+1 for Gary and Yu as release managers. Thank you for your effort.

Best,
Jark


Re: [DISCUSS] Features for Apache Flink 1.10

Dian Fu-2
Hi Gary,

Thanks for kicking off the release schedule for 1.10. +1 for you and Yu Li as the release managers.

The feature freeze/release time sounds reasonable.

Thanks,
Dian

Re: [DISCUSS] Features for Apache Flink 1.10

Zhu Zhu
Thanks Gary for kicking off this discussion.
I really appreciate that you and Yu are offering to help manage the 1.10 release.

+1 for Gary and Yu as release managers.

Thanks,
Zhu Zhu

Re: [DISCUSS] Features for Apache Flink 1.10

Till Rohrmann
Thanks for compiling the list of 1.10 efforts for the community, Gary. I
think this helps a lot to better understand what the community is currently
working on.

Thanks for volunteering as the release managers for the next major
release. +1 for Gary and Yu being the RMs for Flink 1.10.

Cheers,
Till


Re: [DISCUSS] Features for Apache Flink 1.10

Xintong Song
Thanks Gary and Yu for compiling the feature list and kicking off this
discussion.

+1 for Gary and Yu being the release managers for Flink 1.10.

Thank you~

Xintong Song




Re: [DISCUSS] Features for Apache Flink 1.10

Xuefu Z
Looking at the feature list, I don't see an item for completing the data type
support. Specifically, high-precision timestamp support is needed for Hive
integration, as it is so common. Missing it would hurt the completeness of
our Hive effort.
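
For illustration only (this sketch is not part of the original message): with
the new Table API type system introduced in Flink 1.9, a nanosecond-precision
timestamp column of the kind Hive tables commonly carry could be declared
roughly as follows. The class and field names are made up for the example; it
only assumes the DataTypes factory and TableSchema builder from
flink-table-api-java.

    // Illustrative sketch only -- assumes flink-table-api-java on the classpath.
    import org.apache.flink.table.api.DataTypes;
    import org.apache.flink.table.api.TableSchema;

    public class HighPrecisionTimestampSketch {
        public static void main(String[] args) {
            // A schema with a nanosecond-precision timestamp column, the kind of
            // type that full Hive integration needs to handle without losing precision.
            TableSchema schema = TableSchema.builder()
                    .field("id", DataTypes.BIGINT())
                    .field("event_time", DataTypes.TIMESTAMP(9)) // precision 9 = nanoseconds
                    .build();
            System.out.println(schema);
        }
    }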

Thanks,
Xuefu



--
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] Features for Apache Flink 1.10

Yu Li
Hi Xuefu,

If I understand it correctly, the data type support work should be included
in the "Table API improvements -> Finish type system" part; please check it
and let us know if anything is missing there. Thanks.

Best Regards,
Yu



Re: [DISCUSS] Features for Apache Flink 1.10

Aljoscha Krettek-2
Hi,

Thanks for putting together the list! And I’m +1 for the suggested
release timeline and also for Gary and Yu as the release managers.

Best,
Aljoscha

On 9 Sep 2019, at 7:39, Yu Li wrote:

> Hi Xuefu,
>
> If I understand it correctly, the data type support work should be
> included
> in the "Table API improvements->Finish type system" part, please check
> it
> and let us know if anything missing there. Thanks.
>
> Best Regards,
> Yu
>
>
> On Mon, 9 Sep 2019 at 11:14, Xuefu Z <[hidden email]> wrote:
>
>> Looking at feature list, I don't see an item for complete the data
>> type
>> support. Specifically, high precision timestamp is needed to Hive
>> integration, as it's so common. Missing it would damage the
>> completeness of
>> our Hive effort.
>>
>> Thanks,
>> Xuefu
>>
>> On Sat, Sep 7, 2019 at 7:06 PM Xintong Song <[hidden email]>
>> wrote:
>>
>>> Thanks Gray and Yu for compiling the feature list and kicking off
>>> this
>>> discussion.
>>>
>>> +1 for Gary and Yu being the release managers for Flink 1.10.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Sat, Sep 7, 2019 at 4:58 PM Till Rohrmann <[hidden email]>
>> wrote:
>>>
>>>> Thanks for compiling the list of 1.10 efforts for the community
>>>> Gary. I
>>>> think this helps a lot to better understand what the community is
>>> currently
>>>> working on.
>>>>
>>>> Thanks for volunteering as the release managers for the next major
>>>> release. +1 for Gary and Yu being the RMs for Flink 1.10.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Sat, Sep 7, 2019 at 7:26 AM Zhu Zhu <[hidden email]> wrote:
>>>>
>>>>> Thanks Gary for kicking off this discussion.
>>>>> Really appreciate that you and Yu offer to help to manage 1.10
>> release.
>>>>>
>>>>> +1 for Gary and Yu as release managers.
>>>>>
>>>>> Thanks,
>>>>> Zhu Zhu
>>>>>
>>>>> Dian Fu <[hidden email]> 于2019年9月7日周六
>>>>> 下午12:26写道:
>>>>>
>>>>>> Hi Gary,
>>>>>>
>>>>>> Thanks for kicking off the release schedule of 1.10. +1 for you
>>>>>> and
>>> Yu
>>>> Li
>>>>>> as the release manager.
>>>>>>
>>>>>> The feature freeze/release time sounds reasonable.
>>>>>>
>>>>>> Thanks,
>>>>>> Dian
>>>>>>
>>>>>>> 在 2019年9月7日,上午11:30,Jark Wu <[hidden email]>
>>>>>>> 写道:
>>>>>>>
>>>>>>> Thanks Gary for kicking off the discussion for 1.10 release.
>>>>>>>
>>>>>>> +1 for Gary and Yu as release managers. Thank you for you
>>>>>>> effort.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>>
>>>>>>>> 在 2019年9月7日,00:52,zhijiang
>>>>>>>> <[hidden email]>
>>> 写道:
>>>>>>>>
>>>>>>>> Hi Gary,
>>>>>>>>
>>>>>>>> Thanks for kicking off the features for next release 1.10.  I
>>>>>>>> am
>>>> very
>>>>>> supportive of you and Yu Li to be the relaese managers.
>>>>>>>>
>>>>>>>> Just mention another two improvements which want to be covered
>> in
>>>>>> FLINK-1.10 and I already confirmed with Piotr to reach an
>>>>>> agreement
>>>>> before.
>>>>>>>>
>>>>>>>> 1. Data serialize and copy only once for broadcast partition
>> [1]:
>>> It
>>>>>> would improve the throughput performance greatly in broadcast
>>>>>> mode
>>> and
>>>>> was
>>>>>> actually proposed in Flink-1.8. Most of works already done before
>> and
>>>>> only
>>>>>> left the last critical jira/PR. It will not take much efforts to
>> make
>>>> it
>>>>>> ready.
>>>>>>>>
>>>>>>>> 2. Let Netty use Flink's buffers directly in credit-based mode
>>> [2] :
>>>>> It
>>>>>> could avoid memory copy from netty stack to flink managed network
>>>> buffer.
>>>>>> The obvious benefit is decreasing the direct memory overhead
>> greatly
>>> in
>>>>>> large-scale jobs. I also heard of some user cases encounter
>>>>>> direct
>>> OOM
>>>>>> caused by netty memory overhead. Actually this improvment was
>>> proposed
>>>> by
>>>>>> nico in FLINK-1.7 and always no time to focus then. Yun Gao
>>>>>> already
>>>>>> submitted a PR half an year ago but have not been reviewed yet. I
>>> could
>>>>>> help review the deign and PR codes to make it ready.
>>>>>>>>
>>>>>>>> And you could make these two items as lowest priority if
>> possible.
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10745
>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10742
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Zhijiang

Re: [DISCUSS] Features for Apache Flink 1.10

Biao Liu
Thanks Gary for kicking off the discussion.

+1 for the feature freeze time. Also thanks Gary and Yu Li for volunteering
as the release managers.

BTW, I'm working on a refactoring of `CheckpointCoordinator` [1]. It would be
great if it could be included in 1.10.

1. https://issues.apache.org/jira/browse/FLINK-13698

Thanks,
Biao /'bɪ.aʊ/



On Wed, 11 Sep 2019 at 18:33, Aljoscha Krettek <[hidden email]> wrote:

> Hi,
>
> Thanks for putting together the list! And I’m +1 for the suggested
> release timeline and also for Gary and Yu as the release managers.
>
> Best,
> Aljoscha
>
> On 9 Sep 2019, at 7:39, Yu Li wrote:
>
> > Hi Xuefu,
> >
> > If I understand it correctly, the data type support work should be
> > included in the "Table API improvements -> Finish type system" part.
> > Please check it and let us know if anything is missing there. Thanks.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Mon, 9 Sep 2019 at 11:14, Xuefu Z <[hidden email]> wrote:
> >
> >> Looking at the feature list, I don't see an item for completing the data
> >> type support. Specifically, high-precision timestamp support is needed for
> >> the Hive integration, as it is so common. Missing it would damage the
> >> completeness of our Hive effort.
> >>
> >> Thanks,
> >> Xuefu
> >>
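
(As a side note on the precision point above: Hive's TIMESTAMP type carries
nanosecond precision, so any intermediate representation limited to
milliseconds is lossy. The following is just a small, self-contained JDK
example, not Flink or Hive API, showing what truncation to millisecond
precision discards.)

    import java.time.LocalDateTime;
    import java.time.temporal.ChronoUnit;

    // Toy illustration only -- plain JDK, not Flink or Hive code.
    public final class TimestampPrecisionSketch {
        public static void main(String[] args) {
            LocalDateTime nanos = LocalDateTime.of(2019, 9, 9, 12, 0, 0, 123_456_789);
            LocalDateTime millis = nanos.truncatedTo(ChronoUnit.MILLIS);

            System.out.println(nanos);                 // 2019-09-09T12:00:00.123456789
            System.out.println(millis);                // 2019-09-09T12:00:00.123
            System.out.println(nanos.equals(millis));  // false: sub-millisecond part is lost
        }
    }
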
> >> On Sat, Sep 7, 2019 at 7:06 PM Xintong Song <[hidden email]>
> >> wrote:
> >>
> >>> Thanks Gary and Yu for compiling the feature list and kicking off this
> >>> discussion.
> >>>
> >>> +1 for Gary and Yu being the release managers for Flink 1.10.
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>>
> >>> On Sat, Sep 7, 2019 at 4:58 PM Till Rohrmann <[hidden email]>
> >> wrote:
> >>>
> >>>> Thanks for compiling the list of 1.10 efforts for the community, Gary. I
> >>>> think this helps a lot to better understand what the community is
> >>>> currently working on.
> >>>>
> >>>> Thanks for volunteering as the release managers for the next major
> >>>> release. +1 for Gary and Yu being the RMs for Flink 1.10.
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>> On Sat, Sep 7, 2019 at 7:26 AM Zhu Zhu <[hidden email]> wrote:
> >>>>
> >>>>> Thanks Gary for kicking off this discussion.
> >>>>> Really appreciate that you and Yu offer to help manage the 1.10
> >>>>> release.
> >>>>>
> >>>>> +1 for Gary and Yu as release managers.
> >>>>>
> >>>>> Thanks,
> >>>>> Zhu Zhu
> >>>>>
> >>>>> On Saturday, 7 Sep 2019 at 12:26 PM, Dian Fu <[hidden email]> wrote:
> >>>>>
> >>>>>> Hi Gary,
> >>>>>>
> >>>>>> Thanks for kicking off the release schedule of 1.10. +1 for you and
> >>>>>> Yu Li as the release managers.
> >>>>>>
> >>>>>> The feature freeze/release time sounds reasonable.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Dian
> >>>>>>
> >>>>>>> On 7 Sep 2019, at 11:30 AM, Jark Wu <[hidden email]> wrote:
> >>>>>>>
> >>>>>>> Thanks Gary for kicking off the discussion for the 1.10 release.
> >>>>>>>
> >>>>>>> +1 for Gary and Yu as release managers. Thank you for your effort.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark