(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Make window state queryable


vino yang

[DISCUSS] Make window state queryable

Hi folks,

Currently, queryable state is not widely used in production. IMO, there
are two key reasons for this. 1) The queryable state client is hard to use,
because it requires users to know the address of the TaskManager and the
port of the proxy, while most business users in production do not have a
good understanding of Flink's internals and runtime.
2) The benefits of this feature have not been fully explored. In the Flink
DataStream API, state is a first-class citizen and one of Flink's key
advantages over other compute engines, and queryable state is the most
effective way to observe the latest progress of a computation.

Three months ago, I started a discussion about improving queryable state
and introducing a proxy component.[1] It attracted a lot of attention and
discussion. Recently, I submitted a design document for the proposal.[2]
These efforts aim to address the first problem.

As for the second problem, the most fundamental solution is to make
queryable state genuinely useful. The window operator is one of the most
valuable and most frequently used Flink operators, and it keeps its
contents in keyed state, which is queryable. So we propose making the state
of the window operator queryable. This would not only increase the value of
queryable state but also meet real business needs.

IMO, allowing window state to be queried would provide great value. In many
scenarios, we use large windows for aggregate calculations. A very common
example is a day-long window that counts the page views (PV) of a day.
However, users are usually not satisfied with waiting until the end of the
window to get the result. They want "intermediate results" at a finer time
granularity to analyze trends. Since the default trigger of a fixed
(tumbling) window fires only once, when the window ends, we extended this and
implemented an "incremental window", which fires a fixed window periodically
at a smaller interval and emits intermediate results. However, we believe
that this approach is still not flexible enough. Users should be able to
query the current result of a window through an API at any time.
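To make the day-level PV example concrete, here is a minimal, framework-free Java sketch of a tumbling window that emits intermediate counts at a smaller interval and a final count when the window closes. It is not Flink code and not the authors' "incremental window" implementation; the class name `EarlyFiringPvWindow`, the result string format, and the processing model (events arrive in event-time order, firing is driven purely by event time) are assumptions for illustration. For reference, Flink's built-in `ContinuousEventTimeTrigger` offers a comparable periodic firing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, framework-free sketch of an "early-firing" tumbling window that
 * counts page views (PV). NOT Flink code. Assumptions: events arrive in
 * event-time order, timestamps are non-negative, firing is driven by event time.
 */
class EarlyFiringPvWindow {
    private final long windowSizeMs;   // e.g. one day for a day-level PV window
    private final long fireIntervalMs; // interval for intermediate results, e.g. one hour

    private long windowStart = -1;     // start of the open window, -1 if none is open
    private long nextFireTime = -1;    // event time of the next intermediate result
    private long count = 0;            // PV count of the open window

    // Results as "windowStart@fireTime=count"; the final result is suffixed with " FINAL".
    private final List<String> emitted = new ArrayList<>();

    EarlyFiringPvWindow(long windowSizeMs, long fireIntervalMs) {
        this.windowSizeMs = windowSizeMs;
        this.fireIntervalMs = fireIntervalMs;
    }

    /** Counts one page view at the given event time, first firing any results that are due. */
    void onEvent(long timestampMs) {
        advanceTo(timestampMs);
        if (windowStart == -1) {
            // Tumbling window assignment: align the start to a multiple of the window size.
            windowStart = timestampMs - (timestampMs % windowSizeMs);
            nextFireTime = windowStart + fireIntervalMs;
        }
        count++;
    }

    /** Advances event time to t: emits due intermediate results and closes the window once t reaches its end. */
    void advanceTo(long t) {
        if (windowStart == -1) {
            return;
        }
        long windowEnd = windowStart + windowSizeMs;
        while (nextFireTime < windowEnd && nextFireTime <= t) {
            emitted.add(windowStart + "@" + nextFireTime + "=" + count);
            nextFireTime += fireIntervalMs;
        }
        if (t >= windowEnd) {
            emitted.add(windowStart + "@" + windowEnd + "=" + count + " FINAL");
            windowStart = -1;
            count = 0;
        }
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        // Toy sizes: a window of 10 time units with an intermediate result every 3 units.
        EarlyFiringPvWindow w = new EarlyFiringPvWindow(10, 3);
        for (long ts : new long[] {0, 1, 4, 5, 7, 12}) {
            w.onEvent(ts);
        }
        System.out.println(w.emitted());
    }
}
```

Running `main` prints `[0@3=2, 0@6=4, 0@9=5, 0@10=5 FINAL]`: the consumer only sees the count at the fixed firing points, which illustrates why a pre-configured interval is less flexible than querying the window's current state on demand.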

However, I know that several details still need to be discussed before we
can implement this, such as how to let users know the state descriptors
used in the window, the namespace, and so on.
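The open questions above (which state descriptor, which namespace) can be illustrated with a small, framework-free Java model. As a simplification, it assumes that each value of the window operator's keyed state is addressed by three coordinates: the state name (what a state descriptor identifies), the key, and the namespace, which for a window operator is the window itself. The types `WindowNamespace` and `WindowStateModel` are hypothetical and are not Flink classes; the point is only that a client which knows the key but not the window cannot form a valid query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a window used as a state namespace: the interval [start, end). */
record WindowNamespace(long start, long end) {
    /** Tumbling-window assignment: the window of the given size containing timestampMs (assumed non-negative). */
    static WindowNamespace tumbling(long timestampMs, long sizeMs) {
        long start = timestampMs - (timestampMs % sizeMs);
        return new WindowNamespace(start, start + sizeMs);
    }
}

/**
 * Hypothetical in-memory model of a window operator's keyed state (not Flink code).
 * Every value is addressed by the state name (what a state descriptor identifies),
 * the key, and the namespace (the window).
 */
class WindowStateModel {
    private record Address(String stateName, String key, WindowNamespace window) {}

    private final Map<Address, Long> values = new HashMap<>();

    /** Adds delta to the aggregate stored for (stateName, key, window), as an aggregating window would. */
    void add(String stateName, String key, WindowNamespace window, long delta) {
        values.merge(new Address(stateName, key, window), delta, Long::sum);
    }

    /** A query must supply all three coordinates; knowing only the key is not enough. */
    Optional<Long> query(String stateName, String key, WindowNamespace window) {
        return Optional.ofNullable(values.get(new Address(stateName, key, window)));
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        WindowStateModel state = new WindowStateModel();
        WindowNamespace today = WindowNamespace.tumbling(day + 5_000, day);
        state.add("pv-count", "page-A", today, 1);
        state.add("pv-count", "page-A", today, 1);

        // Same state name and key, but a different window: nothing is found.
        WindowNamespace yesterday = WindowNamespace.tumbling(5_000, day);
        System.out.println(state.query("pv-count", "page-A", today));     // Optional[2]
        System.out.println(state.query("pv-count", "page-A", yesterday)); // Optional.empty
    }
}
```

In this model, exposing window state to users means exposing both the state name and a way to compute the window (here via `tumbling`), which is exactly the kind of detail the proposal says still needs discussion.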

The main purpose of this thread is to hear the community's opinion on this
proposal.

Any feedback and ideas are welcome and appreciated.

Best,
Vino

[1]:
http://mail-archives.apache.org/mod_mbox/flink-dev/201907.mbox/%3Ctencent_35A56D6858408BE2E2064722@...%3E
[2]:
https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing

mayo zhang

Re: [DISCUSS] Make window state queryable

It’s a good idea to expose the progress of large, ongoing windows.
+1 from my side.

> On Jul 4, 2019, at 11:41, vino yang <[hidden email]> wrote:

vino yang

Re: [DISCUSS] Make window state queryable

Hi all,

Thanks to Kostas for reminding me that, as early as March 2017, the
community had a thread titled "Future of Queryable State Feature".[1]

That thread already discussed queryable state and how to make window state
queryable. I still think this can offer many advantages, especially for
ad-hoc queries.

Best,
Vino

[1]:
http://mail-archives.apache.org/mod_mbox/flink-dev/201703.mbox/%3C362C780C-9672-4DBD-B3F1-4EE7D1DB4CA6%40apache.org%3E

On Thu, Jul 4, 2019 at 10:21 PM, mayozhang <[hidden email]> wrote:
