Hi folks,
Currently, queryable state is not widely used in production. IMO, there are two key reasons for this. 1) The queryable state client is hard to use, because it requires users to know the address of the TaskManager and the port of the proxy. In practice, most business users do not have good knowledge of Flink's internals and runtime in production. 2) The benefit of this feature has not been fully explored. In the Flink DataStream API, state is a first-class citizen; it is one of Flink's key advantages over other compute engines, and queryable state is the most effective way to inspect the latest computing progress.

Three months ago, I started a discussion about improving queryable state and introducing a proxy component.[1] It attracted a lot of attention and discussion. Recently, I submitted a design document for the proposal.[2] These efforts address the first problem.

As for the second problem, the most fundamental solution is to make queryable state really useful. The window operator is one of the most valuable and most frequently used Flink operators, and it also uses keyed state, which is queryable. So we propose making the state of the window operator queryable. This would not only increase the value of queryable state but also meet real business needs.

IMO, allowing window state to be queried will provide great value. In many scenarios, we use large windows for aggregate calculations. A very common example is a day-level window that counts the PV (page views) of a day. But users are usually not satisfied with waiting until the end of the window to get the result. They want "intermediate results" at a smaller time granularity to analyze trends. Because Flink does not provide periodic triggers for fixed windows, we have extended it and implemented an "incremental window", which triggers a fixed window at a smaller interval and emits intermediate results. However, we believe that this approach is still not flexible enough. We should let the user query the current calculation result of the window through the API at any time.

However, I know that if we want to implement this, some details still need to be discussed, such as how to let users know the state descriptors used in the window, the namespace, and so on.

This discussion thread is mainly to gather the community's opinion on this proposal.

Any feedback and ideas are welcome and appreciated.

Best,
Vino

[1]: http://mail-archives.apache.org/mod_mbox/flink-dev/201907.mbox/%3Ctencent_35A56D6858408BE2E2064722@...%3E
[2]: https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing
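To make the idea of querying an open window concrete, here is a minimal sketch in Python. It is a toy model and not Flink code: the class `QueryableWindowCounter` and its methods `add`, `query`, and `fire` are hypothetical names invented for illustration, and using the window start timestamp as the state namespace is an assumption about how such state could be addressed, not a description of Flink's internals.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000  # size of the tumbling window in milliseconds


class QueryableWindowCounter:
    """Toy model of a day-level counting window whose state can be read early."""

    def __init__(self, window_size_ms=DAY_MS):
        self.window_size_ms = window_size_ms
        # State is addressed by (key, namespace); here the namespace is the
        # start timestamp of the window (an assumption for this sketch).
        self.state = defaultdict(int)

    def window_start(self, timestamp_ms):
        # Align the timestamp to the start of its tumbling window.
        return timestamp_ms - (timestamp_ms % self.window_size_ms)

    def add(self, key, timestamp_ms):
        # Incrementally aggregate one event into its window.
        self.state[(key, self.window_start(timestamp_ms))] += 1

    def query(self, key, timestamp_ms):
        # Read the partial aggregate without firing or clearing the window.
        # .get is used so that a query never creates an empty state entry.
        return self.state.get((key, self.window_start(timestamp_ms)), 0)

    def fire(self, key, window_start_ms):
        # Emit the final result and clear the window state.
        return self.state.pop((key, window_start_ms), 0)


counter = QueryableWindowCounter()
for ts in (1_000, 2_000, 3_600_000):
    counter.add("/home", ts)

# Intermediate page-view count for the day that contains this timestamp.
print(counter.query("/home", 5_000_000))
```

The point of the sketch is that `query` needs both the key and the window namespace, which is why the open question above about exposing state descriptors and namespaces to users matters.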
It's a good idea to be able to get progress information from a large ongoing window.
+1 from my side.

> On July 4, 2019, at 11:41, vino yang <[hidden email]> wrote:
>
> Hi folks,
> [...]
Hi all,
Thanks to Kostas for reminding me that as early as March 2017, the community had a thread called "Future of Queryable State Feature".[1] It already discussed queryable state and how to make window state queryable.

I still think this can offer many advantages, especially for ad hoc queries.

Best,
Vino

[1]: http://mail-archives.apache.org/mod_mbox/flink-dev/201703.mbox/%3C362C780C-9672-4DBD-B3F1-4EE7D1DB4CA6%40apache.org%3E

mayozhang <[hidden email]> wrote on Thu, Jul 4, 2019 at 10:21 PM:

> It's a good idea to get the process information of large ongoing window.
> +1 from my side.
> [...]