Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component


Jiayi Liao
Hi Yang,
 +1 for this proposal. We use queryable state very often in our scenarios, to debug and to query the real-time status of streaming jobs such as CEP. We have also done a lot to improve the “user experience” of this feature, such as exposing the TaskManager’s proxy port in TaskManagerInfo.
 I’m looking forward to a more detailed and deeper discussion, and I’d like to contribute this work back to the community.


Best Regards,
Jiayi Liao


Original Message
Sender:vino [hidden email]
Recipient:[hidden email]@flink.apache.org
Date:Friday, Apr 26, 2019 16:41
Subject:Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component


Hi Paul,

Thanks for your reply. You are right: queryable state currently has few users. I also fully agree that it makes a streaming job work more like a database.

About the architecture and your concern: yes, the query server may affect the JobManager if the two are deployed together. I think it is important to guarantee the JobManager's availability and stability, and the QueryProxyServer is only a secondary service component. That is why I mentioned an SLA policy when describing the role of the QueryProxyServer. I think it is one solution, but the details still need to be discussed.

About starting a queryable state client from a cmd script: I think it is a good and valuable idea.

Best,
Vino.

Paul Lam [hidden email] wrote on Fri, Apr 26, 2019 at 3:31 PM:

Hi Vino,

Thanks a lot for bringing up the discussion! Queryable state has been in beta for a long time, and because of its complexity and instability I think it has few users. But there is great value in it: it brings state one step closer to being usable as a database.

WRT the architecture, I'd vote for opt 3, because it fits cloud architectures best and avoids putting more burden on the JM (queries can sometimes be slow and resource-intensive). My concern is that on many cluster frameworks container resources are limited (IIUC, the JM and QS run in the same container). Would the JM get killed if the QS eats up too much memory?

And a minor suggestion: can we introduce a cmd script to set up a QueryableStateClient? That would make it easier for users who want to try out this feature.

Best,
Paul Lam

On Apr 26, 2019, at 11:09, vino yang [hidden email] wrote:

Hi Quan,

Thanks for your reply. Actually, I have not tried this approach. But there are two factors we should consider:

1. Local state storage is not necessarily RocksDB; otherwise Flink would not need to provide a queryable state client. What's more, querying RocksDB directly still requires an explicit address.
2. IMO, the more valuable part of the proposal is to make the queryable state architecture more reasonable, so that it encapsulates more details and scales better.

Best,
Vino

Shi Quan [hidden email] wrote on Fri, Apr 26, 2019 at 10:38 AM:

Hi,

How about reading state from RocksDB directly? In that case the TM host is unnecessary.

Best
Quan Shi

________________________________
From: vino yang [hidden email]
Sent: Thursday, April 25, 2019 10:18:20 PM
To: dev; user
Cc: Stefan Richter; Aljoscha Krettek; [hidden email]
Subject: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

Hi all,

I want to share my thoughts about improving queryable state and introducing a QueryServerProxy component.

I think the current queryable state client is hard to use, because it requires users to know the TaskManager's address and the proxy's port. In production, some business users do not know Flink's internals or runtime well, yet they sometimes need to query state values.

IMO, the root cause is the queryable state architecture. Currently, queryable state clients interact with the query state client proxy components hosted on each TaskManager. This design makes it hard to encapsulate the points of change and exposes too much detail to the user.

My idea is to introduce a real queryable state server, named e.g. QueryStateProxyServer, which would accept all query state requests, look up its local registry, and redirect each request to the specific QueryStateClientProxy (which runs on each TaskManager). This server is the only thing users would need to care about; they would no longer need to know the TaskManagers' addresses and the proxies' ports. The current QueryStateClientProxy would become QueryStateProxyClient.

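The lookup-and-redirect step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `QueryRouteTable`, `ProxyAddress`, the state name, the host and the port are all hypothetical and not part of any existing Flink API, and the sketch routes by state name only, whereas a real implementation would probably also have to account for how keyed state is partitioned across TaskManagers.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical route table: queryable state name -> TaskManager proxy address. */
final class QueryRouteTable {

    /** Address of a per-TaskManager proxy (illustrative, not a Flink class). */
    static final class ProxyAddress {
        final String host;
        final int port;

        ProxyAddress(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    private final Map<String, ProxyAddress> routes = new ConcurrentHashMap<>();

    /** Called when a TaskManager announces that it serves the given state. */
    void register(String stateName, String host, int port) {
        routes.put(stateName, new ProxyAddress(host, port));
    }

    /** Called when the state is no longer served, e.g. the task finished. */
    void unregister(String stateName) {
        routes.remove(stateName);
    }

    /** Resolves where a query should be redirected; empty if the state is unknown. */
    Optional<ProxyAddress> resolve(String stateName) {
        return Optional.ofNullable(routes.get(stateName));
    }
}

final class QueryRoutingDemo {
    public static void main(String[] args) {
        QueryRouteTable table = new QueryRouteTable();
        // Hypothetical state name, host and port, for illustration only.
        table.register("cep-pattern-state", "tm-1.example.internal", 9069);
        System.out.println(table.resolve("cep-pattern-state").orElse(null));
        System.out.println(table.resolve("unknown-state").isPresent());
    }
}
```

With something like this in the server, a user would only send the state name (plus job and key) to one well-known address, and the TaskManager addresses would stay internal to the registry.
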
Generally speaking, the roles of the QueryStateProxyServer are:

* act as the proxy for all query clients, receiving all requests and sending responses;
* act as a router that redirects the actual query requests to the specific proxy client;
* maintain the route table registry (state - TaskManager, TaskManager - proxy client address);
* provide more fine-grained control, such as result caching, ACL, TTL, SLA (rate limit) and so on.

About the implementation, there are three opts:

opt 1:

Let the JobManager act as the query proxy server.

* pros: reuses the existing JM; no new process is needed, which reduces complexity;
* cons: puts a heavy burden on the JM and, depending on the query frequency, may affect its stability.

[Screen Shot 2019-04-25 at 5.12.07 PM.png]

opt 2:

Introduce a new component that runs as a standalone process and acts as the query proxy server:

* pros: reduces the burden on the JM and makes it more stable;
* cons: introducing a new component makes the implementation more complex.

[Screen Shot 2019-04-25 at 5.14.05 PM.png]

opt 3 (suggested by Stefan Richter):

Combine the two opts: the query server could run as a single entry point (process) and integrate with the JobManager.

If we keep it well encapsulated, the only difference would be how we register new TMs with the query server in the different scenarios: in the JM we might have this information already, while in standalone mode the TMs could, e.g., be started with the query server address to register with. This would give the convenience of starting the QS with the JM and the flexibility for power users to reduce load on their JM.

IMO, queryable state is a very valuable feature. It lets users query real-time measurement results. I hope it will get the community's attention.

This is just a rough idea. If it is valuable to the community, I will write a design draft.

What's your opinion? Any feedback and comments are welcome!

Best,
Vino.
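
As an illustration of the "SLA (rate limit)" role mentioned in the proposal, and of how it might protect a JobManager that shares a process with the query server, here is a minimal token-bucket sketch. It is an assumption about one possible mechanism, not something the thread specifies; `QueryRateLimiter` and its parameters are hypothetical names. Time is passed in explicitly so the behaviour is easy to reason about and test.

```java
/** Hypothetical token-bucket limiter for the proxy server's query path. */
final class QueryRateLimiter {
    private final long capacity;         // maximum burst size, in queries
    private final double refillPerMilli; // tokens added per millisecond
    private double tokens;               // tokens currently available
    private long lastRefillMillis;       // time of the last refill

    QueryRateLimiter(long capacity, double queriesPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = queriesPerSecond / 1000.0;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    /** Returns true if a query arriving at nowMillis may be forwarded. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Refill according to elapsed time, never exceeding the capacity.
        long elapsed = Math.max(0L, nowMillis - lastRefillMillis);
        tokens = Math.min((double) capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // over the limit: reject or queue the query
    }
}

final class QueryRateLimiterDemo {
    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Allow bursts of 2 queries and 1 query per second on average.
        QueryRateLimiter limiter = new QueryRateLimiter(2, 1.0, now);
        for (int i = 0; i < 3; i++) {
            System.out.println("query " + i + " admitted: " + limiter.tryAcquire(now));
        }
    }
}
```

A limiter like this only bounds the request rate; it would not by itself bound the memory used by slow or large queries, which is the concern Paul raised for containers shared with the JM.
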

vino yang
Hi Jiayi,

Thanks for your reply. I'm glad to hear that you have already put effort
into this, and your contribution would be very welcome.

I also want to explore it in depth. For now, let's hear the community's
opinions.

Best,
Vino.

bupt_ljy <[hidden email]> wrote on Fri, Apr 26, 2019 at 9:54 PM:

> Hi Yang,
>  +1 for this proposal. We use queryable state very often in our
> scenarios, to debug and to query the real-time status of streaming jobs
> such as CEP. We have also done a lot to improve the “user experience” of
> this feature, such as exposing the TaskManager’s proxy port in
> TaskManagerInfo.
>  I’m looking forward to a more detailed and deeper discussion, and I’d
> like to contribute this work back to the community.
>
>
> Best Regards,
> Jiayi Liao