Hi there,
I am very happy about Flink 1.2 release. It was much more robust and feature rich compare to previous versions. In the following section, I would like to discuss a non typical use case in flink community. With ever increasing popularity of micro services[1] to scale out popular online services. Various aspect of source of truth is stored (a.k.a partitioned) behind various of service rpc endpoints. There is a general need of managing events traversal and enrichment throughout org SOA systems. (SOA) It is no longer part of data infrastructure scope, where traditionally known as batched slow and analytic(small % lossy is okay). Flink might also find a fit into core services as well. It's part of online production services, serving directly from mobile client events more importantly services database post commit logs and orchestrate adhoc stream toplogies to transform and transfer between online services(usually backed by databases and serving request response with stragent latency requirement) Use case: user updates comes from mobile client via kafka topic, which consumed both by user service as well as streaming job. When streaming job do RPC and trying to enrich user information, it cause race condition which turns out database persistence is not as speedy as streaming job. In general, streaming job should consume user service commit logs instead of karfka topic which defines as source of truth in term of user information. Is there a general way to couple with these issues? P.S I was able to build task manager as jar package and deployed to production environment. Instead of using YARN to manage warehouse machines. Utilize same deployment environment as other online services as docker. So far, it seems running smoothly. Thanks, Chen [1] https://en.wikipedia.org/wiki/Microservices [2] https://martinfowler.com/eaaDev/EventSourcing.html |
Hi, Chen Qin
We also met your end-to-end use case. A RPC Source and Sink such as netty source sink can fit such requirements. I’ve submit a natty module in bahir-flink project which only a demo. If use connector source instead of Kafka, how do we make the data persistent? One choice is distributedlog project developed by twitter. The idea of micro service is very good. Playframework is better choice to provide micro-service of Flink instead of Flink Monitor which implemented by netty. Submit Flink job in the Mesos cluster, at the same time deploy the micro-service by marathon to the same Mesos cluster, and enable mesos-dns for service discovery. The the micro-service can be a API Gateway for: 1. receiving data from device 2. Sending the data to the Flink Job Source(Netty Source with distributedlog) 3. At same time, the sink send the streaming result data to the API Gateway 4. API Gateway support streaming invoke: send the sink result data to the device channel So this plan can guarantee the end-user invoke the service synchronized, and don’t care about Flink Job’s data processing. By the way, X as a Service actually is called by SAAS/PAAS in the cloud platform, such as AWS/Azure. We can call it Flink micro service.:) Best Regards Jinkui Shi 在 2017/3/14 下午2:13, "Chen Qin" <[hidden email]> 写入: >Hi there, > >I am very happy about Flink 1.2 release. It was much more robust and >feature rich compare to previous versions. In the following section, I >would like to discuss a non typical use case in flink community. > >With ever increasing popularity of micro services[1] to scale out popular >online services. Various aspect of source of truth is stored (a.k.a >partitioned) behind various of service rpc endpoints. There is a general >need of managing events traversal and enrichment throughout org SOA >systems. (SOA) It is no longer part of data infrastructure scope, where >traditionally known as batched slow and analytic(small % lossy is okay). >Flink might also find a fit into core services as well. > >It's part of online production services, serving directly from mobile >client events more importantly services database post commit logs and >orchestrate adhoc stream toplogies to transform and transfer between >online >services(usually backed by databases and serving request response with >stragent latency requirement) > >Use case: >user updates comes from mobile client via kafka topic, which consumed both >by user service as well as streaming job. When streaming job do RPC and >trying to enrich user information, it cause race condition which turns out >database persistence is not as speedy as streaming job. > >In general, streaming job should consume user service commit logs instead >of karfka topic which defines as source of truth in term of user >information. Is there a general way to couple with these issues? > >P.S I was able to build task manager as jar package and deployed to >production environment. Instead of using YARN to manage warehouse >machines. >Utilize same deployment environment as other online services as docker. So >far, it seems running smoothly. > >Thanks, >Chen > > >[1] https://en.wikipedia.org/wiki/Microservices >[2] https://martinfowler.com/eaaDev/EventSourcing.html |
Hi jinkui,
I haven't go down to that deep yet. Sounds like you have better idea what needs to be in place. Can you try to come up with a doc and may be draw some diagram so we can discuss from there? My original intention is to discuss general function gap of running lots of micro services(like thousands of services as I observed). I feel flink low level has potential to fit in to highly critical services space and do good job fill those gaps. mobile apps ----------------------------------- front end request router -------------------------------------- service A | service B | service C database A |database B| database C --------------------------------------- Flink as a service ---------------------------------------- service D | service E |service F database D | database E |database F Thanks, Chen On Tue, Mar 14, 2017 at 12:01 AM, shijinkui <[hidden email]> wrote: > Hi, Chen Qin > > We also met your end-to-end use case. A RPC Source and Sink such as netty > source sink can fit such requirements. I’ve submit a natty module in > bahir-flink project which only a demo. > If use connector source instead of Kafka, how do we make the data > persistent? One choice is distributedlog project developed by twitter. > > The idea of micro service is very good. Playframework is better choice to > provide micro-service of Flink instead of Flink Monitor which implemented > by netty. > Submit Flink job in the Mesos cluster, at the same time deploy the > micro-service by marathon to the same Mesos cluster, and enable mesos-dns > for service discovery. > > The the micro-service can be a API Gateway for: > 1. receiving data from device > 2. Sending the data to the Flink Job Source(Netty Source with > distributedlog) > 3. At same time, the sink send the streaming result data to the API Gateway > 4. API Gateway support streaming invoke: send the sink result data to the > device channel > > So this plan can guarantee the end-user invoke the service synchronized, > and don’t care about Flink Job’s data processing. > > By the way, X as a Service actually is called by SAAS/PAAS in the cloud > platform, such as AWS/Azure. We can call it Flink micro service.:) > > Best Regards > Jinkui Shi > > 在 2017/3/14 下午2:13, "Chen Qin" <[hidden email]> 写入: > > >Hi there, > > > >I am very happy about Flink 1.2 release. It was much more robust and > >feature rich compare to previous versions. In the following section, I > >would like to discuss a non typical use case in flink community. > > > >With ever increasing popularity of micro services[1] to scale out popular > >online services. Various aspect of source of truth is stored (a.k.a > >partitioned) behind various of service rpc endpoints. There is a general > >need of managing events traversal and enrichment throughout org SOA > >systems. (SOA) It is no longer part of data infrastructure scope, where > >traditionally known as batched slow and analytic(small % lossy is okay). > >Flink might also find a fit into core services as well. > > > >It's part of online production services, serving directly from mobile > >client events more importantly services database post commit logs and > >orchestrate adhoc stream toplogies to transform and transfer between > >online > >services(usually backed by databases and serving request response with > >stragent latency requirement) > > > >Use case: > >user updates comes from mobile client via kafka topic, which consumed both > >by user service as well as streaming job. When streaming job do RPC and > >trying to enrich user information, it cause race condition which turns out > >database persistence is not as speedy as streaming job. > > > >In general, streaming job should consume user service commit logs instead > >of karfka topic which defines as source of truth in term of user > >information. Is there a general way to couple with these issues? > > > >P.S I was able to build task manager as jar package and deployed to > >production environment. Instead of using YARN to manage warehouse > >machines. > >Utilize same deployment environment as other online services as docker. So > >far, it seems running smoothly. > > > >Thanks, > >Chen > > > > > >[1] https://en.wikipedia.org/wiki/Microservices > >[2] https://martinfowler.com/eaaDev/EventSourcing.html > > |
Hi,
I propose that we consider also the type of connectivity to be supported in the Flink API Gateway. I would propose to support a couple of calls option to ingest also events. I am thinking of: - callback mechanism - REST - RPC -----Original Message----- From: Chen Qin [mailto:[hidden email]] Sent: Wednesday, March 15, 2017 7:31 PM To: [hidden email] Subject: Re: Flink as a Service (FaaS) Hi jinkui, I haven't go down to that deep yet. Sounds like you have better idea what needs to be in place. Can you try to come up with a doc and may be draw some diagram so we can discuss from there? My original intention is to discuss general function gap of running lots of micro services(like thousands of services as I observed). I feel flink low level has potential to fit in to highly critical services space and do good job fill those gaps. mobile apps ----------------------------------- front end request router -------------------------------------- service A | service B | service C database A |database B| database C --------------------------------------- Flink as a service ---------------------------------------- service D | service E |service F database D | database E |database F Thanks, Chen On Tue, Mar 14, 2017 at 12:01 AM, shijinkui <[hidden email]> wrote: > Hi, Chen Qin > > We also met your end-to-end use case. A RPC Source and Sink such as > netty source sink can fit such requirements. I’ve submit a natty > module in bahir-flink project which only a demo. > If use connector source instead of Kafka, how do we make the data > persistent? One choice is distributedlog project developed by twitter. > > The idea of micro service is very good. Playframework is better choice > to provide micro-service of Flink instead of Flink Monitor which > implemented by netty. > Submit Flink job in the Mesos cluster, at the same time deploy the > micro-service by marathon to the same Mesos cluster, and enable > mesos-dns for service discovery. > > The the micro-service can be a API Gateway for: > 1. receiving data from device > 2. Sending the data to the Flink Job Source(Netty Source with > distributedlog) > 3. At same time, the sink send the streaming result data to the API > Gateway 4. API Gateway support streaming invoke: send the sink result > data to the device channel > > So this plan can guarantee the end-user invoke the service > synchronized, > and don’t care about Flink Job’s data processing. > > By the way, X as a Service actually is called by SAAS/PAAS in the > cloud platform, such as AWS/Azure. We can call it Flink micro > service.:) > > Best Regards > Jinkui Shi > > 在 2017/3/14 下午2:13, "Chen Qin" <[hidden email]> 写入: > > >Hi there, > > > >I am very happy about Flink 1.2 release. It was much more robust and > >feature rich compare to previous versions. In the following section, > >I would like to discuss a non typical use case in flink community. > > > >With ever increasing popularity of micro services[1] to scale out > >popular online services. Various aspect of source of truth is stored > >(a.k.a > >partitioned) behind various of service rpc endpoints. There is a > >general need of managing events traversal and enrichment throughout > >org SOA systems. (SOA) It is no longer part of data infrastructure > >scope, where traditionally known as batched slow and analytic(small % lossy is okay). > >Flink might also find a fit into core services as well. > > > >It's part of online production services, serving directly from mobile > >client events more importantly services database post commit logs and > >orchestrate adhoc stream toplogies to transform and transfer between > >online services(usually backed by databases and serving request > >response with stragent latency requirement) > > > >Use case: > >user updates comes from mobile client via kafka topic, which consumed > >both by user service as well as streaming job. When streaming job do > >RPC and trying to enrich user information, it cause race condition > >which turns out database persistence is not as speedy as streaming job. > > > >In general, streaming job should consume user service commit logs > >instead of karfka topic which defines as source of truth in term of > >user information. Is there a general way to couple with these issues? > > > >P.S I was able to build task manager as jar package and deployed to > >production environment. Instead of using YARN to manage warehouse > >machines. > >Utilize same deployment environment as other online services as > >docker. So far, it seems running smoothly. > > > >Thanks, > >Chen > > > > > >[1] https://en.wikipedia.org/wiki/Microservices > >[2] https://martinfowler.com/eaaDev/EventSourcing.html > > |
Hi Radu/jinkui,
Thanks for your input! I filed a master task to track discussion on this front https://issues.apache.org/jira/browse/FLINK-6085 Since this is a very broad topic, I would like to kick start with a tiny deployment helper project. What it try to address is to leverage various of service continuous deployment pipelines in various of companies (amazon/facebook/uber) and deploy/update jobmanager/taskmanagers as a high available micro service (via zk and aws s3) I run this service in prod for a month (2 dc, 2 job managers per dc, 8-64 task managers per dc depending on workload) for testing usage. Haven't seen problem so far. https://github.com/chenqin/flink-jar Thanks, Chen On Thu, Mar 16, 2017 at 2:05 AM, Radu Tudoran <[hidden email]> wrote: > Hi, > > I propose that we consider also the type of connectivity to be supported > in the Flink API Gateway. I would propose to support a couple of calls > option to ingest also events. I am thinking of: > - callback mechanism > - REST > - RPC > > > > > -----Original Message----- > From: Chen Qin [mailto:[hidden email]] > Sent: Wednesday, March 15, 2017 7:31 PM > To: [hidden email] > Subject: Re: Flink as a Service (FaaS) > > Hi jinkui, > > I haven't go down to that deep yet. Sounds like you have better idea what > needs to be in place. > Can you try to come up with a doc and may be draw some diagram so we can > discuss from there? > > My original intention is to discuss general function gap of running lots > of micro services(like thousands of services as I observed). I feel flink > low level has potential to fit in to highly critical services space and do > good job fill those gaps. > > > mobile apps > ----------------------------------- > front end request router > -------------------------------------- > service A | service B | service C > database A |database B| database C > --------------------------------------- > Flink as a service > ---------------------------------------- > service D | service E |service F > database D | database E |database F > > Thanks, > Chen > > > > > > On Tue, Mar 14, 2017 at 12:01 AM, shijinkui <[hidden email]> wrote: > > > Hi, Chen Qin > > > > We also met your end-to-end use case. A RPC Source and Sink such as > > netty source sink can fit such requirements. I’ve submit a natty > > module in bahir-flink project which only a demo. > > If use connector source instead of Kafka, how do we make the data > > persistent? One choice is distributedlog project developed by twitter. > > > > The idea of micro service is very good. Playframework is better choice > > to provide micro-service of Flink instead of Flink Monitor which > > implemented by netty. > > Submit Flink job in the Mesos cluster, at the same time deploy the > > micro-service by marathon to the same Mesos cluster, and enable > > mesos-dns for service discovery. > > > > The the micro-service can be a API Gateway for: > > 1. receiving data from device > > 2. Sending the data to the Flink Job Source(Netty Source with > > distributedlog) > > 3. At same time, the sink send the streaming result data to the API > > Gateway 4. API Gateway support streaming invoke: send the sink result > > data to the device channel > > > > So this plan can guarantee the end-user invoke the service > > synchronized, > > and don’t care about Flink Job’s data processing. > > > > By the way, X as a Service actually is called by SAAS/PAAS in the > > cloud platform, such as AWS/Azure. We can call it Flink micro > > service.:) > > > > Best Regards > > Jinkui Shi > > > > 在 2017/3/14 下午2:13, "Chen Qin" <[hidden email]> 写入: > > > > >Hi there, > > > > > >I am very happy about Flink 1.2 release. It was much more robust and > > >feature rich compare to previous versions. In the following section, > > >I would like to discuss a non typical use case in flink community. > > > > > >With ever increasing popularity of micro services[1] to scale out > > >popular online services. Various aspect of source of truth is stored > > >(a.k.a > > >partitioned) behind various of service rpc endpoints. There is a > > >general need of managing events traversal and enrichment throughout > > >org SOA systems. (SOA) It is no longer part of data infrastructure > > >scope, where traditionally known as batched slow and analytic(small % > lossy is okay). > > >Flink might also find a fit into core services as well. > > > > > >It's part of online production services, serving directly from mobile > > >client events more importantly services database post commit logs and > > >orchestrate adhoc stream toplogies to transform and transfer between > > >online services(usually backed by databases and serving request > > >response with stragent latency requirement) > > > > > >Use case: > > >user updates comes from mobile client via kafka topic, which consumed > > >both by user service as well as streaming job. When streaming job do > > >RPC and trying to enrich user information, it cause race condition > > >which turns out database persistence is not as speedy as streaming job. > > > > > >In general, streaming job should consume user service commit logs > > >instead of karfka topic which defines as source of truth in term of > > >user information. Is there a general way to couple with these issues? > > > > > >P.S I was able to build task manager as jar package and deployed to > > >production environment. Instead of using YARN to manage warehouse > > >machines. > > >Utilize same deployment environment as other online services as > > >docker. So far, it seems running smoothly. > > > > > >Thanks, > > >Chen > > > > > > > > >[1] https://en.wikipedia.org/wiki/Microservices > > >[2] https://martinfowler.com/eaaDev/EventSourcing.html > > > > > |
Quick capture comments on FLINK-6085, we want to have rpc source that accept requests from clients and reroute response (callback to corresponding rpc source) On Tue, Mar 21, 2017 at 10:47 PM, Chen Qin <[hidden email]> wrote:
|
Here is a working draft doc, feel free to comment out :) On Thu, Mar 23, 2017 at 5:00 PM, Chen Qin <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |