Hello
Firstly, I am an absolute Flink newbie. I am interested in using Drools in Flink - in a similar case to what is described in this blog, where Drools is used in Spark. http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/ The basic idea is that Drools can be used to reason over streaming data. The high-level use case is, there are several hundred users who want to write rules to be notified on events related to changes in specific streams. For example, notify me when a specific stock price changes by so much. Due to the number of users, the more end-user focused syntax of Drools, and the number of rule changes, doing this in Drools makes more sense than to write and deploy plain-old-flink (apologies, am not familiar with the correct term for a flink process). Also, as Drools has some powerful CEP operators, it could be very interested to have these available in Flink too. My question, therefore, is, very general - how best to integrate Drools into Flink? Where should I start? Thanks and regards Anton |
Hi Anton,
I think embedding Drools into Flink should be doable. Drools seems to be implemented in Java, so you can probably call the engine from Flink. I would start putting a flatMap() operator into a stream. In the operator, I would start the Drools engine (probably in the open() method) and then give the individual records to the engine. Regards, Robert On Tue, Jun 21, 2016 at 7:55 AM, Anton <[hidden email]> wrote: > Hello > > Firstly, I am an absolute Flink newbie. > > I am interested in using Drools in Flink - in a similar case to what is > described in this blog, where Drools is used in Spark. > > http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/ > > The basic idea is that Drools can be used to reason over streaming data. > > The high-level use case is, there are several hundred users who want to > write rules to be notified on events related to changes in specific > streams. For example, notify me when a specific stock price changes by so > much. > > Due to the number of users, the more end-user focused syntax of Drools, and > the number of rule changes, doing this in Drools makes more sense than to > write and deploy plain-old-flink (apologies, am not familiar with the > correct term for a flink process). > > Also, as Drools has some powerful CEP operators, it could be very > interested to have these available in Flink too. > > My question, therefore, is, very general - how best to integrate Drools > into Flink? Where should I start? > > Thanks and regards > Anton > |
Hi,
you can also take a look at this: https://techblog.king.com/rbea-scalable-real-time-analytics-king/. From a high level it seems Drools implements something similar to the system that they developed on top of Flink. Cheers, Aljoscha On Tue, 21 Jun 2016 at 12:04 Robert Metzger <[hidden email]> wrote: > Hi Anton, > > I think embedding Drools into Flink should be doable. Drools seems to be > implemented in Java, so you can probably call the engine from Flink. > I would start putting a flatMap() operator into a stream. In the operator, > I would start the Drools engine (probably in the open() method) and then > give the individual records to the engine. > > Regards, > Robert > > > On Tue, Jun 21, 2016 at 7:55 AM, Anton <[hidden email]> wrote: > > > Hello > > > > Firstly, I am an absolute Flink newbie. > > > > I am interested in using Drools in Flink - in a similar case to what is > > described in this blog, where Drools is used in Spark. > > > > > http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/ > > > > The basic idea is that Drools can be used to reason over streaming data. > > > > The high-level use case is, there are several hundred users who want to > > write rules to be notified on events related to changes in specific > > streams. For example, notify me when a specific stock price changes by so > > much. > > > > Due to the number of users, the more end-user focused syntax of Drools, > and > > the number of rule changes, doing this in Drools makes more sense than to > > write and deploy plain-old-flink (apologies, am not familiar with the > > correct term for a flink process). > > > > Also, as Drools has some powerful CEP operators, it could be very > > interested to have these available in Flink too. > > > > My question, therefore, is, very general - how best to integrate Drools > > into Flink? Where should I start? > > > > Thanks and regards > > Anton > > > |
In reply to this post by Anton
Hi,
I've been using drools in a few projects. Embedding Drools in Flink is certainly possible, however there are a few points you have to consider. - are you going to use Drools in stateful or stateless way? The blog post you mention uses flink in stateless way - that is, each event is processed separately and each drools session behaves like a bunch of if's. This is straightforward, it also shouldn't be problematic performance wise OTOH, if you want to use stateful sessions things are getting more complicated - because if you want to play well with Flink you'd have to integrate drools sessions with checkpoints in Flink - and that can be a bit more tricky (although certainly possible) - especially when it comes to size of session, partitioning and so on. Even more difficult would be to use Drools CEP features - because then you'd have to consider how to handle time (e.g. during restore??). To be honest, I wouldn't try to integrate Drools CEP with Flink - Flink has it's own time handling, own CEP engine and mixing those two can lead to tricky issues... Even if you decide to use stateless sessions there's issue with deployment. If you don't want to redeploy Flink process each time one of the users change rules, you'd have to implement some notion of detecting changes (and storing rules somewhere?) and recompiling rules on each task manager. Again - this is certainly doable and possible, but requires some thinking about the design. To sum up - there are many ways you can integrate drools with flink. I'd start with something simple - exactly as Robert wrote hope this helps a bit and good luck, maciek On Tue, Jun 21, 2016 at 7:55 AM, Anton <[hidden email]> wrote: > Hello > > Firstly, I am an absolute Flink newbie. > > I am interested in using Drools in Flink - in a similar case to what is > described in this blog, where Drools is used in Spark. > > http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/ > > The basic idea is that Drools can be used to reason over streaming data. > > The high-level use case is, there are several hundred users who want to > write rules to be notified on events related to changes in specific > streams. For example, notify me when a specific stock price changes by so > much. > > Due to the number of users, the more end-user focused syntax of Drools, and > the number of rule changes, doing this in Drools makes more sense than to > write and deploy plain-old-flink (apologies, am not familiar with the > correct term for a flink process). > > Also, as Drools has some powerful CEP operators, it could be very > interested to have these available in Flink too. > > My question, therefore, is, very general - how best to integrate Drools > into Flink? Where should I start? > > Thanks and regards > Anton > |
Thanks Maiek
On Wed, Jun 22, 2016 at 9:00 PM, Maciek Próchniak <[hidden email]> wrote: > This is straightforward, it also shouldn't be problematic performance wise > OTOH, if you want to use stateful sessions things are getting more > complicated - because if you want to play well with Flink you'd have to > integrate drools sessions with checkpoints in Flink - and that can be a bit > more tricky (although certainly possible) > Yes, this is exactly the point I wanted to discuss. And I know Drools stateful can be tricky even without using Flink. However, I was thinking - and am interested to know if this is possible - if it would be possible to do a puedo stateful drools but passing the state from flink into drools stateless. Does that make sense? And would it be possible? Thanks |
Hi Anton,
you mean - keeping working memory facts in Flink state and with each event throw them into stateless session? Can you elaborate a bit on your use case? What would be the state? Would it be for example some event aggregations, which would be filtered by the rules? maciek On 22/06/2016 21:19, Anton wrote: > Thanks Maiek > On Wed, Jun 22, 2016 at 9:00 PM, Maciek Próchniak <[hidden email]> wrote: > >> This is straightforward, it also shouldn't be problematic performance wise >> OTOH, if you want to use stateful sessions things are getting more >> complicated - because if you want to play well with Flink you'd have to >> integrate drools sessions with checkpoints in Flink - and that can be a bit >> more tricky (although certainly possible) >> > Yes, this is exactly the point I wanted to discuss. And I know Drools > stateful can be tricky even without using Flink. > > However, I was thinking - and am interested to know if this is possible - > if it would be possible to do a puedo stateful drools but passing the state > from flink into drools stateless. Does that make sense? And would it be > possible? > > Thanks > |
Hi Maciek
Firstly, thanks for your replies and interest. It is really appreciated. I am familiar with Drools but am new to Flink. Whereas you seem familiar with both - so your feedback is really appreciated. On Thu, Jun 23, 2016 at 8:30 AM, Maciek Próchniak <[hidden email]> wrote: > > you mean - keeping working memory facts in Flink state and with each event > throw them into stateless session? > Yes kind of. > Can you elaborate a bit on your use case? What would be the state? > *Use case* Flink ingests a large amount of tweets. Application users can write rules to reason of tweet contents and frequency. For example, user John writes a rule that says: *When I receive 3 tweets about Apache Flink within 1 day* *Then Flink is popular* *Fire new Popular("Flink") event* *When I have not received a tweet about Apache Attic project ___ in one month* *then this project is not popular* *Fire new UnPopular("Attic project") event* Another important aspect of the use-case is the frequency of rule changes/updates. There will be many users writing rules to reason over incoming events - and these rules are expected to change often. The main goal is to make Drools CEP stateful. Part of the challenge is ensuring that new messages are routed to the correct drools session. But, as Flink already seems to have scaling and state well done - it would be nice to combine the two. Please have a look at http://blog.athico.com/2016/03/high-availability-drools-stateless.html Also, I cannot comment on the strengths and weaknesses of Flink CEP, but Drools CEP is certainly very mature - and when combined with the rule language - very powerful. To be able to combine the statefulness of Flink with Drools would be amazing! The concern with using Stateful sessions is the possibility of memory leaks. However, a stateless session is just a wrapper around stateless - which calls dispose. So this makes me think, perhaps, when populating the working memory, just extract all relevant events from the Flink state into the Drools WM - and it could even be stateful - and then call displose after calling fireAllRules. And do this each time a new message/event arrives. Would it be for example some event aggregations, which would be filtered by > the rules? Yes. |
Hi Anton,
I think I start to understand what you want to achieve. I would be a very interesting thing to do ;) If the drools are the piece that decides what is interesting and what is not, then probably also drools should be responisble for keeping state/aggregates - otherwise Flink would keep too much stuff in memory. I think that serializing session wouldn't be impossible. I still wonder how to deal with clocks in Drools CEP - maybe the clock should be moved by incoming events - but then there are problems with out of order ones, or only by watermarks - but then you have worse latency. One thing that is still not so clear to me is - how do you want to partition the stream of tweets? I guess that it cannot be done by some tweet property - e.g. author, because otherwise users would have many sessions and aggregations would not be correct? I may be wrong, but I think that to leverage Flink scaling/state capabilities the processing logic should be somehow Flink-aware - that is, Flink should understand (at least partially) what are specific rules... maciek On 23/06/2016 16:54, Anton wrote: > Hi Maciek > > Firstly, thanks for your replies and interest. It is really appreciated. I > am familiar with Drools but am new to Flink. Whereas you seem familiar with > both - so your feedback is really appreciated. > > On Thu, Jun 23, 2016 at 8:30 AM, Maciek Próchniak <[hidden email]> wrote: > >> you mean - keeping working memory facts in Flink state and with each event >> throw them into stateless session? >> > Yes kind of. > >> Can you elaborate a bit on your use case? What would be the state? >> > *Use case* > Flink ingests a large amount of tweets. > Application users can write rules to reason of tweet contents and > frequency. > For example, user John writes a rule that says: > > *When I receive 3 tweets about Apache Flink within 1 day* > > *Then Flink is popular* > > *Fire new Popular("Flink") event* > > > > *When I have not received a tweet about Apache Attic project ___ in one > month* > > *then this project is not popular* > *Fire new UnPopular("Attic project") event* > > > Another important aspect of the use-case is the frequency of rule > changes/updates. There will be many users writing rules to reason over > incoming events - and these rules are expected to change often. > > The main goal is to make Drools CEP stateful. Part of the challenge is > ensuring that new messages are routed to the correct drools session. But, > as Flink already seems to have scaling and state well done - it would be > nice to combine the two. > Please have a look at > http://blog.athico.com/2016/03/high-availability-drools-stateless.html > > Also, I cannot comment on the strengths and weaknesses of Flink CEP, but > Drools CEP is certainly very mature - and when combined with the rule > language - very powerful. To be able to combine the statefulness of Flink > with Drools would be amazing! > > The concern with using Stateful sessions is the possibility of memory > leaks. However, a stateless session is just a wrapper around stateless - > which calls dispose. So this makes me think, perhaps, when populating the > working memory, just extract all relevant events from the Flink state into > the Drools WM - and it could even be stateful - and then call displose > after calling fireAllRules. And do this each time a new message/event > arrives. > > Would it be for example some event aggregations, which would be filtered by >> the rules? > Yes. > |
Free forum by Nabble | Edit this page |