Advice - Drools in Flink

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Advice - Drools in Flink

Anton
Hello

Firstly, I am an absolute Flink newbie.

I am interested in using Drools in Flink - in a similar case to what is
described in this blog, where Drools is used in Spark.
http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/

The basic idea is that Drools can be used to reason over streaming data.

The high-level use case is, there are several hundred users who want to
write rules to be notified on events related to changes in specific
streams. For example, notify me when a specific stock price changes by so
much.

Due to the number of users, the more end-user focused syntax of Drools, and
the number of rule changes, doing this in Drools makes more sense than to
write and deploy plain-old-flink (apologies, am not familiar with the
correct term for a flink process).

Also, as Drools has some powerful CEP operators, it could be very
interested to have these available in Flink too.

My question, therefore, is, very general - how best to integrate Drools
into Flink? Where should I start?

Thanks and regards
Anton
Reply | Threaded
Open this post in threaded view
|

Re: Advice - Drools in Flink

Robert Metzger
Hi Anton,

I think embedding Drools into Flink should be doable. Drools seems to be
implemented in Java, so you can probably call the engine from Flink.
I would start putting a flatMap() operator into a stream. In the operator,
I would start the Drools engine (probably in the open() method) and then
give the individual records to the engine.

Regards,
Robert


On Tue, Jun 21, 2016 at 7:55 AM, Anton <[hidden email]> wrote:

> Hello
>
> Firstly, I am an absolute Flink newbie.
>
> I am interested in using Drools in Flink - in a similar case to what is
> described in this blog, where Drools is used in Spark.
>
> http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/
>
> The basic idea is that Drools can be used to reason over streaming data.
>
> The high-level use case is, there are several hundred users who want to
> write rules to be notified on events related to changes in specific
> streams. For example, notify me when a specific stock price changes by so
> much.
>
> Due to the number of users, the more end-user focused syntax of Drools, and
> the number of rule changes, doing this in Drools makes more sense than to
> write and deploy plain-old-flink (apologies, am not familiar with the
> correct term for a flink process).
>
> Also, as Drools has some powerful CEP operators, it could be very
> interested to have these available in Flink too.
>
> My question, therefore, is, very general - how best to integrate Drools
> into Flink? Where should I start?
>
> Thanks and regards
> Anton
>
Reply | Threaded
Open this post in threaded view
|

Re: Advice - Drools in Flink

Aljoscha Krettek-2
Hi,
you can also take a look at this:
https://techblog.king.com/rbea-scalable-real-time-analytics-king/. From a
high level it seems Drools implements something similar to the system that
they developed on top of Flink.

Cheers,
Aljoscha

On Tue, 21 Jun 2016 at 12:04 Robert Metzger <[hidden email]> wrote:

> Hi Anton,
>
> I think embedding Drools into Flink should be doable. Drools seems to be
> implemented in Java, so you can probably call the engine from Flink.
> I would start putting a flatMap() operator into a stream. In the operator,
> I would start the Drools engine (probably in the open() method) and then
> give the individual records to the engine.
>
> Regards,
> Robert
>
>
> On Tue, Jun 21, 2016 at 7:55 AM, Anton <[hidden email]> wrote:
>
> > Hello
> >
> > Firstly, I am an absolute Flink newbie.
> >
> > I am interested in using Drools in Flink - in a similar case to what is
> > described in this blog, where Drools is used in Spark.
> >
> >
> http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/
> >
> > The basic idea is that Drools can be used to reason over streaming data.
> >
> > The high-level use case is, there are several hundred users who want to
> > write rules to be notified on events related to changes in specific
> > streams. For example, notify me when a specific stock price changes by so
> > much.
> >
> > Due to the number of users, the more end-user focused syntax of Drools,
> and
> > the number of rule changes, doing this in Drools makes more sense than to
> > write and deploy plain-old-flink (apologies, am not familiar with the
> > correct term for a flink process).
> >
> > Also, as Drools has some powerful CEP operators, it could be very
> > interested to have these available in Flink too.
> >
> > My question, therefore, is, very general - how best to integrate Drools
> > into Flink? Where should I start?
> >
> > Thanks and regards
> > Anton
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Advice - Drools in Flink

Maciek Próchniak
In reply to this post by Anton
Hi,

I've been using drools in a few projects. Embedding Drools in Flink is
certainly possible, however there are a few points you have to consider.
- are you going to use Drools in stateful or stateless way? The blog
post you mention uses flink in stateless way - that is, each event is
processed separately and
each drools session behaves like a bunch of if's. This is
straightforward, it also shouldn't be problematic performance wise
OTOH, if you want to use stateful sessions things are getting more
complicated - because if you want to play well with Flink you'd have to
integrate drools sessions with checkpoints in Flink - and that can be a
bit more tricky (although certainly possible) - especially when it comes
to size of session, partitioning and so on.
Even more difficult would be to use Drools CEP features - because then
you'd have to consider how to handle time (e.g. during restore??).
To be honest, I wouldn't try to integrate Drools CEP with Flink - Flink
has it's own time handling, own CEP engine and mixing those two can lead
to tricky issues...

Even if you decide to use stateless sessions there's issue with
deployment. If you don't want to redeploy Flink process each time one of
the users change rules, you'd have to implement some notion of detecting
changes (and storing rules somewhere?) and recompiling rules on each
task manager. Again - this is certainly doable and possible, but
requires some thinking about the design.

To sum up - there are many ways you can integrate drools with flink. I'd
start with something simple - exactly as Robert wrote
hope this helps a bit and good luck,

maciek



On Tue, Jun 21, 2016 at 7:55 AM, Anton <[hidden email]> wrote:

> Hello
>
> Firstly, I am an absolute Flink newbie.
>
> I am interested in using Drools in Flink - in a similar case to what is
> described in this blog, where Drools is used in Spark.
>
> http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/
>
> The basic idea is that Drools can be used to reason over streaming data.
>
> The high-level use case is, there are several hundred users who want to
> write rules to be notified on events related to changes in specific
> streams. For example, notify me when a specific stock price changes by so
> much.
>
> Due to the number of users, the more end-user focused syntax of Drools, and
> the number of rule changes, doing this in Drools makes more sense than to
> write and deploy plain-old-flink (apologies, am not familiar with the
> correct term for a flink process).
>
> Also, as Drools has some powerful CEP operators, it could be very
> interested to have these available in Flink too.
>
> My question, therefore, is, very general - how best to integrate Drools
> into Flink? Where should I start?
>
> Thanks and regards
> Anton
>


Reply | Threaded
Open this post in threaded view
|

Re: Advice - Drools in Flink

Anton
Thanks Maiek
On Wed, Jun 22, 2016 at 9:00 PM, Maciek Próchniak <[hidden email]> wrote:

> This is straightforward, it also shouldn't be problematic performance wise
> OTOH, if you want to use stateful sessions things are getting more
> complicated - because if you want to play well with Flink you'd have to
> integrate drools sessions with checkpoints in Flink - and that can be a bit
> more tricky (although certainly possible)
>

Yes, this is exactly the point I wanted to discuss. And I know Drools
stateful can be tricky even without using Flink.

However, I was thinking - and am interested to know if this is possible -
if it would be possible to do a puedo stateful drools but passing the state
from flink into drools stateless. Does that make sense? And would it be
possible?

Thanks
Reply | Threaded
Open this post in threaded view
|

Re: Advice - Drools in Flink

Maciek Próchniak
Hi Anton,

you mean - keeping working memory facts in Flink state and with each
event throw them into stateless session?
Can you elaborate a bit on your use case? What would be the state?
Would it be for example some event aggregations, which would be filtered
by the rules?

maciek

On 22/06/2016 21:19, Anton wrote:

> Thanks Maiek
> On Wed, Jun 22, 2016 at 9:00 PM, Maciek Próchniak <[hidden email]> wrote:
>
>> This is straightforward, it also shouldn't be problematic performance wise
>> OTOH, if you want to use stateful sessions things are getting more
>> complicated - because if you want to play well with Flink you'd have to
>> integrate drools sessions with checkpoints in Flink - and that can be a bit
>> more tricky (although certainly possible)
>>
> Yes, this is exactly the point I wanted to discuss. And I know Drools
> stateful can be tricky even without using Flink.
>
> However, I was thinking - and am interested to know if this is possible -
> if it would be possible to do a puedo stateful drools but passing the state
> from flink into drools stateless. Does that make sense? And would it be
> possible?
>
> Thanks
>

Reply | Threaded
Open this post in threaded view
|

Re: Advice - Drools in Flink

Anton
Hi Maciek

Firstly, thanks for your replies and interest. It is really appreciated. I
am familiar with Drools but am new to Flink. Whereas you seem familiar with
both - so your feedback is really appreciated.

On Thu, Jun 23, 2016 at 8:30 AM, Maciek Próchniak <[hidden email]> wrote:

>
> you mean - keeping working memory facts in Flink state and with each event
> throw them into stateless session?
>
Yes kind of.

> Can you elaborate a bit on your use case? What would be the state?
>

*Use case*
Flink ingests a large amount of tweets.
Application users can write rules to reason of tweet contents and
frequency.
For example, user John writes a rule that says:

*When I receive 3 tweets about Apache Flink within 1 day*

*Then Flink is popular*

*Fire new Popular("Flink") event*



*When I have not received a tweet about Apache Attic project ___ in one
month*

*then this project is not popular*
*Fire new UnPopular("Attic project") event*


Another important aspect of the use-case is the frequency of rule
changes/updates. There will be many users writing rules to reason over
incoming events - and these rules are expected to change often.

The main goal is to make Drools CEP stateful. Part of the challenge is
ensuring that new messages are routed to the correct drools session. But,
as Flink already seems to have scaling and state well done - it would be
nice to combine the two.
Please have a look at
http://blog.athico.com/2016/03/high-availability-drools-stateless.html

Also, I cannot comment on the strengths and weaknesses of Flink CEP, but
Drools CEP is certainly very mature - and when combined with the rule
language - very powerful. To be able to combine the statefulness of Flink
with Drools would be amazing!

The concern with using Stateful sessions is the possibility of memory
leaks. However, a stateless session is just a wrapper around stateless -
which calls dispose. So this makes me think, perhaps, when populating the
working memory, just extract all relevant events from the Flink state into
the Drools WM - and it could even be stateful - and then call displose
after calling fireAllRules. And do this each time a new message/event
arrives.

Would it be for example some event aggregations, which would be filtered by
> the rules?

Yes.
Reply | Threaded
Open this post in threaded view
|

Re: Advice - Drools in Flink

Maciek Próchniak
Hi Anton,

I think I start to understand what you want to achieve. I would be a
very interesting thing to do ;)
If the drools are the piece that decides what is interesting and what is
not, then probably also drools should be responisble for keeping
state/aggregates - otherwise
Flink would keep too much stuff in memory.

I think that serializing session wouldn't be impossible. I still wonder
how to deal with clocks in Drools CEP - maybe the clock should be moved
by incoming events - but then there are problems with out of order ones,
or only by watermarks - but then you have worse latency.
One thing that is still not so clear to me is - how do you want to
partition the stream of tweets? I guess that it cannot be done by some
tweet property - e.g. author, because otherwise users would have many
sessions and aggregations would not be correct?
I may be wrong, but I think that to leverage Flink scaling/state
capabilities the processing logic should be somehow Flink-aware - that
is, Flink should understand (at least partially) what are specific rules...


maciek


On 23/06/2016 16:54, Anton wrote:

> Hi Maciek
>
> Firstly, thanks for your replies and interest. It is really appreciated. I
> am familiar with Drools but am new to Flink. Whereas you seem familiar with
> both - so your feedback is really appreciated.
>
> On Thu, Jun 23, 2016 at 8:30 AM, Maciek Próchniak <[hidden email]> wrote:
>
>> you mean - keeping working memory facts in Flink state and with each event
>> throw them into stateless session?
>>
> Yes kind of.
>
>> Can you elaborate a bit on your use case? What would be the state?
>>
> *Use case*
> Flink ingests a large amount of tweets.
> Application users can write rules to reason of tweet contents and
> frequency.
> For example, user John writes a rule that says:
>
> *When I receive 3 tweets about Apache Flink within 1 day*
>
> *Then Flink is popular*
>
> *Fire new Popular("Flink") event*
>
>
>
> *When I have not received a tweet about Apache Attic project ___ in one
> month*
>
> *then this project is not popular*
> *Fire new UnPopular("Attic project") event*
>
>
> Another important aspect of the use-case is the frequency of rule
> changes/updates. There will be many users writing rules to reason over
> incoming events - and these rules are expected to change often.
>
> The main goal is to make Drools CEP stateful. Part of the challenge is
> ensuring that new messages are routed to the correct drools session. But,
> as Flink already seems to have scaling and state well done - it would be
> nice to combine the two.
> Please have a look at
> http://blog.athico.com/2016/03/high-availability-drools-stateless.html
>
> Also, I cannot comment on the strengths and weaknesses of Flink CEP, but
> Drools CEP is certainly very mature - and when combined with the rule
> language - very powerful. To be able to combine the statefulness of Flink
> with Drools would be amazing!
>
> The concern with using Stateful sessions is the possibility of memory
> leaks. However, a stateless session is just a wrapper around stateless -
> which calls dispose. So this makes me think, perhaps, when populating the
> working memory, just extract all relevant events from the Flink state into
> the Drools WM - and it could even be stateful - and then call displose
> after calling fireAllRules. And do this each time a new message/event
> arrives.
>
> Would it be for example some event aggregations, which would be filtered by
>> the rules?
> Yes.
>