Need Help on Flink suitability to our use case

Prasanna kumar
Hi,

I have the following use case to implement in my organization.

Say there is a huge relational database (1,000 tables for each of our 30k
customers) in our monolith setup.

We want to reduce the load on the DB and prevent the applications from
hitting it for the latest events, so an extract is done from the redo logs
onto Kafka.

We need to set up a streaming platform that, based on the table updates
that happen (read from Kafka), forms events and sends them to consumers.

Each consumer may be interested in the same table but in different
updates/columns depending on their business needs, and the events then have
to be delivered to their endpoint/Kinesis/SQS/a Kafka topic.

So the case here is *1* table update : *m* events : *n* sinks.
The expected peak load is easily 100k to a million table updates per second
(all customers put together).
The latency expected by most customers is less than a second, mostly in the
100-500 ms range.
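
To make the shape of the problem concrete, here is a rough sketch of how I
picture this as a single Flink job. Everything here (TableUpdate,
OutboundEvent, the schema and the sink stubs) is invented for illustration,
not our real code:

import java.util.Properties;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class FanOutJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint the whole pipeline; the barriers cover every operator,
        // however many sinks hang off the one source.
        env.enableCheckpointing(1000, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "table-update-fanout"); // placeholder

        // One source: the redo-log extract topic.
        DataStream<TableUpdate> updates = env.addSource(
            new FlinkKafkaConsumer<>("redo-log-topic",
                                     new TableUpdateSchema(), props));

        // 1 update -> m events; the fan-out logic is sketched under 1 b).
        DataStream<OutboundEvent> events =
            updates.flatMap(new FanOutFunction());

        // m events -> n sinks, routed by each consumer's destination.
        events.filter(e -> e.destination.equals("KINESIS"))
              .addSink(new KinesisSinkStub());
        events.filter(e -> e.destination.equals("SQS"))
              .addSink(new SqsSinkStub());
        events.filter(e -> e.destination.equals("KAFKA"))
              .addSink(new KafkaSinkStub());

        env.execute("table-update-fan-out");
    }
}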

Is this use case suited for Flink?

I went through the Flink book and documentation. These are the questions I
have:

1) If we have a situation like this, *1* table update : *m* events : *n*
sinks, is it better to write our own microservice or to implement it
through Flink?
      1 a) How does checkpointing happen if we have *1* input : *n* output
situations?
      1 b) There are no heavy transformations; the most we might do is
check that the required columns are present in the DB update and decide
whether to create an event (sketched below). So there is an alternative
thought of writing the service in Node, since it is more IO and less
processing.
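
For 1 b), to be concrete, the fan-out/column check I have in mind is
nothing heavier than this (Subscription and the per-table metadata lookup
are placeholders for whatever we would actually store per consumer):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// 1 table update -> 0..m events: one event per consumer subscribed to the
// table whose required columns are all present in the update.
public class FanOutFunction
        implements FlatMapFunction<TableUpdate, OutboundEvent> {
    @Override
    public void flatMap(TableUpdate update, Collector<OutboundEvent> out) {
        for (Subscription sub : Subscriptions.forTable(update.tableName)) {
            if (update.columns.keySet().containsAll(sub.requiredColumns)) {
                out.collect(new OutboundEvent(
                    sub.consumerId,
                    sub.destination,
                    update.project(sub.requiredColumns)));
            }
        }
    }
}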

2) I see that we write a job, it is deployed, and Flink takes care of the
rest in handling parallelism, latency and throughput.
     But what I need is to write a generic framework so that we can handle
any table structure; we should not end up writing one job driver for each
case (a sketch of what I mean follows this question).
     There are at least 200 types of events in the existing monolith system
that might move to this new system once it is built.
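
To show what I mean by "generic": the routing would live entirely in
configuration, so a new event type is a config entry rather than a new job.
The rule shape below is made up, just to illustrate:

import java.io.Serializable;
import java.util.*;

public class RuleRegistry implements Serializable {

    // Hypothetical rule, e.g. one JSON entry per event type:
    //   { "table": "orders", "requiredColumns": ["id", "status"],
    //     "destination": "SQS" }
    public static class EventRule implements Serializable {
        public String table;
        public List<String> requiredColumns;
        public String destination;
    }

    private final Map<String, List<EventRule>> byTable = new HashMap<>();

    public RuleRegistry(List<EventRule> rules) {
        for (EventRule r : rules) {
            byTable.computeIfAbsent(r.table, t -> new ArrayList<>()).add(r);
        }
    }

    public List<EventRule> rulesFor(String table) {
        return byTable.getOrDefault(table, Collections.emptyList());
    }
}

If the rules have to change without redeploying the job, my reading of the
docs is that they could be published on a side Kafka topic and joined in
with Flink's broadcast state, but I have not tried that.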

3) How do we keep the Flink cluster highly available? From the book, I get
that internal task-level failures are handled gracefully in Flink. But what
if the whole Flink cluster goes down, how do we make sure it is HA?
    I had earlier worked with Spark and we had issues managing it. (It was
not much of a problem there, since the latency requirement was 15 minutes
and we could ramp another cluster up within that time.)
    These are absolutely real-time cases and we cannot miss even one
message/event.
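
For 3), my reading of the HA documentation is that cluster-level failures
are covered by standby JobManagers coordinated through ZooKeeper, with
checkpoints and job metadata kept on durable storage; in flink-conf.yaml
that would be roughly this (hosts and paths are placeholders):

high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha/

Is that the right direction, or is there more to it?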

There are also thoughts about whether to use Kafka Streams/Apache Storm for
the same. [Those are being investigated by a different set of folks.]

Thanks,
Prasanna.