flink-git - an experiment in exactly-once semantics


Eron Wright
Hello,
On a long plane trip I had some fun writing a Flink streaming connector based on Git: https://github.com/EronWright/flink-git
It is not intended for real application use; flink-git is just an experiment meant for discussion.
Flink's Kafka connector provides exactly-once guarantees when acting as a source (consumer) but not as a sink (producer), due to a limitation of Kafka. This limitation invites the question of how to extend Kafka (or a similar system) to provide exactly-once guarantees for a sink. Since Kafka is envisioned as a commit log, might an answer be found in commit log concepts? The flink-git repository explores that possibility.
Git provides a useful conceptual framework for the investigation, since its concepts are familiar and it is easily programmable with jgit. The flink-git repository is thus an experimental connector, based on jgit, that explores providing exactly-once guarantees as both a source and a sink.
Enjoy,
Eron Wright
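
[Editor's sketch] The commit-log idea in the message above can be illustrated with a toy model. This is not code from flink-git and it uses neither Flink's nor jgit's API; the names CommitLog, ExactlyOnceSink, on_checkpoint and restore are hypothetical. The assumption is that records written between checkpoints stay uncommitted (like a working tree), that each checkpoint becomes one commit tagged with its checkpoint id, and that recovery discards uncommitted records before the source replays them.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Stand-in for a Git-like log: an append-only list of commits."""
    commits: list = field(default_factory=list)  # entries are (checkpoint_id, records)

    def last_checkpoint_id(self):
        return self.commits[-1][0] if self.commits else -1

    def commit(self, checkpoint_id, records):
        # Idempotent: a checkpoint id that is already in the log is not appended again.
        if checkpoint_id <= self.last_checkpoint_id():
            return False
        self.commits.append((checkpoint_id, tuple(records)))
        return True

    def records(self):
        return [r for _, batch in self.commits for r in batch]


class ExactlyOnceSink:
    """Hypothetical sink: buffers records and commits them once per checkpoint."""

    def __init__(self, log):
        self.log = log
        self.pending = []  # uncommitted records, comparable to a dirty working tree

    def write(self, record):
        self.pending.append(record)

    def on_checkpoint(self, checkpoint_id):
        committed = self.log.commit(checkpoint_id, self.pending)
        self.pending = []
        return committed

    def restore(self):
        # On recovery, drop uncommitted records; the source will replay them.
        self.pending = []


log = CommitLog()
sink = ExactlyOnceSink(log)
sink.write("a")
sink.write("b")
sink.on_checkpoint(1)   # commit 1 holds a, b
sink.write("c")         # failure happens before the next checkpoint
sink.restore()          # "c" is discarded
sink.write("c")         # replayed by the source
sink.write("d")
sink.on_checkpoint(2)   # commit 2 holds c, d
log.commit(2, ["c", "d"])  # a retried commit with the same id is ignored
print(log.records())    # -> ['a', 'b', 'c', 'd']
```

In this model each record ends up in the log once even though "c" was written twice, because only committed state survives a failure and commits are keyed by checkpoint id.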
     
Re: flink-git - an experiment in exactly-once semantics

mxm
Hi Eron,

Very interesting idea to support exactly-once semantics for sinks via
Git! I would be curious about the performance of such a sink.

Since this currently works on local file systems only (it throws an
exception otherwise), I wonder how it behaves on failure when the
"git-${subtaskIndex}" directory is not available on a node. We might
lose some of the exactly-once semantics because task deployment
is not deterministic.
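
[Editor's sketch] The concern above can be made concrete with a toy model. It is not taken from flink-git: Node, open_repo and run_subtask are hypothetical names, each node's local file system is modelled as a dictionary, and the sketch assumes that a missing directory is silently created as an empty repository (the real connector may behave differently).

```python
def repo_dir(subtask_index):
    # Directory naming as quoted in the thread: "git-${subtaskIndex}".
    return f"git-{subtask_index}"


class Node:
    """Hypothetical worker with its own local file system (directory -> commits)."""

    def __init__(self, name):
        self.name = name
        self.local_fs = {}

    def open_repo(self, subtask_index):
        # Assumption: an absent directory yields a new, empty repository.
        return self.local_fs.setdefault(repo_dir(subtask_index), [])


def run_subtask(node, subtask_index, checkpoint_id, records):
    """Commit records for a checkpoint unless this node's repo already has it."""
    repo = node.open_repo(subtask_index)
    committed_ids = {cid for cid, _ in repo}
    if checkpoint_id in committed_ids:
        return False  # replay detected, nothing is written
    repo.append((checkpoint_id, list(records)))
    return True


node_a, node_b = Node("a"), Node("b")
run_subtask(node_a, 0, 1, ["x", "y"])                # first attempt, on node a
same_node = run_subtask(node_a, 0, 1, ["x", "y"])    # replay on node a: skipped
other_node = run_subtask(node_b, 0, 1, ["x", "y"])   # replay on node b: committed again
print(same_node, other_node)  # -> False True
```

When the subtask is redeployed to a node that lacks the earlier history, the duplicate check has nothing to compare against, so the same records are committed a second time in a different place.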

Nevertheless, very elegant hack!

Cheers,
Max

On Sat, Apr 23, 2016 at 12:23 AM, Eron Wright <[hidden email]> wrote:
> Hello,
> On a long plane trip I had some fun with writing a Flink streaming connector based on Git.   https://github.com/EronWright/flink-git
> Not intended for real application use; flink-git is just an experiment meant for discussion.
> Flink's Kafka connector provides exactly-once guarantees when acting as a source (consumer) but not as a sink (producer), due to a limitation of Kafka.  This limitation invites the question of how to extend Kafka (or a similar system) to provide exactly-once guarantees for a sink. Since Kafka is envisioned as a commit log, may an answer be found in commit log concepts? The flink-git repository explores that possibility.
> Git provides a useful conceptual framework for the investigation, since its concepts are familiar and it is easily programmable with jgit. The flink-git repository is thus an experimental connector, based on jgit, that explores providing exactly-once guarantees as both a source and as a sink.
> Enjoy,
> Eron Wright
>