Mongo Connector

Mongo Connector

Vijay Srinivasaraghavan
Hello,
Do we know how much support we have for Mongo? The documentation page points to a connector repo that is very old (last updated 5 years ago), and it looks like that was just sample code to showcase the integration.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/connectors.html#access-mongodb
I am planning to build a pipeline that involves heavy use of Mongo (reads as well as bulk upserts). I am trying to understand whether anyone has used Mongo in a pipeline and would like to share their experience.

Are there any known limitations and gotchas?
Appreciate any inputs.
Regards,
Vijay
Re: Mongo Connector

JingsongLee-2
Hi Vijay,

I developed an append stream sink for Mongo internally, which writes data in batches according to a configurable batch size and also provides an asynchronous flush. It only does inserts, not updates or upserts. It has been working very well for a long time. (Throughput depends mainly on the Mongo server.)
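As a rough illustration, a minimal sketch of such a batching append sink might look like the following. It assumes the synchronous MongoDB Java driver and Flink's RichSinkFunction; the class and parameter names are illustrative, and the asynchronous flush mentioned above is omitted for brevity:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

// Illustrative batching append sink: buffers documents and flushes a
// batch with insertMany once the configured size is reached.
public class MongoBatchSink extends RichSinkFunction<Document> {
    private final String uri;
    private final String database;
    private final String collectionName;
    private final int batchSize;

    private transient MongoClient client;
    private transient MongoCollection<Document> collection;
    private transient List<Document> buffer;

    public MongoBatchSink(String uri, String database, String collectionName, int batchSize) {
        this.uri = uri;
        this.database = database;
        this.collectionName = collectionName;
        this.batchSize = batchSize;
    }

    @Override
    public void open(Configuration parameters) {
        client = MongoClients.create(uri);
        collection = client.getDatabase(database).getCollection(collectionName);
        buffer = new ArrayList<>(batchSize);
    }

    @Override
    public void invoke(Document value, Context context) {
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            collection.insertMany(buffer); // append-only: inserts, no upserts
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // drain any remaining buffered documents
        if (client != null) {
            client.close();
        }
    }
}

For at-least-once delivery you would additionally implement CheckpointedFunction and flush the buffer in snapshotState, so buffered documents are not lost on failure.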

I've investigated Mongo's API; you can use "UpdateOptions.upsert" to achieve upsert functionality, which should be similar to ElasticsearchUpsertTableSink.
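A rough sketch of that upsert path, assuming the driver's bulkWrite with UpdateOneModel (the key field and method name here are illustrative):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOneModel;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

// Illustrative upsert batch: each write matches on the key and inserts
// the document if no match exists (upsert = true).
public static void upsertBatch(MongoCollection<Document> collection, List<Document> docs) {
    List<WriteModel<Document>> writes = new ArrayList<>();
    UpdateOptions upsert = new UpdateOptions().upsert(true);
    for (Document doc : docs) {
        Document fields = new Document(doc);
        fields.remove("_id"); // _id is immutable; match on it instead of setting it
        writes.add(new UpdateOneModel<>(
                Filters.eq("_id", doc.get("_id")), // match on the key field
                new Document("$set", fields),      // set/overwrite the other fields
                upsert));
    }
    collection.bulkWrite(writes);
}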

Regarding reading, the main problem is how to support distributed reads with multiple parallel instances: you need to get the split keys of the Mongo shards to build filtering conditions that divide the Mongo source among the parallel readers. (I think Beam's MongoDbIO is a very good example.)
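A sketch of that splitting step, deriving non-overlapping _id range filters from MongoDB's splitVector command (the approach Beam's MongoDbIO takes); it assumes the connected user is allowed to run splitVector and a recent Java driver:

import com.mongodb.client.MongoClient;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative: turn splitVector's splitKeys into one range filter per
// parallel reader; each reader runs collection.find(filter).
public static List<Bson> buildSplitFilters(MongoClient client, String db, String coll, int maxChunkSizeMb) {
    Document result = client.getDatabase(db).runCommand(
            new Document("splitVector", db + "." + coll)
                    .append("keyPattern", new Document("_id", 1))
                    .append("maxChunkSize", maxChunkSizeMb));

    List<Document> splitKeys = result.getList("splitKeys", Document.class);
    if (splitKeys == null || splitKeys.isEmpty()) {
        return Collections.singletonList(new Document()); // one split: whole collection
    }
    List<Bson> filters = new ArrayList<>();
    Object lower = null;
    for (Document key : splitKeys) {
        Object upper = key.get("_id");
        filters.add(lower == null
                ? Filters.lt("_id", upper)
                : Filters.and(Filters.gte("_id", lower), Filters.lt("_id", upper)));
        lower = upper;
    }
    filters.add(Filters.gte("_id", lower)); // tail range above the last split key
    return filters;
}

Each parallel source instance would then read with collection.find(filters.get(subtaskIndex)).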

Best,
Jingsong Lee


Re: Mongo Connector

Vijay Srinivasaraghavan
Hi Jingsong Lee,
Thanks for the details. Were you able to achieve end-to-end exactly-once support with Mongo?
Also, for doing intermittent reads from Mongo (Kafka -> process event -> look up Mongo -> enrich event -> sink to Mongo), I am thinking of using Async I/O (https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html). In your implementation, did you have a need for this API?
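Roughly, what I have in mind is something like the sketch below: the synchronous driver run on a private thread pool inside a RichAsyncFunction, so the operator never blocks (the database/collection names and key type are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.bson.Document;

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative enrichment lookup: each event key is resolved against
// Mongo on a dedicated thread pool.
public class MongoLookupFunction extends RichAsyncFunction<String, Document> {
    private final String uri;
    private transient MongoClient client;
    private transient MongoCollection<Document> collection;
    private transient ExecutorService executor;

    public MongoLookupFunction(String uri) {
        this.uri = uri;
    }

    @Override
    public void open(Configuration parameters) {
        client = MongoClients.create(uri);
        collection = client.getDatabase("mydb").getCollection("lookup"); // placeholder names
        executor = Executors.newFixedThreadPool(10);
    }

    @Override
    public void asyncInvoke(String key, ResultFuture<Document> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> collection.find(Filters.eq("_id", key)).first(), executor)
                .thenAccept(doc -> resultFuture.complete(Collections.singletonList(doc)));
    }

    @Override
    public void close() {
        if (executor != null) executor.shutdown();
        if (client != null) client.close();
    }
}

It would be wired in with something like AsyncDataStream.unorderedWait(events, new MongoLookupFunction(uri), 1000, TimeUnit.MILLISECONDS, 100).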
Regards,
Vijay
