(DEPRECATED) Apache Flink Mailing List archive.

Best configurations for bounded and unbounded streams pipelines

Cam Mach

Hello Flink experts,

I believe the question below has already been asked, but since I couldn't find an answer on the internet, I'd love to reach out to the community for help.

We basically want to find the best configuration for Flink running on Kubernetes to achieve the best performance. Things like: which parameters should we tune, e.g. the number of TaskManagers, the number of task slots, the parallelism, and so on?
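
As a rough illustration of how these three settings relate: in Flink, the total number of task slots is the number of TaskManagers multiplied by `taskmanager.numberOfTaskSlots`, and with the default slot sharing a job needs as many slots as the highest parallelism of any of its operators. The sketch below works through that arithmetic; the helper `size_cluster` and the example numbers are hypothetical, and only the two configuration keys it emits are real Flink settings.

```python
import math


def size_cluster(target_parallelism: int, slots_per_tm: int) -> dict:
    """Hypothetical helper: how many TaskManagers a job needs.

    Assumes default slot sharing, where a job needs as many slots as the
    highest parallelism of any of its operators.
    """
    if target_parallelism < 1 or slots_per_tm < 1:
        raise ValueError("parallelism and slots per TaskManager must be positive")
    # Round up so there are always enough slots for the target parallelism.
    task_managers = math.ceil(target_parallelism / slots_per_tm)
    return {
        "task_managers": task_managers,
        "total_slots": task_managers * slots_per_tm,
        # Real Flink configuration keys; the values are illustrative only.
        "taskmanager.numberOfTaskSlots": slots_per_tm,
        "parallelism.default": target_parallelism,
    }


print(size_cluster(32, 4))
```

With a target parallelism of 32 and 4 slots per TaskManager this yields 8 TaskManagers; on Kubernetes each TaskManager typically runs in its own pod, so this is also the pod count to plan for.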

Our use case:
We have around one terabyte of data from legacy systems and want to stream it to the cloud. Our pipeline has two sources (one from Kinesis, the other from SQL), one operator (which joins the two sources by key), and a sink.
We would like to enable RocksDB and checkpointing to S3. We're also looking for the best windowing strategy to apply in this scenario.
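
For a key-based join combined with a windowing strategy, one common option is a tumbling event-time window join: two records join only if they share a key and fall into the same fixed-size window. The Python sketch below is an in-memory illustration of those semantics, not Flink API code; the function names, record layout, and one-minute window size are assumptions. With zero offset, the window start is `timestamp - timestamp % size`, which matches Flink's tumbling window assignment for non-negative timestamps.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: (key, event-time timestamp in ms, value).
Record = Tuple[str, int, Any]


def window_start(ts_ms: int, size_ms: int) -> int:
    """Start of the tumbling window containing ts_ms (zero offset, ts_ms >= 0)."""
    return ts_ms - ts_ms % size_ms


def tumbling_window_join(left: List[Record], right: List[Record], size_ms: int):
    """Join records that share a key AND fall in the same tumbling window."""
    # Bucket the right-hand side by (key, window start).
    buckets: Dict[Tuple[str, int], List[Any]] = defaultdict(list)
    for key, ts, value in right:
        buckets[(key, window_start(ts, size_ms))].append(value)
    # Probe with each left-hand record; only same-key, same-window pairs match.
    joined = []
    for key, ts, value in left:
        start = window_start(ts, size_ms)
        for other in buckets.get((key, start), []):
            joined.append((key, start, value, other))
    return joined


left = [("user-1", 1_000, "click"), ("user-1", 65_000, "click2")]
right = [("user-1", 30_000, "Ann")]
print(tumbling_window_join(left, right, 60_000))
```

Here the second click is not joined because it lands in the next one-minute window. That is the trade-off of a windowed join: per-key state can be discarded once a window closes, but matches that straddle a window boundary are lost.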

Assume resources are not a constraint (since we can easily scale out on Kubernetes in AWS).

We'd appreciate any help or pointers.

Thanks,