|
Hello Flink community,
I believe the question below has already been asked, but since I couldn't find an answer online, I'd like to reach out to the community for help.
We basically want to find the best configuration for Flink running on Kubernetes to achieve the best performance: things like which parameters to tune, e.g. the number of TaskManagers, the number of task slots per TaskManager, parallelism, and so on.
Our use case:
We have terabytes of data in legacy systems and want to stream it to the cloud. Our pipeline is a streaming one with two sources (one from Kinesis, the other from SQL), one operator (which joins the two sources by key), and a sink.
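To make the shape of the join concrete, here is a minimal plain-Python sketch of what we mean by joining by key. It is not Flink code, and the names (keyed_join, the "kinesis"/"sql" side labels) are made up for illustration. Each record is buffered under its key, and a joined record is emitted whenever the other side already holds records with the same key.

```python
from collections import defaultdict


def keyed_join(events):
    """Join two interleaved streams by key.

    events: iterable of (side, key, value), where side is "kinesis" or "sql".
    Yields (key, kinesis_value, sql_value) for every matching pair.
    """
    # Per-side buffers of values seen so far, grouped by key (hypothetical layout).
    state = {"kinesis": defaultdict(list), "sql": defaultdict(list)}
    for side, key, value in events:
        other = "sql" if side == "kinesis" else "kinesis"
        state[side][key].append(value)
        # Pair the new record with everything already buffered on the other side.
        for match in state[other].get(key, ()):
            if side == "kinesis":
                yield (key, value, match)
            else:
                yield (key, match, value)


events = [
    ("kinesis", "a", 1),
    ("sql", "b", 20),
    ("sql", "a", 10),
    ("kinesis", "b", 2),
]
joined = list(keyed_join(events))
print(joined)
```

Note that the buffers in this sketch grow without bound, since nothing is ever evicted. In the real pipeline that buffered data would be the keyed state we need to keep under control.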
We would like to enable RocksDB as the state backend, with checkpointing to S3. We're also looking for the best windowing strategy for this scenario.
We would love to achieve at least 100 GB/s, assuming resources are not a constraint (since we're running Flink on AWS's Kubernetes).
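For a rough sense of scale, here is a back-of-the-envelope sizing sketch. The per-slot throughput (50 MB/s) and the slots per TaskManager (4) are pure assumptions for illustration, not measured numbers, and size_cluster is a hypothetical helper.

```python
import math


def size_cluster(target_bytes_per_s, per_slot_bytes_per_s, slots_per_tm):
    """Return (parallelism, task_managers) needed to reach a target throughput."""
    # Each parallel subtask occupies one slot, so parallelism = slots required.
    parallelism = math.ceil(target_bytes_per_s / per_slot_bytes_per_s)
    task_managers = math.ceil(parallelism / slots_per_tm)
    return parallelism, task_managers


GB = 10**9
MB = 10**6

# Assumed values: 50 MB/s sustained per slot, 4 slots per TaskManager.
parallelism, task_managers = size_cluster(100 * GB, 50 * MB, 4)
print(parallelism, task_managers)  # 2000 500
```

Under those assumptions the target works out to a parallelism of 2000 spread over 500 TaskManagers, so the real per-slot throughput of the join is the number we most need to measure.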
We'd appreciate any help or pointers.
Thanks,
Cam Mach
|