(DEPRECATED) Apache Flink Mailing List archive.

Re: Batch Flink Job S3 write performance vs Spark


Arvid Heise-3

Re: Batch Flink Job S3 write performance vs Spark

Fair benchmarks are notoriously difficult to set up.

Usually, it's easy to find a workload where one system shines, and as its
vendor, you report that. Then the competitor benchmarks a different use
case where their system outperforms ours. In the end, customers are more
confused than before.

You should do your own benchmarks for your own workloads. That is the only
reliable way.

In the end, both systems use similar setups, and improvements in one system
are often incorporated into the other with some delay, so there should be
no ground-breaking differences between two systems that both run on Java
and use the same set of libraries.
Of course, if one system has a very specific optimization for your use
case, it could be much faster.
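Following the advice to benchmark your own workloads, a minimal timing harness could look like the sketch below. It is not taken from Flink or Spark: the class, interface, and method names are hypothetical, and the local temp-file workload is only a stand-in. For an actual comparison you would replace it with code that launches the Flink batch job and the Spark job writing the same data to the same S3 bucket.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WriteBenchmark {

    /** Hypothetical workload under test; in a real comparison this would run the Flink or Spark job. */
    @FunctionalInterface
    public interface Workload {
        /** Runs one iteration and returns the number of bytes written. */
        long run() throws IOException;
    }

    /**
     * Runs the workload warmupRuns times (results discarded), then measuredRuns
     * times, and returns the median throughput in MB/s over the measured runs.
     */
    public static double medianThroughputMBps(Workload workload, int warmupRuns, int measuredRuns)
            throws IOException {
        if (measuredRuns < 1) {
            throw new IllegalArgumentException("measuredRuns must be >= 1");
        }
        for (int i = 0; i < warmupRuns; i++) {
            workload.run(); // warm-up: let JIT compilation and caches settle
        }
        double[] mbps = new double[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            long bytes = workload.run();
            // Guard against a zero duration on very fast workloads.
            long elapsedNanos = Math.max(1L, System.nanoTime() - start);
            mbps[i] = (bytes / 1_000_000.0) / (elapsedNanos / 1e9);
        }
        Arrays.sort(mbps);
        int mid = measuredRuns / 2;
        return measuredRuns % 2 == 1 ? mbps[mid] : (mbps[mid - 1] + mbps[mid]) / 2.0;
    }

    /** Hypothetical stand-in workload: writes `size` bytes to a local temp file (not S3). */
    public static Workload localFileWrite(int size) {
        byte[] payload = new byte[size];
        return () -> {
            Path tmp = Files.createTempFile("bench", ".bin");
            try {
                Files.write(tmp, payload);
                return Files.size(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        double mbps = medianThroughputMBps(localFileWrite(8 * 1024 * 1024), 3, 5);
        System.out.printf("median write throughput: %.1f MB/s%n", mbps);
    }
}
```

Warm-up runs are discarded so that JIT compilation and cache effects do not skew the first measurements, and the median is reported because it is less sensitive to a single outlier run than the mean.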

On Mon, Feb 24, 2020 at 11:26 PM sri hari kali charan Tummala <
[hidden email]> wrote:

> Hi All,
>
> I have a question: has anyone compared the performance of a Flink batch job
> writing to S3 vs. Spark writing to S3?
>
> --
> Thanks & Regards
> Sri Tummala
>
>

Arvid Heise-3

Re: Batch Flink Job S3 write performance vs Spark

Exactly. We use hadoop-fs as an indirection layer on top of that, but Spark
probably does the same.

On Wed, Feb 26, 2020 at 3:52 PM sri hari kali charan Tummala <
[hidden email]> wrote:

> Thank you. Regarding "the two systems running on Java and using the same
> set of libraries": so, from my understanding, Flink uses the AWS SDK behind
> the scenes, the same as Spark.
>
>
> --
> Thanks & Regards
> Sri Tummala
>
>