Performance and Latency Chart for Flink

Chawla,Sumit
Hi

Is there any performance run that is done for each Flink release? Or are
you aware of any third-party evaluation of performance metrics for Flink?
I am interested in seeing how performance has improved from release to
release, and how it compares with other competitors.

Regards
Sumit Chawla

Re: Performance and Latency Chart for Flink

Fabian Hueske-2
Hi,

I am not aware of periodic performance runs for the Flink releases.
I know a few benchmarks which have been published at different points in
time like [1], [2], and [3] (you'll probably find more).

In general, fair benchmarks that compare different systems (if there is
such a thing) are very difficult, and the results often depend on the use case.
IMO the best option is to run your own benchmarks if you have a concrete
use case.

Best, Fabian

[1] 08/2015:
http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] 12/2015:
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
[3] 02/2016:
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/



Re: Performance and Latency Chart for Flink

amir bahmanyari
FYI, we, at a well-known IT department, have been actively measuring Beam Flink Runner performance using MIT's Linear Road benchmark to stress the Flink cluster servers. The results thus far do not even come close to those of the streaming engines we benchmarked previously. Our optimistic assumption when we started was that Beam runners (Flink, for instance) would leave Storm & IBM in smoke. Wrong. What IBM managed to perform is 150 times better than Flink, not to mention Storm and Hortonworks. As an example, IBM handled 150 expressways in 3.5 hours. In the identical topology, with everything else fixed, Beam Flink Runner in a Flink cluster handled 10 expressways in 17 hours at its best so far.

I have followed every single performance tuning recommendation that is out there, and none improved it even a bit. It works fine with 1 expressway. Sorry, but those are our findings so far, unless we are doing something wrong. I posted all the details to this forum but never got a solid response that made a difference in our observations. Therefore, we assume what we are seeing is the reality, which we have to report to our superiors. Please prove us wrong. We still have some time.

Thanks.
Amir-
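
For reference, the two runs quoted above can be compared as plain throughput. This is only a back-of-the-envelope sketch in Python: it assumes that "expressways per hour" is a fair way to compare the runs, which is an assumption made here rather than the Linear Road rating itself, and the helper name is made up for illustration.

```python
def expressways_per_hour(expressways: int, hours: float) -> float:
    """Hypothetical helper: average expressways handled per hour of run time."""
    return expressways / hours


# Figures quoted in the message above.
ibm_rate = expressways_per_hour(150, 3.5)
flink_rate = expressways_per_hour(10, 17.0)

# How many times more expressways per hour the IBM run handled.
ratio = ibm_rate / flink_rate

print(f"IBM:   {ibm_rate:.2f} expressways/hour")
print(f"Flink: {flink_rate:.2f} expressways/hour")
print(f"ratio: {ratio:.1f}x")
```

Under that assumption the gap works out to roughly 73x in expressways per hour.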


Re: Performance and Latency Chart for Flink

Timo Walther-2
Hi Amir,

it would be great if you could link to the details of your benchmark
environment if you make such claims. Compared to which IBM system?
Characteristics of your machines? Configuration of the software?
Implementation code? etc.

In general the Beam Runner also adds some overhead compared to native
Flink jobs.  There are many factors that could affect results. I don't
know the Linear Road Benchmark but 150 times sounds unrealistic.

Timo




--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr


Re: Performance and Latency Chart for Flink

amir bahmanyari
In reply to this post by Fabian Hueske-2
Hi Fabian,

FYI, these are reports on other engines for which the same type of benchmarking was done. They also explain what the Linear Road benchmark is. Thanks for your help.
http://www.slideshare.net/RedisLabs/walmart-ibm-revisit-the-linear-road-benchmark
https://github.com/IBMStreams/benchmarks 
https://www.datatorrent.com/blog/blog-implementing-linear-road-benchmark-in-apex/



Re: Performance and Latency Chart for Flink

Chawla,Sumit
Hi Amir

Would it be possible for you to share the numbers? Also share if possible
your configuration details.

Regards
Sumit Chawla



Re: Performance and Latency Chart for Flink

Chawla,Sumit
Has anyone else run these kinds of benchmarks? I would love to hear more
people's experiences and details about those benchmarks.

Regards
Sumit Chawla



Re: Performance and Latency Chart for Flink

amir bahmanyari
I have new findings and, subsequently, relative improvements. I am testing as we speak: 4 Beam server nodes (Azure A11) and 2 Kafka nodes with the same config.

I had to keep state somewhere, so I went with Redis. I found it to be a major bottleneck, as the Beam nodes were constantly going across the network to update its repository. So I replaced Redis with Java ConcurrentHashMaps. Much faster.

Then Kafka ran out of disk space and the replication manager complained, so I clustered the two Kafka nodes, hoping they would share space. As of the second I am typing this email, it is sustaining, but only 1/2 of the 201401969 tuples have been processed after 3.5 hours. According to the Linear Road benchmarking expectations, if your system is working well, all 201401969 tuples must be done in 3.5 hours max. So this means there is still room for tuning the Flink nodes. I have already shared more details about my config with you all.

It ran perfectly yesterday with almost 1/10th of this load: perfect real-time send/process streaming behavior.

If that's the case and I cannot get better performance with FlinkRunner, my next stop is SparkRunner, repeating the whole thing for a final benchmark of the two under the Beam APIs, which was the initial intent anyway.

If you have suggestions for improvements in the above case, I am all ears and would greatly appreciate them.

Cheers,
Amir-
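
To put the numbers above in perspective, the sustained rate the 3.5-hour limit implies can be worked out directly. The sketch below uses only the tuple count and time limit quoted in the message. It assumes a constant processing rate over the whole run, which is a simplification, and the variable names are illustrative.

```python
TOTAL_TUPLES = 201_401_969   # tuple count quoted in the message above
TIME_LIMIT_HOURS = 3.5       # time limit quoted in the message above

limit_seconds = TIME_LIMIT_HOURS * 3600

# Rate needed to finish every tuple within the limit.
required_rate = TOTAL_TUPLES / limit_seconds

# Rate actually observed: half of the tuples after the full 3.5 hours.
observed_rate = (TOTAL_TUPLES / 2) / limit_seconds

# Time the full input would take at the observed rate.
projected_hours = TOTAL_TUPLES / observed_rate / 3600

print(f"required: {required_rate:,.0f} tuples/s")
print(f"observed: {observed_rate:,.0f} tuples/s")
print(f"projected run time: {projected_hours:.1f} hours")
```

In other words, the cluster would need to sustain roughly 16,000 tuples per second, about twice what this run achieved.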


Re: Performance and Latency Chart for Flink

Greg Hogan
Hi Amir,

You may see improved performance setting "taskmanager.memory.preallocate:
true" in order to use off-heap memory.

Also, your number of buffers looks quite low and you may want to increase
"taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128 MiB.

As this is only a benchmark, are you able to post the code to GitHub to
solicit feedback?

Greg
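
The 128 MiB figure follows from multiplying the buffer count by the size of one buffer. The sketch below assumes 32 KiB per network buffer, which is the size the 128 MiB figure implies. The function name is made up, and the larger buffer counts are illustrative values only, not recommendations.

```python
SEGMENT_SIZE_BYTES = 32 * 1024  # assumed size of one network buffer (32 KiB)


def network_buffer_mib(num_buffers: int, segment_size: int = SEGMENT_SIZE_BYTES) -> float:
    """Hypothetical helper: memory taken by the network buffers, in MiB."""
    return num_buffers * segment_size / (1024 * 1024)


# 4096 is the setting discussed above; the other values are only examples.
for num_buffers in (4096, 8192, 16384):
    print(f"{num_buffers:>6} buffers -> {network_buffer_mib(num_buffers):.0f} MiB")
```

Doubling the buffer count doubles the memory set aside for the network stack, so any increase has to fit within the memory available on each TaskManager.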

Re: Performance and Latency Chart for Flink

amir bahmanyari
Hi Greg,

I used this guideline to calculate "taskmanager.network.numberOfBuffers": Apache Flink 1.2-SNAPSHOT Documentation: Configuration

4096 = (16x16)x4x4, where 16 is the number of tasks per TM, 4 is the number of TMs, and the last 4 is the constant in the formula. What would you set it to? Once I have that number, I will set "taskmanager.memory.preallocate" to true and give it another shot.

Thanks, Greg
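
The calculation described above can be written out as a small function. This is a sketch of the formula exactly as stated in the message (tasks per TM squared, times the number of TMs, times 4). The function and parameter names are illustrative and not part of any Flink API.

```python
def suggested_network_buffers(slots_per_tm: int, num_tms: int, factor: int = 4) -> int:
    """Hypothetical helper: slots-per-TM squared, times the number of TMs, times a constant."""
    return slots_per_tm ** 2 * num_tms * factor


# Values from the message above: 16 tasks per TM and 4 TMs.
buffers = suggested_network_buffers(slots_per_tm=16, num_tms=4)
print(buffers)
```

Because the slot count is squared, it dominates the result: halving the slots per TM to 8 would cut the suggested value to a quarter.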

Re: Performance and Latency Chart for Flink

amir bahmanyari
Hi Greg,

In the same Flink configuration documentation, there are parameters that don't even exist in flink-conf.yaml. Are they defined somewhere else? I grepped for the following and none exist in any of the files under the conf folder: "taskmanager.memory.fraction", "taskmanager.memory.off-heap", "taskmanager.memory.segment-size", and many more.

Also, isn't the example that calculates the network buffers wrong? Based on the example, roughly 5000 buffers x 32 KiB = 160000 KiB should be allocated, and 160000 KiB divided by 1024 = 156.25 MiB. Why does the example say "the system would allocate roughly 300 MiBytes for network buffers"? That's roughly twice as much. Am I missing something here?

I still need your help to set an accurate number for my
   - taskmanager.network.numberOfBuffers = 4096.
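
The arithmetic in the question can be checked mechanically. The sketch below only reproduces the numbers quoted above (5000 buffers, 32 KiB each, and the documented 300 MiB). It does not explain where the documentation's figure comes from, and the variable names are illustrative.

```python
BUFFERS = 5000          # buffer count from the documentation example quoted above
BUFFER_KIB = 32         # buffer size used in the question above
DOCUMENTED_MIB = 300    # figure the documentation example states

total_kib = BUFFERS * BUFFER_KIB
total_mib = total_kib / 1024

# How far the documented figure is from the computed one.
discrepancy = DOCUMENTED_MIB / total_mib

# Buffer size that would be needed for 5000 buffers to add up to 300 MiB.
implied_buffer_kib = DOCUMENTED_MIB * 1024 / BUFFERS

print(f"computed: {total_kib} KiB = {total_mib} MiB")
print(f"documented / computed: {discrepancy:.2f}")
print(f"buffer size implied by 300 MiB: {implied_buffer_kib:.2f} KiB")
```

The computed total matches the 156.25 MiB in the question, and the documented figure is about 1.9 times larger, which would correspond to buffers of roughly 61 KiB rather than 32 KiB.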

Thanks for your response Greg.Amir-      From: amir bahmanyari <[hidden email]>
 To: "[hidden email]" <[hidden email]>
 Sent: Monday, September 19, 2016 10:34 AM
 Subject: Re: Performance and Latency Chart for Flink
   
Hi Greg,I used this guideline to calculate "taskmanager.network.numberOfBuffers":Apache Flink 1.2-SNAPSHOT Documentation: Configuration

 
|  
|  
|  
|   |    |

  |

  |
|  
|   |  
Apache Flink 1.2-SNAPSHOT Documentation: Configuration
   |   |

  |

  |

 

4096 = (16x16)x4x4 where 16 is number of tasks per TM, 4 is # of TMs & 4 is there in the formula.What would you set it to? Once I have that number, I will set  "taskmanager.memory.preallocate" to true & will give it another shot.Thanks Greg

      From: Greg Hogan <[hidden email]>
 To: [hidden email]; amir bahmanyari <[hidden email]>
 Sent: Monday, September 19, 2016 8:29 AM
 Subject: Re: Performance and Latency Chart for Flink
 
Hi Amir,

You may see improved performance setting "taskmanager.memory.preallocate:
true" in order to use off-heap memory.

Also, your number of buffers looks quite low and you may want to increase
"taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128 MiB.

As this is a only benchmark are you able to post the code to github to
solicit feedback?

Greg

On Sun, Sep 18, 2016 at 9:00 PM, amir bahmanyari <
[hidden email]> wrote:

> I have new findings & subsequently relative improvements.Am testing as we
> speak. 4 Beam server nodes , Azure A11 & 2 Kafka nodes same config.I had
> keep state somewhere. I went with Redis. I found it to be a major bottle
> neck as Beam nodes constantly are going across NW to update its
> repository.So I replaced Redis with Java Concurrenthashmaps. Must faster.
> Then Kafka went out of disk space and the replication manager
> complained. So I clustered the two Kafka nodes hoping for sharing space. As
> of this second I am typing this email, its sustaining but only 1/2 of
> the 201401969  tuples have been processed after 3.5 hours.According to the
> Linear Road benchmarking expectations, if your system is working well, this
> whole 201401969  tuples must be done in 3.5 hrs max.So this means there is
> still room for tuning Flink nodes. I have already shared with you all more
> details about my config.It run perfect yesterday with almost 1/10th of this
> load. Perfect real-time send/processed streaming behavior.If thats the case
> & I cannot get better performance with FlinkRunner, my nest stop is
> SparkRunner and repeat of the whole thing for final benchmarking of the two
> under Beam APIs.Which was the initial intent anyways.If you have
> suggestions to make improvements in the above case, I am all ears & greatly
> appreciate it.Cheers,Amir-
>
>      From: "Chawla,Sumit" <[hidden email]>
>  To: [hidden email]; amir bahmanyari <[hidden email]>
>  Sent: Sunday, September 18, 2016 2:07 PM
>  Subject: Re: Performance and Latency Chart for Flink
>
> Has anyone else run these kind of benchmarks?  Would love to hear more
> people'e experience and details about those benchmarks.
>
> Regards
> Sumit Chawla
>
>
> On Sun, Sep 18, 2016 at 2:01 PM, Chawla,Sumit <[hidden email]>
> wrote:
>
> > Hi Amir
> >
> > Would it be possible for you to share the numbers? Also share if possible
> > your configuration details.
> >
> > Regards
> > Sumit Chawla
> >
> >
> > On Fri, Sep 16, 2016 at 12:18 PM, amir bahmanyari <
> > [hidden email]> wrote:
> >
> >> Hi Fabian, FYI: this is a report on other engines for which we did the same
> >> type of benchmarking. It also explains what Linear Road benchmarking is.
> >> Thanks for your help.
> >> http://www.slideshare.net/RedisLabs/walmart-ibm-revisit-the-linear-road-benchmark
> >> https://github.com/IBMStreams/benchmarks
> >> https://www.datatorrent.com/blog/blog-implementing-linear-road-benchmark-in-apex/
> >
> >
>
>
>
>


   

   

Re: Performance and Latency Chart for Flink

Greg Hogan
In reply to this post by amir bahmanyari
The nightly snapshots now include "[FLINK-4389] Expose metrics to
WebFrontend":
  https://flink.apache.org/contribute-code.html#snapshots-nightly-builds

For 1.2 we have metrics for "AvailableMemorySegments" and
"TotalMemorySegments":

https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#list-of-all-variables

However, when I download the snapshot and start a cluster with the default
configuration I am not seeing a value for this metric in the web UI.

An alternative is to configure the JMX reporter in flink-conf.yaml:

metrics.reporters: jmx_reporter
metrics.reporter.jmx_reporter.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx_reporter.port: 9020

You can then monitor the system for the number of used memory segments. Let
us know what you discover!
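
As a rough illustration of the monitoring step above, the number of used
segments can be derived from the two metrics named in this message. This is
only a minimal sketch: the function name is hypothetical, the sample readings
are made up, and the 32 KiB segment size is an assumption taken from the
buffer size discussed elsewhere in this thread.

```python
SEGMENT_SIZE_BYTES = 32 * 1024  # assumed segment size (32 KiB)


def used_memory_segments(total_segments, available_segments):
    """Return how many memory segments are currently in use."""
    if available_segments > total_segments:
        raise ValueError("available segments cannot exceed total segments")
    return total_segments - available_segments


# Made-up sample readings of TotalMemorySegments / AvailableMemorySegments.
used = used_memory_segments(total_segments=4096, available_segments=1024)
used_mib = used * SEGMENT_SIZE_BYTES / (1024 * 1024)
print(used, used_mib)
```

Polling the two values over the course of a run and tracking the difference
should show whether the job ever comes close to exhausting the segments.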

On Mon, Sep 19, 2016 at 1:34 PM, amir bahmanyari <
[hidden email]> wrote:

> Hi Greg, I used this guideline to calculate
> "taskmanager.network.numberOfBuffers": Apache Flink 1.2-SNAPSHOT
> Documentation: Configuration
>
> 4096 = (16x16)x4x4, where 16 is the number of tasks per TM, 4 is the # of TMs,
> and the other 4 is the factor in the formula. What would you set it to? Once
> I have that number, I will set "taskmanager.memory.preallocate" to true and
> give it another shot. Thanks, Greg
>
>       From: Greg Hogan <[hidden email]>
>  To: [hidden email]; amir bahmanyari <[hidden email]>
>  Sent: Monday, September 19, 2016 8:29 AM
>  Subject: Re: Performance and Latency Chart for Flink
>
> Hi Amir,
>
> You may see improved performance setting "taskmanager.memory.preallocate:
> true" in order to use off-heap memory.
>
> Also, your number of buffers looks quite low and you may want to increase
> "taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128
> MiB.
>
> As this is only a benchmark, are you able to post the code to GitHub to
> solicit feedback?
>
> Greg
>
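
The sizing rule quoted above can be sketched as follows, assuming it reads as
(slots per TaskManager)^2 x (number of TaskManagers) x 4, which is how the
"4096 = (16x16)x4x4" in the quoted message breaks down. The function names are
hypothetical, and the 32 KiB buffer size is an assumption inferred from
"4096 is only 128 MiB".

```python
BUFFER_SIZE_KIB = 32  # assumed: 4096 buffers == 128 MiB implies 32 KiB each


def number_of_buffers(slots_per_tm, num_tms, factor=4):
    """Buffer count from the rule slots^2 * TMs * factor."""
    return slots_per_tm ** 2 * num_tms * factor


def buffers_to_mib(num_buffers, buffer_size_kib=BUFFER_SIZE_KIB):
    """Total memory taken by the given number of buffers, in MiB."""
    return num_buffers * buffer_size_kib / 1024


buffers = number_of_buffers(slots_per_tm=16, num_tms=4)
mib = buffers_to_mib(buffers)
print(buffers, mib)
```

With 16 slots per TaskManager and 4 TaskManagers this reproduces both numbers
from the thread: 4096 buffers, or 128 MiB.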

Re: Performance and Latency Chart for Flink

Chesnay Schepler-3
It is normal that you don't see it in the WebInterface.

FLINK-4389 was only about exposing metrics *to* the WebInterface, not
exposing them *from* it.

Essentially, a metric travels from TaskManager -> WebInterface -> User.
FLINK-4389 was about the first arrow, which is a prerequisite step for
the second one.

Regards,
Chesnay



Re: Performance and Latency Chart for Flink

Greg Hogan
In reply to this post by amir bahmanyari
You will need to add the configuration parameters to your flink-conf.yaml.
I believe the intent is that all configuration parameters should be listed
at

https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#full-reference

My understanding is that the Flink buffers are currently copied to Netty
buffers, although I don't understand the stated memory doubling.


On Mon, Sep 19, 2016 at 3:08 PM, amir bahmanyari <
[hidden email]> wrote:

> Hi Greg, on the same Flink configuration page (Apache Flink 1.2-SNAPSHOT
> Documentation: Configuration), there are parameters that don't even exist
> in flink-conf.yaml. Are they defined somewhere else? I grepped for the
> following and none existed in any of the files under the conf folder:
> "taskmanager.memory.fraction", "taskmanager.memory.off-heap",
> "taskmanager.memory.segment-size", and many more.
> Also, isn't the example calculating the network buffers wrong? Based on the
> example, roughly 5000 buffers x 32 KiB = 160000 KiB should be
> allocated. 160000 KiB divided by 1024 = 156.25 MiB. Why does the example
> say "the system would allocate roughly 300 MiBytes for network buffers"?
> That's roughly twice as much. Am I missing something here? I still need your
> help to set an accurate number for my
>    - taskmanager.network.numberOfBuffers = 4096.
>
> Thanks for your response, Greg.
> Amir
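
The arithmetic in the quoted question can be checked with a few lines. This is
only a sketch of the numbers stated in the thread (5000 buffers, 32 KiB each,
"roughly 300 MiBytes"); the helper name is hypothetical and nothing here
explains where the documented figure comes from.

```python
def buffers_to_mib(num_buffers, buffer_size_kib=32):
    """Total size of the buffers in MiB (32 KiB per buffer assumed)."""
    return num_buffers * buffer_size_kib / 1024


example_mib = buffers_to_mib(5000)  # the documentation example's buffer count
ratio = 300 / example_mib           # documented figure vs. computed figure
print(example_mib, round(ratio, 2))
```

The computed figure is 156.25 MiB, so the documented 300 MiB is a little under
twice that, which matches the discrepancy raised above.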

Re: Performance and Latency Chart for Flink

Greg Hogan
In reply to this post by Chesnay Schepler-3
Excellent!


Re: Performance and Latency Chart for Flink

amir bahmanyari
In reply to this post by Greg Hogan
Thanks Greg. "Your setting of 4096 is only 128 MiB." Correct, because I followed that formula :-))) I can bump it up to twice as much, as the example does, to for instance 300 MiB. Is this reasonable? What do you suggest as a reasonable range? Thanks, Greg
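
For reference, a memory budget can be turned back into a buffer count with a
small helper. This is a sketch under the assumption of 32 KiB per buffer (the
size implied by "4096 is only 128 MiB"); the function name is hypothetical and
the result is arithmetic only, not a tuning recommendation.

```python
import math


def buffers_for_budget(target_mib, buffer_size_kib=32):
    """Smallest buffer count whose total size reaches target_mib."""
    return math.ceil(target_mib * 1024 / buffer_size_kib)


current = buffers_for_budget(128)  # the existing setting
doubled = buffers_for_budget(300)  # the "for instance 300 MiB" case
print(f"taskmanager.network.numberOfBuffers: {doubled}")
```

Under that assumption, 300 MiB corresponds to 9600 buffers, compared with 4096
for the current 128 MiB.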

      From: Greg Hogan <[hidden email]>
 To: [hidden email]; amir bahmanyari <[hidden email]>
 Sent: Monday, September 19, 2016 12:43 PM
 Subject: Re: Performance and Latency Chart for Flink
   
You will need to add the configuration parameters to your flink-conf.yaml.
I believe the intent is that all configuration parameters should be listed
at

https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#full-reference

My understanding is that the Flink buffers are currently copied to Netty
buffers, although I don't understand the stated memory doubling.


On Mon, Sep 19, 2016 at 3:08 PM, amir bahmanyari <
[hidden email]> wrote:

> Hi Greg,In the same Flink config link below, there are parameters that
> dont even exist in flink-conf.yaml.Are they defined somewhere else?I
> grepped the followings & none existed in any of the files under conf
> folder."taskmanager.memory.fraction", taskmanager.memory.off
> -heap, taskmanager.memory.segment-size & many more.
> Also, isnt the example calculating the network buffers wrong? Based on the
> example, roughly 5000 buffers x 32KiB = 160000 KiB should be
> allocated.160000 KiB divided by 1024 = 156.25 MiB. Why is the example
> saying "the system would allocate roughly 300 MiBytes for network buffers."
> ?Thats roughly twice as much. Am i Missing something here?I still need your
> help to set the accurate number for my
>    - taskmanager.network.numberOfBuffers = 4096.
>
> Thanks for your response Greg.Amir-      From: amir bahmanyari <
> [hidden email]>
>  To: "[hidden email]" <[hidden email]>
>  Sent: Monday, September 19, 2016 10:34 AM
>  Subject: Re: Performance and Latency Chart for Flink
>
> Hi Greg,
> I used this guideline to calculate "taskmanager.network.numberOfBuffers":
> Apache Flink 1.2-SNAPSHOT Documentation: Configuration
>
> 4096 = (16x16)x4x4, where 16 is the number of tasks per TM, the first 4 is
> the number of TMs, and the second 4 is the constant in the formula. What
> would you set it to? Once I have that number, I will set
> "taskmanager.memory.preallocate" to true and give it another shot.
> Thanks Greg
>
>      From: Greg Hogan <[hidden email]>
>  To: [hidden email]; amir bahmanyari <[hidden email]>
>  Sent: Monday, September 19, 2016 8:29 AM
>  Subject: Re: Performance and Latency Chart for Flink
>
> Hi Amir,
>
> You may see improved performance setting "taskmanager.memory.preallocate:
> true" in order to use off-heap memory.
>
> Also, your number of buffers looks quite low and you may want to increase
> "taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128
> MiB.
>
> As this is only a benchmark, are you able to post the code to GitHub to
> solicit feedback?
>
> Greg
>
> On Sun, Sep 18, 2016 at 9:00 PM, amir bahmanyari <
> [hidden email]> wrote:
>
> > I have new findings and, subsequently, relative improvements. I am testing
> > as we speak: 4 Beam server nodes (Azure A11) and 2 Kafka nodes with the
> > same config. I had to keep state somewhere, so I went with Redis. I found
> > it to be a major bottleneck, as the Beam nodes constantly go across the
> > network to update its repository. So I replaced Redis with Java
> > ConcurrentHashMaps. Much faster.
> > Then Kafka ran out of disk space and the replication manager complained,
> > so I clustered the two Kafka nodes hoping they would share space. As of
> > this second, as I type this email, it is sustaining, but only 1/2 of the
> > 201401969 tuples have been processed after 3.5 hours. According to the
> > Linear Road benchmarking expectations, if your system is working well, all
> > 201401969 tuples must be done in 3.5 hours max. So this means there is
> > still room for tuning the Flink nodes. I have already shared more details
> > about my config with you all.
> > It ran perfectly yesterday with almost 1/10th of this load: perfect
> > real-time send/processed streaming behavior. If that's the case and I
> > cannot get better performance with FlinkRunner, my next stop is
> > SparkRunner, repeating the whole thing for a final benchmark of the two
> > under Beam APIs, which was the initial intent anyway. If you have
> > suggestions for improvements in the above case, I am all ears and would
> > greatly appreciate them.
> > Cheers,
> > Amir
> >
> >      From: "Chawla,Sumit" <[hidden email]>
> >  To: [hidden email]; amir bahmanyari <[hidden email]>
> >  Sent: Sunday, September 18, 2016 2:07 PM
> >  Subject: Re: Performance and Latency Chart for Flink
> >
> > Has anyone else run this kind of benchmark?  Would love to hear more
> > people's experiences and details about those benchmarks.
> >
> > Regards
> > Sumit Chawla
> >
> >
> > On Sun, Sep 18, 2016 at 2:01 PM, Chawla,Sumit <[hidden email]>
> > wrote:
> >
> > > Hi Amir
> > >
> > > Would it be possible for you to share the numbers? Also share if
> > > possible your configuration details.
> > >
> > > Regards
> > > Sumit Chawla
> > >
> > >
> > > On Fri, Sep 16, 2016 at 12:18 PM, amir bahmanyari <
> > > [hidden email]> wrote:
> > >
> > >> Hi Fabian,
> > >> FYI, these are reports on other engines for which we did the same type
> > >> of benchmarking. They also explain what Linear Road benchmarking is.
> > >> Thanks for your help.
> > >> http://www.slideshare.net/RedisLabs/walmart-ibm-revisit-the-linear-road-benchmark
> > >> https://github.com/IBMStreams/benchmarks
> > >> https://www.datatorrent.com/blog/blog-implementing-linear-road-benchmark-in-apex/
> > >>
> > >>
> > >>      From: Fabian Hueske <[hidden email]>
> > >>  To: "[hidden email]" <[hidden email]>
> > >>  Sent: Friday, September 16, 2016 12:31 AM
> > >>  Subject: Re: Performance and Latency Chart for Flink
> > >>
> > >> Hi,
> > >>
> > >> I am not aware of periodic performance runs for the Flink releases.
> > >> I know a few benchmarks which have been published at different points in
> > >> time like [1], [2], and [3] (you'll probably find more).
> > >>
> > >> In general, fair benchmarks that compare different systems (if there is
> > >> such thing) are very difficult and the results often depend on the use
> > >> case.
> > >> IMO the best option is to run your own benchmarks, if you have a concrete
> > >> use case.
> > >>
> > >> Best, Fabian
> > >>
> > >> [1] 08/2015:
> > >> http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
> > >> [2] 12/2015:
> > >> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
> > >> [3] 02/2016:
> > >> http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
> > >>
> > >>
> > >> 2016-09-16 5:54 GMT+02:00 Chawla,Sumit <[hidden email]>:
> > >>
> > >> > Hi
> > >> >
> > >> > Is there any performance run that is done for each Flink release? Or you
> > >> > are aware of any third party evaluation of performance metrics for Flink?
> > >> > I am interested in seeing how performance has improved over release to
> > >> > release, and performance vs other competitors.
> > >> >
> > >> > Regards
> > >> > Sumit Chawla
> > >> >
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> >
> >
> >
> >
>
>
>
>
>
>


   

Re: Performance and Latency Chart for Flink

Greg Hogan
My thought would be to compare the data rate with the buffer size, which gives
a refresh interval. For example, if you are transmitting 1 GB/s on 128 MiB of
network buffers then the refresh interval is at most 1/8 second. The same
consideration applies to spill files if the system does not have sufficient
free memory for a large number of readahead buffers. Another set of buffers
is the kernel socket buffers, which you can increase from the Linux default of
4 MiB by changing "taskmanager.net.sendReceiveBufferSize" (documentation is
in progress; see org.apache.flink.runtime.io.network.netty.NettyConfig).
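
As a rough sketch of the refresh-interval estimate described above (buffer size divided by data rate), assuming binary units throughout so that 1 GB/s is treated as 1 GiB/s; the function name is hypothetical:

```python
# Refresh interval = total network buffer memory / data rate.
# Assumption: binary units (MiB, GiB) are used for both quantities.

MIB = 2 ** 20
GIB = 2 ** 30


def refresh_interval_seconds(buffer_bytes: float, bytes_per_second: float) -> float:
    """Time needed to cycle through all network buffers once at the given rate."""
    if bytes_per_second <= 0:
        raise ValueError("data rate must be positive")
    return buffer_bytes / bytes_per_second


if __name__ == "__main__":
    print(refresh_interval_seconds(128 * MIB, 1 * GIB))  # 0.125
    print(refresh_interval_seconds(1 * GIB, 1 * GIB))    # 1.0
```

With a decimal gigabyte (10^9 bytes per second) the first figure comes out near 0.134 seconds, so "1/8 second" is an approximation either way.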

Your nodes have 100+ GB of memory so a conservative assignment might be a
gigabyte of network buffers. Then add the following to the conf, restart
the cluster, start jconsole on a TaskManager, connect to the TaskManager
process, and on the MBeans tab look under org.apache.flink.metrics for
Network.AvailableMemorySegments.

metrics.reporters: my_jmx_reporter
metrics.reporter.my_jmx_reporter.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.my_jmx_reporter.port: 9020-9040
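
To turn the suggested gigabyte of network buffers into a buffer count, one possible calculation is sketched below. It assumes 32 KiB per buffer, as implied by "4096 buffers = 128 MiB" earlier in the thread; the helper name is hypothetical, and only the configuration key comes from the discussion.

```python
# Convert a memory budget for network buffers into a buffer count.
# Assumption: each buffer is a 32 KiB segment.


def buffers_for_memory(total_mib: int, segment_kib: int = 32) -> int:
    """Number of segment_kib-sized buffers that fit into total_mib MiB."""
    return total_mib * 1024 // segment_kib


if __name__ == "__main__":
    count = buffers_for_memory(1024)  # 1 GiB budget
    # Print the corresponding flink-conf.yaml line.
    print(f"taskmanager.network.numberOfBuffers: {count}")
```

Under that assumption a 1 GiB budget corresponds to 32768 buffers, eight times the 4096 used so far.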



Re: Performance and Latency Chart for Flink

amir bahmanyari
Hi Greg,
Setting "taskmanager.memory.preallocate" to true caused "Association with remote system [akka.tcp://flink@" "has failed" "[Disassociated]" errors on all TMs. I changed it back to false.
I increased the network buffers to 1 GB and started to get TM slot exceptions, so I am increasing that value incrementally. I now have it set to 8192 (twice the previous 4096).
Thanks
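
One way to plan such incremental steps is to double the buffer count until the 1 GiB target is reached and note the memory each step implies. This is a sketch only, again assuming 32 KiB per buffer; the generator name is hypothetical.

```python
# Doubling schedule for taskmanager.network.numberOfBuffers.
# Assumption: each buffer is a 32 KiB segment.


def doubling_schedule(start: int, limit: int, segment_kib: int = 32):
    """Yield (buffer_count, MiB) pairs, doubling from start up to limit inclusive."""
    count = start
    while count <= limit:
        yield count, count * segment_kib / 1024
        count *= 2


if __name__ == "__main__":
    for count, mib in doubling_schedule(4096, 32768):
        print(count, mib)
```

Under that assumption the current setting of 8192 buffers amounts to 256 MiB, a quarter of the 1 GiB target.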
