Hello,
I would like to ask about the rationale behind the backpressure mechanism in Flink.

As I understand it, backpressure is for handling the problem of one operator (or source) producing records faster than the next operator can consume them. However, an alternative solution would be to have a potentially "infinite" buffer for the incoming records of an operator, by spilling the buffer to disk when it is getting too large. Could you tell me why backpressure is considered a better option?

Also, I would be interested in whether going with backpressure was a streaming-specific decision, or do you think that having backpressure is also better in batch jobs?

Thanks,
Gábor
You can find longer explanations for backpressure on the Interwebs, but
briefly:

1. Buffers are never infinite. Even if you're willing to write and maintain a nontrivial scheme for flushing to disk, you might still have a major failure scenario where the consumer is gone forever. Depending on volume and disk size, you could run out of space quickly. Then what should you do? Does this low-level producer know what makes sense from a strategic perspective? The answers lead you to...

2. Backpressure composes. If you have a graph of backpressure-enabled streams, when one slows down, the backpressure can propagate upstream. Somewhere at the beginning of the graph, someone has to decide what to do about a major slowdown, like in the previous scenario, but now it's a strategic concern that you can solve at the architecture level, rather than forcing some low-level stream component to make an arbitrary decision about what to do.

dean

--
Dean Wampler, Ph.D.
Fast Data Product Architect, Office of the CTO
Lightbend <http://lightbend.com>
@deanwampler <http://twitter.com/deanwampler>
https://www.linkedin.com/in/deanwampler
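(Not Flink's actual mechanism, which coordinates bounded network buffers between tasks, but a minimal, self-contained Java sketch of the principle described above: with a bounded buffer, a full queue blocks the producer, so the producer is slowed to the consumer's rate; chain several such buffers and the slowdown propagates upstream. The queue size, sleep times, and class name below are illustrative assumptions, not anything taken from Flink.)

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureSketch {
    public static void main(String[] args) throws InterruptedException {
        // Bounded buffer between a fast producer and a slow consumer.
        BlockingQueue<Long> buffer = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                for (long i = 0; ; i++) {
                    buffer.put(i); // blocks when the buffer is full -- this is the backpressure
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    long record = buffer.take();
                    Thread.sleep(100); // simulate a slow downstream operator
                    System.out.println("consumed " + record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        Thread.sleep(3_000); // run briefly, then stop both threads
        producer.interrupt();
        consumer.interrupt();
    }
}

Once the 16-slot buffer fills, the producer can only advance at the consumer's pace. Swap the ArrayBlockingQueue for an unbounded queue and the producer never slows down; the buffer just keeps growing until memory (or, with spilling, disk) runs out, which is the scenario in point 1.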