Hello,
I would like to ask about the rationale behind the backpressure mechanism in Flink.

As I understand it, backpressure is for handling the problem of one operator (or source) producing records faster than the next operator can consume them. However, an alternative solution would be to have a potentially "infinite" buffer for the incoming records of an operator, by spilling the buffer to disk when it is getting too large. Could you tell me why backpressure is considered a better option?

Also, I would be interested in whether going with backpressure was a streaming-specific decision, or do you think that having backpressure is also better in batch jobs?

Thanks,
Gábor
You can find longer explanations for backpressure on the Interwebs, but
briefly:

1. Buffers are never infinite. Even if you're willing to write and maintain a nontrivial scheme for flushing to disk, you might still have a major failure scenario where the consumer is gone forever. Depending on volume and disk size, you could run out of space quickly. Then what should you do? Does this low-level producer know what makes sense from a strategic perspective? The answers lead you to...

2. Backpressure composes. If you have a graph of backpressure-enabled streams, when one slows down, the backpressure can propagate upstream. Somewhere at the beginning of the graph, someone has to decide what to do about a major slowdown, like in the previous scenario, but now it's a strategic concern that you can solve at the architecture level, rather than forcing some low-level stream component to make an arbitrary decision about what to do.

dean

--
Dean Wampler, Ph.D.
Fast Data Product Architect, Office of the CTO
Lightbend <http://lightbend.com>
@deanwampler <http://twitter.com/deanwampler>
https://www.linkedin.com/in/deanwampler
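(Not Flink's actual mechanism, which coordinates bounded network buffers between tasks, but a minimal, self-contained Java sketch of the principle described above: with a bounded buffer, a full queue blocks the producer, so the producer is slowed to the consumer's rate; chain several such buffers and the slowdown propagates upstream. The queue size, sleep times, and class name below are illustrative assumptions, not anything taken from Flink.)

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureSketch {
    public static void main(String[] args) throws InterruptedException {
        // Bounded buffer between a fast producer and a slow consumer.
        BlockingQueue<Long> buffer = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                for (long i = 0; ; i++) {
                    buffer.put(i); // blocks when the buffer is full -- this is the backpressure
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    long record = buffer.take();
                    Thread.sleep(100); // simulate a slow downstream operator
                    System.out.println("consumed " + record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        Thread.sleep(3_000); // run briefly, then stop both threads
        producer.interrupt();
        consumer.interrupt();
    }
}

Once the 16-slot buffer fills, the producer can only advance at the consumer's pace. Swap the ArrayBlockingQueue for an unbounded queue and the producer never slows down; the buffer just keeps growing until memory (or, with spilling, disk) runs out, which is the scenario in point 1.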