Gaël Renoux created FLINK-20390:
-----------------------------------
Summary: Programmatic access to the back-pressure
Key: FLINK-20390
URL:
https://issues.apache.org/jira/browse/FLINK-20390 Project: Flink
Issue Type: New Feature
Components: API / Core
Reporter: Gaël Renoux
It would be useful to access the back-pressure monitoring from within functions.
Here is our use case: we have a real-time Flink job, which takes decisions based on input data. Sometimes, we have traffic spikes on the input and the decisions process cannot processe records fast enough. Back-pressure starts mounting, all the way back to the Source. What we want to do is to start dropping records in this case, because it's better to make decisions based on just a sample of the data rather than accumulate too much lag.
Right now, the only way is to have a filter with a hard-limit on the number of records per-interval-of-time, and to drop records once we are over this limit. However, this requires a lot of tuning to find out what the correct limit is, especially since it may depend on the nature of the inputs (some decisions take longer to make than others). It's also heavily dependent on the buffers: the limit needs to be low enough that all records that pass the limit can fit in the downstream buffers, or the back-pressure will will go back past the filtering task and we're back to square one. Finally, it's not very resilient to change: whenever we scale the infrastructure up, we need to redo the whole tuning thing.
With programmatic access to the back-pressure, we could simply start dropping records based on its current level. No tuning, and adjusted to the actual issue. For performance, I assume it would be better if it reused the existing back-pressure monitoring mechanism, rather than looking directly into the buffer. A sampling of the back-pressure should be enough, and if more precision is needed you can simply change the existing back-pressure configuration.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)