Service frontend as a source and sink

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Service frontend as a source and sink

Maxim
I'm looking into integrating AWS Flow Framework
<https://aws.amazon.com/swf/details/flow/> with Flink. One of the
requirements is to provide a service API to initiate workflows and query
their state. I envision that this API could be integrated as a specialized
source that takes web requests and a sink that returns the replies to the
calling web front end.
The ideal solution is to run JobManager inside a web server and pin a
single *WebFrontEndSource and WebFrontEndSync *to each web server instance.
This way Flink handles all interprocess communication and the webserver cod
calls into the source and sink directly.
From the Flink side the missing feature is support for different JobManager
roles and allowing to specify that given operator has to be placed only
into a JobManager of a specific role. Such feature would be useful in other
scenarios like having *high memory* *slot *role vs *high CPU slot
*role*. *Another
requirement would be support for explicit placement of subtasks of
operators that have the same parallelism into the same slot ensuring that
their indexes are the same.
So in my case I would define "WebService" JobManager role with a single
slot per instance and put pair of *WebFrontEndSource *and
*WebFrontEndSync *into
slot on each webserver.

If such feature is not feasible I would like to find a way to lookup
addresses of hosts that run a specific sink operator. Then each frontend
would query Flink for hosts that run *WebFrontEndSources *and forward
requests to them.

Is there a better way to support my use case?

Thanks,

Maxim.
Reply | Threaded
Open this post in threaded view
|

Re: Service frontend as a source and sink

Till Rohrmann
Hi Maxim,

Flink does not execute the operators in the JobManager but the TaskManager.
The JobManager's role is the orchestration of the Flink job.

Unfortunately, there is currently no way to explicitly control the
deployment of tasks to TaskManagers with different roles. However, Flink
supports so-called CoLocationConstraints (see CoLocationConstraint.java)
which tell the system that subtasks with the same index have to be deployed
to the same slot.

Using Flink's REST API you should be able to detect on which host the
sources and sinks run. Take a look at the web dashboard when you run a job.
There you also have this information. But you would have to manually parse
the returned JSON file.

Cheers,
Till

On Mon, May 9, 2016 at 11:38 PM, Maxim <[hidden email]> wrote:

> I'm looking into integrating AWS Flow Framework
> <https://aws.amazon.com/swf/details/flow/> with Flink. One of the
> requirements is to provide a service API to initiate workflows and query
> their state. I envision that this API could be integrated as a specialized
> source that takes web requests and a sink that returns the replies to the
> calling web front end.
> The ideal solution is to run JobManager inside a web server and pin a
> single *WebFrontEndSource and WebFrontEndSync *to each web server instance.
> This way Flink handles all interprocess communication and the webserver cod
> calls into the source and sink directly.
> From the Flink side the missing feature is support for different JobManager
> roles and allowing to specify that given operator has to be placed only
> into a JobManager of a specific role. Such feature would be useful in other
> scenarios like having *high memory* *slot *role vs *high CPU slot
> *role*. *Another
> requirement would be support for explicit placement of subtasks of
> operators that have the same parallelism into the same slot ensuring that
> their indexes are the same.
> So in my case I would define "WebService" JobManager role with a single
> slot per instance and put pair of *WebFrontEndSource *and
> *WebFrontEndSync *into
> slot on each webserver.
>
> If such feature is not feasible I would like to find a way to lookup
> addresses of hosts that run a specific sink operator. Then each frontend
> would query Flink for hosts that run *WebFrontEndSources *and forward
> requests to them.
>
> Is there a better way to support my use case?
>
> Thanks,
>
> Maxim.
>