Use specific Task Manager for heavy computations

Use specific Task Manager for heavy computations

Mariano Gonzalez
I need to load a file that can be around 10 GB and later process it with hadrian, the Java implementation of PFA (https://github.com/opendatagroup/hadrian).

I would like to execute this transformation step inside a specific TaskManager of the cluster, since I don't want to load 10 GB on every TaskManager node. Unfortunately, hadrian cannot be executed in a distributed way.

So my question is: is there a way to do some routing with Flink so that this particular transformation step is always executed on the same TaskManager node?

Perhaps my approach is completely wrong, so if anybody has any suggestions I would be more than happy to hear them :)

Thanks

Re: Use specific Task Manager for heavy computations

Mariano Gonzalez
Any ideas?

Re: Use specific Task Manager for heavy computations

amir bahmanyari
Looks like what I am currently doing, or at least close. There is no need to copy the big file to every node. Copy it to one node, read the data, and send it to a Kafka cluster using a KafkaProducer() object. Use KafkaIO() in case it's a Beam app. Deploy to the node where the JobManager (JM) is running, and it will be executed in a distributed fashion across all nodes. If that helps, I can privately walk you through what the logistics and the code might look like, plus a lot of tricks I have learned the hard way LOL! Thanks.
Amir-
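
A minimal sketch of the approach Amir describes (not taken from the original message): the broker address, topic name, file path, and line-per-record framing below are placeholder assumptions. The idea is that only the node holding the file runs the producer, so everything downstream sees Kafka records instead of the file itself.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Runs only on the single node that has the big file: it streams the file into Kafka
// record by record, so no other node ever needs a local copy of the file.
public class BigFileToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");   // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One Kafka record per line; adjust the framing to whatever your records look like.
            Files.lines(Paths.get("/data/big-input-file")).forEach(line ->
                    producer.send(new ProducerRecord<>("big-file-topic", line)));
        }
    }
}

On the Flink side, the topic can then be consumed with the Kafka connector matching your Flink/Kafka versions (for example FlinkKafkaConsumer09 together with a SimpleStringSchema), which yields a DataStream<String> that is naturally distributed across the TaskManagers.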

      From: Mariano Gonzalez <[hidden email]>
 To: [hidden email]
 Sent: Thursday, September 29, 2016 3:07 PM
 Subject: Re: Use specific Task Manager for heavy computations
   
Any ideas?




Re: Use specific Task Manager for heavy computations

Mariano Gonzalez
I think what you're suggesting is to load a large file into Kafka which will replicate it and make it available to all nodes. However, that is not what I want.

What I want is to run a specific transformation step on a specific TaskManager.

Re: Use specific worker for heavy computations

Robert Metzger
In reply to this post by Mariano Gonzalez
Hi Mariano,

Currently, there is nothing available in Flink to execute an operation on a specific machine.

Regards,
Robert


On Wed, Sep 28, 2016 at 9:40 PM, Mariano Gonzalez <
[hidden email]> wrote:

> I need to load a PFA (portable format for analytics) that can be around 30
> GB and later process it with hadrian which is the java implementation for
> PFA's (https://github.com/opendatagroup/hadrian).
>
> I would like to execute this transformation step inside a specific worker
> of the cluster (since I don't want to load 30 GB on every single worker
> node). Unfortunately, hadrian cannot be executed in a distributed way.
>
> So my question would be if there is a way to do some routing with Flink and
> execute this particular transformation step using always the same worker
> node?
>
> Perhaps my approach is completely wrong, so if anybody has any suggestions
> I would be more than happy to hear them:)
>
> Thanks
>

Re: Use specific worker for heavy computations

Stephan Ewen
So far, we have not introduced location constraints.
The reason is that this goes a bit against the paradigm of location
transparency, which is necessary for failover, dynamically adjusting
parallelism (which is a feature being worked on), etc.
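
To make that constraint concrete, here is a hypothetical sketch (not from the thread) of the usual partial workaround: give only the heavy operator a parallelism of 1, so exactly one task instance loads the PFA document, while accepting that Flink, not the user, chooses which TaskManager slot runs it. The hadrian calls are left as placeholders since the thread does not show that API, and the socket source is just an example input.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SingleInstancePfaJob {

    // Hypothetical scorer: loads the (large) PFA document once per task instance in open(),
    // then scores incoming records in map(). The hadrian-specific calls are placeholders.
    public static class PfaScorer extends RichMapFunction<String, String> {
        private transient Object pfaEngine;   // placeholder for the hadrian engine type

        @Override
        public void open(Configuration parameters) {
            // pfaEngine = ...load the PFA document from a path visible to this node...
        }

        @Override
        public String map(String record) {
            // return the result of scoring `record` with pfaEngine; placeholder pass-through
            return record;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.socketTextStream("localhost", 9999); // example source

        // Parallelism 1 means exactly one task instance loads the PFA document,
        // but Flink (not the user) decides which TaskManager slot runs it.
        input.map(new PfaScorer()).setParallelism(1).print();

        env.execute("PFA scoring with a single operator instance");
    }
}

The trade-off is that the operator becomes a single-instance bottleneck, and after a failover it may be restarted on a different TaskManager, which is exactly the location transparency Stephan describes.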

On Wed, Oct 12, 2016 at 10:35 AM, Robert Metzger <[hidden email]>
wrote:

> Hi Mariano,
>
> currently, there is nothing available in Flink to execute an operation on a
> specific machine.
>
> Regards,
> Robert
>
>
> On Wed, Sep 28, 2016 at 9:40 PM, Mariano Gonzalez <
> [hidden email]> wrote:
>
> > I need to load a PFA (portable format for analytics) that can be around
> 30
> > GB and later process it with hadrian which is the java implementation for
> > PFA's (https://github.com/opendatagroup/hadrian).
> >
> > I would like to execute this transformation step inside a specific worker
> > of the cluster (since I don't want to load 30 GB on every single worker
> > node). Unfortunately, hadrian cannot be executed in a distributed way.
> >
> > So my question would be if there is a way to do some routing with Flink
> and
> > execute this particular transformation step using always the same worker
> > node?
> >
> > Perhaps my approach is completely wrong, so if anybody has any
> suggestions
> > I would be more than happy to hear them:)
> >
> > Thanks
> >
>