Use specific Task Manager for heavy computations

Use specific Task Manager for heavy computations

Mariano Gonzalez
I need to load a file that can be around 10 GB and later process it with hadrian, the Java implementation of PFA (https://github.com/opendatagroup/hadrian).

I would like to execute this transformation step inside a specific TaskManager of the cluster, since I don't want to load 10 GB on every TaskManager node. Unfortunately, hadrian cannot be executed in a distributed way.

So my question is: is there a way to do some routing with Flink so that this particular transformation step is always executed on the same TaskManager node?

Perhaps my approach is completely wrong, so if anybody has any suggestions I would be more than happy to hear them :)

Thanks

Re: Use specific Task Manager for heavy computations

Mariano Gonzalez
Any ideas?

Re: Use specific Task Manager for heavy computations

amir bahmanyari
Looks like what I am currently doing, or at least close. There is no need to copy the big file to every node. Copy it to one node, read the data, and send it to a Kafka cluster using a KafkaProducer() object. Use KafkaIO() in case it's a Beam app. Deploy to the node where the JobManager (JM) is running, and it will be executed in a distributed fashion across all nodes. If that helps, I can privately walk you through what the logistics and the code might look like, plus a lot of tricks I have learned the hard way LOL! Thanks.
Amir-
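
A minimal sketch of the approach Amir describes (not taken from the original message): the broker address, topic name, file path, and line-per-record framing below are placeholder assumptions. The idea is that only the node holding the file runs the producer, so everything downstream sees Kafka records instead of the file itself.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Runs only on the single node that has the big file: it streams the file into Kafka
// record by record, so no other node ever needs a local copy of the file.
public class BigFileToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");   // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One Kafka record per line; adjust the framing to whatever your records look like.
            Files.lines(Paths.get("/data/big-input-file")).forEach(line ->
                    producer.send(new ProducerRecord<>("big-file-topic", line)));
        }
    }
}

On the Flink side, the topic can then be consumed with the Kafka connector matching your Flink/Kafka versions (for example FlinkKafkaConsumer09 together with a SimpleStringSchema), which yields a DataStream<String> that is naturally distributed across the TaskManagers.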

      From: Mariano Gonzalez <[hidden email]>
 To: [hidden email]
 Sent: Thursday, September 29, 2016 3:07 PM
 Subject: Re: Use specific Task Manager for heavy computations
   
Any ideas?




Re: Use specific Task Manager for heavy computations

Mariano Gonzalez
I think what you're suggesting is to load a large file into Kafka which will replicate it and make it available to all nodes. However, that is not what I want.

What I want is to run a specific transformation step on a specific TaskManager.

Re: Use specific worker for heavy computations

Robert Metzger
In reply to this post by Mariano Gonzalez
Hi Mariano,

Currently, there is nothing available in Flink to execute an operation on a specific machine.

Regards,
Robert


On Wed, Sep 28, 2016 at 9:40 PM, Mariano Gonzalez <
[hidden email]> wrote:

> I need to load a PFA (portable format for analytics) that can be around 30
> GB and later process it with hadrian which is the java implementation for
> PFA's (https://github.com/opendatagroup/hadrian).
>
> I would like to execute this transformation step inside a specific worker
> of the cluster (since I don't want to load 30 GB on every single worker
> node). Unfortunately, hadrian cannot be executed in a distributed way.
>
> So my question would be if there is a way to do some routing with Flink and
> execute this particular transformation step using always the same worker
> node?
>
> Perhaps my approach is completely wrong, so if anybody has any suggestions
> I would be more than happy to hear them:)
>
> Thanks
>

Re: Use specific worker for heavy computations

Stephan Ewen
So far, we have not introduced location constraints.
The reason is that this goes a bit against the paradigm of location
transparency, which is necessary for failover, dynamically adjusting
parallelism (which is a feature being worked on), etc.
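
To make that constraint concrete, here is a hypothetical sketch (not from the thread) of the usual partial workaround: give only the heavy operator a parallelism of 1, so exactly one task instance loads the PFA document, while accepting that Flink, not the user, chooses which TaskManager slot runs it. The hadrian calls are left as placeholders since the thread does not show that API, and the socket source is just an example input.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SingleInstancePfaJob {

    // Hypothetical scorer: loads the (large) PFA document once per task instance in open(),
    // then scores incoming records in map(). The hadrian-specific calls are placeholders.
    public static class PfaScorer extends RichMapFunction<String, String> {
        private transient Object pfaEngine;   // placeholder for the hadrian engine type

        @Override
        public void open(Configuration parameters) {
            // pfaEngine = ...load the PFA document from a path visible to this node...
        }

        @Override
        public String map(String record) {
            // return the result of scoring `record` with pfaEngine; placeholder pass-through
            return record;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.socketTextStream("localhost", 9999); // example source

        // Parallelism 1 means exactly one task instance loads the PFA document,
        // but Flink (not the user) decides which TaskManager slot runs it.
        input.map(new PfaScorer()).setParallelism(1).print();

        env.execute("PFA scoring with a single operator instance");
    }
}

The trade-off is that the operator becomes a single-instance bottleneck, and after a failover it may be restarted on a different TaskManager, which is exactly the location transparency Stephan describes.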

On Wed, Oct 12, 2016 at 10:35 AM, Robert Metzger <[hidden email]>
wrote:

> Hi Mariano,
>
> currently, there is nothing available in Flink to execute an operation on a
> specific machine.
>
> Regards,
> Robert
>
>
> On Wed, Sep 28, 2016 at 9:40 PM, Mariano Gonzalez <
> [hidden email]> wrote:
>
> > I need to load a PFA (portable format for analytics) that can be around
> 30
> > GB and later process it with hadrian which is the java implementation for
> > PFA's (https://github.com/opendatagroup/hadrian).
> >
> > I would like to execute this transformation step inside a specific worker
> > of the cluster (since I don't want to load 30 GB on every single worker
> > node). Unfortunately, hadrian cannot be executed in a distributed way.
> >
> > So my question would be if there is a way to do some routing with Flink
> and
> > execute this particular transformation step using always the same worker
> > node?
> >
> > Perhaps my approach is completely wrong, so if anybody has any
> suggestions
> > I would be more than happy to hear them:)
> >
> > Thanks
> >
>