KMeans job gets stuck and never completes

José Luis López Pino
Hi,

I'm running the KMeans Java and Scala examples on two nodes. They work fine
with very small files (3 MB), but with files of 30 MB or bigger the process
never ends. After several hours, the DataChain process that is reading the
input points is still working.

I have previously run much bigger files in the same environment without any
issue. So far I have:
- Checked that the process is not stuck consuming all the CPU time.
- Formatted the datanodes.
- Compiled the latest version available on GitHub.
- Enabled the debug log mode, which doesn't give any additional information.

Could someone give me a hint about where to look? Thanks for your help!

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino

Re: KMeans job gets stuck and never completes

Sebastian Schelter-2
Have you looked at a jstack dump on one of the workers? That typically
helps to find out where the processes are stuck.

-s
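
As a rough illustration of the jstack suggestion: once a dump has been saved as text (for example from `jstack <pid>`), a small script can list the threads whose stack contains a given frame. This is only a sketch. The function name `threads_blocked_in`, the sample dump and the thread names in it are made up for the example, and the dump format is simplified.

```python
import re

def threads_blocked_in(dump_text, frame_substring):
    """Return the names of threads whose stack contains frame_substring."""
    blocked = []
    # jstack separates thread entries with blank lines; each entry starts
    # with the thread name in double quotes.
    for block in re.split(r"\n\s*\n", dump_text):
        header = re.match(r'\s*"([^"]+)"', block)
        if header and frame_substring in block:
            blocked.append(header.group(1))
    return blocked

# Hypothetical, shortened dump used only to exercise the function.
SAMPLE_DUMP = '''
"DataSource (points)" daemon prio=10 tid=0x01 nid=0x1 in Object.wait()
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at eu.stratosphere.runtime.io.network.bufferprovider.LocalBufferPool.requestBuffer(LocalBufferPool.java:160)

"main" prio=10 tid=0x02 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
	at java.lang.Thread.sleep(Native Method)
'''

print(threads_blocked_in(SAMPLE_DUMP, "LocalBufferPool.requestBuffer"))
```

Running the same check on dumps taken a few minutes apart shows whether the same threads stay in the same frame.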
(In reply to José Luis López Pino's message of 22.06.2014 13:32.)


Re: KMeans job gets stuck and never completes

José Luis López Pino
It seems the thread reading the points file is blocked, waiting for a
buffer from the global buffer pool that never arrives. What could be causing
this?

   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x6b985888> (a java.util.ArrayDeque)
    at eu.stratosphere.runtime.io.network.bufferprovider.LocalBufferPool.requestBuffer(LocalBufferPool.java:160)
    - locked <0x6b985888> (a java.util.ArrayDeque)
    at eu.stratosphere.runtime.io.network.bufferprovider.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:101)
    at eu.stratosphere.runtime.io.gates.InputGate.requestBufferBlocking(InputGate.java:333)
    at eu.stratosphere.runtime.io.channels.InputChannel.requestBufferBlocking(InputChannel.java:426)
    at eu.stratosphere.runtime.io.network.ChannelManager.dispatchFromOutputChannel(ChannelManager.java:441)
    at eu.stratosphere.runtime.io.channels.OutputChannel.sendBuffer(OutputChannel.java:74)
    at eu.stratosphere.runtime.io.gates.OutputGate.sendBuffer(OutputGate.java:49)
    at eu.stratosphere.runtime.io.api.BufferWriter.sendBuffer(BufferWriter.java:35)
    at eu.stratosphere.runtime.io.api.RecordWriter.emit(RecordWriter.java:96)
    at eu.stratosphere.pact.runtime.shipping.OutputCollector.collect(OutputCollector.java:82)
    at eu.stratosphere.pact.runtime.task.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:71)
    at eu.stratosphere.pact.runtime.task.DataSourceTask.invoke(DataSourceTask.java:228)
    at eu.stratosphere.nephele.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:284)
    at java.lang.Thread.run(Thread.java:744)


Thanks for your help, Sebastian.

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino


(In reply to Sebastian Schelter's message of 22 June 2014 13:38.)


Re: KMeans job gets stuck and never completes

Sebastian Schelter-2
You could try to increase the number of buffers available to the network
stack. That solved similar problems for me in the past.

-s
(In reply to José Luis López Pino's message of 22.06.2014 13:48.)


Re: KMeans job gets stuck and never completes

Robert Metzger
Workers waiting in "LocalBufferPool.requestBuffer()" are usually a sign of
a distributed deadlock.
Can you send me some instructions on how to get the same input data you
have (download URL? generator settings?) and which configuration parameters
you are using (max iteration limit, k, ?) when calling the K-Means example?
I would like to try it on our cluster.
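
To make the symptom concrete, here is a toy model of a fixed-size buffer pool. It is not Stratosphere's LocalBufferPool; the class `ToyBufferPool` and its methods are invented for illustration. It only shows why a thread ends up in a timed wait when every buffer is held by someone who never recycles it.

```python
import threading
from collections import deque

class ToyBufferPool:
    """Hypothetical fixed-size pool; request() waits until a buffer is recycled."""

    def __init__(self, num_buffers):
        self._free = deque(bytearray(8) for _ in range(num_buffers))
        self._cond = threading.Condition()

    def request(self, timeout):
        with self._cond:
            # Timed wait until another thread recycles a buffer.
            if not self._cond.wait_for(lambda: len(self._free) > 0, timeout=timeout):
                return None  # timed out: nobody returned a buffer
            return self._free.popleft()

    def recycle(self, buf):
        with self._cond:
            self._free.append(buf)
            self._cond.notify()

pool = ToyBufferPool(num_buffers=2)
# A consumer takes both buffers and does not give them back.
held = [pool.request(timeout=0.1), pool.request(timeout=0.1)]
# The producer now waits in vain, like the thread in the dump.
starved = pool.request(timeout=0.1)
print(starved is None)

# As soon as one buffer is recycled, the producer can make progress again.
pool.recycle(held.pop())
print(pool.request(timeout=0.1) is not None)
```

In a distributed deadlock the party holding the buffers is itself waiting on the starved producer, so the recycle step never happens.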

Just out of curiosity, what hardware are you using? Is it the IBM Power
cluster at TU Berlin?

Robert


(In reply to Sebastian Schelter's message of Sun, Jun 22, 2014 at 1:53 PM.)


Re: KMeans job gets stuck and never completes

José Luis López Pino
Hi,

I'm using two VPS instances, with this input for the program:
- Iterations: 2
- Dimensions: 2 (3 for the scala example program)
- Number of centers (k): 10
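
For anyone wanting to reproduce a file of a given size with these settings, a generator along the following lines might do. It is a sketch only, not the attached script: the function name `write_random_points`, the value range and the one-point-per-line, space-separated format are assumptions, so adjust the format to whatever the KMeans example actually expects.

```python
import os
import random
import tempfile

def write_random_points(path, target_bytes, dimensions=2, seed=0):
    """Write random points, one per line, until the file reaches target_bytes."""
    rng = random.Random(seed)
    written = 0
    count = 0
    # ASCII text with "\n" line endings, so len(line) equals the bytes written.
    with open(path, "w", encoding="ascii", newline="\n") as out:
        while written < target_bytes:
            coords = " ".join(f"{rng.uniform(-100, 100):.2f}" for _ in range(dimensions))
            line = coords + "\n"
            out.write(line)
            written += len(line)
            count += 1
    return count

# Small 30 KB demo file; the output format is an assumption (see above).
path = os.path.join(tempfile.mkdtemp(), "points.txt")
n = write_random_points(path, target_bytes=30 * 1024, dimensions=2)
print(n, os.path.getsize(path))
```

Passing `target_bytes=30 * 1024 * 1024` would give a file of roughly the 30 MB size at which the job hangs.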

This is my current configuration for network buffers (I think these are the default values):
# Number of network buffers (used by each TaskManager)
taskmanager.network.numberOfBuffers: 2048
# Size of network buffers
taskmanager.network.bufferSizeInBytes: 32768
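
As a quick sanity check on these two values, and assuming the network stack's memory is simply the number of buffers times the buffer size, the footprint per TaskManager can be worked out as below. The helper name is made up, and the doubled value is only an illustration of "increasing the number of buffers", not a recommended setting.

```python
NUM_BUFFERS = 2048         # taskmanager.network.numberOfBuffers
BUFFER_SIZE_BYTES = 32768  # taskmanager.network.bufferSizeInBytes

def network_buffer_memory_mib(num_buffers, buffer_size_bytes):
    """Total network buffer memory in MiB, assuming count * size."""
    return num_buffers * buffer_size_bytes / (1024 * 1024)

print(network_buffer_memory_mib(NUM_BUFFERS, BUFFER_SIZE_BYTES))      # 64.0
print(network_buffer_memory_mib(2 * NUM_BUFFERS, BUFFER_SIZE_BYTES))  # 128.0
```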

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino


(In reply to Robert Metzger's message of 22 June 2014 14:19.)


Attachment: ford2.py (302 bytes)

Re: KMeans job gets stuck and never completes

Robert Metzger
Thank you. What degree of parallelism are you using when submitting the job?
You can set it either with the "-p" argument or via
env.setDegreeOfParallelism().
How much heap space do you assign to the TaskManagers?



(In reply to José Luis López Pino's message of Sun, Jun 22, 2014 at 3:07 PM.)


Re: KMeans job gets stuck and never completes

Stephan Ewen
In reply to this post by Robert Metzger
There was a patch for deadlocks on broadcast variables a few days ago.

Can you try the current master branch (0.6-SNAPSHOT) and see if that solves
your problem?

Re: KMeans job gets stuck and never completes

Robert Metzger
I think Pino wrote that he is using the latest master.

I just finished running KMeans on a cluster, with the following
configuration:
- 2 nodes, 18 GB heap space each
- DOP=32
- 29 MB input data, 10 centers, 15 iterations max.

I also reduced the heap space to 1 GB, and both runs worked like a charm.
I've added a TODO to my list to also test with more data.




(In reply to Stephan Ewen's message of Sun, Jun 22, 2014 at 3:55 PM.)


Re: KMeans job gets stuck and never completes

José Luis López Pino
Yes, I pulled and compiled the latest master from GitHub.

Thank you for the test, Robert. I'll double-check the configuration of both
nodes then; there must be something wrong with it. I've tried running the
job with p = 1, 2 and 4.


Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino


(In reply to Robert Metzger's message of 22 June 2014 16:43.)


Re: KMeans job gets stuck and never completes

Robert Metzger
Okay, let us know if you find the solution.

If you want, we can also do a short Google Hangout session with
screen sharing. Maybe I'll spot something.


(In reply to José Luis López Pino's message of Sun, Jun 22, 2014 at 5:09 PM.)
