Hi,
I'm running the KMeans Java and Scala examples on two nodes. It works fine with very small files (3 MB), but when I try files of 30 MB or bigger the process never ends. After several hours, the DataChain process that is reading the input points is still working. I have tried before with much bigger files in the same environment and had no issue.

I have already tried:
- Checking that the process is not locked using all the CPU time.
- Formatting the datanodes.
- Compiling the latest version available on GitHub.
- Running in debug log mode, which doesn't give any additional information.

Could someone give me a hint about where to look? Thanks for your help!

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino
|
Have you looked at a jstack dump on one of the workers? That typically helps to find out where the processes are stuck.

-s

On 22.06.2014 13:32, "José Luis López Pino" <[hidden email]> wrote:
|
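Following Sebastian's suggestion: a dump is taken with `jstack <pid> > dump.txt`, and the stuck threads can then be picked out by filtering on the reported thread state. A minimal sketch of such a filter (the helper name is ours; the dump excerpt below mimics the standard HotSpot format):

```python
import re

def thread_states(jstack_dump: str) -> dict:
    """Map each thread name in a jstack dump to its reported state."""
    states = {}
    name = None
    for line in jstack_dump.splitlines():
        m = re.match(r'^"([^"]+)"', line)  # thread header: "name" prio=... tid=...
        if m:
            name = m.group(1)
        elif name and "java.lang.Thread.State:" in line:
            # keep only the state keyword, e.g. TIMED_WAITING
            states[name] = line.split("java.lang.Thread.State:")[1].strip().split()[0]
            name = None
    return states

sample = '''"DataSource (points)" prio=10 tid=0x1 nid=0x2 in Object.wait()
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
"main" prio=10 tid=0x3 nid=0x4 runnable
   java.lang.Thread.State: RUNNABLE
'''

stuck = {n: s for n, s in thread_states(sample).items()
         if s in ("BLOCKED", "WAITING", "TIMED_WAITING")}
print(stuck)  # {'DataSource (points)': 'TIMED_WAITING'}
```

This is only a triage aid; for the actual diagnosis below, reading the full stack of the waiting thread is what matters.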
It seems like the thread reading the points file is locked waiting for a buffer from the global buffer pool that never arrives. What could be causing this?

    java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x6b985888> (a java.util.ArrayDeque)
        at eu.stratosphere.runtime.io.network.bufferprovider.LocalBufferPool.requestBuffer(LocalBufferPool.java:160)
        - locked <0x6b985888> (a java.util.ArrayDeque)
        at eu.stratosphere.runtime.io.network.bufferprovider.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:101)
        at eu.stratosphere.runtime.io.gates.InputGate.requestBufferBlocking(InputGate.java:333)
        at eu.stratosphere.runtime.io.channels.InputChannel.requestBufferBlocking(InputChannel.java:426)
        at eu.stratosphere.runtime.io.network.ChannelManager.dispatchFromOutputChannel(ChannelManager.java:441)
        at eu.stratosphere.runtime.io.channels.OutputChannel.sendBuffer(OutputChannel.java:74)
        at eu.stratosphere.runtime.io.gates.OutputGate.sendBuffer(OutputGate.java:49)
        at eu.stratosphere.runtime.io.api.BufferWriter.sendBuffer(BufferWriter.java:35)
        at eu.stratosphere.runtime.io.api.RecordWriter.emit(RecordWriter.java:96)
        at eu.stratosphere.pact.runtime.shipping.OutputCollector.collect(OutputCollector.java:82)
        at eu.stratosphere.pact.runtime.task.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:71)
        at eu.stratosphere.pact.runtime.task.DataSourceTask.invoke(DataSourceTask.java:228)
        at eu.stratosphere.nephele.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:284)
        at java.lang.Thread.run(Thread.java:744)

Thanks for your help, Sebastian.

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino

On 22 June 2014 13:38, Sebastian Schelter <[hidden email]> wrote:
|
You could try to increase the number of buffers available to the network stack. That solved similar problems for me in the past.

-s

On 22.06.2014 13:48, "José Luis López Pino" <[hidden email]> wrote:
|
Workers waiting in "LocalBufferPool.requestBuffer()" is usually a sign of a distributed deadlock.

Can you send me some instructions on how to get the same input data you have (download URL? generator settings?) and the configuration parameters you are using (max iteration limit, k, ...) when calling the K-Means example? I would like to try it on our cluster.

Just out of curiosity, what hardware are you using? Is it the IBM Power cluster at TU Berlin?

Robert

On Sun, Jun 22, 2014 at 1:53 PM, Sebastian Schelter <[hidden email]> wrote:
|
Hi,

I'm using two instances of a VPS, with this input for the program:
- Iterations: 2
- Dimensions: 2 (3 for the Scala example program)
- Number of centers (k): 10

This is my current configuration for the network buffers (I think these are the default values):

    # Number of network buffers (used by each TaskManager)
    taskmanager.network.numberOfBuffers: 2048
    # Size of network buffers
    taskmanager.network.bufferSizeInBytes: 32768

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino

On 22 June 2014 14:19, Robert Metzger <[hidden email]> wrote:

Attachment: ford2.py (302 bytes)
|
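As an aside, the two values Pino quotes pin down exactly how much memory each TaskManager reserves for the network stack; the arithmetic is simple:

```python
number_of_buffers = 2048   # taskmanager.network.numberOfBuffers
buffer_size = 32768        # taskmanager.network.bufferSizeInBytes

total_bytes = number_of_buffers * buffer_size
print(total_bytes // (1024 * 1024), "MiB")  # 64 MiB
```

So with the quoted defaults, 64 MiB per TaskManager goes to network buffers, which is small next to typical heap sizes and consistent with these being default values.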
Thank you. What degree of parallelism are you using when submitting the job?
You can either set it with the "-p" argument or with env.setDegreeOfParallelism(). How much heap space do you assign to the TaskManagers?

On Sun, Jun 22, 2014 at 3:07 PM, José Luis López Pino <[hidden email]> wrote:
|
In reply to this post by Robert Metzger
There was a patch for deadlocks on broadcast variables a few days ago.
Can you try the current master branch (0.6-SNAPSHOT) and see if that solves your problem? |
I think Pino wrote that he is using the latest master.
I just finished running KMeans on a cluster with the following configuration:
- 2 nodes, 18 GB heap space each
- DOP = 32
- 29 MB input data, 10 centers, 15 iterations max.

I also reduced the heap space to 1 GB and both runs worked like a charm. I've added a TODO to my list to also test with more data.

On Sun, Jun 22, 2014 at 3:55 PM, Stephan Ewen <[hidden email]> wrote:
|
Yes, I pulled and compiled the latest master from GitHub.

Thank you for the test, Robert. I'll double-check the configuration of both nodes then; there must be something wrong. I've tried to execute the job with p = 1, 2 and 4.

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino

On 22 June 2014 16:43, Robert Metzger <[hidden email]> wrote:
|
Okay, let us know if you find the solution.

If you want, we can also do a short Google Hangouts session with screen sharing; maybe I'll spot something.

On Sun, Jun 22, 2014 at 5:09 PM, José Luis López Pino <[hidden email]> wrote:
|