Hey All,
I am trying to port some of my streaming jobs from Flink 1.1.4 to 1.2 and I have noticed some very strange segfaults. (I am running in test environments, with the minicluster.) It is a fairly complex job, so I won't go into details, but the interesting part is that adding or removing a simple filter in the wrong place in the topology (such as (e -> true), or anything else, actually) seems to cause frequent segfaults during execution.

Basically, the key part looks something like this:

...
DataStream stream = source.map(...).setParallelism(1).uid("AssignFieldIds").name("AssignFieldIds").startNewChain();
DataStream filtered = input1.filter(t -> true).setParallelism(1);
IterativeStream itStream = filtered.iterate(...);
...

Some notes before the actual error: replacing the filter with a map or other chained transforms also leads to this problem. If the filter is not chained (or if I remove the filter), there is no error.

The error I get looks like this:
https://gist.github.com/gyfora/da713b99b1e85a681ff1a4182f001242

I wonder if anyone has seen something like this before, or has some ideas on how to debug it. The simple workaround is to not chain the filter, but it's still very strange.

Regards,
Gyula
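P.S. In case a concrete picture helps, below is a minimal, self-contained sketch of the un-chained (working) variant. The sequence source, the identity map and the element type are placeholders for the example, not the actual job:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnchainedFilterSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source and identity map; the real job assigns field ids here.
        DataStream<Long> stream = env.generateSequence(0, 100)
                .map(new MapFunction<Long, Long>() {
                    @Override
                    public Long map(Long value) {
                        return value;
                    }
                })
                .setParallelism(1)
                .uid("AssignFieldIds")
                .name("AssignFieldIds")
                .startNewChain();

        // The workaround: keep the trivial filter out of the operator chain,
        // so it runs as its own task instead of being chained to the map.
        DataStream<Long> filtered = stream
                .filter(t -> true)
                .setParallelism(1)
                .disableChaining();

        filtered.print();
        env.execute("unchained-filter-sketch");
    }
}

Calling disableChaining() on the filter is one way to keep it out of the chain; with the filter chained as in the snippet above, the job segfaults.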
Hey!
I am actually a bit puzzled how these segfaults could happen, unless they come via a native library or a JVM bug.

Can you try how it behaves when not using RocksDB, or when using a newer JVM version?

Stephan
Hi,
I tried it with and without RocksDB, no difference. I also tried the newest JDK version; it still fails for that specific topology.

On the other hand, after I made some other changes in the topology (somewhere else), I can't reproduce it anymore, so it seems to be a very specific thing. To me it looks like a JVM problem; if it happens again I will let you know.

Gyula
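P.S. For completeness, switching the state backend between test runs looked roughly like the sketch below; the checkpoint URI and the placeholder pipeline are made up for the example, not taken from the real job:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackendToggleSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        boolean useRocksDB = args.length > 0 && Boolean.parseBoolean(args[0]);
        if (useRocksDB) {
            // RocksDB keeps state in native code; the checkpoint URI is a placeholder.
            env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints"));
        } else {
            // Heap-based backend, used to rule out the native RocksDB code path.
            env.setStateBackend(new MemoryStateBackend());
        }

        // Placeholder pipeline; the actual job is built here instead.
        env.generateSequence(0, 10).print();
        env.execute("backend-toggle-sketch");
    }
}

Both runs produced the same segfault, which is why the native RocksDB library does not look like the culprit.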
I think I saw similar segfaults when rebuilding Flink while Flink daemons were still running from the build-target directory. After the flink-dist jar had been rebuilt and I tried stopping the daemons, they ran into these segfaults.
@Robert is this about "hot code replace" in the JVM?