Calvin Han created FLINK-11595:
----------------------------------

Summary: Gelly addEdge in certain circumstances still include duplicate vertices.
Key: FLINK-11595
URL: https://issues.apache.org/jira/browse/FLINK-11595
Project: Flink
Issue Type: Bug
Components: Gelly
Affects Versions: 1.7.1
Environment: MacOS, intelliJ
Reporter: Calvin Han

Assuming a base graph constructed by:

```
public class GraphCorn {

    public static Graph<String, VertexLabel, EdgeLabel> gc;

    public GraphCorn(String filename) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple6<String, String, String, String, String, String>> csvInput =
            env.readCsvFile(filename)
                .types(String.class, String.class, String.class,
                       String.class, String.class, String.class);

        DataSet<Vertex<String, VertexLabel>> srcTuples = csvInput.project(0, 2)
            .map(new MapFunction<Tuple, Vertex<String, VertexLabel>>() {
                @Override
                public Vertex<String, VertexLabel> map(Tuple tuple) throws Exception {
                    VertexLabel lb = new VertexLabel(Util.hash(tuple.getField(1)));
                    return new Vertex<>(tuple.getField(0), lb);
                }
            }).returns(new TypeHint<Vertex<String, VertexLabel>>(){});

        DataSet<Vertex<String, VertexLabel>> dstTuples = csvInput.project(1, 3)
            .map(new MapFunction<Tuple, Vertex<String, VertexLabel>>() {
                @Override
                public Vertex<String, VertexLabel> map(Tuple tuple) throws Exception {
                    VertexLabel lb = new VertexLabel(Util.hash(tuple.getField(1)));
                    return new Vertex<>(tuple.getField(0), lb);
                }
            }).returns(new TypeHint<Vertex<String, VertexLabel>>(){});

        DataSet<Vertex<String, VertexLabel>> vertexTuples =
            srcTuples.union(dstTuples).distinct(0);

        DataSet<Edge<String, EdgeLabel>> edgeTuples = csvInput.project(0, 1, 4, 5)
            .map(new MapFunction<Tuple, Edge<String, EdgeLabel>>() {
                @Override
                public Edge<String, EdgeLabel> map(Tuple tuple) throws Exception {
                    EdgeLabel lb = new EdgeLabel(Util.hash(tuple.getField(2)),
                                                 Long.parseLong(tuple.getField(3)));
                    return new Edge<>(tuple.getField(0), tuple.getField(1), lb);
                }
            }).returns(new TypeHint<Edge<String, EdgeLabel>>(){});

        this.gc = Graph.fromDataSet(vertexTuples, edgeTuples, env);
    }
}
```

Base graph CSV:

```
0,1,a,b,c,0
0,2,a,d,e,1
1,2,b,d,f,2
```

Attempt to add edges using the following function:

```
try(BufferedReader br = new BufferedReader(new FileReader(this.fileName))) {
    for(String line; (line = br.readLine()) != null; ) {
        String[] attributes = line.split(",");
        assert(attributes.length == 6);
        String srcID = attributes[0];
        String dstID = attributes[1];
        String srcLb = attributes[2];
        String dstLb = attributes[3];
        String edgeLb = attributes[4];
        String ts = attributes[5];

        Vertex<String, VertexLabel> src = new Vertex<>(srcID, new VertexLabel(Util.hash(srcLb)));
        Vertex<String, VertexLabel> dst = new Vertex<>(dstID, new VertexLabel(Util.hash(dstLb)));
        EdgeLabel edge = new EdgeLabel(Util.hash(edgeLb), Long.parseLong(ts));

        GraphCorn.gc = GraphCorn.gc.addEdge(src, dst, edge);
    }
} catch (Exception e) {
    System.err.println(e.getMessage());
}
```

The graph components to add are:

```
0,4,a,d,k,3
1,3,b,a,g,3
2,3,d,a,h,4
```

After these calls, GraphCorn.gc contains duplicates of nodes 0, 1, and 2 (the ones that already exist in the base graph), which should not be the case according to the documentation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
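One way to confirm the reported symptom is to collect the vertex IDs of the resulting graph and list every ID that appears more than once. The sketch below is plain Java with no Flink dependency. The class `DuplicateVertexCheck` and the method `findDuplicateIds` are hypothetical names introduced here, and the sample ID list is hand-written to mirror the report; it is not actual Flink output. In a real job the IDs could presumably be gathered with something like `GraphCorn.gc.getVertices().collect()` followed by `Vertex.getId()` on each element.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Flink or Gelly.
class DuplicateVertexCheck {

    // Returns every ID that occurs more than once in the list, in sorted order.
    static SortedSet<String> findDuplicateIds(List<String> vertexIds) {
        Set<String> seen = new HashSet<>();
        SortedSet<String> duplicates = new TreeSet<>();
        for (String id : vertexIds) {
            // Set.add returns false when the ID was already present.
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hand-written sample mirroring the reported symptom (assumption, not real output).
        List<String> ids = Arrays.asList("0", "1", "2", "0", "1", "2", "3", "4");
        System.out.println(findDuplicateIds(ids)); // prints [0, 1, 2]
    }
}
```

An empty result would mean every vertex ID is unique, which is what the report expects after `addEdge`.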