Hello,
how can Flink be useful for processing data into RDF and building an ontology?

Regards,
Santosh
Hi Santosh,
I'm not aware of any existing Flink tools for processing RDF. However, Flink should be well suited to processing such data: you can probably use an existing Java RDF parser to get the data into the system.

Best,
Robert

On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru wrote:
> Hello,
>
> how can flink be useful for processing the data to RDFs and build the
> ontology?
>
> Regards,
> Santosh
>
> --
> View this message in context:
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html
> Sent from the Apache Flink (Incubator) Mailing List archive at Nabble.com.
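Robert's suggestion is to reuse an existing Java RDF parser (Apache Jena is one such library) to get the data into Flink. As a much smaller illustration, the sketch below assumes input in the simplified comma-separated "subject, predicate, object" form used later in this thread, not real N-Triples or Turtle. The names `SimpleTripleParser` and `Triple` are hypothetical; in a Flink job, `parseLine` would plausibly sit inside a flatMap function over the input lines (so unparsable lines can simply be skipped), with `Triple` replaced by `Tuple3<String, String, String>`.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal parser for the simplified "subject, predicate, object" lines used in
// this thread. NOT a general RDF parser: no prefixes, quoting or escaping.
final class SimpleTripleParser {

    // Hypothetical stand-in for Flink's Tuple3<String, String, String>.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + ", " + predicate + ", " + object;
        }
    }

    // Parses one line; returns null for blank or malformed lines.
    static Triple parseLine(String line) {
        // Limit 3 so that any extra commas stay inside the object part.
        String[] parts = line.split(",", 3);
        if (parts.length != 3) {
            return null;
        }
        String s = parts[0].trim();
        String p = parts[1].trim();
        String o = parts[2].trim();
        if (s.isEmpty() || p.isEmpty() || o.isEmpty()) {
            return null;
        }
        return new Triple(s, p, o);
    }

    // Parses all lines (e.g. TABLE A followed by TABLE B), skipping unparsable ones.
    static List<Triple> parseAll(List<String> lines) {
        List<Triple> triples = new ArrayList<>();
        for (String line : lines) {
            Triple triple = parseLine(line);
            if (triple != null) {
                triples.add(triple);
            }
        }
        return triples;
    }
}
```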
Hey Santosh!
RDF processing often involves either joins or graph-query-like (transitive) operations. Flink is fairly good at both.

I would start by looking at the graph examples and the graph API:

- Graph examples: https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
- Graph API: https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph

If you have a more specific question, I can give you better pointers ;-)

Stephan

On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru wrote:
I have a nice case of RDF manipulation :)
Let's say I have the following RDF triples (Tuple3) in two files or tables:

TABLE A:
http://test/John, type, Person
http://test/John, name, John
http://test/John, knows, http://test/Mary
http://test/John, knows, http://test/Jerry
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Mary, type, Person
http://test/Mary, name, Mary

TABLE B:
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary

What is the best way to build Person-rooted trees that contain all of each node's data properties plus some expanded path such as 'Person.knows.marriedWith'? Is it better to use the Graph/Gelly APIs, Flink joins, multiple point gets from a key/value store, or something else?

The expected 4 trees are:

tree 1 (root is John)
------------------
http://test/John, type, Person
http://test/John, name, John
http://test/John, knows, http://test/Mary
http://test/John, knows, http://test/Jerry
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Mary, type, Person
http://test/Mary, name, Mary
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary

tree 2 (root is Jerry)
------------------
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary
http://test/Mary, type, Person
http://test/Mary, name, Mary

tree 3 (root is Mary)
------------------
http://test/Mary, type, Person
http://test/Mary, name, Mary

tree 4 (root is Frank)
------------------
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary
http://test/Mary, type, Person
http://test/Mary, name, Mary

Thanks in advance,
Flavio

On Mon, Mar 2, 2015 at 5:04 PM, Stephan Ewen wrote:
Hi Flavio,
if you want to use Gelly to model your data as a graph, you can load your Tuple3s as Edges. This will make "http://test/John", "Person", "Frank", etc. vertices, and "type", "name", "knows" edge values. In that case, you can use filterOnEdges() to get the subgraph that contains only the relation edges.

Once you have the graph, you could probably use a vertex-centric iteration to generate the trees. It seems to me that you need something like a BFS from each vertex. Keep in mind that this can be a very costly operation in terms of memory and communication for large graphs.

Let me know if you have any questions!

Cheers,
V.

On 3 March 2015 at 09:13, Flavio Pompermaier wrote:
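Vasiliki's suggestion (keep only the relation edges, then run a BFS from each root) can be checked against Flavio's expected trees with a small single-machine sketch in plain Java. It does not use Gelly: the `isRelation` test plays the role of the `filterOnEdges()` predicate, and the traversal runs locally once per root. Two assumptions are made here: an object is a vertex only if it starts with `http://` (so "Person" and the names are treated as data properties), and `RootedTreesBfs`, `treeFor` and `treesForType` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RootedTreesBfs {

    static final class Triple {
        final String s, p, o; // subject, predicate, object
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // The example data from this thread (TABLE A followed by TABLE B).
    static List<Triple> exampleTriples() {
        return Arrays.asList(
            new Triple("http://test/John", "type", "Person"),
            new Triple("http://test/John", "name", "John"),
            new Triple("http://test/John", "knows", "http://test/Mary"),
            new Triple("http://test/John", "knows", "http://test/Jerry"),
            new Triple("http://test/Jerry", "type", "Person"),
            new Triple("http://test/Jerry", "name", "Jerry"),
            new Triple("http://test/Jerry", "knows", "http://test/Frank"),
            new Triple("http://test/Mary", "type", "Person"),
            new Triple("http://test/Mary", "name", "Mary"),
            new Triple("http://test/Frank", "type", "Person"),
            new Triple("http://test/Frank", "name", "Frank"),
            new Triple("http://test/Frank", "marriedWith", "http://test/Mary"));
    }

    // Assumption: an object is a vertex (relation edge) iff it is an http:// URI;
    // everything else ("Person", "John", ...) is a literal data property.
    static boolean isRelation(Triple t) {
        return t.o.startsWith("http://");
    }

    // BFS from root over relation edges; collects all triples of every visited node.
    static List<Triple> treeFor(String root, Map<String, List<Triple>> bySubject) {
        List<Triple> tree = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (Triple t : bySubject.getOrDefault(node, Collections.emptyList())) {
                tree.add(t);
                // The visited set avoids revisiting nodes and looping on cycles.
                if (isRelation(t) && visited.add(t.o)) {
                    queue.add(t.o);
                }
            }
        }
        return tree;
    }

    // One tree per subject whose "type" is the given type, keyed by root.
    static Map<String, List<Triple>> treesForType(String type, List<Triple> triples) {
        Map<String, List<Triple>> bySubject = new HashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t);
        }
        Map<String, List<Triple>> trees = new LinkedHashMap<>();
        for (Triple t : triples) {
            if (t.p.equals("type") && t.o.equals(type)) {
                trees.put(t.s, treeFor(t.s, bySubject));
            }
        }
        return trees;
    }
}
```

On the example data, `treesForType("Person", exampleTriples())` yields four trees with 12, 8, 2 and 5 triples, matching trees 1 to 4 in Flavio's message. Running one BFS per root is exactly the per-vertex cost Vasiliki warns about: Mary's triples, for instance, are collected again for every one of the four roots.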
Hi to all,
I'm back to this task again :)

To summarize: I have a source dataset that contains RDF "stars" (SubjectURI, RdfType and the list of RDF triples belonging to that subject, i.e. the so-called star schema), and I have to extract some sub-graphs for some RDF types of interest. As described in my previous email, I'd like to expand a root node (if its type is of interest) and explode some of its path(s). For example, if I'm interested in expanding the RDF type Person (as in the example), I might want to create a mini-graph with all of its triples plus those obtained by exploding the paths knows.marriedWith and knows.knows.knows.

At the moment I do this with point gets from HBase, but I don't know whether this could be done more efficiently with other strategies in Flink.

@Vasiliki: you said that I could need "something like a BFS from each vertex". Do you have an example that fits my use case? Is it possible to keep only the vertices I'm interested in?

Thanks in advance,
Flavio

On Tue, Mar 3, 2015 at 8:32 PM, Vasiliki Kalavri wrote:
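Flavio does not spell out the exact semantics of exploding a path such as knows.marriedWith, so the sketch below assumes one reading: the mini-graph contains all triples of the root, plus all triples of every node reached at each hop of the path. It mimics the current approach of one point get per node per hop, with an in-memory `HashMap` standing in for the HBase table; `PathExpansion`, `get` and `expand` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PathExpansion {

    static final class Triple {
        final String s, p, o; // subject, predicate, object
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static List<Triple> exampleTriples() {
        return Arrays.asList(
            new Triple("http://test/John", "type", "Person"),
            new Triple("http://test/John", "name", "John"),
            new Triple("http://test/John", "knows", "http://test/Mary"),
            new Triple("http://test/John", "knows", "http://test/Jerry"),
            new Triple("http://test/Jerry", "type", "Person"),
            new Triple("http://test/Jerry", "name", "Jerry"),
            new Triple("http://test/Jerry", "knows", "http://test/Frank"),
            new Triple("http://test/Mary", "type", "Person"),
            new Triple("http://test/Mary", "name", "Mary"),
            new Triple("http://test/Frank", "type", "Person"),
            new Triple("http://test/Frank", "name", "Frank"),
            new Triple("http://test/Frank", "marriedWith", "http://test/Mary"));
    }

    // In-memory stand-in for the key/value store: subject -> its "star" of triples.
    private final Map<String, List<Triple>> starsBySubject = new HashMap<>();

    PathExpansion(List<Triple> triples) {
        for (Triple t : triples) {
            starsBySubject.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t);
        }
    }

    // One point get: all triples of a single subject (empty if unknown).
    List<Triple> get(String subject) {
        return starsBySubject.getOrDefault(subject, Collections.emptyList());
    }

    // Root's triples, plus all triples of every node reached at each hop of the
    // predicate path (e.g. ["knows", "marriedWith"]). Each node is added once.
    List<Triple> expand(String root, List<String> path) {
        List<Triple> result = new ArrayList<>();
        Set<String> included = new HashSet<>();
        include(root, result, included);
        Set<String> frontier = new LinkedHashSet<>(Collections.singletonList(root));
        for (String predicate : path) {
            Set<String> next = new LinkedHashSet<>();
            for (String node : frontier) {
                for (Triple t : get(node)) { // one get per frontier node
                    if (t.p.equals(predicate)) {
                        next.add(t.o);
                    }
                }
            }
            for (String node : next) {
                include(node, result, included); // one get per newly reached node
            }
            frontier = next;
        }
        return result;
    }

    private void include(String node, List<Triple> result, Set<String> included) {
        if (included.add(node)) {
            result.addAll(get(node));
        }
    }
}
```

Under this reading, expanding knows.marriedWith from Jerry returns the same 8 triples as tree 2, expanding it from John returns 9 (John, Mary and Jerry, since neither Mary nor Jerry has a marriedWith edge), and knows.knows.knows from John reaches Frank after two hops, giving 12. The number of gets grows with the number of roots times the nodes on each path, which is why batching the hops as joins, as Stephan suggests later in the thread, is attractive.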
Hi Flavio!
I see two initial ways of doing this:

1) Do a series of joins. Start with your subject and join two or three times on "object of triple == subject", where each join makes one hop. If you are only interested in a specific relationship, you can filter the triples by verb (predicate) beforehand.

2) If you want to recursively explode the subgraph (something like all reachable subjects) or do a rather long series of hops, you should be able to model this nicely as a delta iteration or as a vertex-centric graph computation. For that, you can use either "Gelly" (the graph library) or the standalone Spargel operator (Giraph-like).

Does that help with your questions?

Greetings,
Stephan

On Thu, Mar 19, 2015 at 2:57 PM, Flavio Pompermaier wrote:
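Stephan's first option can be sketched as follows. The key difference from point gets is that each hop processes the (root, node) pairs of all roots at once: the triples are first filtered to one predicate, then joined on node == subject. The sketch emulates that equi-join with an in-memory hash join in plain Java; in a Flink job each hop would presumably be a DataSet join of the frontier with the filtered triples (`where` on the node field, `equalTo` on the subject field). `JoinHops`, `Pair`, `hop` and `followPath` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinHops {

    static final class Triple {
        final String s, p, o; // subject, predicate, object
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // (root, node) means "node has been reached from root".
    static final class Pair {
        final String root, node;
        Pair(String root, String node) { this.root = root; this.node = node; }
    }

    static List<Triple> exampleTriples() {
        return Arrays.asList(
            new Triple("http://test/John", "type", "Person"),
            new Triple("http://test/John", "name", "John"),
            new Triple("http://test/John", "knows", "http://test/Mary"),
            new Triple("http://test/John", "knows", "http://test/Jerry"),
            new Triple("http://test/Jerry", "type", "Person"),
            new Triple("http://test/Jerry", "name", "Jerry"),
            new Triple("http://test/Jerry", "knows", "http://test/Frank"),
            new Triple("http://test/Mary", "type", "Person"),
            new Triple("http://test/Mary", "name", "Mary"),
            new Triple("http://test/Frank", "type", "Person"),
            new Triple("http://test/Frank", "name", "Frank"),
            new Triple("http://test/Frank", "marriedWith", "http://test/Mary"));
    }

    // One hop: filter triples to one predicate, then equi-join node == subject.
    static List<Pair> hop(List<Pair> frontier, List<Triple> triples, String predicate) {
        // Build side of a hash join: subject -> objects, only for the wanted predicate.
        Map<String, List<String>> objectsBySubject = new HashMap<>();
        for (Triple t : triples) {
            if (t.p.equals(predicate)) {
                objectsBySubject.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t.o);
            }
        }
        // Probe side: every (root, node) pair emits (root, object) for each match.
        List<Pair> next = new ArrayList<>();
        for (Pair pair : frontier) {
            for (String object : objectsBySubject.getOrDefault(pair.node, Collections.emptyList())) {
                next.add(new Pair(pair.root, object));
            }
        }
        return next;
    }

    // All roots of the given type start at themselves; then one join per path step.
    static List<Pair> followPath(String type, List<String> path, List<Triple> triples) {
        List<Pair> frontier = new ArrayList<>();
        for (Triple t : triples) {
            if (t.p.equals("type") && t.o.equals(type)) {
                frontier.add(new Pair(t.s, t.s));
            }
        }
        for (String predicate : path) {
            frontier = hop(frontier, triples, predicate);
        }
        return frontier;
    }
}
```

For type Person and path knows.marriedWith, the only surviving pair is (Jerry, Mary), because Frank, the only node with a marriedWith edge, is known only by Jerry. A final join of the reached pairs with all triples on node == subject would then gather each root's mini-graph; if several paths reach the same node, a `distinct()` on the pairs would remove the duplicates.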
Hi Stephan,
thanks for the response. Unfortunately I'm not familiar with the new Gelly APIs or the old Spargel ones (I still don't actually understand the difference). Do you think it would be possible to add such an example to the documentation/examples?

Best,
Flavio

On Sat, Mar 21, 2015 at 7:48 PM, Stephan Ewen wrote:
Gelly has a section in the docs that should explain vertex-centric iterations. Is that not extensive enough?

On 22.03.2015 12:04, "Flavio Pompermaier" wrote:
> > > > > > > > > > > > I would look into the graph examples and the graph API for a > start: > > > > > > > > > > > > - Graph examples: > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph > > > > > > - Graph API: > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph > > > > > > > > > > > > If you have a more specific question, I can give you better > > pointers > > > > ;-) > > > > > > > > > > > > Stephan > > > > > > > > > > > > > > > > > > On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru < > > [hidden email] > > > > > > > > > > wrote: > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > how can flink be useful for processing the data to RDFs and > build > > > the > > > > > > > ontology? > > > > > > > > > > > > > > Regards, > > > > > > > Santosh > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > View this message in context: > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html > > > > > > > Sent from the Apache Flink (Incubator) Mailing List archive. > > > mailing > > > > > list > > > > > > > archive at Nabble.com. > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
In reply to this post by Flavio Pompermaier
Hi Flavio,
also, Gelly is a superset of Spargel. It provides the same features and much more.

Since RDF is graph-structured, Gelly might be a good fit for your use case.

Cheers, Fabian |
Is there any example of RDF graph generation based on a skeleton structure?

On Mar 22, 2015 12:28 PM, "Fabian Hueske" <[hidden email]> wrote: |
Hi Flavio,
We don't have a specific example for generating RDF graphs using Gelly, but I will try to drop some lines of code here and hope you will find them useful.

An RDF statement is formed of Subject - Predicate - Object triples. In Edge notation, the Subject and the Object will be the source and target vertices respectively, while the edge value will be the predicate.

This being said, say you have an input plain text file that represents the edges. A line would look like this: http://test/Frank, marriedWith, http://test/Mary

In Flink this maps naturally to a Tuple3. So, to read this edge file you should have something like this:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.graph.Edge;
    import org.apache.flink.graph.Graph;
    import org.apache.flink.types.NullValue;

    // fileOutput, edgesInputPath and getDefaultEdges(env) follow the conventions of the
    // Gelly examples and are assumed to be defined elsewhere in the program.
    private static DataSet<Edge<String, String>> getEdgesDataSet(ExecutionEnvironment env) {
        if (fileOutput) {
            return env.readCsvFile(edgesInputPath)
                    .lineDelimiter("\n")
                    // the subject, predicate, object
                    .types(String.class, String.class, String.class)
                    .map(new MapFunction<Tuple3<String, String, String>, Edge<String, String>>() {
                        @Override
                        public Edge<String, String> map(Tuple3<String, String, String> tuple3) throws Exception {
                            // source = subject, target = object, edge value = predicate;
                            // trim() drops the blank that follows each comma in the sample line
                            return new Edge<String, String>(tuple3.f0.trim(), tuple3.f2.trim(), tuple3.f1.trim());
                        }
                    });
        } else {
            return getDefaultEdges(env);
        }
    }

After you have this, in your main method, you just write:

    // vertex IDs are the subject/object strings; vertices created this way carry a NullValue
    Graph<String, NullValue, String> rdfGraph = Graph.fromDataSet(edges, env);

I picked up the conversation later on, not sure if that's what you meant by "graph generation"...

Cheers,
Andra

On Sun, Mar 22, 2015 at 12:42 PM, Flavio Pompermaier <[hidden email]> wrote: |
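The triple-to-edge mapping Andra describes (subject as source, object as target, predicate as edge value) can be checked in isolation before wiring it into a Flink job. This is a minimal plain-Java sketch, not Flink or Gelly code: the names RdfEdges and RdfEdge are hypothetical, and it assumes comma-separated lines with three fields where only the object may itself contain commas.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the triple-to-edge mapping (no Flink involved). */
final class RdfEdges {

    /** Hypothetical stand-in for Gelly's Edge<String, String>: (source, target, value). */
    record RdfEdge(String source, String target, String predicate) {}

    /** Turns "subject, predicate, object" into the edge subject -> object labelled by predicate. */
    static RdfEdge parseLine(String line) {
        // Limit 3: anything after the second comma stays in the object field.
        String[] parts = line.split(",", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected subject, predicate, object: " + line);
        }
        // trim() removes the blank that follows each comma in the sample data.
        return new RdfEdge(parts[0].trim(), parts[2].trim(), parts[1].trim());
    }

    /** Parses every non-blank line; blank lines are skipped. */
    static List<RdfEdge> parseAll(List<String> lines) {
        List<RdfEdge> edges = new ArrayList<>();
        for (String line : lines) {
            if (!line.isBlank()) {
                edges.add(parseLine(line));
            }
        }
        return edges;
    }
}
```

Inside a Flink job, the body of parseLine corresponds to what the MapFunction does with each Tuple3.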
Thanks Andra for the help!
By graph generation I mean that I'd like to materialize subgraphs of my overall knowledge, starting from some root nodes whose RDF type is of interest (something similar to what JSON-LD does). Is that clear?

On Mar 22, 2015 1:09 PM, "Andra Lungu" <[hidden email]> wrote: |
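Picking the root nodes "whose RDF type is of interest" is a plain selection over the triples. A minimal plain-Java sketch, assuming (as in the sample data of this thread) that the type is carried by a predicate literally named "type"; a real RDF dataset would use the rdf:type URI instead. The names RootSelection and Triple are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java sketch: pick the root subjects whose type is one of the types of interest. */
final class RootSelection {

    record Triple(String subject, String predicate, String object) {}

    static Set<String> selectRoots(List<Triple> triples, String typePredicate, Set<String> typesOfInterest) {
        // LinkedHashSet keeps first-seen order and drops duplicate subjects.
        Set<String> roots = new LinkedHashSet<>();
        for (Triple t : triples) {
            if (t.predicate().equals(typePredicate) && typesOfInterest.contains(t.object())) {
                roots.add(t.subject());
            }
        }
        return roots;
    }
}
```

On the sample triples, selecting type Person returns John, Jerry, Mary and Frank, i.e. the roots of the four expected trees.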
Hi Flavio,
I'm not familiar with JSON-LD, but as far as I understand, you want to generate some trees from selected root nodes.

Once you have created the Graph as Andra describes above, you can first filter out the edges that are of no interest to you, using filterOnEdges. There is a description of how edge filtering works in the Gelly docs [1]. Then, you could use a vertex-centric iteration and propagate a message from the selected root node to the neighbors recursively, until you have the tree.

In the vertex-centric model, you program from the perspective of a vertex in the graph. You basically need to define what each vertex does within each iteration (superstep). In Gelly this boils down to two things:
(a) what messages this vertex will send to its neighbors and
(b) how a vertex will update its value using the received messages.

This is also described in the Gelly docs [2]. Also, take a look at the Gelly library [3]. The library methods are implemented using this model and should give you an idea.

In your case, you will probably need to simply propagate one message from the root node and gather the newly discovered neighbors in each superstep.

I hope this helps! Let us know if you have further questions!

-Vasia.

[1]: http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations
[2]: http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations
[3]: http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library |
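Gelly's filterOnEdges takes a filter function over the graph's edges; the idea can be mimicked in plain Java with a java.util.function.Predicate to see which edges survive before any iteration runs. This is a hypothetical sketch, not Gelly code: EdgeFilter, RdfEdge and predicateIn are illustrative names, and keeping edges by predicate name is one possible filtering rule among many.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Plain-Java analogue of filtering a graph's edges before the iteration. */
final class EdgeFilter {

    record RdfEdge(String source, String target, String predicate) {}

    /** Keeps only the edges accepted by keep, in their original order. */
    static List<RdfEdge> filterOnEdges(List<RdfEdge> edges, Predicate<RdfEdge> keep) {
        return edges.stream().filter(keep).collect(Collectors.toList());
    }

    /** Accepts an edge only if its predicate (the edge value) is in the given set. */
    static Predicate<RdfEdge> predicateIn(Set<String> predicates) {
        return e -> predicates.contains(e.predicate());
    }
}
```

For a path like Person.knows.marriedWith, keeping only knows and marriedWith edges removes the type and name edges that point to literals.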
Thanks Vasiliki,
when I find the time, I'll try to make a quick prototype using the pointers you suggested!

Thanks for the support,
Flavio

On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri <[hidden email]> wrote: |
Hi to all,
I made a simple RDF Gelly test and shared it on my GitHub repo at https://github.com/fpompermaier/rdf-gelly-test.
I basically set up the Gelly part, but I can't proceed to compute the drafted TODOs. Could someone help me implement them? I think this could become a nice example of how Gelly can help in handling RDF graphs :)

Best,
Flavio

On Mon, Mar 23, 2015 at 10:41 AM, Flavio Pompermaier <[hidden email]> wrote: |
Hi Flavio,
I'm not quite familiar with RDF or SPARQL, so not all of your code is clear to me.

Your first TODO is "compute subgraph for Person". Is "Person" a vertex ID in your graph? A vertex value? And by "subgraph of Person", do you mean all the vertices that can be reached starting from this node and following the graph edges?

-Vasia.

On 14 April 2015 at 10:37, Flavio Pompermaier <[hidden email]> wrote: |
Hi Vasia,
by "compute subgraph for Person" I mean exactly all the vertices that can be reached starting from this node and following the graph edges. I drafted the graph as a set of vertices (where the ID is the subject of a set of triples and the value is all of its triples) and a set of edges (properties connecting two vertices, which is only possible if the object is a URI).

Thus, once the subgraph of a Person is computed, merging the values of all reachable vertices gives me all the triples of that subgraph.

On Tue, Apr 14, 2015 at 4:55 PM, Vasiliki Kalavri <[hidden email]> wrote: |
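Flavio's data model (vertex value = all triples sharing a subject, edges only where the object is a URI, subgraph = merged values of the reachable vertices) can be sketched in plain Java. The names are hypothetical, and the URI check only looks for an http:// or https:// prefix: an assumption that fits the sample data but not RDF in general (blank nodes, other IRI schemes). The reachable set is taken as input, since computing it is the job of the iteration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the "star" vertex model described above (not Gelly code). */
final class StarGraph {

    record Triple(String subject, String predicate, String object) {}

    record Edge(String source, String target, String predicate) {}

    /** Vertex value = all triples sharing the same subject, keyed by that subject. */
    static Map<String, List<Triple>> vertexValues(List<Triple> triples) {
        Map<String, List<Triple>> values = new LinkedHashMap<>();
        for (Triple t : triples) {
            values.computeIfAbsent(t.subject(), k -> new ArrayList<>()).add(t);
        }
        return values;
    }

    /** Edges only for triples whose object looks like a URI; literals stay inside vertex values. */
    static List<Edge> edges(List<Triple> triples) {
        List<Edge> edges = new ArrayList<>();
        for (Triple t : triples) {
            if (t.object().startsWith("http://") || t.object().startsWith("https://")) {
                edges.add(new Edge(t.subject(), t.object(), t.predicate()));
            }
        }
        return edges;
    }

    /** Concatenates the values of the reachable vertices: the triples of the subgraph. */
    static List<Triple> mergeValues(Map<String, List<Triple>> values, Collection<String> reachable) {
        List<Triple> merged = new ArrayList<>();
        for (String id : reachable) {
            merged.addAll(values.getOrDefault(id, List.of()));
        }
        return merged;
    }
}
```

For tree 4 earlier in the thread (root Frank), the reachable set is Frank and Mary, and merging their values yields Frank's three triples plus Mary's two.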
Ok, so, exactly as I wrote a few e-mails back in this thread, you can do this with a vertex-centric iteration :-)

All you need to do is call "myGraph.runVertexCentricIteration(new MyUpdateFunction(), new MyMessagingFunction(), maxIterations)" and define MyUpdateFunction and MyMessagingFunction. The first function defines how a vertex updates its value based on the received messages, while the second defines what messages a vertex sends in each superstep. Inside both functions, you have access to the vertex ID and value, so you can check whether it's the vertex you're interested in.

In your case, in the first superstep, the Person vertex sends a message to its neighbors. You can do this with something like the following inside the sendMessages() method:

    for (Edge<K, V> edge : getOutgoingEdges()) {
        sendMessageTo(edge.getTarget(), msg);
    }

The rest of the vertices don't need to do anything in the first superstep. In the next supersteps, the vertices which have received a message propagate it to their neighbors in the same way.

One thing you need to be careful about is detecting cycles, so that the iteration terminates. One way to do this is to mark the vertices you visit, e.g. by setting a flag in the vertex value, and not propagating messages from a "visited" vertex.

If you are totally unfamiliar with the vertex-centric model, it might be a good idea to first do some reading on this, in order to understand how it works; for example, take a look at the Pregel paper [1].

Let us know how it goes!

Cheers,
-Vasia.

[1]: http://kowshik.github.io/JPregel/pregel_paper.pdf

On 14 April 2015 at 18:12, Flavio Pompermaier <[hidden email]> wrote: |
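Vasia's recipe (the root sends in the first superstep, newly reached vertices forward the message, a visited flag stops cycles) can be simulated on a single machine to check the logic before writing the Gelly functions. This is a hypothetical plain-Java sketch, not Gelly's API: the two loop bodies play the roles of the messaging function and the update function, the vertex value is reduced to the visited flag, and vertex IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Single-machine simulation of the superstep scheme described above (not Gelly code). */
final class SuperstepReachability {

    record Edge(String source, String target) {}

    static Set<String> reachableFrom(String root, List<Edge> edges, int maxIterations) {
        // Outgoing adjacency, i.e. what getOutgoingEdges() would expose for each vertex.
        Map<String, List<String>> out = new HashMap<>();
        for (Edge e : edges) {
            out.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e.target());
        }
        Set<String> visited = new LinkedHashSet<>();
        visited.add(root);
        // Superstep 1: only the root vertex sends messages.
        Set<String> senders = Set.of(root);
        for (int step = 1; step <= maxIterations && !senders.isEmpty(); step++) {
            // "Messaging function": every active vertex messages all of its out-neighbours.
            Set<String> inbox = new HashSet<>();
            for (String v : senders) {
                inbox.addAll(out.getOrDefault(v, List.of()));
            }
            // "Update function": an unvisited vertex that got a message sets its flag and
            // sends in the next superstep; visited vertices stay silent, so cycles terminate.
            Set<String> next = new HashSet<>();
            for (String v : inbox) {
                if (visited.add(v)) {
                    next.add(v);
                }
            }
            senders = next;
        }
        return visited;
    }
}
```

With the sample data, starting from Jerry yields Jerry, Frank and Mary, the vertices of tree 2; adding a Mary-to-Frank edge creates a cycle, and the loop still stops because Frank is already visited.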