Queries regarding RDFs with Flink

Queries regarding RDFs with Flink

santosh_rajaguru
Hello,

How can Flink be useful for processing data into RDF and building an ontology?

Regards,
Santosh




Re: Queries regarding RDFs with Flink

Robert Metzger
Hi Santosh,

I'm not aware of any existing tools in Flink to process RDFs. However,
Flink should be useful for processing such data.
You can probably use an existing RDF parser for Java to get the data into
the system.
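
As a rough illustration of getting triples into a program, the sketch below parses a simplified one-triple-per-line text format ("subject, predicate, object") with plain Java. The TripleLines class and the comma-separated format are assumptions made for this example only. Real RDF serializations such as N-Triples or Turtle are better handled by a dedicated parser library, whose output could then be handed to Flink as three-field records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: turns simplified "subject, predicate, object" lines into triples. */
final class TripleLines {

    /** Splits one line into [subject, predicate, object]; the object keeps any further commas. */
    static String[] parseLine(String line) {
        String[] parts = line.split(",", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not a triple: " + line);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    /** Parses every non-blank line and skips the blank ones. */
    static List<String[]> parseAll(List<String> lines) {
        List<String[]> triples = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                triples.add(parseLine(line));
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String[]> triples = parseAll(Arrays.asList(
                "http://test/John, type, Person",
                "http://test/John, knows, http://test/Mary"));
        for (String[] t : triples) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

Note that this naive split would break on subjects or predicates that contain commas, which is one more reason to prefer a real parser for real data.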

Best,
Robert

On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru <[hidden email]> wrote:


Re: Queries regarding RDFs with Flink

Stephan Ewen
In reply to this post by santosh_rajaguru
Hey Santosh!

RDF processing often involves either joins or graph-query-like (transitive)
operations. Flink is fairly good at both types of operations.

I would look into the graph examples and the graph API for a start:

 - Graph examples:
https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
 - Graph API:
https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph

If you have a more specific question, I can give you better pointers ;-)

Stephan


On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru <[hidden email]> wrote:


Re: Queries regarding RDFs with Flink

Flavio Pompermaier
I have a nice case of RDF manipulation :)
Let's say I have the following RDF triples (Tuple3) in two files or tables:

TABLE A:
http://test/John, type, Person
http://test/John, name, John
http://test/John, knows, http://test/Mary
http://test/John, knows, http://test/Jerry
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Mary, type, Person
http://test/Mary, name, Mary

TABLE B:
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary

What is the best way to build Person-rooted trees containing every node's data
properties plus some expanded paths such as 'Person.knows.marriedWith'?
Is it better to use the Graph/Gelly APIs, Flink joins, multiple point lookups
(gets) against a key/value store, or something else?

The expected 4 trees should be:

tree 1 (root is John) ------------------
http://test/John, type, Person
http://test/John, name, John
http://test/John, knows, http://test/Mary
http://test/John, knows, http://test/Jerry
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Mary, type, Person
http://test/Mary, name, Mary
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary

tree 2 (root is Jerry) ------------------
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary
http://test/Mary, type, Person
http://test/Mary, name, Mary

tree 3 (root is Mary) ------------------
http://test/Mary, type, Person
http://test/Mary, name, Mary

tree 4 (root is Frank) ------------------
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary
http://test/Mary, type, Person
http://test/Mary, name, Mary

Thanks in advance,
Flavio

On Mon, Mar 2, 2015 at 5:04 PM, Stephan Ewen <[hidden email]> wrote:


Re: Queries regarding RDFs with Flink

Vasiliki Kalavri
Hi Flavio,

if you want to use Gelly to model your data as a graph, you can load your
Tuple3s as Edges.
This will result in "http://test/John", "Person", "Frank", etc. becoming
vertices, and "type", "name", "knows" becoming edge values.
In the first case, you can use filterOnEdges() to get the subgraph with the
relation edges.
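
A minimal sketch of the modelling described above, written with plain Java collections rather than Gelly so that it runs without a Flink dependency. The Edge class and the filterOnEdgeValue helper are hypothetical stand-ins for Gelly's edge type and the filterOnEdges() call, and treating "knows" and "marriedWith" as the relation predicates is an assumption based on the example data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TripleEdges {

    /** Minimal stand-in for a graph edge: source vertex, target vertex, edge value. */
    static final class Edge {
        final String source;
        final String target;
        final String value;

        Edge(String source, String target, String value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /** Subject becomes the source vertex, object the target vertex, predicate the edge value. */
    static List<Edge> toEdges(List<String[]> triples) {
        List<Edge> edges = new ArrayList<>();
        for (String[] t : triples) {
            edges.add(new Edge(t[0], t[2], t[1]));
        }
        return edges;
    }

    /** Keeps only the edges whose value is one of the given relation predicates. */
    static List<Edge> filterOnEdgeValue(List<Edge> edges, Set<String> relations) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (relations.contains(e.value)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
                new String[] {"http://test/John", "type", "Person"},
                new String[] {"http://test/John", "name", "John"},
                new String[] {"http://test/John", "knows", "http://test/Mary"},
                new String[] {"http://test/Frank", "marriedWith", "http://test/Mary"});
        Set<String> relations = new HashSet<String>(Arrays.asList("knows", "marriedWith"));
        for (Edge e : filterOnEdgeValue(toEdges(triples), relations)) {
            System.out.println(e.source + " -[" + e.value + "]-> " + e.target);
        }
    }
}
```

After filtering, literal-valued vertices such as "Person" or "John" no longer have incoming edges, so only the person-to-person relations remain for traversal.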

Once you have the graph, you could probably use a vertex-centric iteration
to generate the trees.
It seems to me that you need something like a BFS from each vertex. Keep in
mind that this can be a very costly operation in terms of memory and
communication for large graphs.
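
The BFS idea can be sketched sequentially as follows. This is plain Java rather than a Gelly vertex-centric iteration, so it only shows the logic, under the assumption that an object counts as a vertex to visit whenever it also appears as a subject. The class and method names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PersonTrees {

    static String[] t(String s, String p, String o) {
        return new String[] {s, p, o};
    }

    /** The triples of TABLE A and TABLE B from the example. */
    static List<String[]> sample() {
        return Arrays.asList(
                t("http://test/John", "type", "Person"),
                t("http://test/John", "name", "John"),
                t("http://test/John", "knows", "http://test/Mary"),
                t("http://test/John", "knows", "http://test/Jerry"),
                t("http://test/Jerry", "type", "Person"),
                t("http://test/Jerry", "name", "Jerry"),
                t("http://test/Jerry", "knows", "http://test/Frank"),
                t("http://test/Mary", "type", "Person"),
                t("http://test/Mary", "name", "Mary"),
                t("http://test/Frank", "type", "Person"),
                t("http://test/Frank", "name", "Frank"),
                t("http://test/Frank", "marriedWith", "http://test/Mary"));
    }

    /** Groups triples by subject, preserving input order. */
    static Map<String, List<String[]>> bySubject(List<String[]> triples) {
        Map<String, List<String[]>> index = new LinkedHashMap<>();
        for (String[] triple : triples) {
            index.computeIfAbsent(triple[0], k -> new ArrayList<>()).add(triple);
        }
        return index;
    }

    /** Breadth-first expansion from one root: the triples of every reachable subject. */
    static List<String[]> expand(String root, Map<String, List<String[]>> index) {
        List<String[]> tree = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String subject = queue.poll();
            for (String[] triple : index.getOrDefault(subject, Collections.emptyList())) {
                tree.add(triple);
                // An object that is itself a known subject is a vertex to visit next.
                if (index.containsKey(triple[2]) && visited.add(triple[2])) {
                    queue.add(triple[2]);
                }
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> index = bySubject(sample());
        for (String root : index.keySet()) {
            System.out.println(root + " -> " + expand(root, index).size() + " triples");
        }
    }
}
```

Each root keeps its own visited set and output list, which hints at why running this from every vertex becomes expensive in memory and communication on large graphs.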

Let me know if you have any questions!

Cheers,
V.

On 3 March 2015 at 09:13, Flavio Pompermaier <[hidden email]> wrote:


Re: Queries regarding RDFs with Flink

Flavio Pompermaier
Hi to all,
I'm back to this task again :)

Summarizing again: I have a source dataset that contains RDF "stars"
(SubjectURI, RdfType and a list of RDF triples belonging to this subject,
a.k.a. the star schema),
and I have to extract sub-graphs for some RDF types of interest.
As described in the previous email, I'd like to expand some root node (if
its type is of interest) and explode some of its path(s).
For example, if I'm interested in expanding the RDF type Person (as in
the example), I might want to create a mini-graph with all of its triples
plus those obtained by exploding the path(s)
knows.marriedWith and knows.knows.knows.
At the moment I do it with point lookups (gets) against HBase, but I haven't
understood whether this could be done more efficiently with other strategies in
Flink.
@Vasiliki: you said that I might need "something like a BFS from each
vertex". Do you have an example that could fit my use case? Is it possible
to restrict it to only the vertices I'm interested in?
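
For reference, the point-lookup strategy described above might look roughly like this. An in-memory map stands in for the HBase table (an assumption; no HBase client calls are shown), and the Star class, the "type" predicate name and the helper methods are hypothetical. Only one predicate path is expanded per call; several paths could be handled by taking the union of the results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StarLookup {

    /** One "star": a subject, its RDF type, and all triples that have it as subject. */
    static final class Star {
        final String subject;
        String type; // stays null until a "type" triple is seen
        final List<String[]> triples = new ArrayList<>();

        Star(String subject) {
            this.subject = subject;
        }
    }

    static String[] t(String s, String p, String o) {
        return new String[] {s, p, o};
    }

    /** The triples of TABLE A and TABLE B from the earlier example. */
    static List<String[]> sample() {
        return Arrays.asList(
                t("http://test/John", "type", "Person"),
                t("http://test/John", "name", "John"),
                t("http://test/John", "knows", "http://test/Mary"),
                t("http://test/John", "knows", "http://test/Jerry"),
                t("http://test/Jerry", "type", "Person"),
                t("http://test/Jerry", "name", "Jerry"),
                t("http://test/Jerry", "knows", "http://test/Frank"),
                t("http://test/Mary", "type", "Person"),
                t("http://test/Mary", "name", "Mary"),
                t("http://test/Frank", "type", "Person"),
                t("http://test/Frank", "name", "Frank"),
                t("http://test/Frank", "marriedWith", "http://test/Mary"));
    }

    /** In-memory stand-in for the key/value store, keyed by subject URI. */
    static Map<String, Star> buildStore(List<String[]> triples) {
        Map<String, Star> store = new LinkedHashMap<>();
        for (String[] triple : triples) {
            Star star = store.computeIfAbsent(triple[0], Star::new);
            star.triples.add(triple);
            if (triple[1].equals("type")) {
                star.type = triple[2];
            }
        }
        return store;
    }

    /** Selects the subjects whose type is of interest. */
    static List<String> rootsOfType(Map<String, Star> store, String type) {
        List<String> roots = new ArrayList<>();
        for (Star star : store.values()) {
            if (type.equals(star.type)) {
                roots.add(star.subject);
            }
        }
        return roots;
    }

    /** Expands one predicate path from a root, with one point lookup per followed edge. */
    static List<String[]> expand(String root, List<String> path, Map<String, Star> store) {
        Set<String> included = new LinkedHashSet<>();
        List<String> frontier = new ArrayList<>();
        if (store.containsKey(root)) {
            included.add(root);
            frontier.add(root);
        }
        for (String predicate : path) {
            List<String> next = new ArrayList<>();
            for (String subject : frontier) {
                for (String[] triple : store.get(subject).triples) {
                    // Point lookup: follow the edge only if the object is itself a stored star.
                    if (triple[1].equals(predicate) && store.get(triple[2]) != null) {
                        included.add(triple[2]);
                        next.add(triple[2]);
                    }
                }
            }
            frontier = next;
        }
        List<String[]> result = new ArrayList<>();
        for (String subject : included) {
            result.addAll(store.get(subject).triples);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Star> store = buildStore(sample());
        for (String root : rootsOfType(store, "Person")) {
            int size = expand(root, Arrays.asList("knows", "marriedWith"), store).size();
            System.out.println(root + " -> " + size + " triples");
        }
    }
}
```

The cost of this approach is one store round trip per followed edge, which is the part a bulk strategy (joins or an iteration) would try to avoid.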

Thanks in advance,
Flavio


On Tue, Mar 3, 2015 at 8:32 PM, Vasiliki Kalavri <[hidden email]>
wrote:


Re: Queries regarding RDFs with Flink

Stephan Ewen
Hi Flavio!

I see initially two ways of doing this:

1) Do a series of joins. You start with your subject and join two or three
times using "object-from-triple == subject" to make one hop each time. You can
filter the triples by verb (predicate) beforehand if you are only interested in a
particular relationship.

2) If you want to recursively explode the subgraph (something like all
reachable subjects) or do a rather long series of hops, then you should be
able to model this nicely as a delta iteration, or as a vertex-centric
graph computation. For that, you can use either "Gelly" (the graph library)
or the standalone Spargel operator (Giraph-like).
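
The first option can be sketched with plain Java collections. Each hop is a hash join between (root, current) pairs and the triples filtered to one predicate; the JoinHops class and its method names are hypothetical. In Flink's DataSet API each hop would presumably become a join between two data sets keyed on the same fields, but that mapping is an assumption and is not shown here. The sketch returns only the (root, reached subject) pairs, not the triples of the nodes along the way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinHops {

    /** One hop: joins (root, current) pairs with triples on current == subject, for one predicate. */
    static List<String[]> hop(List<String[]> pairs, List<String[]> triples, String predicate) {
        // Build side: index the filtered triples by subject.
        Map<String, List<String>> objectsBySubject = new HashMap<>();
        for (String[] triple : triples) {
            if (triple[1].equals(predicate)) {
                objectsBySubject.computeIfAbsent(triple[0], k -> new ArrayList<>()).add(triple[2]);
            }
        }
        // Probe side: each (root, current) pair emits (root, object) for every match.
        List<String[]> next = new ArrayList<>();
        for (String[] pair : pairs) {
            List<String> objects = objectsBySubject.get(pair[1]);
            if (objects != null) {
                for (String object : objects) {
                    next.add(new String[] {pair[0], object});
                }
            }
        }
        return next;
    }

    /** Follows a predicate path such as knows.marriedWith from every given root. */
    static List<String[]> followPath(List<String> roots, List<String[]> triples, List<String> path) {
        List<String[]> pairs = new ArrayList<>();
        for (String root : roots) {
            pairs.add(new String[] {root, root});
        }
        for (String predicate : path) {
            pairs = hop(pairs, triples, predicate);
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
                new String[] {"http://test/John", "knows", "http://test/Mary"},
                new String[] {"http://test/John", "knows", "http://test/Jerry"},
                new String[] {"http://test/Jerry", "knows", "http://test/Frank"},
                new String[] {"http://test/Frank", "marriedWith", "http://test/Mary"});
        List<String> roots = Arrays.asList(
                "http://test/John", "http://test/Jerry", "http://test/Mary", "http://test/Frank");
        for (String[] pair : followPath(roots, triples, Arrays.asList("knows", "marriedWith"))) {
            System.out.println(pair[0] + " reaches " + pair[1]);
        }
    }
}
```

Because the number of joins equals the path length, this style suits short, fixed paths; for unbounded reachability the iterative options in 2) are the better fit.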

Does that help with your questions?

Greetings,
Stephan


On Thu, Mar 19, 2015 at 2:57 PM, Flavio Pompermaier <[hidden email]>
wrote:


Re: Queries regarding RDFs with Flink

Flavio Pompermaier
Hi Stephan,
thanks for the response. Unfortunately I'm not familiar with the new Gelly
APIs and the old Spargel ones (I still don't understand the difference
actually).
Do you think it is possible to add such an example to the
documentation/examples?

Best,
Flavio



On Sat, Mar 21, 2015 at 7:48 PM, Stephan Ewen <[hidden email]> wrote:


Re: Queries regarding RDFs with Flink

Stephan Ewen
Gelly has a section in the docs that should explain the vertex-centric
iterations. Is that not extensive enough?
On 22.03.2015 at 12:04, "Flavio Pompermaier" <[hidden email]> wrote:

> > > > > http://test/Frank, name, Frank
> > > > > http://test/Frank, marriedWith, http://test/Mary
> > > > >
> > > > > What is the best way to build up Person-rooted trees with all
> node's
> > > data
> > > > > properties and some expanded path like 'Person.knows.marriedWith' ?
> > > > > Is it better to use Graph/Gelly APIs, Flink Joins, multiple
> punctuals
> > > get
> > > > > from a Key/value store or what?
> > > > >
> > > > > The expected 4 trees should be:
> > > > >
> > > > > tree 1 (root is John) ------------------
> > > > > http://test/John, type, Person
> > > > > http://test/John, name, John
> > > > > http://test/John, knows, http://test/Mary
> > > > > http://test/John, knows, http://test/Jerry
> > > > > http://test/Jerry, type, Person
> > > > > http://test/Jerry, name, Jerry
> > > > > http://test/Jerry, knows, http://test/Frank
> > > > > http://test/Mary, type, Person
> > > > > http://test/Mary, name, Mary
> > > > > http://test/Frank, type, Person
> > > > > http://test/Frank, name, Frank
> > > > > http://test/Frank, marriedWith, http://test/Mary
> > > > >
> > > > > tree 2 (root is Jerry) ------------------
> > > > > http://test/Jerry, type, Person
> > > > > http://test/Jerry, name, Jerry
> > > > > http://test/Jerry, knows, http://test/Frank
> > > > > http://test/Frank, type, Person
> > > > > http://test/Frank, name, Frank
> > > > > http://test/Frank, marriedWith, http://test/Mary
> > > > > http://test/Mary, type, Person
> > > > > http://test/Mary, name, Mary
> > > > >
> > > > > tree 3 (root is Mary) ------------------
> > > > > http://test/Mary, type, Person
> > > > > http://test/Mary, name, Mary
> > > > >
> > > > > tree 4 (root is Frank) ------------------
> > > > > http://test/Frank, type, Person
> > > > > http://test/Frank, name, Frank
> > > > > http://test/Frank, marriedWith, http://test/Mary
> > > > > http://test/Mary, type, Person
> > > > > http://test/Mary, name, Mary
> > > > >
> > > > > Thanks in advance,
> > > > > Flavio
> > > > >
> > > > > On Mon, Mar 2, 2015 at 5:04 PM, Stephan Ewen <[hidden email]>
> > wrote:
> > > > >
> > > > > > Hey Santosh!
> > > > > >
> > > > > > RDF processing often involves either joins, or graph-query like
> > > > > operations
> > > > > > (transitive). Flink is fairly good at both types of operations.
> > > > > >
> > > > > > I would look into the graph examples and the graph API for a
> start:
> > > > > >
> > > > > >  - Graph examples:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
> > > > > >  - Graph API:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph
> > > > > >
> > > > > > If you have a more specific question, I can give you better
> > pointers
> > > > ;-)
> > > > > >
> > > > > > Stephan
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru <
> > [hidden email]
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > how can flink be useful for processing the data to RDFs and
> build
> > > the
> > > > > > > ontology?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Santosh
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > View this message in context:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html
> > > > > > > Sent from the Apache Flink (Incubator) Mailing List archive.
> > > mailing
> > > > > list
> > > > > > > archive at Nabble.com.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Queries regarding RDFs with Flink

Fabian Hueske-2
In reply to this post by Flavio Pompermaier
Hi Flavio,

also, Gelly is a superset of Spargel. It provides the same features and
much more.

Since RDF is graph-structured, Gelly might be a good fit for your use case.

Cheers, Fabian

Re: Queries regarding RDFs with Flink

Flavio Pompermaier
Is there any example of RDF graph generation based on a skeleton
structure?
On Mar 22, 2015 12:28 PM, "Fabian Hueske" <[hidden email]> wrote:

> Hi Flavio,
>
> also, Gelly is a superset of Spargel. It provides the same features and
> much more.
>
> Since RDF is graph-structured, Gelly might be a good fit for your use case.
>
> Cheers, Fabian
>

Re: Queries regarding RDFs with Flink

Andra Lungu
Hi Flavio,

We don't have a specific example for generating RDF graphs using Gelly, but
I will try to drop some lines of code here and hope you will find them
useful.

An RDF statement is formed of Subject - Predicate - Object triples. In Edge
notation, the Subject and the Object will be the source and target vertices
respectively, while the edge value will be the predicate.

This being said, say you have an input plain text file that represents the
edges.
A line would look like this : http://test/Frank, marriedWith,
http://test/Mary
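
To make the subject/predicate/object mapping concrete, here is a minimal
plain-Java sketch (no Flink dependency) that turns one such line into a
source/target/value edge. The class TripleEdge and its parse method are
made-up names for illustration and are not part of Gelly. The sketch also
assumes that fields are separated by commas with optional surrounding spaces,
as in the sample line above.

```java
// Plain-Java stand-in for an edge; this is NOT Gelly's Edge class.
final class TripleEdge {
    final String source; // RDF subject
    final String target; // RDF object
    final String value;  // RDF predicate

    TripleEdge(String source, String target, String value) {
        this.source = source;
        this.target = target;
        this.value = value;
    }

    // Parses "subject, predicate, object". The line is split into at most
    // three fields, so commas inside the object are kept; each field is trimmed.
    static TripleEdge parse(String line) {
        String[] parts = line.split(",", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        // Subject becomes the source, object the target, predicate the edge value.
        return new TripleEdge(parts[0].trim(), parts[2].trim(), parts[1].trim());
    }
}
```

Trimming matters here: without it, the predicate and object of the sample
line would start with a space and would not match other occurrences of the
same URI.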

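This is represented in Flink as a Tuple3. So, to read this edge file
you should have something like this:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.graph.Edge;

private static DataSet<Edge<String, String>> getEdgesDataSet(ExecutionEnvironment env) {
   if (fileOutput) {
      return env.readCsvFile(edgesInputPath)
            .lineDelimiter("\n")
            // the subject, predicate, object
            .types(String.class, String.class, String.class)
            .map(new MapFunction<Tuple3<String, String, String>, Edge<String, String>>() {

               @Override
               public Edge<String, String> map(Tuple3<String, String, String> tuple3) throws Exception {
                  // Edge(source, target, value): subject -> object, with the predicate as edge value
                  return new Edge<String, String>(tuple3.f0, tuple3.f2, tuple3.f1);
               }
            });
   } else {
      return getDefaultEdges(env);
   }
}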

After you have this, in your main method, you just write the following
(the vertex IDs are Strings here, and the vertices carry no value):
Graph<String, NullValue, String> rdfGraph = Graph.fromDataSet(edges, env);

I picked up the conversation later on, not sure if that's what you meant by
"graph generation"...

Cheers,
Andra

On Sun, Mar 22, 2015 at 12:42 PM, Flavio Pompermaier <[hidden email]>
wrote:

> Is there any example about rdf graph generation based on a skeleton
> structure?
> On Mar 22, 2015 12:28 PM, "Fabian Hueske" <[hidden email]> wrote:
>
> > Hi Flavio,
> >
> > also, Gelly is a superset of Spargel. It provides the same features and
> > much more.
> >
> > Since RDF is graph-structured, Gelly might be a good fit for your use
> case.
> >
> > Cheers, Fabian
> >
>

Re: Queries regarding RDFs with Flink

Flavio Pompermaier
Thanks Andra for the help!
By graph generation I mean that I'd like to materialize subgraphs of my
overall knowledge starting from some root nodes whose RDF type is of
interest (something similar to what JSON-LD does). Is that clear?
On Mar 22, 2015 1:09 PM, "Andra Lungu" <[hidden email]> wrote:

> Hi Flavio,
>
> We don't have a specific example for generating RDF graphs using Gelly, but
> I will try to drop some lines of code here and hope you will find them
> useful.
>
> An RDF statement is formed of Subject - Predicate - Object triples. In Edge
> notation, the Subject and the Object will be the source and target vertices
> respectively, while the edge value will be the predicate.
>
> This being said, say you have an input plain text file that represents the
> edges.
> A line would look like this : http://test/Frank, marriedWith,
> http://test/Mary
>
> This is internally coded in Flink like a Tuple3. So, to read this edge file
> you should have something like this:
>
> private static DataSet<Edge<String, String>>
> getEdgesDataSet(ExecutionEnvironment env) {
>    if (fileOutput) {
>       return env.readCsvFile(edgesInputPath)
>             .lineDelimiter("\n")
>
>             // the subject, predicate, object
>
>             .types(String.class, String.class, String.class)
>             .map(new MapFunction<Tuple3<String, String, String>,
>                                                       Edge<String,
> String>>() {
>
>                @Override
>                public Edge<String, String> map(Tuple3<String, String,
> String> tuple3) throws Exception {
>                   return new Edge(tuple3.f0, tuple3.f2, tuple3.f1);
>                }
>             });
>    } else {
>       return getDefaultEdges(env);
>    }
> }
>
> After you have this, in your main method, you just write:
> Graph<Long, String, String> rdfGraph = Graph.fromDataSet(edges, env);
>
> I picked up the conversation later on, not sure if that's what you meant by
> "graph generation"...
>
> Cheers,
> Andra
>
> On Sun, Mar 22, 2015 at 12:42 PM, Flavio Pompermaier <[hidden email]
> >
> wrote:
>
> > Is there any example about rdf graph generation based on a skeleton
> > structure?
> > On Mar 22, 2015 12:28 PM, "Fabian Hueske" <[hidden email]> wrote:
> >
> > > Hi Flavio,
> > >
> > > also, Gelly is a superset of Spargel. It provides the same features and
> > > much more.
> > >
> > > Since RDF is graph-structured, Gelly might be a good fit for your use
> > case.
> > >
> > > Cheers, Fabian
> > >
> >
>

Re: Queries regarding RDFs with Flink

Vasiliki Kalavri
Hi Flavio,

I'm not familiar with JSON-LD, but as far as I understand, you want to
generate some trees from selected root nodes.

Once you have created the Graph as Andra describes above, you can first
filter out the edges that are of no interest to you, using filterOnEdges.
There is a description of how edge filtering works in the Gelly docs [1].
Then, you could use a vertex-centric iteration and propagate a message from
the selected root node to the neighbors recursively, until you have the
tree.
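
As a rough illustration of the filtering step, the sketch below keeps only
the triples whose predicate is of interest. It is plain Java and does not use
the Gelly filterOnEdges API; EdgeFilterSketch, keepPredicates and the
String[]{subject, predicate, object} layout are assumptions made for this
example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class EdgeFilterSketch {
    // Keeps the triples {subject, predicate, object} whose predicate
    // is contained in the set of wanted predicates.
    static List<String[]> keepPredicates(List<String[]> triples, Set<String> wanted) {
        List<String[]> kept = new ArrayList<>();
        for (String[] triple : triples) {
            if (wanted.contains(triple[1])) {
                kept.add(triple);
            }
        }
        return kept;
    }
}
```

With the sample data from earlier in the thread, keeping only "knows" and
"marriedWith" drops the "type" and "name" triples, so the traversal that
follows only has to deal with relation edges.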

In the vertex-centric model, you program from the perspective of a vertex
in the graph. You basically need to define what each vertex does within
each iteration (superstep). In Gelly this boils down to two things:
(a) what messages this vertex will send to its neighbors and
(b) how a vertex will update its value using the received messages.

This is also described in the Gelly docs [2].
Also, take a look at the Gelly library [3]. The library methods are
implemented using this model and should give you an idea.

In your case, you will probably need to simply propagate one message from
the root node and gather the newly discovered neighbors in each superstep.

I hope this helps! Let us know if you have further questions!

-Vasia.

[1]:
http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations

[2]:
http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations

[3]:
http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library

Re: Queries regarding RDFs with Flink

Flavio Pompermaier
Thanks Vasiliki,
when I find the time I'll try to make a quick prototype using the
pointers you suggested!

Thanks for the support,
Flavio

On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri <
[hidden email]> wrote:

> Hi Flavio,
>
> I'm not familiar with JSON-LD, but as far as I understand, you want to
> generate some trees from selected root nodes.
>
> Once you have created the Graph as Andra describes above, you can first
> filter out the edges that are of no interest to you, using filterOnEdges.
> There is a description of how edge filtering works in the Gelly docs [1].
> Then, you could use a vertex-centric iteration and propagate a message from
> the selected root node to the neighbors recursively, until you have the
> tree.
>
> In the vertex-centric model, you program from the perspective of a vertex
> in the graph. You basically need to define what each vertex does within
> each iteration (superstep). In Gelly this boils down to two things:
> (a) what messages this vertex will send to its neighbors and
> (b) how a vertex will update its value using the received messages.
>
> This is also described in the Gelly docs [2].
> Also, take a look at the Gelly library [3]. The library methods are
> implemented using this model and should give you an idea.
>
> In your case, you will probably need to simply propagate one message from
> the root node and gather the newly discovered neighbors in each superstep.
>
> I hope this helps! Let us know if you have further questions!
>
> -Vasia.
>
> [1]:
>
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations
>
> [2]:
>
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations
>
> [3]:
>
> http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library
>

Re: Queries regarding RDFs with Flink

Flavio Pompermaier
Hi to all,
I made a simple RDF Gelly test and shared it on my GitHub repo at
https://github.com/fpompermaier/rdf-gelly-test.
I basically set up the Gelly part, but I can't work out how to implement the
drafted TODOs.
Could someone help me implement them?
I think this could become a nice example of how Gelly could help in
handling RDF graphs :)

Best,
Flavio

On Mon, Mar 23, 2015 at 10:41 AM, Flavio Pompermaier <[hidden email]>
wrote:

> Thanks Vasiliki,
> when I'll find the time I'll try to make a quick prototype using the
> pointers you suggested!
>
> Thanks for the support,
> Flavio
>
> On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri <
> [hidden email]> wrote:
>
>> Hi Flavio,
>>
>> I'm not familiar with JSON-LD, but as far as I understand, you want to
>> generate some trees from selected root nodes.
>>
>> Once you have created the Graph as Andra describes above, you can first
>> filter out the edges that are of no interest to you, using filterOnEdges.
>> There is a description of how edge filtering works in the Gelly docs [1].
>> Then, you could use a vertex-centric iteration and propagate a message
>> from
>> the selected root node to the neighbors recursively, until you have the
>> tree.
>>
>> In the vertex-centric model, you program from the perspective of a vertex
>> in the graph. You basically need to define what each vertex does within
>> each iteration (superstep). In Gelly this boils down to two things:
>> (a) what messages this vertex will send to its neighbors and
>> (b) how a vertex will update its value using the received messages.
>>
>> This is also described in the Gelly docs [2].
>> Also, take a look at the Gelly library [3]. The library methods are
>> implemented using this model and should give you an idea.
>>
>> In your case, you will probably need to simply propagate one message from
>> the root node and gather the newly discovered neighbors in each superstep.
>>
>> I hope this helps! Let us know if you have further questions!
>>
>> -Vasia.
>>
>> [1]:
>>
>> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations
>>
>> [2]:
>>
>> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations
>>
>> [3]:
>>
>> http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library
>>
>
>
>

Re: Queries regarding RDFs with Flink

Vasiliki Kalavri
Hi Flavio,

I'm not very familiar with RDF or SPARQL, so not all of your code is clear
to me.

Your first TODO is "compute subgraph for Person". Is "Person" a vertex id
in your graph? A vertex value?
And by "subgraph of Person", do you mean all the vertices that can be
reached starting from this node and following the graph edges?

-Vasia.

On 14 April 2015 at 10:37, Flavio Pompermaier <[hidden email]> wrote:

> Hi to all,
> I made a simple RDF Gelly test and I shared it on my github repo at
> https://github.com/fpompermaier/rdf-gelly-test.
> I basically setup the Gelly stuff but I can't proceed and compute the
> drafted TODOs.
> Could someone help me and implementing them..?
> I think this could become a nice example of how Gelly could help in
> handling RDF graphs :)
>
> Best,
> Flavio
>
> On Mon, Mar 23, 2015 at 10:41 AM, Flavio Pompermaier <[hidden email]
> >
> wrote:
>
> > Thanks Vasiliki,
> > when I'll find the time I'll try to make a quick prototype using the
> > pointers you suggested!
> >
> > Thanks for the support,
> > Flavio
> >
> > On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri <
> > [hidden email]> wrote:
> >
> >> Hi Flavio,
> >>
> >> I'm not familiar with JSON-LD, but as far as I understand, you want to
> >> generate some trees from selected root nodes.
> >>
> >> Once you have created the Graph as Andra describes above, you can first
> >> filter out the edges that are of no interest to you, using
> filterOnEdges.
> >> There is a description of how edge filtering works in the Gelly docs
> [1].
> >> Then, you could use a vertex-centric iteration and propagate a message
> >> from
> >> the selected root node to the neighbors recursively, until you have the
> >> tree.
> >>
> >> In the vertex-centric model, you program from the perspective of a
> vertex
> >> in the graph. You basically need to define what each vertex does within
> >> each iteration (superstep). In Gelly this boils down to two things:
> >> (a) what messages this vertex will send to its neighbors and
> >> (b) how a vertex will update its value using the received messages.
> >>
> >> This is also described in the Gelly docs [2].
> >> Also, take a look at the Gelly library [3]. The library methods are
> >> implemented using this model and should give you an idea.
> >>
> >> In your case, you will probably need to simply propagate one message
> from
> >> the root node and gather the newly discovered neighbors in each
> superstep.
> >>
> >> I hope this helps! Let us know if you have further questions!
> >>
> >> -Vasia.
> >>
> >> [1]:
> >>
> >>
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations
> >>
> >> [2]:
> >>
> >>
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations
> >>
> >> [3]:
> >>
> >>
> http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library
> >>
> >
> >
> >
>

Re: Queries regarding RDFs with Flink

Flavio Pompermaier
Hi Vasia,
by "compute subgraph for Person" I mean exactly all the vertices that can be
reached starting from this node and following the graph edges.
I drafted the graph as a set of vertices (where the id is the subject of
a set of triples and the value is all of its triples)
and a set of edges (properties connecting two vertices; this is only
possible if the object is a URI).

Thus, once the subgraph of a Person has been computed, merging the values of
all reachable vertices yields all the triples of that subgraph.
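
The model described above can be sketched in plain Java as follows. This is
only an illustration and not the code from the linked repository:
StarModelSketch and its methods are hypothetical names, triples are held as
String[]{subject, predicate, object}, and an object is treated as a URI when
it starts with "http://", which is an assumption that fits the sample data in
this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StarModelSketch {
    // Vertex values: subject -> all triples that have this subject (its "star").
    static Map<String, List<String[]>> stars(List<String[]> triples) {
        Map<String, List<String[]>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
        }
        return bySubject;
    }

    // Edges: only triples whose object is a URI can connect two vertices.
    static List<String[]> links(List<String[]> triples) {
        List<String[]> edges = new ArrayList<>();
        for (String[] t : triples) {
            if (t[2].startsWith("http://")) { // assumed URI test, see above
                edges.add(t);
            }
        }
        return edges;
    }

    // Merging the stars of all reachable vertices gives the subgraph's triples.
    static List<String[]> merge(Map<String, List<String[]>> stars,
                                Collection<String> reachable) {
        List<String[]> merged = new ArrayList<>();
        for (String id : reachable) {
            merged.addAll(stars.getOrDefault(id, Collections.<String[]>emptyList()));
        }
        return merged;
    }
}
```

For the Frank-rooted tree from the earlier example, the reachable vertices
are Frank and Mary, so merge returns Frank's three triples followed by Mary's
two.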

On Tue, Apr 14, 2015 at 4:55 PM, Vasiliki Kalavri <[hidden email]
> wrote:

> Hi Flavio,
>
> I'm not quite familiar with RDF or sparql, so not all of your code is clear
> to me.
>
> Your first TODO is "compute subgraph for Person". Is "Person" a vertex id
> in your graph? A vertex value?
> And by "subgraph of Person", do you mean all the vertices that can be
> reached starting from this node and following the graph edges?
>
> -Vasia.
>
> On 14 April 2015 at 10:37, Flavio Pompermaier <[hidden email]>
> wrote:
>
> > Hi to all,
> > I made a simple RDF Gelly test and I shared it on my github repo at
> > https://github.com/fpompermaier/rdf-gelly-test.
> > I basically setup the Gelly stuff but I can't proceed and compute the
> > drafted TODOs.
> > Could someone help me and implementing them..?
> > I think this could become a nice example of how Gelly could help in
> > handling RDF graphs :)
> >
> > Best,
> > Flavio
> >
> > On Mon, Mar 23, 2015 at 10:41 AM, Flavio Pompermaier <
> [hidden email]
> > >
> > wrote:
> >
> > > Thanks Vasiliki,
> > > when I'll find the time I'll try to make a quick prototype using the
> > > pointers you suggested!
> > >
> > > Thanks for the support,
> > > Flavio
> > >
> > > On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri <
> > > [hidden email]> wrote:
> > >
> > >> Hi Flavio,
> > >>
> > >> I'm not familiar with JSON-LD, but as far as I understand, you want to
> > >> generate some trees from selected root nodes.
> > >>
> > >> Once you have created the Graph as Andra describes above, you can
> first
> > >> filter out the edges that are of no interest to you, using
> > filterOnEdges.
> > >> There is a description of how edge filtering works in the Gelly docs
> > [1].
> > >> Then, you could use a vertex-centric iteration and propagate a message
> > >> from
> > >> the selected root node to the neighbors recursively, until you have
> the
> > >> tree.
> > >>
> > >> In the vertex-centric model, you program from the perspective of a
> > vertex
> > >> in the graph. You basically need to define what each vertex does
> within
> > >> each iteration (superstep). In Gelly this boils down to two things:
> > >> (a) what messages this vertex will send to its neighbors and
> > >> (b) how a vertex will update its value using the received messages.
> > >>
> > >> This is also described in the Gelly docs [2].
> > >> Also, take a look at the Gelly library [3]. The library methods are
> > >> implemented using this model and should give you an idea.
> > >>
> > >> In your case, you will probably need to simply propagate one message
> > from
> > >> the root node and gather the newly discovered neighbors in each
> > superstep.
> > >>
> > >> I hope this helps! Let us know if you have further questions!
> > >>
> > >> -Vasia.
> > >>
> > >> [1]:
> > >>
> > >>
> >
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations
> > >>
> > >> [2]:
> > >>
> > >>
> >
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations
> > >>
> > >> [3]:
> > >>
> > >>
> >
> http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library
> > >>
> > >
> > >
> > >
> >
>

Re: Queries regarding RDFs with Flink

Vasiliki Kalavri
Ok, so, exactly as I wrote a few e-mails back in this thread, you can do
this with a vertex-centric iteration :-)

All you need to do is call "myGraph.runVertexCentricIteration(new
MyUpdateFunction(), new MyMessagingFunction(), maxIterations)"
and define MyUpdateFunction and MyMessagingFunction.

The first function defines how a vertex updates its value based on the
received messages, while the second defines what messages a vertex sends in
each superstep.
Inside both functions, you have access to the vertex ID and value, so you
can check whether it's the vertex you're interested in.

In your case, in the first superstep, the Person vertex sends a message to
its neighbors.
You can do this with something like the following inside the
sendMessages() method:

for (Edge<K, V> edge : getOutgoingEdges()) {
  sendMessageTo(edge.getTarget(), msg);
}

The rest of the vertices don't need to do anything in the first superstep.
In the following supersteps, the vertices that have received a message
propagate it to their neighbors in the same way.

One thing you need to be careful about is detecting cycles, so that the
iteration terminates. One way to do this is to mark the vertices you visit,
e.g. by setting a flag in the vertex value, and not to propagate messages from
a "visited" vertex.
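
The propagation with a visited flag can be sketched in plain Java as a loop
over supersteps. This is a single-machine simulation of the idea and not
Gelly code: SuperstepReachability, reachable and the adjacency map (vertex id
to outgoing neighbor ids) are hypothetical names chosen for this example.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SuperstepReachability {
    // Returns the root plus every vertex reached from it within maxSupersteps hops.
    static Set<String> reachable(String root,
                                 Map<String, List<String>> outgoing,
                                 int maxSupersteps) {
        Set<String> visited = new LinkedHashSet<>(); // vertices with the flag set
        visited.add(root);
        Set<String> active = new LinkedHashSet<>();  // senders in the current superstep
        active.add(root);

        for (int step = 0; step < maxSupersteps && !active.isEmpty(); step++) {
            Set<String> next = new LinkedHashSet<>();
            for (String sender : active) {
                List<String> targets =
                        outgoing.getOrDefault(sender, Collections.<String>emptyList());
                for (String target : targets) {
                    // A vertex forwards the message only the first time it
                    // receives it, so cycles cannot keep the loop running.
                    if (visited.add(target)) {
                        next.add(target);
                    }
                }
            }
            active = next;
        }
        return visited;
    }
}
```

Because a vertex becomes active at most once, the loop ends as soon as no new
vertex is discovered, even if the graph contains cycles; maxSupersteps only
acts as an upper bound on the number of hops.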

If you are totally unfamiliar with the vertex-centric model, it might be a
good idea to first do some reading on it, in order to understand how it
works. For example, take a look at the Pregel paper [1].

Let us know how it goes!

Cheers,
-Vasia.

[1]: http://kowshik.github.io/JPregel/pregel_paper.pdf


On 14 April 2015 at 18:12, Flavio Pompermaier <[hidden email]> wrote:

> Hi Vasia,
> for compute subgraph for Person I mean exactly all the vertices that
> can be reached
> starting from this node and following the graph edges.
> I drafted the graph as a set of vertices (where the id is the subject of
> the set of triples and the value is all of its triples)
> and a set of edges (properties connecting two vertices, this is only
> possible if the object is an URI).
>
> Thus, once computed the subgraph of a Person, if I merge the values of all
> reachable vertices, I'll obtain all the triples of such a subgraph.
>
> On Tue, Apr 14, 2015 at 4:55 PM, Vasiliki Kalavri <
> [hidden email]
> > wrote:
>
> > Hi Flavio,
> >
> > I'm not quite familiar with RDF or sparql, so not all of your code is
> clear
> > to me.
> >
> > Your first TODO is "compute subgraph for Person". Is "Person" a vertex id
> > in your graph? A vertex value?
> > And by "subgraph of Person", do you mean all the vertices that can be
> > reached starting from this node and following the graph edges?
> >
> > -Vasia.
> >
> > On 14 April 2015 at 10:37, Flavio Pompermaier <[hidden email]>
> > wrote:
> >
> > > Hi to all,
> > > I made a simple RDF Gelly test and I shared it on my github repo at
> > > https://github.com/fpompermaier/rdf-gelly-test.
> > > I basically setup the Gelly stuff but I can't proceed and compute the
> > > drafted TODOs.
> > > Could someone help me and implementing them..?
> > > I think this could become a nice example of how Gelly could help in
> > > handling RDF graphs :)
> > >
> > > Best,
> > > Flavio
> > >
> > > On Mon, Mar 23, 2015 at 10:41 AM, Flavio Pompermaier <
> > [hidden email]
> > > >
> > > wrote:
> > >
> > > > Thanks Vasiliki,
> > > > when I'll find the time I'll try to make a quick prototype using the
> > > > pointers you suggested!
> > > >
> > > > Thanks for the support,
> > > > Flavio
> > > >
> > > > On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri <
> > > > [hidden email]> wrote:
> > > >
> > > >> Hi Flavio,
> > > >>
> > > >> I'm not familiar with JSON-LD, but as far as I understand, you want
> to
> > > >> generate some trees from selected root nodes.
> > > >>
> > > >> Once you have created the Graph as Andra describes above, you can
> > first
> > > >> filter out the edges that are of no interest to you, using
> > > filterOnEdges.
> > > >> There is a description of how edge filtering works in the Gelly docs
> > > [1].
> > > >> Then, you could use a vertex-centric iteration and propagate a
> message
> > > >> from
> > > >> the selected root node to the neighbors recursively, until you have
> > the
> > > >> tree.
> > > >>
> > > >> In the vertex-centric model, you program from the perspective of a
> > > vertex
> > > >> in the graph. You basically need to define what each vertex does
> > within
> > > >> each iteration (superstep). In Gelly this boils down to two things:
> > > >> (a) what messages this vertex will send to its neighbors and
> > > >> (b) how a vertex will update its value using the received messages.
> > > >>
> > > >> This is also described in the Gelly docs [2].
> > > >> Also, take a look at the Gelly library [3]. The library methods are
> > > >> implemented using this model and should give you an idea.
> > > >>
> > > >> In your case, you will probably need to simply propagate one message
> > > from
> > > >> the root node and gather the newly discovered neighbors in each
> > > superstep.
> > > >>
> > > >> I hope this helps! Let us know if you have further questions!
> > > >>
> > > >> -Vasia.
> > > >>
> > > >> [1]:
> > > >>
> > > >>
> > >
> >
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations
> > > >>
> > > >> [2]:
> > > >>
> > > >>
> > >
> >
> http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations
> > > >>
> > > >> [3]:
> > > >>
> > > >>
> > >
> >
> http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library
> > > >>
> > > >
> > > >
> > > >
> > >
> >
>