[Proposal] Gelly Graph Generators

Greg Hogan
I would like to propose that Flink include a selection of graph generators
in Gelly. Generated graphs will be useful for performing scalability,
stress, and regression testing as well as benchmarking and comparing
algorithms, both for Flink users and developers. Generated data can be
scaled to any size yet is described by a few simple parameters, and can
often substitute for user data or for sharing large files when reporting
issues.

Spark's GraphX includes a modest GraphGenerators class [1].

The initial implementation would focus on Erdős-Rényi, R-MAT [2], and
Kronecker [3] generators.
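
For readers unfamiliar with R-MAT, the following is a minimal sequential
sketch of the recursive quadrant selection described in [2]. It is not
Gelly code: the function names are hypothetical, and the default quadrant
probabilities (0.57, 0.19, 0.19, 0.05) are an assumption borrowed from
common benchmark practice rather than anything specified in this proposal.

```python
import random


def rmat_edge(rng, scale, a=0.57, b=0.19, c=0.19):
    """Sample one directed edge for a graph with 2**scale vertices.

    At each of the `scale` levels one quadrant of the adjacency matrix is
    chosen with probabilities a, b, c and d = 1 - a - b - c, which fixes
    one more bit of the source (row) and destination (column) IDs.
    """
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < a:
            pass                 # top-left: both bits stay 0
        elif r < a + b:
            dst |= 1             # top-right: destination bit set
        elif r < a + b + c:
            src |= 1             # bottom-left: source bit set
        else:
            src |= 1             # bottom-right: both bits set
            dst |= 1
    return src, dst


def rmat_edges(seed, scale, edge_count):
    """Generate `edge_count` edges from one seeded generator (sequential)."""
    rng = random.Random(seed)
    return [rmat_edge(rng, scale) for _ in range(edge_count)]
```

Calling rmat_edges(7, 4, 100) twice returns the same 100 edges over 16
vertices. Because the sketch draws from a single sequential stream, it does
not by itself address generation across parallel workers.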

A key consideration is that the graphs should be seedable and generate the
same Graph regardless of parallelism.
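
One way to meet this requirement, sketched here for an Erdős-Rényi G(n, p)
graph, is to derive each edge decision from a hash of (seed, source,
target) instead of from a per-worker random stream. The result then depends
only on the seed and the parameters, not on how the vertex range is split.
This is an illustrative assumption about a possible approach, not the
proposed Gelly implementation; all names below are hypothetical, and
parallel workers are simulated with a plain loop.

```python
import hashlib
import struct


def edge_uniform(seed, u, v):
    """Map (seed, u, v) to a reproducible float in [0, 1)."""
    packed = struct.pack("<QQQ", seed, u, v)
    digest = hashlib.blake2b(packed, digest_size=8).digest()
    # Keep the top 53 bits so the quotient is exact and strictly below 1.
    return (int.from_bytes(digest, "little") >> 11) / 2.0 ** 53


def generate_block(seed, n, p, sources):
    """Edges (u, v) with u < v for the source vertices one worker owns."""
    return {
        (u, v)
        for u in sources
        for v in range(u + 1, n)
        if edge_uniform(seed, u, v) < p
    }


def generate(seed, n, p, parallelism):
    """Simulate `parallelism` workers, each owning a strided vertex slice."""
    edges = set()
    for worker in range(parallelism):
        edges |= generate_block(seed, n, p, range(worker, n, parallelism))
    return edges
```

With this scheme, generate(42, 50, 0.1, 1) and generate(42, 50, 0.1, 7)
yield the same edge set, while changing the seed yields a different graph.
The cost is one hash per candidate vertex pair, so the sketch favors
clarity over efficiency for sparse graphs.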

Generated data is a complement to my proposed "Checksum method for DataSet
and Graph" [4].

[1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.util.GraphGenerators$
[2] R-MAT: A Recursive Model for Graph Mining;
http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf
[3] Kronecker graphs: An Approach to Modeling Networks;
http://arxiv.org/pdf/0812.4905v2.pdf
[4] https://issues.apache.org/jira/browse/FLINK-2716

Greg Hogan

Re: [Proposal] Gelly Graph Generators

Vasiliki Kalavri
Hi Greg,

thank you for this proposal!
I think graph generators will be a very useful addition to Gelly.

I'm not quite familiar with the state-of-the-art algorithms for distributed
graph generation.
I suppose that we could easily provide an efficient random graph generator
and I've also seen some work on parallel/distributed algorithms for R-MAT
[1, 2].
Are you aware of similar work for Erdős-Rényi, Kronecker, or other types of
graphs?
Another place we might want to look at is Giraph's Watts-Strogatz generator
[3].

Cheers,
Vasia.

[1]: https://github.com/farkhor/PaRMAT/
[2]: http://arxiv.org/pdf/1210.0187.pdf
[3]: https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/WattsStrogatzVertexInputFormat.html


On 23 September 2015 at 19:49, Greg Hogan <[hidden email]> wrote:

> [...]

Re: [Proposal] Gelly Graph Generators

Robert Metzger
I would be happy to see some generators in Gelly for exactly the reasons
you've mentioned. It's always difficult for me to get some testing data when
running Flink on a new cluster ... so this would help me ;)

On Thu, Sep 24, 2015 at 11:03 AM, Vasiliki Kalavri <[hidden email]> wrote:

> [...]