(DEPRECATED) Apache Flink Mailing List archive.

Graph.fromDataSet function calls: flink-gelly

Sachin Goel

Graph.fromDataSet function calls: flink-gelly

Hi,
I was going through the two files VertexCentricIteration and
GatherSumApplyIteration and noticed that when the graph is constructed
from the vertex and edge data sets, a new execution environment is passed,
as in:

    Graph<K, VV, EV> graph = Graph.fromDataSet(vertexDataSet, edgeDataSet,
            ExecutionEnvironment.getExecutionEnvironment());

    Graph<K, VV, EV> graph = Graph.fromDataSet(initialVertices, edgesWithValue,
            ExecutionEnvironment.getExecutionEnvironment());

Why is this necessary? Is there a specific reason we cannot use
vertexDataSet.getExecutionEnvironment()?
I went through the code to figure out the reason but couldn't find
any. After changing it to vertexDataSet.getExecutionEnvironment() and
initialVertices.getExecutionEnvironment(), all the tests still pass.
I've never worked with Gelly, though, so I may have missed something.
The reason I ask is that I'm working on something which allows sharing
job results across execution environments, and these two lines are the
only things causing trouble with that. :')
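The proposed change, building the graph in the environment the input data sets already belong to, can be sketched with a small self-contained model. The classes `Environment`, `DataSetStub` and `GraphStub`, and the helper `fromInputs` with its same-environment check, are hypothetical stand-ins and not Flink or Gelly API; in Flink itself the corresponding call would be `vertexDataSet.getExecutionEnvironment()`.

```java
// Hypothetical stand-ins for Flink's ExecutionEnvironment, DataSet and Gelly's Graph.
// They are NOT Flink classes; they only model "which environment does this belong to?".
final class Environment {
    private final String name;

    Environment(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "Environment(" + name + ")";
    }
}

final class DataSetStub {
    private final Environment env;

    DataSetStub(Environment env) {
        this.env = env;
    }

    // Models DataSet#getExecutionEnvironment(): a data set remembers the
    // environment it was created in.
    Environment getExecutionEnvironment() {
        return env;
    }
}

final class GraphStub {
    final DataSetStub vertices;
    final DataSetStub edges;
    final Environment env;

    private GraphStub(DataSetStub vertices, DataSetStub edges, Environment env) {
        this.vertices = vertices;
        this.edges = edges;
        this.env = env;
    }

    // Same shape as the call quoted in the thread: the caller chooses the environment.
    static GraphStub fromDataSet(DataSetStub vertices, DataSetStub edges, Environment env) {
        return new GraphStub(vertices, edges, env);
    }

    // Hypothetical helper for the proposed change: reuse the inputs' environment
    // instead of looking one up globally, and reject inputs from two environments.
    static GraphStub fromInputs(DataSetStub vertices, DataSetStub edges) {
        Environment env = vertices.getExecutionEnvironment();
        if (env != edges.getExecutionEnvironment()) {
            throw new IllegalArgumentException(
                    "vertices and edges belong to different environments");
        }
        return fromDataSet(vertices, edges, env);
    }
}
```

In this model, `GraphStub.fromInputs(v, e)` with `v` and `e` created in the same `Environment` yields a graph whose `env` is that same instance, which is the property that matters when job results are shared across environments.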

Thanks in advance
Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

Vasiliki Kalavri

Re: Graph.fromDataSet function calls: flink-gelly

Hi Sachin,

I was actually under the impression that
ExecutionEnvironment.getExecutionEnvironment() returns the current
environment if one has already been created.
I don't think creating a second one there is intentional, and if it
does happen, we should change it.

Cheers,
Vasia.

(In reply to Sachin Goel's message of 21 August 2015 at 19:40.)

Sachin Goel

Re: Graph.fromDataSet function calls: flink-gelly

Hi Vasia

In that case, we might as well change it.
getExecutionEnvironment() actually goes through the contextFactory, so
how new environments are generated depends on that factory. For
example, all the test environments are currently shared, i.e., the
factory returns the same object on every call.
In general, however, that won't be the case.
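The lookup described above can be modeled roughly as follows. `ExecEnvModel`, its `Supplier`-based `contextFactory` field and the set/reset methods are hypothetical, a simplification rather than Flink's actual code; the model also assumes that, with no factory installed, each call creates a fresh environment.

```java
import java.util.function.Supplier;

// Hypothetical, simplified model of ExecutionEnvironment.getExecutionEnvironment();
// not Flink's implementation.
final class ExecEnvModel {
    // Assumption: a single static factory decides what every lookup returns.
    private static Supplier<ExecEnvModel> contextFactory = null;

    static void setContextFactory(Supplier<ExecEnvModel> factory) {
        contextFactory = factory;
    }

    static void resetContextFactory() {
        contextFactory = null;
    }

    static ExecEnvModel getExecutionEnvironment() {
        // Assumption: without a factory, every call builds a new environment.
        return contextFactory == null ? new ExecEnvModel() : contextFactory.get();
    }
}
```

Under this model, a test setup that installs `() -> shared` as the factory makes every lookup return the same object, which would hide the problem, whereas with no factory installed the lookups quoted in the thread would produce environments different from the one the input data sets belong to.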

I'll push a patch along with the fix I'm working on.

Cheers!
Sachin

(In reply to Vasiliki Kalavri's message of Sat, Aug 22, 2015 at 11:21 PM.)