Scala API rewrite almost complete

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
Thanks, I added it, along with an ITCase.

So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis

These are the examples people have called dibs on:
 - TriangleEnumeration and PageRank (Fabian)
 - BatchGradientDescent (Márton)
 - ComputeEdgeDegrees (Hermann)

These are unclaimed (if I'm not mistaken):
 - TransitiveClosure
 - The relational stuff
 - LinearRegression

Cheers,
Aljoscha

On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <[hidden email]> wrote:

> WebLog here:
> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>
> Do you need any more done?
>
> On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <[hidden email]>
> wrote:
>
>> I added the ConnectedComponents Example from Vasia.
>>
>> Keep 'em coming, people. :D
>>
>> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <[hidden email]> wrote:
>> > Alright, will do.
>> > Thanks!
>> >
>> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>> >
>> >> Ok people, executive decision. :D
>> >>
>> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
>> >> in multi-dimensional object arrays and then converting it to the
>> >> required Java or Scala objects.
>> >>
>> >> Also, I changed isEqualTo to equalTo to make it consistent with the Java
>> >> API.
>> >>
>> >> Regarding join (and coGroup): there is no need for a keyword, you can
>> >> just write:
>> >>
>> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>> >>
>> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <[hidden email]> wrote:
>> >> > Aside from the DataSet issue, I also found an inconsistency with the Java
>> >> > API. In Java, a join is written as:
>> >> >
>> >> > ds1.join(ds2).where(...).equalTo(...)
>> >> >
>> >> > whereas in the current Scala API it is:
>> >> >
>> >> > ds1.join(ds2).where(...).isEqualTo(...)
>> >> >
>> >> > isEqualTo() should be renamed to equalTo(), IMO.
>> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
>> >> > a keyword in Scala. Should we offer something similar for Scala or go with
>> >> > map() on Tuple2(left, right)?
>> >> >
>> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]>:
>> >> >
>> >> >> Instead of Strings, Object[][] would work as well. That is a generic
>> >> >> representation of a Tuple.
>> >> >>
>> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
>> >> >> utility method to convert between the two.
>> >> >>
>> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <[hidden email]> wrote:
>> >> >>
>> >> >> > Yeah, I ran into the same problem...
>> >> >> >
>> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
>> >> >> > because it is based on a FileInputFormat.
>> >> >> > So we would need to parse the Strings manually...
>> >> >> >
>> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>> >> >> >
>> >> >> > > Hi,
>> >> >> > > on second thought, maybe we should just change all the example input
>> >> >> > > data to strings and use CSV input formats in all the examples. What do
>> >> >> > > you think?
>> >> >> > >
>> >> >> > > Cheers,
>> >> >> > > Aljoscha
>> >> >> > >
>> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <[hidden email]>
>> >> >> > > wrote:
>> >> >> > > > Hi,
>> >> >> > > > yes, it's unfortunate that the data types are incompatible. I'm afraid
>> >> >> > > > you have to do what you proposed: move the data to a static field and
>> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>> >> >> > > > nice, but copying would duplicate the data and make it easier for it
>> >> >> > > > to go out of sync in the Java and Scala versions.
>> >> >> > > >
>> >> >> > > > What do the others think? This will probably occur in all the examples.
>> >> >> > > >
>> >> >> > > > Cheers,
>> >> >> > > > Aljoscha
>> >> >> > > >
>> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>> >> >> > > > <[hidden email]> wrote:
>> >> >> > > >> Hey,
>> >> >> > > >>
>> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
>> >> >> > > >> reuse the example input data from java-examples.
>> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
>> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>> >> >> > > >> and getDefaultEdgeDataSet(), which take
>> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>> >> >> > > >>
>> >> >> > > >> One way is to provide public static fields (like in the WordCountData
>> >> >> > > >> class), but this introduces a conversion
>> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
>> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
>> >> >> > > >> for an example (?).
>> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
>> >> >> > > >>
>> >> >> > > >> Am I missing something here?
>> >> >> > > >>
>> >> >> > > >> Thanks!
>> >> >> > > >>
>> >> >> > > >> Cheers,
>> >> >> > > >> V.
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <[hidden email]> wrote:
>> >> >> > > >>
>> >> >> > > >>> Alright, I updated my repo:
>> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> >> > > >>>
>> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
>> >> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
>> >> >> > > >>> You'll also notice that I added the java-examples as a dependency for
>> >> >> > > >>> the scala-examples. I did this to reuse the example input data.
>> >> >> > > >>>
>> >> >> > > >>> Once you have ported a program, you can open a pull request against my repo
>> >> >> > > >>> and I will collect the examples.
>> >> >> > > >>>
>> >> >> > > >>> Happy coding. :D
>> >> >> > > >>>
>> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <[hidden email]> wrote:
>> >> >> > > >>> > +1
>> >> >> > > >>> >
>> >> >> > > >>> > ComputeEdgeDegrees for me!
>> >> >> > > >>> >
>> >> >> > > >>> >
>> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <[hidden email]>
>> >> >> > > >>> > wrote:
>> >> >> > > >>> >
>> >> >> > > >>> >> +1
>> >> >> > > >>> >>
>> >> >> > > >>> >> BatchGradientDescent for me :)
>> >> >> > > >>> >>
>> >> >> > > >>> >>
>> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <[hidden email]>
>> >> >> > > >>> >> wrote:
>> >> >> > > >>> >>
>> >> >> > > >>> >> > +1
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > I go for WebLogAnalysis.
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial, so this will
>> >> >> > > >>> >> > be a good stress test both for me and the new API :-)
>> >> >> > > >>> >> >
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <[hidden email]>
>> >> >> > > >>> >> > wrote:
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > > +1 for having other people implement the examples!
>> >> >> > > >>> >> > > Connected Components and KMeans for me :)
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > > -V.
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <[hidden email]> wrote:
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>> >> >> > > >>> >> > > > - parameters for external data
>> >> >> > > >>> >> > > > - follow a similar code structure
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>> >> >> > > >>> >> > > > >
>> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <[hidden email]> wrote:
>> >> >> > > >>> >> > > > > > Hi,
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>> >> >> > > >>> >> > > > > > valuable in the past.
>> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple one first, such as WordCount.
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > Fabian
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > >> Hi,
>> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> >> > > >>> >> > > > > >>
>> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>> >> >> > > >>> >> > > > > >> in the API?
>> >> >> > > >>> >> > > > > >>
>> >> >> > > >>> >> > > > > >> Cheers,
>> >> >> > > >>> >> > > > > >> Aljoscha
>> >> >> > > >>> >> > > > > >>
>> >> >> > > >>> >> > > > >
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> >
>> >> >> > > >>> >>
>> >> >> > > >>>
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>
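
The approach settled on in the quoted thread above (example data kept as Object[][] and converted to the required Java or Scala objects) might look roughly like the following on the Java side. This is a minimal sketch only: LongPair, SharedEdgeData and the sample values are hypothetical and are not taken from KMeansData.java or from Flink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Java tuple type; this is NOT Flink's Tuple2.
final class LongPair {
    final long first;
    final long second;

    LongPair(long first, long second) {
        this.first = first;
        this.second = second;
    }
}

// Hypothetical shared data holder, loosely modeled on the idea behind KMeansData.java.
final class SharedEdgeData {
    // Each row is one tuple; each column is one field of that tuple.
    // The values are made up for illustration.
    static final Object[][] EDGES = {
        {1L, 2L},
        {2L, 3L},
        {3L, 4L}
    };

    // Converts the generic rows into typed Java objects.
    static List<LongPair> toPairs(Object[][] rows) {
        List<LongPair> pairs = new ArrayList<>(rows.length);
        for (Object[] row : rows) {
            pairs.add(new LongPair((Long) row[0], (Long) row[1]));
        }
        return pairs;
    }
}
```

A Scala example could map the same rows to native Scala tuples, so both languages would share a single copy of the data.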

Re: Scala API rewrite almost complete

Kostas Tzoumas-2
I'll take TransitiveClosure and PiEstimation (was not on your list).

If nobody volunteers for the relational stuff I can take those as well.

How about removing the "RelationalQuery" from both Scala and Java? It seems
to be a proper subset of TPC-H Q3. Does it add some teaching value on top
of TPC-H Q3?

Kostas


Re: Scala API rewrite almost complete

Aljoscha Krettek-2
Thanks, I added it. I'll keep a running list of ported/unported
examples in my mails. I'll rename the Java example package to examples
once the Scala API merge is done.

I think the termination criterion is fine as it is. Just because Scala
enables functional programming doesn't mean it's always the best
choice. :D
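
The thread does not spell out the termination criterion that was added to the Java version. Purely as an illustration, here is a plain-Java sketch of naive transitive closure that assumes the criterion is "stop once an iteration finds no new paths". The class and method names are hypothetical, and the sketch does not use the Flink API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the code from the linked branch.
final class NaiveTransitiveClosure {
    // Computes all reachable (source, target) pairs from a set of directed edges.
    // Each edge or path is a two-element list: [source, target].
    static Set<List<Long>> closure(Set<List<Long>> edges, int maxIterations) {
        Set<List<Long>> paths = new HashSet<>(edges);
        for (int i = 0; i < maxIterations; i++) {
            Set<List<Long>> discovered = new HashSet<>();
            // Join known paths with edges where path.target == edge.source.
            for (List<Long> path : paths) {
                for (List<Long> edge : edges) {
                    if (path.get(1).equals(edge.get(0))) {
                        discovered.add(Arrays.asList(path.get(0), edge.get(1)));
                    }
                }
            }
            // Assumed termination criterion: stop when this round added nothing new.
            if (!paths.addAll(discovered)) {
                break;
            }
        }
        return paths;
    }
}
```

The imperative loop with an early break mirrors the point made above: a functional formulation is possible, but it is not necessarily clearer.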

So far we have ported: WordCount, KMeans, ConnectedComponents,
WebLogAnalysis, TransitiveClosureNaive

These are the examples people have called dibs on:
 - TriangleEnumeration and PageRank (Fabian)
 - BatchGradientDescent (Márton)
 - ComputeEdgeDegrees (Hermann)

These are unclaimed (if I'm not mistaken):
 - The relational stuff
 - LinearRegression

Cheers,
Aljoscha

On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]> wrote:

> Transitive closure here, I also added a termination criterion in the Java
> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>
> Perhaps you can make the termination criterion in Scala more functional?
>
> I noticed that the examples package name is example.java but examples.scala
>
> Kostas
>
>>> >> >> >> > > >>> >:
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their
>>> >> >> >> > > >>> >> > > > > favourite
>>> >> >> >> examples
>>> >> >> >> > > here.
>>> >> >> >> > > >>> >> > > > >
>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske
>>> >> >> >> > > >>> >> > > > > <
>>> >> >> >> > > >>> [hidden email]>
>>> >> >> >> > > >>> >> > > > wrote:
>>> >> >> >> > > >>> >> > > > > > Hi,
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by
>>> >> >> >> > > >>> >> > > > > > different
>>> >> >> >> people
>>> >> >> >> > > >>> proved to
>>> >> >> >> > > >>> >> > be
>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple
>>> >> >> >> > > >>> >> > > > > > first
>>> >> >> one
>>> >> >> >> > such
>>> >> >> >> > > as
>>> >> >> >> > > >>> >> > > WordCount.
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > Fabian
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek
>>> >> >> >> > > >>> >> > > > > > <
>>> >> >> >> > > >>> [hidden email]
>>> >> >> >> > > >>> >> >:
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > >> Hi,
>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API
>>> >> >> >> > > >>> >> > > > > >> here:
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >>
>>> >> >> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write
>>> >> >> >> > > >>> >> > > > > >> the
>>> >> tests
>>> >> >> and
>>> >> >> >> > > port
>>> >> >> >> > > >>> the
>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to
>>> >> >> >> > > >>> >> > > > > >> let
>>> >> other
>>> >> >> >> > people
>>> >> >> >> > > >>> port
>>> >> >> >> > > >>> >> the
>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and
>>> >> maybe
>>> >> >> >> > notices
>>> >> >> >> > > some
>>> >> >> >> > > >>> >> > quirks
>>> >> >> >> > > >>> >> > > > > >> in the API?
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > >
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>>
>>> >> >> >> > >
>>> >> >> >> >
>>> >> >> >>
>>> >> >>
>>> >>
>>
>>
>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
I added the Triangle Enumeration Examples, thanks Fabian.

So far we have ported: WordCount, KMeans, ConnectedComponents,
WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt

These are the examples people called dibs on:
 - PageRank (Fabian)
 - BatchGradientDescent (Márton)
 - ComputeEdgeDegrees (Hermann)

Those are unclaimed (if I'm not mistaken):
 - The relational Stuff
 - LinearRegression

On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[hidden email]> wrote:

> Thanks, I added it. I'll keep a running list of ported/unported
> examples in my mails. I'll rename the java example package to examples
> once the Scala API merge is done.
>
> I think the termination criterion is fine as it is. Just because Scala
> enables functional programming doesn't mean it's always the best
> choice. :D
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis, TransitiveClosureNaive
>
> These are the examples people called dibs on:
>  - TriangleEnumeration and PageRank (Fabian)
>  - BatchGradientDescent (Márton)
>  - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
>  - The relational Stuff
>  - LinearRegression
>
> Cheers,
> Aljoscha
>
> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]> wrote:
>> Transitive closure here, I also added a termination criterion in the Java
>> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>
>> Perhaps you can make the termination criterion in Scala more functional?
>>
>> I noticed that the examples package name is example.java but examples.scala
>>
>> Kostas
>>
>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]> wrote:
>>>
>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>>
>>> If nobody volunteers for the relational stuff I can take those as well.
>>>
>>> How about removing the "RelationalQuery" from both Scala and Java? It
>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>>> top of TPC-H Q3?
>>>
>>> Kostas
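
For readers who have not seen the transitive closure example: the
termination criterion mentioned above can be pictured as "stop iterating
once a round finds no new paths". The thread does not show the actual code,
so the following is only a minimal plain-Java sketch under that assumption.
The names TransitiveClosureSketch, Path and closure are made up for
illustration and are not part of the Flink API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TransitiveClosureSketch {

    /** A directed edge or path between two vertices (hypothetical helper type). */
    static final class Path {
        final long from;
        final long to;

        public Path(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Path && ((Path) o).from == from && ((Path) o).to == to;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(from) + Long.hashCode(to);
        }
    }

    /**
     * Extends the known paths by one edge per round. The loop ends when a
     * round adds no new path or when maxIterations rounds have run.
     */
    public static Set<Path> closure(List<Path> edges, int maxIterations) {
        Set<Path> paths = new HashSet<>(edges);
        for (int i = 0; i < maxIterations; i++) {
            Set<Path> discovered = new HashSet<>();
            for (Path p : paths) {
                for (Path e : edges) {
                    if (p.to == e.from) {
                        discovered.add(new Path(p.from, e.to));
                    }
                }
            }
            // Termination criterion: addAll returns false when nothing new was added.
            if (!paths.addAll(discovered)) {
                break;
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Path> edges = Arrays.asList(new Path(1, 2), new Path(2, 3), new Path(3, 4));
        System.out.println(closure(edges, 10).size()); // prints 6
    }
}
```

With the chain 1 -> 2 -> 3 -> 4 the loop stops in the third round, well
before the iteration limit of 10 is reached.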

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
By the way, what was called BatchGradientDescent in the Scala examples
should be replaced by a port of the LinearRegression Example from
Java. I had them as two separate examples earlier.
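
Neither example's code appears in this thread. Assuming the Java
LinearRegression example fits a line y = theta0 + theta1 * x by batch
gradient descent (which would explain the overlap with
BatchGradientDescent), the core computation looks roughly like the
plain-Java sketch below. LinearRegressionSketch and fit are hypothetical
names, and the learning rate and iteration count are arbitrary choices.

```java
final class LinearRegressionSketch {

    /**
     * Fits y = theta0 + theta1 * x with batch gradient descent on the mean
     * squared error and returns {theta0, theta1}.
     */
    public static double[] fit(double[] xs, double[] ys, double learningRate, int iterations) {
        double theta0 = 0.0;
        double theta1 = 0.0;
        int n = xs.length;
        for (int it = 0; it < iterations; it++) {
            double grad0 = 0.0;
            double grad1 = 0.0;
            // "Batch": every data point contributes to each parameter update.
            for (int i = 0; i < n; i++) {
                double error = theta0 + theta1 * xs[i] - ys[i];
                grad0 += error;
                grad1 += error * xs[i];
            }
            theta0 -= learningRate * grad0 / n;
            theta1 -= learningRate * grad1 / n;
        }
        return new double[] {theta0, theta1};
    }

    public static void main(String[] args) {
        // Points on the line y = 1 + 2x.
        double[] theta = fit(new double[] {0, 1, 2, 3}, new double[] {1, 3, 5, 7}, 0.1, 5000);
        System.out.printf("theta0=%.3f theta1=%.3f%n", theta[0], theta[1]);
    }
}
```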

What about RelationalQuery and TPC-H Q3? Any thoughts on removing
RelationalQuery?

On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]> wrote:

> I added the Triangle Enumeration Examples, thanks Fabian.
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>
> These are the examples people called dibs on:
>  - PageRank (Fabian)
>  - BatchGradientDescent (Márton)
>  - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
>  - The relational Stuff
>  - LinearRegression
>
> On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[hidden email]> wrote:
>> Thanks, I added it. I'll keep a running list of ported/unported
>> examples in my mails. I'll rename the java example package to examples
>> once the Scala API merge is done.
>>
>> I think the termination criterion is fine as it is. Just because Scala
>> enables functional programming doesn't mean it's always the best
>> choice. :D
>>
>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> WebLogAnalysis, TransitiveClosureNaive
>>
>> These are the examples people called dibs on:
>>  - TriangleEnumration and PageRank (Fabian)
>>  - BatchGradientDescent (Márton)
>>  - ComputeEdgeDegrees (Hermann)
>>
>> Those are unclaimed (if I'm not mistaken):
>>  - The relational Stuff
>>  - LinearRegression
>>
>> Cheers,
>> Aljoscha
>>
>> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]> wrote:
>>> Transitive closure here, I also added a termination criterion in the Java
>>> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>>
>>> Perhaps you can make the termination criterion in Scala more functional?
>>>
>>> I noticed that the examples package name is example.java but examples.scala
>>>
>>> Kostas
>>>
>>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]> wrote:
>>>>
>>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>>>
>>>> If nobody volunteers for the relational stuff I can take those as well.
>>>>
>>>> How about removing the "RelationalQuery" from both Scala and Java? It
>>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>>>> top of TPC-H Q3?
>>>>
>>>> Kostas
>>>>
>>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <[hidden email]>
>>>> wrote:
>>>>>
>>>>> Thanks, I added it, along with an ITCase.
>>>>>
>>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>>> WebLogAnalysis
>>>>>
>>>>> These are the examples people called dibs on:
>>>>>  - TriangleEnumration and PageRank (Fabian)
>>>>>  - BatchGradientDescent (Márton)
>>>>>  - ComputeEdgeDegrees (Hermann)
>>>>>
>>>>> Those are unclaimed (if I'm not mistaken):
>>>>>  - TransitiveClosure
>>>>>  - The relational Stuff
>>>>>  - LinearRegression
>>>>>
>>>>> Cheers,
>>>>> Aljoscha
>>>>>
>>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <[hidden email]>
>>>>> wrote:
>>>>> > WebLog here:
>>>>> >
>>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>>>> >
>>>>> > Do you need any more done?
>>>>> >
>>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <[hidden email]>
>>>>> > wrote:
>>>>> >
>>>>> >> I added the ConnectedComponents Example from Vasia.
>>>>> >>
>>>>> >> Keep 'em coming, people. :D
>>>>> >>
>>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <[hidden email]>
>>>>> >> wrote:
>>>>> >> > Alright, will do.
>>>>> >> > Thanks!
>>>>> >> >
>>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>>>> >> >
>>>>> >> >> Ok people, executive decision. :D
>>>>> >> >>
>>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the
>>>>> >> >> data
>>>>> >> >> in multi-dimensional object arrays and then converting it to the
>>>>> >> >> required Java or Scala objects.
>>>>> >> >>
>>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the
>>>>> >> >> Java
>>>>> >> >> API.
>>>>> >> >>
>>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword, you
>>>>> >> >> can
>>>>> >> >> just write:
>>>>> >> >>
>>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le,
>>>>> >> >> re)
>>>>> >> }
>>>>> >> >>
>>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <[hidden email]>
>>>>> >> wrote:
>>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with
>>>>> >> >> > the
>>>>> >> Java
>>>>> >> >> > API. In Java join is done as:
>>>>> >> >> >
>>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>>>> >> >> >
>>>>> >> >> > where in the current Scala this is:
>>>>> >> >> >
>>>>> >> >> > ds1.join(d2).where(...).isEqualTo(...)
>>>>> >> >> >
>>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because
>>>>> >> "with"
>>>>> >> >> is
>>>>> >> >> > a keyword in Scala. Should be offer something similar for Scala
>>>>> >> >> > or go
>>>>> >> >> with
>>>>> >> >> > map() on Tuple2(left, right)?
>>>>> >> >> >
>>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]>:
>>>>> >> >> >
>>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a
>>>>> >> >> >> generic
>>>>> >> >> >> representation of a Tuple.
>>>>> >> >> >>
>>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples,
>>>>> >> >> >> with a
>>>>> >> >> generic
>>>>> >> >> >> utility method to convert between the two.
>>>>> >> >> >>
>>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>>>>> >> >> >> <[hidden email]>
>>>>> >> >> wrote:
>>>>> >> >> >>
>>>>> >> >> >> > Yeah, I ran into the same problem...
>>>>> >> >> >> >
>>>>> >> >> >> > +1 for using Strings and parsing them,  but using the
>>>>> >> >> >> > CSVFormat
>>>>> >> won't
>>>>> >> >> >> work
>>>>> >> >> >> > because this is based on a FileInputFormat.
>>>>> >> >> >> > So we would need to parse the Strings manually...
>>>>> >> >> >> >
>>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>>>>> >> >> >> > <[hidden email]>:
>>>>> >> >> >> >
>>>>> >> >> >> > > Hi,
>>>>> >> >> >> > > on second thought. Maybe we should just change all the
>>>>> >> >> >> > > example
>>>>> >> input
>>>>> >> >> >> > > data to strings and use CSV input formats in all the
>>>>> >> >> >> > > examples.
>>>>> >> What
>>>>> >> >> do
>>>>> >> >> >> > > you think?
>>>>> >> >> >> > >
>>>>> >> >> >> > > Cheers,
>>>>> >> >> >> > > Aljoscha
>>>>> >> >> >> > >
>>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
>>>>> >> >> [hidden email]>
>>>>> >> >> >> > > wrote:
>>>>> >> >> >> > > > Hi,
>>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible.
>>>>> >> >> >> > > > I'm
>>>>> >> >> afraid
>>>>> >> >> >> > > > you have to to what you proposed: move the data to a
>>>>> >> >> >> > > > static
>>>>> >> field
>>>>> >> >> and
>>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala.
>>>>> >> >> >> > > > It's
>>>>> >> >> not
>>>>> >> >> >> > > > nice, but copying would duplicate the data and make it
>>>>> >> >> >> > > > easier
>>>>> >> for
>>>>> >> >> it
>>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > What do the others think? This will probably occur in all
>>>>> >> >> >> > > > the
>>>>> >> >> >> examples.
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > Cheers,
>>>>> >> >> >> > > > Aljoscha
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>>>>> >> >> >> > > > <[hidden email]> wrote:
>>>>> >> >> >> > > >> Hey,
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> I have ported the Connected Components example, but I am
>>>>> >> >> >> > > >> not
>>>>> >> sure
>>>>> >> >> >> how
>>>>> >> >> >> > to
>>>>> >> >> >> > > >> reuse the example input data from java-examples.
>>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and
>>>>> >> >> >> > > >> edges
>>>>> >> data
>>>>> >> >> >> are
>>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as
>>>>> >> parameter.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> One way is to provide public static fields (like in the
>>>>> >> >> >> WordCountData
>>>>> >> >> >> > > >> class), but this introduces a conversion
>>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala
>>>>> >> >> >> > > >> tuple and
>>>>> >> >> from
>>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an
>>>>> >> unnecessary
>>>>> >> >> >> > > complexity
>>>>> >> >> >> > > >> for an example (?).
>>>>> >> >> >> > > >> Another way is, of course, to copy the example data in
>>>>> >> >> >> > > >> the
>>>>> >> Scala
>>>>> >> >> >> > > example.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Am I missing something here?
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Thanks!
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Cheers,
>>>>> >> >> >> > > >> V.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <
>>>>> >> [hidden email]
>>>>> >> >> >
>>>>> >> >> >> > > wrote:
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >>> Alright, I updated my repo:
>>>>> >> >> >> > > >>>
>>>>> >> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty
>>>>> >> >> >> > > >>> much a
>>>>> >> >> copy
>>>>> >> >> >> of
>>>>> >> >> >> > > >>> the Java example with some fixups for the syntax and
>>>>> >> >> >> > > >>> lambda
>>>>> >> >> >> > functions.
>>>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a
>>>>> >> >> dependency
>>>>> >> >> >> for
>>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the example
>>>>> >> >> >> > > >>> input
>>>>> >> data.
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> When you ported a program you can do a pull request
>>>>> >> >> >> > > >>> against
>>>>> >> my
>>>>> >> >> repo
>>>>> >> >> >> > > >>> and I will collect the examples.
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> Happy coding. :D
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <
>>>>> >> >> >> [hidden email]
>>>>> >> >> >> > >
>>>>> >> >> >> > > >>> wrote:
>>>>> >> >> >> > > >>> > +1
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <
>>>>> >> >> >> > > >>> [hidden email]>
>>>>> >> >> >> > > >>> > wrote:
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> >> +1
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <
>>>>> >> >> >> > > [hidden email]>
>>>>> >> >> >> > > >>> >> wrote:
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> > +1
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > My experience with Scala consists of going through
>>>>> >> >> >> > > >>> >> > a
>>>>> >> >> tutorial
>>>>> >> >> >> so
>>>>> >> >> >> > > this
>>>>> >> >> >> > > >>> >> will
>>>>> >> >> >> > > >>> >> > be a good stress test both for me and the new API
>>>>> >> >> >> > > >>> >> > :-)
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <
>>>>> >> >> >> > > >>> >> > [hidden email]>
>>>>> >> >> >> > > >>> >> > wrote:
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > > +1 for having other people implement the
>>>>> >> >> >> > > >>> >> > > examples!
>>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > -V.
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <
>>>>> >> >> >> [hidden email]>
>>>>> >> >> >> > > >>> wrote:
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <
>>>>> >> >> >> > > [hidden email]
>>>>> >> >> >> > > >>> >:
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>>>>> >> >> >> > > >>> >> > > > >
>>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske
>>>>> >> >> >> > > >>> >> > > > > <
>>>>> >> >> >> > > >>> [hidden email]>
>>>>> >> >> >> > > >>> >> > > > wrote:
>>>>> >> >> >> > > >>> >> > > > > > Hi,
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple one first, such as
>>>>> >> >> >> > > >>> >> > > > > > WordCount.
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > Fabian
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek
>>>>> >> >> >> > > >>> >> > > > > > <
>>>>> >> >> >> > > >>> [hidden email]
>>>>> >> >> >> > > >>> >> >:
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>>>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>>> >> >> >> > > >>> >> > > > > >>
>>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>>>>> >> >> >> > > >>> >> > > > > >> in the API?
>>>>> >> >> >> > > >>> >> > > > > >>
>>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>>>> >> >> >> > > >>> >> > > > > >>
>>>>> >> >> >> > > >>> >> > > > >
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > >
>>>>> >> >> >> >
>>>>> >> >> >>
>>>>> >> >>
>>>>> >>
>>>>
>>>>
>>>

Re: Scala API rewrite almost complete

Stephan Ewen
+1 for removing RelationalQuery

On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <[hidden email]> wrote:

> By the way, what was called BatchGradientDescent in the Scala examples
> should be replaced by a port of the LinearRegression Example from
> Java. I had them as two separate examples earlier.
>
> What about RelationalQuery and TPC-H Q3? Any thoughts about removing
> RelationalQuery?

Re: Scala API rewrite almost complete

Fabian Hueske
In reply to this post by Aljoscha Krettek-2
+1 for removing RelationalQuery
IMO, the Scala examples should mirror the Java examples. So we should
port the Java examples to Scala rather than update the existing Scala
examples.

I am also done with the PageRank implementation. Final tests are currently
running and I'll open a PR soon.

I found some things that need to be solved or should at least be mentioned
in the documentation:
- It is crucial to import org.apache.flink.api.scala._. Without it, Eclipse
shows many errors regarding type extraction but does not offer to fix them
by adding this import.
- The method ExecutionEnvironment.generateSequence appears to be missing in
the Scala ExecutionEnvironment.
- It is not possible to use arrays of Scala primitives such as Int or Long,
which are mapped to the Java primitives int and long. Instead you need to
force Java types, e.g., Array[java.lang.Long].
- It is not possible to use Scala List. List is an abstract type and is
treated like an interface, which the TypeExtractor does not support.
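
To make the array point concrete, here is a minimal sketch in plain Scala that
does not touch Flink at all. It only shows the JVM-level difference between
Array[Long] and Array[java.lang.Long]. The object name ArrayTypes and the
helpers box and boxList are made up for this illustration, and handing a List
over as a boxed array is an assumption on my side, not something I have
verified against the TypeExtractor.

```scala
object ArrayTypes {
  // Scala's Array[Long] is a primitive long[] on the JVM.
  val primitive: Array[Long] = Array(1L, 2L, 3L)

  // Boxing each element yields Array[java.lang.Long], i.e. a Long[] of objects.
  def box(values: Array[Long]): Array[java.lang.Long] =
    values.map(v => Long.box(v))

  // Assumption: a List could be converted to a boxed array in the same way.
  def boxList(values: List[Long]): Array[java.lang.Long] =
    values.map(v => Long.box(v)).toArray

  def main(args: Array[String]): Unit = {
    println(primitive.getClass.getSimpleName)      // long[]
    println(box(primitive).getClass.getSimpleName) // Long[]
  }
}
```

The conversion copies the data once, which should be fine for the small
default inputs used in the examples.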

Cheers, Fabian

2014-09-11 15:04 GMT+02:00 Aljoscha Krettek <[hidden email]>:

> By the way, what was called BatchGradientDescent in the Scala examples
> should be replaced by a port of the LinearRegression Example from
> Java. I had them as two separate examples earlier.
>
> What about RelationalQuery and TPC-H Q3? Any thoughts about removing
> RelationalQuery?
>
> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]>
> wrote:
> > I added the Triangle Enumeration Examples, thanks Fabian.
> >
> > So far we have ported: WordCount, KMeans, ConnectedComponents,
> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
> >
> > These are the examples people called dibs on:
> >  - PageRank (Fabian)
> >  - BatchGradientDescent (Márton)
> >  - ComputeEdgeDegrees (Hermann)
> >
> > Those are unclaimed (if I'm not mistaken):
> >  - The relational Stuff
> >  - LinearRegression
> >
> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[hidden email]> wrote:
> >> Thanks, I added it. I'll keep a running list of ported/unported
> >> examples in my mails. I'll rename the java example package to examples
> >> once the Scala API merge is done.
> >>
> >> I think the termination criterion is fine as it is. Just because Scala
> >> enables functional programming doesn't mean it's always the best
> >> choice. :D
> >>
> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> WebLogAnalysis, TransitiveClosureNaive
> >>
> >> These are the examples people called dibs on:
> >>  - TriangleEnumeration and PageRank (Fabian)
> >>  - BatchGradientDescent (Márton)
> >>  - ComputeEdgeDegrees (Hermann)
> >>
> >> Those are unclaimed (if I'm not mistaken):
> >>  - The relational Stuff
> >>  - LinearRegression
> >>
> >> Cheers,
> >> Aljoscha
> >>
> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]> wrote:
> >>> Transitive closure here, I also added a termination criterion in the Java
> >>> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
> >>>
> >>> Perhaps you can make the termination criterion in Scala more functional?
> >>>
> >>> I noticed that the examples package name is example.java but examples.scala
> >>>
> >>> Kostas
> >>>
> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]> wrote:
> >>>>
> >>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
> >>>>
> >>>> If nobody volunteers for the relational stuff I can take those as well.
> >>>>
> >>>> How about removing the "RelationalQuery" from both Scala and Java? It
> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value
> >>>> on top of TPC-H Q3?
> >>>>
> >>>> Kostas
> >>>>
> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <[hidden email]> wrote:
> >>>>>
> >>>>> Thanks, I added it, along with an ITCase.
> >>>>>
> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>>>> WebLogAnalysis
> >>>>>
> >>>>> These are the examples people called dibs on:
> >>>>>  - TriangleEnumeration and PageRank (Fabian)
> >>>>>  - BatchGradientDescent (Márton)
> >>>>>  - ComputeEdgeDegrees (Hermann)
> >>>>>
> >>>>> Those are unclaimed (if I'm not mistaken):
> >>>>>  - TransitiveClosure
> >>>>>  - The relational Stuff
> >>>>>  - LinearRegression
> >>>>>
> >>>>> Cheers,
> >>>>> Aljoscha
> >>>>>
> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <[hidden email]>
> >>>>> wrote:
> >>>>> > WebLog here:
> >>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >>>>> >
> >>>>> > Do you need any more done?
> >>>>> >
> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
> [hidden email]>
> >>>>> > wrote:
> >>>>> >
> >>>>> >> I added the ConnectedComponents Example from Vasia.
> >>>>> >>
> >>>>> >> Keep 'em coming, people. :D
> >>>>> >>
> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <[hidden email]
> >
> >>>>> >> wrote:
> >>>>> >> > Alright, will do.
> >>>>> >> > Thanks!
> >>>>> >> >
> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
> [hidden email]>:
> >>>>> >> >
> >>>>> >> >> Ok people, executive decision. :D
> >>>>> >> >>
> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
> >>>>> >> >> in multi-dimensional object arrays and then converting it to the
> >>>>> >> >> required Java or Scala objects.
> >>>>> >> >>
> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the Java
> >>>>> >> >> API.
> >>>>> >> >>
> >>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword, you can
> >>>>> >> >> just write:
> >>>>> >> >>
> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
> >>>>> >> >>
> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <[hidden email]> wrote:
> >>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with the Java
> >>>>> >> >> > API. In Java join is done as:
> >>>>> >> >> >
> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >>>>> >> >> >
> >>>>> >> >> > where in the current Scala this is:
> >>>>> >> >> >
> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
> >>>>> >> >> >
> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
> >>>>> >> >> > a keyword in Scala. Should we offer something similar for Scala or go with
> >>>>> >> >> > map() on Tuple2(left, right)?
> >>>>> >> >> >
> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]>:
> >>>>> >> >> >
> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a generic
> >>>>> >> >> >> representation of a Tuple.
> >>>>> >> >> >>
> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
> >>>>> >> >> >> utility method to convert between the two.
> >>>>> >> >> >>
> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
> >>>>> >> >> >> <[hidden email]>
> >>>>> >> >> wrote:
> >>>>> >> >> >>
> >>>>> >> >> >> > Yeah, I ran into the same problem...
> >>>>> >> >> >> >
> >>>>> >> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
> >>>>> >> >> >> > because this is based on a FileInputFormat.
> >>>>> >> >> >> > So we would need to parse the Strings manually...
> >>>>> >> >> >> >
> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
> >>>>> >> >> >> > <[hidden email]>:
> >>>>> >> >> >> >
> >>>>> >> >> >> > > Hi,
> >>>>> >> >> >> > > On second thought, maybe we should just change all the example input
> >>>>> >> >> >> > > data to strings and use CSV input formats in all the examples. What do
> >>>>> >> >> >> > > you think?
> >>>>> >> >> >> > >
> >>>>> >> >> >> > > Cheers,
> >>>>> >> >> >> > > Aljoscha
> >>>>> >> >> >> > >
> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
> >>>>> >> >> [hidden email]>
> >>>>> >> >> >> > > wrote:
> >>>>> >> >> >> > > > Hi,
> >>>>> >> >> >> > > > Yes, it's unfortunate that the data types are incompatible. I'm afraid
> >>>>> >> >> >> > > > you have to do what you proposed: move the data to a static field and
> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
> >>>>> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
> >>>>> >> >> >> > > >
> >>>>> >> >> >> > > > What do the others think? This will probably occur in all the examples.
> >>>>> >> >> >> > > >
> >>>>> >> >> >> > > > Cheers,
> >>>>> >> >> >> > > > Aljoscha
> >>>>> >> >> >> > > >
> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
> >>>>> >> >> >> > > > <[hidden email]> wrote:
> >>>>> >> >> >> > > >> Hey,
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
> >>>>> >> >> >> > > >> reuse the example input data from java-examples.
> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
> >>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
> >>>>> >> >> >> > > >> class), but this introduces a conversion
> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
> >>>>> >> >> >> > > >> for an example (?).
> >>>>> >> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >> Am I missing something here?
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >> Thanks!
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >> Cheers,
> >>>>> >> >> >> > > >> V.
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <
> >>>>> >> [hidden email]
> >>>>> >> >> >
> >>>>> >> >> >> > > wrote:
> >>>>> >> >> >> > > >>
> >>>>> >> >> >> > > >>> Alright, I updated my repo:
> >>>>> >> >> >> > > >>>
> >>>>> >> >>
> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >>>>> >> >> >> > > >>>
> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's
> pretty
> >>>>> >> >> >> > > >>> much a
> >>>>> >> >> copy
> >>>>> >> >> >> of
> >>>>> >> >> >> > > >>> the Java example with some fixups for the syntax and
> >>>>> >> >> >> > > >>> lambda
> >>>>> >> >> >> > functions.
> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples
> as a
> >>>>> >> >> dependency
> >>>>> >> >> >> for
> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the example
> >>>>> >> >> >> > > >>> input
> >>>>> >> data.
> >>>>> >> >> >> > > >>>
> >>>>> >> >> >> > > >>> When you ported a program you can do a pull request
> >>>>> >> >> >> > > >>> against
> >>>>> >> my
> >>>>> >> >> repo
> >>>>> >> >> >> > > >>> and I will collect the examples.
> >>>>> >> >> >> > > >>>
> >>>>> >> >> >> > > >>> Happy coding. :D
> >>>>> >> >> >> > > >>>
> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <
> >>>>> >> >> >> [hidden email]
> >>>>> >> >> >> > >
> >>>>> >> >> >> > > >>> wrote:
> >>>>> >> >> >> > > >>> > +1
> >>>>> >> >> >> > > >>> >
> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >>>>> >> >> >> > > >>> >
> >>>>> >> >> >> > > >>> >
> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <
> >>>>> >> >> >> > > >>> [hidden email]>
> >>>>> >> >> >> > > >>> > wrote:
> >>>>> >> >> >> > > >>> >
> >>>>> >> >> >> > > >>> >> +1
> >>>>> >> >> >> > > >>> >>
> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >>>>> >> >> >> > > >>> >>
> >>>>> >> >> >> > > >>> >>
> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <
> >>>>> >> >> >> > > [hidden email]>
> >>>>> >> >> >> > > >>> >> wrote:
> >>>>> >> >> >> > > >>> >>
> >>>>> >> >> >> > > >>> >> > +1
> >>>>> >> >> >> > > >>> >> >
> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >>>>> >> >> >> > > >>> >> >
> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of going
> through
> >>>>> >> >> >> > > >>> >> > a
> >>>>> >> >> tutorial
> >>>>> >> >> >> so
> >>>>> >> >> >> > > this
> >>>>> >> >> >> > > >>> >> will
> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and the new
> API
> >>>>> >> >> >> > > >>> >> > :-)
> >>>>> >> >> >> > > >>> >> >
> >>>>> >> >> >> > > >>> >> >
> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki
> Kalavri <
> >>>>> >> >> >> > > >>> >> > [hidden email]>
> >>>>> >> >> >> > > >>> >> > wrote:
> >>>>> >> >> >> > > >>> >> >
> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement the
> >>>>> >> >> >> > > >>> >> > > examples!
> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
> >>>>> >> >> >> > > >>> >> > >
> >>>>> >> >> >> > > >>> >> > > -V.
> >>>>> >> >> >> > > >>> >> > >
> >>>>> >> >> >> > > >>> >> > >
> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <
> >>>>> >> >> >> [hidden email]>
> >>>>> >> >> >> > > >>> wrote:
> >>>>> >> >> >> > > >>> >> > >
> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
> >>>>> >> >> >> > > >>> >> > > >
> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the
> Java
> >>>>> >> >> examples:
> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
> >>>>> >> >> >> > > >>> >> > > >
> >>>>> >> >> >> > > >>> >> > > >
> >>>>> >> >> >> > > >>> >> > > >
> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha
> Krettek <
> >>>>> >> >> >> > > [hidden email]
> >>>>> >> >> >> > > >>> >:
> >>>>> >> >> >> > > >>> >> > > >
> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their
> >>>>> >> >> >> > > >>> >> > > > > favourite
> >>>>> >> >> >> examples
> >>>>> >> >> >> > > here.
> >>>>> >> >> >> > > >>> >> > > > >
> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian
> Hueske
> >>>>> >> >> >> > > >>> >> > > > > <
> >>>>> >> >> >> > > >>> [hidden email]>
> >>>>> >> >> >> > > >>> >> > > > wrote:
> >>>>> >> >> >> > > >>> >> > > > > > Hi,
> >>>>> >> >> >> > > >>> >> > > > > >
> >>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by
> >>>>> >> >> >> > > >>> >> > > > > > different
> >>>>> >> >> >> people
> >>>>> >> >> >> > > >>> proved to
> >>>>> >> >> >> > > >>> >> > be
> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
> >>>>> >> >> >> > > >>> >> > > > > >
> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a
> simple
> >>>>> >> >> >> > > >>> >> > > > > > first
> >>>>> >> >> one
> >>>>> >> >> >> > such
> >>>>> >> >> >> > > as
> >>>>> >> >> >> > > >>> >> > > WordCount.
> >>>>> >> >> >> > > >>> >> > > > > >
> >>>>> >> >> >> > > >>> >> > > > > > Fabian
> >>>>> >> >> >> > > >>> >> > > > > >
> >>>>> >> >> >> > > >>> >> > > > > >
> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha
> Krettek
> >>>>> >> >> >> > > >>> >> > > > > > <
> >>>>> >> >> >> > > >>> [hidden email]
> >>>>> >> >> >> > > >>> >> >:
> >>>>> >> >> >> > > >>> >> > > > > >
> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala
> API
> >>>>> >> >> >> > > >>> >> > > > > >> here:
> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>>> >> >> >> > > >>> >>
> >>>>> >> >> >>
> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to
> write
> >>>>> >> >> >> > > >>> >> > > > > >> the
> >>>>> >> tests
> >>>>> >> >> and
> >>>>> >> >> >> > > port
> >>>>> >> >> >> > > >>> the
> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense
> to
> >>>>> >> >> >> > > >>> >> > > > > >> let
> >>>>> >> other
> >>>>> >> >> >> > people
> >>>>> >> >> >> > > >>> port
> >>>>> >> >> >> > > >>> >> the
> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses
> it and
> >>>>> >> maybe
> >>>>> >> >> >> > notices
> >>>>> >> >> >> > > some
> >>>>> >> >> >> > > >>> >> > quirks
> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>>> >> >> >> > > >>> >> > > > >
> >>>>> >> >> >> > > >>> >> > > >
> >>>>> >> >> >> > > >>> >> > >
> >>>>> >> >> >> > > >>> >> >
> >>>>> >> >> >> > > >>> >>
> >>>>> >> >> >> > > >>>
> >>>>> >> >> >> > >
> >>>>> >> >> >> >
> >>>>> >> >> >>
> >>>>> >> >>
> >>>>> >>
> >>>>
> >>>>
> >>>
>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
In reply to this post by Stephan Ewen
I added the PageRank example, thanks again Fabian. :D

Regarding the other stuff:
 - There is a comment in DataSet.scala about including
org.apache.flink.api.scala._ because of the TypeInformation.
 - I added generateSequence to ExecutionEnvironment.
 - It is possible to use Scala primitives in an Array; I noticed this while
writing the tests. You probably had an older version of the code.
 - Yes, using List and other interfaces is not possible; this is also
a restriction in the Java API.

What do you think about the interface of join and coGroup? Right now,
you can either use a lambda that returns an Option or the lambda with
the Collector. Originally I also wanted to have a lambda that
returns a Collection, but due to type erasure this has the same type
as the lambda with the Option, so I couldn't use it. There is an
implicit conversion from Option to a Collection, so I could change it
to the Collection variant without breaking the examples we have now.
What do you think?
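
To make the erasure point concrete, here is a minimal sketch in plain
Scala. It does not use the Flink API: JoinSketch, flatJoin, keepEqual and
emitBoth are hypothetical names, and the join is a local nested loop over
in-memory sequences, assumed here only for illustration. It shows why the
two lambda shapes cannot live side by side as overloads, and why an
Option-returning body still fits a Collection-style signature.

```scala
object JoinSketch {
  // Hypothetical stand-in for a join: a local nested loop over in-memory
  // sequences, not Flink's distributed join.
  def flatJoin[L, R, O](left: Seq[L], right: Seq[R])(fun: (L, R) => Iterable[O]): Seq[O] =
    for (l <- left; r <- right; o <- fun(l, r)) yield o

  // A second overload taking (L, R) => Option[O] cannot be declared next to
  // the method above: after erasure both function parameters are plain
  // Function2, so the compiler reports a double definition.
  // def flatJoin[L, R, O](left: Seq[L], right: Seq[R])(fun: (L, R) => Option[O]): Seq[O]

  // An Option-returning body still type-checks against Iterable[Int], because
  // the standard library provides an implicit Option-to-Iterable conversion.
  val keepEqual: (Int, Int) => Iterable[Int] =
    (l, r) => if (l == r) Some(l) else None

  // A body that emits several results per pair uses the same signature.
  val emitBoth: (Int, Int) => Iterable[Int] =
    (l, r) => if (l == r) List(l, r) else Nil

  def main(args: Array[String]): Unit = {
    println(flatJoin(Seq(1, 2, 3), Seq(2, 3, 4))(keepEqual))
    println(flatJoin(Seq(1, 2, 3), Seq(2, 3, 4))(emitBoth))
  }
}
```

With a single Collection-style signature, lambdas written against Option
keep compiling as long as the expected type is known at the lambda, which
is the assumption this sketch relies on.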

So far we have ported: WordCount, KMeans, ConnectedComponents,
WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
PageRank

These are the examples people called dibs on:
 - BatchGradientDescent (Márton) (should be a port of the LinearRegression
example from Java)
 - ComputeEdgeDegrees (Hermann)

Those are unclaimed (if I'm not mistaken):
 - The relational stuff

On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]> wrote:

> +1 for removing RelationQuery
>
> On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <[hidden email]>
> wrote:
>
>> By the way, what was called BatchGradientDescent in the Scala examples
>> should be replaced by a port of the LinearRegression Example from
>> Java. I had them as two separate examples earlier.
>>
>> What about RelationalQuery and TPC-H-Q3. Any thoughts about removing
>> RelationalQuery?
>>
>> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]>
>> wrote:
>> > I added the Triangle Enumeration Examples, thanks Fabian.
>> >
>> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>> >
>> > These are the examples people called dibs on:
>> >  - PageRank (Fabian)
>> >  - BatchGradientDescent (Márton)
>> >  - ComputeEdgeDegrees (Hermann)
>> >
>> > Those are unclaimed (if I'm not mistaken):
>> >  - The relational Stuff
>> >  - LinearRegression
>> >
>> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[hidden email]>
>> wrote:
>> >> Thanks, I added it. I'll keep a running list of ported/unported
>> >> examples in my mails. I'll rename the java example package to examples
>> >> once the Scala API merge is done.
>> >>
>> >> I think the termination criterion is fine as it is. Just because Scala
>> >> enables functional programming doesn't mean it's always the best
>> >> choice. :D
>> >>
>> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >> WebLogAnalysis, TransitiveClosureNaive
>> >>
>> >> These are the examples people called dibs on:
>> >>  - TriangleEnumration and PageRank (Fabian)
>> >>  - BatchGradientDescent (Márton)
>> >>  - ComputeEdgeDegrees (Hermann)
>> >>
>> >> Those are unclaimed (if I'm not mistaken):
>> >>  - The relational Stuff
>> >>  - LinearRegression
>> >>
>> >> Cheers,
>> >> Aljoscha
>> >>
>> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]>
>> wrote:
>> >>> Transitive closure here, I also added a termination criterion in the
>> Java
>> >>> version:
>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>> >>>
>> >>> Perhaps you can make the termination criterion in Scala more
>> functional?
>> >>>
>> >>> I noticed that the examples package name is example.java but
>> examples.scala
>> >>>
>> >>> Kostas
>> >>>
>> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]>
>> wrote:
>> >>>>
>> >>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>> >>>>
>> >>>> If nobody volunteers for the relational stuff I can take those as
>> well.
>> >>>>
>> >>>> How about removing the "RelationalQuery" from both Scala and Java? It
>> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching
>> value on
>> >>>> top of TPC-H Q3?
>> >>>>
>> >>>> Kostas
>> >>>>
>> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <[hidden email]
>> >
>> >>>> wrote:
>> >>>>>
>> >>>>> Thanks, I added it, along with an ITCase.
>> >>>>>
>> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >>>>> WebLogAnalysis
>> >>>>>
>> >>>>> These are the examples people called dibs on:
>> >>>>>  - TriangleEnumration and PageRank (Fabian)
>> >>>>>  - BatchGradientDescent (Márton)
>> >>>>>  - ComputeEdgeDegrees (Hermann)
>> >>>>>
>> >>>>> Those are unclaimed (if I'm not mistaken):
>> >>>>>  - TransitiveClosure
>> >>>>>  - The relational Stuff
>> >>>>>  - LinearRegression
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Aljoscha
>> >>>>>
>> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <[hidden email]>
>> >>>>> wrote:
>> >>>>> > WebLog here:
>> >>>>> >
>> >>>>> >
>> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>> >>>>> >
>> >>>>> > Do you need any more done?
>> >>>>> >
>> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
>> [hidden email]>
>> >>>>> > wrote:
>> >>>>> >
>> >>>>> >> I added the ConnectedComponents Example from Vasia.
>> >>>>> >>
>> >>>>> >> Keep 'em coming, people. :D
>> >>>>> >>
>> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <[hidden email]
>> >
>> >>>>> >> wrote:
>> >>>>> >> > Alright, will do.
>> >>>>> >> > Thanks!
>> >>>>> >> >
>> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
>> [hidden email]>:
>> >>>>> >> >
>> >>>>> >> >> Ok people, executive decision. :D
>> >>>>> >> >>
>> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing
>> the
>> >>>>> >> >> data
>> >>>>> >> >> in multi-dimensional object arrays and then converting it to
>> the
>> >>>>> >> >> required Java or Scala objects.
>> >>>>> >> >>
>> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent
>> with the
>> >>>>> >> >> Java
>> >>>>> >> >> API.
>> >>>>> >> >>
>> >>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword,
>> you
>> >>>>> >> >> can
>> >>>>> >> >> just write:
>> >>>>> >> >>
>> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new
>> MyResult(le,
>> >>>>> >> >> re)
>> >>>>> >> }
>> >>>>> >> >>
>> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
>> [hidden email]>
>> >>>>> >> wrote:
>> >>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency
>> with
>> >>>>> >> >> > the
>> >>>>> >> Java
>> >>>>> >> >> > API. In Java join is done as:
>> >>>>> >> >> >
>> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>> >>>>> >> >> >
>> >>>>> >> >> > where in the current Scala this is:
>> >>>>> >> >> >
>> >>>>> >> >> > ds1.join(d2).where(...).isEqualTo(...)
>> >>>>> >> >> >
>> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method
>> because
>> >>>>> >> "with"
>> >>>>> >> >> is
>> >>>>> >> >> > a keyword in Scala. Should be offer something similar for
>> Scala
>> >>>>> >> >> > or go
>> >>>>> >> >> with
>> >>>>> >> >> > map() on Tuple2(left, right)?
>> >>>>> >> >> >
>> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]>:
>> >>>>> >> >> >
>> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a
>> >>>>> >> >> >> generic
>> >>>>> >> >> >> representation of a Tuple.
>> >>>>> >> >> >>
>> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples,
>> >>>>> >> >> >> with a
>> >>>>> >> >> generic
>> >>>>> >> >> >> utility method to convert between the two.
>> >>>>> >> >> >>
>> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>> >>>>> >> >> >> <[hidden email]>
>> >>>>> >> >> wrote:
>> >>>>> >> >> >>
>> >>>>> >> >> >> > Yeah, I ran into the same problem...
>> >>>>> >> >> >> >
>> >>>>> >> >> >> > +1 for using Strings and parsing them,  but using the
>> >>>>> >> >> >> > CSVFormat
>> >>>>> >> won't
>> >>>>> >> >> >> work
>> >>>>> >> >> >> > because this is based on a FileInputFormat.
>> >>>>> >> >> >> > So we would need to parse the Strings manually...
>> >>>>> >> >> >> >
>> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>> >>>>> >> >> >> > <[hidden email]>:
>> >>>>> >> >> >> >
>> >>>>> >> >> >> > > Hi,
>> >>>>> >> >> >> > > on second thought. Maybe we should just change all the
>> >>>>> >> >> >> > > example
>> >>>>> >> input
>> >>>>> >> >> >> > > data to strings and use CSV input formats in all the
>> >>>>> >> >> >> > > examples.
>> >>>>> >> What
>> >>>>> >> >> do
>> >>>>> >> >> >> > > you think?
>> >>>>> >> >> >> > >
>> >>>>> >> >> >> > > Cheers,
>> >>>>> >> >> >> > > Aljoscha
>> >>>>> >> >> >> > >
>> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
>> >>>>> >> >> [hidden email]>
>> >>>>> >> >> >> > > wrote:
>> >>>>> >> >> >> > > > Hi,
>> >>>>> >> >> >> > > > yes it's unfortunate that the data types are
>> incompatible.
>> >>>>> >> >> >> > > > I'm
>> >>>>> >> >> afraid
>> >>>>> >> >> >> > > > you have to to what you proposed: move the data to a
>> >>>>> >> >> >> > > > static
>> >>>>> >> field
>> >>>>> >> >> and
>> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in
>> Scala.
>> >>>>> >> >> >> > > > It's
>> >>>>> >> >> not
>> >>>>> >> >> >> > > > nice, but copying would duplicate the data and make it
>> >>>>> >> >> >> > > > easier
>> >>>>> >> for
>> >>>>> >> >> it
>> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>> >>>>> >> >> >> > > >
>> >>>>> >> >> >> > > > What do the others think? This will probably occur in
>> all
>> >>>>> >> >> >> > > > the
>> >>>>> >> >> >> examples.
>> >>>>> >> >> >> > > >
>> >>>>> >> >> >> > > > Cheers,
>> >>>>> >> >> >> > > > Aljoscha
>> >>>>> >> >> >> > > >
>> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>> >>>>> >> >> >> > > > <[hidden email]> wrote:
>> >>>>> >> >> >> > > >> Hey,
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> I have ported the Connected Components example, but
>> I am
>> >>>>> >> >> >> > > >> not
>> >>>>> >> sure
>> >>>>> >> >> >> how
>> >>>>> >> >> >> > to
>> >>>>> >> >> >> > > >> reuse the example input data from java-examples.
>> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices
>> and
>> >>>>> >> >> >> > > >> edges
>> >>>>> >> data
>> >>>>> >> >> >> are
>> >>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as
>> >>>>> >> parameter.
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> One way is to provide public static fields (like in
>> the
>> >>>>> >> >> >> WordCountData
>> >>>>> >> >> >> > > >> class), but this introduces a conversion
>> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala
>> >>>>> >> >> >> > > >> tuple and
>> >>>>> >> >> from
>> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an
>> >>>>> >> unnecessary
>> >>>>> >> >> >> > > complexity
>> >>>>> >> >> >> > > >> for an example (?).
>> >>>>> >> >> >> > > >> Another way is, of course, to copy the example data
>> in
>> >>>>> >> >> >> > > >> the
>> >>>>> >> Scala
>> >>>>> >> >> >> > > example.
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> Am I missing something here?
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> Thanks!
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> Cheers,
>> >>>>> >> >> >> > > >> V.
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <
>> >>>>> >> [hidden email]
>> >>>>> >> >> >
>> >>>>> >> >> >> > > wrote:
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >>> Alright, I updated my repo:
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >>
>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's
>> pretty
>> >>>>> >> >> >> > > >>> much a
>> >>>>> >> >> copy
>> >>>>> >> >> >> of
>> >>>>> >> >> >> > > >>> the Java example with some fixups for the syntax and
>> >>>>> >> >> >> > > >>> lambda
>> >>>>> >> >> >> > functions.
>> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples
>> as a
>> >>>>> >> >> dependency
>> >>>>> >> >> >> for
>> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the example
>> >>>>> >> >> >> > > >>> input
>> >>>>> >> data.
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> When you ported a program you can do a pull request
>> >>>>> >> >> >> > > >>> against
>> >>>>> >> my
>> >>>>> >> >> repo
>> >>>>> >> >> >> > > >>> and I will collect the examples.
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> Happy coding. :D
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <
>> >>>>> >> >> >> [hidden email]
>> >>>>> >> >> >> > >
>> >>>>> >> >> >> > > >>> wrote:
>> >>>>> >> >> >> > > >>> > +1
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <
>> >>>>> >> >> >> > > >>> [hidden email]>
>> >>>>> >> >> >> > > >>> > wrote:
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> >> +1
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <
>> >>>>> >> >> >> > > [hidden email]>
>> >>>>> >> >> >> > > >>> >> wrote:
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >> > +1
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of going
>> through
>> >>>>> >> >> >> > > >>> >> > a
>> >>>>> >> >> tutorial
>> >>>>> >> >> >> so
>> >>>>> >> >> >> > > this
>> >>>>> >> >> >> > > >>> >> will
>> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and the new
>> API
>> >>>>> >> >> >> > > >>> >> > :-)
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki
>> Kalavri <
>> >>>>> >> >> >> > > >>> >> > [hidden email]>
>> >>>>> >> >> >> > > >>> >> > wrote:
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement the
>> >>>>> >> >> >> > > >>> >> > > examples!
>> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > > -V.
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <
>> >>>>> >> >> >> [hidden email]>
>> >>>>> >> >> >> > > >>> wrote:
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the
>> Java
>> >>>>> >> >> examples:
>> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
>> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha
>> Krettek <
>> >>>>> >> >> >> > > [hidden email]
>> >>>>> >> >> >> > > >>> >:
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their
>> >>>>> >> >> >> > > >>> >> > > > > favourite
>> >>>>> >> >> >> examples
>> >>>>> >> >> >> > > here.
>> >>>>> >> >> >> > > >>> >> > > > >
>> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian
>> Hueske
>> >>>>> >> >> >> > > >>> >> > > > > <
>> >>>>> >> >> >> > > >>> [hidden email]>
>> >>>>> >> >> >> > > >>> >> > > > wrote:
>> >>>>> >> >> >> > > >>> >> > > > > > Hi,
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by
>> >>>>> >> >> >> > > >>> >> > > > > > different
>> >>>>> >> >> >> people
>> >>>>> >> >> >> > > >>> proved to
>> >>>>> >> >> >> > > >>> >> > be
>> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a
>> simple
>> >>>>> >> >> >> > > >>> >> > > > > > first
>> >>>>> >> >> one
>> >>>>> >> >> >> > such
>> >>>>> >> >> >> > > as
>> >>>>> >> >> >> > > >>> >> > > WordCount.
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > Fabian
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha
>> Krettek
>> >>>>> >> >> >> > > >>> >> > > > > > <
>> >>>>> >> >> >> > > >>> [hidden email]
>> >>>>> >> >> >> > > >>> >> >:
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
>> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala
>> API
>> >>>>> >> >> >> > > >>> >> > > > > >> here:
>> >>>>> >> >> >> > > >>> >> > > > > >>
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >>
>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >>>>> >> >> >> > > >>> >> > > > > >>
>> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to
>> write
>> >>>>> >> >> >> > > >>> >> > > > > >> the
>> >>>>> >> tests
>> >>>>> >> >> and
>> >>>>> >> >> >> > > port
>> >>>>> >> >> >> > > >>> the
>> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense
>> to
>> >>>>> >> >> >> > > >>> >> > > > > >> let
>> >>>>> >> other
>> >>>>> >> >> >> > people
>> >>>>> >> >> >> > > >>> port
>> >>>>> >> >> >> > > >>> >> the
>> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses
>> it and
>> >>>>> >> maybe
>> >>>>> >> >> >> > notices
>> >>>>> >> >> >> > > some
>> >>>>> >> >> >> > > >>> >> > quirks
>> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
>> >>>>> >> >> >> > > >>> >> > > > > >>
>> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>> >>>>> >> >> >> > > >>> >> > > > > >>
>> >>>>> >> >> >> > > >>> >> > > > >
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > >
>> >>>>> >> >> >> >
>> >>>>> >> >> >>
>> >>>>> >> >>
>> >>>>> >>
>> >>>>
>> >>>>
>> >>>
>>

Re: Scala API rewrite almost complete

Fabian Hueske
I just removed the old CountEdgeDegrees example.
That was a preprocessing step for TriangleEnumeration and is now part
of the new TriangleEnumerationOpt example.
So I guess we don't need to port that one. As I said before, I'd prefer to
keep the Java and Scala examples in sync.

Cheers, Fabian

2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:

> I added the PageRank example, thanks again fabian. :D
>
> Regarding the other stuff:
>  - There is a comment in DataSet.scala about including
> org.apache.flink.api.scala._ because of the TypeInformation.
>  - I added generateSequence to ExecutionEnvironment.
>  - It is possible to use Scala Primitives in Array, I noticed it while
> writing the tests, you probably had an older version of the code.
>  - Yes, using List and other Interfaces is not possible, this is also
> a restriction in the Java API.
>
> What do you think about the interface of join and coGroup? Right now,
> you can either use a lambda that returns an Option or the lambda with
> the Collector. Originally I wanted to have also have a lambda that
> returns a Collection, but due to type erasure this has the same type
> as the lambda with the Option so I couldn't use it. There is an
> implicit conversion from Option to a Collection, so I could change it
> without breaking the examples we have now. What do you think?
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
> PageRank
>
> These are the examples people called dibs on:
>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
> Example from Java)
>  - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
>  - The relational Stuff
>
> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]> wrote:
> > +1 for removing RelationQuery
> >
> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <[hidden email]>
> > wrote:
> >
> >> By the way, what was called BatchGradientDescent in the Scala examples
> >> should be replaced by a port of the LinearRegression Example from
> >> Java. I had them as two separate examples earlier.
> >>
> >> What about RelationalQuery and TPC-H-Q3. Any thoughts about removing
> >> RelationalQuery?
> >>
> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]
> >
> >> wrote:
> >> > I added the Triangle Enumeration Examples, thanks Fabian.
> >> >
> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
> >> >
> >> > These are the examples people called dibs on:
> >> >  - PageRank (Fabian)
> >> >  - BatchGradientDescent (Márton)
> >> >  - ComputeEdgeDegrees (Hermann)
> >> >
> >> > Those are unclaimed (if I'm not mistaken):
> >> >  - The relational Stuff
> >> >  - LinearRegression
> >> >
> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
> [hidden email]>
> >> wrote:
> >> >> Thanks, I added it. I'll keep a running list of ported/unported
> >> >> examples in my mails. I'll rename the java example package to
> examples
> >> >> once the Scala API merge is done.
> >> >>
> >> >> I think the termination criterion is fine as it is. Just because
> Scala
> >> >> enables functional programming doesn't mean it's always the best
> >> >> choice. :D
> >> >>
> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> >> WebLogAnalysis, TransitiveClosureNaive
> >> >>
> >> >> These are the examples people called dibs on:
> >> >>  - TriangleEnumration and PageRank (Fabian)
> >> >>  - BatchGradientDescent (Márton)
> >> >>  - ComputeEdgeDegrees (Hermann)
> >> >>
> >> >> Those are unclaimed (if I'm not mistaken):
> >> >>  - The relational Stuff
> >> >>  - LinearRegression
> >> >>
> >> >> Cheers,
> >> >> Aljoscha
> >> >>
> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]
> >
> >> wrote:
> >> >>> Transitive closure here, I also added a termination criterion in the
> >> Java
> >> >>> version:
> >> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
> >> >>>
> >> >>> Perhaps you can make the termination criterion in Scala more
> >> functional?
> >> >>>
> >> >>> I noticed that the examples package name is example.java but
> >> examples.scala
> >> >>>
> >> >>> Kostas
> >> >>>
> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]
> >
> >> wrote:
> >> >>>>
> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on your
> list).
> >> >>>>
> >> >>>> If nobody volunteers for the relational stuff I can take those as
> >> well.
> >> >>>>
> >> >>>> How about removing the "RelationalQuery" from both Scala and Java?
> It
> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching
> >> value on
> >> >>>> top of TPC-H Q3?
> >> >>>>
> >> >>>> Kostas
> >> >>>>
> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
> [hidden email]
> >> >
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Thanks, I added it, along with an ITCase.
> >> >>>>>
> >> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> >>>>> WebLogAnalysis
> >> >>>>>
> >> >>>>> These are the examples people called dibs on:
> >> >>>>>  - TriangleEnumration and PageRank (Fabian)
> >> >>>>>  - BatchGradientDescent (Márton)
> >> >>>>>  - ComputeEdgeDegrees (Hermann)
> >> >>>>>
> >> >>>>> Those are unclaimed (if I'm not mistaken):
> >> >>>>>  - TransitiveClosure
> >> >>>>>  - The relational Stuff
> >> >>>>>  - LinearRegression
> >> >>>>>
> >> >>>>> Cheers,
> >> >>>>> Aljoscha
> >> >>>>>
> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
> [hidden email]>
> >> >>>>> wrote:
> >> >>>>> > WebLog here:
> >> >>>>> >
> >> >>>>> >
> >>
> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >> >>>>> >
> >> >>>>> > Do you need any more done?
> >> >>>>> >
> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
> >> [hidden email]>
> >> >>>>> > wrote:
> >> >>>>> >
> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
> >> >>>>> >>
> >> >>>>> >> Keep 'em coming, people. :D
> >> >>>>> >>
> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
> [hidden email]
> >> >
> >> >>>>> >> wrote:
> >> >>>>> >> > Alright, will do.
> >> >>>>> >> > Thanks!
> >> >>>>> >> >
> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
> >> [hidden email]>:
> >> >>>>> >> >
> >> >>>>> >> >> Ok people, executive decision. :D
> >> >>>>> >> >>
> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing
> >> the
> >> >>>>> >> >> data
> >> >>>>> >> >> in multi-dimensional object arrays and then converting it to
> >> the
> >> >>>>> >> >> required Java or Scala objects.
> >> >>>>> >> >>
> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent
> >> with the
> >> >>>>> >> >> Java
> >> >>>>> >> >> API.
> >> >>>>> >> >>
> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
> keyword,
> >> you
> >> >>>>> >> >> can
> >> >>>>> >> >> just write:
> >> >>>>> >> >>
> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new
> >> MyResult(le,
> >> >>>>> >> >> re)
> >> >>>>> >> }
> >> >>>>> >> >>
> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
> >> [hidden email]>
> >> >>>>> >> wrote:
> >> >>>>> >> >> > Aside from the DataSet issue, I also found an
> inconsistency
> >> with
> >> >>>>> >> >> > the
> >> >>>>> >> Java
> >> >>>>> >> >> > API. In Java join is done as:
> >> >>>>> >> >> >
> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >> >>>>> >> >> >
> >> >>>>> >> >> > where in the current Scala this is:
> >> >>>>> >> >> >
> >> >>>>> >> >> > ds1.join(d2).where(...).isEqualTo(...)
> >> >>>>> >> >> >
> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method
> >> because
> >> >>>>> >> "with"
> >> >>>>> >> >> is
> >> >>>>> >> >> > a keyword in Scala. Should be offer something similar for
> >> Scala
> >> >>>>> >> >> > or go
> >> >>>>> >> >> with
> >> >>>>> >> >> > map() on Tuple2(left, right)?
> >> >>>>> >> >> >
> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]
> >:
> >> >>>>> >> >> >
> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That
> is a
> >> >>>>> >> >> >> generic
> >> >>>>> >> >> >> representation of a Tuple.
> >> >>>>> >> >> >>
> >> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala
> Tuples,
> >> >>>>> >> >> >> with a
> >> >>>>> >> >> generic
> >> >>>>> >> >> >> utility method to convert between the two.
> >> >>>>> >> >> >>
> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
> >> >>>>> >> >> >> <[hidden email]>
> >> >>>>> >> >> wrote:
> >> >>>>> >> >> >>
> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but using the
> >> >>>>> >> >> >> > CSVFormat
> >> >>>>> >> won't
> >> >>>>> >> >> >> work
> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
> >> >>>>> >> >> >> > So we would need to parse the Strings manually...
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
> >> >>>>> >> >> >> > <[hidden email]>:
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >> > > Hi,
> >> >>>>> >> >> >> > > on second thought. Maybe we should just change all
> the
> >> >>>>> >> >> >> > > example
> >> >>>>> >> input
> >> >>>>> >> >> >> > > data to strings and use CSV input formats in all the
> >> >>>>> >> >> >> > > examples.
> >> >>>>> >> What
> >> >>>>> >> >> do
> >> >>>>> >> >> >> > > you think?
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> > > Cheers,
> >> >>>>> >> >> >> > > Aljoscha
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
> >> >>>>> >> >> [hidden email]>
> >> >>>>> >> >> >> > > wrote:
> >> >>>>> >> >> >> > > > Hi,
> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are
> >> incompatible.
> >> >>>>> >> >> >> > > > I'm
> >> >>>>> >> >> afraid
> >> >>>>> >> >> >> > > > you have to to what you proposed: move the data to
> a
> >> >>>>> >> >> >> > > > static
> >> >>>>> >> field
> >> >>>>> >> >> and
> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in
> >> Scala.
> >> >>>>> >> >> >> > > > It's
> >> >>>>> >> >> not
> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data and
> make it
> >> >>>>> >> >> >> > > > easier
> >> >>>>> >> for
> >> >>>>> >> >> it
> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
> >> >>>>> >> >> >> > > >
> >> >>>>> >> >> >> > > > What do the others think? This will probably occur
> in
> >> all
> >> >>>>> >> >> >> > > > the
> >> >>>>> >> >> >> examples.
> >> >>>>> >> >> >> > > >
> >> >>>>> >> >> >> > > > Cheers,
> >> >>>>> >> >> >> > > > Aljoscha
> >> >>>>> >> >> >> > > >
> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
> >> >>>>> >> >> >> > > >> Hey,
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> I have ported the Connected Components example,
> but
> >> I am
> >> >>>>> >> >> >> > > >> not
> >> >>>>> >> sure
> >> >>>>> >> >> >> how
> >> >>>>> >> >> >> > to
> >> >>>>> >> >> >> > > >> reuse the example input data from java-examples.
> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices
> >> and
> >> >>>>> >> >> >> > > >> edges
> >> >>>>> >> data
> >> >>>>> >> >> >> are
> >> >>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> >> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment
> as
> >> >>>>> >> parameter.
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> One way is to provide public static fields (like
> in
> >> the
> >> >>>>> >> >> >> WordCountData
> >> >>>>> >> >> >> > > >> class), but this introduces a conversion
> >> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to
> Scala
> >> >>>>> >> >> >> > > >> tuple and
> >> >>>>> >> >> from
> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is
> an
> >> >>>>> >> unnecessary
> >> >>>>> >> >> >> > > complexity
> >> >>>>> >> >> >> > > >> for an example (?).
> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the example
> data
> >> in
> >> >>>>> >> >> >> > > >> the
> >> >>>>> >> Scala
> >> >>>>> >> >> >> > > example.
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> Am I missing something here?
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> Thanks!
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> Cheers,
> >> >>>>> >> >> >> > > >> V.
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <
> >> >>>>> >> [hidden email]
> >> >>>>> >> >> >
> >> >>>>> >> >> >> > > wrote:
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >>
> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's
> >> pretty
> >> >>>>> >> >> >> > > >>> much a
> >> >>>>> >> >> copy
> >> >>>>> >> >> >> of
> >> >>>>> >> >> >> > > >>> the Java example with some fixups for the syntax
> and
> >> >>>>> >> >> >> > > >>> lambda
> >> >>>>> >> >> >> > functions.
> >> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples
> >> as a
> >> >>>>> >> >> dependency
> >> >>>>> >> >> >> for
> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the
> example
> >> >>>>> >> >> >> > > >>> input
> >> >>>>> >> data.
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> When you ported a program you can do a pull
> request
> >> >>>>> >> >> >> > > >>> against
> >> >>>>> >> my
> >> >>>>> >> >> repo
> >> >>>>> >> >> >> > > >>> and I will collect the examples.
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> Happy coding. :D
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <
> >> >>>>> >> >> >> [hidden email]
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> > > >>> wrote:
> >> >>>>> >> >> >> > > >>> > +1
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
> Balassi <
> >> >>>>> >> >> >> > > >>> [hidden email]>
> >> >>>>> >> >> >> > > >>> > wrote:
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> >> +1
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
> Tzoumas <
> >> >>>>> >> >> >> > > [hidden email]>
> >> >>>>> >> >> >> > > >>> >> wrote:
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >> > +1
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of going
> >> through
> >> >>>>> >> >> >> > > >>> >> > a
> >> >>>>> >> >> tutorial
> >> >>>>> >> >> >> so
> >> >>>>> >> >> >> > > this
> >> >>>>> >> >> >> > > >>> >> will
> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and the
> new
> >> API
> >> >>>>> >> >> >> > > >>> >> > :-)
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki
> >> Kalavri <
> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
> >> >>>>> >> >> >> > > >>> >> > wrote:
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement the
> >> >>>>> >> >> >> > > >>> >> > > examples!
> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > > -V.
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <
> >> >>>>> >> >> >> [hidden email]>
> >> >>>>> >> >> >> > > >>> wrote:
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
> PageRank.
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to
> the
> >> Java
> >> >>>>> >> >> examples:
> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
> parameters
> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha
> >> Krettek <
> >> >>>>> >> >> >> > > [hidden email]
> >> >>>>> >> >> >> > > >>> >:
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their
> >> >>>>> >> >> >> > > >>> >> > > > > favourite
> >> >>>>> >> >> >> examples
> >> >>>>> >> >> >> > > here.
> >> >>>>> >> >> >> > > >>> >> > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian
> >> Hueske
> >> >>>>> >> >> >> > > >>> >> > > > > <
> >> >>>>> >> >> >> > > >>> [hidden email]>
> >> >>>>> >> >> >> > > >>> >> > > > wrote:
> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented
> by
> >> >>>>> >> >> >> > > >>> >> > > > > > different
> >> >>>>> >> >> >> people
> >> >>>>> >> >> >> > > >>> proved to
> >> >>>>> >> >> >> > > >>> >> > be
> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a
> >> simple
> >> >>>>> >> >> >> > > >>> >> > > > > > first
> >> >>>>> >> >> one
> >> >>>>> >> >> >> > such
> >> >>>>> >> >> >> > > as
> >> >>>>> >> >> >> > > >>> >> > > WordCount.
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha
> >> Krettek
> >> >>>>> >> >> >> > > >>> >> > > > > > <
> >> >>>>> >> >> >> > > >>> [hidden email]
> >> >>>>> >> >> >> > > >>> >> >:
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the
> Scala
> >> API
> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >>
> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to
> >> write
> >> >>>>> >> >> >> > > >>> >> > > > > >> the
> >> >>>>> >> tests
> >> >>>>> >> >> and
> >> >>>>> >> >> >> > > port
> >> >>>>> >> >> >> > > >>> the
> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes
> sense
> >> to
> >> >>>>> >> >> >> > > >>> >> > > > > >> let
> >> >>>>> >> other
> >> >>>>> >> >> >> > people
> >> >>>>> >> >> >> > > >>> port
> >> >>>>> >> >> >> > > >>> >> the
> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses
> >> it and
> >> >>>>> >> maybe
> >> >>>>> >> >> >> > notices
> >> >>>>> >> >> >> > > some
> >> >>>>> >> >> >> > > >>> >> > quirks
> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >> > > > >
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >>
> >> >>>>> >> >>
> >> >>>>> >>
> >> >>>>
> >> >>>>
> >> >>>
> >>
>

Re: Scala API rewrite almost complete

Kostas Tzoumas-2
In reply to this post by Aljoscha Krettek-2
I will port PiEstimation now that generateSequence is in, as well as TPC-H Q3.

Kostas

On Thu, Sep 11, 2014 at 5:40 PM, Aljoscha Krettek <[hidden email]>
wrote:

> I added the PageRank example, thanks again Fabian. :D
>
> Regarding the other stuff:
>  - There is a comment in DataSet.scala about including
> org.apache.flink.api.scala._ because of the TypeInformation.
>  - I added generateSequence to ExecutionEnvironment.
>  - It is possible to use Scala primitives in an Array; I noticed it while
> writing the tests. You probably had an older version of the code.
>  - Yes, using List and other Interfaces is not possible, this is also
> a restriction in the Java API.
>
> What do you think about the interface of join and coGroup? Right now,
> you can either use a lambda that returns an Option or the lambda with
> the Collector. Originally I also wanted to have a lambda that
> returns a Collection, but due to type erasure this has the same type
> as the lambda with the Option so I couldn't use it. There is an
> implicit conversion from Option to a Collection, so I could change it
> without breaking the examples we have now. What do you think?
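
A minimal sketch of the erasure clash described in the paragraph above, in plain Scala with no Flink dependency. JoinSketch, apply and flatApply are hypothetical names made up for illustration; they are not the classes or methods in the scala-rework branch. The only point is that (L, R) => Option[O] and (L, R) => Iterable[O] both erase to Function2, so the two lambdas cannot be overloads of the same method:

```scala
// Hypothetical stand-in for an already matched join result; not Flink's class.
class JoinSketch[L, R](pairs: Seq[(L, R)]) {

  // Variant with a lambda that returns zero or one result per pair.
  def apply[O](fun: (L, R) => Option[O]): Seq[O] =
    pairs.flatMap { case (l, r) => fun(l, r).toList }

  // An overload like the following does not compile next to the one above:
  // both parameters erase to Function2, so the erased signatures are identical.
  // def apply[O](fun: (L, R) => Iterable[O]): Seq[O] = ...

  // Giving the collection variant its own name avoids the clash.
  def flatApply[O](fun: (L, R) => Iterable[O]): Seq[O] =
    pairs.flatMap { case (l, r) => fun(l, r) }
}

object JoinSketchDemo {
  def main(args: Array[String]): Unit = {
    val joined = new JoinSketch(Seq((1, "a"), (2, "b")))

    // Option variant: keeps only the pair (2, "b").
    val kept = joined { (l, r) => if (l > 1) Some(s"$l$r") else None }

    // Collection variant: emits the right side l times for each pair.
    val repeated = joined.flatApply { (l, r) => List.fill(l)(r) }

    println(kept)
    println(repeated)
  }
}
```

A separate method name is just one way around the clash. The option mentioned in the mail is different: keep a single collection-returning variant and rely on the implicit conversion from Option, so that existing lambdas returning an Option keep working.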
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
> PageRank
>
> These are the examples people called dibs on:
>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
> Example from Java)
>  - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
>  - The relational Stuff
>
> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]> wrote:
> > +1 for removing RelationalQuery
> >
> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <[hidden email]>
> > wrote:
> >
> >> By the way, what was called BatchGradientDescent in the Scala examples
> >> should be replaced by a port of the LinearRegression Example from
> >> Java. I had them as two separate examples earlier.
> >>
> >> What about RelationalQuery and TPC-H Q3? Any thoughts about removing
> >> RelationalQuery?
> >>
> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]> wrote:
> >> > I added the Triangle Enumeration Examples, thanks Fabian.
> >> >
> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
> >> >
> >> > These are the examples people called dibs on:
> >> >  - PageRank (Fabian)
> >> >  - BatchGradientDescent (Márton)
> >> >  - ComputeEdgeDegrees (Hermann)
> >> >
> >> > Those are unclaimed (if I'm not mistaken):
> >> >  - The relational Stuff
> >> >  - LinearRegression
> >> >
> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[hidden email]> wrote:
> >> >> Thanks, I added it. I'll keep a running list of ported/unported
> >> >> examples in my mails. I'll rename the java example package to examples
> >> >> once the Scala API merge is done.
> >> >>
> >> >> I think the termination criterion is fine as it is. Just because Scala
> >> >> enables functional programming doesn't mean it's always the best
> >> >> choice. :D
> >> >>
> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> >> WebLogAnalysis, TransitiveClosureNaive
> >> >>
> >> >> These are the examples people called dibs on:
> >> >>  - TriangleEnumeration and PageRank (Fabian)
> >> >>  - BatchGradientDescent (Márton)
> >> >>  - ComputeEdgeDegrees (Hermann)
> >> >>
> >> >> Those are unclaimed (if I'm not mistaken):
> >> >>  - The relational Stuff
> >> >>  - LinearRegression
> >> >>
> >> >> Cheers,
> >> >> Aljoscha
> >> >>
> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]> wrote:
> >> >>> Transitive closure here, I also added a termination criterion in the Java
> >> >>> version:
> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
> >> >>>
> >> >>> Perhaps you can make the termination criterion in Scala more functional?
> >> >>>
> >> >>> I noticed that the examples package name is example.java but examples.scala
> >> >>>
> >> >>> Kostas
> >> >>>
> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]> wrote:
> >> >>>>
> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
> >> >>>>
> >> >>>> If nobody volunteers for the relational stuff I can take those as well.
> >> >>>>
> >> >>>> How about removing the "RelationalQuery" from both Scala and Java? It
> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
> >> >>>> top of TPC-H Q3?
> >> >>>>
> >> >>>> Kostas
> >> >>>>
> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
> [hidden email]
> >> >
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Thanks, I added it, along with an ITCase.
> >> >>>>>
> >> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> >>>>> WebLogAnalysis
> >> >>>>>
> >> >>>>> These are the examples people called dibs on:
> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
> >> >>>>>  - BatchGradientDescent (Márton)
> >> >>>>>  - ComputeEdgeDegrees (Hermann)
> >> >>>>>
> >> >>>>> Those are unclaimed (if I'm not mistaken):
> >> >>>>>  - TransitiveClosure
> >> >>>>>  - The relational Stuff
> >> >>>>>  - LinearRegression
> >> >>>>>
> >> >>>>> Cheers,
> >> >>>>> Aljoscha
> >> >>>>>
> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
> [hidden email]>
> >> >>>>> wrote:
> >> >>>>> > WebLog here:
> >> >>>>> >
> >> >>>>> >
> >>
> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >> >>>>> >
> >> >>>>> > Do you need any more done?
> >> >>>>> >
> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
> >> [hidden email]>
> >> >>>>> > wrote:
> >> >>>>> >
> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
> >> >>>>> >>
> >> >>>>> >> Keep 'em coming, people. :D
> >> >>>>> >>
> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
> [hidden email]
> >> >
> >> >>>>> >> wrote:
> >> >>>>> >> > Alright, will do.
> >> >>>>> >> > Thanks!
> >> >>>>> >> >
> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
> >> [hidden email]>:
> >> >>>>> >> >
> >> >>>>> >> >> Ok people, executive decision. :D
> >> >>>>> >> >>
> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing
> >> the
> >> >>>>> >> >> data
> >> >>>>> >> >> in multi-dimensional object arrays and then converting it to
> >> the
> >> >>>>> >> >> required Java or Scala objects.
> >> >>>>> >> >>
> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent
> >> with the
> >> >>>>> >> >> Java
> >> >>>>> >> >> API.
> >> >>>>> >> >>
> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
> keyword,
> >> you
> >> >>>>> >> >> can
> >> >>>>> >> >> just write:
> >> >>>>> >> >>
> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new
> >> MyResult(le,
> >> >>>>> >> >> re)
> >> >>>>> >> }
> >> >>>>> >> >>
> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
> >> [hidden email]>
> >> >>>>> >> wrote:
> >> >>>>> >> >> > Aside from the DataSet issue, I also found an
> inconsistency
> >> with
> >> >>>>> >> >> > the
> >> >>>>> >> Java
> >> >>>>> >> >> > API. In Java join is done as:
> >> >>>>> >> >> >
> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >> >>>>> >> >> >
> >> >>>>> >> >> > where in the current Scala this is:
> >> >>>>> >> >> >
> >> >>>>> >> >> > ds1.join(d2).where(...).isEqualTo(...)
> >> >>>>> >> >> >
> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method
> >> because
> >> >>>>> >> "with"
> >> >>>>> >> >> is
> >> >>>>> >> >> > a keyword in Scala. Should be offer something similar for
> >> Scala
> >> >>>>> >> >> > or go
> >> >>>>> >> >> with
> >> >>>>> >> >> > map() on Tuple2(left, right)?
> >> >>>>> >> >> >
> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]
> >:
> >> >>>>> >> >> >
> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That
> is a
> >> >>>>> >> >> >> generic
> >> >>>>> >> >> >> representation of a Tuple.
> >> >>>>> >> >> >>
> >> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala
> Tuples,
> >> >>>>> >> >> >> with a
> >> >>>>> >> >> generic
> >> >>>>> >> >> >> utility method to convert between the two.
> >> >>>>> >> >> >>
> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
> >> >>>>> >> >> >> <[hidden email]>
> >> >>>>> >> >> wrote:
> >> >>>>> >> >> >>
> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but using the
> >> >>>>> >> >> >> > CSVFormat
> >> >>>>> >> won't
> >> >>>>> >> >> >> work
> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
> >> >>>>> >> >> >> > So we would need to parse the Strings manually...
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
> >> >>>>> >> >> >> > <[hidden email]>:
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >> > > Hi,
> >> >>>>> >> >> >> > > on second thought. Maybe we should just change all
> the
> >> >>>>> >> >> >> > > example
> >> >>>>> >> input
> >> >>>>> >> >> >> > > data to strings and use CSV input formats in all the
> >> >>>>> >> >> >> > > examples.
> >> >>>>> >> What
> >> >>>>> >> >> do
> >> >>>>> >> >> >> > > you think?
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> > > Cheers,
> >> >>>>> >> >> >> > > Aljoscha
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
> >> >>>>> >> >> [hidden email]>
> >> >>>>> >> >> >> > > wrote:
> >> >>>>> >> >> >> > > > Hi,
> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are
> >> incompatible.
> >> >>>>> >> >> >> > > > I'm
> >> >>>>> >> >> afraid
> >> >>>>> >> >> >> > > > you have to to what you proposed: move the data to
> a
> >> >>>>> >> >> >> > > > static
> >> >>>>> >> field
> >> >>>>> >> >> and
> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in
> >> Scala.
> >> >>>>> >> >> >> > > > It's
> >> >>>>> >> >> not
> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data and
> make it
> >> >>>>> >> >> >> > > > easier
> >> >>>>> >> for
> >> >>>>> >> >> it
> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
> >> >>>>> >> >> >> > > >
> >> >>>>> >> >> >> > > > What do the others think? This will probably occur
> in
> >> all
> >> >>>>> >> >> >> > > > the
> >> >>>>> >> >> >> examples.
> >> >>>>> >> >> >> > > >
> >> >>>>> >> >> >> > > > Cheers,
> >> >>>>> >> >> >> > > > Aljoscha
> >> >>>>> >> >> >> > > >
> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
> >> >>>>> >> >> >> > > >> Hey,
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> I have ported the Connected Components example,
> but
> >> I am
> >> >>>>> >> >> >> > > >> not
> >> >>>>> >> sure
> >> >>>>> >> >> >> how
> >> >>>>> >> >> >> > to
> >> >>>>> >> >> >> > > >> reuse the example input data from java-examples.
> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices
> >> and
> >> >>>>> >> >> >> > > >> edges
> >> >>>>> >> data
> >> >>>>> >> >> >> are
> >> >>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> >> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment
> as
> >> >>>>> >> parameter.
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> One way is to provide public static fields (like
> in
> >> the
> >> >>>>> >> >> >> WordCountData
> >> >>>>> >> >> >> > > >> class), but this introduces a conversion
> >> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to
> Scala
> >> >>>>> >> >> >> > > >> tuple and
> >> >>>>> >> >> from
> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is
> an
> >> >>>>> >> unnecessary
> >> >>>>> >> >> >> > > complexity
> >> >>>>> >> >> >> > > >> for an example (?).
> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the example
> data
> >> in
> >> >>>>> >> >> >> > > >> the
> >> >>>>> >> Scala
> >> >>>>> >> >> >> > > example.
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> Am I missing something here?
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> Thanks!
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> Cheers,
> >> >>>>> >> >> >> > > >> V.
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <
> >> >>>>> >> [hidden email]
> >> >>>>> >> >> >
> >> >>>>> >> >> >> > > wrote:
> >> >>>>> >> >> >> > > >>
> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >>
> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's
> >> pretty
> >> >>>>> >> >> >> > > >>> much a
> >> >>>>> >> >> copy
> >> >>>>> >> >> >> of
> >> >>>>> >> >> >> > > >>> the Java example with some fixups for the syntax
> and
> >> >>>>> >> >> >> > > >>> lambda
> >> >>>>> >> >> >> > functions.
> >> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples
> >> as a
> >> >>>>> >> >> dependency
> >> >>>>> >> >> >> for
> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the
> example
> >> >>>>> >> >> >> > > >>> input
> >> >>>>> >> data.
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> When you ported a program you can do a pull
> request
> >> >>>>> >> >> >> > > >>> against
> >> >>>>> >> my
> >> >>>>> >> >> repo
> >> >>>>> >> >> >> > > >>> and I will collect the examples.
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> Happy coding. :D
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <
> >> >>>>> >> >> >> [hidden email]
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> > > >>> wrote:
> >> >>>>> >> >> >> > > >>> > +1
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
> Balassi <
> >> >>>>> >> >> >> > > >>> [hidden email]>
> >> >>>>> >> >> >> > > >>> > wrote:
> >> >>>>> >> >> >> > > >>> >
> >> >>>>> >> >> >> > > >>> >> +1
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
> Tzoumas <
> >> >>>>> >> >> >> > > [hidden email]>
> >> >>>>> >> >> >> > > >>> >> wrote:
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>> >> > +1
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of going
> >> through
> >> >>>>> >> >> >> > > >>> >> > a
> >> >>>>> >> >> tutorial
> >> >>>>> >> >> >> so
> >> >>>>> >> >> >> > > this
> >> >>>>> >> >> >> > > >>> >> will
> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and the
> new
> >> API
> >> >>>>> >> >> >> > > >>> >> > :-)
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki
> >> Kalavri <
> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
> >> >>>>> >> >> >> > > >>> >> > wrote:
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement the
> >> >>>>> >> >> >> > > >>> >> > > examples!
> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > > -V.
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <
> >> >>>>> >> >> >> [hidden email]>
> >> >>>>> >> >> >> > > >>> wrote:
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
> PageRank.
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to
> the
> >> Java
> >> >>>>> >> >> examples:
> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
> parameters
> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha
> >> Krettek <
> >> >>>>> >> >> >> > > [hidden email]
> >> >>>>> >> >> >> > > >>> >:
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their
> >> >>>>> >> >> >> > > >>> >> > > > > favourite
> >> >>>>> >> >> >> examples
> >> >>>>> >> >> >> > > here.
> >> >>>>> >> >> >> > > >>> >> > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian
> >> Hueske
> >> >>>>> >> >> >> > > >>> >> > > > > <
> >> >>>>> >> >> >> > > >>> [hidden email]>
> >> >>>>> >> >> >> > > >>> >> > > > wrote:
> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented
> by
> >> >>>>> >> >> >> > > >>> >> > > > > > different
> >> >>>>> >> >> >> people
> >> >>>>> >> >> >> > > >>> proved to
> >> >>>>> >> >> >> > > >>> >> > be
> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a
> >> simple
> >> >>>>> >> >> >> > > >>> >> > > > > > first
> >> >>>>> >> >> one
> >> >>>>> >> >> >> > such
> >> >>>>> >> >> >> > > as
> >> >>>>> >> >> >> > > >>> >> > > WordCount.
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha
> >> Krettek
> >> >>>>> >> >> >> > > >>> >> > > > > > <
> >> >>>>> >> >> >> > > >>> [hidden email]
> >> >>>>> >> >> >> > > >>> >> >:
> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the
> Scala
> >> API
> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >>
> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to
> >> write
> >> >>>>> >> >> >> > > >>> >> > > > > >> the
> >> >>>>> >> tests
> >> >>>>> >> >> and
> >> >>>>> >> >> >> > > port
> >> >>>>> >> >> >> > > >>> the
> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes
> sense
> >> to
> >> >>>>> >> >> >> > > >>> >> > > > > >> let
> >> >>>>> >> other
> >> >>>>> >> >> >> > people
> >> >>>>> >> >> >> > > >>> port
> >> >>>>> >> >> >> > > >>> >> the
> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses
> >> it and
> >> >>>>> >> maybe
> >> >>>>> >> >> >> > notices
> >> >>>>> >> >> >> > > some
> >> >>>>> >> >> >> > > >>> >> > quirks
> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>>>> >> >> >> > > >>> >> > > > >
> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>>>> >> >> >> > > >>> >> > >
> >> >>>>> >> >> >> > > >>> >> >
> >> >>>>> >> >> >> > > >>> >>
> >> >>>>> >> >> >> > > >>>
> >> >>>>> >> >> >> > >
> >> >>>>> >> >> >> >
> >> >>>>> >> >> >>
> >> >>>>> >> >>
> >> >>>>> >>
> >> >>>>
> >> >>>>
> >> >>>
> >>
>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
In reply to this post by Fabian Hueske
Yes, I like that. For the ITCases I always just copied the Java ITCase.

The only examples that are missing now are LinearRegression and the
relational stuff.

On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <[hidden email]> wrote:

> I just removed the old CountEdgeDegrees example.
> That was a preprocessing step for the TriangleEnumeration, and is now part
> of the new TriangleEnumerationOpt example.
> So I guess, we don't need to port that one. As I said before, I'd prefer to
> keep Java and Scala examples in sync.
>
> Cheers, Fabian
>
> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>
>> I added the PageRank example, thanks again Fabian. :D
>>
>> Regarding the other stuff:
>>  - There is a comment in DataSet.scala about including
>> org.apache.flink.api.scala._ because of the TypeInformation.
>>  - I added generateSequence to ExecutionEnvironment.
>>  - It is possible to use Scala primitives in Arrays. I noticed it while
>> writing the tests; you probably had an older version of the code.
>>  - Yes, using List and other interfaces is not possible. This is also
>> a restriction in the Java API.
>>
>> What do you think about the interface of join and coGroup? Right now,
>> you can use either a lambda that returns an Option or a lambda with
>> a Collector. Originally I also wanted to have a lambda that
>> returns a Collection, but due to type erasure this has the same type
>> as the lambda with the Option, so I couldn't use it. There is an
>> implicit conversion from Option to a Collection, so I could change it
>> without breaking the examples we have now. What do you think?
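
The erasure clash described in the paragraph above can be sketched without any Flink dependency. This is only an illustration under assumptions: JoinSketch, joinWith, keepEqual and emitBoth are made-up names that work on plain Seqs, not the actual join/coGroup signatures. It shows why an Option overload and a Collection overload cannot coexist, and how a single Iterable-returning variant would still accept Option-returning lambdas through the standard library's implicit Option-to-Iterable conversion.

```scala
// Illustrative only: JoinSketch and joinWith are hypothetical names,
// not part of the Flink Scala API.
object JoinSketch {
  // Declaring both of these would not compile, because both function
  // parameters erase to Function2 and the methods get the same signature:
  //   def joinWith[L, R, O](l: Seq[L], r: Seq[R])(f: (L, R) => Option[O]): Seq[O]
  //   def joinWith[L, R, O](l: Seq[L], r: Seq[R])(f: (L, R) => Iterable[O]): Seq[O]

  // A single variant that takes the more general, Iterable-returning function.
  def joinWith[L, R, O](left: Seq[L], right: Seq[R])(f: (L, R) => Iterable[O]): Seq[O] =
    for (l <- left; r <- right; o <- f(l, r)) yield o

  // Written as if for the Option variant. The expected result type is
  // Iterable, so the compiler applies the Option-to-Iterable conversion
  // defined in the Option companion object.
  val keepEqual: (Int, Int) => Iterable[(Int, Int)] =
    (l, r) => if (l == r) Some((l, r)) else None

  // Returns several results per pair, which an Option cannot express.
  val emitBoth: (Int, Int) => Iterable[Int] =
    (l, r) => if (l == r) List(l, r) else Nil

  def main(args: Array[String]): Unit = {
    val left = Seq(1, 2, 3)
    val right = Seq(2, 3, 4)
    println(joinWith(left, right)(keepEqual)) // one pair per equal element
    println(joinWith(left, right)(emitBoth))  // two values per equal element
  }
}
```

With this shape, code written against an Option-returning variant keeps compiling as long as the lambda's expected type is known, which is the "change it without breaking the examples" option mentioned above.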
>>
>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
>> PageRank
>>
>> These are the examples people called dibs on:
>>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
>> Example from Java)
>>  - ComputeEdgeDegrees (Hermann)
>>
>> Those are unclaimed (if I'm not mistaken):
>>  - The relational Stuff
>>
>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]> wrote:
>> > +1 for removing RelationalQuery
>> >
>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <[hidden email]>
>> > wrote:
>> >
>> >> By the way, what was called BatchGradientDescent in the Scala examples
>> >> should be replaced by a port of the LinearRegression Example from
>> >> Java. I had them as two separate examples earlier.
>> >>
>> >> What about RelationalQuery and TPC-H Q3? Any thoughts about removing
>> >> RelationalQuery?
>> >>
>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]> wrote:
>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
>> >> >
>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>> >> >
>> >> > These are the examples people called dibs on:
>> >> >  - PageRank (Fabian)
>> >> >  - BatchGradientDescent (Márton)
>> >> >  - ComputeEdgeDegrees (Hermann)
>> >> >
>> >> > Those are unclaimed (if I'm not mistaken):
>> >> >  - The relational Stuff
>> >> >  - LinearRegression
>> >> >
>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[hidden email]> wrote:
>> >> >> Thanks, I added it. I'll keep a running list of ported/unported
>> >> >> examples in my mails. I'll rename the java example package to examples
>> >> >> once the Scala API merge is done.
>> >> >>
>> >> >> I think the termination criterion is fine as it is. Just because Scala
>> >> >> enables functional programming doesn't mean it's always the best
>> >> >> choice. :D
>> >> >>
>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >> >> WebLogAnalysis, TransitiveClosureNaive
>> >> >>
>> >> >> These are the examples people called dibs on:
>> >> >>  - TriangleEnumeration and PageRank (Fabian)
>> >> >>  - BatchGradientDescent (Márton)
>> >> >>  - ComputeEdgeDegrees (Hermann)
>> >> >>
>> >> >> Those are unclaimed (if I'm not mistaken):
>> >> >>  - The relational Stuff
>> >> >>  - LinearRegression
>> >> >>
>> >> >> Cheers,
>> >> >> Aljoscha
>> >> >>
>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]> wrote:
>> >> >>> Transitive closure here, I also added a termination criterion in the Java
>> >> >>> version:
>> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>> >> >>>
>> >> >>> Perhaps you can make the termination criterion in Scala more functional?
>> >> >>>
>> >> >>> I noticed that the examples package name is example.java but examples.scala
>> >> >>>
>> >> >>> Kostas
>> >> >>>
>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]> wrote:
>> >> >>>>
>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>> >> >>>>
>> >> >>>> If nobody volunteers for the relational stuff I can take those as well.
>> >> >>>>
>> >> >>>> How about removing the "RelationalQuery" from both Scala and Java? It
>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>> >> >>>> top of TPC-H Q3?
>> >> >>>>
>> >> >>>> Kostas
>> >> >>>>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
What about the LineRank example? We had that in Scala but never had a
Java Example.

On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <[hidden email]> wrote:

> Yes, I like that. For the ITCases I always just copied the Java ITCase.
>
> The only examples that are missing now are LinearRegression and the
> relational stuff.
>
> On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <[hidden email]> wrote:
>> I just removed the old CountEdgeDegrees example.
>> That was a preprocessing step for the TriangleEnumeration, and is now part
>> of the new TriangleEnumerationOpt example.
>> So I guess, we don't need to port that one. As I said before, I'd prefer to
>> keep Java and Scala examples in sync.
>>
>> Cheers, Fabian
>>
>> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>
>>> I added the PageRank example, thanks again Fabian. :D
>>>
>>> Regarding the other stuff:
>>>  - There is a comment in DataSet.scala about including
>>> org.apache.flink.api.scala._ because of the TypeInformation.
>>>  - I added generateSequence to ExecutionEnvironment.
>>>  - It is possible to use Scala primitives in Arrays. I noticed it while
>>> writing the tests; you probably had an older version of the code.
>>>  - Yes, using List and other interfaces is not possible. This is also
>>> a restriction in the Java API.
>>>
>>> What do you think about the interface of join and coGroup? Right now,
>>> you can use either a lambda that returns an Option or a lambda with
>>> a Collector. Originally I also wanted to have a lambda that
>>> returns a Collection, but due to type erasure this has the same type
>>> as the lambda with the Option, so I couldn't use it. There is an
>>> implicit conversion from Option to a Collection, so I could change it
>>> without breaking the examples we have now. What do you think?
>>>
>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
>>> PageRank
>>>
>>> These are the examples people called dibs on:
>>>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
>>> Example from Java)
>>>  - ComputeEdgeDegrees (Hermann)
>>>
>>> Those are unclaimed (if I'm not mistaken):
>>>  - The relational Stuff
>>>
>>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]> wrote:
>>> > +1 for removing RelationalQuery
>>> >
>>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <[hidden email]>
>>> > wrote:
>>> >
>>> >> By the way, what was called BatchGradientDescent in the Scala examples
>>> >> should be replaced by a port of the LinearRegression Example from
>>> >> Java. I had them as two separate examples earlier.
>>> >>
>>> >> What about RelationalQuery and TPC-H Q3? Any thoughts about removing
>>> >> RelationalQuery?
>>> >>
>>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]> wrote:
>>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
>>> >> >
>>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> >> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>>> >> >
>>> >> > These are the examples people called dibs on:
>>> >> >  - PageRank (Fabian)
>>> >> >  - BatchGradientDescent (Márton)
>>> >> >  - ComputeEdgeDegrees (Hermann)
>>> >> >
>>> >> > Those are unclaimed (if I'm not mistaken):
>>> >> >  - The relational Stuff
>>> >> >  - LinearRegression
>>> >> >
>>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[hidden email]> wrote:
>>> >> >> Thanks, I added it. I'll keep a running list of ported/unported
>>> >> >> examples in my mails. I'll rename the java example package to examples
>>> >> >> once the Scala API merge is done.
>>> >> >>
>>> >> >> I think the termination criterion is fine as it is. Just because Scala
>>> >> >> enables functional programming doesn't mean it's always the best
>>> >> >> choice. :D
>>> >> >>
>>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> >> >> WebLogAnalysis, TransitiveClosureNaive
>>> >> >>
>>> >> >> These are the examples people called dibs on:
>>> >> >>  - TriangleEnumeration and PageRank (Fabian)
>>> >> >>  - BatchGradientDescent (Márton)
>>> >> >>  - ComputeEdgeDegrees (Hermann)
>>> >> >>
>>> >> >> Those are unclaimed (if I'm not mistaken):
>>> >> >>  - The relational Stuff
>>> >> >>  - LinearRegression
>>> >> >>
>>> >> >> Cheers,
>>> >> >> Aljoscha
>>> >> >>
>>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[hidden email]> wrote:
>>> >> >>> Transitive closure here, I also added a termination criterion in the Java
>>> >> >>> version:
>>> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>> >> >>>
>>> >> >>> Perhaps you can make the termination criterion in Scala more functional?
>>> >> >>>
>>> >> >>> I noticed that the examples package name is example.java but examples.scala
>>> >> >>>
>>> >> >>> Kostas
>>> >> >>>
>>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[hidden email]> wrote:
>>> >> >>>>
>>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>> >> >>>>
>>> >> >>>> If nobody volunteers for the relational stuff I can take those as well.
>>> >> >>>>
>>> >> >>>> How about removing the "RelationalQuery" from both Scala and Java? It
>>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>>> >> >>>> top of TPC-H Q3?
>>> >> >>>>
>>> >> >>>> Kostas
>>> >> >>>>
>>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <[hidden email]> wrote:
>>> >> >>>>>
>>> >> >>>>> Thanks, I added it, along with an ITCase.
>>> >> >>>>>
>>> >> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> >> >>>>> WebLogAnalysis
>>> >> >>>>>
>>> >> >>>>> These are the examples people called dibs on:
>>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
>>> >> >>>>>  - BatchGradientDescent (Márton)
>>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
>>> >> >>>>>
>>> >> >>>>> Those are unclaimed (if I'm not mistaken):
>>> >> >>>>>  - TransitiveClosure
>>> >> >>>>>  - The relational Stuff
>>> >> >>>>>  - LinearRegression
>>> >> >>>>>
>>> >> >>>>> Cheers,
>>> >> >>>>> Aljoscha
>>> >> >>>>>
>>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <[hidden email]> wrote:
>>> >> >>>>> > WebLog here:
>>> >> >>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>> >> >>>>> >
>>> >> >>>>> > Do you need any more done?
>>> >> >>>>> >
>>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <[hidden email]> wrote:
>>> >> >>>>> >
>>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
>>> >> >>>>> >>
>>> >> >>>>> >> Keep 'em coming, people. :D
>>> >> >>>>> >>
>>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <[hidden email]> wrote:
>>> >> >>>>> >> > Alright, will do.
>>> >> >>>>> >> > Thanks!
>>> >> >>>>> >> >
>>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>> >> >>>>> >> >
>>> >> >>>>> >> >> Ok people, executive decision. :D
>>> >> >>>>> >> >>
>>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing
>>> >> the
>>> >> >>>>> >> >> data
>>> >> >>>>> >> >> in multi-dimensional object arrays and then converting it to
>>> >> the
>>> >> >>>>> >> >> required Java or Scala objects.
>>> >> >>>>> >> >>
>>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent
>>> >> with the
>>> >> >>>>> >> >> Java
>>> >> >>>>> >> >> API.
>>> >> >>>>> >> >>
>>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
>>> keyword,
>>> >> you
>>> >> >>>>> >> >> can
>>> >> >>>>> >> >> just write:
>>> >> >>>>> >> >>
>>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>> >> >>>>> >> >>
>>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
>>> >> [hidden email]>
>>> >> >>>>> >> wrote:
>>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an
>>> inconsistency
>>> >> with
>>> >> >>>>> >> >> > the
>>> >> >>>>> >> Java
>>> >> >>>>> >> >> > API. In Java join is done as:
>>> >> >>>>> >> >> >
>>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>> >> >>>>> >> >> >
>>> >> >>>>> >> >> > where in the current Scala this is:
>>> >> >>>>> >> >> >
>>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>> >> >>>>> >> >> >
>>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method
>>> >> because
>>> >> >>>>> >> "with"
>>> >> >>>>> >> >> is
>>> >> >>>>> >> >> > a keyword in Scala. Should we offer something similar for
>>> >> Scala
>>> >> >>>>> >> >> > or go
>>> >> >>>>> >> >> with
>>> >> >>>>> >> >> > map() on Tuple2(left, right)?
>>> >> >>>>> >> >> >
>>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]
>>> >:
>>> >> >>>>> >> >> >
>>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That
>>> is a
>>> >> >>>>> >> >> >> generic
>>> >> >>>>> >> >> >> representation of a Tuple.
>>> >> >>>>> >> >> >>
>>> >> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala
>>> Tuples,
>>> >> >>>>> >> >> >> with a
>>> >> >>>>> >> >> generic
>>> >> >>>>> >> >> >> utility method to convert between the two.
>>> >> >>>>> >> >> >>
>>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>>> >> >>>>> >> >> >> <[hidden email]>
>>> >> >>>>> >> >> wrote:
>>> >> >>>>> >> >> >>
>>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
>>> >> >>>>> >> >> >> >
>>> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but using the
>>> >> >>>>> >> >> >> > CSVFormat
>>> >> >>>>> >> won't
>>> >> >>>>> >> >> >> work
>>> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
>>> >> >>>>> >> >> >> > So we would need to parse the Strings manually...
>>> >> >>>>> >> >> >> >
>>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>>> >> >>>>> >> >> >> > <[hidden email]>:
>>> >> >>>>> >> >> >> >
>>> >> >>>>> >> >> >> > > Hi,
>>> >> >>>>> >> >> >> > > on second thought. Maybe we should just change all
>>> the
>>> >> >>>>> >> >> >> > > example
>>> >> >>>>> >> input
>>> >> >>>>> >> >> >> > > data to strings and use CSV input formats in all the
>>> >> >>>>> >> >> >> > > examples.
>>> >> >>>>> >> What
>>> >> >>>>> >> >> do
>>> >> >>>>> >> >> >> > > you think?
>>> >> >>>>> >> >> >> > >
>>> >> >>>>> >> >> >> > > Cheers,
>>> >> >>>>> >> >> >> > > Aljoscha
>>> >> >>>>> >> >> >> > >
>>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
>>> >> >>>>> >> >> [hidden email]>
>>> >> >>>>> >> >> >> > > wrote:
>>> >> >>>>> >> >> >> > > > Hi,
>>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are
>>> >> incompatible.
>>> >> >>>>> >> >> >> > > > I'm
>>> >> >>>>> >> >> afraid
>>> >> >>>>> >> >> >> > > > you have to do what you proposed: move the data to
>>> a
>>> >> >>>>> >> >> >> > > > static
>>> >> >>>>> >> field
>>> >> >>>>> >> >> and
>>> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in
>>> >> Scala.
>>> >> >>>>> >> >> >> > > > It's
>>> >> >>>>> >> >> not
>>> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data and
>>> make it
>>> >> >>>>> >> >> >> > > > easier
>>> >> >>>>> >> for
>>> >> >>>>> >> >> it
>>> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>>> >> >>>>> >> >> >> > > >
>>> >> >>>>> >> >> >> > > > What do the others think? This will probably occur
>>> in
>>> >> all
>>> >> >>>>> >> >> >> > > > the
>>> >> >>>>> >> >> >> examples.
>>> >> >>>>> >> >> >> > > >
>>> >> >>>>> >> >> >> > > > Cheers,
>>> >> >>>>> >> >> >> > > > Aljoscha
>>> >> >>>>> >> >> >> > > >
>>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
>>> >> >>>>> >> >> >> > > >> Hey,
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >> I have ported the Connected Components example,
>>> but
>>> >> I am
>>> >> >>>>> >> >> >> > > >> not
>>> >> >>>>> >> sure
>>> >> >>>>> >> >> >> how
>>> >> >>>>> >> >> >> > to
>>> >> >>>>> >> >> >> > > >> reuse the example input data from java-examples.
>>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices
>>> >> and
>>> >> >>>>> >> >> >> > > >> edges
>>> >> >>>>> >> data
>>> >> >>>>> >> >> >> are
>>> >> >>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>> >> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment
>>> as
>>> >> >>>>> >> parameter.
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >> One way is to provide public static fields (like
>>> in
>>> >> the
>>> >> >>>>> >> >> >> WordCountData
>>> >> >>>>> >> >> >> > > >> class), but this introduces a conversion
>>> >> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to
>>> Scala
>>> >> >>>>> >> >> >> > > >> tuple and
>>> >> >>>>> >> >> from
>>> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is
>>> an
>>> >> >>>>> >> unnecessary
>>> >> >>>>> >> >> >> > > complexity
>>> >> >>>>> >> >> >> > > >> for an example (?).
>>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the example
>>> data
>>> >> in
>>> >> >>>>> >> >> >> > > >> the
>>> >> >>>>> >> Scala
>>> >> >>>>> >> >> >> > > example.
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >> Am I missing something here?
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >> Thanks!
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >> Cheers,
>>> >> >>>>> >> >> >> > > >> V.
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <
>>> >> >>>>> >> [hidden email]
>>> >> >>>>> >> >> >
>>> >> >>>>> >> >> >> > > wrote:
>>> >> >>>>> >> >> >> > > >>
>>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
>>> >> >>>>> >> >> >> > > >>>
>>> >> >>>>> >> >>
>>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >>>>> >> >> >> > > >>>
>>> >> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's
>>> >> pretty
>>> >> >>>>> >> >> >> > > >>> much a
>>> >> >>>>> >> >> copy
>>> >> >>>>> >> >> >> of
>>> >> >>>>> >> >> >> > > >>> the Java example with some fixups for the syntax
>>> and
>>> >> >>>>> >> >> >> > > >>> lambda
>>> >> >>>>> >> >> >> > functions.
>>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples
>>> >> as a
>>> >> >>>>> >> >> dependency
>>> >> >>>>> >> >> >> for
>>> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the
>>> example
>>> >> >>>>> >> >> >> > > >>> input
>>> >> >>>>> >> data.
>>> >> >>>>> >> >> >> > > >>>
>>> >> >>>>> >> >> >> > > >>> Once you have ported a program you can do a pull
>>> request
>>> >> >>>>> >> >> >> > > >>> against
>>> >> >>>>> >> my
>>> >> >>>>> >> >> repo
>>> >> >>>>> >> >> >> > > >>> and I will collect the examples.
>>> >> >>>>> >> >> >> > > >>>
>>> >> >>>>> >> >> >> > > >>> Happy coding. :D
>>> >> >>>>> >> >> >> > > >>>
>>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <
>>> >> >>>>> >> >> >> [hidden email]
>>> >> >>>>> >> >> >> > >
>>> >> >>>>> >> >> >> > > >>> wrote:
>>> >> >>>>> >> >> >> > > >>> > +1
>>> >> >>>>> >> >> >> > > >>> >
>>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>> >> >>>>> >> >> >> > > >>> >
>>> >> >>>>> >> >> >> > > >>> >
>>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
>>> Balassi <
>>> >> >>>>> >> >> >> > > >>> [hidden email]>
>>> >> >>>>> >> >> >> > > >>> > wrote:
>>> >> >>>>> >> >> >> > > >>> >
>>> >> >>>>> >> >> >> > > >>> >> +1
>>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
>>> Tzoumas <
>>> >> >>>>> >> >> >> > > [hidden email]>
>>> >> >>>>> >> >> >> > > >>> >> wrote:
>>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>>>> >> >> >> > > >>> >> > +1
>>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of going
>>> >> through
>>> >> >>>>> >> >> >> > > >>> >> > a
>>> >> >>>>> >> >> tutorial
>>> >> >>>>> >> >> >> so
>>> >> >>>>> >> >> >> > > this
>>> >> >>>>> >> >> >> > > >>> >> will
>>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and the
>>> new
>>> >> API
>>> >> >>>>> >> >> >> > > >>> >> > :-)
>>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki
>>> >> Kalavri <
>>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
>>> >> >>>>> >> >> >> > > >>> >> > wrote:
>>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement the
>>> >> >>>>> >> >> >> > > >>> >> > > examples!
>>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>>>> >> >> >> > > >>> >> > > -V.
>>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <
>>> >> >>>>> >> >> >> [hidden email]>
>>> >> >>>>> >> >> >> > > >>> wrote:
>>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
>>> PageRank.
>>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to
>>> the
>>> >> Java
>>> >> >>>>> >> >> examples:
>>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
>>> parameters
>>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha
>>> >> Krettek <
>>> >> >>>>> >> >> >> > > [hidden email]
>>> >> >>>>> >> >> >> > > >>> >:
>>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their
>>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
>>> >> >>>>> >> >> >> examples
>>> >> >>>>> >> >> >> > > here.
>>> >> >>>>> >> >> >> > > >>> >> > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian
>>> >> Hueske
>>> >> >>>>> >> >> >> > > >>> >> > > > > <
>>> >> >>>>> >> >> >> > > >>> [hidden email]>
>>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
>>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
>>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented
>>> by
>>> >> >>>>> >> >> >> > > >>> >> > > > > > different
>>> >> >>>>> >> >> >> people
>>> >> >>>>> >> >> >> > > >>> proved to
>>> >> >>>>> >> >> >> > > >>> >> > be
>>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a
>>> >> simple
>>> >> >>>>> >> >> >> > > >>> >> > > > > > first
>>> >> >>>>> >> >> one
>>> >> >>>>> >> >> >> > such
>>> >> >>>>> >> >> >> > > as
>>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
>>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
>>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha
>>> >> Krettek
>>> >> >>>>> >> >> >> > > >>> >> > > > > > <
>>> >> >>>>> >> >> >> > > >>> [hidden email]
>>> >> >>>>> >> >> >> > > >>> >> >:
>>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the
>>> Scala
>>> >> API
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
>>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>>>> >> >> >>
>>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to
>>> >> write
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
>>> >> >>>>> >> tests
>>> >> >>>>> >> >> and
>>> >> >>>>> >> >> >> > > port
>>> >> >>>>> >> >> >> > > >>> the
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes
>>> sense
>>> >> to
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
>>> >> >>>>> >> other
>>> >> >>>>> >> >> >> > people
>>> >> >>>>> >> >> >> > > >>> port
>>> >> >>>>> >> >> >> > > >>> >> the
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses
>>> >> it and
>>> >> >>>>> >> maybe
>>> >> >>>>> >> >> >> > notices
>>> >> >>>>> >> >> >> > > some
>>> >> >>>>> >> >> >> > > >>> >> > quirks
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
>>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>>>> >> >> >> > > >>> >> > > > >
>>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>>>> >> >> >> > > >>>
>>> >> >>>>> >> >> >> > >
>>> >> >>>>> >> >> >> >
>>> >> >>>>> >> >> >>
>>> >> >>>>> >> >>
>>> >> >>>>> >>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >>
>>>

Re: Scala API rewrite almost complete

Fabian Hueske
I haven't looked at the LineRank example in detail, but if you think that
it adds something new to the examples collection, we can certainly port it
to Java as well.
I think the Option and Collector return types are sufficient right now but
if Collections are easy to add, go for it. ;-)

Great that the Scala primitives are working! Also thanks for adding
generateSequence and adapting my examples.
Btw, does the code style not apply to Scala files, or do we have a different
one there?
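
On the Option vs. Collection point from the mail quoted below, here is a minimal sketch of the erasure clash, using plain Scala collections instead of the real DataSet API. The names JoinLambdaSketch, joinWithOption, joinWithIterable and optionAsIterable are made up for illustration and are not part of Flink.

```scala
object JoinLambdaSketch {

  // Hypothetical helper, not Flink API: the join function returns an Option,
  // and None simply drops the matched pair.
  def joinWithOption[L, R, O](pairs: Seq[(L, R)])(fun: (L, R) => Option[O]): Seq[O] =
    pairs.flatMap { case (l, r) => fun(l, r).toList }

  // Hypothetical helper: the join function may return any number of results
  // per matched pair.
  def joinWithIterable[L, R, O](pairs: Seq[(L, R)])(fun: (L, R) => Iterable[O]): Seq[O] =
    pairs.flatMap { case (l, r) => fun(l, r) }

  // If both helpers had the same name, the compiler would reject them as a
  // double definition: after erasure both take (Seq, Function2).

  // The standard library provides an implicit Option-to-Iterable conversion
  // (Option.option2Iterable), so an Option can be used where an Iterable is
  // expected.
  def optionAsIterable[O](opt: Option[O]): Iterable[O] = opt

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, "a"), (2, "b"), (3, "c"))

    // Keep only pairs with an odd key, repeating the string key-many times.
    val viaOption = joinWithOption(pairs) { (n: Int, s: String) =>
      if (n % 2 == 1) Some(s * n) else None
    }

    // Emit the string once per unit of the key.
    val viaIterable = joinWithIterable(pairs) { (n: Int, s: String) =>
      Seq.fill(n)(s)
    }

    println(viaOption)
    println(viaIterable)
  }
}
```

So a single Iterable-returning variant could in principle replace the Option one, which is how I read the suggestion. Whether lambdas written against Option keep compiling unchanged in the real API depends on how the types are inferred there, so I would check that against the ported examples first.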

2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <[hidden email]>:

> What about the LineRank example? We had that in Scala but never had a
> Java Example.
>
> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <[hidden email]>
> wrote:
> > Yes, I like that. For the ITCases I always just copied the Java ITCase.
> >
> > The only examples that are missing now are LinearRegression and the
> > relational stuff.
> >
> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <[hidden email]>
> wrote:
> >> I just removed the old CountEdgeDegrees example.
> >> That was a preprocessing step for the TriangleEnumeration, and is now
> part
> >> of the new TriangleEnumerationOpt example.
> >> So I guess, we don't need to port that one. As I said before, I'd
> prefer to
> >> keep Java and Scala examples in sync.
> >>
> >> Cheers, Fabian
> >>
> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:
> >>
> >>> I added the PageRank example, thanks again Fabian. :D
> >>>
> >>> Regarding the other stuff:
> >>>  - There is a comment in DataSet.scala about including
> >>> org.apache.flink.api.scala._ because of the TypeInformation.
> >>>  - I added generateSequence to ExecutionEnvironment.
> >>>  - It is possible to use Scala Primitives in Array, I noticed it while
> >>> writing the tests, you probably had an older version of the code.
> >>>  - Yes, using List and other Interfaces is not possible, this is also
> >>> a restriction in the Java API.
> >>>
> >>> What do you think about the interface of join and coGroup? Right now,
> >>> you can either use a lambda that returns an Option or the lambda with
> >>> the Collector. Originally I also wanted to have a lambda that
> >>> returns a Collection, but due to type erasure this has the same type
> >>> as the lambda with the Option so I couldn't use it. There is an
> >>> implicit conversion from Option to a Collection, so I could change it
> >>> without breaking the examples we have now. What do you think?
> >>>
> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
> >>> PageRank
> >>>
> >>> These are the examples people called dibs on:
> >>>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
> >>> Example from Java)
> >>>  - ComputeEdgeDegrees (Hermann)
> >>>
> >>> Those are unclaimed (if I'm not mistaken):
> >>>  - The relational Stuff
> >>>
> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]>
> wrote:
> >>> > +1 for removing RelationQuery
> >>> >
> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
> [hidden email]>
> >>> > wrote:
> >>> >
> >>> >> By the way, what was called BatchGradientDescent in the Scala
> examples
> >>> >> should be replaced by a port of the LinearRegression Example from
> >>> >> Java. I had them as two separate examples earlier.
> >>> >>
> >>> >> What about RelationalQuery and TPC-H-Q3. Any thoughts about removing
> >>> >> RelationalQuery?
> >>> >>
> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
> [hidden email]
> >>> >
> >>> >> wrote:
> >>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
> >>> >> >
> >>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>> >> > WebLogAnalysis, TransitiveClosureNaive,
> TriangleEnumerationNaive/Opt
> >>> >> >
> >>> >> > These are the examples people called dibs on:
> >>> >> >  - PageRank (Fabian)
> >>> >> >  - BatchGradientDescent (Márton)
> >>> >> >  - ComputeEdgeDegrees (Hermann)
> >>> >> >
> >>> >> > Those are unclaimed (if I'm not mistaken):
> >>> >> >  - The relational Stuff
> >>> >> >  - LinearRegression
> >>> >> >
> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
> >>> [hidden email]>
> >>> >> wrote:
> >>> >> >> Thanks, I added it. I'll keep a running list of ported/unported
> >>> >> >> examples in my mails. I'll rename the java example package to
> >>> examples
> >>> >> >> once the Scala API merge is done.
> >>> >> >>
> >>> >> >> I think the termination criterion is fine as it is. Just because
> >>> Scala
> >>> >> >> enables functional programming doesn't mean it's always the best
> >>> >> >> choice. :D
> >>> >> >>
> >>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
> >>> >> >>
> >>> >> >> These are the examples people called dibs on:
> >>> >> >>  - TriangleEnumeration and PageRank (Fabian)
> >>> >> >>  - BatchGradientDescent (Márton)
> >>> >> >>  - ComputeEdgeDegrees (Hermann)
> >>> >> >>
> >>> >> >> Those are unclaimed (if I'm not mistaken):
> >>> >> >>  - The relational Stuff
> >>> >> >>  - LinearRegression
> >>> >> >>
> >>> >> >> Cheers,
> >>> >> >> Aljoscha
> >>> >> >>
> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
> [hidden email]
> >>> >
> >>> >> wrote:
> >>> >> >>> Transitive closure here, I also added a termination criterion
> in the
> >>> >> Java
> >>> >> >>> version:
> >>> >> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
> >>> >> >>>
> >>> >> >>> Perhaps you can make the termination criterion in Scala more
> >>> >> functional?
> >>> >> >>>
> >>> >> >>> I noticed that the examples package name is example.java but
> >>> >> examples.scala
> >>> >> >>>
> >>> >> >>> Kostas
> >>> >> >>>
> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
> [hidden email]
> >>> >
> >>> >> wrote:
> >>> >> >>>>
> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on your
> >>> list).
> >>> >> >>>>
> >>> >> >>>> If nobody volunteers for the relational stuff I can take those
> as
> >>> >> well.
> >>> >> >>>>
> >>> >> >>>> How about removing the "RelationalQuery" from both Scala and
> Java?
> >>> It
> >>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some
> teaching
> >>> >> value on
> >>> >> >>>> top of TPC-H Q3?
> >>> >> >>>>
> >>> >> >>>> Kostas
> >>> >> >>>>
> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
> >>> [hidden email]
> >>> >> >
> >>> >> >>>> wrote:
> >>> >> >>>>>
> >>> >> >>>>> Thanks, I added it, along with an ITCase.
> >>> >> >>>>>
> >>> >> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>> >> >>>>> WebLogAnalysis
> >>> >> >>>>>
> >>> >> >>>>> These are the examples people called dibs on:
> >>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
> >>> >> >>>>>  - BatchGradientDescent (Márton)
> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
> >>> >> >>>>>
> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
> >>> >> >>>>>  - TransitiveClosure
> >>> >> >>>>>  - The relational Stuff
> >>> >> >>>>>  - LinearRegression
> >>> >> >>>>>
> >>> >> >>>>> Cheers,
> >>> >> >>>>> Aljoscha
> >>> >> >>>>>
> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
> >>> [hidden email]>
> >>> >> >>>>> wrote:
> >>> >> >>>>> > WebLog here:
> >>> >> >>>>> >
> >>> >> >>>>> >
> >>> >>
> >>>
> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >>> >> >>>>> >
> >>> >> >>>>> > Do you need any more done?
> >>> >> >>>>> >
> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
> >>> >> [hidden email]>
> >>> >> >>>>> > wrote:
> >>> >> >>>>> >
> >>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
> >>> >> >>>>> >>
> >>> >> >>>>> >> Keep 'em coming, people. :D
> >>> >> >>>>> >>
> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
> >>> [hidden email]
> >>> >> >
> >>> >> >>>>> >> wrote:
> >>> >> >>>>> >> > Alright, will do.
> >>> >> >>>>> >> > Thanks!
> >>> >> >>>>> >> >
> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
> >>> >> [hidden email]>:
> >>> >> >>>>> >> >
> >>> >> >>>>> >> >> Ok people, executive decision. :D
> >>> >> >>>>> >> >>
> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm
> storing
> >>> >> the
> >>> >> >>>>> >> >> data
> >>> >> >>>>> >> >> in multi-dimensional object arrays and then converting
> it to
> >>> >> the
> >>> >> >>>>> >> >> required Java or Scala objects.
> >>> >> >>>>> >> >>
> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it
> consistent
> >>> >> with the
> >>> >> >>>>> >> >> Java
> >>> >> >>>>> >> >> API.
> >>> >> >>>>> >> >>
> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
> >>> keyword,
> >>> >> you
> >>> >> >>>>> >> >> can
> >>> >> >>>>> >> >> just write:
> >>> >> >>>>> >> >>
> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
> >>> >> >>>>> >> >>
> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
> >>> >> [hidden email]>
> >>> >> >>>>> >> wrote:
> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an
> >>> inconsistency
> >>> >> with
> >>> >> >>>>> >> >> > the
> >>> >> >>>>> >> Java
> >>> >> >>>>> >> >> > API. In Java join is done as:
> >>> >> >>>>> >> >> >
> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >>> >> >>>>> >> >> >
> >>> >> >>>>> >> >> > where in the current Scala this is:
> >>> >> >>>>> >> >> >
> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
> >>> >> >>>>> >> >> >
> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with()
> method
> >>> >> because
> >>> >> >>>>> >> "with"
> >>> >> >>>>> >> >> is
> >>> >> >>>>> >> >> > a keyword in Scala. Should we offer something similar
> for
> >>> >> Scala
> >>> >> >>>>> >> >> > or go
> >>> >> >>>>> >> >> with
> >>> >> >>>>> >> >> > map() on Tuple2(left, right)?
> >>> >> >>>>> >> >> >
> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <
> [hidden email]
> >>> >:
> >>> >> >>>>> >> >> >
> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well.
> That
> >>> is a
> >>> >> >>>>> >> >> >> generic
> >>> >> >>>>> >> >> >> representation of a Tuple.
> >>> >> >>>>> >> >> >>
> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala
> >>> Tuples,
> >>> >> >>>>> >> >> >> with a
> >>> >> >>>>> >> >> generic
> >>> >> >>>>> >> >> >> utility method to convert between the two.
> >>> >> >>>>> >> >> >>
> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
> >>> >> >>>>> >> >> >> <[hidden email]>
> >>> >> >>>>> >> >> wrote:
> >>> >> >>>>> >> >> >>
> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
> >>> >> >>>>> >> >> >> >
> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but using
> the
> >>> >> >>>>> >> >> >> > CSVFormat
> >>> >> >>>>> >> won't
> >>> >> >>>>> >> >> >> work
> >>> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
> >>> >> >>>>> >> >> >> > So we would need to parse the Strings manually...
> >>> >> >>>>> >> >> >> >
> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
> >>> >> >>>>> >> >> >> > <[hidden email]>:
> >>> >> >>>>> >> >> >> >
> >>> >> >>>>> >> >> >> > > Hi,
> >>> >> >>>>> >> >> >> > > on second thought. Maybe we should just change
> all
> >>> the
> >>> >> >>>>> >> >> >> > > example
> >>> >> >>>>> >> input
> >>> >> >>>>> >> >> >> > > data to strings and use CSV input formats in all
> the
> >>> >> >>>>> >> >> >> > > examples.
> >>> >> >>>>> >> What
> >>> >> >>>>> >> >> do
> >>> >> >>>>> >> >> >> > > you think?
> >>> >> >>>>> >> >> >> > >
> >>> >> >>>>> >> >> >> > > Cheers,
> >>> >> >>>>> >> >> >> > > Aljoscha
> >>> >> >>>>> >> >> >> > >
> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek
> <
> >>> >> >>>>> >> >> [hidden email]>
> >>> >> >>>>> >> >> >> > > wrote:
> >>> >> >>>>> >> >> >> > > > Hi,
> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are
> >>> >> incompatible.
> >>> >> >>>>> >> >> >> > > > I'm
> >>> >> >>>>> >> >> afraid
> >>> >> >>>>> >> >> >> > > > you have to do what you proposed: move the
> data to
> >>> a
> >>> >> >>>>> >> >> >> > > > static
> >>> >> >>>>> >> field
> >>> >> >>>>> >> >> and
> >>> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet()
> method in
> >>> >> Scala.
> >>> >> >>>>> >> >> >> > > > It's
> >>> >> >>>>> >> >> not
> >>> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data and
> >>> make it
> >>> >> >>>>> >> >> >> > > > easier
> >>> >> >>>>> >> for
> >>> >> >>>>> >> >> it
> >>> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala
> versions.
> >>> >> >>>>> >> >> >> > > >
> >>> >> >>>>> >> >> >> > > > What do the others think? This will probably
> occur
> >>> in
> >>> >> all
> >>> >> >>>>> >> >> >> > > > the
> >>> >> >>>>> >> >> >> examples.
> >>> >> >>>>> >> >> >> > > >
> >>> >> >>>>> >> >> >> > > > Cheers,
> >>> >> >>>>> >> >> >> > > > Aljoscha
> >>> >> >>>>> >> >> >> > > >
> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki
> Kalavri
> >>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
> >>> >> >>>>> >> >> >> > > >> Hey,
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >> I have ported the Connected Components
> example,
> >>> but
> >>> >> I am
> >>> >> >>>>> >> >> >> > > >> not
> >>> >> >>>>> >> sure
> >>> >> >>>>> >> >> >> how
> >>> >> >>>>> >> >> >> > to
> >>> >> >>>>> >> >> >> > > >> reuse the example input data from
> java-examples.
> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the
> vertices
> >>> >> and
> >>> >> >>>>> >> >> >> > > >> edges
> >>> >> >>>>> >> data
> >>> >> >>>>> >> >> >> are
> >>> >> >>>>> >> >> >> > > >> produced by the methods
> getDefaultVertexDataSet()
> >>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> >>> >> >>>>> >> >> >> > > >> an
> org.apache.flink.api.java.ExecutionEnvironment
> >>> as
> >>> >> >>>>> >> parameter.
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >> One way is to provide public static fields
> (like
> >>> in
> >>> >> the
> >>> >> >>>>> >> >> >> WordCountData
> >>> >> >>>>> >> >> >> > > >> class), but this introduces a conversion
> >>> >> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to
> >>> Scala
> >>> >> >>>>> >> >> >> > > >> tuple and
> >>> >> >>>>> >> >> from
> >>> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this
> is
> >>> an
> >>> >> >>>>> >> unnecessary
> >>> >> >>>>> >> >> >> > > complexity
> >>> >> >>>>> >> >> >> > > >> for an example (?).
> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the example
> >>> data
> >>> >> in
> >>> >> >>>>> >> >> >> > > >> the
> >>> >> >>>>> >> Scala
> >>> >> >>>>> >> >> >> > > example.
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >> Thanks!
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >> Cheers,
> >>> >> >>>>> >> >> >> > > >> V.
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <
> >>> >> >>>>> >> [hidden email]
> >>> >> >>>>> >> >> >
> >>> >> >>>>> >> >> >> > > wrote:
> >>> >> >>>>> >> >> >> > > >>
> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
> >>> >> >>>>> >> >> >> > > >>>
> >>> >> >>>>> >> >>
> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >>> >> >>>>> >> >> >> > > >>>
> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount example.
> It's
> >>> >> pretty
> >>> >> >>>>> >> >> >> > > >>> much a
> >>> >> >>>>> >> >> copy
> >>> >> >>>>> >> >> >> of
> >>> >> >>>>> >> >> >> > > >>> the Java example with some fixups for the
> syntax
> >>> and
> >>> >> >>>>> >> >> >> > > >>> lambda
> >>> >> >>>>> >> >> >> > functions.
> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the
> java-examples
> >>> >> as a
> >>> >> >>>>> >> >> dependency
> >>> >> >>>>> >> >> >> for
> >>> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the
> >>> example
> >>> >> >>>>> >> >> >> > > >>> input
> >>> >> >>>>> >> data.
> >>> >> >>>>> >> >> >> > > >>>
> >>> >> >>>>> >> >> >> > > >>> Once you have ported a program you can do a pull
> >>> request
> >>> >> >>>>> >> >> >> > > >>> against
> >>> >> >>>>> >> my
> >>> >> >>>>> >> >> repo
> >>> >> >>>>> >> >> >> > > >>> and I will collect the examples.
> >>> >> >>>>> >> >> >> > > >>>
> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
> >>> >> >>>>> >> >> >> > > >>>
> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann
> Gábor <
> >>> >> >>>>> >> >> >> [hidden email]
> >>> >> >>>>> >> >> >> > >
> >>> >> >>>>> >> >> >> > > >>> wrote:
> >>> >> >>>>> >> >> >> > > >>> > +1
> >>> >> >>>>> >> >> >> > > >>> >
> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >>> >> >>>>> >> >> >> > > >>> >
> >>> >> >>>>> >> >> >> > > >>> >
> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
> >>> Balassi <
> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> >>> >> >>>>> >> >> >> > > >>> > wrote:
> >>> >> >>>>> >> >> >> > > >>> >
> >>> >> >>>>> >> >> >> > > >>> >> +1
> >>> >> >>>>> >> >> >> > > >>> >>
> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >>> >> >>>>> >> >> >> > > >>> >>
> >>> >> >>>>> >> >> >> > > >>> >>
> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
> >>> Tzoumas <
> >>> >> >>>>> >> >> >> > > [hidden email]>
> >>> >> >>>>> >> >> >> > > >>> >> wrote:
> >>> >> >>>>> >> >> >> > > >>> >>
> >>> >> >>>>> >> >> >> > > >>> >> > +1
> >>> >> >>>>> >> >> >> > > >>> >> >
> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >>> >> >>>>> >> >> >> > > >>> >> >
> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of
> going
> >>> >> through
> >>> >> >>>>> >> >> >> > > >>> >> > a
> >>> >> >>>>> >> >> tutorial
> >>> >> >>>>> >> >> >> so
> >>> >> >>>>> >> >> >> > > this
> >>> >> >>>>> >> >> >> > > >>> >> will
> >>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and
> the
> >>> new
> >>> >> API
> >>> >> >>>>> >> >> >> > > >>> >> > :-)
> >>> >> >>>>> >> >> >> > > >>> >> >
> >>> >> >>>>> >> >> >> > > >>> >> >
> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki
> >>> >> Kalavri <
> >>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
> >>> >> >>>>> >> >> >> > > >>> >> >
> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement
> the
> >>> >> >>>>> >> >> >> > > >>> >> > > examples!
> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for
> me :)
> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian
> Hueske <
> >>> >> >>>>> >> >> >> [hidden email]>
> >>> >> >>>>> >> >> >> > > >>> wrote:
> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
> >>> PageRank.
> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar
> to
> >>> the
> >>> >> Java
> >>> >> >>>>> >> >> examples:
> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
> >>> parameters
> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha
> >>> >> Krettek <
> >>> >> >>>>> >> >> >> > > [hidden email]
> >>> >> >>>>> >> >> >> > > >>> >:
> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve
> their
> >>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
> >>> >> >>>>> >> >> >> examples
> >>> >> >>>>> >> >> >> > > here.
> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM,
> Fabian
> >>> >> Hueske
> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples
> implemented
> >>> by
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > different
> >>> >> >>>>> >> >> >> people
> >>> >> >>>>> >> >> >> > > >>> proved to
> >>> >> >>>>> >> >> >> > > >>> >> > be
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three
> examples.
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd
> port a
> >>> >> simple
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > first
> >>> >> >>>>> >> >> one
> >>> >> >>>>> >> >> >> > such
> >>> >> >>>>> >> >> >> > > as
> >>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00
> Aljoscha
> >>> >> Krettek
> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
> >>> >> >>>>> >> >> >> > > >>> [hidden email]
> >>> >> >>>>> >> >> >> > > >>> >> >:
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the
> >>> Scala
> >>> >> API
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>> >> >>>>> >> >> >> > > >>> >>
> >>> >> >>>>> >> >> >>
> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have
> to
> >>> >> write
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
> >>> >> >>>>> >> tests
> >>> >> >>>>> >> >> and
> >>> >> >>>>> >> >> >> > > port
> >>> >> >>>>> >> >> >> > > >>> the
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes
> >>> sense
> >>> >> to
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
> >>> >> >>>>> >> other
> >>> >> >>>>> >> >> >> > people
> >>> >> >>>>> >> >> >> > > >>> port
> >>> >> >>>>> >> >> >> > > >>> >> the
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else
> uses
> >>> >> it and
> >>> >> >>>>> >> maybe
> >>> >> >>>>> >> >> >> > notices
> >>> >> >>>>> >> >> >> > > some
> >>> >> >>>>> >> >> >> > > >>> >> > quirks
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>> >> >>>>> >> >> >> > > >>> >> >
> >>> >> >>>>> >> >> >> > > >>> >>
> >>> >> >>>>> >> >> >> > > >>>
> >>> >> >>>>> >> >> >> > >
> >>> >> >>>>> >> >> >> >
> >>> >> >>>>> >> >> >>
> >>> >> >>>>> >> >>
> >>> >> >>>>> >>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>
> >>> >>
> >>>
>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
I didn't look at the example either.

Adding Collections is easy; it's just that we can have either
Collections or the Option as the return type, not both.
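
To illustrate the "not both" point, here is a minimal sketch in plain Scala. It does not use the Flink API: JoinSketch, joinWith and sumIfEqual are hypothetical names made up for this example. It assumes the clash comes from both lambda types erasing to Function2, and it relies on the standard library's implicit view from Option to Iterable.

```scala
// Sketch only: JoinSketch, joinWith and sumIfEqual are hypothetical,
// not part of the Flink Scala API.
object JoinSketch {
  // Declaring both of these overloads side by side does not compile:
  // after erasure each one takes (Seq, Function2), so scalac reports
  // a double definition.
  //   def joinWith[L, R, O](pairs: Seq[(L, R)])(fun: (L, R) => Option[O]): Seq[O]
  //   def joinWith[L, R, O](pairs: Seq[(L, R)])(fun: (L, R) => Iterable[O]): Seq[O]

  // A single collection-based variant avoids the clash.
  def joinWith[L, R, O](pairs: Seq[(L, R)])(fun: (L, R) => Iterable[O]): Seq[O] =
    pairs.flatMap { case (l, r) => fun(l, r) }

  // Written in the Option style; the implicit Option-to-Iterable view
  // from the standard library adapts each branch to Iterable[Int].
  val sumIfEqual: (Int, Int) => Iterable[Int] =
    (l, r) => if (l == r) Some(l + r) else None

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, 1), (1, 2), (3, 3))
    println(joinWith(pairs)(sumIfEqual))
  }
}
```

If this holds for the real API as well, a function body that returns Some(...) or None would keep working against a Collection-based signature, which is what would allow the switch without touching the existing examples.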

For the coding style I followed this:
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide,
which itself is based on this: http://docs.scala-lang.org/style/. It
is different from the Java Code Guidelines we have in place, yes.

On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <[hidden email]> wrote:

> I haven't looked at the LineRank example in detail, but if you think that
> it adds something new to the examples collection, we can certainly port it
> to Java as well.
> I think the Option and Collector return types are sufficient right now, but
> if Collections are easy to add, go for it. ;-)
>
> Great that the Scala primitives are working! Also thanks for adding
> genSequence and adapting my examples.
> Btw., does the code style not apply to Scala files, or do we have a different
> one there?

Re: Scala API rewrite almost complete

Fabian Hueske
Hmmm, tricky question...
How about the Option for join, as this is a tuple-wise operation, and the
Collection for coGroup, which is group-wise?
Could we in that case use list comprehensions in coGroup functions?

Or is that too much mixing?
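
A rough sketch of what that split could look like, in plain Scala without any Flink dependency. CoGroupSketch, joinFun and coGroupFun are hypothetical names, and passing the two groups as Iterators is an assumption for illustration, not a confirmed signature.

```scala
// Sketch only: all names here are hypothetical, not Flink API.
object CoGroupSketch {
  // Tuple-wise (join style): each matching pair yields at most one
  // result, so Option is a natural return type.
  val joinFun: ((Int, String), (Int, Double)) => Option[(String, Double)] =
    (l, r) => if (r._2 > 0.0) Some((l._2, r._2)) else None

  // Group-wise (coGroup style): all elements that share a key are
  // visible at once, so a for-comprehension can emit any number of
  // results as a collection.
  val coGroupFun: (Iterator[(Int, String)], Iterator[(Int, Double)]) => Seq[(String, Double)] =
    (left, right) => {
      val rights = right.toList // materialise: an Iterator can be walked only once
      for {
        (_, name)  <- left.toList
        (_, score) <- rights
        if score > 0.0
      } yield (name, score)
    }

  def main(args: Array[String]): Unit = {
    println(joinFun((1, "a"), (1, 2.0)))
    println(coGroupFun(Iterator((1, "a"), (1, "b")), Iterator((1, 2.0), (1, -1.0))))
  }
}
```

The comprehension reads well for the group-wise case, while the tuple-wise case stays a one-liner; whether having two different return conventions is too much mixing is exactly the open question.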

2014-09-11 23:00 GMT+02:00 Aljoscha Krettek <[hidden email]>:

> I didn't look at the example either.
>
> Adding Collections is easy; it's just that we can have either
> Collections or the Option as the return type, not both.
>
> For the coding style I followed this:
> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide,
> which itself is based on this: http://docs.scala-lang.org/style/. It
> is different from the Java Code Guidelines we have in place, yes.
>
> On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <[hidden email]> wrote:
> > I haven't looked at the LineRank example in detail, but if you think that
> > it adds something new to the examples collection, we can certainly port it
> > to Java as well.
> > I think the Option and Collector return types are sufficient right now, but
> > if Collections are easy to add, go for it. ;-)
> >
> > Great that the Scala primitives are working! Also thanks for adding
> > genSequence and adapting my examples.
> > Btw., does the code style not apply to Scala files, or do we have a
> > different one there?
> >
> > 2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <[hidden email]>:
> >
> >> What about the LineRank example? We had that in Scala but never had a
> >> Java Example.
> >>
> >> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <[hidden email]>
> >> wrote:
> >> > Yes, I like that. For the ITCases I always just copied the Java
> ITCase.
> >> >
> >> > The only examples that are missing now are LinearRegression and the
> >> > relational stuff.
> >> >
> >> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <[hidden email]>
> >> wrote:
> >> >> I just removed the old CountEdgeDegrees example.
> >> >> That was a preprocessing step for the TriangleEnumeration, and is now
> >> part
> >> >> of the new TriangleEnumerationOpt example.
> >> >> So I guess, we don't need to port that one. As I said before, I'd
> >> prefer to
> >> >> keep Java and Scala examples in sync.
> >> >>
> >> >> Cheers, Fabian
> >> >>
> >> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:
> >> >>
> >> >>> I added the PageRank example, thanks again fabian. :D
> >> >>>
> >> >>> Regarding the other stuff:
> >> >>>  - There is a comment in DataSet.scala about including
> >> >>> org.apache.flink.api.scala._ because of the TypeInformation.
> >> >>>  - I added generateSequence to ExecutionEnvironment.
> >> >>>  - It is possible to use Scala Primitives in Array, I noticed it
> while
> >> >>> writing the tests, you probably had an older version of the code.
> >> >>>  - Yes, using List and other Interfaces is not possible, this is
> also
> >> >>> a restriction in the Java API.
> >> >>>
> >> >>> What do you think about the interface of join and coGroup? Right
> now,
> >> >>> you can either use a lambda that returns an Option or the lambda
> with
> >> >>> the Collector. Originally I wanted to have also have a lambda that
> >> >>> returns a Collection, but due to type erasure this has the same type
> >> >>> as the lambda with the Option so I couldn't use it. There is an
> >> >>> implicit conversion from Option to a Collection, so I could change
> it
> >> >>> without breaking the examples we have now. What do you think?
> >> >>>
> >> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> >>> WebLogAnalysis, TransitiveClosureNaive,
> TriangleEnumerationNaive/Opt,
> >> >>> PageRank
> >> >>>
> >> >>> These are the examples people called dibs on:
> >> >>>  - BatchGradientDescent (Márton) (Should be a port of
> LinearRegression
> >> >>> Example from Java)
> >> >>>  - ComputeEdgeDegrees (Hermann)
> >> >>>
> >> >>> Those are unclaimed (if I'm not mistaken):
> >> >>>  - The relational Stuff
> >> >>>
> >> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]>
> >> wrote:
> >> >>> > +1 for removing RelationQuery
> >> >>> >
> >> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
> >> [hidden email]>
> >> >>> > wrote:
> >> >>> >
> >> >>> >> By the way, what was called BatchGradientDescent in the Scala
> >> examples
> >> >>> >> should be replaced by a port of the LinearRegression Example from
> >> >>> >> Java. I had them as two separate examples earlier.
> >> >>> >>
> >> >>> >> What about RelationalQuery and TPC-H-Q3. Any thoughts about
> removing
> >> >>> >> RelationalQuery?
> >> >>> >>
> >> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
> >> [hidden email]
> >> >>> >
> >> >>> >> wrote:
> >> >>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
> >> >>> >> >
> >> >>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> >>> >> > WebLogAnalysis, TransitiveClosureNaive,
> >> TriangleEnumerationNaive/Opt
> >> >>> >> >
> >> >>> >> > These are the examples people called dibs on:
> >> >>> >> >  - PageRank (Fabian)
> >> >>> >> >  - BatchGradientDescent (Márton)
> >> >>> >> >  - ComputeEdgeDegrees (Hermann)
> >> >>> >> >
> >> >>> >> > Those are unclaimed (if I'm not mistaken):
> >> >>> >> >  - The relational Stuff
> >> >>> >> >  - LinearRegression
> >> >>> >> >
> >> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
> >> >>> [hidden email]>
> >> >>> >> wrote:
> >> >>> >> >> Thanks, I added it. I'll keep a running list of
> ported/unported
> >> >>> >> >> examples in my mails. I'll rename the java example package to
> >> >>> examples
> >> >>> >> >> once the Scala API merge is done.
> >> >>> >> >>
> >> >>> >> >> I think the termination criterion is fine as it is. Just
> because
> >> >>> Scala
> >> >>> >> >> enables functional programming doesn't mean it's always the
> best
> >> >>> >> >> choice. :D
> >> >>> >> >>
> >> >>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
> >> >>> >> >>
> >> >>> >> >> These are the examples people called dibs on:
> >> >>> >> >>  - TriangleEnumration and PageRank (Fabian)
> >> >>> >> >>  - BatchGradientDescent (Márton)
> >> >>> >> >>  - ComputeEdgeDegrees (Hermann)
> >> >>> >> >>
> >> >>> >> >> Those are unclaimed (if I'm not mistaken):
> >> >>> >> >>  - The relational Stuff
> >> >>> >> >>  - LinearRegression
> >> >>> >> >>
> >> >>> >> >> Cheers,
> >> >>> >> >> Aljoscha
> >> >>> >> >>
> >> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
> >> [hidden email]
> >> >>> >
> >> >>> >> wrote:
> >> >>> >> >>> Transitive closure here, I also added a termination criterion
> >> in the
> >> >>> >> Java
> >> >>> >> >>> version:
> >> >>> >>
> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
> >> >>> >> >>>
> >> >>> >> >>> Perhaps you can make the termination criterion in Scala more
> >> >>> >> functional?
> >> >>> >> >>>
> >> >>> >> >>> I noticed that the examples package name is example.java but
> >> >>> >> examples.scala
> >> >>> >> >>>
> >> >>> >> >>> Kostas
> >> >>> >> >>>
> >> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
> >> [hidden email]
> >> >>> >
> >> >>> >> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on
> your
> >> >>> list).
> >> >>> >> >>>>
> >> >>> >> >>>> If nobody volunteers for the relational stuff I can take
> those
> >> as
> >> >>> >> well.
> >> >>> >> >>>>
> >> >>> >> >>>> How about removing the "RelationalQuery" from both Scala and
> >> Java?
> >> >>> It
> >> >>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some
> >> teaching
> >> >>> >> value on
> >> >>> >> >>>> top of TPC-H Q3?
> >> >>> >> >>>>
> >> >>> >> >>>> Kostas
> >> >>> >> >>>>
> >> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
> >> >>> [hidden email]
> >> >>> >> >
> >> >>> >> >>>> wrote:
> >> >>> >> >>>>>
> >> >>> >> >>>>> Thanks, I added it, along with an ITCase.
> >> >>> >> >>>>>
> >> >>> >> >>>>> So far we have ported: WordCount, KMeans,
> ConnectedComponents,
> >> >>> >> >>>>> WebLogAnalysis
> >> >>> >> >>>>>
> >> >>> >> >>>>> These are the examples people called dibs on:
> >> >>> >> >>>>>  - TriangleEnumration and PageRank (Fabian)
> >> >>> >> >>>>>  - BatchGradientDescent (Márton)
> >> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
> >> >>> >> >>>>>
> >> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
> >> >>> >> >>>>>  - TransitiveClosure
> >> >>> >> >>>>>  - The relational Stuff
> >> >>> >> >>>>>  - LinearRegression
> >> >>> >> >>>>>
> >> >>> >> >>>>> Cheers,
> >> >>> >> >>>>> Aljoscha
> >> >>> >> >>>>>
> >> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
> >> >>> [hidden email]>
> >> >>> >> >>>>> wrote:
> >> >>> >> >>>>> > WebLog here:
> >> >>> >> >>>>> >
> >> >>> >> >>>>> >
> >> >>> >>
> >> >>>
> >>
> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >> >>> >> >>>>> >
> >> >>> >> >>>>> > Do you need any more done?
> >> >>> >> >>>>> >
> >> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
> >> >>> >> [hidden email]>
> >> >>> >> >>>>> > wrote:
> >> >>> >> >>>>> >
> >> >>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
> >> >>> >> >>>>> >>
> >> >>> >> >>>>> >> Keep 'em coming, people. :D
> >> >>> >> >>>>> >>
> >> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
> >> >>> [hidden email]
> >> >>> >> >
> >> >>> >> >>>>> >> wrote:
> >> >>> >> >>>>> >> > Alright, will do.
> >> >>> >> >>>>> >> > Thanks!
> >> >>> >> >>>>> >> >
> >> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
> >> >>> >> [hidden email]>:
> >> >>> >> >>>>> >> >
> >> >>> >> >>>>> >> >> Ok people, executive decision. :D
> >> >>> >> >>>>> >> >>
> >> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm
> >> storing
> >> >>> >> the
> >> >>> >> >>>>> >> >> data
> >> >>> >> >>>>> >> >> in multi-dimensional object arrays and then
> converting
> >> it to
> >> >>> >> the
> >> >>> >> >>>>> >> >> required Java or Scala objects.
> >> >>> >> >>>>> >> >>
> >> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it
> >> consistent
> >> >>> >> with the
> >> >>> >> >>>>> >> >> Java
> >> >>> >> >>>>> >> >> API.
> >> >>> >> >>>>> >> >>
> >> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
> >> >>> keyword,
> >> >>> >> you
> >> >>> >> >>>>> >> >> can
> >> >>> >> >>>>> >> >> just write:
> >> >>> >> >>>>> >> >>
> >> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) =>
> new
> >> >>> >> MyResult(le,
> >> >>> >> >>>>> >> >> re)
> >> >>> >> >>>>> >> }
> >> >>> >> >>>>> >> >>
> >> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
> >> >>> >> [hidden email]>
> >> >>> >> >>>>> >> wrote:
> >> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an
> >> >>> inconsistency
> >> >>> >> with
> >> >>> >> >>>>> >> >> > the
> >> >>> >> >>>>> >> Java
> >> >>> >> >>>>> >> >> > API. In Java join is done as:
> >> >>> >> >>>>> >> >> >
> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >> >>> >> >>>>> >> >> >
> >> >>> >> >>>>> >> >> > where in the current Scala this is:
> >> >>> >> >>>>> >> >> >
> >> >>> >> >>>>> >> >> > ds1.join(d2).where(...).isEqualTo(...)
> >> >>> >> >>>>> >> >> >
> >> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with()
> >> method
> >> >>> >> because
> >> >>> >> >>>>> >> "with"
> >> >>> >> >>>>> >> >> is
> >> >>> >> >>>>> >> >> > a keyword in Scala. Should be offer something
> similar
> >> for
> >> >>> >> Scala
> >> >>> >> >>>>> >> >> > or go
> >> >>> >> >>>>> >> >> with
> >> >>> >> >>>>> >> >> > map() on Tuple2(left, right)?
> >> >>> >> >>>>> >> >> >
> >> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <
> >> [hidden email]
> >> >>> >:
> >> >>> >> >>>>> >> >> >
> >> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well.
> >> That
> >> >>> is a
> >> >>> >> >>>>> >> >> >> generic
> >> >>> >> >>>>> >> >> >> representation of a Tuple.
> >> >>> >> >>>>> >> >> >>
> >> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java or
> Scala
> >> >>> Tuples,
> >> >>> >> >>>>> >> >> >> with a
> >> >>> >> >>>>> >> >> generic
> >> >>> >> >>>>> >> >> >> utility method to convert between the two.
> >> >>> >> >>>>> >> >> >>
> >> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
> >> >>> >> >>>>> >> >> >> <[hidden email]>
> >> >>> >> >>>>> >> >> wrote:
> >> >>> >> >>>>> >> >> >>
> >> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
> >> >>> >> >>>>> >> >> >> >
> >> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but
> using
> >> the
> >> >>> >> >>>>> >> >> >> > CSVFormat
> >> >>> >> >>>>> >> won't
> >> >>> >> >>>>> >> >> >> work
> >> >>> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
> >> >>> >> >>>>> >> >> >> > So we would need to parse the Strings
> manually...
> >> >>> >> >>>>> >> >> >> >
> >> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
> >> >>> >> >>>>> >> >> >> > <[hidden email]>:
> >> >>> >> >>>>> >> >> >> >
> >> >>> >> >>>>> >> >> >> > > Hi,
> >> >>> >> >>>>> >> >> >> > > on second thought. Maybe we should just change
> >> all
> >> >>> the
> >> >>> >> >>>>> >> >> >> > > example
> >> >>> >> >>>>> >> input
> >> >>> >> >>>>> >> >> >> > > data to strings and use CSV input formats in
> all
> >> the
> >> >>> >> >>>>> >> >> >> > > examples.
> >> >>> >> >>>>> >> What
> >> >>> >> >>>>> >> >> do
> >> >>> >> >>>>> >> >> >> > > you think?
> >> >>> >> >>>>> >> >> >> > >
> >> >>> >> >>>>> >> >> >> > > Cheers,
> >> >>> >> >>>>> >> >> >> > > Aljoscha
> >> >>> >> >>>>> >> >> >> > >
> >> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha
> Krettek
> >> <
> >> >>> >> >>>>> >> >> [hidden email]>
> >> >>> >> >>>>> >> >> >> > > wrote:
> >> >>> >> >>>>> >> >> >> > > > Hi,
> >> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are
> >> >>> >> incompatible.
> >> >>> >> >>>>> >> >> >> > > > I'm
> >> >>> >> >>>>> >> >> afraid
> >> >>> >> >>>>> >> >> >> > > > you have to to what you proposed: move the
> >> data to
> >> >>> a
> >> >>> >> >>>>> >> >> >> > > > static
> >> >>> >> >>>>> >> field
> >> >>> >> >>>>> >> >> and
> >> >>> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet()
> >> method in
> >> >>> >> Scala.
> >> >>> >> >>>>> >> >> >> > > > It's
> >> >>> >> >>>>> >> >> not
> >> >>> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data
> and
> >> >>> make it
> >> >>> >> >>>>> >> >> >> > > > easier
> >> >>> >> >>>>> >> for
> >> >>> >> >>>>> >> >> it
> >> >>> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala
> >> versions.
> >> >>> >> >>>>> >> >> >> > > >
> >> >>> >> >>>>> >> >> >> > > > What do the others think? This will probably
> >> occur
> >> >>> in
> >> >>> >> all
> >> >>> >> >>>>> >> >> >> > > > the
> >> >>> >> >>>>> >> >> >> examples.
> >> >>> >> >>>>> >> >> >> > > >
> >> >>> >> >>>>> >> >> >> > > > Cheers,
> >> >>> >> >>>>> >> >> >> > > > Aljoscha
> >> >>> >> >>>>> >> >> >> > > >
> >> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki
> >> Kalavri
> >> >>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
> >> >>> >> >>>>> >> >> >> > > >> Hey,
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >> I have ported the Connected Components
> >> example,
> >> >>> but
> >> >>> >> I am
> >> >>> >> >>>>> >> >> >> > > >> not
> >> >>> >> >>>>> >> sure
> >> >>> >> >>>>> >> >> >> how
> >> >>> >> >>>>> >> >> >> > to
> >> >>> >> >>>>> >> >> >> > > >> reuse the example input data from
> >> java-examples.
> >> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the
> >> vertices
> >> >>> >> and
> >> >>> >> >>>>> >> >> >> > > >> edges
> >> >>> >> >>>>> >> data
> >> >>> >> >>>>> >> >> >> are
> >> >>> >> >>>>> >> >> >> > > >> produced by the methods
> >> getDefaultVertexDataSet()
> >> >>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> >> >>> >> >>>>> >> >> >> > > >> an
> >> org.apache.flink.api.java.ExecutionEnvironment
> >> >>> as
> >> >>> >> >>>>> >> parameter.
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >> One way is to provide public static fields
> >> (like
> >> >>> in
> >> >>> >> the
> >> >>> >> >>>>> >> >> >> WordCountData
> >> >>> >> >>>>> >> >> >> > > >> class), but this introduces a conversion
> >> >>> >> >>>>> >> >> >> > > >> from
> org.apache.flink.api.java.tuple.Tuple2 to
> >> >>> Scala
> >> >>> >> >>>>> >> >> >> > > >> tuple and
> >> >>> >> >>>>> >> >> from
> >> >>> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess
> this
> >> is
> >> >>> an
> >> >>> >> >>>>> >> unnecessary
> >> >>> >> >>>>> >> >> >> > > complexity
> >> >>> >> >>>>> >> >> >> > > >> for an example (?).
> >> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the
> example
> >> >>> data
> >> >>> >> in
> >> >>> >> >>>>> >> >> >> > > >> the
> >> >>> >> >>>>> >> Scala
> >> >>> >> >>>>> >> >> >> > > example.
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >> Thanks!
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >> Cheers,
> >> >>> >> >>>>> >> >> >> > > >> V.
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha
> Krettek <
> >> >>> >> >>>>> >> [hidden email]
> >> >>> >> >>>>> >> >> >
> >> >>> >> >>>>> >> >> >> > > wrote:
> >> >>> >> >>>>> >> >> >> > > >>
> >> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
> >> >>> >> >>>>> >> >> >> > > >>>
> >> >>> >> >>>>> >> >>
> >> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >>> >> >>>>> >> >> >> > > >>>
> >> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount example.
> >> It's
> >> >>> >> pretty
> >> >>> >> >>>>> >> >> >> > > >>> much a
> >> >>> >> >>>>> >> >> copy
> >> >>> >> >>>>> >> >> >> of
> >> >>> >> >>>>> >> >> >> > > >>> the Java example with some fixups for the
> >> syntax
> >> >>> and
> >> >>> >> >>>>> >> >> >> > > >>> lambda
> >> >>> >> >>>>> >> >> >> > functions.
> >> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the
> >> java-examples
> >> >>> >> as a
> >> >>> >> >>>>> >> >> dependency
> >> >>> >> >>>>> >> >> >> for
> >> >>> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse
> the
> >> >>> example
> >> >>> >> >>>>> >> >> >> > > >>> input
> >> >>> >> >>>>> >> data.
> >> >>> >> >>>>> >> >> >> > > >>>
> >> >>> >> >>>>> >> >> >> > > >>> When you ported a program you can do a
> pull
> >> >>> request
> >> >>> >> >>>>> >> >> >> > > >>> against
> >> >>> >> >>>>> >> my
> >> >>> >> >>>>> >> >> repo
> >> >>> >> >>>>> >> >> >> > > >>> and I will collect the examples.
> >> >>> >> >>>>> >> >> >> > > >>>
> >> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
> >> >>> >> >>>>> >> >> >> > > >>>
> >> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann
> >> Gábor <
> >> >>> >> >>>>> >> >> >> [hidden email]
> >> >>> >> >>>>> >> >> >> > >
> >> >>> >> >>>>> >> >> >> > > >>> wrote:
> >> >>> >> >>>>> >> >> >> > > >>> > +1
> >> >>> >> >>>>> >> >> >> > > >>> >
> >> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >> >>> >> >>>>> >> >> >> > > >>> >
> >> >>> >> >>>>> >> >> >> > > >>> >
> >> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
> >> >>> Balassi <
> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> >> >>> >> >>>>> >> >> >> > > >>> > wrote:
> >> >>> >> >>>>> >> >> >> > > >>> >
> >> >>> >> >>>>> >> >> >> > > >>> >> +1
> >> >>> >> >>>>> >> >> >> > > >>> >>
> >> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >> >>> >> >>>>> >> >> >> > > >>> >>
> >> >>> >> >>>>> >> >> >> > > >>> >>
> >> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
> >> >>> Tzoumas <
> >> >>> >> >>>>> >> >> >> > > [hidden email]>
> >> >>> >> >>>>> >> >> >> > > >>> >> wrote:
> >> >>> >> >>>>> >> >> >> > > >>> >>
> >> >>> >> >>>>> >> >> >> > > >>> >> > +1
> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of
> >> going
> >> >>> >> through
> >> >>> >> >>>>> >> >> >> > > >>> >> > a
> >> >>> >> >>>>> >> >> tutorial
> >> >>> >> >>>>> >> >> >> so
> >> >>> >> >>>>> >> >> >> > > this
> >> >>> >> >>>>> >> >> >> > > >>> >> will
> >> >>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and
> >> the
> >> >>> new
> >> >>> >> API
> >> >>> >> >>>>> >> >> >> > > >>> >> > :-)
> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM,
> Vasiliki
> >> >>> >> Kalavri <
> >> >>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
> >> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people
> implement
> >> the
> >> >>> >> >>>>> >> >> >> > > >>> >> > > examples!
> >> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for
> >> me :)
> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian
> >> Hueske <
> >> >>> >> >>>>> >> >> >> [hidden email]>
> >> >>> >> >>>>> >> >> >> > > >>> wrote:
> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
> >> >>> PageRank.
> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples
> similar
> >> to
> >> >>> the
> >> >>> >> Java
> >> >>> >> >>>>> >> >> examples:
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
> >> >>> parameters
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00
> Aljoscha
> >> >>> >> Krettek <
> >> >>> >> >>>>> >> >> >> > > [hidden email]
> >> >>> >> >>>>> >> >> >> > > >>> >:
> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can
> reserve
> >> their
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
> >> >>> >> >>>>> >> >> >> examples
> >> >>> >> >>>>> >> >> >> > > here.
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM,
> >> Fabian
> >> >>> >> Hueske
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples
> >> implemented
> >> >>> by
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > different
> >> >>> >> >>>>> >> >> >> people
> >> >>> >> >>>>> >> >> >> > > >>> proved to
> >> >>> >> >>>>> >> >> >> > > >>> >> > be
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three
> >> examples.
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd
> >> port a
> >> >>> >> simple
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > first
> >> >>> >> >>>>> >> >> one
> >> >>> >> >>>>> >> >> >> > such
> >> >>> >> >>>>> >> >> >> > > as
> >> >>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00
> >> Aljoscha
> >> >>> >> Krettek
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]
> >> >>> >> >>>>> >> >> >> > > >>> >> >:
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of
> the
> >> >>> Scala
> >> >>> >> API
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>> >> >>>>> >> >> >> > > >>> >>
> >> >>> >> >>>>> >> >> >>
> >> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only
> have
> >> to
> >> >>> >> write
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
> >> >>> >> >>>>> >> tests
> >> >>> >> >>>>> >> >> and
> >> >>> >> >>>>> >> >> >> > > port
> >> >>> >> >>>>> >> >> >> > > >>> the
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it
> makes
> >> >>> sense
> >> >>> >> to
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
> >> >>> >> >>>>> >> other
> >> >>> >> >>>>> >> >> >> > people
> >> >>> >> >>>>> >> >> >> > > >>> port
> >> >>> >> >>>>> >> >> >> > > >>> >> the
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone
> else
> >> uses
> >> >>> >> it and
> >> >>> >> >>>>> >> maybe
> >> >>> >> >>>>> >> >> >> > notices
> >> >>> >> >>>>> >> >> >> > > some
> >> >>> >> >>>>> >> >> >> > > >>> >> > quirks
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >> >>> >> >>>>> >> >> >> > > >>> >>
> >> >>> >> >>>>> >> >> >> > > >>>
> >> >>> >> >>>>> >> >> >> > >
> >> >>> >> >>>>> >> >> >> >
> >> >>> >> >>>>> >> >> >>
> >> >>> >> >>>>> >> >>
> >> >>> >> >>>>> >>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>
> >> >>> >>
> >> >>>
> >>
>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
Yes, that would allow list comprehensions. It would be possible to
have the Collection signature for join (and coGroup), i.e.:

apply[R](fun: (T, O) => TraversableOnce[R]): DataSet[R]

(T and O are the left and right input types, R is the result type)

Then you can return collections and still return an option, as in:

a.join(b).where(0).equalTo(0) { (l, r) => if (r > ...) Some(l) else None }

This works because there is an implicit conversion from Option to a
collection. However, that conversion always wraps the return value in a
List with at most one element, and I'm not sure we want that overhead here.
I'm also not sure whether we want the overhead of always having to use an
Option even when the join always returns a value.
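
As a rough sketch of the idea (plain Scala 2.x collections, not the actual DataSet API): `joinWith` below is a hypothetical stand-in for the join operator, and the integer key functions and sample data are made up for illustration. It shows that one `TraversableOnce`-returning signature accepts both a function that returns an Option and one that returns a collection.

```scala
object JoinSignatureSketch {
  // Hypothetical stand-in for the join operator: pairs elements whose keys
  // match and flattens whatever the user function returns for each pair.
  def joinWith[T, O, R](left: Seq[T], right: Seq[O])(leftKey: T => Int, rightKey: O => Int)(
      fun: (T, O) => TraversableOnce[R]): Seq[R] =
    for {
      l <- left
      r <- right
      if leftKey(l) == rightKey(r)
      out <- fun(l, r).toSeq
    } yield out

  // Made-up sample inputs, keyed on the first tuple field.
  val names = Seq((1, "a"), (2, "b"), (3, "c"))
  val scores = Seq((1, 10), (2, 0), (3, 7))

  // The function returns an Option; it is accepted where a TraversableOnce
  // is expected (via the implicit Option-to-Iterable conversion).
  val filtered: Seq[String] =
    joinWith(names, scores)(_._1, _._1) { (l, r) => if (r._2 > 5) Some(l._2) else None }

  // The function returns a collection with several results per joined pair.
  val expanded: Seq[String] =
    joinWith(names, scores)(_._1, _._1) { (l, r) => List(l._2, l._2 * 2) }

  def main(args: Array[String]): Unit = {
    println(filtered) // List(a, c)
    println(expanded) // List(a, aa, b, bb, c, cc)
  }
}
```

The Option variant still goes through a collection for every joined pair, which is the overhead in question.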

What do you think?

On Thu, Sep 11, 2014 at 11:22 PM, Fabian Hueske <[hidden email]> wrote:

> Hmmm, tricky question...
> How about the Option for Join as this is a tuple-wise operation and the
> Collection for Cogroup which is group-wise?
> Could we in that case use list comprehensions in Cogroup functions?
>
> Or is that too much mixing?
>
> 2014-09-11 23:00 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>
>> I didn't look at the example either.
>>
>> Adding collections is easy, it's just that we can either have
>> Collections or the Option, not both.
>>
>> For the coding style I followed this:
>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide,
>> which itself is based on this: http://docs.scala-lang.org/style/. It
>> is different from the Java Code Guidelines we have in place, yes.
>>
>> On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <[hidden email]>
>> wrote:
>> > I haven't looked at the LineRank example in detail, but if you think that
>> > it adds something new to the examples collection, we can certainly port it
>> > also to Java.
>> > I think the Option and Collector return types are sufficient right now but
>> > if Collections are easy to add, go for it. ;-)
>> >
>> > Great that the Scala primitives are working! Also thanks for adding
>> > genSequence and adapting my examples.
>> > Btw. does the code style not apply to Scala files, or do we have a
>> > different one there?
>> >
>> > 2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>> >
>> >> What about the LineRank example? We had that in Scala but never had a
>> >> Java Example.
>> >>
>> >> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <[hidden email]>
>> >> wrote:
>> >> > Yes, I like that. For the ITCases I always just copied the Java ITCase.
>> >> >
>> >> > The only examples that are missing now are LinearRegression and the
>> >> > relational stuff.
>> >> >
>> >> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <[hidden email]>
>> >> wrote:
>> >> >> I just removed the old CountEdgeDegrees example.
>> >> >> That was a preprocessing step for the TriangleEnumeration, and is now part
>> >> >> of the new TriangleEnumerationOpt example.
>> >> >> So I guess, we don't need to port that one. As I said before, I'd prefer to
>> >> >> keep Java and Scala examples in sync.
>> >> >>
>> >> >> Cheers, Fabian
>> >> >>
>> >> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>> >> >>
>> >> >>> I added the PageRank example, thanks again Fabian. :D
>> >> >>>
>> >> >>> Regarding the other stuff:
>> >> >>>  - There is a comment in DataSet.scala about including
>> >> >>> org.apache.flink.api.scala._ because of the TypeInformation.
>> >> >>>  - I added generateSequence to ExecutionEnvironment.
>> >> >>>  - It is possible to use Scala Primitives in Array, I noticed it while
>> >> >>> writing the tests, you probably had an older version of the code.
>> >> >>>  - Yes, using List and other Interfaces is not possible, this is also
>> >> >>> a restriction in the Java API.
>> >> >>>
>> >> >>> What do you think about the interface of join and coGroup? Right now,
>> >> >>> you can either use a lambda that returns an Option or the lambda with
>> >> >>> the Collector. Originally I also wanted to have a lambda that
>> >> >>> returns a Collection, but due to type erasure this has the same type
>> >> >>> as the lambda with the Option so I couldn't use it. There is an
>> >> >>> implicit conversion from Option to a Collection, so I could change it
>> >> >>> without breaking the examples we have now. What do you think?
>> >> >>>
>> >> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >> >>> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
>> >> >>> PageRank
>> >> >>>
>> >> >>> These are the examples people called dibs on:
>> >> >>>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
>> >> >>> Example from Java)
>> >> >>>  - ComputeEdgeDegrees (Hermann)
>> >> >>>
>> >> >>> Those are unclaimed (if I'm not mistaken):
>> >> >>>  - The relational Stuff
>> >> >>>
>> >> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]>
>> >> wrote:
>> >> >>> > +1 for removing RelationalQuery
>> >> >>> >
>> >> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
>> >> [hidden email]>
>> >> >>> > wrote:
>> >> >>> >
>> >> >>> >> By the way, what was called BatchGradientDescent in the Scala examples
>> >> >>> >> should be replaced by a port of the LinearRegression Example from
>> >> >>> >> Java. I had them as two separate examples earlier.
>> >> >>> >>
>> >> >>> >> What about RelationalQuery and TPC-H-Q3? Any thoughts about removing
>> >> >>> >> RelationalQuery?
>> >> >>> >>
>> >> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
>> >> [hidden email]
>> >> >>> >
>> >> >>> >> wrote:
>> >> >>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
>> >> >>> >> >
>> >> >>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >> >>> >> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>> >> >>> >> >
>> >> >>> >> > These are the examples people called dibs on:
>> >> >>> >> >  - PageRank (Fabian)
>> >> >>> >> >  - BatchGradientDescent (Márton)
>> >> >>> >> >  - ComputeEdgeDegrees (Hermann)
>> >> >>> >> >
>> >> >>> >> > Those are unclaimed (if I'm not mistaken):
>> >> >>> >> >  - The relational Stuff
>> >> >>> >> >  - LinearRegression
>> >> >>> >> >
>> >> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
>> >> >>> [hidden email]>
>> >> >>> >> wrote:
>> >> >>> >> >> Thanks, I added it. I'll keep a running list of ported/unported
>> >> >>> >> >> examples in my mails. I'll rename the java example package to examples
>> >> >>> >> >> once the Scala API merge is done.
>> >> >>> >> >>
>> >> >>> >> >> I think the termination criterion is fine as it is. Just because Scala
>> >> >>> >> >> enables functional programming doesn't mean it's always the best
>> >> >>> >> >> choice. :D
>> >> >>> >> >>
>> >> >>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
>> >> >>> >> >>
>> >> >>> >> >> These are the examples people called dibs on:
>> >> >>> >> >>  - TriangleEnumeration and PageRank (Fabian)
>> >> >>> >> >>  - BatchGradientDescent (Márton)
>> >> >>> >> >>  - ComputeEdgeDegrees (Hermann)
>> >> >>> >> >>
>> >> >>> >> >> Those are unclaimed (if I'm not mistaken):
>> >> >>> >> >>  - The relational Stuff
>> >> >>> >> >>  - LinearRegression
>> >> >>> >> >>
>> >> >>> >> >> Cheers,
>> >> >>> >> >> Aljoscha
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
>> >> [hidden email]
>> >> >>> >
>> >> >>> >> wrote:
>> >> >>> >> >>> Transitive closure here, I also added a termination criterion
>> >> in the
>> >> >>> >> Java
>> >> >>> >> >>> version:
>> >> >>> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>> >> >>> >> >>>
>> >> >>> >> >>> Perhaps you can make the termination criterion in Scala more
>> >> >>> >> functional?
>> >> >>> >> >>>
>> >> >>> >> >>> I noticed that the examples package name is example.java but
>> >> >>> >> examples.scala
>> >> >>> >> >>>
>> >> >>> >> >>> Kostas
>> >> >>> >> >>>
>> >> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
>> >> [hidden email]
>> >> >>> >
>> >> >>> >> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on
>> your
>> >> >>> list).
>> >> >>> >> >>>>
>> >> >>> >> >>>> If nobody volunteers for the relational stuff I can take
>> those
>> >> as
>> >> >>> >> well.
>> >> >>> >> >>>>
>> >> >>> >> >>>> How about removing the "RelationalQuery" from both Scala and
>> >> Java?
>> >> >>> It
>> >> >>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some
>> >> teaching
>> >> >>> >> value on
>> >> >>> >> >>>> top of TPC-H Q3?
>> >> >>> >> >>>>
>> >> >>> >> >>>> Kostas
>> >> >>> >> >>>>
>> >> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
>> >> >>> [hidden email]
>> >> >>> >> >
>> >> >>> >> >>>> wrote:
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Thanks, I added it, along with an ITCase.
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> So far we have ported: WordCount, KMeans,
>> ConnectedComponents,
>> >> >>> >> >>>>> WebLogAnalysis
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> These are the examples people called dibs on:
>> >> >>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
>> >> >>> >> >>>>>  - BatchGradientDescent (Márton)
>> >> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
>> >> >>> >> >>>>>  - TransitiveClosure
>> >> >>> >> >>>>>  - The relational Stuff
>> >> >>> >> >>>>>  - LinearRegression
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> Cheers,
>> >> >>> >> >>>>> Aljoscha
>> >> >>> >> >>>>>
>> >> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
>> >> >>> [hidden email]>
>> >> >>> >> >>>>> wrote:
>> >> >>> >> >>>>> > WebLog here:
>> >> >>> >> >>>>> >
>> >> >>> >> >>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>> >> >>> >> >>>>> >
>> >> >>> >> >>>>> > Do you need any more done?
>> >> >>> >> >>>>> >
>> >> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
>> >> >>> >> [hidden email]>
>> >> >>> >> >>>>> > wrote:
>> >> >>> >> >>>>> >
>> >> >>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
>> >> >>> >> >>>>> >>
>> >> >>> >> >>>>> >> Keep 'em coming, people. :D
>> >> >>> >> >>>>> >>
>> >> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
>> >> >>> [hidden email]
>> >> >>> >> >
>> >> >>> >> >>>>> >> wrote:
>> >> >>> >> >>>>> >> > Alright, will do.
>> >> >>> >> >>>>> >> > Thanks!
>> >> >>> >> >>>>> >> >
>> >> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
>> >> >>> >> [hidden email]>:
>> >> >>> >> >>>>> >> >
>> >> >>> >> >>>>> >> >> Ok people, executive decision. :D
>> >> >>> >> >>>>> >> >>
>> >> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm
>> >> storing
>> >> >>> >> the
>> >> >>> >> >>>>> >> >> data
>> >> >>> >> >>>>> >> >> in multi-dimensional object arrays and then
>> converting
>> >> it to
>> >> >>> >> the
>> >> >>> >> >>>>> >> >> required Java or Scala objects.
>> >> >>> >> >>>>> >> >>
>> >> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it
>> >> consistent
>> >> >>> >> with the
>> >> >>> >> >>>>> >> >> Java
>> >> >>> >> >>>>> >> >> API.
>> >> >>> >> >>>>> >> >>
>> >> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
>> >> >>> keyword,
>> >> >>> >> you
>> >> >>> >> >>>>> >> >> can
>> >> >>> >> >>>>> >> >> just write:
>> >> >>> >> >>>>> >> >>
>> >> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) =>
>> new
>> >> >>> >> MyResult(le,
>> >> >>> >> >>>>> >> >> re)
>> >> >>> >> >>>>> >> }
>> >> >>> >> >>>>> >> >>
>> >> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
>> >> >>> >> [hidden email]>
>> >> >>> >> >>>>> >> wrote:
>> >> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an
>> >> >>> inconsistency
>> >> >>> >> with
>> >> >>> >> >>>>> >> >> > the
>> >> >>> >> >>>>> >> Java
>> >> >>> >> >>>>> >> >> > API. In Java join is done as:
>> >> >>> >> >>>>> >> >> >
>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>> >> >>> >> >>>>> >> >> >
>> >> >>> >> >>>>> >> >> > where in the current Scala this is:
>> >> >>> >> >>>>> >> >> >
>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>> >> >>> >> >>>>> >> >> >
>> >> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>> >> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because
>> >> >>> >> >>>>> >> >> > "with" is a keyword in Scala. Should we offer something similar
>> >> >>> >> >>>>> >> >> > for Scala or go with map() on Tuple2(left, right)?
>> >> >>> >> >>>>> >> >> >
>> >> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <
>> >> [hidden email]
>> >> >>> >:
>> >> >>> >> >>>>> >> >> >
>> >> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well.
>> >> That
>> >> >>> is a
>> >> >>> >> >>>>> >> >> >> generic
>> >> >>> >> >>>>> >> >> >> representation of a Tuple.
>> >> >>> >> >>>>> >> >> >>
>> >> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java or
>> Scala
>> >> >>> Tuples,
>> >> >>> >> >>>>> >> >> >> with a
>> >> >>> >> >>>>> >> >> generic
>> >> >>> >> >>>>> >> >> >> utility method to convert between the two.
>> >> >>> >> >>>>> >> >> >>
>> >> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>> >> >>> >> >>>>> >> >> >> <[hidden email]>
>> >> >>> >> >>>>> >> >> wrote:
>> >> >>> >> >>>>> >> >> >>
>> >> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
>> >> >>> >> >>>>> >> >> >> >
>> >> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but
>> using
>> >> the
>> >> >>> >> >>>>> >> >> >> > CSVFormat
>> >> >>> >> >>>>> >> won't
>> >> >>> >> >>>>> >> >> >> work
>> >> >>> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
>> >> >>> >> >>>>> >> >> >> > So we would need to parse the Strings
>> manually...
>> >> >>> >> >>>>> >> >> >> >
>> >> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>> >> >>> >> >>>>> >> >> >> > <[hidden email]>:
>> >> >>> >> >>>>> >> >> >> >
>> >> >>> >> >>>>> >> >> >> > > Hi,
>> >> >>> >> >>>>> >> >> >> > > on second thought. Maybe we should just change
>> >> all
>> >> >>> the
>> >> >>> >> >>>>> >> >> >> > > example
>> >> >>> >> >>>>> >> input
>> >> >>> >> >>>>> >> >> >> > > data to strings and use CSV input formats in
>> all
>> >> the
>> >> >>> >> >>>>> >> >> >> > > examples.
>> >> >>> >> >>>>> >> What
>> >> >>> >> >>>>> >> >> do
>> >> >>> >> >>>>> >> >> >> > > you think?
>> >> >>> >> >>>>> >> >> >> > >
>> >> >>> >> >>>>> >> >> >> > > Cheers,
>> >> >>> >> >>>>> >> >> >> > > Aljoscha
>> >> >>> >> >>>>> >> >> >> > >
>> >> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha
>> Krettek
>> >> <
>> >> >>> >> >>>>> >> >> [hidden email]>
>> >> >>> >> >>>>> >> >> >> > > wrote:
>> >> >>> >> >>>>> >> >> >> > > > Hi,
>> >> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
>> >> >>> >> >>>>> >> >> >> > > > you have to do what you proposed: move the data to a static field and
>> >> >>> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>> >> >>> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
>> >> >>> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>> >> >>> >> >>>>> >> >> >> > > >
>> >> >>> >> >>>>> >> >> >> > > > What do the others think? This will probably
>> >> occur
>> >> >>> in
>> >> >>> >> all
>> >> >>> >> >>>>> >> >> >> > > > the
>> >> >>> >> >>>>> >> >> >> examples.
>> >> >>> >> >>>>> >> >> >> > > >
>> >> >>> >> >>>>> >> >> >> > > > Cheers,
>> >> >>> >> >>>>> >> >> >> > > > Aljoscha
>> >> >>> >> >>>>> >> >> >> > > >
>> >> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki
>> >> Kalavri
>> >> >>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
>> >> >>> >> >>>>> >> >> >> > > >> Hey,
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >> I have ported the Connected Components
>> >> example,
>> >> >>> but
>> >> >>> >> I am
>> >> >>> >> >>>>> >> >> >> > > >> not
>> >> >>> >> >>>>> >> sure
>> >> >>> >> >>>>> >> >> >> how
>> >> >>> >> >>>>> >> >> >> > to
>> >> >>> >> >>>>> >> >> >> > > >> reuse the example input data from
>> >> java-examples.
>> >> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the
>> >> vertices
>> >> >>> >> and
>> >> >>> >> >>>>> >> >> >> > > >> edges
>> >> >>> >> >>>>> >> data
>> >> >>> >> >>>>> >> >> >> are
>> >> >>> >> >>>>> >> >> >> > > >> produced by the methods
>> >> getDefaultVertexDataSet()
>> >> >>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>> >> >>> >> >>>>> >> >> >> > > >> an
>> >> org.apache.flink.api.java.ExecutionEnvironment
>> >> >>> as
>> >> >>> >> >>>>> >> parameter.
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >> One way is to provide public static fields
>> >> (like
>> >> >>> in
>> >> >>> >> the
>> >> >>> >> >>>>> >> >> >> WordCountData
>> >> >>> >> >>>>> >> >> >> > > >> class), but this introduces a conversion
>> >> >>> >> >>>>> >> >> >> > > >> from
>> org.apache.flink.api.java.tuple.Tuple2 to
>> >> >>> Scala
>> >> >>> >> >>>>> >> >> >> > > >> tuple and
>> >> >>> >> >>>>> >> >> from
>> >> >>> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess
>> this
>> >> is
>> >> >>> an
>> >> >>> >> >>>>> >> unnecessary
>> >> >>> >> >>>>> >> >> >> > > complexity
>> >> >>> >> >>>>> >> >> >> > > >> for an example (?).
>> >> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the
>> example
>> >> >>> data
>> >> >>> >> in
>> >> >>> >> >>>>> >> >> >> > > >> the
>> >> >>> >> >>>>> >> Scala
>> >> >>> >> >>>>> >> >> >> > > example.
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >> Thanks!
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >> Cheers,
>> >> >>> >> >>>>> >> >> >> > > >> V.
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha
>> Krettek <
>> >> >>> >> >>>>> >> [hidden email]
>> >> >>> >> >>>>> >> >> >
>> >> >>> >> >>>>> >> >> >> > > wrote:
>> >> >>> >> >>>>> >> >> >> > > >>
>> >> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
>> >> >>> >> >>>>> >> >> >> > > >>>
>> >> >>> >> >>>>> >> >>
>> >> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> >>> >> >>>>> >> >> >> > > >>>
>> >> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount example.
>> >> It's
>> >> >>> >> pretty
>> >> >>> >> >>>>> >> >> >> > > >>> much a
>> >> >>> >> >>>>> >> >> copy
>> >> >>> >> >>>>> >> >> >> of
>> >> >>> >> >>>>> >> >> >> > > >>> the Java example with some fixups for the
>> >> syntax
>> >> >>> and
>> >> >>> >> >>>>> >> >> >> > > >>> lambda
>> >> >>> >> >>>>> >> >> >> > functions.
>> >> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the
>> >> java-examples
>> >> >>> >> as a
>> >> >>> >> >>>>> >> >> dependency
>> >> >>> >> >>>>> >> >> >> for
>> >> >>> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse
>> the
>> >> >>> example
>> >> >>> >> >>>>> >> >> >> > > >>> input
>> >> >>> >> >>>>> >> data.
>> >> >>> >> >>>>> >> >> >> > > >>>
>> >> >>> >> >>>>> >> >> >> > > >>> When you ported a program you can do a
>> pull
>> >> >>> request
>> >> >>> >> >>>>> >> >> >> > > >>> against
>> >> >>> >> >>>>> >> my
>> >> >>> >> >>>>> >> >> repo
>> >> >>> >> >>>>> >> >> >> > > >>> and I will collect the examples.
>> >> >>> >> >>>>> >> >> >> > > >>>
>> >> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
>> >> >>> >> >>>>> >> >> >> > > >>>
>> >> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann
>> >> Gábor <
>> >> >>> >> >>>>> >> >> >> [hidden email]
>> >> >>> >> >>>>> >> >> >> > >
>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>> >> >>> >> >>>>> >> >> >> > > >>> > +1
>> >> >>> >> >>>>> >> >> >> > > >>> >
>> >> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>> >> >>> >> >>>>> >> >> >> > > >>> >
>> >> >>> >> >>>>> >> >> >> > > >>> >
>> >> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
>> >> >>> Balassi <
>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
>> >> >>> >> >>>>> >> >> >> > > >>> > wrote:
>> >> >>> >> >>>>> >> >> >> > > >>> >
>> >> >>> >> >>>>> >> >> >> > > >>> >> +1
>> >> >>> >> >>>>> >> >> >> > > >>> >>
>> >> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>> >> >>> >> >>>>> >> >> >> > > >>> >>
>> >> >>> >> >>>>> >> >> >> > > >>> >>
>> >> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
>> >> >>> Tzoumas <
>> >> >>> >> >>>>> >> >> >> > > [hidden email]>
>> >> >>> >> >>>>> >> >> >> > > >>> >> wrote:
>> >> >>> >> >>>>> >> >> >> > > >>> >>
>> >> >>> >> >>>>> >> >> >> > > >>> >> > +1
>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of
>> >> going
>> >> >>> >> through
>> >> >>> >> >>>>> >> >> >> > > >>> >> > a
>> >> >>> >> >>>>> >> >> tutorial
>> >> >>> >> >>>>> >> >> >> so
>> >> >>> >> >>>>> >> >> >> > > this
>> >> >>> >> >>>>> >> >> >> > > >>> >> will
>> >> >>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and
>> >> the
>> >> >>> new
>> >> >>> >> API
>> >> >>> >> >>>>> >> >> >> > > >>> >> > :-)
>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM,
>> Vasiliki
>> >> >>> >> Kalavri <
>> >> >>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
>> >> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people
>> implement
>> >> the
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > examples!
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for
>> >> me :)
>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian
>> >> Hueske <
>> >> >>> >> >>>>> >> >> >> [hidden email]>
>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
>> >> >>> PageRank.
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples
>> similar
>> >> to
>> >> >>> the
>> >> >>> >> Java
>> >> >>> >> >>>>> >> >> examples:
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
>> >> >>> parameters
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00
>> Aljoscha
>> >> >>> >> Krettek <
>> >> >>> >> >>>>> >> >> >> > > [hidden email]
>> >> >>> >> >>>>> >> >> >> > > >>> >:
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can
>> reserve
>> >> their
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
>> >> >>> >> >>>>> >> >> >> examples
>> >> >>> >> >>>>> >> >> >> > > here.
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM,
>> >> Fabian
>> >> >>> >> Hueske
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples
>> >> implemented
>> >> >>> by
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > different
>> >> >>> >> >>>>> >> >> >> people
>> >> >>> >> >>>>> >> >> >> > > >>> proved to
>> >> >>> >> >>>>> >> >> >> > > >>> >> > be
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three
>> >> examples.
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd
>> >> port a
>> >> >>> >> simple
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > first
>> >> >>> >> >>>>> >> >> one
>> >> >>> >> >>>>> >> >> >> > such
>> >> >>> >> >>>>> >> >> >> > > as
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00
>> >> Aljoscha
>> >> >>> >> Krettek
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]
>> >> >>> >> >>>>> >> >> >> > > >>> >> >:
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of
>> the
>> >> >>> Scala
>> >> >>> >> API
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>> >> >>> >> >>>>> >> >> >> > > >>> >>
>> >> >>> >> >>>>> >> >> >>
>> >> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only
>> have
>> >> to
>> >> >>> >> write
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
>> >> >>> >> >>>>> >> tests
>> >> >>> >> >>>>> >> >> and
>> >> >>> >> >>>>> >> >> >> > > port
>> >> >>> >> >>>>> >> >> >> > > >>> the
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it
>> makes
>> >> >>> sense
>> >> >>> >> to
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
>> >> >>> >> >>>>> >> other
>> >> >>> >> >>>>> >> >> >> > people
>> >> >>> >> >>>>> >> >> >> > > >>> port
>> >> >>> >> >>>>> >> >> >> > > >>> >> the
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone
>> else
>> >> uses
>> >> >>> >> it and
>> >> >>> >> >>>>> >> maybe
>> >> >>> >> >>>>> >> >> >> > notices
>> >> >>> >> >>>>> >> >> >> > > some
>> >> >>> >> >>>>> >> >> >> > > >>> >> > quirks
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
I added support for specifying keys by name for case classes. Check out
the PageRank and TriangleEnumeration examples to see it in action.

@Kostas: I think you could use them for the TPC-H examples.
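
The idea of addressing a key by field name instead of by position can be sketched in plain Scala. This is only an illustration, not the Flink implementation: the object KeyByName, the case class Page and the helpers fieldIndex, keyOf and groupByField are hypothetical names, and the lookup here happens at runtime via Product.productElementNames (which assumes Scala 2.13 or newer). How the real API resolves names is not described in this thread.

```scala
// Hypothetical stand-in for "keys by name"; not taken from the Flink code base.
object KeyByName {
  // Example record type, assumed for illustration only.
  final case class Page(pageId: Long, rank: Double)

  // Translate a field name into the field's position inside the case class.
  def fieldIndex(p: Product, name: String): Int = {
    val idx = p.productElementNames.toList.indexOf(name)
    require(idx >= 0, s"no field named '$name'")
    idx
  }

  // Read the key value of a record by field name.
  def keyOf(p: Product, name: String): Any =
    p.productElement(fieldIndex(p, name))

  // Group records by the named field, analogous to grouping by a position.
  def groupByField[A <: Product](items: Seq[A], name: String): Map[Any, Seq[A]] =
    items.groupBy(item => keyOf(item, name))
}
```

With such a lookup, a key given as "pageId" and a key given as position 0 refer to the same field, which is what makes named keys a drop-in alternative for case classes.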

On Fri, Sep 12, 2014 at 7:23 AM, Aljoscha Krettek <[hidden email]> wrote:

> Yes, that would allow list comprehensions. It would be possible to
> have the Collection signature for join (and coGroup), i.e.:
>
> apply[R]((T, O) => TraversableOnce[R]): DataSet[R]
>
> (T and O are the left and right input types, R is the result type)
>
> Then you can return collections and still return an option, as in:
>
> a.join(b).where(0).equalTo(0) { (l, r) => if (r > ...) Some(l) else None }
>
> Because there is an implicit conversion from Options to a Collection.
> This will always wrap the return value in a List with only one value.
> I'm not sure we want the overhead here. I'm also not sure whether we
> want the overhead of always having to use an Option even though the
> join always returns a value.
>
> What do you think?
>
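
The trade-off discussed above can be tried out without Flink. The following is a minimal local sketch, assuming a hypothetical joinWith helper over in-memory sequences keyed by Int; it is not the DataSet API. It uses Iterable[R] as the collection result type so that the standard implicit conversion Option.option2Iterable applies, which lets one signature accept both a lambda returning an Option and a lambda returning a collection.

```scala
// Local stand-in for a join with a collection-returning function.
// joinWith and JoinSketch are hypothetical names used only for this sketch.
object JoinSketch {
  // Pairs up elements with equal keys and flattens whatever the function returns.
  def joinWith[T, O, R](left: Seq[(Int, T)], right: Seq[(Int, O)])(
      fun: (T, O) => Iterable[R]): Seq[R] =
    left.flatMap { case (lk, l) =>
      right.filter(_._1 == lk).flatMap { case (_, r) => fun(l, r) }
    }

  val left: Seq[(Int, Int)] = Seq(1 -> 10, 2 -> 20)
  val right: Seq[(Int, Int)] = Seq(1 -> 5, 2 -> 50)

  // The lambda returns an Option; because Iterable[Int] is expected, the
  // compiler inserts Option.option2Iterable around each branch.
  val kept: Seq[Int] =
    joinWith[Int, Int, Int](left, right) { (l, r) => if (r > l) Some(l) else None }

  // The same signature also accepts a lambda that returns several values.
  val pairs: Seq[Int] =
    joinWith[Int, Int, Int](left, right) { (l, r) => List(l, r) }
}
```

The cost mentioned in the mail is visible here: every Option result is wrapped in a collection before it is flattened again, even when the function always produces exactly one value.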
> On Thu, Sep 11, 2014 at 11:22 PM, Fabian Hueske <[hidden email]> wrote:
>> Hmmm, tricky question...
>> How about the Option for Join as this is a tuple-wise operation and the
>> Collection for Cogroup which is group-wise?
>> Could we in that case use list comprehensions in Cogroup functions?
>>
>> Or is that too much mixing?
>>
>> 2014-09-11 23:00 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>
>>> I didn't look at the example either.
>>>
>>> Adding collections is easy, it's just that we can either have
>>> Collections or the Option, not both.
>>>
>>> For the coding style I followed this:
>>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide,
>>> which itself is based on this: http://docs.scala-lang.org/style/. It
>>> is different from the Java Code Guidelines we have in place, yes.
>>>
>>> On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <[hidden email]>
>>> wrote:
>>> > I haven't looked at the LineRank example in detail, but if you think that
>>> > it adds something new to the examples collection, we can certainly port
>>> > it to Java as well.
>>> > I think the Option and Collector return types are sufficient right now,
>>> > but if Collections are easy to add, go for it. ;-)
>>> >
>>> > Great that the Scala primitives are working! Also thanks for adding
>>> > genSequence and adapting my examples.
>>> > Btw. does the code style not apply to Scala files, or do we have a
>>> > different one there?
>>> >
>>> > 2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>> >
>>> >> What about the LineRank example? We had that in Scala but never had a
>>> >> Java Example.
>>> >>
>>> >> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <[hidden email]>
>>> >> wrote:
>>> >> > Yes, I like that. For the ITCases I always just copied the Java
>>> ITCase.
>>> >> >
>>> >> > The only examples that are missing now are LinearRegression and the
>>> >> > relational stuff.
>>> >> >
>>> >> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <[hidden email]>
>>> >> wrote:
>>> >> >> I just removed the old CountEdgeDegrees example.
>>> >> >> That was a preprocessing step for the TriangleEnumeration, and is now
>>> >> part
>>> >> >> of the new TriangleEnumerationOpt example.
>>> >> >> So I guess, we don't need to port that one. As I said before, I'd
>>> >> prefer to
>>> >> >> keep Java and Scala examples in sync.
>>> >> >>
>>> >> >> Cheers, Fabian
>>> >> >>
>>> >> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>> >> >>
>>> >> >>> I added the PageRank example, thanks again Fabian. :D
>>> >> >>>
>>> >> >>> Regarding the other stuff:
>>> >> >>>  - There is a comment in DataSet.scala about including
>>> >> >>> org.apache.flink.api.scala._ because of the TypeInformation.
>>> >> >>>  - I added generateSequence to ExecutionEnvironment.
>>> >> >>>  - It is possible to use Scala Primitives in Array, I noticed it
>>> while
>>> >> >>> writing the tests, you probably had an older version of the code.
>>> >> >>>  - Yes, using List and other Interfaces is not possible, this is
>>> also
>>> >> >>> a restriction in the Java API.
>>> >> >>>
>>> >> >>> What do you think about the interface of join and coGroup? Right now,
>>> >> >>> you can either use a lambda that returns an Option or the lambda with
>>> >> >>> the Collector. Originally I also wanted to have a lambda that
>>> >> >>> returns a Collection, but due to type erasure this has the same type
>>> >> >>> as the lambda with the Option so I couldn't use it. There is an
>>> >> >>> implicit conversion from Option to a Collection, so I could change it
>>> >> >>> without breaking the examples we have now. What do you think?
>>> >> >>>
>>> >> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> >> >>> WebLogAnalysis, TransitiveClosureNaive,
>>> TriangleEnumerationNaive/Opt,
>>> >> >>> PageRank
>>> >> >>>
>>> >> >>> These are the examples people called dibs on:
>>> >> >>>  - BatchGradientDescent (Márton) (Should be a port of
>>> LinearRegression
>>> >> >>> Example from Java)
>>> >> >>>  - ComputeEdgeDegrees (Hermann)
>>> >> >>>
>>> >> >>> Those are unclaimed (if I'm not mistaken):
>>> >> >>>  - The relational Stuff
>>> >> >>>
>>> >> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]>
>>> >> wrote:
>>> >> >>> > +1 for removing RelationalQuery
>>> >> >>> >
>>> >> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
>>> >> [hidden email]>
>>> >> >>> > wrote:
>>> >> >>> >
>>> >> >>> >> By the way, what was called BatchGradientDescent in the Scala
>>> >> examples
>>> >> >>> >> should be replaced by a port of the LinearRegression Example from
>>> >> >>> >> Java. I had them as two separate examples earlier.
>>> >> >>> >>
>>> >> >>> >> What about RelationalQuery and TPC-H-Q3. Any thoughts about
>>> removing
>>> >> >>> >> RelationalQuery?
>>> >> >>> >>
>>> >> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
>>> >> [hidden email]
>>> >> >>> >
>>> >> >>> >> wrote:
>>> >> >>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
>>> >> >>> >> >
>>> >> >>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> >> >>> >> > WebLogAnalysis, TransitiveClosureNaive,
>>> >> TriangleEnumerationNaive/Opt
>>> >> >>> >> >
>>> >> >>> >> > These are the examples people called dibs on:
>>> >> >>> >> >  - PageRank (Fabian)
>>> >> >>> >> >  - BatchGradientDescent (Márton)
>>> >> >>> >> >  - ComputeEdgeDegrees (Hermann)
>>> >> >>> >> >
>>> >> >>> >> > Those are unclaimed (if I'm not mistaken):
>>> >> >>> >> >  - The relational Stuff
>>> >> >>> >> >  - LinearRegression
>>> >> >>> >> >
>>> >> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
>>> >> >>> [hidden email]>
>>> >> >>> >> wrote:
>>> >> >>> >> >> Thanks, I added it. I'll keep a running list of
>>> ported/unported
>>> >> >>> >> >> examples in my mails. I'll rename the java example package to
>>> >> >>> examples
>>> >> >>> >> >> once the Scala API merge is done.
>>> >> >>> >> >>
>>> >> >>> >> >> I think the termination criterion is fine as it is. Just
>>> because
>>> >> >>> Scala
>>> >> >>> >> >> enables functional programming doesn't mean it's always the
>>> best
>>> >> >>> >> >> choice. :D
>>> >> >>> >> >>
>>> >> >>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> >> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
>>> >> >>> >> >>
>>> >> >>> >> >> These are the examples people called dibs on:
>>> >> >>> >> >>  - TriangleEnumeration and PageRank (Fabian)
>>> >> >>> >> >>  - BatchGradientDescent (Márton)
>>> >> >>> >> >>  - ComputeEdgeDegrees (Hermann)
>>> >> >>> >> >>
>>> >> >>> >> >> Those are unclaimed (if I'm not mistaken):
>>> >> >>> >> >>  - The relational Stuff
>>> >> >>> >> >>  - LinearRegression
>>> >> >>> >> >>
>>> >> >>> >> >> Cheers,
>>> >> >>> >> >> Aljoscha
>>> >> >>> >> >>
>>> >> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
>>> >> [hidden email]
>>> >> >>> >
>>> >> >>> >> wrote:
>>> >> >>> >> >>> Transitive closure here, I also added a termination criterion
>>> >> in the
>>> >> >>> >> Java
>>> >> >>> >> >>> version:
>>> >> >>> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>> >> >>> >> >>>
>>> >> >>> >> >>> Perhaps you can make the termination criterion in Scala more
>>> >> >>> >> functional?
>>> >> >>> >> >>>
>>> >> >>> >> >>> I noticed that the examples package name is example.java but
>>> >> >>> >> examples.scala
>>> >> >>> >> >>>
>>> >> >>> >> >>> Kostas
>>> >> >>> >> >>>
>>> >> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
>>> >> [hidden email]
>>> >> >>> >
>>> >> >>> >> wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on
>>> your
>>> >> >>> list).
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> If nobody volunteers for the relational stuff I can take
>>> those
>>> >> as
>>> >> >>> >> well.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> How about removing the "RelationalQuery" from both Scala and
>>> >> Java?
>>> >> >>> It
>>> >> >>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some
>>> >> teaching
>>> >> >>> >> value on
>>> >> >>> >> >>>> top of TPC-H Q3?
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Kostas
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
>>> >> >>> [hidden email]
>>> >> >>> >> >
>>> >> >>> >> >>>> wrote:
>>> >> >>> >> >>>>>
>>> >> >>> >> >>>>> Thanks, I added it, along with an ITCase.
>>> >> >>> >> >>>>>
>>> >> >>> >> >>>>> So far we have ported: WordCount, KMeans,
>>> ConnectedComponents,
>>> >> >>> >> >>>>> WebLogAnalysis
>>> >> >>> >> >>>>>
>>> >> >>> >> >>>>> These are the examples people called dibs on:
>>> >> >>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
>>> >> >>> >> >>>>>  - BatchGradientDescent (Márton)
>>> >> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
>>> >> >>> >> >>>>>
>>> >> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
>>> >> >>> >> >>>>>  - TransitiveClosure
>>> >> >>> >> >>>>>  - The relational Stuff
>>> >> >>> >> >>>>>  - LinearRegression
>>> >> >>> >> >>>>>
>>> >> >>> >> >>>>> Cheers,
>>> >> >>> >> >>>>> Aljoscha
>>> >> >>> >> >>>>>
>>> >> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
>>> >> >>> [hidden email]>
>>> >> >>> >> >>>>> wrote:
>>> >> >>> >> >>>>> > WebLog here:
>>> >> >>> >> >>>>> >
>>> >> >>> >> >>>>> >
>>> >> >>> >>
>>> >> >>>
>>> >>
>>> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>> >> >>> >> >>>>> >
>>> >> >>> >> >>>>> > Do you need any more done?
>>> >> >>> >> >>>>> >
>>> >> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
>>> >> >>> >> [hidden email]>
>>> >> >>> >> >>>>> > wrote:
>>> >> >>> >> >>>>> >
>>> >> >>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
>>> >> >>> >> >>>>> >>
>>> >> >>> >> >>>>> >> Keep 'em coming, people. :D
>>> >> >>> >> >>>>> >>
>>> >> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
>>> >> >>> [hidden email]
>>> >> >>> >> >
>>> >> >>> >> >>>>> >> wrote:
>>> >> >>> >> >>>>> >> > Alright, will do.
>>> >> >>> >> >>>>> >> > Thanks!
>>> >> >>> >> >>>>> >> >
>>> >> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
>>> >> >>> >> [hidden email]>:
>>> >> >>> >> >>>>> >> >
>>> >> >>> >> >>>>> >> >> Ok people, executive decision. :D
>>> >> >>> >> >>>>> >> >>
>>> >> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm
>>> >> storing
>>> >> >>> >> the
>>> >> >>> >> >>>>> >> >> data
>>> >> >>> >> >>>>> >> >> in multi-dimensional object arrays and then
>>> converting
>>> >> it to
>>> >> >>> >> the
>>> >> >>> >> >>>>> >> >> required Java or Scala objects.
>>> >> >>> >> >>>>> >> >>
>>> >> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it
>>> >> consistent
>>> >> >>> >> with the
>>> >> >>> >> >>>>> >> >> Java
>>> >> >>> >> >>>>> >> >> API.
>>> >> >>> >> >>>>> >> >>
>>> >> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
>>> >> >>> keyword,
>>> >> >>> >> you
>>> >> >>> >> >>>>> >> >> can
>>> >> >>> >> >>>>> >> >> just write:
>>> >> >>> >> >>>>> >> >>
>>> >> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) =>
>>> new
>>> >> >>> >> MyResult(le,
>>> >> >>> >> >>>>> >> >> re)
>>> >> >>> >> >>>>> >> }
>>> >> >>> >> >>>>> >> >>
>>> >> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
>>> >> >>> >> [hidden email]>
>>> >> >>> >> >>>>> >> wrote:
>>> >> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an
>>> >> >>> inconsistency
>>> >> >>> >> with
>>> >> >>> >> >>>>> >> >> > the
>>> >> >>> >> >>>>> >> Java
>>> >> >>> >> >>>>> >> >> > API. In Java join is done as:
>>> >> >>> >> >>>>> >> >> >
>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>> >> >>> >> >>>>> >> >> >
>>> >> >>> >> >>>>> >> >> > where in the current Scala this is:
>>> >> >>> >> >>>>> >> >> >
>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>> >> >>> >> >>>>> >> >> >
>>> >> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>> >> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with()
>>> >> method
>>> >> >>> >> because
>>> >> >>> >> >>>>> >> "with"
>>> >> >>> >> >>>>> >> >> is
>>> >> >>> >> >>>>> >> >> > a keyword in Scala. Should we offer something
>>> similar
>>> >> for
>>> >> >>> >> Scala
>>> >> >>> >> >>>>> >> >> > or go
>>> >> >>> >> >>>>> >> >> with
>>> >> >>> >> >>>>> >> >> > map() on Tuple2(left, right)?
>>> >> >>> >> >>>>> >> >> >
>>> >> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <
>>> >> [hidden email]
>>> >> >>> >:
>>> >> >>> >> >>>>> >> >> >
>>> >> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well.
>>> >> That
>>> >> >>> is a
>>> >> >>> >> >>>>> >> >> >> generic
>>> >> >>> >> >>>>> >> >> >> representation of a Tuple.
>>> >> >>> >> >>>>> >> >> >>
>>> >> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java or
>>> Scala
>>> >> >>> Tuples,
>>> >> >>> >> >>>>> >> >> >> with a
>>> >> >>> >> >>>>> >> >> generic
>>> >> >>> >> >>>>> >> >> >> utility method to convert between the two.
>>> >> >>> >> >>>>> >> >> >>
>>> >> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>>> >> >>> >> >>>>> >> >> >> <[hidden email]>
>>> >> >>> >> >>>>> >> >> wrote:
>>> >> >>> >> >>>>> >> >> >>
>>> >> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
>>> >> >>> >> >>>>> >> >> >> >
>>> >> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but
>>> using
>>> >> the
>>> >> >>> >> >>>>> >> >> >> > CSVFormat
>>> >> >>> >> >>>>> >> won't
>>> >> >>> >> >>>>> >> >> >> work
>>> >> >>> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
>>> >> >>> >> >>>>> >> >> >> > So we would need to parse the Strings
>>> manually...
>>> >> >>> >> >>>>> >> >> >> >
>>> >> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>>> >> >>> >> >>>>> >> >> >> > <[hidden email]>:
>>> >> >>> >> >>>>> >> >> >> >
>>> >> >>> >> >>>>> >> >> >> > > Hi,
>>> >> >>> >> >>>>> >> >> >> > > on second thought. Maybe we should just change
>>> >> all
>>> >> >>> the
>>> >> >>> >> >>>>> >> >> >> > > example
>>> >> >>> >> >>>>> >> input
>>> >> >>> >> >>>>> >> >> >> > > data to strings and use CSV input formats in
>>> all
>>> >> the
>>> >> >>> >> >>>>> >> >> >> > > examples.
>>> >> >>> >> >>>>> >> What
>>> >> >>> >> >>>>> >> >> do
>>> >> >>> >> >>>>> >> >> >> > > you think?
>>> >> >>> >> >>>>> >> >> >> > >
>>> >> >>> >> >>>>> >> >> >> > > Cheers,
>>> >> >>> >> >>>>> >> >> >> > > Aljoscha
>>> >> >>> >> >>>>> >> >> >> > >
>>> >> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha
>>> Krettek
>>> >> <
>>> >> >>> >> >>>>> >> >> [hidden email]>
>>> >> >>> >> >>>>> >> >> >> > > wrote:
>>> >> >>> >> >>>>> >> >> >> > > > Hi,
>>> >> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are
>>> >> >>> >> incompatible.
>>> >> >>> >> >>>>> >> >> >> > > > I'm
>>> >> >>> >> >>>>> >> >> afraid
>>> >> >>> >> >>>>> >> >> >> > > > you have to do what you proposed: move the
>>> >> data to
>>> >> >>> a
>>> >> >>> >> >>>>> >> >> >> > > > static
>>> >> >>> >> >>>>> >> field
>>> >> >>> >> >>>>> >> >> and
>>> >> >>> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet()
>>> >> method in
>>> >> >>> >> Scala.
>>> >> >>> >> >>>>> >> >> >> > > > It's
>>> >> >>> >> >>>>> >> >> not
>>> >> >>> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data
>>> and
>>> >> >>> make it
>>> >> >>> >> >>>>> >> >> >> > > > easier
>>> >> >>> >> >>>>> >> for
>>> >> >>> >> >>>>> >> >> it
>>> >> >>> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala
>>> >> versions.
>>> >> >>> >> >>>>> >> >> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > > What do the others think? This will probably
>>> >> occur
>>> >> >>> in
>>> >> >>> >> all
>>> >> >>> >> >>>>> >> >> >> > > > the
>>> >> >>> >> >>>>> >> >> >> examples.
>>> >> >>> >> >>>>> >> >> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > > Cheers,
>>> >> >>> >> >>>>> >> >> >> > > > Aljoscha
>>> >> >>> >> >>>>> >> >> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki
>>> >> Kalavri
>>> >> >>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
>>> >> >>> >> >>>>> >> >> >> > > >> Hey,
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >> I have ported the Connected Components
>>> >> example,
>>> >> >>> but
>>> >> >>> >> I am
>>> >> >>> >> >>>>> >> >> >> > > >> not
>>> >> >>> >> >>>>> >> sure
>>> >> >>> >> >>>>> >> >> >> how
>>> >> >>> >> >>>>> >> >> >> > to
>>> >> >>> >> >>>>> >> >> >> > > >> reuse the example input data from
>>> >> java-examples.
>>> >> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the
>>> >> vertices
>>> >> >>> >> and
>>> >> >>> >> >>>>> >> >> >> > > >> edges
>>> >> >>> >> >>>>> >> data
>>> >> >>> >> >>>>> >> >> >> are
>>> >> >>> >> >>>>> >> >> >> > > >> produced by the methods
>>> >> getDefaultVertexDataSet()
>>> >> >>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>> >> >>> >> >>>>> >> >> >> > > >> an
>>> >> org.apache.flink.api.java.ExecutionEnvironment
>>> >> >>> as
>>> >> >>> >> >>>>> >> parameter.
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >> One way is to provide public static fields
>>> >> (like
>>> >> >>> in
>>> >> >>> >> the
>>> >> >>> >> >>>>> >> >> >> WordCountData
>>> >> >>> >> >>>>> >> >> >> > > >> class), but this introduces a conversion
>>> >> >>> >> >>>>> >> >> >> > > >> from
>>> org.apache.flink.api.java.tuple.Tuple2 to
>>> >> >>> Scala
>>> >> >>> >> >>>>> >> >> >> > > >> tuple and
>>> >> >>> >> >>>>> >> >> from
>>> >> >>> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess
>>> this
>>> >> is
>>> >> >>> an
>>> >> >>> >> >>>>> >> unnecessary
>>> >> >>> >> >>>>> >> >> >> > > complexity
>>> >> >>> >> >>>>> >> >> >> > > >> for an example (?).
>>> >> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the
>>> example
>>> >> >>> data
>>> >> >>> >> in
>>> >> >>> >> >>>>> >> >> >> > > >> the
>>> >> >>> >> >>>>> >> Scala
>>> >> >>> >> >>>>> >> >> >> > > example.
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >> Thanks!
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >> Cheers,
>>> >> >>> >> >>>>> >> >> >> > > >> V.
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha
>>> Krettek <
>>> >> >>> >> >>>>> >> [hidden email]
>>> >> >>> >> >>>>> >> >> >
>>> >> >>> >> >>>>> >> >> >> > > wrote:
>>> >> >>> >> >>>>> >> >> >> > > >>
>>> >> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
>>> >> >>> >> >>>>> >> >> >> > > >>>
>>> >> >>> >> >>>>> >> >>
>>> >> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >>> >> >>>>> >> >> >> > > >>>
>>> >> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount example.
>>> >> It's
>>> >> >>> >> pretty
>>> >> >>> >> >>>>> >> >> >> > > >>> much a
>>> >> >>> >> >>>>> >> >> copy
>>> >> >>> >> >>>>> >> >> >> of
>>> >> >>> >> >>>>> >> >> >> > > >>> the Java example with some fixups for the
>>> >> syntax
>>> >> >>> and
>>> >> >>> >> >>>>> >> >> >> > > >>> lambda
>>> >> >>> >> >>>>> >> >> >> > functions.
>>> >> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the
>>> >> java-examples
>>> >> >>> >> as a
>>> >> >>> >> >>>>> >> >> dependency
>>> >> >>> >> >>>>> >> >> >> for
>>> >> >>> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse
>>> the
>>> >> >>> example
>>> >> >>> >> >>>>> >> >> >> > > >>> input
>>> >> >>> >> >>>>> >> data.
>>> >> >>> >> >>>>> >> >> >> > > >>>
>>> >> >>> >> >>>>> >> >> >> > > >>> When you ported a program you can do a
>>> pull
>>> >> >>> request
>>> >> >>> >> >>>>> >> >> >> > > >>> against
>>> >> >>> >> >>>>> >> my
>>> >> >>> >> >>>>> >> >> repo
>>> >> >>> >> >>>>> >> >> >> > > >>> and I will collect the examples.
>>> >> >>> >> >>>>> >> >> >> > > >>>
>>> >> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
>>> >> >>> >> >>>>> >> >> >> > > >>>
>>> >> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann
>>> >> Gábor <
>>> >> >>> >> >>>>> >> >> >> [hidden email]
>>> >> >>> >> >>>>> >> >> >> > >
>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>>> >> >>> >> >>>>> >> >> >> > > >>> > +1
>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>> >> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>> >> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
>>> >> >>> Balassi <
>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
>>> >> >>> >> >>>>> >> >> >> > > >>> > wrote:
>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> +1
>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
>>> >> >>> Tzoumas <
>>> >> >>> >> >>>>> >> >> >> > > [hidden email]>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> wrote:
>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > +1
>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of
>>> >> going
>>> >> >>> >> through
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > a
>>> >> >>> >> >>>>> >> >> tutorial
>>> >> >>> >> >>>>> >> >> >> so
>>> >> >>> >> >>>>> >> >> >> > > this
>>> >> >>> >> >>>>> >> >> >> > > >>> >> will
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and
>>> >> the
>>> >> >>> new
>>> >> >>> >> API
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > :-)
>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM,
>>> Vasiliki
>>> >> >>> >> Kalavri <
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people
>>> implement
>>> >> the
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > examples!
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for
>>> >> me :)
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian
>>> >> Hueske <
>>> >> >>> >> >>>>> >> >> >> [hidden email]>
>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
>>> >> >>> PageRank.
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples
>>> similar
>>> >> to
>>> >> >>> the
>>> >> >>> >> Java
>>> >> >>> >> >>>>> >> >> examples:
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
>>> >> >>> parameters
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00
>>> Aljoscha
>>> >> >>> >> Krettek <
>>> >> >>> >> >>>>> >> >> >> > > [hidden email]
>>> >> >>> >> >>>>> >> >> >> > > >>> >:
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can
>>> reserve
>>> >> their
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
>>> >> >>> >> >>>>> >> >> >> examples
>>> >> >>> >> >>>>> >> >> >> > > here.
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM,
>>> >> Fabian
>>> >> >>> >> Hueske
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples
>>> >> implemented
>>> >> >>> by
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > different
>>> >> >>> >> >>>>> >> >> >> people
>>> >> >>> >> >>>>> >> >> >> > > >>> proved to
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three
>>> >> examples.
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd
>>> >> port a
>>> >> >>> >> simple
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > first
>>> >> >>> >> >>>>> >> >> one
>>> >> >>> >> >>>>> >> >> >> > such
>>> >> >>> >> >>>>> >> >> >> > > as
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00
>>> >> Aljoscha
>>> >> >>> >> Krettek
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]
>>> >> >>> >> >>>>> >> >> >> > > >>> >> >:
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of
>>> the
>>> >> >>> Scala
>>> >> >>> >> API
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>> >> >>>>> >> >> >>
>>> >> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only
>>> have
>>> >> to
>>> >> >>> >> write
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
>>> >> >>> >> >>>>> >> tests
>>> >> >>> >> >>>>> >> >> and
>>> >> >>> >> >>>>> >> >> >> > > port
>>> >> >>> >> >>>>> >> >> >> > > >>> the
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it
>>> makes
>>> >> >>> sense
>>> >> >>> >> to
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
>>> >> >>> >> >>>>> >> other
>>> >> >>> >> >>>>> >> >> >> > people
>>> >> >>> >> >>>>> >> >> >> > > >>> port
>>> >> >>> >> >>>>> >> >> >> > > >>> >> the
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone
>>> else
>>> >> uses
>>> >> >>> >> it and
>>> >> >>> >> >>>>> >> maybe
>>> >> >>> >> >>>>> >> >> >> > notices
>>> >> >>> >> >>>>> >> >> >> > > some
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > quirks
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>> >> >>> >> >>>>> >> >> >> > > >>>
>>> >> >>> >> >>>>> >> >> >> > >
>>> >> >>> >> >>>>> >> >> >> >
>>> >> >>> >> >>>>> >> >> >>
>>> >> >>> >> >>>>> >> >>
>>> >> >>> >> >>>>> >>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>
>>> >> >>> >>
>>> >> >>>
>>> >>
>>>

Re: Scala API rewrite almost complete

Aljoscha Krettek-2
Also, you can use case classes directly as the element type for CSV input.
So instead of reading the data as tuples and then having a mapper that
converts them to your case classes, you can use:

env.readCsv[Edge](...)
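
A minimal plain-Scala sketch of the difference, with no Flink dependency. The Edge fields, the comma-separated sample lines, and the helpers parseTuple and parseEdge are assumptions made up for illustration; they are not part of the API discussed in this thread. The sketch only contrasts "read tuples, then map to the case class" with "use the case class as the element type right away":

```scala
// Hypothetical record type; the field names are an assumption.
final case class Edge(src: Long, dst: Long)

object CsvSketch {
  // Two-step approach, part 1: parse a line into a plain tuple.
  def parseTuple(line: String): (Long, Long) = {
    val fields = line.split(",").map(_.trim)
    (fields(0).toLong, fields(1).toLong)
  }

  // One-step approach: parse a line straight into the case class.
  def parseEdge(line: String): Edge = {
    val fields = line.split(",").map(_.trim)
    Edge(fields(0).toLong, fields(1).toLong)
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1,2", "2,3", "3,1")

    // Tuples first, followed by a separate mapper to the case class.
    val viaTuples: Seq[Edge] =
      lines.map(parseTuple).map { case (s, d) => Edge(s, d) }

    // Case class used directly as the element type; no extra mapper.
    val direct: Seq[Edge] = lines.map(parseEdge)

    assert(viaTuples == direct)
    println(direct.mkString(", "))
  }
}
```

Both variants yield the same edges; the second one simply drops the intermediate tuple and the extra map step, which is the convenience described above.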

On Fri, Sep 12, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]> wrote:

> I added support for specifying keys by name for CaseClasses. Check out
> the PageRank and TriangleEnumeration examples to see it in action.
>
> @Kostas: I think you could use them for the TPC-H examples.
>
> On Fri, Sep 12, 2014 at 7:23 AM, Aljoscha Krettek <[hidden email]> wrote:
>> Yes, that would allow list comprehensions. It would be possible to
>> have the Collection signature for join (and coGroup), i.e.:
>>
>> apply[R]((T, O) => TraversableOnce[R]): DataSet[R]
>>
>> (T and O are the left and right input type, R is result type)
>>
>> Then you can return collections and still return an option, as in:
>>
>> a.join(b).where(0).equalTo(0) { (l, r) => if (r > ...) Some(l) else None }
>>
>> Because there is an implicit conversion from Options to a Collection.
>> This will always wrap the return value in a List with only one value.
>> I'm not sure we want the overhead here. I'm also not sure whether we
>> want the overhead of always having to use an Option even though the
>> join always returns a value.
>>
>> What do you think?
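
To illustrate the implicit conversion from Option to a collection mentioned above, here is a small plain-Scala sketch. joinWith is a hypothetical stand-in for a join whose function returns a collection per matched pair, and the sample pairs are made up; it is not the Flink signature. Iterable is used as the result type so that the sketch compiles across Scala versions:

```scala
object JoinSketch {
  // Hypothetical stand-in: applies `fun` to every already-matched pair
  // and flattens the collections it returns.
  def joinWith[T, O, R](pairs: Seq[(T, O)])(fun: (T, O) => Iterable[R]): Seq[R] =
    pairs.flatMap { case (l, r) => fun(l, r) }

  def main(args: Array[String]): Unit = {
    val matched = Seq((1, 10), (2, 0), (3, 7))

    // The lambda returns an Option; the standard library's implicit
    // Option-to-Iterable conversion adapts it to the expected result type.
    val kept = joinWith[Int, Int, Int](matched) { (l, r) =>
      if (r > 5) Some(l) else None
    }
    assert(kept == Seq(1, 3))

    // A function that always produces a value still has to wrap it,
    // which is the overhead raised in the message above.
    val all = joinWith[Int, Int, Int](matched) { (l, r) => Some(l + r) }
    assert(all == Seq(11, 2, 10))
  }
}
```

The second call shows the trade-off: with a collection-only signature, even a join that emits exactly one value per pair pays for the wrapper.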
>>
>> On Thu, Sep 11, 2014 at 11:22 PM, Fabian Hueske <[hidden email]> wrote:
>>> Hmmm, tricky question...
>>> How about the Option for Join as this is a tuple-wise operation and the
>>> Collection for Cogroup which is group-wise?
>>> Could we in that case use list comprehensions in Cogroup functions?
>>>
>>> Or is that too much mixing?
>>>
>>> 2014-09-11 23:00 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>>
>>>> I didn't look at the example either.
>>>>
>>>> Adding collections is easy, it's just that we can either have
>>>> Collections or the Option, not both.
>>>>
>>>> For the coding style I followed this:
>>>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide,
>>>> which itself is based on this: http://docs.scala-lang.org/style/. It
>>>> is different from the Java Code Guidelines we have in place, yes.
>>>>
>>>> On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <[hidden email]>
>>>> wrote:
>>>> > I haven't looked at the LineRank example in detail, but if you think that
>>>> > it adds something new to the examples collection, we can certainly port
>>>> it
>>>> > also to Java.
>>>> > I think the Option and Collector return types are sufficient right now
>>>> but
>>>> > if Collections are easy to add, go for it. ;-)
>>>> >
>>>> > Great that the Scala primitives are working! Also thanks for adding
>>>> > genSequence and adapting my examples.
>>>> > Btw. does the code style not apply to Scala files, or do we have a
>>>> > different one there?
>>>> >
>>>> > 2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>>> >
>>>> >> What about the LineRank example? We had that in Scala but never had a
>>>> >> Java Example.
>>>> >>
>>>> >> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <[hidden email]>
>>>> >> wrote:
>>>> >> > Yes, I like that. For the ITCases I always just copied the Java
>>>> ITCase.
>>>> >> >
>>>> >> > The only examples that are missing now are LinearRegression and the
>>>> >> > relational stuff.
>>>> >> >
>>>> >> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <[hidden email]>
>>>> >> wrote:
>>>> >> >> I just removed the old CountEdgeDegrees example.
>>>> >> >> That was a preprocessing step for the TriangleEnumeration, and is now
>>>> >> part
>>>> >> >> of the new TriangleEnumerationOpt example.
>>>> >> >> So I guess, we don't need to port that one. As I said before, I'd
>>>> >> prefer to
>>>> >> >> keep Java and Scala examples in sync.
>>>> >> >>
>>>> >> >> Cheers, Fabian
>>>> >> >>
>>>> >> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>>>> >> >>
>>>> >> >>> I added the PageRank example, thanks again Fabian. :D
>>>> >> >>>
>>>> >> >>> Regarding the other stuff:
>>>> >> >>>  - There is a comment in DataSet.scala about including
>>>> >> >>> org.apache.flink.api.scala._ because of the TypeInformation.
>>>> >> >>>  - I added generateSequence to ExecutionEnvironment.
>>>> >> >>>  - It is possible to use Scala Primitives in Array, I noticed it
>>>> while
>>>> >> >>> writing the tests, you probably had an older version of the code.
>>>> >> >>>  - Yes, using List and other Interfaces is not possible, this is
>>>> also
>>>> >> >>> a restriction in the Java API.
>>>> >> >>>
>>>> >> >>> What do you think about the interface of join and coGroup? Right
>>>> now,
>>>> >> >>> you can either use a lambda that returns an Option or the lambda
>>>> with
>>>> >> >>> the Collector. Originally I also wanted to have a lambda that
>>>> >> >>> returns a Collection, but due to type erasure this has the same type
>>>> >> >>> as the lambda with the Option so I couldn't use it. There is an
>>>> >> >>> implicit conversion from Option to a Collection, so I could change
>>>> it
>>>> >> >>> without breaking the examples we have now. What do you think?
>>>> >> >>>
>>>> >> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> >> >>> WebLogAnalysis, TransitiveClosureNaive,
>>>> TriangleEnumerationNaive/Opt,
>>>> >> >>> PageRank
>>>> >> >>>
>>>> >> >>> These are the examples people called dibs on:
>>>> >> >>>  - BatchGradientDescent (Márton) (Should be a port of
>>>> LinearRegression
>>>> >> >>> Example from Java)
>>>> >> >>>  - ComputeEdgeDegrees (Hermann)
>>>> >> >>>
>>>> >> >>> Those are unclaimed (if I'm not mistaken):
>>>> >> >>>  - The relational Stuff
>>>> >> >>>
>>>> >> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[hidden email]>
>>>> >> wrote:
>>>> >> >>> > +1 for removing RelationalQuery
>>>> >> >>> >
>>>> >> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
>>>> >> [hidden email]>
>>>> >> >>> > wrote:
>>>> >> >>> >
>>>> >> >>> >> By the way, what was called BatchGradientDescent in the Scala
>>>> >> examples
>>>> >> >>> >> should be replaced by a port of the LinearRegression Example from
>>>> >> >>> >> Java. I had them as two separate examples earlier.
>>>> >> >>> >>
>>>> >> >>> >> What about RelationalQuery and TPC-H-Q3. Any thoughts about
>>>> removing
>>>> >> >>> >> RelationalQuery?
>>>> >> >>> >>
>>>> >> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
>>>> >> [hidden email]
>>>> >> >>> >
>>>> >> >>> >> wrote:
>>>> >> >>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
>>>> >> >>> >> >
>>>> >> >>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> >> >>> >> > WebLogAnalysis, TransitiveClosureNaive,
>>>> >> TriangleEnumerationNaive/Opt
>>>> >> >>> >> >
>>>> >> >>> >> > These are the examples people called dibs on:
>>>> >> >>> >> >  - PageRank (Fabian)
>>>> >> >>> >> >  - BatchGradientDescent (Márton)
>>>> >> >>> >> >  - ComputeEdgeDegrees (Hermann)
>>>> >> >>> >> >
>>>> >> >>> >> > Those are unclaimed (if I'm not mistaken):
>>>> >> >>> >> >  - The relational Stuff
>>>> >> >>> >> >  - LinearRegression
>>>> >> >>> >> >
>>>> >> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
>>>> >> >>> [hidden email]>
>>>> >> >>> >> wrote:
>>>> >> >>> >> >> Thanks, I added it. I'll keep a running list of
>>>> ported/unported
>>>> >> >>> >> >> examples in my mails. I'll rename the java example package to
>>>> >> >>> examples
>>>> >> >>> >> >> once the Scala API merge is done.
>>>> >> >>> >> >>
>>>> >> >>> >> >> I think the termination criterion is fine as it is. Just
>>>> because
>>>> >> >>> Scala
>>>> >> >>> >> >> enables functional programming doesn't mean it's always the
>>>> best
>>>> >> >>> >> >> choice. :D
>>>> >> >>> >> >>
>>>> >> >>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> >> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
>>>> >> >>> >> >>
>>>> >> >>> >> >> These are the examples people called dibs on:
>>>> >> >>> >> >>  - TriangleEnumeration and PageRank (Fabian)
>>>> >> >>> >> >>  - BatchGradientDescent (Márton)
>>>> >> >>> >> >>  - ComputeEdgeDegrees (Hermann)
>>>> >> >>> >> >>
>>>> >> >>> >> >> Those are unclaimed (if I'm not mistaken):
>>>> >> >>> >> >>  - The relational Stuff
>>>> >> >>> >> >>  - LinearRegression
>>>> >> >>> >> >>
>>>> >> >>> >> >> Cheers,
>>>> >> >>> >> >> Aljoscha
>>>> >> >>> >> >>
>>>> >> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
>>>> >> [hidden email]
>>>> >> >>> >
>>>> >> >>> >> wrote:
>>>> >> >>> >> >>> Transitive closure here, I also added a termination criterion
>>>> >> in the
>>>> >> >>> >> Java
>>>> >> >>> >> >>> version:
>>>> >> >>> >>
>>>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> Perhaps you can make the termination criterion in Scala more
>>>> >> >>> >> functional?
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> I noticed that the examples package name is example.java but
>>>> >> >>> >> examples.scala
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> Kostas
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
>>>> >> [hidden email]
>>>> >> >>> >
>>>> >> >>> >> wrote:
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on
>>>> your
>>>> >> >>> list).
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> If nobody volunteers for the relational stuff I can take
>>>> those
>>>> >> as
>>>> >> >>> >> well.
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> How about removing the "RelationalQuery" from both Scala and
>>>> >> Java?
>>>> >> >>> It
>>>> >> >>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some
>>>> >> teaching
>>>> >> >>> >> value on
>>>> >> >>> >> >>>> top of TPC-H Q3?
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Kostas
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
>>>> >> >>> [hidden email]
>>>> >> >>> >> >
>>>> >> >>> >> >>>> wrote:
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> Thanks, I added it, along with an ITCase.
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> So far we have ported: WordCount, KMeans,
>>>> ConnectedComponents,
>>>> >> >>> >> >>>>> WebLogAnalysis
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> These are the examples people called dibs on:
>>>> >> >>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
>>>> >> >>> >> >>>>>  - BatchGradientDescent (Márton)
>>>> >> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
>>>> >> >>> >> >>>>>  - TransitiveClosure
>>>> >> >>> >> >>>>>  - The relational Stuff
>>>> >> >>> >> >>>>>  - LinearRegression
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> Cheers,
>>>> >> >>> >> >>>>> Aljoscha
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
>>>> >> >>> [hidden email]>
>>>> >> >>> >> >>>>> wrote:
>>>> >> >>> >> >>>>> > WebLog here:
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> >
>>>> >> >>> >>
>>>> >> >>>
>>>> >>
>>>> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> > Do you need any more done?
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
>>>> >> >>> >> [hidden email]>
>>>> >> >>> >> >>>>> > wrote:
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
>>>> >> >>> >> >>>>> >>
>>>> >> >>> >> >>>>> >> Keep 'em coming, people. :D
>>>> >> >>> >> >>>>> >>
>>>> >> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
>>>> >> >>> [hidden email]
>>>> >> >>> >> >
>>>> >> >>> >> >>>>> >> wrote:
>>>> >> >>> >> >>>>> >> > Alright, will do.
>>>> >> >>> >> >>>>> >> > Thanks!
>>>> >> >>> >> >>>>> >> >
>>>> >> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
>>>> >> >>> >> [hidden email]>:
>>>> >> >>> >> >>>>> >> >
>>>> >> >>> >> >>>>> >> >> Ok people, executive decision. :D
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm
>>>> >> storing
>>>> >> >>> >> the
>>>> >> >>> >> >>>>> >> >> data
>>>> >> >>> >> >>>>> >> >> in multi-dimensional object arrays and then
>>>> converting
>>>> >> it to
>>>> >> >>> >> the
>>>> >> >>> >> >>>>> >> >> required Java or Scala objects.
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it
>>>> >> consistent
>>>> >> >>> >> with the
>>>> >> >>> >> >>>>> >> >> Java
>>>> >> >>> >> >>>>> >> >> API.
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a
>>>> >> >>> keyword,
>>>> >> >>> >> you
>>>> >> >>> >> >>>>> >> >> can
>>>> >> >>> >> >>>>> >> >> just write:
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) =>
>>>> new
>>>> >> >>> >> MyResult(le,
>>>> >> >>> >> >>>>> >> >> re)
>>>> >> >>> >> >>>>> >> }
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
>>>> >> >>> >> [hidden email]>
>>>> >> >>> >> >>>>> >> wrote:
>>>> >> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency
>>>> >> >>> >> >>>>> >> >> > with the Java API. In Java join is done as:
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > where in the current Scala this is:
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>>> >> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method
>>>> >> >>> >> >>>>> >> >> > because "with" is a keyword in Scala. Should we offer
>>>> >> >>> >> >>>>> >> >> > something similar for Scala or go with map() on
>>>> >> >>> >> >>>>> >> >> > Tuple2(left, right)?
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[hidden email]>:
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a
>>>> >> >>> >> >>>>> >> >> >> generic representation of a Tuple.
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples,
>>>> >> >>> >> >>>>> >> >> >> with a generic utility method to convert between the two.
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>>>> >> >>> >> >>>>> >> >> >> <[hidden email]>
>>>> >> >>> >> >>>>> >> >> wrote:
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them, but using the
>>>> >> >>> >> >>>>> >> >> >> > CSVFormat won't work because this is based on a
>>>> >> >>> >> >>>>> >> >> >> > FileInputFormat.
>>>> >> >>> >> >>>>> >> >> >> > So we would need to parse the Strings manually...
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>>>> >> >>> >> >>>>> >> >> >> > <[hidden email]>:
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > > Hi,
>>>> >> >>> >> >>>>> >> >> >> > > on second thought, maybe we should just change all the
>>>> >> >>> >> >>>>> >> >> >> > > example input data to strings and use CSV input formats
>>>> >> >>> >> >>>>> >> >> >> > > in all the examples. What do you think?
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > Aljoscha
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha
>>>> Krettek
>>>> >> <
>>>> >> >>> >> >>>>> >> >> [hidden email]>
>>>> >> >>> >> >>>>> >> >> >> > > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > > Hi,
>>>> >> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible.
>>>> >> >>> >> >>>>> >> >> >> > > > I'm afraid you have to do what you proposed: move the
>>>> >> >>> >> >>>>> >> >> >> > > > data to a static field and convert it in the
>>>> >> >>> >> >>>>> >> >> >> > > > getDefaultEdgeDataSet() method in Scala. It's not nice,
>>>> >> >>> >> >>>>> >> >> >> > > > but copying would duplicate the data and make it easier
>>>> >> >>> >> >>>>> >> >> >> > > > for it to go out of sync in the Java and Scala versions.
>>>> >> >>> >> >>>>> >> >> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > > What do the others think? This will probably occur in
>>>> >> >>> >> >>>>> >> >> >> > > > all the examples.
>>>> >> >>> >> >>>>> >> >> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > > Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > > Aljoscha
>>>> >> >>> >> >>>>> >> >> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki
>>>> >> Kalavri
>>>> >> >>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >> Hey,
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> I have ported the Connected Components example, but I am
>>>> >> >>> >> >>>>> >> >> >> > > >> not sure how to reuse the example input data from
>>>> >> >>> >> >>>>> >> >> >> > > >> java-examples.
>>>> >> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and
>>>> >> >>> >> >>>>> >> >> >> > > >> edges data are produced by the methods
>>>> >> >>> >> >>>>> >> >> >> > > >> getDefaultVertexDataSet() and getDefaultEdgeDataSet(),
>>>> >> >>> >> >>>>> >> >> >> > > >> which take an
>>>> >> >>> >> >>>>> >> >> >> > > >> org.apache.flink.api.java.ExecutionEnvironment as
>>>> >> >>> >> >>>>> >> >> >> > > >> parameter.
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> One way is to provide public static fields (like in the
>>>> >> >>> >> >>>>> >> >> >> > > >> WordCountData class), but this introduces a conversion
>>>> >> >>> >> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala
>>>> >> >>> >> >>>>> >> >> >> > > >> tuple and from java.lang.Long to scala.Long and I guess
>>>> >> >>> >> >>>>> >> >> >> > > >> this is an unnecessary complexity for an example (?).
>>>> >> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the example data in
>>>> >> >>> >> >>>>> >> >> >> > > >> the Scala example.
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> Thanks!
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > >> V.
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha
>>>> Krettek <
>>>> >> >>> >> >>>>> >> [hidden email]
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
>>>> >> >>> >> >>>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty
>>>> >> >>> >> >>>>> >> >> >> > > >>> much a copy of the Java example with some fixups for the
>>>> >> >>> >> >>>>> >> >> >> > > >>> syntax and lambda functions.
>>>> >> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a
>>>> >> >>> >> >>>>> >> >> >> > > >>> dependency for the scala-examples. I did this to reuse
>>>> >> >>> >> >>>>> >> >> >> > > >>> the example input data.
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> When you have ported a program you can do a pull request
>>>> >> >>> >> >>>>> >> >> >> > > >>> against my repo and I will collect the examples.
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann
>>>> >> Gábor <
>>>> >> >>> >> >>>>> >> >> >> [hidden email]
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> > +1
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
>>>> >> >>> Balassi <
>>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
>>>> >> >>> >> >>>>> >> >> >> > > >>> > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> +1
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
>>>> >> >>> Tzoumas <
>>>> >> >>> >> >>>>> >> >> >> > > [hidden email]>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > +1
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of
>>>> >> going
>>>> >> >>> >> through
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > a
>>>> >> >>> >> >>>>> >> >> tutorial
>>>> >> >>> >> >>>>> >> >> >> so
>>>> >> >>> >> >>>>> >> >> >> > > this
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> will
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and
>>>> >> the
>>>> >> >>> new
>>>> >> >>> >> API
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > :-)
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM,
>>>> Vasiliki
>>>> >> >>> >> Kalavri <
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people
>>>> implement
>>>> >> the
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > examples!
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for
>>>> >> me :)
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian
>>>> >> Hueske <
>>>> >> >>> >> >>>>> >> >> >> [hidden email]>
>>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and
>>>> >> >>> PageRank.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples
>>>> similar
>>>> >> to
>>>> >> >>> the
>>>> >> >>> >> Java
>>>> >> >>> >> >>>>> >> >> examples:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without
>>>> >> >>> parameters
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00
>>>> Aljoscha
>>>> >> >>> >> Krettek <
>>>> >> >>> >> >>>>> >> >> >> > > [hidden email]
>>>> >> >>> >> >>>>> >> >> >> > > >>> >:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can
>>>> reserve
>>>> >> their
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
>>>> >> >>> >> >>>>> >> >> >> examples
>>>> >> >>> >> >>>>> >> >> >> > > here.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM,
>>>> >> Fabian
>>>> >> >>> >> Hueske
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
>>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples
>>>> >> implemented
>>>> >> >>> by
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > different
>>>> >> >>> >> >>>>> >> >> >> people
>>>> >> >>> >> >>>>> >> >> >> > > >>> proved to
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three
>>>> >> examples.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd
>>>> >> port a
>>>> >> >>> >> simple
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > first
>>>> >> >>> >> >>>>> >> >> one
>>>> >> >>> >> >>>>> >> >> >> > such
>>>> >> >>> >> >>>>> >> >> >> > > as
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00
>>>> >> Aljoscha
>>>> >> >>> >> Krettek
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
>>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of
>>>> the
>>>> >> >>> Scala
>>>> >> >>> >> API
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only
>>>> have
>>>> >> to
>>>> >> >>> >> write
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
>>>> >> >>> >> >>>>> >> tests
>>>> >> >>> >> >>>>> >> >> and
>>>> >> >>> >> >>>>> >> >> >> > > port
>>>> >> >>> >> >>>>> >> >> >> > > >>> the
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it
>>>> makes
>>>> >> >>> sense
>>>> >> >>> >> to
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
>>>> >> >>> >> >>>>> >> other
>>>> >> >>> >> >>>>> >> >> >> > people
>>>> >> >>> >> >>>>> >> >> >> > > >>> port
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> the
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone
>>>> else
>>>> >> uses
>>>> >> >>> >> it and
>>>> >> >>> >> >>>>> >> maybe
>>>> >> >>> >> >>>>> >> >> >> > notices
>>>> >> >>> >> >>>>> >> >> >> > > some
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > quirks
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >>
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>
>>>> >> >>> >>
>>>> >> >>>
>>>> >>
>>>>

Re: Scala API rewrite almost complete

Fabian Hueske
Sweet! I'm lovin' this :-)

2014-09-12 11:46 GMT+02:00 Aljoscha Krettek <[hidden email]>:

> Also, you can use CaseClasses directly as the type for CSV input. So
> instead of reading it as tuples and then having a mapper that maps to
> your case classes you can use:
>
> env.readCsv[Edge](...)
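
To make the difference concrete, here is a minimal plain-Scala sketch that does not call the Flink API at all. The fields of `Edge`, the helper names `parseTuple` and `parseEdge`, and the sample lines are assumptions; the thread only names the `Edge` type. It contrasts the two-step route (tuple first, then a mapper) with building the case class directly from the CSV fields.

```scala
// Hypothetical record type; the thread mentions Edge but not its fields.
final case class Edge(src: Long, dst: Long)

object CsvToCaseClass {
  // Two-step route: parse to a tuple first, then map to the case class.
  def parseTuple(line: String): (Long, Long) = {
    val fields = line.split(",").map(_.trim)
    (fields(0).toLong, fields(1).toLong)
  }

  // One-step route: build the case class straight from the fields.
  def parseEdge(line: String): Edge = {
    val fields = line.split(",").map(_.trim)
    Edge(fields(0).toLong, fields(1).toLong)
  }

  // Made-up sample input, one edge per line.
  val lines: Seq[String] = Seq("1,2", "2,3", "3,1")

  val viaTuples: Seq[Edge] = lines.map(parseTuple).map { case (s, d) => Edge(s, d) }
  val direct: Seq[Edge] = lines.map(parseEdge)
}
```

Both routes give the same edges; the direct one simply drops the extra mapper.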
>
> On Fri, Sep 12, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]>
> wrote:
> > I added support for specifying keys by name for CaseClasses. Check out
> > the PageRank and TriangleEnumeration examples to see it in action.
> >
> > @Kostas: I think you could use them for the TPC-H examples.
> >
> > On Fri, Sep 12, 2014 at 7:23 AM, Aljoscha Krettek <[hidden email]>
> wrote:
> >> Yes, that would allow list comprehensions. It would be possible to
> >> have the Collection signature for join (and coGroup), i.e.:
> >>
> >> apply[R]((T, O) => TraversableOnce[R]): DataSet[R]
> >>
> >> (T and O are the left and right input type, R is result type)
> >>
> >> Then you can return collections and still return an option, as in:
> >>
> >> a.join(b).where(0).equalTo(0) { (l, r) => if (r > ...) Some(l) else None }
> >>
> >> Because there is an implicit conversion from Options to a Collection.
> >> This will always wrap the return value in a List with only one value.
> >> I'm not sure we want the overhead here. I'm also not sure whether we
> >> want the overhead of always having to use an Option even though the
> >> join always returns a value.
> >>
> >> What do you think?
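
The trade-off above can be tried out with a small local sketch. `joinFlat` is a hypothetical stand-in that pairs up in-memory sequences by key; it is not the Flink join. It uses `Iterable` in place of `TraversableOnce`, an assumption made so the sketch compiles on recent Scala versions. The point it illustrates: one collection-shaped signature accepts both a function returning a collection and one returning an `Option`, because the standard library provides an implicit view from `Option` to `Iterable`.

```scala
object JoinReturnShapes {
  // Hypothetical local join: pairs elements with equal keys and lets the
  // user function emit zero or more results per matching pair.
  def joinFlat[T, O, R](left: Seq[(Int, T)], right: Seq[(Int, O)])(
      fn: (T, O) => Iterable[R]): Seq[R] =
    for {
      l <- left
      r <- right
      if l._1 == r._1
      out <- fn(l._2, r._2)
    } yield out

  // Made-up sample inputs keyed by an Int.
  val left: Seq[(Int, String)] = Seq(1 -> "a", 2 -> "b", 3 -> "c")
  val right: Seq[(Int, Int)] = Seq(1 -> 10, 2 -> 0, 3 -> 30)

  // Option-shaped result: the Option is converted to an Iterable by the
  // standard library's implicit view, so it fits the collection signature.
  val optionResult: Seq[String] =
    joinFlat[String, Int, String](left, right) { (name, score) =>
      val kept: Option[String] = if (score > 5) Some(name) else None
      kept
    }

  // Collection-shaped result: several outputs per matching pair.
  val collectionResult: Seq[String] =
    joinFlat[String, Int, String](left, right) { (name, _) =>
      List(name, name.toUpperCase)
    }
}
```

The cost mentioned in the mail is visible here: every `Option` result passes through a small wrapper collection before it is flattened, even when exactly one value is returned.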
> >>
> >> On Thu, Sep 11, 2014 at 11:22 PM, Fabian Hueske <[hidden email]>
> wrote:
> >>> Hmmm, tricky question...
> >>> How about the Option for Join as this is a tuple-wise operation and the
> >>> Collection for Cogroup which is group-wise?
> >>> Could we in that case use list comprehensions in Cogroup functions?
> >>>
> >>> Or is that too much mixing?
> >>>
> >>> 2014-09-11 23:00 GMT+02:00 Aljoscha Krettek <[hidden email]>:
> >>>
> >>>> I didn't look at the example either.
> >>>>
> >>>> Adding collections is easy, it's just that we can either have
> >>>> Collections or the Option, not both.
> >>>>
> >>>> For the coding style I followed this:
> >>>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide,
> >>>> which itself is based on this: http://docs.scala-lang.org/style/. It
> >>>> is different from the Java Code Guidelines we have in place, yes.
> >>>>
> >>>> On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <[hidden email]>
> >>>> wrote:
> >>>> > I haven't looked at the LineRank example in detail, but if you think that
> >>>> > it adds something new to the examples collection, we can certainly port it
> >>>> > also to Java.
> >>>> > I think the Option and Collector return types are sufficient right now but
> >>>> > if Collections are easy to add, go for it. ;-)
> >>>> >
> >>>> > Great that the Scala primitives are working! Also thanks for adding
> >>>> > genSequence and adapting my examples.
> >>>> > Btw. does the codestyle not apply for Scala files or do we have a different
> >>>> > one there?
> >>>> >
> >>>> > 2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <[hidden email]>:
> >>>> >
> >>>> >> What about the LineRank example? We had that in Scala but never had a
> >>>> >> Java Example.
> >>>> >>
> >>>> >> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <
> [hidden email]>
> >>>> >> wrote:
> >>>> >> > Yes, I like that. For the ITCases I always just copied the Java ITCase.
> >>>> >> >
> >>>> >> > The only examples that are missing now are LinearRegression and the
> >>>> >> > relational stuff.
> >>>> >> >
> >>>> >> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <
> [hidden email]>
> >>>> >> wrote:
> >>>> >> >> I just removed the old CountEdgeDegrees example.
> >>>> >> >> That was a preprocessing step for the TriangleEnumeration, and is now part
> >>>> >> >> of the new TriangleEnumerationOpt example.
> >>>> >> >> So I guess, we don't need to port that one. As I said before, I'd prefer to
> >>>> >> >> keep Java and Scala examples in sync.
> >>>> >> >>
> >>>> >> >> Cheers, Fabian
> >>>> >> >>
> >>>> >> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <
> [hidden email]>:
> >>>> >> >>
> >>>> >> >>> I added the PageRank example, thanks again Fabian. :D
> >>>> >> >>>
> >>>> >> >>> Regarding the other stuff:
> >>>> >> >>>  - There is a comment in DataSet.scala about including
> >>>> >> >>> org.apache.flink.api.scala._ because of the TypeInformation.
> >>>> >> >>>  - I added generateSequence to ExecutionEnvironment.
> >>>> >> >>>  - It is possible to use Scala Primitives in Array, I noticed it while
> >>>> >> >>> writing the tests, you probably had an older version of the code.
> >>>> >> >>>  - Yes, using List and other Interfaces is not possible, this is also
> >>>> >> >>> a restriction in the Java API.
> >>>> >> >>>
> >>>> >> >>> What do you think about the interface of join and coGroup? Right now,
> >>>> >> >>> you can either use a lambda that returns an Option or the lambda with
> >>>> >> >>> the Collector. Originally I wanted to also have a lambda that
> >>>> >> >>> returns a Collection, but due to type erasure this has the same type
> >>>> >> >>> as the lambda with the Option so I couldn't use it. There is an
> >>>> >> >>> implicit conversion from Option to a Collection, so I could change it
> >>>> >> >>> without breaking the examples we have now. What do you think?
> >>>> >> >>>
> >>>> >> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>>> >> >>> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
> >>>> >> >>> PageRank
> >>>> >> >>>
> >>>> >> >>> These are the examples people called dibs on:
> >>>> >> >>>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
> >>>> >> >>> Example from Java)
> >>>> >> >>>  - ComputeEdgeDegrees (Hermann)
> >>>> >> >>>
> >>>> >> >>> Those are unclaimed (if I'm not mistaken):
> >>>> >> >>>  - The relational Stuff
> >>>> >> >>>
> >>>> >> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <
> [hidden email]>
> >>>> >> wrote:
> >>>> >> >>> > +1 for removing RelationQuery
> >>>> >> >>> >
> >>>> >> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
> >>>> >> [hidden email]>
> >>>> >> >>> > wrote:
> >>>> >> >>> >
> >>>> >> >>> >> By the way, what was called BatchGradientDescent in the Scala examples
> >>>> >> >>> >> should be replaced by a port of the LinearRegression Example from
> >>>> >> >>> >> Java. I had them as two separate examples earlier.
> >>>> >> >>> >>
> >>>> >> >>> >> What about RelationalQuery and TPC-H-Q3? Any thoughts about removing
> >>>> >> >>> >> RelationalQuery?
> >>>> >> >>> >>
> >>>> >> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
> >>>> >> [hidden email]
> >>>> >> >>> >
> >>>> >> >>> >> wrote:
> >>>> >> >>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
> >>>> >> >>> >> >
> >>>> >> >>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>>> >> >>> >> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
> >>>> >> >>> >> >
> >>>> >> >>> >> > These are the examples people called dibs on:
> >>>> >> >>> >> >  - PageRank (Fabian)
> >>>> >> >>> >> >  - BatchGradientDescent (Márton)
> >>>> >> >>> >> >  - ComputeEdgeDegrees (Hermann)
> >>>> >> >>> >> >
> >>>> >> >>> >> > Those are unclaimed (if I'm not mistaken):
> >>>> >> >>> >> >  - The relational Stuff
> >>>> >> >>> >> >  - LinearRegression
> >>>> >> >>> >> >
> >>>> >> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
> >>>> >> >>> [hidden email]>
> >>>> >> >>> >> wrote:
> >>>> >> >>> >> >> Thanks, I added it. I'll keep a running list of ported/unported
> >>>> >> >>> >> >> examples in my mails. I'll rename the java example package to examples
> >>>> >> >>> >> >> once the Scala API merge is done.
> >>>> >> >>> >> >>
> >>>> >> >>> >> >> I think the termination criterion is fine as it is. Just because Scala
> >>>> >> >>> >> >> enables functional programming doesn't mean it's always the best
> >>>> >> >>> >> >> choice. :D
> >>>> >> >>> >> >>
> >>>> >> >>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>>> >> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
> >>>> >> >>> >> >>
> >>>> >> >>> >> >> These are the examples people called dibs on:
> >>>> >> >>> >> >>  - TriangleEnumeration and PageRank (Fabian)
> >>>> >> >>> >> >>  - BatchGradientDescent (Márton)
> >>>> >> >>> >> >>  - ComputeEdgeDegrees (Hermann)
> >>>> >> >>> >> >>
> >>>> >> >>> >> >> Those are unclaimed (if I'm not mistaken):
> >>>> >> >>> >> >>  - The relational Stuff
> >>>> >> >>> >> >>  - LinearRegression
> >>>> >> >>> >> >>
> >>>> >> >>> >> >> Cheers,
> >>>> >> >>> >> >> Aljoscha
> >>>> >> >>> >> >>
> >>>> >> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
> >>>> >> [hidden email]
> >>>> >> >>> >
> >>>> >> >>> >> wrote:
> >>>> >> >>> >> >>> Transitive closure here, I also added a termination criterion in the
> >>>> >> >>> >> >>> Java version:
> >>>> >> >>> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
> >>>> >> >>> >> >>>
> >>>> >> >>> >> >>> Perhaps you can make the termination criterion in Scala more functional?
> >>>> >> >>> >> >>>
> >>>> >> >>> >> >>> I noticed that the examples package name is example.java but
> >>>> >> >>> >> >>> examples.scala
> >>>> >> >>> >> >>>
> >>>> >> >>> >> >>> Kostas
> >>>> >> >>> >> >>>
> >>>> >> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
> >>>> >> [hidden email]
> >>>> >> >>> >
> >>>> >> >>> >> wrote:
> >>>> >> >>> >> >>>>
> >>>> >> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
> >>>> >> >>> >> >>>>
> >>>> >> >>> >> >>>> If nobody volunteers for the relational stuff I can take those as well.
> >>>> >> >>> >> >>>>
> >>>> >> >>> >> >>>> How about removing the "RelationalQuery" from both Scala and Java? It
> >>>> >> >>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching
> >>>> >> >>> >> >>>> value on top of TPC-H Q3?
> >>>> >> >>> >> >>>>
> >>>> >> >>> >> >>>> Kostas
> >>>> >> >>> >> >>>>
> >>>> >> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
> >>>> >> >>> [hidden email]
> >>>> >> >>> >> >
> >>>> >> >>> >> >>>> wrote:
> >>>> >> >>> >> >>>>>
> >>>> >> >>> >> >>>>> Thanks, I added it, along with an ITCase.
> >>>> >> >>> >> >>>>>
> >>>> >> >>> >> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> >>>> >> >>> >> >>>>> WebLogAnalysis
> >>>> >> >>> >> >>>>>
> >>>> >> >>> >> >>>>> These are the examples people called dibs on:
> >>>> >> >>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
> >>>> >> >>> >> >>>>>  - BatchGradientDescent (Márton)
> >>>> >> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
> >>>> >> >>> >> >>>>>
> >>>> >> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
> >>>> >> >>> >> >>>>>  - TransitiveClosure
> >>>> >> >>> >> >>>>>  - The relational Stuff
> >>>> >> >>> >> >>>>>  - LinearRegression
> >>>> >> >>> >> >>>>>
> >>>> >> >>> >> >>>>> Cheers,
> >>>> >> >>> >> >>>>> Aljoscha
> >>>> >> >>> >> >>>>>
> >>>> >> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
> >>>> >> >>> [hidden email]>
> >>>> >> >>> >> >>>>> wrote:
> >>>> >> >>> >> >>>>> > WebLog here:
> >>>> >> >>> >> >>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >>>> >> >>> >> >>>>> >
> >>>> >> >>> >> >>>>> > Do you need any more done?
> >>>> >> >>> >> >>>>> >
> >>>> >> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
> >>>> >> >>> >> [hidden email]>
> >>>> >> >>> >> >>>>> > wrote:
> >>>> >> >>> >> >>>>> >
> >>>> >> >>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
> >>>> >> >>> >> >>>>> >>
> >>>> >> >>> >> >>>>> >> Keep 'em coming, people. :D
> >>>> >> >>> >> >>>>> >>
> >>>> >> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
> >>>> >> >>> [hidden email]
> >>>> >> >>> >> >
> >>>> >> >>> >> >>>>> >> wrote:
> >>>> >> >>> >> >>>>> >> > Alright, will do.
> >>>> >> >>> >> >>>>> >> > Thanks!
> >>>> >> >>> >> >>>>> >> >
> >>>> >> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
> >>>> >> >>> >> [hidden email]>:
> >>>> >> >>> >> >>>>> >> >
> >>>> >> >>> >> >>>>> >> >> Ok people, executive decision. :D
> >>>> >> >>> >> >>>>> >> >>
> >>>> >> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm
> >>>> >> >>> >> >>>>> >> >> storing the data in multi-dimensional object arrays and
> >>>> >> >>> >> >>>>> >> >> then converting it to the required Java or Scala objects.
> >>>> >> >>> >> >>>>> >> >>
> >>>> >> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent
> >>>> >> >>> >> >>>>> >> >> with the Java API.
> >>>> >> >>> >> >>>>> >> >>
> >>>> >> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword,
> >>>> >> >>> >> >>>>> >> >> you can just write:
> >>>> >> >>> >> >>>>> >> >>
> >>>> >> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
> >>>> >> >>> >> >>>>> >> >>
> >>>> >> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
> >>>> >> >>> >> [hidden email]>
> >>>> >> >>> >> >>>>> >> wrote:
> >>>> >> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency
> >>>> >> >>> >> >>>>> >> >> > with the Java API. In Java join is done as:
> >>>> >> >>> >> >>>>> >> >> >
> >>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >>>> >> >>> >> >>>>> >> >> >
> >>>> >> >>> >> >>>>> >> >> > where in the current Scala this is:
> >>>> >> >>> >> >>>>> >> >> >
> >>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
> >>>> >> >>> >> >>>>> >> >> >
> >>>> >> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >>>> >> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method
> >>>> >> >>> >> >>>>> >> >> > because "with" is a keyword in Scala. Should we offer
> >>>> >> >>> >> >>>>> >> >> > something similar for Scala or go with map() on
> >>>> >> >>> >> >>>>> >> >> > Tuple2(left, right)?
> >>>> >> >>> >> >>>>> >> >> >
> >>>> >> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <
> >>>> >> [hidden email]
> >>>> >> >>> >:
> >>>> >> >>> >> >>>>> >> >> >
> >>>> >> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as
> well.
> >>>> >> That
> >>>> >> >>> is a
> >>>> >> >>> >> >>>>> >> >> >> generic
> >>>> >> >>> >> >>>>> >> >> >> representation of a Tuple.
> >>>> >> >>> >> >>>>> >> >> >>
> >>>> >> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java
> or
> >>>> Scala
> >>>> >> >>> Tuples,
> >>>> >> >>> >> >>>>> >> >> >> with a
> >>>> >> >>> >> >>>>> >> >> generic
> >>>> >> >>> >> >>>>> >> >> >> utility method to convert between the two.
> >>>> >> >>> >> >>>>> >> >> >>
> >>>> >> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian
> Hueske
> >>>> >> >>> >> >>>>> >> >> >> <[hidden email]>
> >>>> >> >>> >> >>>>> >> >> wrote:
> >>>> >> >>> >> >>>>> >> >> >>
> >>>> >> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
> >>>> >> >>> >> >>>>> >> >> >> >
> >>>> >> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them,  but
> >>>> using
> >>>> >> the
> >>>> >> >>> >> >>>>> >> >> >> > CSVFormat
> >>>> >> >>> >> >>>>> >> won't
> >>>> >> >>> >> >>>>> >> >> >> work
> >>>> >> >>> >> >>>>> >> >> >> > because this is based on a FileInputFormat.
> >>>> >> >>> >> >>>>> >> >> >> > So we would need to parse the Strings
> >>>> manually...
> >>>> >> >>> >> >>>>> >> >> >> >
> >>>> >> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
> >>>> >> >>> >> >>>>> >> >> >> > <[hidden email]>:
> >>>> >> >>> >> >>>>> >> >> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > Hi,
> >>>> >> >>> >> >>>>> >> >> >> > > on second thought. Maybe we should just
> change
> >>>> >> all
> >>>> >> >>> the
> >>>> >> >>> >> >>>>> >> >> >> > > example
> >>>> >> >>> >> >>>>> >> input
> >>>> >> >>> >> >>>>> >> >> >> > > data to strings and use CSV input
> formats in
> >>>> all
> >>>> >> the
> >>>> >> >>> >> >>>>> >> >> >> > > examples.
> >>>> >> >>> >> >>>>> >> What
> >>>> >> >>> >> >>>>> >> >> do
> >>>> >> >>> >> >>>>> >> >> >> > > you think?
> >>>> >> >>> >> >>>>> >> >> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > Cheers,
> >>>> >> >>> >> >>>>> >> >> >> > > Aljoscha
> >>>> >> >>> >> >>>>> >> >> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha
> >>>> Krettek
> >>>> >> <
> >>>> >> >>> >> >>>>> >> >> [hidden email]>
> >>>> >> >>> >> >>>>> >> >> >> > > wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > > Hi,
> >>>> >> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data
> types are
> >>>> >> >>> >> incompatible.
> >>>> >> >>> >> >>>>> >> >> >> > > > I'm
> >>>> >> >>> >> >>>>> >> >> afraid
> >>>> >> >>> >> >>>>> >> >> >> > > > you have to to what you proposed: move
> the
> >>>> >> data to
> >>>> >> >>> a
> >>>> >> >>> >> >>>>> >> >> >> > > > static
> >>>> >> >>> >> >>>>> >> field
> >>>> >> >>> >> >>>>> >> >> and
> >>>> >> >>> >> >>>>> >> >> >> > > > convert it in the
> getDefaultEdgeDataSet()
> >>>> >> method in
> >>>> >> >>> >> Scala.
> >>>> >> >>> >> >>>>> >> >> >> > > > It's
> >>>> >> >>> >> >>>>> >> >> not
> >>>> >> >>> >> >>>>> >> >> >> > > > nice, but copying would duplicate the
> data
> >>>> and
> >>>> >> >>> make it
> >>>> >> >>> >> >>>>> >> >> >> > > > easier
> >>>> >> >>> >> >>>>> >> for
> >>>> >> >>> >> >>>>> >> >> it
> >>>> >> >>> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala
> >>>> >> versions.
> >>>> >> >>> >> >>>>> >> >> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > > What do the others think? This will
> probably
> >>>> >> occur
> >>>> >> >>> in
> >>>> >> >>> >> all
> >>>> >> >>> >> >>>>> >> >> >> > > > the
> >>>> >> >>> >> >>>>> >> >> >> examples.
> >>>> >> >>> >> >>>>> >> >> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > > Cheers,
> >>>> >> >>> >> >>>>> >> >> >> > > > Aljoscha
> >>>> >> >>> >> >>>>> >> >> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM,
> Vasiliki
> >>>> >> Kalavri
> >>>> >> >>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >> Hey,
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >> I have ported the Connected Components
> >>>> >> example,
> >>>> >> >>> but
> >>>> >> >>> >> I am
> >>>> >> >>> >> >>>>> >> >> >> > > >> not
> >>>> >> >>> >> >>>>> >> sure
> >>>> >> >>> >> >>>>> >> >> >> how
> >>>> >> >>> >> >>>>> >> >> >> > to
> >>>> >> >>> >> >>>>> >> >> >> > > >> reuse the example input data from
> >>>> >> java-examples.
> >>>> >> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class,
> the
> >>>> >> vertices
> >>>> >> >>> >> and
> >>>> >> >>> >> >>>>> >> >> >> > > >> edges
> >>>> >> >>> >> >>>>> >> data
> >>>> >> >>> >> >>>>> >> >> >> are
> >>>> >> >>> >> >>>>> >> >> >> > > >> produced by the methods
> >>>> >> getDefaultVertexDataSet()
> >>>> >> >>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which
> take
> >>>> >> >>> >> >>>>> >> >> >> > > >> an
> >>>> >> org.apache.flink.api.java.ExecutionEnvironment
> >>>> >> >>> as
> >>>> >> >>> >> >>>>> >> parameter.
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >> One way is to provide public static
> fields
> >>>> >> (like
> >>>> >> >>> in
> >>>> >> >>> >> the
> >>>> >> >>> >> >>>>> >> >> >> WordCountData
> >>>> >> >>> >> >>>>> >> >> >> > > >> class), but this introduces a
> conversion
> >>>> >> >>> >> >>>>> >> >> >> > > >> from
> >>>> org.apache.flink.api.java.tuple.Tuple2 to
> >>>> >> >>> Scala
> >>>> >> >>> >> >>>>> >> >> >> > > >> tuple and
> >>>> >> >>> >> >>>>> >> >> from
> >>>> >> >>> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I
> guess
> >>>> this
> >>>> >> is
> >>>> >> >>> an
> >>>> >> >>> >> >>>>> >> unnecessary
> >>>> >> >>> >> >>>>> >> >> >> > > complexity
> >>>> >> >>> >> >>>>> >> >> >> > > >> for an example (?).
> >>>> >> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the
> >>>> example
> >>>> >> >>> data
> >>>> >> >>> >> in
> >>>> >> >>> >> >>>>> >> >> >> > > >> the
> >>>> >> >>> >> >>>>> >> Scala
> >>>> >> >>> >> >>>>> >> >> >> > > example.
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >> Thanks!
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >> Cheers,
> >>>> >> >>> >> >>>>> >> >> >> > > >> V.
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha
> >>>> Krettek <
> >>>> >> >>> >> >>>>> >> [hidden email]
> >>>> >> >>> >> >>>>> >> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
> >>>> >> >>> >> >>>>> >> >> >> > > >>>
> >>>> >> >>> >> >>>>> >> >>
> >>>> >> >>> >>
> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >>>> >> >>> >> >>>>> >> >> >> > > >>>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount
> example.
> >>>> >> It's
> >>>> >> >>> >> pretty
> >>>> >> >>> >> >>>>> >> >> >> > > >>> much a
> >>>> >> >>> >> >>>>> >> >> copy
> >>>> >> >>> >> >>>>> >> >> >> of
> >>>> >> >>> >> >>>>> >> >> >> > > >>> the Java example with some fixups
> for the
> >>>> >> syntax
> >>>> >> >>> and
> >>>> >> >>> >> >>>>> >> >> >> > > >>> lambda
> >>>> >> >>> >> >>>>> >> >> >> > functions.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the
> >>>> >> java-examples
> >>>> >> >>> >> as a
> >>>> >> >>> >> >>>>> >> >> dependency
> >>>> >> >>> >> >>>>> >> >> >> for
> >>>> >> >>> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to
> reuse
> >>>> the
> >>>> >> >>> example
> >>>> >> >>> >> >>>>> >> >> >> > > >>> input
> >>>> >> >>> >> >>>>> >> data.
> >>>> >> >>> >> >>>>> >> >> >> > > >>>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> When you ported a program you can do
> a
> >>>> pull
> >>>> >> >>> request
> >>>> >> >>> >> >>>>> >> >> >> > > >>> against
> >>>> >> >>> >> >>>>> >> my
> >>>> >> >>> >> >>>>> >> >> repo
> >>>> >> >>> >> >>>>> >> >> >> > > >>> and I will collect the examples.
> >>>> >> >>> >> >>>>> >> >> >> > > >>>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
> >>>> >> >>> >> >>>>> >> >> >> > > >>>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM,
> Hermann
> >>>> >> Gábor <
> >>>> >> >>> >> >>>>> >> >> >> [hidden email]
> >>>> >> >>> >> >>>>> >> >> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> > +1
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM,
> Márton
> >>>> >> >>> Balassi <
> >>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> > wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> +1
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM,
> Kostas
> >>>> >> >>> Tzoumas <
> >>>> >> >>> >> >>>>> >> >> >> > > [hidden email]>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > +1
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala
> consists of
> >>>> >> going
> >>>> >> >>> >> through
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > a
> >>>> >> >>> >> >>>>> >> >> tutorial
> >>>> >> >>> >> >>>>> >> >> >> so
> >>>> >> >>> >> >>>>> >> >> >> > > this
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> will
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for
> me and
> >>>> >> the
> >>>> >> >>> new
> >>>> >> >>> >> API
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > :-)
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM,
> >>>> Vasiliki
> >>>> >> >>> >> Kalavri <
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people
> >>>> implement
> >>>> >> the
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > examples!
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and
> Kmeans for
> >>>> >> me :)
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03,
> Fabian
> >>>> >> Hueske <
> >>>> >> >>> >> >>>>> >> >> >> [hidden email]>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for
> TriangleEnumeration and
> >>>> >> >>> PageRank.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples
> >>>> similar
> >>>> >> to
> >>>> >> >>> the
> >>>> >> >>> >> Java
> >>>> >> >>> >> >>>>> >> >> examples:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box
> without
> >>>> >> >>> parameters
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external
> data
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code
> structure
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00
> >>>> Aljoscha
> >>>> >> >>> >> Krettek <
> >>>> >> >>> >> >>>>> >> >> >> > > [hidden email]
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can
> >>>> reserve
> >>>> >> their
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
> >>>> >> >>> >> >>>>> >> >> >> examples
> >>>> >> >>> >> >>>>> >> >> >> > > here.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at
> 8:55 PM,
> >>>> >> Fabian
> >>>> >> >>> >> Hueske
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
> >>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples
> >>>> >> implemented
> >>>> >> >>> by
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > different
> >>>> >> >>> >> >>>>> >> >> >> people
> >>>> >> >>> >> >>>>> >> >> >> > > >>> proved to
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or
> three
> >>>> >> examples.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if
> you'd
> >>>> >> port a
> >>>> >> >>> >> simple
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > first
> >>>> >> >>> >> >>>>> >> >> one
> >>>> >> >>> >> >>>>> >> >> >> > such
> >>>> >> >>> >> >>>>> >> >> >> > > as
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47
> GMT+02:00
> >>>> >> Aljoscha
> >>>> >> >>> >> Krettek
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
> >>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working
> rewrite of
> >>>> the
> >>>> >> >>> Scala
> >>>> >> >>> >> API
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> >>>> >> >>> >> >>>>> >> >> >>
> >>>> >> >>> >>
> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll
> only
> >>>> have
> >>>> >> to
> >>>> >> >>> >> write
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
> >>>> >> >>> >> >>>>> >> tests
> >>>> >> >>> >> >>>>> >> >> and
> >>>> >> >>> >> >>>>> >> >> >> > > port
> >>>> >> >>> >> >>>>> >> >> >> > > >>> the
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think
> it
> >>>> makes
> >>>> >> >>> sense
> >>>> >> >>> >> to
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
> >>>> >> >>> >> >>>>> >> other
> >>>> >> >>> >> >>>>> >> >> >> > people
> >>>> >> >>> >> >>>>> >> >> >> > > >>> port
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> the
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that
> someone
> >>>> else
> >>>> >> uses
> >>>> >> >>> >> it and
> >>>> >> >>> >> >>>>> >> maybe
> >>>> >> >>> >> >>>>> >> >> >> > notices
> >>>> >> >>> >> >>>>> >> >> >> > > some
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > quirks
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> >>>> >> >>> >> >>>>> >> >> >> > > >>>
> >>>> >> >>> >> >>>>> >> >> >> > >
> >>>> >> >>> >> >>>>> >> >> >> >
> >>>> >> >>> >> >>>>> >> >> >>
> >>>> >> >>> >> >>>>> >> >>
> >>>> >> >>> >> >>>>> >>
> >>>> >> >>> >> >>>>
> >>>> >> >>> >> >>>>
> >>>> >> >>> >> >>>
> >>>> >> >>> >>
> >>>> >> >>>
> >>>> >>
> >>>>
>

Re: Scala API rewrite almost complete

Stephan Ewen
It would be nice to have a join variant that directly returns the value
rather than an Option. Why not have both? (They are wrapped as flatJoins
anyway below, right?)
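
The idea of offering both variants on top of one flat join can be sketched in plain Scala, without any Flink dependency. This is only an illustration under assumptions: `flatJoin`, `joinValue`, `joinOption`, and the `matches` predicate are hypothetical names, and the in-memory `Seq` join stands in for the real operator. It is not the actual Scala API.

```scala
// A minimal sketch, not the Flink API: all names here are hypothetical.
object JoinVariantsSketch {

  // Core form: the user function may emit zero or more results per matching pair.
  def flatJoin[L, R, O](left: Seq[L], right: Seq[R])(matches: (L, R) => Boolean)(
      fun: (L, R) => Iterable[O]): Seq[O] =
    for {
      l <- left
      r <- right
      if matches(l, r)
      o <- fun(l, r)
    } yield o

  // Variant 1: the user function directly returns exactly one value per pair.
  // It is wrapped into a single-element collection for the flat join.
  def joinValue[L, R, O](left: Seq[L], right: Seq[R])(matches: (L, R) => Boolean)(
      fun: (L, R) => O): Seq[O] =
    flatJoin[L, R, O](left, right)(matches)((l, r) => List(fun(l, r)))

  // Variant 2: the user function returns an Option, so a pair can be dropped.
  // The Option is converted explicitly to a list with zero or one element.
  def joinOption[L, R, O](left: Seq[L], right: Seq[R])(matches: (L, R) => Boolean)(
      fun: (L, R) => Option[O]): Seq[O] =
    flatJoin[L, R, O](left, right)(matches)((l, r) => fun(l, r).toList)

  def main(args: Array[String]): Unit = {
    val left = Seq((1, "a"), (2, "b"))
    val right = Seq((1, 10), (2, 20))
    val sameKey = (l: (Int, String), r: (Int, Int)) => l._1 == r._1

    // Every matching pair produces a value.
    println(joinValue(left, right)(sameKey)((l, r) => (l._2, r._2)))
    // Only pairs whose right value exceeds 10 produce a value.
    println(joinOption(left, right)(sameKey)((l, r) => if (r._2 > 10) Some(l._2) else None))
  }
}
```

In this sketch both convenience variants pay for one small wrapper collection per pair, which is the overhead discussed in the quoted messages below.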

On Fri, Sep 12, 2014 at 11:50 AM, Fabian Hueske <[hidden email]> wrote:

> Sweet! I'm lovin' this :-)
>
> 2014-09-12 11:46 GMT+02:00 Aljoscha Krettek <[hidden email]>:
>
> > Also, you can use CaseClasses directly as the type for CSV input. So
> > instead of reading it as tuples and then having a mapper that maps to
> > your case classes you can use:
> >
> > env.readCsv[Edge](...)
> >
> > On Fri, Sep 12, 2014 at 11:43 AM, Aljoscha Krettek <[hidden email]>
> > wrote:
> > > I added support for specifying keys by name for CaseClasses. Check out
> > > the PageRank and TriangleEnumeration examples to see it in action.
> > >
> > > @Kostas: I think you could use them for the TPC-H examples.
> > >
> > > On Fri, Sep 12, 2014 at 7:23 AM, Aljoscha Krettek <[hidden email]> wrote:
> > >> Yes, that would allow list comprehensions. It would be possible to
> > >> have the Collection signature for join (and coGroup), i.e.:
> > >>
> > >> apply[R]((T, O) => TraversableOnce[R]): DataSet[R]
> > >>
> > >> (T and O are the left and right input type, R is result type)
> > >>
> > >> Then you can return collections and still return an option, as in:
> > >>
> > >> a.join(b).where(0).equalTo(0) { (l, r) => if (r > ...) Some(l) else None }
> > >>
> > >> Because there is an implicit conversion from Options to a Collection.
> > >> This will always wrap the return value in a List with only one value.
> > >> I'm not sure we want the overhead here. I'm also not sure whether we
> > >> want the overhead of always having to use an Option even though the
> > >> join always returns a value.
> > >>
> > >> What do you think?
> > >>
> > >> On Thu, Sep 11, 2014 at 11:22 PM, Fabian Hueske <[hidden email]>
> > wrote:
> > >>> Hmmm, tricky question...
> > >>> How about the Option for Join as this is a tuple-wise operation and the
> > >>> Collection for Cogroup which is group-wise?
> > >>> Could we in that case use list comprehensions in Cogroup functions?
> > >>>
> > >>> Or is that too much mixing?
> > >>>
> > >>> 2014-09-11 23:00 GMT+02:00 Aljoscha Krettek <[hidden email]>:
> > >>>
> > >>>> I didn't look at the example either.
> > >>>>
> > >>>> Adding collections is easy, it's just that we can either have
> > >>>> Collections or the Option, not both.
> > >>>>
> > >>>> For the coding style I followed this:
> > >>>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide ,
> > >>>> which itself is based on this: http://docs.scala-lang.org/style/. It
> > >>>> is different from the Java Code Guidelines we have in place, yes.
> > >>>>
> > >>>> On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <[hidden email]
> >
> > >>>> wrote:
> > >>>> > I haven't looked at the LineRank example in detail, but if you think that
> > >>>> > it adds something new to the examples collection, we can certainly port it
> > >>>> > also to Java.
> > >>>> > I think the Option and Collector return types are sufficient right now, but
> > >>>> > if Collections are easy to add, go for it. ;-)
> > >>>> >
> > >>>> > Great that the Scala primitives are working! Also thanks for adding
> > >>>> > genSequence and adapting my examples.
> > >>>> > Btw., does the code style not apply to Scala files, or do we have a
> > >>>> > different one there?
> > >>>> >
> > >>>> > 2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <[hidden email]
> >:
> > >>>> >
> > >>>> >> What about the LineRank example? We had that in Scala but never
> > had a
> > >>>> >> Java Example.
> > >>>> >>
> > >>>> >> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <
> > [hidden email]>
> > >>>> >> wrote:
> > >>>> >> > Yes, I like that. For the ITCases I always just copied the Java
> > >>>> ITCase.
> > >>>> >> >
> > >>>> >> > The only examples that are missing now are LinearRegression and
> > the
> > >>>> >> > relational stuff.
> > >>>> >> >
> > >>>> >> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <
> > [hidden email]>
> > >>>> >> wrote:
> > >>>> >> >> I just removed the old CountEdgeDegrees example.
> > >>>> >> >> That was a preprocessing step for the TriangleEnumeration, and
> > is now
> > >>>> >> part
> > >>>> >> >> of the new TriangleEnumerationOpt example.
> > >>>> >> >> So I guess, we don't need to port that one. As I said before,
> > I'd
> > >>>> >> prefer to
> > >>>> >> >> keep Java and Scala examples in sync.
> > >>>> >> >>
> > >>>> >> >> Cheers, Fabian
> > >>>> >> >>
> > >>>> >> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <
> > [hidden email]>:
> > >>>> >> >>
> > >>>> >> >>> I added the PageRank example, thanks again Fabian. :D
> > >>>> >> >>>
> > >>>> >> >>> Regarding the other stuff:
> > >>>> >> >>>  - There is a comment in DataSet.scala about including
> > >>>> >> >>> org.apache.flink.api.scala._ because of the TypeInformation.
> > >>>> >> >>>  - I added generateSequence to ExecutionEnvironment.
> > >>>> >> >>>  - It is possible to use Scala Primitives in Array, I noticed it while
> > >>>> >> >>> writing the tests, you probably had an older version of the code.
> > >>>> >> >>>  - Yes, using List and other Interfaces is not possible, this is also
> > >>>> >> >>> a restriction in the Java API.
> > >>>> >> >>>
> > >>>> >> >>> What do you think about the interface of join and coGroup? Right now,
> > >>>> >> >>> you can either use a lambda that returns an Option or the lambda with
> > >>>> >> >>> the Collector. Originally I also wanted to have a lambda that
> > >>>> >> >>> returns a Collection, but due to type erasure this has the same type
> > >>>> >> >>> as the lambda with the Option, so I couldn't use it. There is an
> > >>>> >> >>> implicit conversion from Option to a Collection, so I could change it
> > >>>> >> >>> without breaking the examples we have now. What do you think?
> > >>>> >> >>>
> > >>>> >> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
> > >>>> >> >>> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
> > >>>> >> >>> PageRank
> > >>>> >> >>>
> > >>>> >> >>> These are the examples people called dibs on:
> > >>>> >> >>>  - BatchGradientDescent (Márton) (Should be a port of LinearRegression
> > >>>> >> >>> Example from Java)
> > >>>> >> >>>  - ComputeEdgeDegrees (Hermann)
> > >>>> >> >>>
> > >>>> >> >>> Those are unclaimed (if I'm not mistaken):
> > >>>> >> >>>  - The relational Stuff
> > >>>> >> >>>
> > >>>> >> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <
> > [hidden email]>
> > >>>> >> wrote:
> > >>>> >> >>> > +1 for removing RelationQuery
> > >>>> >> >>> >
> > >>>> >> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
> > >>>> >> [hidden email]>
> > >>>> >> >>> > wrote:
> > >>>> >> >>> >
> > >>>> >> >>> >> By the way, what was called BatchGradientDescent in the
> > Scala
> > >>>> >> examples
> > >>>> >> >>> >> should be replaced by a port of the LinearRegression
> > Example from
> > >>>> >> >>> >> Java. I had them as two separate examples earlier.
> > >>>> >> >>> >>
> > >>>> >> >>> >> What about RelationalQuery and TPC-H-Q3. Any thoughts
> about
> > >>>> removing
> > >>>> >> >>> >> RelationalQuery?
> > >>>> >> >>> >>
> > >>>> >> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
> > >>>> >> [hidden email]
> > >>>> >> >>> >
> > >>>> >> >>> >> wrote:
> > >>>> >> >>> >> > I added the Triangle Enumeration Examples, thanks
> Fabian.
> > >>>> >> >>> >> >
> > >>>> >> >>> >> > So far we have ported: WordCount, KMeans,
> > ConnectedComponents,
> > >>>> >> >>> >> > WebLogAnalysis, TransitiveClosureNaive,
> > >>>> >> TriangleEnumerationNaive/Opt
> > >>>> >> >>> >> >
> > >>>> >> >>> >> > These are the examples people called dibs on:
> > >>>> >> >>> >> >  - PageRank (Fabian)
> > >>>> >> >>> >> >  - BatchGradientDescent (Márton)
> > >>>> >> >>> >> >  - ComputeEdgeDegrees (Hermann)
> > >>>> >> >>> >> >
> > >>>> >> >>> >> > Those are unclaimed (if I'm not mistaken):
> > >>>> >> >>> >> >  - The relational Stuff
> > >>>> >> >>> >> >  - LinearRegression
> > >>>> >> >>> >> >
> > >>>> >> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
> > >>>> >> >>> [hidden email]>
> > >>>> >> >>> >> wrote:
> > >>>> >> >>> >> >> Thanks, I added it. I'll keep a running list of
> > >>>> ported/unported
> > >>>> >> >>> >> >> examples in my mails. I'll rename the java example
> > package to
> > >>>> >> >>> examples
> > >>>> >> >>> >> >> once the Scala API merge is done.
> > >>>> >> >>> >> >>
> > >>>> >> >>> >> >> I think the termination criterion is fine as it is.
> Just
> > >>>> because
> > >>>> >> >>> Scala
> > >>>> >> >>> >> >> enables functional programming doesn't mean it's always
> > the
> > >>>> best
> > >>>> >> >>> >> >> choice. :D
> > >>>> >> >>> >> >>
> > >>>> >> >>> >> >> So far we have ported: WordCount, KMeans,
> > ConnectedComponents,
> > >>>> >> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
> > >>>> >> >>> >> >>
> > >>>> >> >>> >> >> These are the examples people called dibs on:
> > >>>> >> >>> >> >>  - TriangleEnumeration and PageRank (Fabian)
> > >>>> >> >>> >> >>  - BatchGradientDescent (Márton)
> > >>>> >> >>> >> >>  - ComputeEdgeDegrees (Hermann)
> > >>>> >> >>> >> >>
> > >>>> >> >>> >> >> Those are unclaimed (if I'm not mistaken):
> > >>>> >> >>> >> >>  - The relational Stuff
> > >>>> >> >>> >> >>  - LinearRegression
> > >>>> >> >>> >> >>
> > >>>> >> >>> >> >> Cheers,
> > >>>> >> >>> >> >> Aljoscha
> > >>>> >> >>> >> >>
> > >>>> >> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
> > >>>> >> [hidden email]
> > >>>> >> >>> >
> > >>>> >> >>> >> wrote:
> > >>>> >> >>> >> >>> Transitive closure here, I also added a termination criterion in the
> > >>>> >> >>> >> >>> Java version:
> > >>>> >> >>> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
> > >>>> >> >>> >> >>>
> > >>>> >> >>> >> >>> Perhaps you can make the termination criterion in
> Scala
> > more
> > >>>> >> >>> >> functional?
> > >>>> >> >>> >> >>>
> > >>>> >> >>> >> >>> I noticed that the examples package name is
> > example.java but
> > >>>> >> >>> >> examples.scala
> > >>>> >> >>> >> >>>
> > >>>> >> >>> >> >>> Kostas
> > >>>> >> >>> >> >>>
> > >>>> >> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
> > >>>> >> [hidden email]
> > >>>> >> >>> >
> > >>>> >> >>> >> wrote:
> > >>>> >> >>> >> >>>>
> > >>>> >> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not
> > on
> > >>>> your
> > >>>> >> >>> list).
> > >>>> >> >>> >> >>>>
> > >>>> >> >>> >> >>>> If nobody volunteers for the relational stuff I can
> > take
> > >>>> those
> > >>>> >> as
> > >>>> >> >>> >> well.
> > >>>> >> >>> >> >>>>
> > >>>> >> >>> >> >>>> How about removing the "RelationalQuery" from both
> > Scala and
> > >>>> >> Java?
> > >>>> >> >>> It
> > >>>> >> >>> >> >>>> seems to be a proper subset of TPC-H Q3. Does it add
> > some
> > >>>> >> teaching
> > >>>> >> >>> >> value on
> > >>>> >> >>> >> >>>> top of TPC-H Q3?
> > >>>> >> >>> >> >>>>
> > >>>> >> >>> >> >>>> Kostas
> > >>>> >> >>> >> >>>>
> > >>>> >> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
> > >>>> >> >>> [hidden email]
> > >>>> >> >>> >> >
> > >>>> >> >>> >> >>>> wrote:
> > >>>> >> >>> >> >>>>>
> > >>>> >> >>> >> >>>>> Thanks, I added it, along with an ITCase.
> > >>>> >> >>> >> >>>>>
> > >>>> >> >>> >> >>>>> So far we have ported: WordCount, KMeans,
> > >>>> ConnectedComponents,
> > >>>> >> >>> >> >>>>> WebLogAnalysis
> > >>>> >> >>> >> >>>>>
> > >>>> >> >>> >> >>>>> These are the examples people called dibs on:
> > >>>> >> >>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
> > >>>> >> >>> >> >>>>>  - BatchGradientDescent (Márton)
> > >>>> >> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
> > >>>> >> >>> >> >>>>>
> > >>>> >> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
> > >>>> >> >>> >> >>>>>  - TransitiveClosure
> > >>>> >> >>> >> >>>>>  - The relational Stuff
> > >>>> >> >>> >> >>>>>  - LinearRegression
> > >>>> >> >>> >> >>>>>
> > >>>> >> >>> >> >>>>> Cheers,
> > >>>> >> >>> >> >>>>> Aljoscha
> > >>>> >> >>> >> >>>>>
> > >>>> >> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
> > >>>> >> >>> [hidden email]>
> > >>>> >> >>> >> >>>>> wrote:
> > >>>> >> >>> >> >>>>> > WebLog here:
> > >>>> >> >>> >> >>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> > >>>> >> >>> >> >>>>> >
> > >>>> >> >>> >> >>>>> > Do you need any more done?
> > >>>> >> >>> >> >>>>> >
> > >>>> >> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
> > >>>> >> >>> >> [hidden email]>
> > >>>> >> >>> >> >>>>> > wrote:
> > >>>> >> >>> >> >>>>> >
> > >>>> >> >>> >> >>>>> >> I added the ConnectedComponents Example from
> Vasia.
> > >>>> >> >>> >> >>>>> >>
> > >>>> >> >>> >> >>>>> >> Keep 'em coming, people. :D
> > >>>> >> >>> >> >>>>> >>
> > >>>> >> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
> > >>>> >> >>> [hidden email]
> > >>>> >> >>> >> >
> > >>>> >> >>> >> >>>>> >> wrote:
> > >>>> >> >>> >> >>>>> >> > Alright, will do.
> > >>>> >> >>> >> >>>>> >> > Thanks!
> > >>>> >> >>> >> >>>>> >> >
> > >>>> >> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
> > >>>> >> >>> >> [hidden email]>:
> > >>>> >> >>> >> >>>>> >> >
> > >>>> >> >>> >> >>>>> >> >> Ok people, executive decision. :D
> > >>>> >> >>> >> >>>>> >> >>
> > >>>> >> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
> > >>>> >> >>> >> >>>>> >> >> in multi-dimensional object arrays and then converting it to the
> > >>>> >> >>> >> >>>>> >> >> required Java or Scala objects.
> > >>>> >> >>> >> >>>>> >> >>
> > >>>> >> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the Java
> > >>>> >> >>> >> >>>>> >> >> API.
> > >>>> >> >>> >> >>>>> >> >>
> > >>>> >> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword, you can
> > >>>> >> >>> >> >>>>> >> >> just write:
> > >>>> >> >>> >> >>>>> >> >>
> > >>>> >> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
> > >>>> >> >>> >> >>>>> >> >>
> > >>>> >> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <[hidden email]> wrote:
> > >>>> >> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with the Java
> > >>>> >> >>> >> >>>>> >> >> > API. In Java join is done as:
> > >>>> >> >>> >> >>>>> >> >> >
> > >>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
> > >>>> >> >>> >> >>>>> >> >> >
> > >>>> >> >>> >> >>>>> >> >> > where in the current Scala this is:
> > >>>> >> >>> >> >>>>> >> >> >
> > >>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
> > >>>> >> >>> >> >>>>> >> >> >
> > >>>> >> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> > >>>> >> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
> > >>>> >> >>> >> >>>>> >> >> > a keyword in Scala. Should we offer something similar for Scala or go with
> > >>>> >> >>> >> >>>>> >> >> > map() on Tuple2(left, right)?
> > >>>> >> >>> >> >>>>> >> >> >
> > >>>> >> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <
> > >>>> >> [hidden email]
> > >>>> >> >>> >:
> > >>>> >> >>> >> >>>>> >> >> >
> > >>>> >> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work
> as
> > well.
> > >>>> >> That
> > >>>> >> >>> is a
> > >>>> >> >>> >> >>>>> >> >> >> generic
> > >>>> >> >>> >> >>>>> >> >> >> representation of a Tuple.
> > >>>> >> >>> >> >>>>> >> >> >>
> > >>>> >> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java
> > or
> > >>>> Scala
> > >>>> >> >>> Tuples,
> > >>>> >> >>> >> >>>>> >> >> >> with a
> > >>>> >> >>> >> >>>>> >> >> generic
> > >>>> >> >>> >> >>>>> >> >> >> utility method to convert between the two.
> > >>>> >> >>> >> >>>>> >> >> >>
> > >>>> >> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian
> > Hueske
> > >>>> >> >>> >> >>>>> >> >> >> <[hidden email]>
> > >>>> >> >>> >> >>>>> >> >> wrote:
> > >>>> >> >>> >> >>>>> >> >> >>
> > >>>> >> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
> > >>>> >> >>> >> >>>>> >> >> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them,
> but
> > >>>> using
> > >>>> >> the
> > >>>> >> >>> >> >>>>> >> >> >> > CSVFormat
> > >>>> >> >>> >> >>>>> >> won't
> > >>>> >> >>> >> >>>>> >> >> >> work
> > >>>> >> >>> >> >>>>> >> >> >> > because this is based on a
> FileInputFormat.
> > >>>> >> >>> >> >>>>> >> >> >> > So we would need to parse the Strings
> > >>>> manually...
> > >>>> >> >>> >> >>>>> >> >> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha
> Krettek
> > >>>> >> >>> >> >>>>> >> >> >> > <[hidden email]>:
> > >>>> >> >>> >> >>>>> >> >> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > Hi,
> > >>>> >> >>> >> >>>>> >> >> >> > > on second thought. Maybe we should just
> > change
> > >>>> >> all
> > >>>> >> >>> the
> > >>>> >> >>> >> >>>>> >> >> >> > > example
> > >>>> >> >>> >> >>>>> >> input
> > >>>> >> >>> >> >>>>> >> >> >> > > data to strings and use CSV input
> > formats in
> > >>>> all
> > >>>> >> the
> > >>>> >> >>> >> >>>>> >> >> >> > > examples.
> > >>>> >> >>> >> >>>>> >> What
> > >>>> >> >>> >> >>>>> >> >> do
> > >>>> >> >>> >> >>>>> >> >> >> > > you think?
> > >>>> >> >>> >> >>>>> >> >> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > Cheers,
> > >>>> >> >>> >> >>>>> >> >> >> > > Aljoscha
> > >>>> >> >>> >> >>>>> >> >> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM,
> Aljoscha
> > >>>> Krettek
> > >>>> >> <
> > >>>> >> >>> >> >>>>> >> >> [hidden email]>
> > >>>> >> >>> >> >>>>> >> >> >> > > wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > > Hi,
> > >>>> >> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
> > >>>> >> >>> >> >>>>> >> >> >> > > > you have to do what you proposed: move the data to a static field and
> > >>>> >> >>> >> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
> > >>>> >> >>> >> >>>>> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
> > >>>> >> >>> >> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
> > >>>> >> >>> >> >>>>> >> >> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > > What do the others think? This will
> > probably
> > >>>> >> occur
> > >>>> >> >>> in
> > >>>> >> >>> >> all
> > >>>> >> >>> >> >>>>> >> >> >> > > > the
> > >>>> >> >>> >> >>>>> >> >> >> examples.
> > >>>> >> >>> >> >>>>> >> >> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > > Cheers,
> > >>>> >> >>> >> >>>>> >> >> >> > > > Aljoscha
> > >>>> >> >>> >> >>>>> >> >> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM,
> > Vasiliki
> > >>>> >> Kalavri
> > >>>> >> >>> >> >>>>> >> >> >> > > > <[hidden email]> wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >> Hey,
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >> I have ported the Connected
> Components
> > >>>> >> example,
> > >>>> >> >>> but
> > >>>> >> >>> >> I am
> > >>>> >> >>> >> >>>>> >> >> >> > > >> not
> > >>>> >> >>> >> >>>>> >> sure
> > >>>> >> >>> >> >>>>> >> >> >> how
> > >>>> >> >>> >> >>>>> >> >> >> > to
> > >>>> >> >>> >> >>>>> >> >> >> > > >> reuse the example input data from
> > >>>> >> java-examples.
> > >>>> >> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
> > >>>> >> >>> >> >>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
> > >>>> >> >>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> > >>>> >> >>> >> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
> > >>>> >> >>> >> >>>>> >> >> >> > > >> class), but this introduces a conversion from
> > >>>> >> >>> >> >>>>> >> >> >> > > >> org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
> > >>>> >> >>> >> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary
> > >>>> >> >>> >> >>>>> >> >> >> > > >> complexity for an example (?).
> > >>>> >> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy
> the
> > >>>> example
> > >>>> >> >>> data
> > >>>> >> >>> >> in
> > >>>> >> >>> >> >>>>> >> >> >> > > >> the
> > >>>> >> >>> >> >>>>> >> Scala
> > >>>> >> >>> >> >>>>> >> >> >> > > example.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >> Thanks!
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >> Cheers,
> > >>>> >> >>> >> >>>>> >> >> >> > > >> V.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha
> > >>>> Krettek <
> > >>>> >> >>> >> >>>>> >> [hidden email]
> > >>>> >> >>> >> >>>>> >> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> > >>>> >> >>> >> >>>>> >> >> >> > > >>>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount
> > example.
> > >>>> >> It's
> > >>>> >> >>> >> pretty
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> much a
> > >>>> >> >>> >> >>>>> >> >> copy
> > >>>> >> >>> >> >>>>> >> >> >> of
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> the Java example with some fixups
> > for the
> > >>>> >> syntax
> > >>>> >> >>> and
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> lambda
> > >>>> >> >>> >> >>>>> >> >> >> > functions.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the
> > >>>> >> java-examples
> > >>>> >> >>> >> as a
> > >>>> >> >>> >> >>>>> >> >> dependency
> > >>>> >> >>> >> >>>>> >> >> >> for
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> the scala-examples. I did this to
> > reuse
> > >>>> the
> > >>>> >> >>> example
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> input
> > >>>> >> >>> >> >>>>> >> data.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> When you ported a program you can
> do
> > a
> > >>>> pull
> > >>>> >> >>> request
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> against
> > >>>> >> >>> >> >>>>> >> my
> > >>>> >> >>> >> >>>>> >> >> repo
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> and I will collect the examples.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
> > >>>> >> >>> >> >>>>> >> >> >> > > >>>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM,
> > Hermann
> > >>>> >> Gábor <
> > >>>> >> >>> >> >>>>> >> >> >> [hidden email]
> > >>>> >> >>> >> >>>>> >> >> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> > +1
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM,
> > Márton
> > >>>> >> >>> Balassi <
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> > wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> +1
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM,
> > Kostas
> > >>>> >> >>> Tzoumas <
> > >>>> >> >>> >> >>>>> >> >> >> > > [hidden email]>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > +1
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala
> > consists of
> > >>>> >> going
> > >>>> >> >>> >> through
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > a
> > >>>> >> >>> >> >>>>> >> >> tutorial
> > >>>> >> >>> >> >>>>> >> >> >> so
> > >>>> >> >>> >> >>>>> >> >> >> > > this
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> will
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be a good stress test both for
> > me and
> > >>>> >> the
> > >>>> >> >>> new
> > >>>> >> >>> >> API
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > :-)
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09
> PM,
> > >>>> Vasiliki
> > >>>> >> >>> >> Kalavri <
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > [hidden email]>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people
> > >>>> implement
> > >>>> >> the
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > examples!
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and
> > Kmeans for
> > >>>> >> me :)
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03,
> > Fabian
> > >>>> >> Hueske <
> > >>>> >> >>> >> >>>>> >> >> >> [hidden email]>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for
> > TriangleEnumeration and
> > >>>> >> >>> PageRank.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples
> > >>>> similar
> > >>>> >> to
> > >>>> >> >>> the
> > >>>> >> >>> >> Java
> > >>>> >> >>> >> >>>>> >> >> examples:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box
> > without
> > >>>> >> >>> parameters
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external
> > data
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code
> > structure
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00
> > >>>> Aljoscha
> > >>>> >> >>> >> Krettek <
> > >>>> >> >>> >> >>>>> >> >> >> > > [hidden email]
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can
> > >>>> reserve
> > >>>> >> their
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > favourite
> > >>>> >> >>> >> >>>>> >> >> >> examples
> > >>>> >> >>> >> >>>>> >> >> >> > > here.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at
> > 8:55 PM,
> > >>>> >> Fabian
> > >>>> >> >>> >> Hueske
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having
> examples
> > >>>> >> implemented
> > >>>> >> >>> by
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > different
> > >>>> >> >>> >> >>>>> >> >> >> people
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> proved to
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > be
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or
> > three
> > >>>> >> examples.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if
> > you'd
> > >>>> >> port a
> > >>>> >> >>> >> simple
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > first
> > >>>> >> >>> >> >>>>> >> >> one
> > >>>> >> >>> >> >>>>> >> >> >> > such
> > >>>> >> >>> >> >>>>> >> >> >> > > as
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > WordCount.
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47
> > GMT+02:00
> > >>>> >> Aljoscha
> > >>>> >> >>> >> Krettek
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> [hidden email]
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working
> > rewrite of
> > >>>> the
> > >>>> >> >>> Scala
> > >>>> >> >>> >> API
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> here:
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll
> > only
> > >>>> have
> > >>>> >> to
> > >>>> >> >>> >> write
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> the
> > >>>> >> >>> >> >>>>> >> tests
> > >>>> >> >>> >> >>>>> >> >> and
> > >>>> >> >>> >> >>>>> >> >> >> > > port
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> the
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you
> think
> > it
> > >>>> makes
> > >>>> >> >>> sense
> > >>>> >> >>> >> to
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> let
> > >>>> >> >>> >> >>>>> >> other
> > >>>> >> >>> >> >>>>> >> >> >> > people
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> port
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> the
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that
> > someone
> > >>>> else
> > >>>> >> uses
> > >>>> >> >>> >> it and
> > >>>> >> >>> >> >>>>> >> maybe
> > >>>> >> >>> >> >>>>> >> >> >> > notices
> > >>>> >> >>> >> >>>>> >> >> >> > > some
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > quirks
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
> > >>>> >> >>> >> >>>>> >> >> >> > > >>> >>
> > >>>> >> >>> >> >>>>> >> >> >> > > >>>
> > >>>> >> >>> >> >>>>> >> >> >> > >
> > >>>> >> >>> >> >>>>> >> >> >> >
> > >>>> >> >>> >> >>>>> >> >> >>
> > >>>> >> >>> >> >>>>> >> >>
> > >>>> >> >>> >> >>>>> >>
> > >>>> >> >>> >> >>>>
> > >>>> >> >>> >> >>>>
> > >>>> >> >>> >> >>>
> > >>>> >> >>> >>
> > >>>> >> >>>
> > >>>> >>
> > >>>>
> >
>