Hi,
I have a working rewrite of the Scala API here:
https://github.com/aljoscha/incubator-flink/commits/scala-rework

I'm hoping that I'll only have to write the tests and port the examples. Do you think it makes sense to let other people port the examples, so that someone else uses it and maybe notices some quirks in the API?

Cheers,
Aljoscha
Hi,
I think having examples implemented by different people proved to be valuable in the past. I'd help with two or three examples.

It might be helpful if you'd port a simple first one such as WordCount.

Fabian

In reply to Aljoscha Krettek (2014-09-04 18:47 GMT+02:00)
Will do, then people can reserve their favourite examples here.
In reply to Fabian Hueske (Thu, Sep 4, 2014 at 8:55 PM)
I go for TriangleEnumeration and PageRank.
Let's also do the examples similar to the Java examples:
- running out-of-the-box without parameters
- parameters for external data
- follow a similar code structure

In reply to Aljoscha Krettek (2014-09-04 20:56 GMT+02:00)
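The conventions listed above (run without parameters on built-in data, accept parameters for external data) could be sketched roughly as follows. This is only an illustration of the structure, not code from the Java examples: the object name `ExampleSkeleton`, the `Config` types, and the sample lines are all made up here, and no Flink classes are used.

```scala
object ExampleSkeleton {

  // Built-in sample input so the example runs without any arguments
  // (hypothetical data, not taken from the Flink examples).
  val DefaultLines: Seq[String] = Seq("to be or not to be", "that is the question")

  // Parsed command-line configuration: built-in data or external paths.
  sealed trait Config
  case object UseDefaults extends Config
  final case class UseFiles(inputPath: String, outputPath: String) extends Config

  // No arguments: use the default data. Two arguments: input and output path.
  // Anything else is reported as a usage error.
  def parseParameters(args: Array[String]): Either[String, Config] =
    args match {
      case Array()              => Right(UseDefaults)
      case Array(input, output) => Right(UseFiles(input, output))
      case _                    => Left("Usage: ExampleSkeleton [<input path> <output path>]")
    }

  def main(args: Array[String]): Unit =
    parseParameters(args) match {
      case Right(UseDefaults) =>
        println(s"Running with ${DefaultLines.size} built-in input lines")
      case Right(UseFiles(in, out)) =>
        println(s"Reading from $in, writing to $out")
      case Left(usage) =>
        System.err.println(usage)
    }
}
```

Keeping the parameter handling in one small function would make it easy to give every ported example the same shape.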
+1 for having other people implement the examples!
Connected Components and Kmeans for me :)

-V.

In reply to Fabian Hueske (4 September 2014 21:03)
+1
I go for WebLogAnalysis.

My experience with Scala consists of going through a tutorial so this will be a good stress test both for me and the new API :-)

In reply to Vasiliki Kalavri (Thu, Sep 4, 2014 at 9:09 PM)
+1
BatchGradientDescent for me :)

In reply to Kostas Tzoumas (Fri, Sep 5, 2014 at 11:15 AM)
+1
ComputeEdgeDegrees for me!

In reply to Márton Balassi (Fri, Sep 5, 2014 at 11:44 AM)
Alright, I updated my repo:
https://github.com/aljoscha/incubator-flink/commits/scala-rework

This now has a working WordCount example. It's pretty much a copy of the Java example with some fixups for the syntax and lambda functions. You'll also notice that I added the java-examples as a dependency for the scala-examples. I did this to reuse the example input data.

Once you have ported a program, you can open a pull request against my repo and I will collect the examples.

Happy coding. :D

In reply to Hermann Gábor (Fri, Sep 5, 2014 at 12:19 PM)
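To give a feel for what a lambda-style word count looks like in Scala, here is a minimal sketch that uses only the Scala standard collections. It is not the WordCount from the repository and does not use the Flink API; the object name `WordCountSketch` and the sample sentence are assumptions made for illustration.

```scala
object WordCountSketch {

  // Split each line into lower-case words and count occurrences per word.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.toLowerCase.split("\\W+").toSeq) // tokenize on non-word characters
      .filter(_.nonEmpty)                                    // drop empty tokens
      .groupBy(identity)                                     // group identical words
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit =
    countWords(Seq("To be, or not to be")).toSeq.sortBy(_._1).foreach(println)
}
```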
Hey,
I have ported the Connected Components example, but I am not sure how to reuse the example input data from java-examples. In the ConnectedComponentsData class, the vertex and edge data are produced by the methods getDefaultVertexDataSet() and getDefaultEdgeDataSet(), which take an org.apache.flink.api.java.ExecutionEnvironment as a parameter.

One way is to provide public static fields (like in the WordCountData class), but this introduces a conversion from org.apache.flink.api.java.tuple.Tuple2 to a Scala tuple and from java.lang.Long to scala.Long, and I guess this is unnecessary complexity for an example (?). Another way is, of course, to copy the example data into the Scala example.

Am I missing something here?

Thanks!

Cheers,
V.

In reply to Aljoscha Krettek (5 September 2014 15:52)
Hi,
yes, it's unfortunate that the data types are incompatible. I'm afraid you have to do what you proposed: move the data to a static field and convert it in the getDefaultEdgeDataSet() method in Scala. It's not nice, but copying would duplicate the data and make it easier for the Java and Scala versions to go out of sync.

What do the others think? This will probably occur in all the examples.

Cheers,
Aljoscha

In reply to Vasiliki Kalavri (Sun, Sep 7, 2014 at 10:04 PM)
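The conversion discussed here, from Java tuples holding java.lang.Long to Scala tuples holding scala.Long, might look roughly like the sketch below. To keep it runnable without Flink on the classpath, `JavaTuple2` is a stand-in for org.apache.flink.api.java.tuple.Tuple2 (which exposes its elements as the public fields f0 and f1); `JavaEdges` and `toScalaEdges` are hypothetical names, and the edge values are invented.

```scala
object EdgeDataConversion {

  // Stand-in for org.apache.flink.api.java.tuple.Tuple2, defined here only
  // so that the sketch compiles without Flink.
  final class JavaTuple2[A, B](val f0: A, val f1: B)

  // Helper that builds a Java-style edge with boxed java.lang.Long values.
  private def edge(src: Long, dst: Long): JavaTuple2[java.lang.Long, java.lang.Long] =
    new JavaTuple2(java.lang.Long.valueOf(src), java.lang.Long.valueOf(dst))

  // Hypothetical shared sample data, typed the way Java code would expose it.
  val JavaEdges: Seq[JavaTuple2[java.lang.Long, java.lang.Long]] =
    Seq(edge(1L, 2L), edge(2L, 3L), edge(3L, 4L))

  // Unbox each field and repack it into a Scala tuple of scala.Long.
  def toScalaEdges(
      edges: Seq[JavaTuple2[java.lang.Long, java.lang.Long]]): Seq[(Long, Long)] =
    edges.map(e => (e.f0.longValue(), e.f1.longValue()))

  def main(args: Array[String]): Unit =
    toScalaEdges(JavaEdges).foreach(println)
}
```

The conversion itself is a one-liner, but each example that shares Java data would need its own variant of it, which is the complexity the thread is weighing against duplicating the data.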
Hi,
on second thought, maybe we should just change all the example input data to strings and use CSV input formats in all the examples. What do you think?

Cheers,
Aljoscha

In reply to Aljoscha Krettek (Mon, Sep 8, 2014 at 7:46 AM)
+1: If we opted for that we could easily use the same input for streaming
as well - we've been facing the same issue recently.

In reply to Aljoscha Krettek (Mon, Sep 8, 2014 at 10:35 AM)
In reply to this post by Aljoscha Krettek-2
Yeah, I ran into the same problem...
+1 for using Strings and parsing them, but using the CSVFormat won't work because this is based on a FileInputFormat. So we would need to parse the Strings manually... |
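Parsing the example Strings by hand, as suggested above, could look roughly like the following. This is only a sketch in plain Java without any Flink classes: the class name EdgeStringParser, the EDGES field, and the whitespace-separated record layout are assumptions made for illustration and are not taken from the actual example data classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: parses inline example data
// that is stored as plain Strings, one record per String.
class EdgeStringParser {

    // Made-up sample data in the style of a static String[] field.
    static final String[] EDGES = { "1 2", "2 3", "3 4" };

    // Splits each record on whitespace and parses both fields as long values.
    static List<long[]> parseEdges(String[] lines) {
        List<long[]> edges = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected two fields: " + line);
            }
            edges.add(new long[] { Long.parseLong(fields[0]), Long.parseLong(fields[1]) });
        }
        return edges;
    }

    public static void main(String[] args) {
        for (long[] edge : parseEdges(EDGES)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

The parsed records would still have to be turned into the tuple type of the respective API before they can be handed to an ExecutionEnvironment, so the conversion step does not go away entirely.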
Instead of Strings, Object[][] would work as well. That is a generic
representation of a Tuple. Alternatively, they could be stored as Java or Scala Tuples, with a generic utility method to convert between the two. |
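The Object[][] idea might be sketched as follows on the Java side. The Pair class is a hypothetical stand-in for a tuple type such as org.apache.flink.api.java.tuple.Tuple2, and ExampleDataConverter, EDGES, and toPairs are illustrative names that do not come from the repository; the sketch only shows the shape of the conversion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter, not part of Flink: turns generic Object[][] rows
// into typed two-field records.
class ExampleDataConverter {

    // Stand-in for a real tuple class (assumption for this sketch).
    static final class Pair<A, B> {
        final A first;
        final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Made-up sample data; each inner array is one untyped record.
    static final Object[][] EDGES = { { 1L, 2L }, { 2L, 3L }, { 3L, 4L } };

    // Casts each field to its expected type and wraps the row in a Pair.
    static List<Pair<Long, Long>> toPairs(Object[][] rows) {
        List<Pair<Long, Long>> result = new ArrayList<>();
        for (Object[] row : rows) {
            if (row.length != 2) {
                throw new IllegalArgumentException("Expected two fields per row");
            }
            result.add(new Pair<>((Long) row[0], (Long) row[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Pair<Long, Long> p : toPairs(EDGES)) {
            System.out.println(p.first + " -> " + p.second);
        }
    }
}
```

A Scala example could read the same Object[][] field and map each row to a Scala tuple, which keeps a single copy of the data for both language versions.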
Aside from the DataSet issue, I also found an inconsistency with the Java
API. In Java, a join is written as:

    ds1.join(ds2).where(...).equalTo(...)

whereas in the current Scala API it is:

    ds1.join(ds2).where(...).isEqualTo(...)

isEqualTo() should be renamed to equalTo(), IMO.
Also, join (+cross and coGroup?) lacks the with() method because "with" is a keyword in Scala. Should we offer something similar for Scala or go with map() on Tuple2(left, right)? |
Ok people, executive decision. :D
Please look at KMeansData.java and KMeans.scala. I'm storing the data in multi-dimensional object arrays and then converting it to the required Java or Scala objects.

Also, I changed isEqualTo to equalTo to make it consistent with the Java API.

Regarding join (and coGroup): there is no need for a keyword, you can just write:

    left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
|
Alright, will do.
Thanks! |
I added the ConnectedComponents Example from Vasia.
Keep 'em coming, people. :D |
WebLog here:
https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
Do you need any more done? |