Hi,
I'm currently working on some code in Flink's runtime and want to use some Java Primitive Collections to improve performance.

As far as I can see, no Primitive Collections library is in the dependencies, so I wanted to ask if anybody has any preferences or input on which library the project should use.

The viable candidates (from a licensing perspective, at least) are:

- Apache Commons Primitives Collections
- High Performance Primitive Collections for Java (HPPC)
- fastutil

They are all APL 2.0 licensed.

Since I'm probably not the only one who is going to use Primitive Collections, I don't want to introduce a new dependency without any discussion.

Cheers,
Robert
Hi Robert,
The Apache Commons Primitives Collection project seems to be pretty inactive: the last release was in 2003, and there are many dead links on the website. I would not suggest using it.

HPPC and fastutil seem pretty similar to me. Both have a somewhat active mailing list and up-to-date releases. Apache Giraph is using fastutil.

In my opinion, it's up to you to decide what you want to use.

On Thu, Jun 19, 2014 at 2:53 PM, Robert Waury wrote:
> [...]
+1 for fastutil
On 06/20/2014 08:50 AM, Robert Metzger wrote:
> [...]
Okay,
I'm going to add fastutil to the dependencies in my next pull request.

Cheers,
Robert

On Jun 20, 2014 8:52 AM, "Sebastian Schelter" wrote:
> [...]
A word of caution: fastutil is a massive library, 20 MB or so and 10K files, if I recall correctly. It was pulled out of Spark just because it was making the deployment jars huge (and wasn't used much). Make sure it's worth it.

On Tue, Jun 24, 2014 at 8:33 AM, Robert Waury wrote:
> [...]
What did Spark use instead of fastutil?
On Tue, Jun 24, 2014 at 9:40 AM, Sean Owen wrote:
> [...]
Its own implementation of a primitive map. The needs were limited, and two Scala classes did the trick.

On Tue, Jun 24, 2014 at 9:18 AM, Robert Metzger wrote:
> [...]
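Spark's two Scala classes are not reproduced here. As a rough illustration of what such a limited-purpose primitive map can look like, here is a minimal Java sketch of an open-addressing hash map with primitive `long` keys and values. The class name `SimpleLongLongMap`, the `missingValue` convention for absent keys, the golden-ratio hash constant, and the 0.5 maximum load factor are illustrative assumptions, not Spark's, Flink's, or any library's API. Removal is deliberately left out.

```java
/** Hypothetical minimal open-addressing map from long keys to long values (no removal). */
final class SimpleLongLongMap {
    private long[] keys;
    private long[] values;
    private boolean[] used;          // occupied slots; lets every long, including 0, be a key
    private int size;
    private final long missingValue; // returned by get() for absent keys (illustrative convention)

    SimpleLongLongMap(long missingValue) {
        this.missingValue = missingValue;
        allocate(16);
    }

    /** Capacity must be a power of two so that "& mask" can replace a modulo. */
    private void allocate(int capacity) {
        keys = new long[capacity];
        values = new long[capacity];
        used = new boolean[capacity];
    }

    /** Multiplies by the 64-bit golden-ratio constant, folds high bits down, masks to table size. */
    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32)) & mask;
    }

    public void put(long key, long value) {
        if ((size + 1) * 2 > keys.length) { // keep the load factor at or below 0.5
            rehash(keys.length * 2);
        }
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i] && keys[i] != key) { // linear probing until the key or a free slot
            i = (i + 1) & mask;
        }
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    public long get(long key) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) { // a free slot always exists, so this loop terminates
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return missingValue;
    }

    public int size() {
        return size;
    }

    /** Re-inserts all entries into a table of the given (larger) capacity. */
    private void rehash(int newCapacity) {
        long[] oldKeys = keys;
        long[] oldValues = values;
        boolean[] oldUsed = used;
        allocate(newCapacity);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

Because keys and values live in plain `long[]` arrays, `put` and `get` never create `Long` wrapper objects, which is where most of the memory and GC savings over `HashMap<Long, Long>` come from.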
I agree with Sean. Fastutil generates classes for all sorts of combinations of primitive key and value types, which blows up the number of classes. If you only need a simple growing array of primitive longs, it may be simpler to just implement it yourself.
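The hand-rolled growing array of primitive longs suggested above might look like the following minimal sketch, assuming only `add`, indexed `get`, `size`, and a trimmed copy are needed. The class name `GrowableLongArray` and the growth factor (about 1.5x, similar to `java.util.ArrayList`) are illustrative choices, not taken from Flink or fastutil.

```java
import java.util.Arrays;

/** Hypothetical minimal growable array of primitive longs (no boxing). */
final class GrowableLongArray {
    private long[] elements;
    private int size;

    GrowableLongArray() {
        this(8);
    }

    GrowableLongArray(int initialCapacity) {
        elements = new long[Math.max(1, initialCapacity)];
    }

    public void add(long value) {
        if (size == elements.length) {
            // Grow by roughly 1.5x to amortize the cost of copying.
            elements = Arrays.copyOf(elements, elements.length + (elements.length >> 1) + 1);
        }
        elements[size++] = value;
    }

    public long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elements[index];
    }

    public int size() {
        return size;
    }

    /** Returns a copy of the stored elements, trimmed to the current size. */
    public long[] toArray() {
        return Arrays.copyOf(elements, size);
    }
}
```

A class like this adds a few dozen lines to the code base instead of a multi-megabyte dependency, which is the trade-off Sean raised.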
In reply to this post by Sean Owen
Sean, which API are you referring to? I am actually looking for a similar API for memory optimization but wasn't able to find it. JavaDoubleRDD doesn't serve the purpose. I'm looking for an object-to-double sort of primitive map.
There is also https://github.com/OpenHFT/Koloboke
But I feel Flink could have its own collections, optimized for Flink's use cases. You can benchmark them and see what works best.
FWIW I've been happy with Carrot HPPC in Java:
https://github.com/carrotsearch/hppc

On Tue, Nov 25, 2014 at 3:24 PM, sirinath wrote:
> [...]
Koloboke is the fastest. Also, its speed may improve further.

Maybe you can ask the library author to implement some of the features you want, since he invites such requests. See: https://github.com/OpenHFT/Koloboke/wiki/Koloboke:-roll-the-collection-implementation-with-features-you-need
OK, I'm converting to Koloboke. It does work a bit faster, but what I like is that you can use its collections as regular Java Collections, and that it directly integrates/backports the Java 8 collection APIs.

On Tue, Nov 25, 2014 at 3:31 PM, sirinath wrote:
> [...]