Is the `flink-connector-filesystem` connector supposed to work with the
latest Hadoop-free Flink releases, say along with the `flink-s3-fs-presto` filesystem implementation?

-Jamie
Hi,
I'm afraid not, since the BucketingSink uses the Hadoop FileSystem directly and not the Flink FileSystem abstraction. The flink-s3-fs-* modules only provide Flink FileSystems.

One of the goals for 1.6 is to provide a BucketingSink that uses the Flink FileSystem and also works well with eventually consistent file systems.

-- Aljoscha

> On 23. Feb 2018, at 06:31, Jamie Grier wrote:
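To make the distinction concrete, here is a toy Python model of two independent filesystem abstractions, each resolving an implementation purely by URI scheme. It is not Flink or Hadoop code: every class and function name is hypothetical. It assumes a Hadoop-free cluster where `flink-s3-fs-presto` registers `s3` only with Flink's abstraction, so a sink that talks to the Hadoop API directly never sees it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlparse


@dataclass
class FileSystemRegistry:
    """Hypothetical stand-in for one filesystem abstraction: URI scheme -> implementation."""

    name: str
    implementations: Dict[str, str] = field(default_factory=dict)

    def register(self, scheme: str, implementation: str) -> None:
        self.implementations[scheme] = implementation

    def lookup(self, uri: str) -> Optional[str]:
        # Resolve by URI scheme only, e.g. "s3" for "s3://bucket/path".
        return self.implementations.get(urlparse(uri).scheme)


# Assumed setup: Hadoop-free cluster with flink-s3-fs-presto available to Flink.
# Only Flink's abstraction knows "s3"; the Hadoop abstraction has nothing registered.
flink_filesystems = FileSystemRegistry("Flink FileSystem")
flink_filesystems.register("s3", "flink-s3-fs-presto")
hadoop_filesystems = FileSystemRegistry("Hadoop FileSystem")


def sink_via_flink_abstraction(uri: str) -> Optional[str]:
    # A hypothetical sink written against Flink's FileSystem sees the S3 implementation.
    return flink_filesystems.lookup(uri)


def sink_via_hadoop_api(uri: str) -> Optional[str]:
    # A hypothetical sink that calls Hadoop's FileSystem directly (the situation
    # described for BucketingSink) never consults Flink's registry.
    return hadoop_filesystems.lookup(uri)
```

In this model, `sink_via_flink_abstraction("s3://bucket/out")` returns `"flink-s3-fs-presto"`, while `sink_via_hadoop_api("s3://bucket/out")` returns `None`: adding an implementation to one abstraction does nothing for code written against the other.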
Thanks, Aljoscha :)
So is it possible to keep using the new "native" filesystems with the BucketingSink by including the Hadoop dependencies only in the user's uber jar? Or is that asking for trouble? Has anyone tried that successfully?

-Jamie

> On Fri, Feb 23, 2018 at 12:39 AM, Aljoscha Krettek wrote:
Do you mean putting the Flink-native S3 filesystem in the user jar, or Hadoop in the user jar? The former wouldn't work, I think, because the FileSystems are initialised before the user jar is loaded. The latter might work, but only if you don't have Hadoop in the classpath, i.e. not on YARN and only on a Hadoop-free cluster. Maybe...
> On 23. Feb 2018, at 13:32, Jamie Grier wrote:
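The ordering argument for why a Flink-native filesystem in the user jar would not be picked up can be sketched with another toy Python model. It only encodes the assumption stated in the reply (filesystems are set up once at startup from the jars in `lib/`, and the user jar is loaded afterwards); `ToyFlinkProcess`, `start`, and the jar-to-scheme mapping are hypothetical and are not Flink's actual classes or class-loading code. The second option (Hadoop in the user jar) is not modelled.

```python
from typing import Dict, List, Optional


class ToyFlinkProcess:
    """Hypothetical model: filesystems are initialised from lib/ before user code loads."""

    def __init__(self, lib_jars: Dict[str, str]) -> None:
        # Maps jar name -> URI scheme that the jar's filesystem implementation serves.
        self.lib_jars = dict(lib_jars)
        self.filesystems: Dict[str, str] = {}
        self.user_jars: List[Dict[str, str]] = []

    def initialise_filesystems(self) -> None:
        # Runs once at startup and only looks at jars in lib/.
        for jar, scheme in self.lib_jars.items():
            self.filesystems[scheme] = jar

    def load_user_jar(self, jars: Dict[str, str]) -> None:
        # Loaded after initialisation; its filesystems are stored but never registered.
        self.user_jars.append(dict(jars))

    def filesystem_for(self, scheme: str) -> Optional[str]:
        return self.filesystems.get(scheme)


def start(lib_jars: Dict[str, str], user_jar: Dict[str, str]) -> ToyFlinkProcess:
    # Same order in every run: initialise filesystems first, then load the user jar.
    process = ToyFlinkProcess(lib_jars)
    process.initialise_filesystems()
    process.load_user_jar(user_jar)
    return process
```

With `start(lib_jars={}, user_jar={"flink-s3-fs-presto": "s3"})`, `filesystem_for("s3")` returns `None`; with the same jar listed under `lib_jars` instead, it resolves. That is the "former wouldn't work" case in miniature.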
Yeah, I meant the latter, but it sounds like it could just be asking for trouble. I just like the idea of keeping the set of unshaded JARs in the flink/lib directory to a minimum.

Thanks.

> On Fri, Feb 23, 2018 at 5:29 AM, Aljoscha Krettek wrote:
Hi!
You could try to put Hadoop in your application JAR file, but I expect trouble with s3a because of the specific way it does connection pooling.

Making the BucketingSink work with Flink's file systems (and thus with Hadoop-free Flink) is super high up on the list as soon as the release is out.

Stephan

> On Fri, Feb 23, 2018 at 2:32 PM, Jamie Grier wrote: