(DEPRECATED) Apache Flink Mailing List archive.

Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Jamie Grier-2

Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Is the `flink-connector-filesystem` connector supposed to work with the
latest hadoop-free Flink releases, say along with the `flink-s3-fs-presto`
filesystem implementation?

-Jamie

Aljoscha Krettek-2

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Hi,

I'm afraid not, since the BucketingSink uses the Hadoop FileSystem directly and not the Flink FileSystem abstraction. The flink-s3-fs-* modules only provide Flink FileSystems.

One of the goals for 1.6 is to provide a BucketingSink that uses the Flink FileSystem and also works well with eventually consistent file systems.

--
Aljoscha
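
To illustrate the mismatch described in the reply above, here is a toy model in plain Java. It is only a sketch under stated assumptions: `FlinkStyleFs`, `HadoopStyleFs`, the two registries, and the `...SinkOpen` methods are hypothetical stand-ins invented for illustration, not Flink or Hadoop classes. It shows that a file system registered only with one abstraction is invisible to a sink written against the other.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every type here is a hypothetical stand-in,
// not an actual Flink or Hadoop class.
public class FileSystemMismatchSketch {

    /** Stand-in for a Flink-style file system abstraction. */
    interface FlinkStyleFs { String scheme(); }

    /** Stand-in for a Hadoop-style file system class. */
    interface HadoopStyleFs { String scheme(); }

    // Two independent registries, keyed by URI scheme.
    static final Map<String, FlinkStyleFs> flinkRegistry = new HashMap<>();
    static final Map<String, HadoopStyleFs> hadoopRegistry = new HashMap<>();

    /** Models a module that registers an "s3" file system with the Flink-style registry only. */
    static void installFlinkOnlyS3() {
        flinkRegistry.put("s3", () -> "s3");
    }

    /** Models a sink written directly against the Hadoop-style API. */
    static String hadoopBackedSinkOpen(String scheme) {
        HadoopStyleFs fs = hadoopRegistry.get(scheme);
        if (fs == null) {
            throw new IllegalStateException("No Hadoop-style file system for scheme: " + scheme);
        }
        return "opened via " + fs.scheme();
    }

    /** Models a sink written against the Flink-style abstraction. */
    static String flinkBackedSinkOpen(String scheme) {
        FlinkStyleFs fs = flinkRegistry.get(scheme);
        if (fs == null) {
            throw new IllegalStateException("No Flink-style file system for scheme: " + scheme);
        }
        return "opened via " + fs.scheme();
    }

    public static void main(String[] args) {
        installFlinkOnlyS3();
        // The Flink-style lookup finds the registered file system.
        System.out.println(flinkBackedSinkOpen("s3"));
        // The Hadoop-style lookup does not, because nothing was registered there.
        try {
            hadoopBackedSinkOpen("s3");
        } catch (IllegalStateException e) {
            System.out.println("Hadoop-backed sink failed: " + e.getMessage());
        }
    }
}
```

In this model, making the Hadoop-backed sink work would require either registering a Hadoop-style file system as well or rewriting the sink against the Flink-style interface, which corresponds to the 1.6 goal mentioned in the reply.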

> On 23. Feb 2018, at 06:31, Jamie Grier <[hidden email]> wrote:
>
> Is the `flink-connector-filesystem` connector supposed to work with the
> latest hadoop-free Flink releases, say along with the `flink-s3-fs-presto`
> filesystem implementation?
>
> -Jamie

Jamie Grier-2

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Thanks, Aljoscha :)

So is it possible to continue to use the new "native" filesystems along
with the BucketingSink by including the Hadoop dependencies only in the
user's uber jar? Or is that asking for trouble? Has anyone tried that
successfully?

-Jamie

On Fri, Feb 23, 2018 at 12:39 AM, Aljoscha Krettek <[hidden email]>
wrote:

> Hi,
>
> I'm afraid not, since the BucketingSink uses the Hadoop FileSystem
> directly and not the Flink FileSystem abstraction. The flink-s3-fs-*
> modules only provide Flink FileSystems.
>
> One of the goals for 1.6 is to provide a BucketingSink that uses the Flink
> FileSystem and also works well with eventually consistent file systems.
>
> --
> Aljoscha

Aljoscha Krettek-2

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Do you mean putting the Flink-native S3 filesystem in the user jar, or Hadoop in the user jar? The former wouldn't work, I think, because the FileSystems are initialised before the user jar is loaded. The latter might work, but only if you don't have Hadoop on the classpath, i.e. not on YARN and only on a Hadoop-free cluster. Maybe...

> On 23. Feb 2018, at 13:32, Jamie Grier <[hidden email]> wrote:
>
> Thanks, Aljoscha :)
>
> So is it possible to continue to use the new "native" filesystems along
> with the BucketingSink by including the Hadoop dependencies only in the
> user's uber jar? Or is that asking for trouble? Has anyone tried that
> successfully?
>
> -Jamie

Jamie Grier-2

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Yeah, I meant the latter, but it sounds like it could be asking for
trouble. I just like the idea of keeping the set of un-shaded JARs in the
flink/lib directory to a minimum.

Thanks.

On Fri, Feb 23, 2018 at 5:29 AM, Aljoscha Krettek <[hidden email]>
wrote:

> You mean putting the Flink-native S3 filesystem in the user jar or Hadoop
> in the user jar. The former wouldn't work, I think, because the FileSystems
> are being initialised before the user-jar is loaded. The latter might work
> but only if you don't have Hadoop in the classpath, i.e. not on YARN and
> only on a Hadoop-free cluster. Maybe...

Stephan Ewen

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Hi!

You could try to have Hadoop in your application JAR file, but I expect
trouble with s3a because of the specific way it does connection pooling.

Making the Bucketing Sink work with Flink's file systems (and thus with
Hadoop-free Flink) is super high up on the list as soon as the release is
out.

Stephan

On Fri, Feb 23, 2018 at 2:32 PM, Jamie Grier <[hidden email]> wrote:

> Yeah, I meant the latter, but it sounds like it could be asking for
> trouble. I just like the idea of keeping the set of un-shaded JARs in the
> flink/lib directory to a minimum.
>
> Thanks.