Hi, I'm trying to understand the difference between the flink-filesystem and flink-connector-filesystem. How is each intended to be used?
If adding support for a different storage provider that supports HDFS, should additions be made to one or the other, or both? Thanks. |
Hi,
the flink-connector-filesystem module contains the BucketingSink, a connector with which you can write your data to a file system. It provides exactly-once processing guarantees and allows you to write data to different buckets [1]. The flink-filesystem module contains the different file system implementations (e.g. MapR FS, HDFS, S3). If you want to use, for example, an S3 file system, there are the flink-s3-fs-hadoop and flink-s3-fs-presto modules. So if you want to write your data to S3 using the BucketingSink, you have to add flink-connector-filesystem for the BucketingSink as well as an S3 file system implementation (e.g. flink-s3-fs-hadoop or flink-s3-fs-presto). Usually, there should be no need to change Flink's file system implementations. If you want to add a new connector, it would go into flink-connectors or into Apache Bahir [2]. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/filesystem_sink.html [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/index.html#connectors-in-apache-bahir Cheers, Till |
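To make the module split concrete, a job writing to S3 with the BucketingSink pulls in both pieces. A sketch of the Maven dependencies (the version and Scala suffix are illustrative and must match your Flink release; note that for Flink 1.4 the S3 file system jar is often placed into the Flink distribution's lib/ directory instead of being bundled with the job):

```xml
<!-- Connector providing the BucketingSink (version/suffix illustrative) -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-filesystem_2.11</artifactId>
  <version>1.4.0</version>
</dependency>
<!-- File system implementation for the s3:// scheme;
     flink-s3-fs-presto is an alternative -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-s3-fs-hadoop</artifactId>
  <version>1.4.0</version>
</dependency>
```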
Thanks, is there a sample project that outputs a Flink example app (i.e. WordCount) to S3? |
In reply to this post by Till Rohrmann
Hi, question on this page:
"You need to point Flink to a valid Hadoop configuration..."
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html#s3-simple-storage-service
How do you point Flink to the Hadoop config? |
Hi, I'm adding support for more cloud storage providers such as Google (gcs://) and Oracle (oci://).
I have an oci:// test working based on the s3a:// test, but when I try it in an actual Flink job like WordCount, I get this message:
"org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'oci'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded."
How do I register new schemes into the file system factory? Thanks. |
Hi,
please have a look at this doc page [1]. It describes how to add new file system implementations and also how to configure them. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/filesystems.html#adding-new-file-system-implementations |
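The registration step that page describes boils down to a java.util.ServiceLoader entry. As a sketch (the package and class name OciFileSystemFactory are hypothetical), a jar providing an oci:// file system would contain a provider-configuration file named META-INF/services/org.apache.flink.core.fs.FileSystemFactory listing the fully qualified factory class:

```
# File: META-INF/services/org.apache.flink.core.fs.FileSystemFactory
org.example.fs.oci.OciFileSystemFactory
```

The ServiceLoader format allows `#` comments; everything else must be one fully qualified class name per line.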
Thanks. I'm looking at the S3 example and I can only find the S3FileSystemFactory but not the FileSystem implementation (a subclass of org.apache.flink.core.fs.FileSystem).
Is that requirement still needed? |
In fact, there are two S3FileSystemFactory classes, one for Hadoop and
another one for Presto. In both cases an external file system class is wrapped in Flink's HadoopFileSystem class [1] [2]. Best, Fabian [1] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/main/java/org/apache/flink/fs/s3hadoop/S3FileSystemFactory.java#L132 [2] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-presto/src/main/java/org/apache/flink/fs/s3presto/S3FileSystemFactory.java#L131 |
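Following that pattern, a factory for a new Hadoop-compatible scheme can wrap the provider's Hadoop FileSystem in Flink's adapter. The following is a rough sketch, not a definitive implementation: every oci-related name is hypothetical, and the exact FileSystemFactory interface (e.g. an additional configure() method) differs between Flink versions.

```java
package org.example.fs.oci;

import java.io.IOException;
import java.net.URI;

import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystemFactory;
import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;

// Sketch: OciHadoopFileSystem stands in for the storage provider's
// org.apache.hadoop.fs.FileSystem implementation for the oci:// scheme.
public class OciFileSystemFactory implements FileSystemFactory {

    @Override
    public String getScheme() {
        return "oci";
    }

    @Override
    public FileSystem create(URI fsUri) throws IOException {
        // Initialize the Hadoop-compatible file system and wrap it in
        // Flink's HadoopFileSystem adapter, as the S3 factories do.
        org.apache.hadoop.fs.FileSystem ociFs = new OciHadoopFileSystem();
        ociFs.initialize(fsUri, new org.apache.hadoop.conf.Configuration());
        return new HadoopFileSystem(ociFs);
    }
}
```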
Thanks. I now have the 3 requirements fulfilled but the scheme isn't being loaded; I get this error:
"Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'oci'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded."
What's the best way to debug the loading of the schemes/filesystems by the ServiceLoader? |
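One way to see what the ServiceLoader is (or isn't) picking up is to query it directly. The snippet below is a self-contained illustration of the failure mode, not Flink code: it declares a local stand-in for Flink's FileSystemFactory interface and shows that, without a matching META-INF/services entry on the classpath, ServiceLoader silently finds nothing, which is exactly what surfaces as the UnsupportedFileSystemSchemeException.

```java
import java.util.ServiceLoader;

public class ServiceLoaderDebug {

    // Local stand-in for org.apache.flink.core.fs.FileSystemFactory.
    public interface FileSystemFactory {
        String getScheme();
    }

    public static void main(String[] args) {
        // ServiceLoader scans the classpath for files named
        // META-INF/services/<fully qualified interface name>. No such
        // file exists for this local interface, so the iteration is empty.
        int found = 0;
        for (FileSystemFactory factory : ServiceLoader.load(FileSystemFactory.class)) {
            System.out.println("Discovered factory for scheme: " + factory.getScheme());
            found++;
        }
        System.out.println("Factories discovered: " + found);
    }
}
```

Running the same loop inside the job (against the real org.apache.flink.core.fs.FileSystemFactory) shows which schemes are actually visible on the classpath in use.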
Hi, just a bit more info, I have a test function working using oci://, based on the S3 test:
https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/test/java/org/apache/flink/fs/s3hadoop/HadoopS3FileSystemITCase.java#L169
However, when I try to get the WordCount example's writeAsText to write to my new file system:
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java#L82
that's where I get the "Could not find a file system implementation" error mentioned earlier. |
Ok, I have the factory working in the WordCount example. I had to move the factory code and META-INF into the WordCount project.
For general Flink jobs, I'm assuming the goal is to be able to import the factory from the job itself instead of needing to copy the factory .java file into each project? If so, any guidelines on how to do that? |
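For a reusable setup, the usual route (assuming Flink 1.4-style classpath loading; the jar name below is hypothetical) is to package the factory, its META-INF/services entry, and the wrapped Hadoop file system into one jar and install that into the Flink distribution's lib/ directory, so every job submitted to the cluster can resolve the scheme without bundling the factory source:

```shell
# Package the custom file system (factory + META-INF/services entry) once,
# then install it into the Flink distribution rather than into each job:
cp flink-fs-oci-1.0.jar "$FLINK_HOME/lib/"

# Restart the cluster so the jar is on the classpath; any job can then
# write to the scheme, e.g.:
"$FLINK_HOME/bin/flink" run examples/streaming/WordCount.jar \
    --output oci://my-bucket/wordcount-result
```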
Hi,
to do that you can set the env variable HADOOP_CONF_DIR: https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#hdfs

Best,
Gary

On Wed, Jan 17, 2018 at 2:27 AM, cw7k <[hidden email]> wrote:
> Hi, question on this page: "You need to point Flink to a valid Hadoop configuration..." https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html#s3-simple-storage-service
> How do you point Flink to the Hadoop config?
Great! Thanks for reporting back.
2018-01-19 1:43 GMT+01:00 cw7k <[hidden email]>:
> Ok, I have the factory working in the WordCount example. I had to move the factory code and META-INF into the WordCount project.
> For general Flink jobs, I'm assuming that the goal would be to be able to import the factory from the job itself instead of needing to copy the factory .java file into each project? If so, any guidelines on how to do that?
>
> On Thursday, January 18, 2018, 10:53:32 AM PST, cw7k <[hidden email]> wrote:
>
> Hi, just a bit more info, I have a test function working using oci://, based on the S3 test:
> https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/test/java/org/apache/flink/fs/s3hadoop/HadoopS3FileSystemITCase.java#L169
> However, when I try to get the WordCount example's writeAsText to write to my new filesystem:
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java#L82
> that's where I got the "Could not find a file system implementation" error mentioned earlier.
>
> On Thursday, January 18, 2018, 10:22:57 AM PST, cw7k <[hidden email]> wrote:
>
> Thanks. I now have the 3 requirements fulfilled but the scheme isn't being loaded; I get this error:
> "Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'oci'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded."
> What's the best way to debug the loading of the schemes/filesystems by the ServiceLoader?
>
> On Thursday, January 18, 2018, 5:09:10 AM PST, Fabian Hueske <[hidden email]> wrote:
>
> In fact, there are two S3FileSystemFactory classes, one for Hadoop and another one for Presto. In both cases an external file system class is wrapped in Flink's HadoopFileSystem class [1] [2].
>
> Best, Fabian
>
> [1] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/main/java/org/apache/flink/fs/s3hadoop/S3FileSystemFactory.java#L132
> [2] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-presto/src/main/java/org/apache/flink/fs/s3presto/S3FileSystemFactory.java#L131
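The reason moving META-INF into the WordCount project made it work is that factories are discovered through Java's standard ServiceLoader mechanism, so the provider-configuration file has to be visible on the job's classpath. The sketch below shows the mechanism itself with no Flink dependency; the interface and factory names are made-up stand-ins, not Flink's real FileSystemFactory. The entire registration is a file under META-INF/services named after the service interface, containing the implementing class's name:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Stand-in for a service interface such as Flink's FileSystemFactory.
    public interface FsFactorySketch {
        String getScheme();
    }

    // Stand-in for a custom factory, e.g. one handling an oci:// scheme.
    public static class OciFactorySketch implements FsFactorySketch {
        public String getScheme() { return "oci"; }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a jar's META-INF/services directory in a temp folder.
        Path root = Files.createTempDirectory("svc-demo");
        Path services = root.resolve("META-INF/services");
        Files.createDirectories(services);
        // File name = binary name of the interface; content = implementing class.
        Files.write(services.resolve(FsFactorySketch.class.getName()),
                    OciFactorySketch.class.getName().getBytes());

        // A class loader that sees the temp dir, like having the jar on the classpath.
        try (URLClassLoader cl = new URLClassLoader(
                new URL[]{root.toUri().toURL()},
                ServiceLoaderDemo.class.getClassLoader())) {
            for (FsFactorySketch f : ServiceLoader.load(FsFactorySketch.class, cl)) {
                System.out.println("discovered scheme: " + f.getScheme());
            }
        }
    }
}
```

If the services file is absent from the classpath the loop finds nothing, which matches the "no file system implementation for scheme" symptom: the class exists but is never registered. A quick way to debug is exactly such a ServiceLoader.load(...) loop, printing what the job's classpath actually exposes.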
Hi, I ran the WordCount batch program and noticed the output was split into 5 files. Is there documentation on how the splitting is done and how to tweak it?

On Friday, January 19, 2018, 12:06:45 AM PST, Fabian Hueske <[hidden email]> wrote:
> Great! Thanks for reporting back.
In DataSet (batch) programs, FileOutputFormats write one output file for each parallel operator instance. If your operator runs with a parallelism of 8, the output is split across 8 files.

2018-01-22 23:42 GMT+01:00 cw7k <[hidden email]>:
> Hi, I ran the WordCount batch program and noticed the output was split into 5 files. Is there documentation on how the splitting is done and how to tweak it?
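The one-file-per-parallel-instance behaviour can be sketched without Flink: route each record to one of `parallelism` writers, each appending to its own numbered file, which is why a run with parallelism 5 leaves 5 output files. This is plain JDK code with illustrative names, not the Flink API; in a real job, calling setParallelism(1) on the sink is the usual way to get a single output file instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Stream;

public class ParallelOutputDemo {
    public static void main(String[] args) throws IOException {
        int parallelism = 5;  // e.g. the job's default parallelism
        List<String> records = List.of("to", "be", "or", "not", "to", "be");

        Path outDir = Files.createTempDirectory("wordcount-out");
        // Record i goes to "parallel instance" i % parallelism; each instance
        // appends to its own file, analogous to each parallel sink instance.
        for (int i = 0; i < records.size(); i++) {
            int instance = i % parallelism;
            Path file = outDir.resolve(String.valueOf(instance + 1));
            Files.write(file, (records.get(i) + System.lineSeparator()).getBytes(),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        try (Stream<Path> files = Files.list(outDir)) {
            System.out.println("output files: " + files.count());
        }
    }
}
```

With 6 records and parallelism 5, all 5 instances receive at least one record, so the directory ends up with 5 files, mirroring the 5-file WordCount output observed above.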