Reading parquet from S3


Kirill Polunin
Hi,
I'm writing an article about Apache Flink and wanted to ask: does Apache
Flink support reading Parquet files in batch mode? Our team is going to
read Parquet files from S3. I couldn't find a page on this in Flink's
tutorials. I would be grateful for any help. Thank you.


Re: Reading parquet from S3

Jingsong Li
Hi Kirill,

For the DataSet API, yes: we have "ParquetRowInputFormat",
"ParquetPojoInputFormat" and "ParquetMapInputFormat" [1].

For Parquet files on a file system, read as a table:
- Before 1.11, you can register a ParquetTableSource in a batch table
environment with the legacy planner.
- In 1.11, you can create a file system table with the Parquet format and
read from it with SQL.
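
To illustrate the 1.11 option, here is a minimal sketch of what such a table
declaration might look like. It only builds the DDL text in plain Java; the
table name, the columns and the S3 path are hypothetical, and it assumes the
flink-parquet format and one of Flink's S3 file system plugins are available
to the job.

```java
// Sketch only: builds the DDL text for a file system table in Parquet format.
// The table name, column list and S3 path are hypothetical placeholders.
final class ParquetTableDdl {

    /** Returns a CREATE TABLE statement for a filesystem table stored as Parquet. */
    static String createTableDdl(String tableName, String path) {
        return String.join("\n",
            "CREATE TABLE " + tableName + " (",
            "  user_id BIGINT,",
            "  event_type STRING,",
            "  event_time TIMESTAMP(3)",
            ") WITH (",
            "  'connector' = 'filesystem',",
            "  'path' = '" + path + "',",
            "  'format' = 'parquet'",
            ")");
    }

    public static void main(String[] args) {
        String ddl = createTableDdl("events", "s3://my-bucket/events/");
        System.out.println(ddl);
        // With Flink 1.11 on the classpath, this statement would be passed to
        // TableEnvironment#executeSql(ddl), after which the table can be
        // queried with a regular SELECT in batch mode.
    }
}
```

The declared columns have to match the schema of the Parquet files, so the
three columns above would need to be replaced with the real ones.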

For Hive tables, we support reading tables stored in Parquet format.

[1]
https://github.com/apache/flink/tree/38e5e8161a9c763cf7df3b642830b5a97371bb00/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet

Best,
Jingsong Lee
