[DISCUSS]Some thoughts about CatalogPartitionSpec

Jun Zhang
Hello dev,

I have run into a problem with the method
"Catalog#listPartitions(ObjectPath, CatalogPartitionSpec)".
The partitionSpec field in CatalogPartitionSpec is of type
Map<String, String>. That is no problem for HiveCatalog, but my subclass of
Catalog needs precisely typed values. For example, if the partition column
is of type int, passing in "123" does not work.

Would it be more reasonable and more general to change the partitionSpec
field of Flink's CatalogPartitionSpec to Map<String, Object>?

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

Jark Wu-2
Hi Jun,

I'm curious why it doesn't work when the value is represented as a string.
You can get the field type from CatalogTable#getSchema(),
then parse/cast the partition value to the type you want.
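
A minimal sketch of that parse/cast step, written against plain Java
stand-ins rather than Flink's classes so that it runs on its own. The
names `PartitionSpecCaster`, `ColumnType`, `cast` and `toTyped` are
hypothetical; in a real catalog the column types would come from
CatalogTable#getSchema() and the string values from the
CatalogPartitionSpec.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionSpecCaster {

    /** Hypothetical stand-in for the column types read from the table schema. */
    enum ColumnType { INT, BIGINT, BOOLEAN, STRING }

    /** Parses one string partition value into the Java object for its column type. */
    static Object cast(String raw, ColumnType type) {
        switch (type) {
            case INT:
                return Integer.valueOf(raw);
            case BIGINT:
                return Long.valueOf(raw);
            case BOOLEAN:
                return Boolean.valueOf(raw);
            case STRING:
                return raw;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    /** Converts a string-valued partition spec into a typed one, column by column. */
    static Map<String, Object> toTyped(Map<String, String> spec,
                                       Map<String, ColumnType> partitionColumnTypes) {
        Map<String, Object> typed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : spec.entrySet()) {
            ColumnType type = partitionColumnTypes.get(entry.getKey());
            if (type == null) {
                throw new IllegalArgumentException(
                        "Unknown partition column: " + entry.getKey());
            }
            typed.put(entry.getKey(), cast(entry.getValue(), type));
        }
        return typed;
    }

    public static void main(String[] args) {
        // Assumed example data: one int partition column and one string column.
        Map<String, ColumnType> types = new LinkedHashMap<>();
        types.put("id", ColumnType.INT);
        types.put("region", ColumnType.STRING);

        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("id", "123");
        spec.put("region", "eu");

        Map<String, Object> typed = toTyped(spec, types);
        System.out.println(typed.get("id") instanceof Integer);
    }
}
```

With this shape the string-based spec stays the interchange format, and
only the catalog implementation that needs typed values pays for the
conversion.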

Best,
Jark


On Tue, 5 Jan 2021 at 13:43, Jun Zhang <[hidden email]> wrote:

> Hello dev,
>
> I have run into a problem with the method
> "Catalog#listPartitions(ObjectPath, CatalogPartitionSpec)".
> The partitionSpec field in CatalogPartitionSpec is of type
> Map<String, String>. That is no problem for HiveCatalog, but my subclass of
> Catalog needs precisely typed values. For example, if the partition column
> is of type int, passing in "123" does not work.
>
> Would it be more reasonable and more general to change the partitionSpec
> field of Flink's CatalogPartitionSpec to Map<String, Object>?
>

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

Jun Zhang
Hi Jark,

If the partition column is of type int and we pass in a string, the system
throws a type-mismatch exception. We can indeed cast after getting the
schema, but I think that if CatalogPartitionSpec#partitionSpec were of type
Map<String, Object>, no cast would be needed, and it would be more general
and more compatible.

On Tue, 5 Jan 2021 at 13:47, Jark Wu <[hidden email]> wrote:

> Hi Jun,
>
> I'm curious why it doesn't work when the value is represented as a string.
> You can get the field type from CatalogTable#getSchema(),
> then parse/cast the partition value to the type you want.
>
> Best,
> Jark

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

Jark Wu-2
Hi Jun,

AFAIK, the main reason for using Map<String, String> is that it is easy to
serialize and deserialize.
For example, if we used Java `LocalDateTime` instead of String to represent
a TIMESTAMP partition value, users might deserialize it into a Java
`Timestamp` and hand that to the Flink framework, which may cause problems.
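
As a rough illustration of that concern, the sketch below decodes the same
string partition value into the two Java types mentioned above. The class
name `TimestampAmbiguity`, its helper methods, and the sample value are
assumptions made for the example; this is not Flink code.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class TimestampAmbiguity {

    /** Decodes "yyyy-MM-dd HH:mm:ss" text as a java.time.LocalDateTime. */
    static LocalDateTime asLocalDateTime(String raw) {
        // LocalDateTime.parse expects ISO-8601, so swap the space for 'T'.
        return LocalDateTime.parse(raw.replace(' ', 'T'));
    }

    /** Decodes the same text as a java.sql.Timestamp. */
    static Timestamp asSqlTimestamp(String raw) {
        return Timestamp.valueOf(raw);
    }

    public static void main(String[] args) {
        String raw = "2021-01-05 13:43:00"; // assumed sample partition value

        Object local = asLocalDateTime(raw);
        Object sql = asSqlTimestamp(raw);

        // Both objects describe the same wall-clock time, but they are
        // different classes, so a consumer expecting one cannot accept the other.
        System.out.println(local.getClass().getName());
        System.out.println(sql.getClass().getName());
        System.out.println(local.equals(sql));
    }
}
```

The string form itself is unambiguous to store and transmit; the ambiguity
only appears once a component picks a Java type for it.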

Re: "the system will throw an exception that the type does not match",
could your system store partition values as strings?

Best,
Jark

On Tue, 5 Jan 2021 at 14:09, Jun Zhang <[hidden email]> wrote:

> Hi Jark,
>
> If the partition column is of type int and we pass in a string, the system
> throws a type-mismatch exception. We can indeed cast after getting the
> schema, but I think that if CatalogPartitionSpec#partitionSpec were of type
> Map<String, Object>, no cast would be needed, and it would be more general
> and more compatible.

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

Jun Zhang
Hi Jark,

Thanks for your explanation. I am working on the integration of Flink and
Iceberg. Iceberg partition values need to have the exact type, and I cannot
change that.

I will follow your suggestion: get the column type from the schema and then
do the cast.

On Tue, 5 Jan 2021 at 15:05, Jark Wu <[hidden email]> wrote:

> Hi Jun,
>
> AFAIK, the main reason for using Map<String, String> is that it is easy to
> serialize and deserialize.
> For example, if we used Java `LocalDateTime` instead of String to represent
> a TIMESTAMP partition value, users might deserialize it into a Java
> `Timestamp` and hand that to the Flink framework, which may cause problems.
>
> Re: "the system will throw an exception that the type does not match",
> could your system store partition values as strings?
>
> Best,
> Jark