[DISCUSS] Primary keys in Table API

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Primary keys in Table API

dwysakowicz
Hi,

I wanted to bring up the topic of primary key constraints that we could
leverage for runtime optimizations. Please have a look at the proposal
and I would especially draw your attention to the topic of nullability
of columns that are part of a primary key. Some time in the future we
will also be able to leverage it for upserting streaming tables.

Here is the proposal:
https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API

Best,

Dawid



signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

Jark Wu-2
Hi Dawid,

Thanks for driving this. Primary key is a very important and useful feature
to Flink Table API / SQL.

I'm +1 to the general design.

And have some thoughts as following:

1. > The design says Flink only support NOT ENFORCED mode. But the DDL
and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
    What will we do when user specify an ENFORCED mode?  Shall we forbid
it?

2. > "When creating a table, creating a primary key constraint will alter
the columns nullability."
    I think we should force the columns to be declared NOT NULL, instead of
change the columns nullability implicitly.
    Otherwise, it may be confused what is the nullbility of the primary key
columns.

3. > "Support for Unique key is not part of the FLIP. It is just mentioned
to show how can we extend the primary key concept with more constraints in
the future."
    I think we can include Unique Key as part of the FLIP, as it is already
stated clearly. We can put the implemenation task of unique key in a lower
priority.

4. > Method for retrieving primary key constraint is in CatalogBaseTable.
    In SQL standard, primary key is declared in line or out of line. If it
is out of line, it is still in the schema part, i.e. besides column
definitions and in the parentheses.
    So I think maybe primary key should belong to `TableSchema` becuase it
is a part of schema.

    CREATE TABLE xxx (
       a int,
       b varchar,
       c bigint,
       primary key (a, b)
    ) with (...)


Best,
Jark


On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <[hidden email]>
wrote:

> Hi,
>
> I wanted to bring up the topic of primary key constraints that we could
> leverage for runtime optimizations. Please have a look at the proposal
> and I would especially draw your attention to the topic of nullability
> of columns that are part of a primary key. Some time in the future we
> will also be able to leverage it for upserting streaming tables.
>
> Here is the proposal:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
>
> Best,
>
> Dawid
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

dwysakowicz

Hi Jark,

Thank you for your comments.

Ad. 1 My intention was to throw an exception if a user tries to use the ENFORCE mode. SQL standard states this is the default mode, so if we ever want to support the ENFORCE/NOT ENFORCE mode we need to make sure users now always provide the NOT ENFORCE flag. Does it make sense to you? If it does, I will clarify the FLIP on this topic.

Ad. 2 This is the behavior of majority of RDBM systems. I think this is quite useful from the usability perspective. A column is nullable by default. It is also a good practice to always define PRIMARY KEY for a table. Therefore we would force users to add additional annotations. E.g. instead of: CREATE TABLE tableName(id INT PRIMARY KEY); user would always have to type CREATE TABLE tableName(id INT NOT NULL PRIMARY KEY);

I am not totally against it, if this is the preferred way.

Ad. 3 The only reason why I didn't want to add a full UNIQUE support is that we would have to decide how do we handle null for the UNIQUE constraint. Some databases allow multiple rows to have a null in a column with unique constraint, whereas other allow only one. That means if you have a table CREATE TABLE tableName (id INTEGER UNIQUE); In some databases it is legal to have rows (null; null; null; null; ....) while in others you can have at most (null). Do you think we should still discuss this as well?

Ad. 4 I don't have a strong opinion on that. Actually SQL standard describes a Table descriptor. Some excerpts from the standard:

— The name of the base table.
— An indication of whether the table is a regular persistent base table, a system-versioned table, a global
temporary table, a created local temporary table, or a declared local temporary table.

...

— An indication of whether the table is insertable-into or not.

— The column descriptor of each column in the table.

— The descriptor of each table constraint specified for the table.

In this context I thought primary key belongs to the Table rather than schema, where schema is as far as I understand just a list of Table column descriptors. Tbh the TableSchema does not fit very well into that description of the descriptor. If you have a good reasons why you think it would be better to have it in the TableSchema, I would be fine with it. We should though have a clear separation what belongs to the Table schema and what does not (e.g. partitions are in the Catalog table now).

Best,

Dawid

On 20/11/2019 04:32, Jark Wu wrote:
Hi Dawid,

Thanks for driving this. Primary key is a very important and useful feature
to Flink Table API / SQL.

I'm +1 to the general design.

And have some thoughts as following:

1. > The design says Flink only support NOT ENFORCED mode. But the DDL
and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
    What will we do when user specify an ENFORCED mode?  Shall we forbid
it?

2. > "When creating a table, creating a primary key constraint will alter
the columns nullability."
    I think we should force the columns to be declared NOT NULL, instead of
change the columns nullability implicitly.
    Otherwise, it may be confused what is the nullbility of the primary key
columns.

3. > "Support for Unique key is not part of the FLIP. It is just mentioned
to show how can we extend the primary key concept with more constraints in
the future."
    I think we can include Unique Key as part of the FLIP, as it is already
stated clearly. We can put the implemenation task of unique key in a lower
priority.

4. > Method for retrieving primary key constraint is in CatalogBaseTable.
    In SQL standard, primary key is declared in line or out of line. If it
is out of line, it is still in the schema part, i.e. besides column
definitions and in the parentheses.
    So I think maybe primary key should belong to `TableSchema` becuase it
is a part of schema.

    CREATE TABLE xxx (
       a int,
       b varchar,
       c bigint,
       primary key (a, b)
    ) with (...)


Best,
Jark


On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz [hidden email]
wrote:

Hi,

I wanted to bring up the topic of primary key constraints that we could
leverage for runtime optimizations. Please have a look at the proposal
and I would especially draw your attention to the topic of nullability
of columns that are part of a primary key. Some time in the future we
will also be able to leverage it for upserting streaming tables.

Here is the proposal:

https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API

Best,

Dawid




    

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

dwysakowicz
In reply to this post by Jark Wu-2
Hey,

After an offline discussion I updated the FLIP slightly. In the initial
FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
into the statement paranthesis. I also moved the primary key to the
TableSchema class.

I also extended sections about UNIQUE key support and how do the primary
keys affect sources and sinks.

Please have another look. If there are no further concerns, I would like
to start the vote soon.

Best,

Dawid

On 20/11/2019 04:32, Jark Wu wrote:

> Hi Dawid,
>
> Thanks for driving this. Primary key is a very important and useful feature
> to Flink Table API / SQL.
>
> I'm +1 to the general design.
>
> And have some thoughts as following:
>
> 1. > The design says Flink only support NOT ENFORCED mode. But the DDL
> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
>     What will we do when user specify an ENFORCED mode?  Shall we forbid
> it?
>
> 2. > "When creating a table, creating a primary key constraint will alter
> the columns nullability."
>     I think we should force the columns to be declared NOT NULL, instead of
> change the columns nullability implicitly.
>     Otherwise, it may be confused what is the nullbility of the primary key
> columns.
>
> 3. > "Support for Unique key is not part of the FLIP. It is just mentioned
> to show how can we extend the primary key concept with more constraints in
> the future."
>     I think we can include Unique Key as part of the FLIP, as it is already
> stated clearly. We can put the implemenation task of unique key in a lower
> priority.
>
> 4. > Method for retrieving primary key constraint is in CatalogBaseTable.
>     In SQL standard, primary key is declared in line or out of line. If it
> is out of line, it is still in the schema part, i.e. besides column
> definitions and in the parentheses.
>     So I think maybe primary key should belong to `TableSchema` becuase it
> is a part of schema.
>
>     CREATE TABLE xxx (
>        a int,
>        b varchar,
>        c bigint,
>        primary key (a, b)
>     ) with (...)
>
>
> Best,
> Jark
>
>
> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Hi,
>>
>> I wanted to bring up the topic of primary key constraints that we could
>> leverage for runtime optimizations. Please have a look at the proposal
>> and I would especially draw your attention to the topic of nullability
>> of columns that are part of a primary key. Some time in the future we
>> will also be able to leverage it for upserting streaming tables.
>>
>> Here is the proposal:
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
>>
>> Best,
>>
>> Dawid
>>
>>
>>


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

Kurt Young
Hi all,

Thanks Dawid for bringing this up, this is very important property of SQL
and
I'm big +1 to have this.

But before the discussion going deeply, I want to first mention that
according to
SQL standard, any unique key constraint which including primary key, should
be
always be enforced.

Please see:

4.18.3.1 Introduction to table constraints

"NOTE 50 — A unique constraint is always enforced. A table check constraint
or
a referential constraint can either be enforced or not enforced."

Best,
Kurt


On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Hey,
>
> After an offline discussion I updated the FLIP slightly. In the initial
> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
> into the statement paranthesis. I also moved the primary key to the
> TableSchema class.
>
> I also extended sections about UNIQUE key support and how do the primary
> keys affect sources and sinks.
>
> Please have another look. If there are no further concerns, I would like
> to start the vote soon.
>
> Best,
>
> Dawid
>
> On 20/11/2019 04:32, Jark Wu wrote:
> > Hi Dawid,
> >
> > Thanks for driving this. Primary key is a very important and useful
> feature
> > to Flink Table API / SQL.
> >
> > I'm +1 to the general design.
> >
> > And have some thoughts as following:
> >
> > 1. > The design says Flink only support NOT ENFORCED mode. But the DDL
> > and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
> >     What will we do when user specify an ENFORCED mode?  Shall we forbid
> > it?
> >
> > 2. > "When creating a table, creating a primary key constraint will alter
> > the columns nullability."
> >     I think we should force the columns to be declared NOT NULL, instead
> of
> > change the columns nullability implicitly.
> >     Otherwise, it may be confused what is the nullbility of the primary
> key
> > columns.
> >
> > 3. > "Support for Unique key is not part of the FLIP. It is just
> mentioned
> > to show how can we extend the primary key concept with more constraints
> in
> > the future."
> >     I think we can include Unique Key as part of the FLIP, as it is
> already
> > stated clearly. We can put the implemenation task of unique key in a
> lower
> > priority.
> >
> > 4. > Method for retrieving primary key constraint is in CatalogBaseTable.
> >     In SQL standard, primary key is declared in line or out of line. If
> it
> > is out of line, it is still in the schema part, i.e. besides column
> > definitions and in the parentheses.
> >     So I think maybe primary key should belong to `TableSchema` becuase
> it
> > is a part of schema.
> >
> >     CREATE TABLE xxx (
> >        a int,
> >        b varchar,
> >        c bigint,
> >        primary key (a, b)
> >     ) with (...)
> >
> >
> > Best,
> > Jark
> >
> >
> > On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <[hidden email]>
> > wrote:
> >
> >> Hi,
> >>
> >> I wanted to bring up the topic of primary key constraints that we could
> >> leverage for runtime optimizations. Please have a look at the proposal
> >> and I would especially draw your attention to the topic of nullability
> >> of columns that are part of a primary key. Some time in the future we
> >> will also be able to leverage it for upserting streaming tables.
> >>
> >> Here is the proposal:
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >>
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

dwysakowicz
Hi Kurt,

You are right that this is included in the SQL standard.

The problem is we are not a database and we do not own the data. There
is no way for us to truly enforce the uniqueness constraint. We are
always writing to some external systems that, moreover we are not an
exclusive produce/consumer for that data. We cannot ensure that external
systems will not corrupt the data.

The way RDBM systems ensure uniqueness constraint is that they keep the
index of all keys for a table that they check against for every write.
This works only if the system that maintains the index is the only
writer to that table.

Moreover some of the systems also allow loosening this restriction e.g.

DB2:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html

HIVE(with a non standard syntax, they do it for the same reasons as
listed above):
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints


What do you think?

Best,

Dawid


On 20/11/2019 13:51, Kurt Young wrote:

> Hi all,
>
> Thanks Dawid for bringing this up, this is very important property of SQL
> and
> I'm big +1 to have this.
>
> But before the discussion going deeply, I want to first mention that
> according to
> SQL standard, any unique key constraint which including primary key, should
> be
> always be enforced.
>
> Please see:
>
> 4.18.3.1 Introduction to table constraints
>
> "NOTE 50 — A unique constraint is always enforced. A table check constraint
> or
> a referential constraint can either be enforced or not enforced."
>
> Best,
> Kurt
>
>
> On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Hey,
>>
>> After an offline discussion I updated the FLIP slightly. In the initial
>> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
>> into the statement paranthesis. I also moved the primary key to the
>> TableSchema class.
>>
>> I also extended sections about UNIQUE key support and how do the primary
>> keys affect sources and sinks.
>>
>> Please have another look. If there are no further concerns, I would like
>> to start the vote soon.
>>
>> Best,
>>
>> Dawid
>>
>> On 20/11/2019 04:32, Jark Wu wrote:
>>> Hi Dawid,
>>>
>>> Thanks for driving this. Primary key is a very important and useful
>> feature
>>> to Flink Table API / SQL.
>>>
>>> I'm +1 to the general design.
>>>
>>> And have some thoughts as following:
>>>
>>> 1. > The design says Flink only support NOT ENFORCED mode. But the DDL
>>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
>>>     What will we do when user specify an ENFORCED mode?  Shall we forbid
>>> it?
>>>
>>> 2. > "When creating a table, creating a primary key constraint will alter
>>> the columns nullability."
>>>     I think we should force the columns to be declared NOT NULL, instead
>> of
>>> change the columns nullability implicitly.
>>>     Otherwise, it may be confused what is the nullbility of the primary
>> key
>>> columns.
>>>
>>> 3. > "Support for Unique key is not part of the FLIP. It is just
>> mentioned
>>> to show how can we extend the primary key concept with more constraints
>> in
>>> the future."
>>>     I think we can include Unique Key as part of the FLIP, as it is
>> already
>>> stated clearly. We can put the implemenation task of unique key in a
>> lower
>>> priority.
>>>
>>> 4. > Method for retrieving primary key constraint is in CatalogBaseTable.
>>>     In SQL standard, primary key is declared in line or out of line. If
>> it
>>> is out of line, it is still in the schema part, i.e. besides column
>>> definitions and in the parentheses.
>>>     So I think maybe primary key should belong to `TableSchema` becuase
>> it
>>> is a part of schema.
>>>
>>>     CREATE TABLE xxx (
>>>        a int,
>>>        b varchar,
>>>        c bigint,
>>>        primary key (a, b)
>>>     ) with (...)
>>>
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <[hidden email]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wanted to bring up the topic of primary key constraints that we could
>>>> leverage for runtime optimizations. Please have a look at the proposal
>>>> and I would especially draw your attention to the topic of nullability
>>>> of columns that are part of a primary key. Some time in the future we
>>>> will also be able to leverage it for upserting streaming tables.
>>>>
>>>> Here is the proposal:
>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>>
>>


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

Kurt Young
Hi Dawid,

Actually I'm not 100% sure about both choices: always enforce primary key
by Flink and not enforce at all.

As you said, always enforce primary key is hard to achieve since Flink
doesn't
own data most of the time. But with some certain cases, it's actually
doable, like
writing a new table in a batch job, or if we have upsert flag supported in
message
queue. So it's hard to say Flink has no chance to enforce primary key here.

On the other hand, not enforce primary key indeed save us a lot of work,
but also
would cause confusing sometimes. Especially when the table is written and
read
by Flink only. For example, some user created a table with primary key not
enforced,
and they will use it as source and sink table. This situation is quite
confusing, because
Flink didn't enforce primary key when writing, so it's hard to say we can
rely on such
information during read. It looks we are contradictory with ourself.

AFAIK, most databases will only do validity check only when writing to
table, and
simply trust such information during read. I think we could also follow
this way. Since
we are not sure yet what kind of behavior when writing to a table with
primary key, I would
suggest to defer this util we start to talk about the source and sink
behavior with primary
key, maybe as well as introducing update, delete flag.

To summarize, my proposal would be:
1. Add primary key but not introducing enforce flag now
2. Table with primary key can't be used as sink, and we will simply trust
pk info during read.

Best,
Kurt


On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi Kurt,
>
> You are right that this is included in the SQL standard.
>
> The problem is we are not a database and we do not own the data. There
> is no way for us to truly enforce the uniqueness constraint. We are
> always writing to some external systems that, moreover we are not an
> exclusive produce/consumer for that data. We cannot ensure that external
> systems will not corrupt the data.
>
> The way RDBM systems ensure uniqueness constraint is that they keep the
> index of all keys for a table that they check against for every write.
> This works only if the system that maintains the index is the only
> writer to that table.
>
> Moreover some of the systems also allow loosening this restriction e.g.
>
> DB2:
>
> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
>
> HIVE(with a non standard syntax, they do it for the same reasons as
> listed above):
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
>
>
> What do you think?
>
> Best,
>
> Dawid
>
>
> On 20/11/2019 13:51, Kurt Young wrote:
> > Hi all,
> >
> > Thanks Dawid for bringing this up, this is very important property of SQL
> > and
> > I'm big +1 to have this.
> >
> > But before the discussion going deeply, I want to first mention that
> > according to
> > SQL standard, any unique key constraint which including primary key,
> should
> > be
> > always be enforced.
> >
> > Please see:
> >
> > 4.18.3.1 Introduction to table constraints
> >
> > "NOTE 50 — A unique constraint is always enforced. A table check
> constraint
> > or
> > a referential constraint can either be enforced or not enforced."
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <[hidden email]
> >
> > wrote:
> >
> >> Hey,
> >>
> >> After an offline discussion I updated the FLIP slightly. In the initial
> >> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
> >> into the statement paranthesis. I also moved the primary key to the
> >> TableSchema class.
> >>
> >> I also extended sections about UNIQUE key support and how do the primary
> >> keys affect sources and sinks.
> >>
> >> Please have another look. If there are no further concerns, I would like
> >> to start the vote soon.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 20/11/2019 04:32, Jark Wu wrote:
> >>> Hi Dawid,
> >>>
> >>> Thanks for driving this. Primary key is a very important and useful
> >> feature
> >>> to Flink Table API / SQL.
> >>>
> >>> I'm +1 to the general design.
> >>>
> >>> And have some thoughts as following:
> >>>
> >>> 1. > The design says Flink only support NOT ENFORCED mode. But the DDL
> >>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
> >>>     What will we do when user specify an ENFORCED mode?  Shall we
> forbid
> >>> it?
> >>>
> >>> 2. > "When creating a table, creating a primary key constraint will
> alter
> >>> the columns nullability."
> >>>     I think we should force the columns to be declared NOT NULL,
> instead
> >> of
> >>> change the columns nullability implicitly.
> >>>     Otherwise, it may be confused what is the nullbility of the primary
> >> key
> >>> columns.
> >>>
> >>> 3. > "Support for Unique key is not part of the FLIP. It is just
> >> mentioned
> >>> to show how can we extend the primary key concept with more constraints
> >> in
> >>> the future."
> >>>     I think we can include Unique Key as part of the FLIP, as it is
> >> already
> >>> stated clearly. We can put the implemenation task of unique key in a
> >> lower
> >>> priority.
> >>>
> >>> 4. > Method for retrieving primary key constraint is in
> CatalogBaseTable.
> >>>     In SQL standard, primary key is declared in line or out of line. If
> >> it
> >>> is out of line, it is still in the schema part, i.e. besides column
> >>> definitions and in the parentheses.
> >>>     So I think maybe primary key should belong to `TableSchema` becuase
> >> it
> >>> is a part of schema.
> >>>
> >>>     CREATE TABLE xxx (
> >>>        a int,
> >>>        b varchar,
> >>>        c bigint,
> >>>        primary key (a, b)
> >>>     ) with (...)
> >>>
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <[hidden email]
> >
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I wanted to bring up the topic of primary key constraints that we
> could
> >>>> leverage for runtime optimizations. Please have a look at the proposal
> >>>> and I would especially draw your attention to the topic of nullability
> >>>> of columns that are part of a primary key. Some time in the future we
> >>>> will also be able to leverage it for upserting streaming tables.
> >>>>
> >>>> Here is the proposal:
> >>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
> >>>> Best,
> >>>>
> >>>> Dawid
> >>>>
> >>>>
> >>>>
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

Kurt Young
Hi all,

After some offline discussion with Dawid, I changed my mind. We still agree
on that
we should only support reading from tables with primary key, forbid writing
to such table.

But the enforce flag should be introduced with primary key, and probably
should be
`NOR ENFORCE` be default. The problem with making `ENFORCE` by default is
when
creating a table with primary key constraint. If we follow the SQL standard
here, we should
do the validity check during the creating operation, and this will
introduce a really big
burden here and might make supporting this feature not feasible.

Best,
Kurt


On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <[hidden email]> wrote:

> Hi Dawid,
>
> Actually I'm not 100% sure about both choices: always enforce primary key
> by Flink and not enforce at all.
>
> As you said, always enforce primary key is hard to achieve since Flink
> doesn't
> own data most of the time. But with some certain cases, it's actually
> doable, like
> writing a new table in a batch job, or if we have upsert flag supported in
> message
> queue. So it's hard to say Flink has no chance to enforce primary key
> here.
>
> On the other hand, not enforce primary key indeed save us a lot of work,
> but also
> would cause confusing sometimes. Especially when the table is written and
> read
> by Flink only. For example, some user created a table with primary key not
> enforced,
> and they will use it as source and sink table. This situation is quite
> confusing, because
> Flink didn't enforce primary key when writing, so it's hard to say we can
> rely on such
> information during read. It looks we are contradictory with ourself.
>
> AFAIK, most databases will only do validity check only when writing to
> table, and
> simply trust such information during read. I think we could also follow
> this way. Since
> we are not sure yet what kind of behavior when writing to a table with
> primary key, I would
> suggest to defer this util we start to talk about the source and sink
> behavior with primary
> key, maybe as well as introducing update, delete flag.
>
> To summarize, my proposal would be:
> 1. Add primary key but not introducing enforce flag now
> 2. Table with primary key can't be used as sink, and we will simply trust
> pk info during read.
>
> Best,
> Kurt
>
>
> On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Hi Kurt,
>>
>> You are right that this is included in the SQL standard.
>>
>> The problem is we are not a database and we do not own the data. There
>> is no way for us to truly enforce the uniqueness constraint. We are
>> always writing to some external systems that, moreover we are not an
>> exclusive produce/consumer for that data. We cannot ensure that external
>> systems will not corrupt the data.
>>
>> The way RDBM systems ensure uniqueness constraint is that they keep the
>> index of all keys for a table that they check against for every write.
>> This works only if the system that maintains the index is the only
>> writer to that table.
>>
>> Moreover some of the systems also allow loosening this restriction e.g.
>>
>> DB2:
>>
>> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
>>
>> HIVE(with a non standard syntax, they do it for the same reasons as
>> listed above):
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
>>
>>
>> What do you think?
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 20/11/2019 13:51, Kurt Young wrote:
>> > Hi all,
>> >
>> > Thanks Dawid for bringing this up, this is very important property of
>> SQL
>> > and
>> > I'm big +1 to have this.
>> >
>> > But before the discussion going deeply, I want to first mention that
>> > according to
>> > SQL standard, any unique key constraint which including primary key,
>> should
>> > be
>> > always be enforced.
>> >
>> > Please see:
>> >
>> > 4.18.3.1 Introduction to table constraints
>> >
>> > "NOTE 50 — A unique constraint is always enforced. A table check
>> constraint
>> > or
>> > a referential constraint can either be enforced or not enforced."
>> >
>> > Best,
>> > Kurt
>> >
>> >
>> > On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <
>> [hidden email]>
>> > wrote:
>> >
>> >> Hey,
>> >>
>> >> After an offline discussion I updated the FLIP slightly. In the initial
>> >> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
>> >> into the statement paranthesis. I also moved the primary key to the
>> >> TableSchema class.
>> >>
>> >> I also extended sections about UNIQUE key support and how do the
>> primary
>> >> keys affect sources and sinks.
>> >>
>> >> Please have another look. If there are no further concerns, I would
>> like
>> >> to start the vote soon.
>> >>
>> >> Best,
>> >>
>> >> Dawid
>> >>
>> >> On 20/11/2019 04:32, Jark Wu wrote:
>> >>> Hi Dawid,
>> >>>
>> >>> Thanks for driving this. Primary key is a very important and useful
>> >> feature
>> >>> to Flink Table API / SQL.
>> >>>
>> >>> I'm +1 to the general design.
>> >>>
>> >>> And have some thoughts as following:
>> >>>
>> >>> 1. > The design says Flink only support NOT ENFORCED mode. But the DDL
>> >>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
>> >>>     What will we do when user specify an ENFORCED mode?  Shall we
>> forbid
>> >>> it?
>> >>>
>> >>> 2. > "When creating a table, creating a primary key constraint will
>> alter
>> >>> the columns nullability."
>> >>>     I think we should force the columns to be declared NOT NULL,
>> instead
>> >> of
>> >>> change the columns nullability implicitly.
>> >>>     Otherwise, it may be confused what is the nullbility of the
>> primary
>> >> key
>> >>> columns.
>> >>>
>> >>> 3. > "Support for Unique key is not part of the FLIP. It is just
>> >> mentioned
>> >>> to show how can we extend the primary key concept with more
>> constraints
>> >> in
>> >>> the future."
>> >>>     I think we can include Unique Key as part of the FLIP, as it is
>> >> already
>> >>> stated clearly. We can put the implemenation task of unique key in a
>> >> lower
>> >>> priority.
>> >>>
>> >>> 4. > Method for retrieving primary key constraint is in
>> CatalogBaseTable.
>> >>>     In SQL standard, primary key is declared in line or out of line.
>> If
>> >> it
>> >>> is out of line, it is still in the schema part, i.e. besides column
>> >>> definitions and in the parentheses.
>> >>>     So I think maybe primary key should belong to `TableSchema`
>> becuase
>> >> it
>> >>> is a part of schema.
>> >>>
>> >>>     CREATE TABLE xxx (
>> >>>        a int,
>> >>>        b varchar,
>> >>>        c bigint,
>> >>>        primary key (a, b)
>> >>>     ) with (...)
>> >>>
>> >>>
>> >>> Best,
>> >>> Jark
>> >>>
>> >>>
>> >>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <
>> [hidden email]>
>> >>> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I wanted to bring up the topic of primary key constraints that we
>> could
>> >>>> leverage for runtime optimizations. Please have a look at the
>> proposal
>> >>>> and I would especially draw your attention to the topic of
>> nullability
>> >>>> of columns that are part of a primary key. Some time in the future we
>> >>>> will also be able to leverage it for upserting streaming tables.
>> >>>>
>> >>>> Here is the proposal:
>> >>>>
>> >>>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
>> >>>> Best,
>> >>>>
>> >>>> Dawid
>> >>>>
>> >>>>
>> >>>>
>> >>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

Jark Wu-2
Hi Dawid,

I don't have other concerns now. Thanks for the explanation.

Best,
Jark

On Thu, 21 Nov 2019 at 18:13, Kurt Young <[hidden email]> wrote:

> Hi all,
>
> After some offline discussion with Dawid, I changed my mind. We still agree
> on that
> we should only support reading from tables with primary key, forbid writing
> to such table.
>
> But the enforce flag should be introduced with primary key, and probably
> should be
> `NOR ENFORCE` be default. The problem with making `ENFORCE` by default is
> when
> creating a table with primary key constraint. If we follow the SQL standard
> here, we should
> do the validity check during the creating operation, and this will
> introduce a really big
> burden here and might make supporting this feature not feasible.
>
> Best,
> Kurt
>
>
> On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <[hidden email]> wrote:
>
> > Hi Dawid,
> >
> > Actually I'm not 100% sure about both choices: always enforce primary key
> > by Flink and not enforce at all.
> >
> > As you said, always enforce primary key is hard to achieve since Flink
> > doesn't
> > own data most of the time. But with some certain cases, it's actually
> > doable, like
> > writing a new table in a batch job, or if we have upsert flag supported
> in
> > message
> > queue. So it's hard to say Flink has no chance to enforce primary key
> > here.
> >
> > On the other hand, not enforce primary key indeed save us a lot of work,
> > but also
> > would cause confusing sometimes. Especially when the table is written and
> > read
> > by Flink only. For example, some user created a table with primary key
> not
> > enforced,
> > and they will use it as source and sink table. This situation is quite
> > confusing, because
> > Flink didn't enforce primary key when writing, so it's hard to say we can
> > rely on such
> > information during read. It looks we are contradictory with ourself.
> >
> > AFAIK, most databases will only do validity check only when writing to
> > table, and
> > simply trust such information during read. I think we could also follow
> > this way. Since
> > we are not sure yet what kind of behavior when writing to a table with
> > primary key, I would
> > suggest to defer this util we start to talk about the source and sink
> > behavior with primary
> > key, maybe as well as introducing update, delete flag.
> >
> > To summarize, my proposal would be:
> > 1. Add primary key but not introducing enforce flag now
> > 2. Table with primary key can't be used as sink, and we will simply trust
> > pk info during read.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[hidden email]
> >
> > wrote:
> >
> >> Hi Kurt,
> >>
> >> You are right that this is included in the SQL standard.
> >>
> >> The problem is we are not a database and we do not own the data. There
> >> is no way for us to truly enforce the uniqueness constraint. We are
> >> always writing to some external systems that, moreover we are not an
> >> exclusive produce/consumer for that data. We cannot ensure that external
> >> systems will not corrupt the data.
> >>
> >> The way RDBM systems ensure uniqueness constraint is that they keep the
> >> index of all keys for a table that they check against for every write.
> >> This works only if the system that maintains the index is the only
> >> writer to that table.
> >>
> >> Moreover some of the systems also allow loosening this restriction e.g.
> >>
> >> DB2:
> >>
> >>
> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
> >>
> >> HIVE(with a non standard syntax, they do it for the same reasons as
> >> listed above):
> >>
> >>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
> >>
> >>
> >> What do you think?
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >>
> >> On 20/11/2019 13:51, Kurt Young wrote:
> >> > Hi all,
> >> >
> >> > Thanks Dawid for bringing this up, this is very important property of
> >> SQL
> >> > and
> >> > I'm big +1 to have this.
> >> >
> >> > But before the discussion going deeply, I want to first mention that
> >> > according to
> >> > SQL standard, any unique key constraint which including primary key,
> >> should
> >> > be
> >> > always be enforced.
> >> >
> >> > Please see:
> >> >
> >> > 4.18.3.1 Introduction to table constraints
> >> >
> >> > "NOTE 50 — A unique constraint is always enforced. A table check
> >> constraint
> >> > or
> >> > a referential constraint can either be enforced or not enforced."
> >> >
> >> > Best,
> >> > Kurt
> >> >
> >> >
> >> > On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <
> >> [hidden email]>
> >> > wrote:
> >> >
> >> >> Hey,
> >> >>
> >> >> After an offline discussion I updated the FLIP slightly. In the
> initial
> >> >> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
> >> >> into the statement paranthesis. I also moved the primary key to the
> >> >> TableSchema class.
> >> >>
> >> >> I also extended sections about UNIQUE key support and how do the
> >> primary
> >> >> keys affect sources and sinks.
> >> >>
> >> >> Please have another look. If there are no further concerns, I would
> >> like
> >> >> to start the vote soon.
> >> >>
> >> >> Best,
> >> >>
> >> >> Dawid
> >> >>
> >> >> On 20/11/2019 04:32, Jark Wu wrote:
> >> >>> Hi Dawid,
> >> >>>
> >> >>> Thanks for driving this. Primary key is a very important and useful
> >> >> feature
> >> >>> to Flink Table API / SQL.
> >> >>>
> >> >>> I'm +1 to the general design.
> >> >>>
> >> >>> And have some thoughts as following:
> >> >>>
> >> >>> 1. > The design says Flink only support NOT ENFORCED mode. But the
> DDL
> >> >>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
> >> >>>     What will we do when user specify an ENFORCED mode?  Shall we
> >> forbid
> >> >>> it?
> >> >>>
> >> >>> 2. > "When creating a table, creating a primary key constraint will
> >> alter
> >> >>> the columns nullability."
> >> >>>     I think we should force the columns to be declared NOT NULL,
> >> instead
> >> >> of
> >> >>> change the columns nullability implicitly.
> >> >>>     Otherwise, it may be confused what is the nullbility of the
> >> primary
> >> >> key
> >> >>> columns.
> >> >>>
> >> >>> 3. > "Support for Unique key is not part of the FLIP. It is just
> >> >> mentioned
> >> >>> to show how can we extend the primary key concept with more
> >> constraints
> >> >> in
> >> >>> the future."
> >> >>>     I think we can include Unique Key as part of the FLIP, as it is
> >> >> already
> >> >>> stated clearly. We can put the implemenation task of unique key in a
> >> >> lower
> >> >>> priority.
> >> >>>
> >> >>> 4. > Method for retrieving primary key constraint is in
> >> CatalogBaseTable.
> >> >>>     In SQL standard, primary key is declared in line or out of line.
> >> If
> >> >> it
> >> >>> is out of line, it is still in the schema part, i.e. besides column
> >> >>> definitions and in the parentheses.
> >> >>>     So I think maybe primary key should belong to `TableSchema`
> >> becuase
> >> >> it
> >> >>> is a part of schema.
> >> >>>
> >> >>>     CREATE TABLE xxx (
> >> >>>        a int,
> >> >>>        b varchar,
> >> >>>        c bigint,
> >> >>>        primary key (a, b)
> >> >>>     ) with (...)
> >> >>>
> >> >>>
> >> >>> Best,
> >> >>> Jark
> >> >>>
> >> >>>
> >> >>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <
> >> [hidden email]>
> >> >>> wrote:
> >> >>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> I wanted to bring up the topic of primary key constraints that we
> >> could
> >> >>>> leverage for runtime optimizations. Please have a look at the
> >> proposal
> >> >>>> and I would especially draw your attention to the topic of
> >> nullability
> >> >>>> of columns that are part of a primary key. Some time in the future
> we
> >> >>>> will also be able to leverage it for upserting streaming tables.
> >> >>>>
> >> >>>> Here is the proposal:
> >> >>>>
> >> >>>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
> >> >>>> Best,
> >> >>>>
> >> >>>> Dawid
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>
> >>
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Primary keys in Table API

dwysakowicz
Thank you for your comments!

I will start the vote then.

Best,

Dawid

On 22/11/2019 07:30, Jark Wu wrote:

> Hi Dawid,
>
> I don't have other concerns now. Thanks for the explanation.
>
> Best,
> Jark
>
> On Thu, 21 Nov 2019 at 18:13, Kurt Young <[hidden email]> wrote:
>
>> Hi all,
>>
>> After some offline discussion with Dawid, I changed my mind. We still agree
>> on that
>> we should only support reading from tables with primary key, forbid writing
>> to such table.
>>
>> But the enforce flag should be introduced with primary key, and probably
>> should be
>> `NOR ENFORCE` be default. The problem with making `ENFORCE` by default is
>> when
>> creating a table with primary key constraint. If we follow the SQL standard
>> here, we should
>> do the validity check during the creating operation, and this will
>> introduce a really big
>> burden here and might make supporting this feature not feasible.
>>
>> Best,
>> Kurt
>>
>>
>> On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <[hidden email]> wrote:
>>
>>> Hi Dawid,
>>>
>>> Actually I'm not 100% sure about both choices: always enforce primary key
>>> by Flink and not enforce at all.
>>>
>>> As you said, always enforce primary key is hard to achieve since Flink
>>> doesn't
>>> own data most of the time. But with some certain cases, it's actually
>>> doable, like
>>> writing a new table in a batch job, or if we have upsert flag supported
>> in
>>> message
>>> queue. So it's hard to say Flink has no chance to enforce primary key
>>> here.
>>>
>>> On the other hand, not enforce primary key indeed save us a lot of work,
>>> but also
>>> would cause confusing sometimes. Especially when the table is written and
>>> read
>>> by Flink only. For example, some user created a table with primary key
>> not
>>> enforced,
>>> and they will use it as source and sink table. This situation is quite
>>> confusing, because
>>> Flink didn't enforce primary key when writing, so it's hard to say we can
>>> rely on such
>>> information during read. It looks we are contradictory with ourself.
>>>
>>> AFAIK, most databases will only do validity check only when writing to
>>> table, and
>>> simply trust such information during read. I think we could also follow
>>> this way. Since
>>> we are not sure yet what kind of behavior when writing to a table with
>>> primary key, I would
>>> suggest to defer this util we start to talk about the source and sink
>>> behavior with primary
>>> key, maybe as well as introducing update, delete flag.
>>>
>>> To summarize, my proposal would be:
>>> 1. Add primary key but not introducing enforce flag now
>>> 2. Table with primary key can't be used as sink, and we will simply trust
>>> pk info during read.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[hidden email]
>>>
>>> wrote:
>>>
>>>> Hi Kurt,
>>>>
>>>> You are right that this is included in the SQL standard.
>>>>
>>>> The problem is we are not a database and we do not own the data. There
>>>> is no way for us to truly enforce the uniqueness constraint. We are
>>>> always writing to some external systems that, moreover we are not an
>>>> exclusive produce/consumer for that data. We cannot ensure that external
>>>> systems will not corrupt the data.
>>>>
>>>> The way RDBM systems ensure uniqueness constraint is that they keep the
>>>> index of all keys for a table that they check against for every write.
>>>> This works only if the system that maintains the index is the only
>>>> writer to that table.
>>>>
>>>> Moreover some of the systems also allow loosening this restriction e.g.
>>>>
>>>> DB2:
>>>>
>>>>
>> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
>>>> HIVE(with a non standard syntax, they do it for the same reasons as
>>>> listed above):
>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On 20/11/2019 13:51, Kurt Young wrote:
>>>>> Hi all,
>>>>>
>>>>> Thanks Dawid for bringing this up, this is very important property of
>>>> SQL
>>>>> and
>>>>> I'm big +1 to have this.
>>>>>
>>>>> But before the discussion going deeply, I want to first mention that
>>>>> according to
>>>>> SQL standard, any unique key constraint which including primary key,
>>>> should
>>>>> be
>>>>> always be enforced.
>>>>>
>>>>> Please see:
>>>>>
>>>>> 4.18.3.1 Introduction to table constraints
>>>>>
>>>>> "NOTE 50 — A unique constraint is always enforced. A table check
>>>> constraint
>>>>> or
>>>>> a referential constraint can either be enforced or not enforced."
>>>>>
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>>
>>>>> On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <
>>>> [hidden email]>
>>>>> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> After an offline discussion I updated the FLIP slightly. In the
>> initial
>>>>>> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
>>>>>> into the statement paranthesis. I also moved the primary key to the
>>>>>> TableSchema class.
>>>>>>
>>>>>> I also extended sections about UNIQUE key support and how do the
>>>> primary
>>>>>> keys affect sources and sinks.
>>>>>>
>>>>>> Please have another look. If there are no further concerns, I would
>>>> like
>>>>>> to start the vote soon.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>> On 20/11/2019 04:32, Jark Wu wrote:
>>>>>>> Hi Dawid,
>>>>>>>
>>>>>>> Thanks for driving this. Primary key is a very important and useful
>>>>>> feature
>>>>>>> to Flink Table API / SQL.
>>>>>>>
>>>>>>> I'm +1 to the general design.
>>>>>>>
>>>>>>> And have some thoughts as following:
>>>>>>>
>>>>>>> 1. > The design says Flink only support NOT ENFORCED mode. But the
>> DDL
>>>>>>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
>>>>>>>     What will we do when user specify an ENFORCED mode?  Shall we
>>>> forbid
>>>>>>> it?
>>>>>>>
>>>>>>> 2. > "When creating a table, creating a primary key constraint will
>>>> alter
>>>>>>> the columns nullability."
>>>>>>>     I think we should force the columns to be declared NOT NULL,
>>>> instead
>>>>>> of
>>>>>>> change the columns nullability implicitly.
>>>>>>>     Otherwise, it may be confused what is the nullbility of the
>>>> primary
>>>>>> key
>>>>>>> columns.
>>>>>>>
>>>>>>> 3. > "Support for Unique key is not part of the FLIP. It is just
>>>>>> mentioned
>>>>>>> to show how can we extend the primary key concept with more
>>>> constraints
>>>>>> in
>>>>>>> the future."
>>>>>>>     I think we can include Unique Key as part of the FLIP, as it is
>>>>>> already
>>>>>>> stated clearly. We can put the implemenation task of unique key in a
>>>>>> lower
>>>>>>> priority.
>>>>>>>
>>>>>>> 4. > Method for retrieving primary key constraint is in
>>>> CatalogBaseTable.
>>>>>>>     In SQL standard, primary key is declared in line or out of line.
>>>> If
>>>>>> it
>>>>>>> is out of line, it is still in the schema part, i.e. besides column
>>>>>>> definitions and in the parentheses.
>>>>>>>     So I think maybe primary key should belong to `TableSchema`
>>>> becuase
>>>>>> it
>>>>>>> is a part of schema.
>>>>>>>
>>>>>>>     CREATE TABLE xxx (
>>>>>>>        a int,
>>>>>>>        b varchar,
>>>>>>>        c bigint,
>>>>>>>        primary key (a, b)
>>>>>>>     ) with (...)
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <
>>>> [hidden email]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I wanted to bring up the topic of primary key constraints that we
>>>> could
>>>>>>>> leverage for runtime optimizations. Please have a look at the
>>>> proposal
>>>>>>>> and I would especially draw your attention to the topic of
>>>> nullability
>>>>>>>> of columns that are part of a primary key. Some time in the future
>> we
>>>>>>>> will also be able to leverage it for upserting streaming tables.
>>>>>>>>
>>>>>>>> Here is the proposal:
>>>>>>>>
>>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Dawid
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>


signature.asc (849 bytes) Download Attachment