Hi dev,
I'd like to kick off a discussion on a mechanism to validate the precision of columns for some connectors.

We have agreed that the user should be informed if the connector does not support the desired precision. From the connector developer's view, there are three levels of information to consider:

- The capabilities of the external system (e.g. Apache Derby supports TIMESTAMP(9), MySQL supports TIMESTAMP(6), etc.). Connector developers should use this information to validate the user's DDL and throw an exception if a concrete column is out of range (a minimal sketch of such a check follows at the end of this mail).

- The schema of referenced tables in the external system. If the schema information of referenced tables is available at compile time, connector developers could use it to find mismatches with the DDL. But in most cases the schema information is unavailable because of network isolation or access control, so we should use it with caution.

- Schema-less external systems (e.g. HBase). If the external system is schema-less, like HBase, the connector developer should make sure the connector doesn't cause precision loss (e.g. flink-hbase serializes java.sql.Timestamp to a long in bytes, which only keeps millisecond precision).

To make it more specific, some scenarios for the JDBC connector are listed below:

- The underlying DB supports DECIMAL(65, 30), which is out of the range of Flink's DECIMAL.
- The underlying DB supports TIMESTAMP(6), and the user wants to define a table with TIMESTAMP(9) in Flink.
- The user defines a table with DECIMAL(10, 4) in the underlying DB, and wants to define a table with DECIMAL(5, 2) in Flink.
- The precision supported by the underlying DB varies between versions.

What do you think about this? Any feedback is appreciated.
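Here is a minimal sketch of what the first-level check could look like inside a JDBC connector. Everything below is hypothetical: the class name is made up, and the limits are illustrative MySQL-like values that a real dialect would have to supply itself.

    import org.apache.flink.table.api.TableSchema;
    import org.apache.flink.table.api.ValidationException;
    import org.apache.flink.table.types.logical.DecimalType;
    import org.apache.flink.table.types.logical.LogicalType;
    import org.apache.flink.table.types.logical.TimestampType;

    /** Hypothetical helper; names and limits are illustrative, not existing Flink API. */
    public final class JdbcPrecisionValidator {

        // Illustrative MySQL-like limits; other dialects/versions differ.
        private static final int MAX_TIMESTAMP_PRECISION = 6;
        private static final int MAX_DECIMAL_PRECISION = 65;

        public static void validate(TableSchema schema) {
            String[] fieldNames = schema.getFieldNames();
            for (int i = 0; i < fieldNames.length; i++) {
                LogicalType type = schema.getFieldDataTypes()[i].getLogicalType();
                if (type instanceof TimestampType) {
                    int p = ((TimestampType) type).getPrecision();
                    if (p > MAX_TIMESTAMP_PRECISION) {
                        throw new ValidationException(String.format(
                            "Column '%s' declares TIMESTAMP(%d), but the DB supports at most TIMESTAMP(%d).",
                            fieldNames[i], p, MAX_TIMESTAMP_PRECISION));
                    }
                } else if (type instanceof DecimalType) {
                    DecimalType dt = (DecimalType) type;
                    if (dt.getPrecision() > MAX_DECIMAL_PRECISION) {
                        throw new ValidationException(String.format(
                            "Column '%s' declares DECIMAL(%d, %d), which exceeds the DB's maximum precision %d.",
                            fieldNames[i], dt.getPrecision(), dt.getScale(), MAX_DECIMAL_PRECISION));
                    }
                }
            }
        }

        private JdbcPrecisionValidator() {}
    }

*Best Regards,*
*Zhenghua Gao*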
Hi Zhenghua,
I think it's not just about the precision of types. Connectors don't validate the types either. There is currently a "SchemaValidator", but it only validates type properties, not which types a connector supports. I think we could have something like a "DataTypeValidator" to help connectors validate their type support (a rough sketch follows below).

Considering the current validator design, the validator is called by the connector itself, so it's more like a utility class than a mechanism.
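To make the idea concrete, here is a rough sketch of what such a utility could look like. None of this exists today; the class and method names are made up, and a real design would also need to cover precision ranges per type, not just type roots. A connector would call it from its factory with the set of type roots it can handle.

    import java.util.Set;
    import org.apache.flink.table.api.TableSchema;
    import org.apache.flink.table.api.ValidationException;
    import org.apache.flink.table.types.logical.LogicalType;
    import org.apache.flink.table.types.logical.LogicalTypeRoot;

    /** Hypothetical sketch of the proposed "DataTypeValidator"; not existing Flink API. */
    public final class DataTypeValidator {

        /** Throws if the schema uses a type the connector did not declare as supported. */
        public static void validateSupportedTypes(TableSchema schema, Set<LogicalTypeRoot> supported) {
            String[] fieldNames = schema.getFieldNames();
            for (int i = 0; i < fieldNames.length; i++) {
                LogicalType type = schema.getFieldDataTypes()[i].getLogicalType();
                if (!supported.contains(type.getTypeRoot())) {
                    throw new ValidationException(String.format(
                        "Connector does not support type %s of column '%s'.", type, fieldNames[i]));
                }
            }
        }

        private DataTypeValidator() {}
    }

Best,
Jingsong Lee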
Hi Jingsong Lee,
You are right that the connectors don't validate data types either right now. We seem to lack a mechanism to validate properties [1], data types, etc. for CREATE TABLE.

[1] https://issues.apache.org/jira/browse/FLINK-15509

*Best Regards,*
*Zhenghua Gao*
Hi Zhenghua,
For external systems with a schema, I think the schema information is available most of the time, and it should be the single source of truth for programmatically mapping column precision via Flink catalogs, to save users from re-creating schemas redundantly and to avoid human error. The Flink types would be a subset of the system's supported types and precisions, so you wouldn't need to validate the first category, "the ability of the external system". This would apply to most schema storage systems, such as relational DBMSs, the Hive metastore, and Avro schemas in Confluent Schema Registry for Kafka.

From my observation, the real problem right now is that Flink cannot truly leverage external system schemas via Flink catalogs, as documented in [1]. I'm not sure there are any unsolvable network or authorization problems, as most systems nowadays can be read with a simple access id/key pair via VPC, intranet, or internet. What problems have you run into?

For schema-less systems, we'd have to rely on full test coverage in Flink.

[1] https://issues.apache.org/jira/browse/FLINK-15545
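To sketch what I mean (assuming a catalog implementation for the external system exists, which is what [1] is about), the schema and its precisions would simply be read through the Catalog API instead of being re-declared in DDL:

    import org.apache.flink.table.api.TableSchema;
    import org.apache.flink.table.catalog.Catalog;
    import org.apache.flink.table.catalog.ObjectPath;

    /** Sketch of reading the external schema through a Catalog (see FLINK-15545). */
    public class ExternalSchemaLookup {

        // Reads the referenced table's schema from the external system itself,
        // so types and precision come from the single source of truth, not user DDL.
        public static TableSchema fetchSchema(Catalog catalog, String database, String table)
                throws Exception {
            return catalog.getTable(new ObjectPath(database, table)).getSchema();
        }
    }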