Hi there,
Here at Pinterest, we use Thrift end to end in our tech stack. As we have been building a Flink-as-a-service platform, the team spent time supporting Flink jobs that use the Thrift format and successfully launched a good number of important jobs in production in H1.

In H2, we are looking at supporting Flink SQL with native Thrift support. We have some prototypes already running in development settings and plan to move forward with this approach.

In the long run, we think out-of-the-box Thrift format support would benefit other folks as well. So the question is: is there already some effort in this space that we can sync with?

Chen
Pinterest Data
Hi Chen,
Are you building something like the Hive Thrift server?

--
Best Regards
Jeff Zhang
Jeff,
Are you referring to something like this SPIP?
https://docs.google.com/document/d/1ug4K5e2okF5Q2Pzi3qJiUILwwqkn0fVQaQ-Q95HEcJQ/edit#heading=h.x97c6tj78zo0

Not at this moment; we are currently working on the deser/ser work. It would be good to start a discussion, learn whether folks are working on related areas, and align.

Chen
Hi Chen,
Right, this is what I mean. Could you provide more details about the deser/ser work? A concrete example or usage scenario would be helpful.

--
Best Regards
Jeff Zhang
Hi Chen,
Thanks for bringing up this discussion. We have recently been doing something similar internally.

Our use case is that many services in our company are built on the Thrift protocol, and we want to support accessing these RPC services natively from Flink SQL. Currently, we aim to support two approaches: a Thrift RPC sink and a Thrift RPC temporal table (dimension table). So our scenario needs to support both (de)serialization with the Thrift format and access to the Thrift RPC service.

--
Best,
Benchao Li
Jeff,

An example would be a Kafka topic that stores records in Thrift format:
- Flink SQL will not work because it doesn't support the Thrift format out of the box.
- The table schema can't be inferred, so the user might end up handcrafting a field-by-field mapping.
- Thrift object serialization falls back to Kryo after the user writes their own version of a TDSerializer/TBaseSerializer-based implementation.
- Thrift RPC requires the user to do a bit more work and setup.

Bonus: JVM and Python can share the same data format with the same schema.

Chen

Benchao,

Sounds great! Glad to hear folks are working in this area.

Off the top of my head, the list of items could be:
- adding support in flink-format (e.g. flink-thrift)
- evaluating whether TBaseSerializer (Kryo) needs extra work
- deriving the table schema from a Thrift struct (Java/Python or .thrift)
- Row / RowTypeInfo related transformations
- Thrift RPC table sink vs. stream sink in Flink SQL
- Thrift RPC temporal table (dimension table) (copied from your side)

What do you think?

Thanks,
Chen
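To make the "derive the table schema from a Thrift struct" item a bit more concrete, here is a minimal sketch of the type translation it would involve. It is not taken from any prototype mentioned in this thread and does not use any Flink or Thrift API. The function names are hypothetical, the input is just the textual Thrift IDL type, and exposing a Thrift set as ARRAY is an assumption made for illustration. Named struct types are deliberately left unsupported, since resolving them needs the struct definitions.

```python
# Hypothetical sketch: translate Thrift IDL type expressions into Flink SQL
# type names. The mapping choices are assumptions, not an agreed design.

BASE_TYPES = {
    "bool": "BOOLEAN",
    "byte": "TINYINT",
    "i8": "TINYINT",
    "i16": "SMALLINT",
    "i32": "INT",
    "i64": "BIGINT",
    "double": "DOUBLE",
    "string": "STRING",
    "binary": "BYTES",
}


def split_top_level(args):
    """Split 'a, map<b,c>' on commas that are not nested inside <...>."""
    parts, depth, current = [], 0, ""
    for ch in args:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return parts


def thrift_to_flink_type(thrift_type):
    """Return the Flink SQL type name for a Thrift base or container type."""
    t = thrift_type.strip()
    if t in BASE_TYPES:
        return BASE_TYPES[t]
    if t.endswith(">") and "<" in t:
        container, inner = t[:-1].split("<", 1)
        container = container.strip()
        args = [thrift_to_flink_type(a) for a in split_top_level(inner)]
        if container in ("list", "set") and len(args) == 1:
            # Assumption: a Thrift set is exposed as ARRAY, losing uniqueness.
            return "ARRAY<{}>".format(args[0])
        if container == "map" and len(args) == 2:
            return "MAP<{}, {}>".format(args[0], args[1])
    # Named structs, enums and typedefs need the IDL definitions to resolve.
    raise ValueError("unsupported Thrift type: {!r}".format(thrift_type))


print(thrift_to_flink_type("map<string, list<i32>>"))
```

A real implementation would more likely walk the metadata of the generated Thrift classes than parse type strings, and would map nested structs to ROW types.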
Hi Chen,
- adding support in flink-format (e.g. flink-thrift)
Sure. We should have a flink-thrift format to do the (de)ser work.

- evaluate if TBaseSerializer (Kryo) needs extra work
I don't know if I understand it correctly, but I think we don't need to transfer Thrift data inside Flink; we just deserialize it at the source and serialize it at the sink.

- derive table schema out of thrift struct (java/python or .thrift)
We can either derive the schema from the Thrift struct, or just define a standard DDL to match the Thrift definition.

- Row / RowTypeInfo related transformations.
Sure.

- Thrift RPC Table sink vs. Stream sink in Flink SQL
Currently we don't consider the stream sink scenario, because it's easy for DataStream users to do it by themselves.

- thrift RPC temporal table (dimension table). (copy from your side)
Sure. In this case, we do the RPC read, and in the RPC table sink we do the RPC write.

--
Best,
Benchao Li
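As an illustration of the "define a standard DDL to match the Thrift definition" option, the sketch below renders a CREATE TABLE statement from a list of columns that are assumed to mirror a Thrift struct. The helper names, the table and topic names, the 'thrift' format identifier, and the 'thrift.class' option are all hypothetical; no such format existed in Flink at the time of this thread. Only the Kafka connector option keys follow the style of the Flink SQL connector options.

```python
# Hypothetical sketch: render a Flink SQL DDL for a table whose columns were
# written by hand to match a Thrift struct.

def quote(value):
    """Quote a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_create_table(table, columns, options):
    """Render CREATE TABLE from (name, sql_type) pairs and WITH options."""
    # Note: backticks inside identifiers are not escaped in this sketch.
    column_lines = ",\n".join(
        "  `{}` {}".format(name, sql_type) for name, sql_type in columns
    )
    option_lines = ",\n".join(
        "  {} = {}".format(quote(key), quote(value))
        for key, value in options.items()
    )
    return "CREATE TABLE `{}` (\n{}\n) WITH (\n{}\n)".format(
        table, column_lines, option_lines
    )


ddl = build_create_table(
    "user_events",
    [("user_id", "BIGINT"), ("event_name", "STRING"), ("tags", "ARRAY<STRING>")],
    {
        "connector": "kafka",
        "topic": "user_events",
        "properties.bootstrap.servers": "localhost:9092",
        "format": "thrift",                       # hypothetical format name
        "thrift.class": "com.example.UserEvent",  # hypothetical option
    },
)
print(ddl)
```

The trade-off between the two options is visible here: a hand-written DDL keeps the format simple but can drift from the Thrift definition, while deriving the columns removes that risk at the cost of a type-translation layer.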
Hi Chen,
Your listed items sound great to me. I think we can start with the Thrift format; could you open an issue for it? The community also plans to support the PB (Protobuf) format in the next version, so maybe we can work together.

Deriving the table schema from a Thrift struct is also an interesting topic, and it is needed in other cases too, such as deriving a table schema from an Avro schema. We had some discussion in FLINK-18158 [1].

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-18158
In reply to this post by Benchao Li-2
Hi,
I've just spotted this PR that might be helpful in the discussion: https://github.com/apache/flink/pull/8067

Best,
Dawid
Thanks, Dawid, for the link. I had a glance at the PR.

I think we can continue the Thrift format work based on the PR (it would be better to reach out to the author).

Best,
Jark
Thanks for the discussion. In https://github.com/apache/flink/pull/8067 we made an initial version of Thrift format support in Flink, but haven't had time to finish it. Feel free to take it over and make changes. I've also linked this discussion thread in https://issues.apache.org/jira/browse/FLINK-11746.

Regards,
-Yu
Thanks, Yu, for sharing more background on this.

Jark,

We were able to sync with Yu a bit offline. I think we should reuse the existing Jira issue and decide how to reuse the code once we get into the implementation phase. To continue the discussion, maybe we can share a Google doc with a detailed list of work items and options, so folks can agree on it as a first step. Please assign FLINK-11746 to my account.

As Benchao previously pointed out, Flink SQL Thrift support seems likely to grow beyond a single PR's worth of work.

- Ser/deser: use Kryo with a custom serializer, or infer a POJO from Thrift at the source?
- TableSchema and type translation: use DDL to match, or use Thrift to infer the DDL? Will nested column pruning work?
- Most online services use either gRPC or Thrift as their service endpoint definition. Is there a proper way to construct a "table" that interacts directly with those online services (vs. async I/O)?

Thanks,
Chen
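For readers unfamiliar with the nested column pruning question above, here is a small conceptual sketch of what pruning means for a nested record. It only illustrates the idea on plain dictionaries; it is not how Flink's planner or any Thrift format implements projection, and the function name and dotted-path convention are assumptions.

```python
# Conceptual sketch: keep only the nested fields a query actually reads.

def prune(record, paths):
    """Return a new dict containing only the fields named by dotted paths.

    Overlapping paths (e.g. "user" together with "user.id") are not handled.
    """
    pruned = {}
    for path in paths:
        keys = path.split(".")
        source, target = record, pruned
        # Walk down to the parent of the requested leaf, creating the same
        # nesting in the output as we go.
        for key in keys[:-1]:
            source = source[key]
            target = target.setdefault(key, {})
        target[keys[-1]] = source[keys[-1]]
    return pruned


event = {
    "user": {"id": 7, "profile": {"name": "a", "bio": "long text"}},
    "payload": "x" * 10,
}
print(prune(event, ["user.id", "user.profile.name"]))
```

For a Thrift format, the interesting part would be skipping the unneeded fields during deserialization rather than dropping them afterwards, which is why the question depends on how the (de)serializer is built.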