thrift support

chenqin
Hi there,

Here at Pinterest, we use Thrift end to end in our tech stack. As we
have been building Flink as a service platform, the team spent time working
on supporting Flink jobs with the Thrift format and successfully launched a
good number of important jobs in production in H1.

In H2, we are looking at supporting Flink SQL with native Thrift support.
We have some prototypes already running in development settings and plan to
move forward on this approach.

In the long run, we think out-of-the-box Thrift format support would benefit
other folks as well. So the question is whether there is already some effort
in this space that we can sync with.

Chen
Pinterest Data

Re: thrift support

Jeff Zhang
Hi Chen,

Are you building something like the Hive Thrift server?

On Sat, Jul 18, 2020 at 8:50 AM Chen Qin <[hidden email]> wrote:

> Hi there,
>
> Here in Pinterest, we utilize thrift end to end in our tech stack. As we
> have been building Flink as a service platform, the team spent time working
> on supporting Flink jobs with thrift format and successfully launched a
> good number of important jobs in Production in H1.
>
> In H2, we are looking at supporting Flink SQL with native Thrift support.
> We have some prototypes already running in development settings and plan to
> move forward on this approach.
>
> In the long run, we thought out of box thrift format support would benefit
> other folks as well. So the question is if there is already some effort
> around this space we can sync with?
>
> Chen
> Pinterest Data
>


--
Best Regards

Jeff Zhang

Re: thrift support

chenqin
Jeff,

Are you referring to something like this SPIP?
https://docs.google.com/document/d/1ug4K5e2okF5Q2Pzi3qJiUILwwqkn0fVQaQ-Q95HEcJQ/edit#heading=h.x97c6tj78zo0
Not at this moment; we are currently working on the (de)serialization work.
It would be good to start a discussion, learn whether folks are working on
related areas, and align.

Chen

On Sat, Jul 18, 2020 at 6:41 AM Jeff Zhang <[hidden email]> wrote:

> Hi Chen,
>
> Are building something like hive thrift server ?
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: thrift support

Jeff Zhang
Hi Chen,

Right, this is what I mean. Could you provide more details about the
(de)serialization work? A concrete example or usage scenario would be
helpful.



On Sat, Jul 18, 2020 at 11:09 PM Chen Qin <[hidden email]> wrote:

> Jeff,
>
> Are you referring something like this SPIP?
>
> https://docs.google.com/document/d/1ug4K5e2okF5Q2Pzi3qJiUILwwqkn0fVQaQ-Q95HEcJQ/edit#heading=h.x97c6tj78zo0
> Not at this moment, we are working on desr/ser work at the moment. Would be
> good to starts discussion and learn if folks working on related areas and
> align.
>
> Chen
>
>


--
Best Regards

Jeff Zhang

Re: thrift support

Benchao Li-2
Hi Chen,

Thanks for bringing up this discussion. We have been doing something similar
internally recently.

Our use case is that many services in our company are built on the Thrift
protocol, and we want to support accessing these RPC services natively from
Flink SQL. Currently, there are two ways we aim to support: a Thrift RPC
sink and a Thrift RPC temporal table (dimension table).
So our scenario needs both (de)serialization in the Thrift format and access
to the Thrift RPC service.

On Sun, Jul 19, 2020 at 9:43 AM Jeff Zhang <[hidden email]> wrote:

> Hi Chen,
>
> Right, this is what I mean. Could you provide more details about the
> desr/ser work ? Giving a concrete example or usage scenario would be
> helpful.
>
>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


--

Best,
Benchao Li

Re: thrift support

chenqin
Jeff,

An example would be a Kafka topic that stores records in Thrift format:
- Flink SQL will not work, because it doesn't support the Thrift format out
of the box.
- The table schema can't be inferred, so the user might end up handcrafting
a field-by-field mapping.
- Thrift object serialization falls back to Kryo, after which the user has to
write their own version of a TDSerializer/TBaseSerializer-based implementation.
- Thrift RPC requires the user to do a bit more work and setup.

Bonus:
JVM <-> Python can share the same data format with the same schema.
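
To make the "handcrafting" pain point concrete, here is a minimal sketch of the kind of mapping a user ends up writing by hand today. ClickEvent, COLUMNS, and to_row are hypothetical names made up for illustration; ClickEvent only stands in for a class generated from a .thrift struct and is not from any real job.

```python
from dataclasses import dataclass


@dataclass
class ClickEvent:
    """Hypothetical stand-in for a class generated from a .thrift struct."""
    user_id: int
    url: str
    ts_millis: int


# The table schema is written out by hand and must be kept in sync with the struct.
COLUMNS = [("user_id", "BIGINT"), ("url", "STRING"), ("ts_millis", "BIGINT")]


def to_row(event: ClickEvent) -> tuple:
    # One entry per field: adding a field to the struct means editing
    # both this function and COLUMNS.
    return (event.user_id, event.url, event.ts_millis)
```

Every schema change touches three places (the struct, the column list, and the mapping), which is what schema inference would remove.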

Chen

Benchao,

Sounds great! Glad to hear folks are working in this area.

Off the top of my head, the list of items could be:
- adding support in flink-format (e.g. flink-thrift)
- evaluate whether TBaseSerializer (Kryo) needs extra work
- derive table schema out of thrift struct (java/python or .thrift)
- Row / RowTypeInfo related transformations.
- Thrift RPC Table sink vs. Stream sink in Flink SQL
- thrift RPC temporal table (dimension table). (copy from your side)
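
As a rough illustration of the "derive table schema out of thrift struct" item, the sketch below turns a simplified description of a Thrift struct into Flink SQL column definitions. The input encoding (base type names as strings, containers as tuples) and the type mapping itself are assumptions of this sketch, not an agreed design and not how any existing prototype works.

```python
# Thrift IDL base type name -> Flink SQL type (assumed mapping for this sketch).
BASE_TYPES = {
    "bool": "BOOLEAN",
    "byte": "TINYINT",
    "i16": "SMALLINT",
    "i32": "INT",
    "i64": "BIGINT",
    "double": "DOUBLE",
    "string": "STRING",
    "binary": "BYTES",
}


def to_sql_type(t):
    """t is a base type name, or one of the tuples
    ("list", elem), ("map", key, value), ("struct", [(name, type), ...])."""
    if isinstance(t, str):
        return BASE_TYPES[t]
    kind = t[0]
    if kind == "list":
        return f"ARRAY<{to_sql_type(t[1])}>"
    if kind == "map":
        return f"MAP<{to_sql_type(t[1])}, {to_sql_type(t[2])}>"
    if kind == "struct":
        fields = ", ".join(f"`{n}` {to_sql_type(ft)}" for n, ft in t[1])
        return f"ROW<{fields}>"
    raise ValueError(f"unsupported thrift type: {t!r}")


def derive_columns(fields):
    """Render the column list of a CREATE TABLE statement, one column per struct field."""
    return ",\n".join(f"  `{name}` {to_sql_type(ftype)}" for name, ftype in fields)
```

For example, derive_columns([("id", "i64"), ("tags", ("list", "string"))]) yields one BIGINT column and one ARRAY<STRING> column. Optional fields, enums, unions, and sets are left out here and would need their own decisions.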

What do you think?

Thanks,
Chen

On Sun, Jul 19, 2020 at 7:34 PM Benchao Li <[hidden email]> wrote:

> Hi Chen,
>
> Thanks for bringing up this discussion. We are doing something similar
> internally recently.
>
> Our use case is that many services in our company are built with
> thrift protocol, and we
> want to support accessing these RPC services natively with Flink SQL.
> Currently, there are two ways that we aim to support, they are thrift RPC
> Sink and thrift RPC
> temporal table (dimension table).
> Then our scenario is that we need to support both (de)ser with
> thrift format, and accessing
> the thrift RPC service.
>
>
>
> --
>
> Best,
> Benchao Li
>

Re: thrift support

Benchao Li-2
Hi Chen,

- adding support in flink-format (e.g. flink-thrift)
  Sure. We should have a flink-thrift format to do the (de)ser work.
- evaluate whether TBaseSerializer (Kryo) needs extra work
  I don't know if I understand this correctly. I think we don't need to
  transfer Thrift data inside Flink; we just deserialize it at the source
  and serialize it at the sink.
- derive table schema out of thrift struct (java/python or .thrift)
  We can either derive the schema from the Thrift struct, or just define a
  standard DDL to match the Thrift definition.
- Row / RowTypeInfo related transformations.
  Sure.
- Thrift RPC Table sink vs. Stream sink in Flink SQL
  Currently we don't consider the Stream sink scenario, because it's easy
  for Stream users to do it by themselves.
- thrift RPC temporal table (dimension table). (copy from your side)
  Sure. In this case, we do the RPC read, and in the RPC Table sink, we do
  the RPC write.
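
The RPC read side of a temporal (dimension) table can be pictured roughly as follows. This is only a sketch under stated assumptions: RpcLookup, enrich, and the rpc_get callable are hypothetical names, the callable stands in for a real Thrift client call, and the small cache is an addition of this sketch rather than something proposed above.

```python
class RpcLookup:
    """Fetches dimension rows by key through an RPC call, with a small cache."""

    def __init__(self, rpc_get, max_entries=1000):
        self._rpc_get = rpc_get          # stand-in for a Thrift client method
        self._cache = {}
        self._max_entries = max_entries

    def lookup(self, key):
        if key in self._cache:
            return self._cache[key]
        row = self._rpc_get(key)         # the "RPC read"
        if len(self._cache) >= self._max_entries:
            # Evict the oldest inserted entry (dicts keep insertion order).
            self._cache.pop(next(iter(self._cache)))
        self._cache[key] = row
        return row


def enrich(events, lookup):
    """Join each (key, payload) event with the dimension row for its key."""
    for key, payload in events:
        yield (key, payload, lookup.lookup(key))
```

An RPC table sink would be the mirror image: one client call per outgoing row instead of one per lookup key.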


On Tue, Jul 21, 2020 at 2:55 AM Chen Qin <[hidden email]> wrote:

> Jeff
>
> A sample would be you have a Kafka topic stores record in thrift format,
> - Flink SQL will not work because it doesn't support thrift format out of
> the box,
> - table schema can't be inferred so the user might end up handcrafting
> field by field mapping
> - thrift object serialization fall back to kryo after user write it's own
> version of TDSerializer/TBaseSerailizer based implementation.
> - thrift RPC needs user do a bit more work and setup.
>
> bonus,
> jvm <-> python can share same dataformat with same schema
>
> Chen
>
> Benchao,
>
> Sounds great! Glad to hear folks are working on this area.
>
> On top of my head, lists of iteams could be
> - adding support in flink-format (e.g flink-thrift)
> - evaluate if TBaseSeralizaer (Kryo) need extra work
> - derive table schema out of thrift struct (java/python or .thrift)
> - Row / RowTypeInfo related transformations.
> - Thrift RPC Table sink v.s Stream sink in Flink SQL
> - thrift RPC temporal table (dimension table). (copy from your side)
>
> What do you think?
>
> Thanks,
> Chen
>
>


--

Best,
Benchao Li

Re: thrift support

Jark Wu-2
Hi Chen,

Your listed items sound great to me. I think we can start with the Thrift
format; could you open an issue for it?
The community also plans to support the PB (Protobuf) format in the next
version, so maybe we can work together.

Deriving the table schema from a Thrift struct is also an interesting topic,
and it is needed in other cases as well, such as deriving a table schema from
an Avro schema. We had some discussion about this in FLINK-18158 [1].

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-18158

On Tue, 21 Jul 2020 at 11:05, Benchao Li <[hidden email]> wrote:

> Hi Chen,
>
> - adding support in flink-format (e.g flink-thrift)
>   Sure. We should have a flink-thrift format to do the (de)ser work.
> - evaluate if TBaseSeralizaer (Kryo) need extra work
>   I don't known if I understand it correctly, I think we don't need to
> transfer thrift data inside Flink, we just
>   deserialize it at Source, and serialize it at Sink.
> - derive table schema out of thrift struct (java/python or .thrift)
>   We can either derive the schema from thrift struct, or just define a
> standard DDL to match the thrift definition.
> - Row / RowTypeInfo related transformations.
>   Sure.
> - Thrift RPC Table sink v.s Stream sink in Flink SQL
>   Currently we don't consider Stream Sink scenario because it's easy for
> Stream users to do it by themselves.
> - thrift RPC temporal table (dimension table). (copy from your side)
>   Sure, in this case, we do the RPC read. And in RPC Table Sink, we do the
> RPC write.
>
>
>
>
> --
>
> Best,
> Benchao Li
>

Re: thrift support

dwysakowicz
In reply to this post by Benchao Li-2
Hi,

I've just spotted this PR that might be helpful in the discussion:
https://github.com/apache/flink/pull/8067

Best,

Dawid

On 20/07/2020 04:30, Benchao Li wrote:

> Hi Chen,
>
> Thanks for bringing up this discussion. We are doing something similar
> internally recently.
>
> Our use case is that many services in our company are built with
> thrift protocol, and we
> want to support accessing these RPC services natively with Flink SQL.
> Currently, there are two ways that we aim to support, they are thrift RPC
> Sink and thrift RPC
> temporal table (dimension table).
> Then our scenario is that we need to support both (de)ser with
> thrift format, and accessing
> the thrift RPC service.
>
>



Re: thrift support

Jark Wu-2
Thanks, Dawid, for the link. I had a glance at the PR.

I think we can continue the Thrift format work based on that PR (it would be
better to reach out to the author).

Best,
Jark

On Tue, 21 Jul 2020 at 15:58, Dawid Wysakowicz <[hidden email]>
wrote:

> Hi,
>
> I've just spotted this PR that might be helpful in the discussion:
> https://github.com/apache/flink/pull/8067
>
> Best,
>
> Dawid
>
>
>

Re: thrift support

Yu Yang
Thanks for the discussion. In https://github.com/apache/flink/pull/8067 we
made an initial version of Thrift format support in Flink, but we haven't
had time to finish it. Feel free to take it over and make changes.
I've also linked this discussion thread in
https://issues.apache.org/jira/browse/FLINK-11746.

Regards,
-Yu

On Tue, Jul 21, 2020 at 1:14 AM Jark Wu <[hidden email]> wrote:

> Thanks Dawid for the link. I have a glance at the PR.
>
> I think we can continue the thrift format based on the PR (would be better
> to reach out to the author).
>
> Best,
> Jark
>
>

Re: thrift support

chenqin
Thanks, Yu, for sharing more background on this.

Jark,

We were able to sync with Yu a bit offline. I think we should reuse the Jira
issue and figure out how to reuse the code once we get into the
implementation phase.
To continue the discussion, maybe we can share a Google doc with a detailed
list of work items and options, so folks can agree on them as a first step.
Please assign FLINK-11746 to my account.

As Benchao previously pointed out, Flink SQL Thrift support seems likely to
grow beyond a single PR's worth of work:
- Ser/deser: use Kryo with a customized serializer, or infer a POJO from
Thrift at the source.
- TableSchema and type translation: use DDL to match, or use Thrift to infer
the DDL. Will nested column pruning work?
- Most online services use either gRPC or Thrift as the service endpoint
definition. Is there a proper way to construct a "table" that interacts
directly with those online services (vs. async I/O)?
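
For readers unfamiliar with the term, nested column pruning means materializing only the nested fields a query actually projects. The sketch below shows the idea on plain nested dicts; prune and the dotted-path notation are hypothetical, and whether Flink's planner could push such a projection into a Thrift format is exactly the open question above.

```python
def prune(record, paths):
    """Keep only the fields of a nested dict named by dotted paths, e.g. "user.id".
    Paths that do not exist in the record are skipped."""
    out = {}
    for path in paths:
        parts = path.split(".")
        # Resolve the value first, so that missing paths leave no empty dicts behind.
        value = record
        found = True
        for part in parts:
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                found = False
                break
        if not found:
            continue
        # Rebuild only the parents needed to hold the projected leaf.
        dst = out
        for part in parts[:-1]:
            dst = dst.setdefault(part, {})
        dst[parts[-1]] = value
    return out
```

With a record holding user.id, user.name, and user.geo.country, projecting "user.id" and "user.geo.country" drops user.name and every other top-level field.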

Thanks,
Chen

On Tue, Jul 21, 2020 at 12:14 PM Yu Yang <[hidden email]> wrote:

> Thanks for the discussion. In https://github.com/apache/flink/pull/8067 we
> made an initial version on adding thrift-format support in flink, and
> haven't got time to finish it. Feel free to take it over and make changes.
> I've also linked this discussion thread in
> https://issues.apache.org/jira/browse/FLINK-11746.
>
> Regards,
> -Yu
>
>