(DEPRECATED) Apache Flink Mailing List archive.

Flink on Tez

Kostas Tzoumas-2

Flink on Tez

Hello Flink and Tez,

I would like to point you to a first version of Flink running on
Tez. This is a Flink subproject (to be initially contributed
to flink-addons) that allows you to run unmodified Flink programs on
top of Apache Tez.

You can get the code here:
https://github.com/ktzoumas/incubator-flink/tree/tez_support

If you want to give it a spin, some basic instructions are here:
https://github.com/ktzoumas/incubator-flink/tree/tez_support/flink-addons/flink-tez

Be warned that this is still work in progress, so you may encounter
bugs, and this has not yet been optimized for performance.

A few words on how it works and the motivation:

The programs pass as usual through the Flink compiler and use the
Flink runtime operators (map, reduce, join, etc, including the Flink
facilities for sorting, hashing, etc). Instead of generating a Flink
distributed program (called "JobGraph" in Flink), we can now also
generate a Tez program (called "DAG" in Tez).
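
As a rough illustration of the paragraph above (this is not code from the tez_support branch, and none of these names are Flink or Tez APIs), the idea of one optimized plan feeding two interchangeable translators could be sketched like this. OptimizedPlan, ExecutionBackend, FlinkBackend and TezBackend are hypothetical names, and the "translation" is reduced to producing a labelled string:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the compiler's output: an ordered list of operator names.
final class OptimizedPlan {
    final List<String> operators;

    OptimizedPlan(List<String> operators) {
        this.operators = operators;
    }
}

// Hypothetical backend interface: turns the shared plan into an engine-specific program.
interface ExecutionBackend {
    String translate(OptimizedPlan plan);
}

// Stand-in for the path that produces what Flink calls a "JobGraph".
final class FlinkBackend implements ExecutionBackend {
    public String translate(OptimizedPlan plan) {
        return "JobGraph" + plan.operators;
    }
}

// Stand-in for the path that produces what Tez calls a "DAG".
final class TezBackend implements ExecutionBackend {
    public String translate(OptimizedPlan plan) {
        return "DAG" + plan.operators;
    }
}

final class PlanBackendDemo {
    public static void main(String[] args) {
        // The same plan object is handed to both backends; only the last step differs.
        OptimizedPlan plan = new OptimizedPlan(Arrays.asList("map", "reduce"));
        for (ExecutionBackend backend : Arrays.asList(new FlinkBackend(), new TezBackend())) {
            System.out.println(backend.translate(plan));
        }
    }
}
```

The point of the sketch is only that everything up to the plan is shared, so the choice of engine is confined to the final translation step.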

I have been asked why we would want to do that, as Flink has its own
execution engine. There are two reasons, in my opinion.

First, Tez follows design choices that are geared towards resource
elasticity, whereas the design choices behind Flink's engine are
geared more towards low latency querying and iterative
processing. Therefore, the two engines can really complement each
other. Users can run their Flink programs in the engine that better
fits their use case and setup.

Second, in Flink we have put a lot of effort into separating program
assembly from program execution and architecting the system in layers
(APIs, common API, compiler, data processing runtime, distributed
execution engine). The possibility to swap execution engines is a good
showcase of the benefits of such a layered architecture.

Of course, trying it out and reporting bugs or contributing is very
welcome!

Best,
Kostas

Arun Murthy

Re: Flink on Tez

This is great news!

Awesome work everyone... super excited to see this!

Arun

On Fri, Nov 7, 2014 at 10:03 AM, Kostas Tzoumas <[hidden email]> wrote:


--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

Bikas Saha

RE: Flink on Tez

In reply to this post by Kostas Tzoumas-2

Nice! Looking forward to working with the Flink community on supporting this
effort in any way we can help.

Bikas


sirinath

RE: Flink on Tez

Great news.

But as I see it, if you are looking to use Flink standalone and embedded, it is best to try to have your own implementation in the long run. Besides, this is more optimized or created with Hadoop and related projects in mind and is not a simple library dependency.

Again great to see the initiative.

(Disclaimer: I am not part of the Flink team, nor a committer or contributor. I am just a bystander who is interested in using Flink in a particular way / domain.)

Suminda

Aljoscha Krettek-2

Re: Flink on Tez

Yes, but the beauty of it is that Flink is designed in such a way that
we can switch underlying runtime execution strategies and systems. Tez
will never be required, it's just another possible execution mode.

Regards,
Aljoscha

On Sat, Nov 8, 2014 at 12:07 PM, sirinath <[hidden email]> wrote:


Flavio Pompermaier

Re: Flink on Tez

Sorry for the newbie question, but it's not clear to me when users
should switch from the Flink runtime to Tez.
They seem to do the same thing, right?

On Sat, Nov 8, 2014 at 6:02 PM, Aljoscha Krettek <[hidden email]>
wrote:


Kostas Tzoumas-2

Re: Flink on Tez

Just to clarify, the Flink engine is not going anywhere :-) Flink should
IMO always have its own execution engine, which is currently under very
active development.

Tez and Java collections are two additional backends that users will be
able to enable depending on their needs.

On Saturday, November 8, 2014, Flavio Pompermaier <[hidden email]>
wrote:


Kostas Tzoumas-2

Re: Flink on Tez

In reply to this post by Flavio Pompermaier

Flavio, see my first post in this thread.

If these differences are fine print for your application/requirements, then
yes, both backends will do the same thing (distribute computation). In that
case, the suggestion would be to use Flink in the normal way, as it is
much more mature implementation-wise than Flink-on-Tez.

On Saturday, November 8, 2014, Flavio Pompermaier <[hidden email]>
wrote:

Flavio Pompermaier

Re: Flink on Tez

Thanks Kostas for the reply.
I've already read your first post, and what is not fully clear to me is the
technical motivation behind "Tez follows design choices that are geared
towards resource elasticity, whereas the design choices behind Flink's
engine are geared more towards low latency querying and iterative
processing". Since in the future I may have to choose one of the two, I'd
like to better understand the pros and cons of the two runtimes.

Thanks in advance,
Flavio
On Nov 9, 2014 11:02 AM, "Kostas Tzoumas" <[hidden email]> wrote:


Sean Owen-2

Re: Flink on Tez

This was kind of the substance of the same conversation that happened
about Spark on Tez, which looks like it was rejected
(https://issues.apache.org/jira/browse/SPARK-3561):

- Committing to an SPI interface is hard and imposes its own design
and runtime limitations
- Fragments / forks efforts across two implementations
- If execution engine A is good at one thing and B at another, then
you force a choice that users have difficulty making, when you could
try to improve either engine to do both
- (And in particular, the YARN elasticity thing was in theory being
improved already for Spark anyway, so wasn't as compelling)

Of course you could make that argument against any abstraction, but
the execution engine is so core to the point of these projects that
the logic may be different. It's not the same as an execution engine
abstracting over different data layers.

Here, maybe the SPI interface is already there and committed-to, and
there are real benefits to Tez as an alternative. I do think it will
be helpful to further articulate when you would want to use one over
the other, as even I have the same broad question. If they're the
same, what's the point? If they're not, when are they different -- is
it really an issue of resource elasticity vs maturity of the
integration? Makes sense to me.

On Sun, Nov 9, 2014 at 10:50 AM, Flavio Pompermaier
<[hidden email]> wrote:


sirinath

Re: Flink on Tez

Why don't you have an internal engine which addresses both of the concerns raised, rather than one over the other? Also, configuration can help determine which type of optimisation is done in cases where things are mutually exclusive.

Fabian Hueske

Re: Flink on Tez

Flavio, you can switch between both engines with virtually no effort if you
use YARN. In that case, you'll either start Flink's own runtime or Tez on
YARN.

2014-11-09 14:34 GMT+01:00 sirinath <[hidden email]>:


Kostas Tzoumas-2

Re: Flink on Tez

In reply to this post by Sean Owen-2

Sean, these are all sensible questions. As this codebase matures and is
eventually committed to Flink, it makes sense to create a guide of cases
where one engine would be a better fit than another.

Following the discussion in https://issues.apache.org/jira/browse/SPARK-3561,
I read that it is not a goal of the Spark project to have alternative
execution engines. Given that, it makes sense to not add support for Tez as
an execution engine.

What I think is cool in Flink is that swapping the execution engine does
not in effect change the complete code path that jobs follow. All of
Flink's optimization is still used, and all Flink runtime algorithms
(e.g., sorting, hashing, etc) are still used to execute the operators; they
just live inside Tez processors rather than Flink job vertices.
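
To make the last point concrete, here is a minimal sketch, under the assumption that an operator's algorithm can be modelled as a plain function that any container may call. SortOperator, JobVertexHost and TezProcessorHost are hypothetical names for illustration only and do not correspond to actual Flink or Tez classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical shared runtime algorithm: both hosts below call this same code.
final class SortOperator {
    static List<Integer> run(List<Integer> input) {
        List<Integer> sorted = new ArrayList<>(input); // copy so the input is left untouched
        Collections.sort(sorted);
        return sorted;
    }
}

// Hypothetical container standing in for a Flink job vertex.
final class JobVertexHost {
    List<Integer> execute(List<Integer> input) {
        return SortOperator.run(input);
    }
}

// Hypothetical container standing in for a Tez processor.
final class TezProcessorHost {
    List<Integer> execute(List<Integer> input) {
        return SortOperator.run(input);
    }
}

final class OperatorHostDemo {
    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(3, 1, 2);
        // Different hosts, same operator code, so the results are identical.
        System.out.println(new JobVertexHost().execute(data));
        System.out.println(new TezProcessorHost().execute(data));
    }
}
```

Only the hosting class changes between the two engines in this sketch; the sorting logic itself is written once.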

On Sun, Nov 9, 2014 at 12:09 PM, Sean Owen <[hidden email]> wrote:


Henry Saputra

Re: Flink on Tez

In reply to this post by Kostas Tzoumas-2

Hi Kostas,

Since Tez uses YARN underneath, what does local execution mean
in this case?

- Henry

On Fri, Nov 7, 2014 at 10:03 AM, Kostas Tzoumas <[hidden email]> wrote:


sirinath

Re: Flink on Tez

It is all good that you can swap engines, but at this point I think you should make the internal engine as good as possible so that it dominates all other engines. This way there will be no need to change.

Also, relying on or running on top of software designed for other ecosystems is not a good trend. It is best to have your own engine, tightly coupled and well optimised for your use cases. Integration and interoperability are another story.

Kostas Tzoumas-2

Re: Flink on Tez

In reply to this post by Henry Saputra

Hi Henry, Tez has a local execution mode for debugging/experimenting etc:
http://tez.apache.org/localmode.html

On Tue, Nov 11, 2014 at 1:58 AM, Henry Saputra <[hidden email]>
wrote:
