[Dev] Flink 'InputFormat' Interface execution related problem

Pawan Manishka Gunarathna
Hi,

When implementing the Flink *InputFormat* interface, if the *input split
creation* step is already handled by our data analytics server APIs, can we
go directly to the second phase of the InputFormat execution?

Basically, I need to know whether we can read those InputSplits directly,
without generating them inside the InputFormat interface. Any help would be
much appreciated.

Thanks,
Pawan

--

*Pawan Gunaratne*
*Mob: +94 770373556*

Re: [Dev] Flink 'InputFormat' Interface execution related problem

Fabian Hueske-2
Hi Pawan,

I don't think this works. The InputSplits are generated by the JobManager,
i.e., not in parallel but by a single process.
After the parallel InputFormats have been started on the TaskManagers, they
request InputSplits and open() them. If there are no InputSplits, there is
no work to be done and open() will not be called.
You can tweak the behavior by implementing your own InputSplits and an
InputSplitAssigner that assigns exactly one input split to each task.
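
The lifecycle described above can be sketched roughly as follows. This is a plain-Java simulation, not Flink code: the names SplitLifecycleSketch, Split, Assigner, createSplits and run are hypothetical stand-ins, and the parallel tasks are simulated sequentially so the result is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins for illustration only; these are NOT Flink's classes.
class SplitLifecycleSketch {

    /** A unit of work, e.g. one partition already known to an external server. */
    static final class Split {
        final int number;
        Split(int number) { this.number = number; }
    }

    /** Hands out splits on request; lives in the single coordinating process. */
    static final class Assigner {
        private final Queue<Split> pending = new ArrayDeque<>();
        Assigner(List<Split> splits) { pending.addAll(splits); }
        /** Returns the next split, or null when none are left. */
        Split next() { return pending.poll(); }
    }

    /** Phase 1: split creation, done once by a single process. */
    static List<Split> createSplits(int count) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            splits.add(new Split(i));
        }
        return splits;
    }

    /** Phase 2: tasks take turns requesting a split and "opening" it. */
    static List<String> run(int splitCount, int parallelism) {
        Assigner assigner = new Assigner(createSplits(splitCount));
        List<String> log = new ArrayList<>();
        boolean anyAssigned = true;
        while (anyAssigned) {
            anyAssigned = false;
            for (int task = 0; task < parallelism; task++) {
                Split split = assigner.next();
                if (split == null) {
                    break; // no splits left, so nothing is opened
                }
                log.add("task " + task + " open(split " + split.number + ")");
                anyAssigned = true;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        run(3, 2).forEach(System.out::println);
        System.out.println("opens with zero splits: " + run(0, 2).size());
    }
}
```

With as many splits as tasks, each task opens exactly one split; with zero splits, nothing is opened at all, which is why split creation cannot simply be skipped.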

Fabian

In reply to Pawan Manishka Gunarathna, 2017-01-23 8:44 GMT+01:00

Re: [Dev] Flink 'InputFormat' Interface execution related problem

Pawan Manishka Gunarathna
Hi,
Thanks for your help. Since our data source has a database-table
architecture, I am thinking of following the 'JDBCInputFormat' in Flink. It
would be great if you could provide some information on how
JDBCInputFormat execution works.

Thanks,
Pawan

In reply to Fabian Hueske, Mon, Jan 23, 2017 at 4:18 PM

Re: [Dev] Flink 'InputFormat' Interface execution related problem

Fabian Hueske-2
Hi,

JDBCInputFormat implements the InputFormat interface and is handled exactly
like any other InputFormat.
In contrast to file-based input formats, users must explicitly specify the
input splits by providing an array of parameter values, which are injected
into a parameterized query.
This is done because it is not easy to implement a generic method that
automatically splits a query into multiple (preferably equal-sized) partial
queries.
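
To illustrate the idea of one row of parameter values per input split, here is a minimal plain-Java sketch. It uses no Flink or JDBC classes. The class ParameterizedSplitsSketch, its methods and the books table are hypothetical, and the string substitution only stands in for the parameter binding a real prepared statement would do.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, no Flink or JDBC classes involved.
class ParameterizedSplitsSketch {

    /** Substitutes each '?' in the template with the next parameter, left to right. */
    static String bind(String template, Object[] params) {
        StringBuilder sql = new StringBuilder();
        int next = 0;
        for (char c : template.toCharArray()) {
            if (c == '?' && next < params.length) {
                sql.append(params[next++]);
            } else {
                sql.append(c);
            }
        }
        return sql.toString();
    }

    /** One row of parameter values corresponds to one split (one partial query). */
    static List<String> partialQueries(String template, Object[][] parameterValues) {
        List<String> queries = new ArrayList<>();
        for (Object[] row : parameterValues) {
            queries.add(bind(template, row));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical table and ranges, chosen by the user rather than derived automatically.
        String template = "SELECT * FROM books WHERE id BETWEEN ? AND ?";
        Object[][] parameterValues = { {1L, 100L}, {101L, 200L}, {201L, 300L} };
        partialQueries(template, parameterValues).forEach(System.out::println);
    }
}
```

Three rows of parameter values yield three partial queries, so the user, not the framework, decides how the table is partitioned.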

Best, Fabian

In reply to Pawan Manishka Gunarathna, 2017-01-24 6:31 GMT+01:00

Re: [Dev] Flink 'InputFormat' Interface execution related problem

Flavio Pompermaier
If the column on which you want to split is numeric, you can use the
NumericBetweenParametersProvider class, which automatically computes the
splits for you. Here is an example of its usage (in windows of 1000 items
at a time), taken from the test class *JDBCInputFormatTest*:

final int fetchSize = 1000;
final Long min = 0L;
final Long max = 1_000_000L;
// Computes the BETWEEN bounds of each split from fetchSize, min and max.
ParameterValuesProvider paramProvider =
    new NumericBetweenParametersProvider(fetchSize, min, max);
jdbcInputFormat = JDBCInputFormat.buildJDBCInputFormat()
    .setDrivername(DRIVER_CLASS)
    .setDBUrl(DB_URL)
    // Parameterized query; each split supplies its own parameter values.
    .setQuery(JDBCTestBase.SELECT_ALL_BOOKS_SPLIT_BY_ID)
    .setRowTypeInfo(rowTypeInfo)
    .setParametersProvider(paramProvider)
    .setResultSetType(ResultSet.TYPE_SCROLL_INSENSITIVE)
    .finish();
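
For intuition about how a numeric range can be cut into windows, here is a small self-contained sketch. It is not the implementation of NumericBetweenParametersProvider, and the boundaries the real class produces may differ. NumericRangeSketch and ranges are hypothetical names, and the sketch assumes both bounds are inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a simple windowing scheme, not Flink's actual code.
class NumericRangeSketch {

    /** Cuts the inclusive range [min, max] into consecutive windows of at most fetchSize values. */
    static List<long[]> ranges(long fetchSize, long min, long max) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        // No overflow guard: assumes max + fetchSize still fits in a long.
        for (long start = min; start <= max; start += fetchSize) {
            long end = Math.min(start + fetchSize - 1, max);
            result.add(new long[] {start, end}); // one {lower, upper} pair per split
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> r = ranges(1000, 0L, 1_000_000L);
        long[] first = r.get(0);
        long[] last = r.get(r.size() - 1);
        System.out.println(r.size() + " ranges, first [" + first[0] + ", " + first[1]
                + "], last [" + last[0] + ", " + last[1] + "]");
    }
}
```

Under these assumptions, the values from the example above (fetchSize 1000, min 0, max 1,000,000) give 1001 windows, the last of which holds only the value 1,000,000.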

I hope this helps,
Flavio

In reply to Fabian Hueske, Tue, Jan 24, 2017 at 10:57 AM

Re: [Dev] Flink 'InputFormat' Interface execution related problem

Pawan Manishka Gunarathna
Hi,
Thanks a lot to Fabian and Flavio. That information is really helpful.

In reply to Flavio Pompermaier, Tue, Jan 24, 2017 at 3:36 PM