[DISCUSS] Flink framework and user log separation


[DISCUSS] Flink framework and user log separation

vino yang
Dear devs,

Currently, Flink does not explicitly distinguish between framework logs and user logs in its log output. In the TaskManager, framework logs are intermixed with the users' business logs. In some deployment modes, such as standalone or YARN session, task instances of different jobs run in the same TaskManager, which makes the log event flow even more confusing unless users explicitly tag their log events, and makes locating problems difficult and inefficient. For the YARN job cluster deployment mode the problem is less serious, but we still have to manually distinguish the framework log from the business log. Overall, we found that Flink's existing logging model has the following problems:


  • Framework logs and business logs are mixed in the same log file, with no way to make a clear distinction, which hinders problem location and analysis;

  • Business logs cannot be collected independently.


Therefore, we propose a mechanism to separate the framework log from the business log. It splits the existing TaskManager log files.


Currently, it is associated with two JIRA issues:

  • FLINK-11202[1]: Split log file per job

  • FLINK-11782[2]: Enhance TaskManager log visualization by listing all log files for Flink web UI


We have implemented and validated it in standalone mode and in Flink on YARN (job cluster) mode.

sketch 1:

[image: flink-web-ui-taskmanager-log-files.png]

sketch 2:

[image: flink-web-ui-taskmanager-log-files-2.png]

Design documentation:
https://docs.google.com/document/d/1TTYAtFoTWaGCveKDZH394FYdRyNyQFnVoW5AYFvnr5I/edit?usp=sharing

Best,
Vino

[1]: https://issues.apache.org/jira/browse/FLINK-11202
[2]: https://issues.apache.org/jira/browse/FLINK-11782


Re: [DISCUSS] Flink framework and user log separation

Jamie Grier-3
If I understand this correctly, I think this design is going in the
wrong direction. The problem with Flink logging when you run multiple
jobs in the same TMs is not just about separating the business-level
logging into separate files. The Flink framework itself logs many things
for which there is clearly a single job in context, yet it all ends up in
the same log file with no clear separation among the log lines.

Also, I don't think aiming for multiple log files is a very good idea
either. Especially in container-based deployments, the common expectation
is that a process (like Flink) logs everything to stdout and the
surrounding tooling takes care of routing that log data somewhere. I
think we should stick with that model and expect a single log stream
coming out of each Flink process.

Instead, I think it would be better to enhance Flink's logging capability
such that the appropriate context can be added to each log line with the
exact format controlled by the end user.  It might make sense to take a
look at MDC, for example, as a way to approach this.
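
For illustration, a minimal sketch of what the MDC approach could look like around task code, using the plain SLF4J MDC API. The "jobId" key and its value are hypothetical here; in a real integration the Flink runtime would set and clear the context, and the layout pattern would render it via %X{jobId}:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    public class MdcContextSketch {

        private static final Logger LOG = LoggerFactory.getLogger(MdcContextSketch.class);

        public static void main(String[] args) {
            // Hypothetical job identifier; in Flink this would come from the
            // runtime context of the task rather than being hard-coded.
            String jobId = "a1b2c3d4e5f6";

            // Attach the job id to the current thread's logging context.
            MDC.put("jobId", jobId);
            try {
                // Every line logged on this thread can now carry the job id,
                // provided the layout pattern includes %X{jobId}.
                LOG.info("Processing records for this job");
            } finally {
                // Clean up so the thread does not leak the context to other work.
                MDC.remove("jobId");
            }
        }
    }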



Re: [DISCUSS] Flink framework and user log separation

vino yang
Hi Jamie Grier,

Thank you for your reply, let me add some explanations to this design.

First of all, as stated in the "Goal" section, this proposal mainly
targets the standalone cluster mode. Although we have also implemented it
for Flink on YARN, that does not mean the feature cannot be turned off via
an option. Note that the separation is driven by the log configuration
file, so it is very flexible and could even let users define their own log
pattern in the configuration file (this is an extension feature, not
covered in the design document). Since a single file is just a special
case of the split setup, we can provide an option that keeps the current
single-file behavior as the default, which should match what you expect in
a container environment.

According to Flink's official 2016 user survey [1], the number of users
running standalone mode is quite close to the number running YARN mode
(unfortunately there is no comparable data for 2017). Although we mainly
use Flink on YARN now, we have used standalone mode extensively
(processing close to 20 trillion messages per day). In that scenario, the
user logs generated by tasks of different jobs are mixed together, which
makes it very difficult to locate issues. Moreover, because we configure a
log file rolling policy, we have to log in to the server to view the logs.
Therefore, we want the user logs generated by tasks of the same job to be
distinguishable within a single TaskManager.

In addition, I have tried MDC, but it cannot achieve the goal. Flink's
underlying logging frameworks are log4j 1.x and logback; we need to stay
compatible with both at the same time, we cannot make large-scale changes
to the existing code, and the change must be transparent to users.

Some other points:

1) Many of our users have experience with Storm and Spark, and they are
more accustomed to that style of logging in standalone mode;
2) Splitting the user log by job will also help us implement a job-based
"business log aggregation" feature.

Best,
Vino

[1]: https://www.ververica.com/blog/flink-user-survey-2016-part-1


Re: [DISCUSS] Flink framework and user log separation

Stephan Ewen
Is that something that can just be done by the right logging framework and
configuration?

Like having a log framework with two targets, one filtered on
"org.apache.flink" and the other one filtered on "my.company.project" or so?


Re: [DISCUSS] Flink framework and user log separation

Chesnay Schepler-3
From what I understand this isn't about logging Flink/user messages to
different files, but about logging everything relevant to a specific job
to a separate file (including what is logged in runtime classes, e.g.
Tasks, Operators etc.).
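
If one wanted to sketch that with logging configuration alone (setting aside the log4j 1.x compatibility constraint Vino mentioned), logback's SiftingAppender keyed on a per-job MDC value is roughly the shape it would take. Everything below is an assumption for illustration only; the "jobId" MDC key, file names and pattern are not something Flink sets today:

    import ch.qos.logback.classic.Logger;
    import ch.qos.logback.classic.LoggerContext;
    import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
    import ch.qos.logback.classic.sift.MDCBasedDiscriminator;
    import ch.qos.logback.classic.sift.SiftingAppender;
    import ch.qos.logback.classic.spi.ILoggingEvent;
    import ch.qos.logback.core.FileAppender;
    import org.slf4j.LoggerFactory;

    public class PerJobLogFileSketch {

        public static void main(String[] args) {
            LoggerContext ctx = (LoggerContext) LoggerFactory.getILoggerFactory();

            // Choose the target file per event based on the "jobId" MDC value.
            MDCBasedDiscriminator discriminator = new MDCBasedDiscriminator();
            discriminator.setKey("jobId");
            discriminator.setDefaultValue("no-job"); // framework events without a job in context
            discriminator.start();

            SiftingAppender sift = new SiftingAppender();
            sift.setContext(ctx);
            sift.setDiscriminator(discriminator);
            // Lazily create one file appender per distinct job id.
            sift.setAppenderFactory((context, jobId) -> {
                PatternLayoutEncoder encoder = new PatternLayoutEncoder();
                encoder.setContext(context);
                encoder.setPattern("%d{HH:mm:ss.SSS} %-5level %logger - %msg%n");
                encoder.start();

                FileAppender<ILoggingEvent> appender = new FileAppender<>();
                appender.setContext(context);
                appender.setFile("taskmanager-" + jobId + ".log"); // one log file per job
                appender.setEncoder(encoder);
                appender.start();
                return appender;
            });
            sift.start();

            // Attach to the root logger so all events are routed through the sifter.
            Logger root = ctx.getLogger(Logger.ROOT_LOGGER_NAME);
            root.addAppender(sift);
        }
    }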


Re: [DISCUSS] Flink framework and user log separation

vino yang
Hi Stephan,

Thanks for your reply.

In some cases, your solution works.

However, in some scenarios it does not meet the requirement:

   - One program may contain multiple job instances;
   - If we run Flink as a platform, we cannot know the packages of users'
   programs in advance, so we cannot configure the log profiles before
   starting the cluster.

Chesnay's understanding is right: we need to split the business logs by
job.

Recently, a user also reported this requirement. [1]

[1]: https://issues.apache.org/jira/browse/FLINK-12953
