Hi all!
We have been thinking that it would be a great improvement to add contextual information to the Flink logs:

- Container / YARN / host info in JM/TM logs
- Job info (job id / job name) in task logs

I think this should be set up similarly to how the metric scopes are configured, and it should be able to provide the same information for logs. Ideally it would be user-configurable.

We are wondering what the best way to do this would be, and would like to ask for opinions or past experiences.

Our first thought was setting NDC / MDC in the different threads, but that seems to be a somewhat fragile mechanism, as it can easily be "cleared" or deleted by user code.

What do you think?

Gyula
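To illustrate the concern about MDC fragility, here is a minimal, purely illustrative sketch of how a per-thread log context works. The class and method names are hypothetical (not Flink or SLF4J APIs); the point is that the context is just a thread-local map, so any user code running on the same thread can wipe the values the framework put there.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of an MDC-style per-thread log context. Because the context
// lives in a ThreadLocal, any code on the same thread -- including user
// functions -- can overwrite or clear it. Illustrative names only.
class LogContext {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    // A single clear() from user code wipes the framework-provided context.
    static void clear() {
        CONTEXT.get().clear();
    }
}
```

Anything the framework sets before handing the thread to user code is gone after a `clear()`, which is exactly the fragility described above.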
Hi Gyula,
Sorry for the late reply. This is definitely a challenge in terms of log visibility. For your requirement, I think you can customize your Flink job by using a custom log formatter/encoder (e.g. in log4j.properties or logback.xml) together with a suitable logger implementation.

One example you can follow is to provide customFields in your log encoding [1,2] and use a supported appender to write your logs to a file. You can also use a more specialized appender to ship the logs to an external store (for example, Elasticsearch, accessed via Kibana).

One challenge you might face is how to configure this contextual information dynamically. In our setup, it is provided as environment variables when the job launches, so the loggers can resolve it at start time.

Please let me know if any of the suggestions above helps.

Cheers,
Rong

[1] https://github.com/logstash/logstash-logback-encoder/blob/master/src/test/resources/logback-test.xml#L13
[2] https://github.com/logstash/logstash-logback-encoder

On Thu, Oct 3, 2019 at 1:56 AM Gyula Fóra <[hidden email]> wrote:
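A hedged sketch of what such an encoder configuration could look like, along the lines of [1]: a logback.xml fragment using the logstash-logback-encoder's customFields, with values resolved from environment variables via logback's `${VAR:-default}` substitution. The variable names `FLINK_JOB_NAME` and `CONTAINER_ID` are assumptions; use whatever your launcher actually exports.

```xml
<!-- Hypothetical logback.xml fragment: LogstashEncoder with customFields
     resolved from environment variables set at job launch. -->
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>flink-${CONTAINER_ID:-unknown}.log</file>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <customFields>{"jobName":"${FLINK_JOB_NAME:-unknown}","containerId":"${CONTAINER_ID:-unknown}"}</customFields>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="FILE"/>
  </root>
</configuration>
```

With this in place, every JSON log line carries the job name and container id as top-level fields, which index well in Elasticsearch/Kibana.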
+1 to Rong’s approach. We use a similar solution to the log context problem on YARN setups, FYI.

Regarding container contextual information: we collect logs via ELK, so the log file paths (which contain the application id and container id) and the host are attached to the logs. But if you don’t want a new log collector, you can also use environment variables in your log pattern. Flink puts the container information into environment variables, which can be found in the container launch script.

Regarding job contextual information: we tried MDC on task threads, but it ended up with poor readability because Flink’s system threads do not have the MDC variables set (in my case, user info), so now we use the user name from an environment variable as the logger pattern variable instead. However, I’m afraid the job id/name cannot be found in the default environment variables; you may need to find a way to put them into the environment or the system properties.

Best,
Paul Lam

On Oct 15, 2019, at 12:50, Rong Rong <[hidden email]> wrote:
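Assuming a Log4j 2 setup, the environment-variables-in-the-pattern idea can be sketched with the `${env:...}` lookup. The variable names `CONTAINER_ID` and `HOSTNAME` are assumptions about what the container launch script exports; check your launch script for the actual names.

```properties
# Hypothetical Log4j 2 properties fragment: inject container context from
# environment variables into every log line via the ${env:...} lookup.
appender.main.type = File
appender.main.name = MainAppender
appender.main.fileName = ${sys:log.file}
appender.main.layout.type = PatternLayout
appender.main.layout.pattern = %d{ISO8601} [${env:CONTAINER_ID:-?}@${env:HOSTNAME:-?}] %-5p %c - %m%n

rootLogger.level = INFO
rootLogger.appenderRef.main.ref = MainAppender
```

Since environment variables do not change over the JVM's lifetime, resolving them once at startup is sufficient; no per-event lookup cost is incurred.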
Hi all!
Thanks for the answers, this has been very helpful and we could set up a similar scheme using the environment variables.

Cheers,
Gyula

On Tue, Oct 15, 2019 at 9:55 AM Paul Lam <[hidden email]> wrote:
+1 to Rong’s approach.
Using a Java option and log4j, we could save the user logs to a different file.

Best,
Yang

On Fri, Oct 18, 2019 at 4:41 PM, Gyula Fóra <[hidden email]> wrote:
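A hedged sketch of the Java-option idea, assuming Log4j 2 and a per-job cluster (so each JVM serves one job): pass a job identifier as a JVM system property via Flink's `env.java.opts` config key, then use the `${sys:...}` lookup to route logs to a per-job file. The property name `job.name` and the log path are illustrative.

```properties
# Hypothetical sketch: in flink-conf.yaml, pass the identifier as a JVM option:
#   env.java.opts: -Djob.name=my-job
# Then in the Log4j 2 properties, route that job's logs to a dedicated file:
appender.job.type = File
appender.job.name = JobFileAppender
appender.job.fileName = /var/log/flink/${sys:job.name:-default}.log
appender.job.layout.type = PatternLayout
appender.job.layout.pattern = %d{ISO8601} [${sys:job.name:-default}] %-5p %c - %m%n

rootLogger.level = INFO
rootLogger.appenderRef.job.ref = JobFileAppender
```

In a session cluster this would not separate jobs, since all jobs share the TaskManager JVMs and therefore the same system properties.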