Victor Wong created FLINK-15448:
-----------------------------------
Summary: Make "ResourceID#toString" more descriptive
Key: FLINK-15448
URL: https://issues.apache.org/jira/browse/FLINK-15448
Project: Flink
Issue Type: Improvement
Affects Versions: 1.9.1
Reporter: Victor Wong
With Flink on Yarn, we sometimes run into an exception like this:
{code:java}
java.util.concurrent.TimeoutException: The heartbeat of TaskManager with id container_xxxx timed out.
{code}
To investigate, we'd like to log into the host of the lost TaskManager, but the exception only gives the container id. We have to search the earlier logs for the host information, which is a little time-consuming.
Maybe we can add more descriptive information to the ResourceID of Yarn containers, e.g. "container_xxx@host_name:port_number".
Here's the demo:
{code:java}
class ResourceID {
    final String resourceId;
    // Human-readable description; defaults to the plain resource id.
    final String details;

    public ResourceID(String resourceId) {
        this.resourceId = resourceId;
        this.details = resourceId;
    }

    public ResourceID(String resourceId, String details) {
        this.resourceId = resourceId;
        this.details = details;
    }

    @Override
    public String toString() {
        return details;
    }
}

// in flink-yarn
private void startTaskExecutorInContainer(Container container) {
    final String containerIdStr = container.getId().toString();
    // NodeId prints as "host:port", giving "container_xxx@host_name:port_number".
    final String containerDetail = container.getId() + "@" + container.getNodeId();
    final ResourceID resourceId = new ResourceID(containerIdStr, containerDetail);
    ...
}
{code}
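One point the demo leaves open is whether the extra details should affect identity. Below is a minimal, self-contained sketch that assumes they should not: equality and hashing stay based on the resource id alone, and only toString changes. The class name DescriptiveResourceID and the host/port values in the usage note are hypothetical, chosen for illustration; this is not Flink's actual ResourceID.

```java
import java.util.Objects;

/** Hypothetical stand-in for ResourceID, for illustration only. */
final class DescriptiveResourceID {
    private final String resourceId;
    private final String details;

    DescriptiveResourceID(String resourceId) {
        // Without extra details, fall back to the plain id.
        this(resourceId, resourceId);
    }

    DescriptiveResourceID(String resourceId, String details) {
        this.resourceId = Objects.requireNonNull(resourceId);
        this.details = Objects.requireNonNull(details);
    }

    String getResourceId() {
        return resourceId;
    }

    // Assumption: identity depends only on resourceId, so two instances
    // for the same container compare equal whether or not they carry details.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DescriptiveResourceID)) {
            return false;
        }
        return resourceId.equals(((DescriptiveResourceID) o).resourceId);
    }

    @Override
    public int hashCode() {
        return resourceId.hashCode();
    }

    // Only the string form used in logs and exception messages changes.
    @Override
    public String toString() {
        return details;
    }
}
```

With this sketch, new DescriptiveResourceID("container_1", "container_1@host-a:8041") would appear in a log message as "container_1@host-a:8041" while still being equal to new DescriptiveResourceID("container_1"), so map lookups keyed by the id would keep working.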
--
This message was sent by Atlassian Jira
(v8.3.4#803005)