[jira] [Created] (FLINK-22729) Truncated Messages in Python workers

Shang Yuanchun (Jira)
Stephan Ewen created FLINK-22729:
------------------------------------

             Summary: Truncated Messages in Python workers
                 Key: FLINK-22729
                 URL: https://issues.apache.org/jira/browse/FLINK-22729
             Project: Flink
          Issue Type: Bug
          Components: Stateful Functions
    Affects Versions: statefun-2.2.2
         Environment: The Stateful Functions version is 2.2.2, running on Java 8. The Java
application and the external Python workers are deployed in the same Kubernetes cluster.
            Reporter: Stephan Ewen
             Fix For: statefun-3.1.0


Recently we started seeing the following faulty behavior in the Flink
Stateful Functions HTTP communication towards external Python workers.
This is only occurring when the system is under heavy load.

The Java Application will send HTTP Messages to an external Python
Function but the external Function fails to parse the message with a
"Truncated Message Error". Printouts show that the truncated message
looks as follows:

{code}
<Start of Message>

my.protobuf.MyClass: <Protobuf Content>

my.protobuf.MyClass: <Protobuf Content>

my.protobuf.MyClass: <Protobuf Content>

my.protobuf.MyClass: <Protob
{code}


This leads to the following error in the Python worker:

{code}
Error Parsing Message: Truncated Message
{code}
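
To illustrate why a cut-off body produces this kind of error, here is a minimal sketch of a length-prefixed reader. It does not use the protobuf library and is not the SDK's parser; it only mimics the idea of a varint length followed by that many payload bytes. The names `TruncatedMessageError`, `read_varint`, `read_length_delimited` and `frame` are hypothetical and exist only for this example.

```python
from typing import List, Tuple


class TruncatedMessageError(ValueError):
    """Hypothetical error type: the input ended before the declared length."""


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise TruncatedMessageError("input ended inside a varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last varint byte
            return value, pos
        shift += 7


def read_length_delimited(buf: bytes) -> List[bytes]:
    """Split buf into payloads, each prefixed by its varint-encoded length."""
    payloads = []
    pos = 0
    while pos < len(buf):
        length, pos = read_varint(buf, pos)
        end = pos + length
        if end > len(buf):
            # The prefix promises more bytes than the body actually contains.
            raise TruncatedMessageError(
                f"expected {length} payload bytes, only {len(buf) - pos} available"
            )
        payloads.append(buf[pos:end])
        pos = end
    return payloads


def frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a varint (inverse of the reader)."""
    length = len(payload)
    out = bytearray()
    while True:
        byte = length & 0x7F
        length >>= 7
        if length:
            out.append(byte | 0x80)  # more length bytes follow
        else:
            out.append(byte)
            return bytes(out) + payload
```

With this sketch, a complete batch such as `frame(b"first") + frame(b"second")` is split back into its payloads, while the same bytes with the tail cut off raise `TruncatedMessageError`. That matches the symptom above, where the first entries of a batch are intact and only the last one is incomplete.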

Either the sender or the receiver (or something in between) seems to be
truncating some (but not all) messages at a random point in the payload.
The source code in both Flink SDKs looks correct. We temporarily
worked around this by setting the "maxNumBatchRequests" parameter in the
external function definition to a very low value. This is not an ideal
solution, however, as we believe it adds considerable communication
overhead between the Java and the Python functions.
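
One way to narrow down where the truncation happens is to compare the number of bytes the Python worker actually received with the length the sender declared. The following is a minimal diagnostic sketch, assuming the worker's HTTP framework exposes the request headers as a mapping and the raw body as bytes. The helper name `check_body_length` is hypothetical and not part of any Flink SDK.

```python
from typing import Mapping, Optional


def check_body_length(headers: Mapping[str, str], body: bytes) -> Optional[str]:
    """Return a description of the mismatch, or None if the body looks complete.

    Assumption: `headers` holds the request headers and `body` the raw,
    undecoded request body exactly as read by the worker.
    """
    declared = None
    for name, value in headers.items():
        if name.lower() == "content-length":  # header names are case-insensitive
            declared = value
            break
    if declared is None:
        # For example chunked transfer encoding: nothing to compare against.
        return None
    try:
        expected = int(declared)
    except ValueError:
        return f"unparseable Content-Length: {declared!r}"
    if len(body) != expected:
        return f"body has {len(body)} bytes, Content-Length declared {expected}"
    return None
```

Logging the returned message before handing the body to the SDK would show whether a short body already arrives at the worker (pointing at the sender or the network path) or whether the full body arrives and is cut off later.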

----

This was reported on the Mailing List in http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Statefun-Truncated-Messages-in-Python-workers-td43831.html





--
This message was sent by Atlassian Jira
(v8.3.4#803005)