Hi,
I am running some tests with off-heap memory usage and have encountered odd behavior. My TaskManager heap limit is 12288 MB, and when I set "taskmanager.memory.off-heap: true", each job allocates at most 11673 MB of off-heap memory, which is heap size * 0.95 (the value of taskmanager.memory.fraction). But when I submit a second job, it allocates another 11 GB and does not free the memory, since MaxDirectMemorySize is set via -XX:MaxDirectMemorySize=${TM_MAX_OFFHEAP_SIZE} with TM_MAX_OFFHEAP_SIZE="8388607T"; my laptop goes into swap and the kernel OOM-kills the TaskManager. If I trigger "Perform GC" from VisualVM between jobs, the direct memory is released, but the TaskManager's memory usage as reported by ps is still around 20 GB (RSS) and 27 GB (virtual size). In that case I can submit my test job a few times without the TaskManager being OOM-killed, but after about 10 submissions it is killed again. I don't understand why the JVM's memory usage stays so high even after all direct memory has been released. Do you have any idea? I then set MaxDirectMemorySize to 12 GB; in this case the direct memory was freed without explicitly triggering a GC from VisualVM, but the JVM process memory usage was still high, around 20 GB (RSS) and 27 GB (virtual size). After again maybe 10 submissions, the TaskManager was killed. I think this is a bug, and it makes it impossible to reuse TaskManagers without restarting them in standalone mode.
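For context, the setup described above should correspond roughly to the following configuration (my reconstruction from this message, not a verified config; 0.95 is the stated default for taskmanager.memory.fraction, and TM_MAX_OFFHEAP_SIZE comes from taskmanager.sh):

    # flink-conf.yaml (reconstructed)
    taskmanager.heap.mb: 12288
    taskmanager.memory.off-heap: true
    taskmanager.memory.fraction: 0.95

    # taskmanager.sh default, effectively an unlimited -XX:MaxDirectMemorySize
    TM_MAX_OFFHEAP_SIZE="8388607T"

With these values, each job can grab roughly 12288 MB * 0.95 ≈ 11673 MB of managed off-heap memory, which matches the numbers reported above.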
Hello,

I repeated the same test with the following configuration values:

    taskmanager.heap.mb: 6500
    taskmanager.memory.off-heap: true
    taskmanager.memory.fraction: 0.9

and I set TM_MAX_OFFHEAP_SIZE="6G" in taskmanager.sh. The TaskManager started with:

    capacman 14543 323 56.0 17014744 13731328 pts/1 Sl 16:23 35:25 /home/capacman/programlama/java/jdk1.7.0_75/bin/java -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -Xms650M -Xmx650M -XX:MaxDirectMemorySize=6G -XX:MaxPermSize=256m -Dlog.file=/home/capacman/Data/programlama/flink-1.0.3/log/flink-capacman-taskmanager-0-capacman-Aspire-V3-771.log -Dlog4j.configuration=file:/home/capacman/Data/programlama/flink-1.0.3/conf/log4j.properties -Dlogback.configurationFile=file:/home/capacman/Data/programlama/flink-1.0.3/conf/logback.xml -classpath /home/capacman/Data/programlama/flink-1.0.3/lib/flink-dist_2.11-1.0.3.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/flink-python_2.11-1.0.3.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/log4j-1.2.17.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/slf4j-log4j12-1.7.7.jar::: org.apache.flink.runtime.taskmanager.TaskManager --configDir /home/capacman/Data/programlama/flink-1.0.3/conf

but memory usage reached up to 13 GB. Could somebody explain to me why memory usage is so high? I expected it to be at most 8 GB with some JVM-internal overhead.
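If I read the configuration above correctly, the numbers in that command line add up as follows: with off-heap enabled and taskmanager.memory.fraction set to 0.9, the JVM heap is cut down to 6500 MB * (1 - 0.9) = 650 MB (hence -Xms650M -Xmx650M), and the managed memory of 6500 MB * 0.9 = 5850 MB is allocated as direct memory, which fits under the 6 G MaxDirectMemorySize. The expected footprint is therefore roughly 650 MB heap + 5850 MB off-heap plus JVM overhead, i.e. the "at most 8 GB" mentioned above, which makes the observed 13 GB all the more surprising. (This is my reading of the settings in this thread, not an authoritative breakdown.)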
Hi,

I think I found some information regarding this behavior. In the JVM it is almost impossible to free memory allocated via ByteBuffer.allocateDirect; there is no explicit way to tell the JVM "free this direct ByteBuffer". On some forums it is said that you can free the memory with the method below:

    import java.nio.ByteBuffer

    def releaseBuffers(buffers: List[ByteBuffer]): List[ByteBuffer] = {
      if (!buffers.isEmpty) {
        val cleanerMethod = buffers.head.getClass.getMethod("cleaner")
        cleanerMethod.setAccessible(true)
        buffers.foreach { buffer =>
          val cleaner = cleanerMethod.invoke(buffer)
          val cleanMethod = cleaner.getClass().getMethod("clean")
          cleanMethod.setAccessible(true)
          cleanMethod.invoke(cleaner)
        }
      }
      List.empty[ByteBuffer]
    }

but since the cleaner method is internal, this approach is not recommended, does not work on every JVM, and Java 9 does not support it either. I also ran some tests with the method above, and the behavior is not predictable. If the memory was allocated by some other thread and that thread has exited, the memory is released. In effect, the GC controls direct memory buffers: if there is no GC activity while memory is allocated and then dereferenced by different threads, memory usage grows beyond the intended limit, the machine goes into swap, and the OS kills the TaskManager. In my tests I saw the following behavior:

Suppose thread A allocates 8 GB, exits, and no reference to the allocated memory remains; then thread B allocates 8 GB, exits, and again no reference remains. When I look at the direct memory usage from jvisualvm (-Xmx512m -XX:MaxDirectMemorySize=12G), the RSS of the process is nevertheless 16 GB. If I call System.gc at that point, RSS drops to 8 GB, but not to the expected level.

This is why the Apache Cassandra developers chose sun.misc.Unsafe (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Off-heap-caching-through-ByteBuffer-allocateDirect-when-JNA-not-available-td6977711.html).

I think that currently the only way to limit memory usage in Flink, if you want to reuse the same TaskManager across jobs, is "taskmanager.memory.preallocate: true". Since the memory is allocated at the beginning and never freed, memory usage stays constant.

PS: Sorry for my English, I am not a native speaker. I hope I managed to explain what I intended to :)
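To make the scenario above concrete, here is a minimal, self-contained sketch of the two-thread experiment (scaled down from 8 GB to a few hundred MB so it runs on a laptop; the object and method names are illustrative, this is not Flink code):

    import java.nio.ByteBuffer

    object DirectBufferDemo {

      // Allocate `count` direct buffers of `sizeMb` MB each inside a short-lived
      // thread and drop every reference when the thread exits.
      def allocateAndDropInThread(count: Int, sizeMb: Int): Unit = {
        val t = new Thread(new Runnable {
          override def run(): Unit = {
            var buffers = List.empty[ByteBuffer]
            for (_ <- 1 to count) {
              buffers = ByteBuffer.allocateDirect(sizeMb * 1024 * 1024) :: buffers
            }
            // The thread ends here; the DirectByteBuffer objects become unreachable,
            // but their native memory is only returned once the GC actually collects them.
          }
        })
        t.start()
        t.join()
      }

      def main(args: Array[String]): Unit = {
        allocateAndDropInThread(count = 4, sizeMb = 128) // "thread A"
        allocateAndDropInThread(count = 4, sizeMb = 128) // "thread B"
        // With a small heap (e.g. -Xmx512m) there may be no GC between the two rounds,
        // so the process RSS can hold both rounds of native memory even though no
        // references remain. Uncommenting System.gc() here usually releases it.
        // System.gc()
        Thread.sleep(60000) // keep the process alive so RSS can be inspected with ps
      }
    }

Run with something like -Xmx512m -XX:MaxDirectMemorySize=2G and watch the RSS in ps while it sleeps; whether the first round's memory is released before the second round allocates depends entirely on whether a GC cycle happens in between, which matches the unpredictability described above.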
Hi,
your observation sounds like a bug to me, and we have to investigate it further. I assume that you're running a batch job, right? Could you maybe share your complete configuration and the job so that we can reproduce the problem?

I think your finding that direct buffers are not properly freed and garbage collected may well be right. I will open a JIRA issue to investigate and solve the problem. Thanks for reporting :-)

At the moment, one way to work around this problem is, as you've already stated, to set taskmanager.memory.preallocate: true in your configuration. For batch jobs, this should actually improve runtime performance at the cost of a slightly longer start-up time for your TaskManagers.

Cheers,
Till
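For completeness, the workaround mentioned above would look roughly like this in flink-conf.yaml (a sketch of the setting being suggested, not a tested configuration):

    taskmanager.memory.off-heap: true
    taskmanager.memory.preallocate: true

With preallocation enabled, the TaskManager allocates its managed off-heap memory once at start-up and keeps it for its lifetime, so the direct-memory footprint should stay constant across job submissions instead of growing with every new job.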
Hi Till,
I saw the JIRA issue. Do you want me to upload the input dataset as well? If you want, I can prepare a GitHub repo if that would be easier.
That would be great. It would be best if you post the link directly on the JIRA issue.
Cheers,
Till