[Linux][Android] Analyzing Memory Usage.

Domain/Linux 2010.08.15 09:45

First of all, I am going to avoid the popular tools used on desktop Linux, because we cannot expect them to be available in an embedded environment (embedded Linux such as Android).
Before talking about memory analysis, let's look over the fundamental concepts related to memory.
(This article is based on kernel 2.6.29.)

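There are four types of memory: private clean, private dirty, shared clean and shared dirty.
* Clean vs. Dirty
Clean memory has not been modified since it was loaded, or its contents still match its backing file, so the kernel can discard it at any time and read it back later if needed. Typical examples are mmap()ed file pages that were never written, and memory that has never been written at all.
Dirty memory is exactly the opposite: it holds modified data that cannot be discarded without losing it.
* Private vs. Shared
Private memory is used by a single process; shared memory is mapped by more than one process.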

Here are some examples from Android:
Shared clean : common dex files.
Private clean : application-specific dex files.
Shared dirty : library "live" dex structures (e.g. Class objects) and the shared copy-on-write heap. This sharing is why 'Zygote' exists.
Private dirty : application "live" dex structures and the application heap.

Clean memory is usually not an interesting subject.
Most memory analysis focuses on dirty memory, especially private dirty.
(Shared dirty also matters in some cases.)

Linux uses virtual memory (henceforth VM). I assume the reader is already familiar with it, so let's go one step further. Linux normally uses "demand paging": it does not spend any RAM on a page until that page is really accessed. What exactly does this mean? Let's look at the code below.

#define _BUFSZ (1024*1024*10)
static int _mem[_BUFSZ];
int main (int argc, char* argv[]) {
    int i;
    /* --- (*1) --- */
    for(i=0; i<_BUFSZ; i++) {
        _mem[i] = i;
    }
    /* --- (*2) --- */
}

As you can see, "sizeof(_mem)" is sizeof(int)*10*1024*1024 = 40MB (assuming sizeof(int) == 4).
But at (*1), _mem has not REALLY been accessed yet, so Linux has not allocated any pages for it in RAM. At (*2), every element has been written, so the pages for _mem are now in RAM.
OK? Later, we will confirm this from the kernel side.
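The program above cannot show that difference by itself. As a minimal sketch (assuming a Linux /proc that exposes a VmRSS line in /proc/self/status; the helper names status_kb and rss_growth_after_touch_kb are hypothetical), the code below reads the resident set size at the equivalent of point (*1) and again at point (*2):

```c
#include <stdio.h>
#include <string.h>

#define BUFSZ (1024 * 1024 * 10)   /* 10M ints, i.e. 40 MB with a 4-byte int */

static int mem[BUFSZ];              /* zero-initialised, so it lives in .bss */

/* Return a "Name:   123 kB" field of /proc/self/status in kB, or -1. */
static long status_kb(const char *field)
{
    char line[256];
    size_t n = strlen(field);
    long kb = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            sscanf(line + n + 1, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Write every element of mem[] and return how much VmRSS grew, in kB. */
long rss_growth_after_touch_kb(void)
{
    volatile int *p = mem;              /* volatile keeps the stores in the binary */
    long before = status_kb("VmRSS");   /* point (*1): nothing touched yet */

    for (size_t i = 0; i < BUFSZ; i++)
        p[i] = (int)i;
    return status_kb("VmRSS") - before; /* point (*2): the pages are resident */
}
```

Called from main(), it should report a growth close to 40960 kB; the exact figure can vary a little from run to run.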

Now, let's move on to the practical stage.
As the reader may already know, Linux has a special file system called procfs. We can get lots of kernel information from procfs, including memory information.
Try "cat /proc/meminfo".
You will see lots of information about memory. Let's ignore everything except 'MemTotal', 'MemFree', 'Buffers' and 'Cached'.
(The descriptions below are quoted from the documentation in the kernel source.)
-----------------------------------------------
MemTotal : Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
MemFree: The sum of LowFree + HighFree
LowFree: Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for the kernel's use for its own data structures.  Among many other things, it is where everything from the Slab is allocated.  Bad things happen when you're out of lowmem.
HighFree: Highmem is all memory above ~860MB of physical memory Highmem areas are for use by userspace programs, or for the pagecache.  The kernel must use tricks to access this memory, making it slower to access than lowmem.
Buffers: Relatively temporary storage for raw disk blocks shouldn't get tremendously large (20MB or so)
Cached: in-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached.
-----------------------------------------------
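A minimal sketch of reading those four fields programmatically, assuming the usual "Name:   value kB" line format of /proc/meminfo; the struct meminfo_kb and the function read_meminfo are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* The four /proc/meminfo fields discussed above, in kB (-1 if missing). */
struct meminfo_kb {
    long mem_total;
    long mem_free;
    long buffers;
    long cached;
};

/* Fill *out from /proc/meminfo. Returns 0 on success, -1 on failure. */
int read_meminfo(struct meminfo_kb *out)
{
    char line[256], name[64];
    long value;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    out->mem_total = out->mem_free = out->buffers = out->cached = -1;
    while (fgets(line, sizeof line, f)) {
        /* Each line looks like "MemTotal:        2048000 kB". */
        if (sscanf(line, "%63[^:]: %ld", name, &value) != 2)
            continue;
        if (strcmp(name, "MemTotal") == 0)
            out->mem_total = value;
        else if (strcmp(name, "MemFree") == 0)
            out->mem_free = value;
        else if (strcmp(name, "Buffers") == 0)
            out->buffers = value;
        else if (strcmp(name, "Cached") == 0)   /* exact match, so not SwapCached */
            out->cached = value;
    }
    fclose(f);
    return out->mem_total < 0 ? -1 : 0;
}
```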

Now we know the size of total memory, free memory, etc.
Type 'adb shell ps'.
You will see the VSIZE (henceforth VSS, virtual set size) and RSS (Resident Set Size) columns. VSS is the amount of virtual memory the process has mapped. RSS is the part of it that is REALLY located in physical memory, i.e. the pages that have actually been demanded.
As mentioned above, the main reason for the difference between VSS and RSS is 'demand paging'.
Now, let's sum the RSS of all processes. Interestingly, the sum turns out to be larger than the total memory size reported by 'meminfo'.
Why? Can you guess? Right: shared memory. For example, in Android some prelinked objects are shared among processes, and each process's RSS includes those shared pages in full, so they are counted once for every process that maps them. That's why the sum of RSS values is larger than the memory size.
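The experiment can be scripted. A minimal sketch, assuming every user-space process has a VmRSS line in /proc/<pid>/status (kernel threads have none and are skipped, as are processes that exit during the scan); sum_rss_kb is a hypothetical name. Its result can then be compared with MemTotal:

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Sum VmRSS (kB) over every process in /proc; *nproc_out gets the count. */
long sum_rss_kb(int *nproc_out)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    long total = 0;
    int nproc = 0;

    if (!proc)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        char path[300], line[256];
        long kb;
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                       /* not a PID directory */
        snprintf(path, sizeof path, "/proc/%s/status", de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;                       /* process already gone */
        while (fgets(line, sizeof line, f)) {
            if (sscanf(line, "VmRSS: %ld", &kb) == 1) {
                total += kb;                /* shared pages are counted again here */
                nproc++;
                break;
            }
        }
        fclose(f);
    }
    closedir(proc);
    if (nproc_out)
        *nproc_out = nproc;
    return total;
}
```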

To understand VSS more deeply, look at the following example.
Write an empty program, run it and check its VSS. For example:

#include <unistd.h>
int main(void) { sleep(1000); return 0; }

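Its VSS is over 1 MB! Why? Even an empty, dynamically linked program has the dynamic linker, the C library, a stack and a few other areas mapped into its address space, and every one of those mappings counts toward VSS whether or not it has been touched. On top of that, the kernel needs memory to manage the process itself, for example page tables and control blocks, although that kernel-side memory is not part of VSS. Page tables show how this bookkeeping adds up: a normal 32-bit machine uses 4 KB pages and a 4 GB virtual address space, so there are 4G/4K = 1M pages to keep track of, which definitely requires a certain amount of memory.
So even a very tiny process needs some (actually, not a small) amount of memory.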
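To see what actually makes up that VSS, you can list the process's mappings. A minimal sketch, assuming the standard "start-end perms offset dev inode [path]" line format of /proc/<pid>/maps; print_vmas_kb is a hypothetical name. Run from a trivial program, it typically shows the executable, the dynamic linker, libc, the stack and a few special mappings, and the total should roughly match the VmSize line of /proc/self/status:

```c
#include <stdio.h>

/* Print each VMA of the calling process with its size; return the total in kB. */
long print_vmas_kb(void)
{
    char line[4096];
    unsigned long start, end;
    long total_kb = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        /* only the "start-end" prefix is needed for the size */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        total_kb += (long)((end - start) / 1024);
        printf("%8lu kB  %s", (end - start) / 1024, line);
    }
    fclose(f);
    return total_kb;
}
```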

As mentioned above, RSS includes shared memory blocks. But we want a reasonable measure of the memory a process really occupies.
That is what PSS (Proportional Set Size) is for.

PSS = "non-shared process private memory" + for each shared page, "page size" / "number of processes that share that page".

Great! So the sum of the PSS values of all processes is a good measure of the memory those processes really occupy.
Then, how can we get it?
The most primitive way is to check

/proc/<PID>/smaps

You can easily find the Pss field in it. (For more details, see the kernel source file 'task_mmu.c'.)

smaps also shows the memory usage of each memory type for each virtual memory area (VMA).
So we can analyze the memory status in depth through smaps (usually focusing on private dirty).
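As a minimal sketch of that kind of analysis, the code below totals a few smaps fields over all VMAs of one process. It assumes the "Name:   value kB" field lines of smaps; struct smaps_totals and sum_smaps are hypothetical names (newer kernels, 4.14 and later, also offer /proc/<pid>/smaps_rollup with these totals precomputed):

```c
#include <stdio.h>
#include <string.h>

/* Totals over all VMAs of one process, in kB. */
struct smaps_totals { long rss, pss, private_dirty, shared_dirty; };

/* Sum selected fields of /proc/<pid>/smaps; pass "self" for the caller. */
int sum_smaps(const char *pid, struct smaps_totals *t)
{
    char path[64], line[512];
    long kb;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%s/smaps", pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    memset(t, 0, sizeof *t);
    while (fgets(line, sizeof line, f)) {
        /* "Pss:" must match up to the colon, so "Pss_Dirty:" on newer
         * kernels is not mistaken for it. */
        if (sscanf(line, "Rss: %ld", &kb) == 1)
            t->rss += kb;
        else if (sscanf(line, "Pss: %ld", &kb) == 1)
            t->pss += kb;
        else if (sscanf(line, "Private_Dirty: %ld", &kb) == 1)
            t->private_dirty += kb;
        else if (sscanf(line, "Shared_Dirty: %ld", &kb) == 1)
            t->shared_dirty += kb;
    }
    fclose(f);
    return 0;
}
```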

Let's dive deeper into the memory world.
We can easily guess and confirm the following:
- Local (stack), global, static and heap memory (usually allocated by malloc, new, etc.) are all private. They stay clean until something is written to them.
- Memory mapped with mmap() can be shared (MAP_SHARED), and it can be clean or dirty. But mapped memory is also allocated on demand. What does this mean? Assume that 4 processes share one anonymous mmapped area. At first, none of them has accessed the memory, so RSS/PSS for this area is '0'. After process 1 accesses the memory, process 1's RSS/PSS increases. When process 2 then accesses the same memory, process 2's RSS/PSS increases too. But now the memory is shared by two processes (1 and 2), so each page counts only half toward PSS: process 2's PSS grows by half of its RSS growth, and process 1's PSS drops to half as well.

Here is an example of this case.
The code for the test process looks like this:


#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <memory.h>

#define MEMSZ (1024 * 1024 * 10)
#define MAPSZ (1024 * 1024 * 6)

int
main(int argc, const char *argv[]) {
	char  pidbuf[32];
	int   fd, cnt, done;
	char *b, *map;

	b = malloc(MEMSZ);
	/* mapped before fork(), so all children share this area */
	map = (char *)mmap(NULL, MAPSZ, PROT_READ | PROT_WRITE,
			   MAP_ANON | MAP_SHARED, -1, 0);

	/* fork 3 children; each child leaves the loop right away */
	done = 0;
	cnt = 3;
	while (cnt--) {
		switch (fork()) {
		case -1:
			printf("ERR fork\n");
			exit(0);

		case 0: /* child */
			done = 1;
		}
		if (done)
			break;
	}

	while (1) {
		/* creating a file named "done" stops every process */
		if (0 <= (fd = open("done", O_RDONLY))) {
			close(fd);
			break;
		}

		/* creating "<pid>-mem" makes that process touch its malloc()ed block */
		snprintf(pidbuf, sizeof(pidbuf), "%d-mem", getpid());
		if (0 <= (fd = open(pidbuf, O_RDONLY))) {
			close(fd);
			unlink(pidbuf);
			/* access on demand */
			memset(b, 0, MEMSZ);
		}

		/* creating "<pid>-map" makes that process read half of the mapping */
		snprintf(pidbuf, sizeof(pidbuf), "%d-map", getpid());
		if (0 <= (fd = open(pidbuf, O_RDONLY))) {
			volatile char h = 0;	/* volatile: keep the reads */
			int sz;
			close(fd);
			unlink(pidbuf);
			sz = MAPSZ / 2;
			/* access MAPSZ / 2 -> read on demand */
			while (sz--)
				h ^= map[sz];
#ifdef WRITE_TEST
			sz = MAPSZ / 2;
			while (sz--)
				map[sz] ^= map[sz];
#endif
		}
		sleep(1);
	}
	return EXIT_SUCCESS;
}

The following are smaps reports for the mmapped memory area.

[ Original ]
40a93000-41093000 rw-s 00000000 00:07 4270       /dev/zero (deleted)
Size:               6144 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB


[After reading the mmapped area (triggered by creating the "-map" file)]
40a93000-41093000 rw-s 00000000 00:07 4270       /dev/zero (deleted)
Size:               6144 kB
Rss:                3072 kB
Pss:                3072 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      3072 kB
Private_Dirty:         0 kB
Referenced:         3072 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB

[After another child process forked from the same parent also reads the mmapped area (triggered by creating its "-map" file)]
40a93000-41093000 rw-s 00000000 00:07 4270       /dev/zero (deleted)
Size:               6144 kB
Rss:                3072 kB
Pss:                1536 kB
Shared_Clean:       3072 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:         3072 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB 

Pay attention to the change in the RSS/PSS sizes.
One more interesting point: the memory stays clean (now shared clean) because the operation is just a 'read'.
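The same numbers can be reproduced without the control files. A minimal sketch, assuming Linux and using the hypothetical helpers vma_rss_pss and shared_map_demo: it maps an anonymous MAP_SHARED area, reads it in the parent, lets a forked child read the same pages, and reads the parent's Rss/Pss for that VMA from /proc/self/smaps each time. Mirroring the reports above, Pss should equal Rss while only the parent has touched the area, and drop to about Rss / 2 once the child has too:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Rss and Pss (kB) of the VMA in /proc/self/smaps that contains addr. */
static int vma_rss_pss(const void *addr, long *rss, long *pss)
{
    char line[512];
    unsigned long start, end, a = (unsigned long)addr;
    int in_vma = 0;
    FILE *f = fopen("/proc/self/smaps", "r");

    if (!f)
        return -1;
    *rss = *pss = -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            in_vma = (start <= a && a < end);   /* header line of a VMA */
            continue;
        }
        if (in_vma) {   /* sscanf leaves the target untouched on a mismatch */
            sscanf(line, "Rss: %ld", rss);
            sscanf(line, "Pss: %ld", pss);
        }
    }
    fclose(f);
    return (*rss < 0 || *pss < 0) ? -1 : 0;
}

/* out[0], out[1]: parent's Rss/Pss after only the parent read the area.
 * out[2], out[3]: the same after a forked child read it as well. */
int shared_map_demo(size_t len, long out[4])
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    volatile char *map = p;
    int up[2], down[2], rc;
    char c = 0;
    pid_t pid;

    if (p == MAP_FAILED || pipe(up) != 0 || pipe(down) != 0)
        return -1;
    for (size_t i = 0; i < len; i += pg)
        c ^= map[i];                      /* read faults bring the pages in */
    if (vma_rss_pss(p, &out[0], &out[1]) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                       /* child */
        close(up[0]);
        close(down[1]);
        for (size_t i = 0; i < len; i += pg)
            c ^= map[i];                  /* map the same pages in the child */
        if (write(up[1], &c, 1) != 1)     /* tell the parent we are done */
            _exit(1);
        if (read(down[0], &c, 1) < 0)     /* returns 0 once the parent closes down[1] */
            _exit(1);
        _exit(0);
    }
    close(up[1]);
    close(down[0]);
    rc = (read(up[0], &c, 1) == 1) ? vma_rss_pss(p, &out[2], &out[3]) : -1;
    close(down[1]);                       /* lets the child exit */
    waitpid(pid, NULL, 0);
    close(up[0]);
    munmap(p, len);
    return rc;
}
```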

Then what happens if we mmap with MAP_PRIVATE instead of MAP_SHARED?
In this case, memory allocated by mmap is handled just like memory allocated by malloc.
So, with high probability, the two memory areas (the malloc()ed block and the mapping) are merged into one, and you may see one private memory area of about 16 MB.
The next topic is very interesting.
Let's try it with MAP_PRIVATE.

7f987c7aa000-7f987d7ab000 rw-p 00000000 00:00 0
Size:              16388 kB
Rss:                   4 kB
Pss:                   1 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

When the mmapped area is only read, no memory is allocated even though the read operations are executed. (This is different from the MAP_SHARED case.) Why? MAP_PRIVATE maps memory copy-on-write, so a read does not need a private copy of the page and nothing has to be allocated yet.
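A minimal sketch of the same read-then-write experiment without the control files (assumptions: Linux, RSS taken from the VmRSS line of /proc/self/status, and the hypothetical names status_kb and private_map_demo). Reading every page of a fresh MAP_PRIVATE | MAP_ANONYMOUS mapping should barely move RSS, while writing every page should grow it by roughly the mapping size:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return a "Name:   123 kB" field of /proc/self/status in kB, or -1. */
static long status_kb(const char *field)
{
    char line[256];
    size_t n = strlen(field);
    long kb = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            sscanf(line + n + 1, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* RSS growth (kB) after reading, then after writing, every page of a
 * fresh private anonymous mapping of len bytes. */
int private_map_demo(size_t len, long *read_growth, long *write_growth)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    volatile char *map = p;
    volatile char sink = 0;
    long base, after_read;

    if (p == MAP_FAILED)
        return -1;
    base = status_kb("VmRSS");
    for (size_t i = 0; i < len; i += pg)
        sink ^= map[i];             /* read: no private copy is needed yet */
    after_read = status_kb("VmRSS");
    for (size_t i = 0; i < len; i += pg)
        map[i] = 1;                 /* write: each page gets its own copy */
    *read_growth = after_read - base;
    *write_growth = status_kb("VmRSS") - after_read;
    munmap(p, len);
    return 0;
}
```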
Let's try enabling the 'WRITE_TEST' define switch.
7f001b786000-7f001c787000 rw-p 00000000 00:00 0 
Size:              16388 kB
Rss:                3076 kB
Pss:                3073 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:      3072 kB
Referenced:         3076 kB
Anonymous:          3076 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

As you can see, the memory is now allocated as 'Private Dirty'. As a next step, let's see what happens when its child writes to the mapped area.
[After another child process forked from the same parent writes to the mmapped area (triggered by creating its "-map" file)]
7f001b786000-7f001c787000 rw-p 00000000 00:00 0 
Size:              16388 kB
Rss:                3076 kB
Pss:                3073 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:      3072 kB
Referenced:         3076 kB
Anonymous:          3076 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
PSS and RSS are unchanged.
This is what we expected.

But there is an interesting case.
Let's look at the VMA information below.
b5a7f000-b5f7f000 -w-p 00000000 00:04 24098      xxxxxxxxxxxxxxx
Size:               5120 kB
Rss:                5120 kB
Pss:                1280 kB
Shared_Clean:          0 kB
Shared_Dirty:       5120 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:          5120 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB 
Even though the memory area is private and writable, RSS != PSS; that is, it is shared! A read-only private area can be shared (e.g. the code of a loaded shared library), but this is a writable private area! What happened? <= <TODO> I need to investigate this more!!!

The kernel uses the VM_MAYSHARE flag to decide whether to print 'p' or 's' (see task_mmu.c in the kernel).
I'm not sure whether VM_MAYSHARE is more meaningful information than VM_SHARED here.
But I have to analyze this case more deeply... (to be updated after more analysis...)
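One hypothesis worth testing (an assumption, not something the numbers above prove): after fork(), anonymous private pages that were already written are shared copy-on-write by the parent and its children until one of them writes again, so they belong to a private ('p') VMA yet are mapped by several processes. The report above would fit four processes mapping those pages (5120 / 1280 = 4). The sketch below checks the idea on Linux; vma_stats and cow_fork_demo are hypothetical names:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Rss, Pss and Shared_Dirty (kB) of the smaps VMA containing addr. A
 * neighbouring anonymous VMA may be merged in, adding a few pages. */
static int vma_stats(const void *addr, long *rss, long *pss, long *shdirty)
{
    char line[512];
    unsigned long start, end, a = (unsigned long)addr;
    int in_vma = 0;
    FILE *f = fopen("/proc/self/smaps", "r");

    if (!f)
        return -1;
    *rss = *pss = *shdirty = -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            in_vma = (start <= a && a < end);   /* header line of a VMA */
            continue;
        }
        if (in_vma) {
            sscanf(line, "Rss: %ld", rss);
            sscanf(line, "Pss: %ld", pss);
            sscanf(line, "Shared_Dirty: %ld", shdirty);
        }
    }
    fclose(f);
    return (*rss < 0 || *pss < 0 || *shdirty < 0) ? -1 : 0;
}

/* Dirty a private anonymous mapping, fork nchild (<= 16) idle children,
 * and read the parent's smaps numbers while the children are alive. */
int cow_fork_demo(size_t len, int nchild, long *rss, long *pss, long *shdirty)
{
    pid_t pids[16];
    int fds[2], rc;
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (map == MAP_FAILED || nchild > 16 || pipe(fds) != 0)
        return -1;
    memset(map, 0xab, len);              /* parent alone: Private_Dirty */
    for (int i = 0; i < nchild; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {              /* child: wait for EOF, then exit */
            char c;
            close(fds[1]);
            while (read(fds[0], &c, 1) > 0)
                ;
            _exit(0);
        }
    }
    /* fork() shared every dirty page copy-on-write: nchild + 1 mappers */
    rc = vma_stats(map, rss, pss, shdirty);
    close(fds[0]);
    close(fds[1]);                       /* last write end closed: children exit */
    for (int i = 0; i < nchild; i++)
        if (pids[i] > 0)
            waitpid(pids[i], NULL, 0);
    munmap(map, len);
    return rc;
}
```

If the hypothesis holds, a 4 MB area with three children should show about 4096 kB of Shared_Dirty and a Pss of about a quarter of its Rss.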

The next case is ashmem.
Memory mapped with MAP_PRIVATE can be shared in the case of ashmem.
You can test this with the code below on Android (NDK).

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <memory.h>
#include <sys/ioctl.h>
#include <linux/limits.h>
#include <linux/ioctl.h>


#define ASHMEM_NAME_LEN		256
#define ASHMEM_NAME_DEF		"dev/ashmem"

/* Return values from ASHMEM_PIN: Was the mapping purged while unpinned? */
#define ASHMEM_NOT_REAPED	0
#define ASHMEM_WAS_REAPED	1

/* Return values from ASHMEM_UNPIN: Is the mapping now pinned or unpinned? */
#define ASHMEM_NOW_UNPINNED	0
#define ASHMEM_NOW_PINNED	1

#define __ASHMEMIOC		0x77

#define ASHMEM_SET_NAME		_IOW(__ASHMEMIOC, 1, char[ASHMEM_NAME_LEN])
#define ASHMEM_GET_NAME		_IOR(__ASHMEMIOC, 2, char[ASHMEM_NAME_LEN])
#define ASHMEM_SET_SIZE		_IOW(__ASHMEMIOC, 3, size_t)
#define ASHMEM_GET_SIZE		_IO(__ASHMEMIOC, 4)
#define ASHMEM_SET_PROT_MASK	_IOW(__ASHMEMIOC, 5, unsigned long)
#define ASHMEM_GET_PROT_MASK	_IO(__ASHMEMIOC, 6)
#define ASHMEM_PIN		_IO(__ASHMEMIOC, 7)
#define ASHMEM_UNPIN		_IO(__ASHMEMIOC, 8)
#define ASHMEM_ISPINNED		_IO(__ASHMEMIOC, 9)
#define ASHMEM_PURGE_ALL_CACHES	_IO(__ASHMEMIOC, 10)

#define MEMSZ 1024 * 1024 * 10
#define MAPSZ 1024 * 1024 * 6

int
main(int argc, const char *argv[]) {
	char  pidbuf[32];
	int   fd, cnt, done, sz;
	char *b, *map;

	b = malloc(MEMSZ);

	fd = open("/dev/ashmem", O_RDWR);
	if (fd < 0) {
		printf("Fail open ashmem\n");
		return -1;
	}

	if (0 > ioctl(fd, ASHMEM_SET_NAME, "yhc-test-mem")) {
		printf("Fail set ashmem name\n");
		return -1;
	}

	if (0 > ioctl(fd, ASHMEM_SET_SIZE, (size_t)MAPSZ)) {
		printf("Fail set ashmem size\n");
		return -1;
	}

	map = (char *)mmap(NULL,
			   MAPSZ,
			   PROT_NONE,
			   MAP_PRIVATE,
			   fd,
			   0);
	if (MAP_FAILED == map) {
		printf("Map failed\n");
		return -1;
	}

	close(fd);

	/* make the first half writable and demand its pages */
	mprotect(map, MAPSZ / 2, PROT_WRITE);
	sz = MAPSZ / 2;
	/* write MAPSZ / 2 bytes -> pages are allocated on demand */
	while (sz--)
		map[sz] = 0xff;

	done = 0;
	cnt = 3;
	while (cnt--) {
		switch (fork()) {
		case -1:
			printf("ERR fork\n");
			exit(0);

		case 0: /* child */
			done = 1;
		}
		if (done)
			break;
	}

	while (1) {
		if (0 <= (fd = open("done", O_RDONLY))) {
			close(fd);
			break;
		}

		snprintf(pidbuf, sizeof(pidbuf), "%d-mem", getpid());
		if (0 <= (fd = open(pidbuf, O_RDONLY))) {
			close(fd);
			unlink(pidbuf);
			/* access on demand */
			memset(b, 0, MEMSZ);
		}
		sleep(1);
	}
	return EXIT_SUCCESS;
} 

You can see that PSS != RSS in the ashmem memory area, i.e. it is shared among the forked child processes.

And there is another interesting point.
We know that static/global memory is private, like memory allocated by malloc. Interestingly, in the test below the static buffer does not show up in VSS until the code actually uses it, while memory allocated by malloc shows up immediately.
You can easily test this with the following code snippet.

#include <stdlib.h>
#include <unistd.h>

#define MALLOCSZ (10 * 1024 * 1024)
#define BSSSZ    (3 * 1024 * 1024)

static char sbuf[BSSSZ];

int
main(int argc, const char *argv[]) {
	char *buf;
	//sbuf[0] = 1; <--- (A)
	buf = malloc(MALLOCSZ);
	while (1) { sleep(10); }
	return 0;
}

Without line (A), VSS does not include the size of sbuf.
After enabling line (A), VSS increases by sizeof(sbuf).
On the other hand, the memory allocated by malloc is included in VSS even though it has NOT been demanded yet.
Interesting, isn't it? Note, though, that the sbuf case is most likely the compiler's doing rather than the kernel's: when sbuf is never referenced, the compiler is free to drop it, so it never reaches the .bss section. Once it is referenced, the ELF loader maps the whole .bss when the program starts, so it counts toward VSS before any of its pages are touched, just like the malloc()ed block.
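The malloc() half of this observation is easy to measure. A minimal sketch, assuming an allocator that serves a request this large with its own mmap() (as glibc does by default) and the VmSize/VmRSS lines of /proc/self/status; status_kb and malloc_growth_kb are hypothetical names:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a "Name:   123 kB" field of /proc/self/status in kB, or -1. */
static long status_kb(const char *field)
{
    char line[256];
    size_t n = strlen(field);
    long kb = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            sscanf(line + n + 1, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

static void *volatile keep;     /* global sink so the malloc() is not optimised away */

/* malloc() size bytes without touching them; report VSS and RSS growth in kB. */
void malloc_growth_kb(size_t size, long *vss_growth, long *rss_growth)
{
    long vss0 = status_kb("VmSize");
    long rss0 = status_kb("VmRSS");

    keep = malloc(size);
    *vss_growth = status_kb("VmSize") - vss0;
    *rss_growth = status_kb("VmRSS") - rss0;
}
```

For a 10 MB request, VSS should grow by at least 10240 kB while RSS grows by only a page or so (the allocator's own bookkeeping).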

Now it's time to dive into one of the deepest parts of memory: the page.
Every process has its own page table, and it holds everything about the process's memory.
The Linux kernel exposes a lot of useful information about memory pages through the proc file system.
smaps is one example.
At this step, I would like to talk about /proc/<pid>/maps, /proc/<pid>/pagemap, /proc/kpagecount and /proc/kpageflags.
To learn about a process's memory, you need to know about the memory pages the process uses.
But many entries in the page table have no real mapping yet.
So instead of scanning the whole page table, which is a waste of time, we can start from smaps.
maps shows the VMAs (a subset of the data shown by smaps), and those are the ranges we are interested in.
Now we know which virtual addresses are valid in the process.
The next step is finding the corresponding pages; pagemap gives us this information.
Each pagemap entry is a 64-bit value that carries the following information (from pagemap.txt in the kernel documentation):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bits 55-60 page shift (page size = 1<<page shift)
    * Bit  61    reserved for future use
    * Bit  62    page swapped
    * Bit  63    page present

The most important value here is the PFN. The PFN is used as the index into kpagecount and kpageflags.
The kernel document pagemap.txt says the following (as of kernel 3.4):

 * /proc/kpagecount.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

 * /proc/kpageflags.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from fs/proc/page.c, above kpageflags_read):

     0. LOCKED
     1. ERROR
     2. REFERENCED
     3. UPTODATE
     4. DIRTY
     5. LRU
     6. ACTIVE
     7. SLAB
     8. WRITEBACK
     9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP

Finally, we know a lot of valuable information about each page. By combining it, we can derive meaningful numbers, e.g. USS, RSS, PSS, VSS, swap, etc.
For details, refer to the kernel documents (proc.txt and pagemap.txt) and the source code.
IMPORTANT NOTE
Check the kernel source code: you will easily notice that the flag information in pagemap.txt is out of date.
Telling the difference between the source and the document is left to the reader. :-)

This mechanism is exactly what the procrank tool in Android uses.
Android 4.3 and earlier have a bug in libpagemap.so, so procrank does not display VSS correctly.
The following code snippet is from libpagemap.so in Android 4.3.

int pm_map_usage(pm_map_t *map, pm_memusage_t *usage_out) {
    uint64_t *pagemap;
    size_t len, i;
    uint64_t count;
    pm_memusage_t usage;
    int error;

    if (!map || !usage_out)
        return -1;

    error = pm_map_pagemap(map, &pagemap, &len);
    if (error) return error;

    pm_memusage_zero(&usage);

    for (i = 0; i < len; i++) {
        // ----- line (A) -----
        if (!PM_PAGEMAP_PRESENT(pagemap[i]) ||
            PM_PAGEMAP_SWAPPED(pagemap[i]))
            continue;

        error = pm_kernel_count(map->proc->ker, PM_PAGEMAP_PFN(pagemap[i]),
                                &count);
        if (error) goto out;

        usage.vss += map->proc->ker->pagesize; // ----- line (B) -----
        usage.rss += (count >= 1) ? (map->proc->ker->pagesize) : (0);
        usage.pss += (count >= 1) ? (map->proc->ker->pagesize / count) : (0);
        usage.uss += (count == 1) ? (map->proc->ker->pagesize) : (0);
    }

    memcpy(usage_out, &usage, sizeof(usage));

    error = 0;

out:    
    free(pagemap);

    return error;
}

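As you can see, a page whose map count == 1 is counted in USS, and the code for RSS and PSS is also easy to understand.
But for VSS, line (B) should be moved to line (A): every page of the mapping belongs to the virtual size whether or not it is present, while at line (B) only present pages are counted, which makes VSS equal to RSS. I expect this bug to be fixed soon.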
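For comparison, here is a minimal sketch of the same loop with the VSS accumulation moved to line (A). It is not the libpagemap API: struct usage, vma_usage and passing the kpagecount values in a plain array are hypothetical simplifications, and the PM_* masks follow the pagemap bit layout quoted earlier.

```c
#include <stddef.h>
#include <stdint.h>

#define PM_SWAPPED (UINT64_C(1) << 62)
#define PM_PRESENT (UINT64_C(1) << 63)

struct usage { uint64_t vss, rss, pss, uss; };

/* pagemap[i]: pagemap entry of page i of a VMA.
 * mapcount[i]: kpagecount value for that page's PFN (ignored if not present). */
struct usage vma_usage(const uint64_t *pagemap, const uint64_t *mapcount,
                       size_t npages, uint64_t pagesize)
{
    struct usage u = {0, 0, 0, 0};

    for (size_t i = 0; i < npages; i++) {
        u.vss += pagesize;                  /* line (A): every virtual page */
        if (!(pagemap[i] & PM_PRESENT) || (pagemap[i] & PM_SWAPPED))
            continue;
        if (mapcount[i] >= 1) {
            u.rss += pagesize;
            u.pss += pagesize / mapcount[i];
        }
        if (mapcount[i] == 1)
            u.uss += pagesize;              /* mapped by this process only */
    }
    return u;
}
```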
<--- to be continued...

