In multi-threaded (henceforth MT) software programming, synchronization is one of the most important issues.
If you think about it, synchronization is an issue because the basic premise of the programming language is "synchronize where needed."
I agree with this policy 100% - even more so if the goal of MT is to use HW resources efficiently.
But suppose there is some MT software in which a large portion of state is shared between threads (let's set aside objections like "such software is badly designed in the first place"). What then?
In that case, under the "synchronize where needed" concept, there are simply too many "needed" places, so it is not easy.
In such a case, how about changing the concept to "reschedule where needed" instead?
Setting all other aspects aside, doesn't programming seem like it would become easier?
Structuring software to support this concept is not that difficult.
Of course, making it use HW resources effectively will be considerably harder than with the previous concept.
In the end, the following trade-off arises.

"Efficient use of HW resources vs. simplicity of programming"

You may criticize this as a brute-force approach, but if performance is not a big concern and asynchronous behavior itself is what matters, isn't it worth considering? At the very least, it can reduce bugs significantly...
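To make the idea concrete, here is a minimal sketch (mine, not from the original text) of "reschedule where needed": all shared state is touched only from one run-to-completion scheduler loop, so no per-datum locking is needed. The task queue and names are illustrative assumptions.

/* Sketch only: the fixed-size task queue has no overflow check. */
#include <stdio.h>

#define MAX_TASKS 16

typedef void (*task_fn)(void);

static task_fn _taskq[MAX_TASKS];
static int _head, _tail;

/* Instead of locking shared data at every touch point, work that
 * needs it is queued and later run serialized by the loop. */
static void schedule(task_fn fn) {
    _taskq[_tail] = fn;
    _tail = (_tail + 1) % MAX_TASKS;
}

static int _shared_counter; /* shared state: only the loop touches it */

static void bump(void) { _shared_counter++; }

int main(void) {
    schedule(bump);
    schedule(bump);
    /* run-to-completion: tasks never preempt each other */
    while (_head != _tail) {
        task_fn fn = _taskq[_head];
        _head = (_head + 1) % MAX_TASKS;
        fn();
    }
    printf("counter = %d\n", _shared_counter);
    return 0;
}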

What are stdout, stdin and stderr?
Actually, I have been using these without a deep understanding.
Now it's time to dive deep into them.

Let me focus on stdout (the other two are similar).
What type of file is the default stdout? Is it a regular file? Definitely not - it's a device file.
So two different processes can write data to the same stdout device directly.
(A regular file would not behave well under this!)
I can easily find out which device is used as stdout by checking the device files
(by entering the following command in a console).

ls -al /dev | grep stdout
    -> stdout -> /proc/self/fd/1

Interesting, isn't it?
Let me move forward.

ls -al /proc/self/fd/1
    -> 1 -> /dev/pts/7

Right!
Even though every process accesses its standard output via the same name 'stdout', the name branches to the appropriate device per process through a symbolic link - this is what '/proc/self' is used for.
Now I can guess what redirecting stdout is. Let me check it.

# redirecting stdout with a sample program
# something like this:
# main() { int i; for (i = 0; i < 10000000; i++) { sleep(1); printf("+"); fflush(stdout); } }
$ ./a.out > t &
$ ls -al /proc/<child pid>/fd/1
    -> 1 -> /xxxx/xxx.../t <= path of target file 't'

Done!
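For reference, a compilable version of the sample program sketched in the comments above might look like this (my reconstruction, not the original source):

#include <stdio.h>
#include <unistd.h>

int
main(void) {
	int i;
	for (i = 0; i < 10000000; i++) {
		sleep(1);       /* keep the process alive for inspection */
		printf("+");
		fflush(stdout); /* push each '+' out through fd 1 */
	}
	return 0;
}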

Very simple... just writing it down here to remind myself.

=== ARM(32bit) - RVCT ===
/* #pragma O0 <- this may be required, since the value can be optimized out */
{ /* Just Scope */
    void* ra;
    /* register lr(r14) has return address */
    __asm
    { mov ra, lr }
    /* now variable 'ra' has return address */
}

=== x86(32bit) - GCC ===
{ /* Just Scope */
    register void* ra; /* return address */
    /* return address is stored at 4byte above from 'ebp' */
    asm ("movl 4(%%ebp), %0;"
         :"=r"(ra));
    /* now variable 'ra' has return address */
}
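As a side note, GCC (and Clang) also provide a builtin that avoids inline assembly and frame-pointer assumptions entirely; a minimal example:

#include <stdio.h>

/* __builtin_return_address(0) yields the return address of the
 * current function; no frame-pointer layout knowledge is needed. */
void
who_called_me(void) {
	void *ra = __builtin_return_address(0);
	printf("return address: %p\n", ra);
}

int
main(void) {
	who_called_me();
	return 0;
}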

Signals are used for interaction between user-mode processes (henceforth UMP) and for the kernel to notify processes of system events.
There is plenty of material available on what a signal is, so let's skip that.
The point of this article is: how is a user signal handler (henceforth USH) executed in Linux?

The core of what the Linux kernel does to deliver a signal is modifying the stack of the UMP - usually by adding data.
This is very important! The UMP's stack itself is changed!
The kernel changes the UMP's stack and register values as if the USH were called from a specific function - let's call it F.
(For example, the PC is set to the USH, and the return address on the stack is set to function F.)
And usually F is just a system call - sigreturn. In this system call, the kernel restores the UMP's stack to its original values.
Here is the simplified flow.

Signal is issued -> kernel changes UMP's stack -> USH is executed -> return to function F -> system call (sigreturn) -> UMP's stack is restored -> UMP continues normal execution.

In a multi-threaded process, the thread's stack is changed. Nothing else is different.
Understood? Then what is the point?
Yes: the USH runs in user mode, in the context of the process/thread the signal was delivered to.
Let's look at the following code.

/* Timer is used for example */
static pthread_mutex_t _m;
...
static void
_signal_handler(int sig, siginfo_t* si, void* uc) {
    pthread_mutex_lock(&_m);
    ...
    pthread_mutex_unlock(&_m);
}

int main(...) {
    ... /* signal is requested (ex timer) somewhere here */
    pthread_mutex_lock(&_m);
    ... /* <--- *a */
    pthread_mutex_unlock(&_m);
    ...
    return 0;
}

Can you imagine what I am going to talk about?
As mentioned above, the signal handler runs in the context of the thread the signal was delivered to. So if the signal is delivered at (*a), the program gets stuck in a deadlock!
So the signal handler of the code above should look like the following.

static void*
_signal_handler_thread(void* arg) {
    pthread_mutex_lock(&_m);
    ...
    pthread_mutex_unlock(&_m);
    return NULL;
}

static void
_signal_handler(int sig, siginfo_t* si, void* uc) {
    pthread_t thd;
    /* hand the real work off to a normal thread, so the handler
       itself never touches the mutex */
    pthread_create(&thd, NULL, &_signal_handler_thread, NULL);
    pthread_detach(thd); /* avoid leaking the thread's resources */
}

Done!
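For completeness, here is a minimal sketch (mine, not from the original post) of how such a handler might be wired up with an interval timer. SIGALRM and the one-second period are arbitrary choices, and note that POSIX does not list pthread_create among the async-signal-safe functions, so treat this as an illustration of the flow rather than a hardened pattern.

#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>
#include <pthread.h>

static pthread_mutex_t _m = PTHREAD_MUTEX_INITIALIZER;

static void *
_signal_handler_thread(void *arg) {
	pthread_mutex_lock(&_m);
	/* real work protected by the mutex goes here */
	pthread_mutex_unlock(&_m);
	return NULL;
}

static void
_signal_handler(int sig, siginfo_t *si, void *uc) {
	pthread_t thd;
	pthread_create(&thd, NULL, &_signal_handler_thread, NULL);
	pthread_detach(thd);
}

int
main(void) {
	struct sigaction sa;
	struct itimerval tv;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = _signal_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGALRM, &sa, NULL); /* install the handler */

	memset(&tv, 0, sizeof(tv));
	tv.it_interval.tv_sec = 1; /* fire every second */
	tv.it_value.tv_sec = 1;
	setitimer(ITIMER_REAL, &tv, NULL);

	for (;;)
		pause(); /* wait for signals; build with -pthread */
	return 0;
}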

Term definitions:

Reboot : a real reboot. The system restarts from the 'boot logo'.
Soft reboot : the system restarts from the 'power-on animation' (NOT from the 'boot logo').

In user mode

On Dalvik

ANR (Application Not Responding)

* Event dispatching timeout : 5 sec.
=> A window fails to dispatch events for more than 5 sec.
* Broadcast timeout : 10 sec.
=> A broadcast receiver fails to complete its work within 10 sec.
* Service timeout : 20 sec.
=> A service fails to finish its job within 20 sec.

Hang-up

* There is no UI response, even though the window dispatched the events.
* The framework loses track of the window the events should be passed to. (It is not that this never happens.)
=> Note! : We should wait more than 5 sec. to tell whether it's a hang-up or an ANR. (See 'Event dispatching timeout'.)

FC (Force Close)

* An unexpected stop. Ex. a null object access, an uncaught exception.

Return from the 'main' function

* This is a normal end of the process if it is intentional. But in some cases we may come across an unexpected return from 'main'. From the system's point of view it is just a normal process exit, so no error message is left.

In native code

Exception (ex. data abort, etc.)

* The kernel sends an appropriate signal (ex. SIGSEGV) to the user process, and the user process may be killed. In this case the process is killed due to a signal from the kernel, so we cannot expect UI feedback, only low-level logs - ex. a tombstone.

In privileged mode

Exception : Reboot. (A kernel 'oops' is executed before the reboot, so we may be able to find the corresponding kernel log/panic data.)

Other important and notable cases.

system_process :

* An unexpected stop of this process may cause a system soft reboot. (This should be distinguished from a reboot caused by a kernel exception.)

home / launcher

* An unexpected stop causes a one-time flicker of the home screen while home restarts. Imagine the following scenario:
-> Home enters an unexpectedly busy state (it cannot dispatch user input during this time). After 3~4 sec. an exception occurs and home is stopped -> restarted.
In this case, it may look like just a 'smooth reset'!

* Before we begin: everything that follows is my personal opinion, without any statistical or scientific backing.

The technical knowledge required in the SW field divides broadly into Domain Knowledge (DK) and Programming Skill (PS).
And most programming is the work of concretizing and implementing DK.

The relative weight of the two is determined by the field of work.
For example, when implementing a device driver, DK about the device matters far more than PS.
On the other hand, for UI implementation, PS is more important than DK.
Therefore, a good SW engineer needs to maintain the balance of PS and DK that their field demands.
(Personally, I think the importance of PS grows as the code size grows.)

Now let's approach DK and PS from the standpoint of 'measurement'.
With high probability, DK correlates with years of experience: the more experience in a field, the deeper the DK.
PS, on the other hand, correlates with experience only up to a point (2~3 years); beyond that, it is mostly unrelated to experience.
Depending on an engineer's attitude, effort and passion, PS levels vary enormously.
In my experience, it is not uncommon to see a 10-year engineer whose PS remains at the level of a new graduate who majored in computer science.
Yet no credible, recognized way to measure PS is known.
It is also very hard to measure PS with a short Q&A or a standardized test.
So companies select people by DK, which is easy to measure, rather than by PS, which is hard to measure, and this gives birth to the flawed metric 'experience' = 'skill'.

Of course, in fields where DK matters most, this way of measuring is quite reasonable.
But looking at recent SW, it is worth noting that code bases are incomparably larger than they used to be.
As described above, "the code base is large" is closely related to "excellent PS is required."
In other words, given the nature of today's SW industry, the demand for PS grows faster than the demand for DK.
Nevertheless, as I see it, the domestic SW industry largely overlooks this fact.
The logic of 'experience' = 'skill' - which puts heavy weight on DK - is still very strong.
And while it is true that PS is hard to measure, little effort is made to improve on that either.
In this environment, SW engineers neglect the effort to improve their PS.
(Of course, they make no special effort to improve their DK either. There is no need, since years of experience are assumed to speak for DK...)
Now that this situation has persisted for a while, decision-makers too have come to be filled with people whose PS is lacking.
Accordingly, they also lack an understanding of why PS matters.
Now, look at the following cycle.

An environment that neglects PS => engineers lack motivation to improve their PS => engineers' PS stagnates => engineers with stagnant PS become decision-makers => the environment that neglects PS is reinforced.

This is how I see Korean SW.

When domestic companies hire experienced engineers, the question heard most often is "which field have you worked in, and for how long?" - that is, a question about domain.
Through this, interviewers try to gauge the candidate's DK in a specific domain, and it carries considerable weight in the interview score.
Interviewers then devote substantial time to detailed questions about DK.
Yet at most domestic companies, I have never heard of interviewers conducting an in-depth interview with a candidate about PS.
A lack of awareness of the importance of PS is probably one of the biggest reasons.
And another important reason is that they (the interviewers) themselves are not at a level where they can carry on an in-depth conversation about PS with a candidate and evaluate the candidate's skill.
This is the reality.
People who strive to improve their PS, discuss programming philosophy, and talk about Unix culture are not valued accordingly in today's Korean SW job market.
Let me say it again: this is the reality.

I mean nothing special by this.
As one SW engineer, I have simply stated my subjective view of the Korean SW industry.

This is a very well-known tip for cyclic queues!
Here is pseudo code!
(Even though this is very well-known and not difficult to think of, it's worth recalling.)

TYPE Q
    item[SZ] // queue array
    i        // current index
...

FUNC addQ (q, x)
// The naive way.
    q.item[q.i] = x
    q.i = q.i + 1
    if (q.i >= SZ) then q.i = 0

// The well-known way.
    q.item[q.i] = x
    q.i = (q.i + 1) & (SZ - 1) // For this, SZ must be a power of two (2^n, n > 0)
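In C, the mask trick might look like the following minimal sketch (names and sizes are mine; overflow handling is omitted for brevity):

#include <stdio.h>

#define SZ 8 /* must be a power of two for the mask trick */

struct q {
	int      item[SZ];
	unsigned i; /* current index */
};

/* Branch-free wrap-around: works because SZ is 2^n, so (SZ - 1)
 * is a mask of n one-bits. */
static void
addq(struct q *q, int x) {
	q->item[q->i] = x;
	q->i = (q->i + 1) & (SZ - 1);
}

int
main(void) {
	struct q q = { {0}, 0 };
	int k;
	for (k = 0; k < 10; k++)
		addq(&q, k); /* index wraps 0..7, then 0, 1 */
	printf("next index: %u\n", q.i); /* prints 2 (10 & 7) */
	return 0;
}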

Here is an example.

sizeof("12345") == 6

What does this mean? The type of a hard-coded string (like "12345") is char[] - an array, not a pointer - so sizeof yields the array size, including the terminating '\0'.
It's important and interesting! I hadn't known this for several years! Hmm...
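A small demonstration of the difference (mine, for illustration):

#include <stdio.h>
#include <string.h>

int
main(void) {
	const char *p = "12345";
	printf("%zu\n", sizeof("12345")); /* 6: char[6] including '\0' */
	printf("%zu\n", strlen("12345")); /* 5: characters only */
	printf("%zu\n", sizeof(p));       /* size of a pointer, NOT 6 */
	return 0;
}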

Here is a good article about this!

http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

First of all, I am going to avoid approaches that rely on the popular tools used on desktop Linux, because in an embedded environment (embedded Linux like Android) we cannot expect them to be available.
Before talking about memory analysis, let's look over the fundamental concepts related to memory.
(This article is written based on Kernel 2.6.29.)

There are 4 types of memory:
private clean, private dirty, shared clean and shared dirty.
* Clean vs. Dirty
Clean means "it does not affect the system, semantically speaking." So we can discard it at any time. Usually mmap()ed or not-yet-written memory qualifies.
Dirty is exactly the opposite.
* Private vs. Shared
This is trivial.

Here is an example in Android:
Shared clean : common dex files.
Private clean : application-specific dex files.
Shared dirty : library "live" dex structures (ex. Class objects), the shared copy-on-write heap. - That's why 'Zygote' exists.
Private dirty : application "live" dex structures, the application heap.

Usually, clean memory is not an interesting subject.
Most memory analysis focuses on dirty memory, especially private dirty.
(Shared dirty is also important in some cases.)

Linux uses Virtual Memory (henceforth VM). I assume the reader is already familiar with this, so let's move one step forward. Usually, 'demand paging' is used: with demand paging, Linux doesn't consume RAM for a page until the page is actually requested. What does this mean exactly? Look at the code below.

#define _BUFSZ (1024*1024*10)
static int _mem[_BUFSZ];
int main (int argc, char* argv[]) {
    int i;
    /* --- (*1) --- */
    for(i=0; i<_BUFSZ; i++) {
        _mem[i] = i;
    }
    /* --- (*2) --- */
}

As you can see, sizeof(_mem) is sizeof(int)*10*1024*1024 = 40MB (assuming sizeof(int) == 4).
But at (*1), _mem has not REALLY been requested yet, so Linux doesn't allocate pages in RAM. At (*2), _mem has been requested, so pages for _mem are in RAM.
OK? Later, we will confirm this from the kernel.

Now, let's go to the practical stage.
As the reader may already know, there is a special file system in Linux - called procfs. We can get lots of kernel information from procfs, including memory information.
Try "cat /proc/meminfo".
You will see lots of information about memory. Let's ignore everything except 'MemTotal', 'MemFree', 'Buffers' and 'Cached'.
(The descriptions below are quoted from the documentation in the kernel source.)
-----------------------------------------------
MemTotal : Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
MemFree: The sum of LowFree + HighFree
LowFree: Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for the kernel's use for its own data structures.  Among many other things, it is where everything from the Slab is allocated.  Bad things happen when you're out of lowmem.
HighFree: Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by userspace programs, or for the pagecache. The kernel must use tricks to access this memory, making it slower to access than lowmem.
Buffers: Relatively temporary storage for raw disk blocks; shouldn't get tremendously large (20MB or so).
Cached: in-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached.
-----------------------------------------------

Now we know the total memory size, the free memory size, etc.
Type 'adb shell ps'.
We can see the VSIZE (henceforth VSS) and RSS (Resident Set Size) columns. VSS is the amount of memory the process requires; RSS is the amount of memory REALLY located in physical memory - the demanded pages!
As mentioned above, the main reason for the difference between VSS and RSS is 'demand paging'.
Now, let's sum all the RSSs. Interestingly, the sum of RSSs is larger than the total memory size from 'meminfo'.
Why? Can you guess? Right - due to shared memory. For example, in Android there are some prelinked objects, and those are shared among processes. A process's RSS includes those shared pages. That's why the sum of RSSs is larger than the total memory size.

To understand VSS more deeply, see the following example.
Make an empty program, execute it, and check its VSS. For example:

#include <unistd.h>
int main(void) { sleep(1000); return 0; }

Its VSS is over 1MB! Why? The kernel reserves memory blocks to manage the process itself - for example, the page table and control blocks. Take the page table as an example: a normal 32-bit machine uses 4KB pages and 4GB of virtual memory, so the number of pages is 4G/4K = 1M. To keep track of 1M pages, a certain amount of memory is definitely required.
So at least some - actually, not so small an - amount of memory is required even for a very tiny process.

As mentioned above, RSS includes shared memory blocks. But we want a reasonable measure of the memory that is really occupied per process.
Here is PSS (Proportional Set Size).

PSS = "non-shared process-private memory" + "shared memory" / "number of processes sharing it".

Great! For example, a process with 100KB of private memory that shares 300KB equally with two other processes has a PSS of 100 + 300/3 = 200KB. So the sum of PSSs is the real size of occupied memory.
Then, how can we get it?
The most primitive way is to check

/proc/<PID>/smaps

You can easily find the Pss field in it. (For more details, see the kernel source file 'task_mmu.c'.)

smaps also shows the memory usage of each memory type for each Virtual Memory Area (VMA).
So we can analyze memory status in depth through smaps (usually focusing on private dirty).

Let's dive deeper into the memory world.
We can easily guess and verify the following:
- Stack, global/static and heap memory (usually allocated by malloc, new, etc.) are all private. And they are clean until something is written to them.
- Memory obtained by mmap is shared and can be clean or dirty. But mapped memory is also on-demand. What does this mean? Assume that 4 processes share one anonymous mmap()ed area. At first, none of them accesses the memory, so RSS/PSS for this area is 0. After process 1 accesses the memory, process 1's RSS/PSS increases. Then, when process 2 accesses the memory, process 2's RSS/PSS increases as well. But now the memory is shared by two processes (1 and 2), so the amount of increase in terms of PSS is half the amount of increase in terms of RSS.

Here is an example of this case.
The code for the test process looks like this:


#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <memory.h>

#define MEMSZ 1024 * 1024 * 10
#define MAPSZ 1024 * 1024 * 6

int
main(int argc, const char *argv[]) {
	char  pidbuf[32];
	int   fd, cnt, done;
	char *b, *map;

	b = malloc(MEMSZ);
	map = (char *)mmap(NULL,
			   MAPSZ,
			   PROT_READ | PROT_WRITE,
			   MAP_ANON | MAP_SHARED,
			   -1,
			   0);
	done = 0;
	cnt = 3;
	while (cnt--) {
		switch (fork()) {
		case -1:
			printf("ERR fork\n");
			exit(0);
		case 0: /* child */
			done = 1;
		}
		if (done)
			break;
	}

	while (1) {
		if (0 <= open("done", O_RDONLY))
			break;

		snprintf(pidbuf, sizeof(pidbuf), "%d-mem", getpid());
		if (0 <= (fd = open(pidbuf, O_RDONLY))) {
			close(fd);
			unlink(pidbuf);
			/* access on demand */
			memset(b, 0, MEMSZ);
		}

		snprintf(pidbuf, sizeof(pidbuf), "%d-map", getpid());
		if (0 <= (fd = open(pidbuf, O_RDONLY))) {
			char h;
			int  sz;
			close(fd);
			unlink(pidbuf);
			sz = MAPSZ / 2;
			/* access MAPSZ / 2 -> read on demand */
			while (sz--)
				h ^= map[sz];
#ifdef WRITE_TEST
			sz = MAPSZ / 2;
			while (sz--)
				map[sz] ^= map[sz];
#endif
		}
		sleep(1);
	}
	return EXIT_SUCCESS;
}

The following smaps reports are for the mmap()ed memory area.

[ Original ]
40a93000-41093000 rw-s 00000000 00:07 4270       /dev/zero (deleted)
Size:               6144 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB


[After read mmapped area by creating "-map" file]
40a93000-41093000 rw-s 00000000 00:07 4270       /dev/zero (deleted)
Size:               6144 kB
Rss:                3072 kB
Pss:                3072 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      3072 kB
Private_Dirty:         0 kB
Referenced:         3072 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB

[After read mmapped area by creating "-map" file for another child process forking from same parent]
40a93000-41093000 rw-s 00000000 00:07 4270       /dev/zero (deleted)
Size:               6144 kB
Rss:                3072 kB
Pss:                1536 kB
Shared_Clean:       3072 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:         3072 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB 

Please pay attention to the change in the RSS/PSS sizes.
And one more interesting point: the memory is still shared clean, because the operation was just a 'read'.

Then, what happens if we mmap with MAP_PRIVATE instead of MAP_SHARED?
In this case, memory allocated by mmap is handled just like memory allocated by malloc.
So, with high probability, the two memory areas are merged into one, and you may see a single private memory area whose size is about 16MB.

The next topic is very interesting.
Let's try it with MAP_PRIVATE.

7f987c7aa000-7f987d7ab000 rw-p 00000000 00:00 0
Size:              16388 kB
Rss:                   4 kB
Pss:                   1 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

When reading the mmap()ed area, memory is not allocated even though the read operation is executed. (This is different from the MAP_SHARED case.) Why? Because MAP_PRIVATE maps memory copy-on-write, so merely reading does not require allocating memory.
Let's try with the 'WRITE_TEST' define switch enabled.
7f001b786000-7f001c787000 rw-p 00000000 00:00 0 
Size:              16388 kB
Rss:                3076 kB
Pss:                3073 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:      3072 kB
Referenced:         3076 kB
Anonymous:          3076 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

As you can see, the memory is successfully allocated as 'Private Dirty'. As the next step, let's look at the mapped area being written by a child process.
[After writing the mmap()ed area by creating the "-map" file for another child process forked from the same parent]
7f001b786000-7f001c787000 rw-p 00000000 00:00 0 
Size:              16388 kB
Rss:                3076 kB
Pss:                3073 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:      3072 kB
Referenced:         3076 kB
Anonymous:          3076 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
PSS and RSS are unchanged.
This is what we expected.

But there is an interesting case.
Look at the VMA information below.
b5a7f000-b5f7f000 -w-p 00000000 00:04 24098      xxxxxxxxxxxxxxx
Size:               5120 kB
Rss:                5120 kB
Pss:                1280 kB
Shared_Clean:          0 kB
Shared_Dirty:       5120 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:          5120 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB 
Even though the memory area is private and writable, RSS != PSS - that is, it is shared! A read-only private area can be shared - ex. loaded shared-library code. But this is a writable private area! What happened? <= <TODO> I need to investigate this further!!!

The kernel uses the VM_MAYSHARE flag to tell whether this is 'p' or 's' - see task_mmu.c in the kernel.
I'm not sure that VM_MAYSHARE is more valuable information than VM_SHARED.
But I have to analyze this case more deeply... (to be updated after more analysis...)

The next case is ashmem.
Memory mapped with MAP_PRIVATE can be shared in the case of ashmem.
You can test this case using the code below on Android (NDK).

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <memory.h>
#include <sys/ioctl.h>
#include <linux/limits.h>
#include <linux/ioctl.h>


#define ASHMEM_NAME_LEN		256
#define ASHMEM_NAME_DEF		"dev/ashmem"

/* Return values from ASHMEM_PIN: Was the mapping purged while unpinned? */
#define ASHMEM_NOT_REAPED	0
#define ASHMEM_WAS_REAPED	1

/* Return values from ASHMEM_UNPIN: Is the mapping now pinned or unpinned? */
#define ASHMEM_NOW_UNPINNED	0
#define ASHMEM_NOW_PINNED	1

#define __ASHMEMIOC		0x77

#define ASHMEM_SET_NAME		_IOW(__ASHMEMIOC, 1, char[ASHMEM_NAME_LEN])
#define ASHMEM_GET_NAME		_IOR(__ASHMEMIOC, 2, char[ASHMEM_NAME_LEN])
#define ASHMEM_SET_SIZE		_IOW(__ASHMEMIOC, 3, size_t)
#define ASHMEM_GET_SIZE		_IO(__ASHMEMIOC, 4)
#define ASHMEM_SET_PROT_MASK	_IOW(__ASHMEMIOC, 5, unsigned long)
#define ASHMEM_GET_PROT_MASK	_IO(__ASHMEMIOC, 6)
#define ASHMEM_PIN		_IO(__ASHMEMIOC, 7)
#define ASHMEM_UNPIN		_IO(__ASHMEMIOC, 8)
#define ASHMEM_ISPINNED		_IO(__ASHMEMIOC, 9)
#define ASHMEM_PURGE_ALL_CACHES	_IO(__ASHMEMIOC, 10)

#define MEMSZ 1024 * 1024 * 10
#define MAPSZ 1024 * 1024 * 6

int
main(int argc, const char *argv[]) {
	char  pidbuf[32];
	int   fd, cnt, done, sz;
	char *b, *map;

	b = malloc(MEMSZ);

	fd = open("/dev/ashmem", O_RDWR);
	if (fd < 0) {
		printf("Fail open ashmem\n");
		return -1;
	}

	if (0 > ioctl(fd, ASHMEM_SET_NAME, "yhc-test-mem")) {
		printf("Fail set ashmem name\n");
		return -1;
	}

	if (0 > ioctl(fd, ASHMEM_SET_SIZE, MAPSZ)) {
		printf("Fail set ashmem size\n");
		return -1;
	}

	map = (char *)mmap(NULL,
			   MAPSZ,
			   PROT_NONE,
			   MAP_PRIVATE,
			   fd,
			   0);
	if (MAP_FAILED == map) {
		printf("Map failed\n");
		return -1;
	}

	close(fd);

	/* demand half of the mmap pages */
	mprotect(map, MAPSZ / 2, PROT_WRITE);
	sz = MAPSZ / 2;
	/* write MAPSZ / 2 -> pages are demanded on write */
	while (sz--)
		map[sz] = 0xff;

	done = 0;
	cnt = 3;
	while (cnt--) {
		switch (fork()) {
		case -1:
			printf("ERR fork\n");
			exit(0);

		case 0: /* child */
			done = 1;
		}
		if (done)
			break;
	}

	while (1) {
		if (0 <= open("done", O_RDONLY))
			break;

		snprintf(pidbuf, sizeof(pidbuf), "%d-mem", getpid());
		if (0 <= (fd = open(pidbuf, O_RDONLY))) {
			close(fd);
			unlink(pidbuf);
			/* access on demand */
			memset(b, 0, MEMSZ);
		}
		sleep(1);
	}
	return EXIT_SUCCESS;
} 

You can see that PSS != RSS in the ashmem memory area and that it is shared among the forked child processes.

And there is another interesting point.
We know that static/global memory is private, like memory allocated by malloc. But interestingly, the VMA for this static/global memory is NOT even assigned before it is actually demanded, while the VMA for dynamically allocated memory is assigned immediately.
You can easily test this using the following code snippet.

#include <stdlib.h>
#include <unistd.h>

#define MALLOCSZ 10 * 1024 * 1024
#define BSSSZ    3 * 1024 * 1024

static char sbuf[BSSSZ];

int
main(int argc, const char *argv[]) {
	char *buf;
	//sbuf[0] = 1; <--- (A)
	buf = malloc(MALLOCSZ);
	while (1) { sleep(10); }
	return 0;
}

Without line (A), the VMA for sbuf is NOT assigned, so the VSS doesn't include the size of sbuf.
After enabling line (A), the VSS increases by sizeof(sbuf).
On the other hand, the size allocated by malloc is included in the VSS even though it has NOT been demanded yet.
Interesting, isn't it?

Now it's time to dive into one of the deepest parts of memory - the page.
Every process has its own page table, and this holds everything about the process's memory.
The Linux kernel provides various useful pieces of information about memory pages via the proc file system.
smaps is one of them.
At this step, I would like to talk about /proc/<pid>/maps, /proc/<pid>/pagemap, /proc/kpagecount and /proc/kpageflags.
To understand a process's memory, you need to know about the memory pages used by the process.
But lots of pages in the page table don't have a real mapping yet.
Therefore, instead of searching the whole page table - which is a waste of time - we can start from smaps.
maps shows the VMAs (a subset of the data shown by smaps), and those are what we are interested in.
Now we know which virtual memory addresses are valid in the process.
The next step is finding the corresponding pages; pagemap gives this information.
Each pagemap entry is a 64-bit value that provides the following information (from pagemap.txt in the kernel documentation):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bits 55-60 page shift (page size = 1<<page shift)
    * Bit  61    reserved for future use
    * Bit  62    page swapped
    * Bit  63    page present

The most important value here is the PFN. The PFN is used as the index into kpagecount and kpageflags.
The kernel document - pagemap.txt - says the following (based on Kernel 3.4):

 * /proc/kpagecount.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

 * /proc/kpageflags.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from fs/proc/page.c, above kpageflags_read):

     0. LOCKED
     1. ERROR
     2. REFERENCED
     3. UPTODATE
     4. DIRTY
     5. LRU
     6. ACTIVE
     7. SLAB
     8. WRITEBACK
     9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP

Finally, we have lots of valuable information for each page. By combining it, we can compute meaningful figures - ex. USS, RSS, PSS, VSS, swap, etc.
For details, you can refer to the kernel documentation (proc.txt and pagemap.txt) and the source code.
IMPORTANT NOTE
Look at the kernel source code; you can easily see that the flag information in pagemap.txt is out of date.
It's up to the reader to tell the difference between source and documentation. :-)
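As an illustration, here is a minimal sketch (mine, not from the original article) that looks up the pagemap entry for one virtual address of the calling process; the 64-bit entry layout is the one quoted above. Note that on recent kernels, reading the PFN from pagemap may require root privileges.

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void) {
	int buf[1024];
	buf[0] = 1; /* touch the page so it is present */

	long psz = sysconf(_SC_PAGESIZE);
	uintptr_t vaddr = (uintptr_t)buf;
	/* each pagemap entry is 8 bytes, indexed by virtual page number */
	off_t off = (off_t)(vaddr / psz) * 8;

	int fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	uint64_t entry;
	if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry)) {
		perror("pread");
		close(fd);
		return 1;
	}
	close(fd);

	int present = (int)(entry >> 63) & 1;      /* bit 63: page present */
	uint64_t pfn = entry & ((1ULL << 55) - 1); /* bits 0-54: PFN */
	printf("present=%d pfn=0x%llx\n", present, (unsigned long long)pfn);
	return 0;
}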

This mechanism is exactly what the procrank tool in Android uses.
Android 4.3 or lower has a bug in libpagemap.so, so as of now VSS is not displayed correctly by procrank.
The following code snippet is from libpagemap.so in Android 4.3.

int pm_map_usage(pm_map_t *map, pm_memusage_t *usage_out) {
    uint64_t *pagemap;
    size_t len, i;
    uint64_t count;
    pm_memusage_t usage;
    int error;

    if (!map || !usage_out)
        return -1;

    error = pm_map_pagemap(map, &pagemap, &len);
    if (error) return error;

    pm_memusage_zero(&usage);

    for (i = 0; i < len; i++) {
        /* ----- line (A) ----- */
        if (!PM_PAGEMAP_PRESENT(pagemap[i]) ||
            PM_PAGEMAP_SWAPPED(pagemap[i]))
            continue;

        error = pm_kernel_count(map->proc->ker, PM_PAGEMAP_PFN(pagemap[i]),
                                &count);
        if (error) goto out;

        usage.vss += map->proc->ker->pagesize; // ----- line (B) -----
        usage.rss += (count >= 1) ? (map->proc->ker->pagesize) : (0);
        usage.pss += (count >= 1) ? (map->proc->ker->pagesize / count) : (0);
        usage.uss += (count == 1) ? (map->proc->ker->pagesize) : (0);
    }

    memcpy(usage_out, &usage, sizeof(usage));

    error = 0;

out:    
    free(pagemap);

    return error;
}

As you can see, a page whose map count == 1 is included in USS. And the code for getting RSS and PSS is also easy to understand.
But in the case of VSS, line (B) should be moved to line (A) - VSS must count every page in the VMA, not just the present ones - and I am sure this bug will be fixed soon.
<--- to be continued...

