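<I have left out the ones whose source I can't remember. If anything here is a problem, please let me know.>

  • Good code is easy to change and bad code is hard to change. Therefore, good code gets changed until it becomes bad code.
  • “Provide mechanism, not policy”: about interface design. (From UNIX)
  • What else was there??? --- Nothing comes to mind right now...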

In an SMP system, to save power, not all CPUs are kept active when the system is not heavily loaded.
But let's imagine this case.
The system is not heavily loaded, so only cpu0 is active, and at some moment an IRQ is raised many times within a short period.
In this case, some of the raised IRQs may be lost, because cpu0 alone may not service them in time.

Let's assume an IC misbehaves if an interrupt it raised is not handled by the MPU, and the system is in the state described above.
Then the system works abnormally when it is not loaded, but works well when it is loaded.
(This is the opposite of the usual case. In most cases, a system misbehaves when it is loaded.)
The default IRQ affinity mask usually covers all cores, so the IRQ can be handled by any CPU.
And more than one CPU is active when the system is loaded. So there is less chance of an IRQ being lost, because there are several CPUs to handle it!

This case shows a very interesting kind of system issue!
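
As a small illustration of the affinity mask mentioned above: on Linux, the per-IRQ mask is exposed as a hexadecimal CPU bitmask in /proc/irq/<N>/smp_affinity. The sketch below only builds and formats such a mask. The function names are made up for this example, and it assumes at most 32 CPUs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: turn a list of CPU numbers into an affinity bitmask.
 * Assumes every CPU number is in the range 0..31. */
static uint32_t cpu_list_to_mask(const int *cpus, int count)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < count; i++) {
        assert(cpus[i] >= 0 && cpus[i] < 32);
        mask |= (uint32_t)1 << cpus[i];
    }
    return mask;
}

/* Format the mask as a hex string, the notation used by smp_affinity.
 * Returns the number of characters written (as snprintf does). */
static int format_affinity(uint32_t mask, char *buf, size_t len)
{
    return snprintf(buf, len, "%x", (unsigned)mask);
}
```

For a quad-core system where any CPU may take the IRQ, the list {0, 1, 2, 3} gives the mask "f"; restricting the IRQ to cpu0 gives "1".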

After using a terminal for a long time, the PATH variable tends to get longer and longer because of duplicated entries.
Here is a simple sample script, using Perl, to resolve this.

# remove duplicates from the given PATH-format string
unique_path() {
perl -w -e '
    my (%path_hash, @newpath);
    exit unless (defined $ARGV[0]);
    foreach my $p (split (/\:/, $ARGV[0])) {
        unless (defined $path_hash{$p}) {
            $path_hash{$p} = 1;
            push @newpath, $p;
        }
    }
    print join ":", @newpath;
' "$1"
}
...(skip)...
PATH=$(unique_path "$PATH")
...(skip)

Done :-).
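
For comparison, the same idea can be sketched in C. This is only an illustration, not part of the script above: unique_path_c is a hypothetical name, the caller must provide an output buffer at least as large as the input, and empty entries are simply dropped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, keeping only the first occurrence
 * of each ':'-separated entry. dst must hold at least strlen(src) + 1 bytes.
 * Empty entries are dropped. Quadratic in the number of entries, which is
 * acceptable for something as short as PATH. */
static void unique_path_c(const char *src, char *dst)
{
    size_t out = 0;
    const char *p = src;

    assert(src && dst);
    dst[0] = '\0';
    while (*p) {
        size_t len = strcspn(p, ":");
        const char *q = dst;
        int seen = 0;

        /* Scan the entries already written to dst for an exact match. */
        while (*q) {
            size_t qlen = strcspn(q, ":");
            if (qlen == len && strncmp(q, p, len) == 0) {
                seen = 1;
                break;
            }
            q += qlen;
            if (*q == ':')
                q++;
        }
        if (!seen && len > 0) {
            if (out > 0)
                dst[out++] = ':';
            memcpy(dst + out, p, len);
            out += len;
            dst[out] = '\0';
        }
        p += len;
        if (*p == ':')
            p++;
    }
}
```

The order of first occurrences is preserved, which matters for PATH because earlier entries take precedence.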

Most Android devices are 32-bit machines. So many applications assume that the host machine is a 32-bit system.
And in general, it makes no difference whether an Android native library is developed on a 64-bit or a 32-bit build machine.

But here are the usual steps for developing an Android native library.
(1) Develop and verify the code on the build machine.
(2) Port it to the NDK build system.

Most developers turn on full warning options at compile time to detect bugs at an early stage.
But building and verifying code that assumes a 32-bit host on a 64-bit machine always produces type-casting warnings because the type sizes differ.
This happens especially between pointers and integers.

For example, many Android Java applications use int (jint) as the type that holds a native pointer, on the assumption of a 32-bit host system.
Building this code on a 64-bit build system to verify it produces type-casting warnings, even if the code itself builds cleanly in the NDK build system.
Worse, this code doesn't work on a 64-bit Android host machine, even though such machines are not popular.

To reduce these warnings (so that library code is easy to verify on the build machine), in my opinion it is better to use long (jlong) instead of jint as the type that holds a native pointer, unless memory space is extremely critical.
And to keep the compiler happy, using a macro can be a good choice.
Here is an example of macros for casting between a pointer and an integer (jlong).
(This sample works without warnings on 32-bit/64-bit build/host systems.)

#define ptr2jlong(v) ((jlong)((intptr_t)(v)))
#define jlong2ptr(v) ((void*)((intptr_t)(v)))

This is just a simple example of a portability issue.
Writing portable code is always very difficult...
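
A minimal sketch of how the two macros above might be used. To keep it buildable without <jni.h>, a stand-in type jlong_t (a signed 64-bit integer, like JNI's jlong) is used; the struct and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for JNI's jlong (a signed 64-bit integer), so that this sketch
 * builds without <jni.h>. */
typedef int64_t jlong_t;

#define ptr2jlong(v) ((jlong_t)((intptr_t)(v)))
#define jlong2ptr(v) ((void*)((intptr_t)(v)))

/* Hypothetical native object that the Java side refers to by handle. */
struct native_obj {
    int id;
};

/* What a native "create" function would return to Java. */
static jlong_t obj_to_handle(struct native_obj *o)
{
    return ptr2jlong(o);
}

/* What a native method would do with the handle passed back from Java. */
static struct native_obj *handle_to_obj(jlong_t h)
{
    assert(h != 0);
    return (struct native_obj *)jlong2ptr(h);
}
```

Going through intptr_t first is what avoids the "cast to/from pointer of different size" warnings on both 32-bit and 64-bit builds, since intptr_t always has pointer width.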

The kernel has sleep functions such as msleep(), and they use schedule_timeout() internally.
schedule_timeout() calculates the expiry time based on the current jiffies.
But some functions disable the local IRQ, or IRQs in general, for a while.
For example, printk does this: it disables the local IRQ.
But jiffies is updated from the local timer IRQ. (On SMP, one designated core, usually cpu0, is used.)
So disabling the local IRQ on the timer core may prevent the system from updating jiffies.
Calling schedule_timeout() at that moment sets a timer whose expiry time is earlier than it should be, because jiffies is behind real time.
So, for example, msleep(100) may wake up after only 50ms in this scenario.
This is very dangerous.
But actually, lots of kernel code uses jiffies directly.
So I'm not sure that all of that code is safe when jiffies is behind.
Anyway, here is my sample fix for this potential issue in schedule_timeout().
I couldn't find any fix for this issue, even in kernel-3.0.
So I may have missed something.
But to me, this is definitely an issue to consider whenever the jiffies variable is used directly.
I hope that my sample fix for schedule_timeout() is helpful to others.
(In my opinion, fundamentally, all code that uses jiffies directly should be re-examined to see whether it is really safe when jiffies is behind.)
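
The arithmetic behind the "msleep(100) wakes up after 50ms" example can be checked with a toy model in user space. This is not kernel code: HZ = 100, the function names and all the numbers are assumptions chosen for illustration.

```c
#include <assert.h>

/* Toy model, not kernel code. HZ is an assumed tick rate: 1 tick = 10 ms. */
#define HZ 100

static unsigned long ms_to_ticks(unsigned long ms)
{
    return (ms * HZ + 999) / 1000;  /* round up to whole ticks */
}

/* Expiry the way schedule_timeout() computes it: timeout + jiffies. */
static unsigned long expire_at(unsigned long jiffies_seen,
                               unsigned long timeout)
{
    return jiffies_seen + timeout;
}

/* Real sleep length in ms when the jiffies value that was read lags the
 * true tick count by 'lag' ticks. */
static unsigned long actual_sleep_ms(unsigned long true_ticks,
                                     unsigned long lag,
                                     unsigned long sleep_ms)
{
    unsigned long expire;

    assert(lag <= true_ticks);
    expire = expire_at(true_ticks - lag, ms_to_ticks(sleep_ms));
    /* The timer fires once the true tick count reaches 'expire'. */
    return expire > true_ticks ? (expire - true_ticks) * 1000 / HZ : 0;
}
```

With no lag, a 100ms request sleeps 100ms in this model. With jiffies 5 ticks (50ms) behind, the same request sleeps only 50ms, which is the situation the patch below tries to avoid.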

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 6811f4b..6c89958 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -290,6 +290,8 @@ extern unsigned long preset_lpj;

 #endif

+extern unsigned long exact_jiffies(void);
+
 /*
  * Convert various time units to each other:
  */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 56f87fa..37e7bbf 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -99,6 +99,32 @@ static ktime_t tick_init_jiffy_update(void)
        return period;
 }

+/**
+ * exact_jiffies - return real jiffies value
+ *
+ * You can't be sure that the value of the %jiffies variable is the real current time.
+ * Jiffies may not be updated for a while for several reasons.
+ * So, to get the exact value, current ktime and %last_jiffies_update should be used.
+ */
+unsigned long exact_jiffies(void)
+{
+       unsigned long exact = jiffies;
+       if (tick_period.tv64
+           && last_jiffies_update.tv64) {
+               unsigned long seq, ticks;
+               ktime_t delta;
+               do {
+                       seq = read_seqbegin(&xtime_lock);
+                       delta = ktime_sub(ktime_get(), last_jiffies_update);
+                       ticks = ktime_divns(delta, ktime_to_ns(tick_period));
+                       /* +1 to compensate loss at division */
+                       exact = jiffies + ticks + 1;
+               } while (read_seqretry(&xtime_lock, seq));
+       }
+       return exact;
+}
+EXPORT_SYMBOL_GPL(exact_jiffies);
+
 /*
  * NOHZ - aka dynamic tick functionality
  */
diff --git a/kernel/timer.c b/kernel/timer.c
index c4714a6..628a714 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1304,7 +1304,6 @@ void run_local_timers(void)
  * without sampling the sequence number in xtime_lock.
  * jiffies is defined in the linker script...
  */
-
 void do_timer(unsigned long ticks)
 {
        jiffies_64 += ticks;
@@ -1457,7 +1456,7 @@ signed long __sched schedule_timeout(signed long timeout)
                }
        }

-       expire = timeout + jiffies;
+       expire = timeout + exact_jiffies();

        setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
        __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
@@ -1467,6 +1466,13 @@ signed long __sched schedule_timeout(signed long timeout)
        /* Remove the timer from the object tracker */
        destroy_timer_on_stack(&timer);

+       /*
+        * Reaching here means "timer interrupt is issued
+        *   and 'jiffies' is updated."
+        * So, 'jiffies' here is recently-updated-value
+        *   and 'jiffies' can be directly used
+        *   instead of using 'exact_jiffies()'
+        */
        timeout = expire - jiffies;

  out:

done.

To use the sysfs interface, enable the kernel config CONFIG_GPIO_SYSFS.
Core file related to this: gpiolib.c in the kernel
The sysfs nodes can be found at /sys/class/gpio

[ Prerequisite ]
Writing a value to '/sys/class/gpio/export' uses the 'gpio_request()' function of gpiolib in the kernel.
So if the given gpio has already been requested by another owner, the export fails.
Therefore, a gpio must not be owned by anything else in the kernel if it is to be controlled from user space.

[ How to use from user space ]
Assumption : the current working directory is '/sys/class/gpio'

#> echo [gpio num] > export
=> exports the given gpio to sysfs. A 'gpioN' link is newly created if the gpio number is valid.

#> cd gpioN
=> Then several nodes (active_low, direction, value, etc.) can be found.
Writing to or reading from these nodes is a useful way to debug/control a gpio from user space.
(The attribute functions for each node are defined in gpiolib.c.)

[ Detail Example ]

* at /sys/class/gpio
export, unexport : gpio number
#> echo 86 > export
#> echo 86 > unexport

* at /sys/class/gpio/gpioN
direction : in, out, low, high
value : 0, 1
edge : none, rising, falling, both
active_low : 0, 1
#> echo in > direction
#> echo 1 > value
#> echo both > edge

< See gpio.txt in the kernel documentation for details about each sysfs node. >
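
The same steps can be done from a C program instead of the shell. The sketch below is only an illustration under the assumptions of this post (nodes under /sys/class/gpio, a gpio that is free to export). The helper names gpio_attr_path and sysfs_write are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<base>/gpio<N>/<attr>", e.g. "/sys/class/gpio/gpio86/value".
 * Returns 0 on success, -1 if buf is too small. */
static int gpio_attr_path(char *buf, size_t len, const char *base,
                          int gpio, const char *attr)
{
    int n;

    assert(buf && base && attr && gpio >= 0);
    n = snprintf(buf, len, "%s/gpio%d/%s", base, gpio, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string to a sysfs node, like `echo val > path`.
 * Returns 0 on success, -1 on failure (for example, when the gpio is
 * already owned by a kernel driver and the write to 'export' is rejected). */
static int sysfs_write(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = (fputs(val, f) >= 0);
    /* stdio may defer the real write to fclose(), so check it as well. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A typical sequence would be sysfs_write("/sys/class/gpio/export", "86"), then building the 'direction' path with gpio_attr_path() and writing "out" to it, then writing "1" to 'value'. Root permission is normally required.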

The Linux kernel uses lots of sections to modularize its code structure easily.
For kernel modules and parameters, the following sections are used.

[ Module ]
macro   : module_init()
section : device_initcall (section name : .initcall6.init) <see init.h>
[ Parameter ]
macro   : module_param_named(), module_param(), core_param() etc.
section : __param <see moduleparam.h>

The core_param() macro works like the other module parameter macros. But it is NOT for a module; it is for a kernel boot parameter.
A kernel parameter doesn't have any prefix, unlike module parameters.
A module parameter's name has <module name> as its prefix.
For example, parameter <param> of module <module> has the name <module>.<param>
See the source code for details.

KBUILD_MODNAME is the preprocessor value for the module name.
It is defined through complex script processing. See Makefile.lib, Makefile.mod* in scripts/.
But in most cases you can tell it by intuition. That is, the name that looks like the module name is set as KBUILD_MODNAME.
(ex. <mod-name>-y, <mod-name>-m in Makefile)
So usually, deep analysis of the above scripts is not required.

Note:
It's possible that one object gets linked into more than one module.
In that case KBUILD_MODNAME will be set to foo_bar, where foo and bar are the names of the modules.

Module and parameter initialization.

[ Built-in module ]
module initialization : do_initcalls() <see 'main.c'> => initcall section is used
parameter sysfs node  : param_sysfs_init() -> param_sysfs_builtin() => __param section is used
[ Dynamic module ]
module & parameter initialization is done in the system call
SYSCALL_DEFINE3(init_module, ...) => load_module()

Each parameter has a permission mask. So a parameter value can be read or written at runtime through its sysfs node.

/sys/module/<module name>/parameters/<parameter name>

But a module parameter whose permission is 0 is ignored in sysfs (not shown in sysfs).
( See module_param_sysfs_setup(...) function. )
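
The naming rules above can be summarized in a small user-space sketch. The helper names are hypothetical. The sketch also assumes that prefix-less kernel parameters (the core_param() case) are listed under the pseudo-module directory "kernel" in sysfs.

```c
#include <assert.h>
#include <stdio.h>

/* Command-line name of a parameter: "<module>.<param>", or just "<param>"
 * when module is NULL (the core_param() case, which has no prefix).
 * Returns the length written, or -1 if buf is too small. */
static int param_cmdline_name(char *buf, size_t len,
                              const char *module, const char *param)
{
    int n;

    assert(buf && param);
    n = module ? snprintf(buf, len, "%s.%s", module, param)
               : snprintf(buf, len, "%s", param);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* sysfs path of a parameter. Assumption: parameters without a module
 * prefix are listed under the pseudo-module directory "kernel". */
static int param_sysfs_path(char *buf, size_t len,
                            const char *module, const char *param)
{
    int n;

    assert(buf && param);
    n = snprintf(buf, len, "/sys/module/%s/parameters/%s",
                 module ? module : "kernel", param);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, module "printk" with parameter "time" gives the boot argument name printk.time. Whether the sysfs node actually exists still depends on the permission mask described above.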

To test an Android kernel, keeping the number of user space processes to a minimum is very useful.
Actually, 'adbd' and 'ueventd' are enough on Android.
Here is how to make a device run only the minimum user space processes: adbd and ueventd.
The following is the file structure of the ramdisk image.

[ Create ramdisk ]

Let's make the following directory structure in the ramdisk.

/bin -> sbin
/sbin -+- busybox
       +- adbd
       +- ueventd -> ../init
       +- <...> -> busybox
/init
/init.rc
/default.prop

Everything is the same as default Android except that busybox is in /sbin and /bin is a symbolic link to /sbin.
Let's look at them one by one.

/init : same binary as default Android.
/bin : symbolic link to /sbin.
/default.prop : same as default Android. adb and debugging are enabled.
/sbin/busybox : statically linked busybox.
/sbin/... : tools (symbolic links to busybox). Ex. sh -> busybox, ls -> busybox.
/sbin/adbd :
Modified adbd. The original adbd uses /system/bin/sh as its terminal shell, but this one uses /bin/sh.
To do this, the value of SHELL_COMMAND in system/core/adb/services.c should be modified.
/init.rc :
Simplified one. Only adbd and ueventd are started.
One important note: "DO NOT make an empty section (ex. on fs)!" This makes the init process fail, and the system will restart again and again.
Here is a sample.

on early-init
    start ueventd

on init
    sysclktz 0
    export PATH /bin:/sbin:

#on fs

#on post-fs

#on post-fs-data

on boot
   start adbd

## Daemon processes to be run by init.
##
service ueventd /sbin/ueventd

service adbd /sbin/adbd

[ Make ramdisk image ]

Let's assume that the current working directory is the ramdisk directory.

find . | cpio -o -H newc | gzip > newramdisk.gz

This newly generated gzip file can simply be renamed to ramdisk.img.

[ Make boot image ]

The mkbootimg tool is used. It can easily be found in out/host/<arch>/bin after building Android from source.

mkbootimg --cmdline 'no_console_suspend=1 console=null' --kernel <zImage file> --ramdisk <ramdisk image> -o newboot

[ Verify ]

After flashing the newly generated boot image, reboot the device.
The only response the device gives is at the boot-loader stage. After that, the device doesn't do anything visible.
After waiting a few moments, try adb devices. The host PC can then find the device, and adb shell can be used.
Type adb shell ps. Then you can check that only three user space processes are running: init, adbd and ueventd.

[ Debugging ]

Besides the kernel, adbd and init may need to be modified (as mentioned above, a modified adbd is used). Printing logs is very helpful for debugging, and using the framebuffer console is a simple way to do this.
Here are the steps (ex. adbd).

* comment out xxxx in init.c
=> this code removes /dev/console (framebuffer console)
* modify start_logging() and start_device_log() in adb.c.
=> use /dev/console as the stdout and stderr file.

Now the log messages of adbd will be shown on the framebuffer (that is, displayed on the panel).

[ Something more ]

You may build your own Linux environment on the device by building a file system, installing libraries, etc.
In my case, I set up development tools (ex. glibc, gcc, binutils) and compiled LTP (Linux Test Project) to test the kernel.
Enjoy your minimum Android environment :-).

Check 'Requesting program interpreter' in the 'Program Headers' section of the ELF file.
Simple command : readelf -l <file>
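
The interpreter string that readelf prints comes from the PT_INTERP program header. As a rough sketch, it can also be read directly in C with the definitions from <elf.h>. The function name is made up, and the sketch assumes a Linux host, a 64-bit ELF file, and the same byte order as the host.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copy the PT_INTERP string of a 64-bit ELF file into buf.
 * Returns 0 on success, 1 if the file has no PT_INTERP entry (for example,
 * a statically linked binary), -1 on I/O error or if the file is not a
 * 64-bit ELF. Assumes the file has the same byte order as the host. */
static int elf64_interp(const char *path, char *buf, size_t len)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    unsigned i;
    int ret = 1;
    FILE *f;

    assert(path && buf && len > 0);
    f = fopen(path, "rb");
    if (!f)
        return -1;

    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }

    /* Walk the program header table until PT_INTERP is found. */
    for (i = 0; i < eh.e_phnum && ret == 1; i++) {
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            ret = -1;
        } else if (ph.p_type == PT_INTERP) {
            /* p_filesz includes the terminating NUL stored in the file. */
            size_t n = ph.p_filesz < len ? (size_t)ph.p_filesz : len - 1;

            if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
                fread(buf, 1, n, f) != n) {
                ret = -1;
            } else {
                buf[n] = '\0';
                ret = 0;
            }
        }
    }
    fclose(f);
    return ret;
}
```

Calling it on a dynamically linked binary should fill buf with the same path that readelf -l shows after 'Requesting program interpreter'.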

This test was done in the following environment.

x86 :

Intel Core(TM)2 Duo T9400 2.53GHz
GCC-4.4.5

ARM :

OMAP4430 (Cortex-A9)
Android NDK platform 9

Test code.

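#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/*
 * Test Configuration
 */
#define _ARRSZ    (1024*1024*8)

static int _arr[_ARRSZ];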

static void
_init() {
    int i;
    for (i = 0; i < _ARRSZ; i++)
        _arr[i] = i;
}

static unsigned long long
_utime() {
    struct timeval tv;
    if (gettimeofday(&tv, NULL))
        assert(0);
    /* cast before multiplying so that a 32-bit tv_sec cannot overflow */
    return (unsigned long long)tv.tv_sec * 1000000ULL
        + (unsigned long long)tv.tv_usec;
}

#define _test_arraycopy_pre()               \
    int  i;                                 \
    unsigned long long ut;                  \
    int* ia = malloc(_ARRSZ * sizeof(*ia));

#define _test_arraycopy_post()              \
    free(ia);

#define _operation()            \
    do {                        \
        ia[i] = _arr[i];        \
    } while (0)

static void*
_test_arraycopy_worker(void* arg) {
    int     i;
    int*    ia = arg;
    for (i = (_ARRSZ / 2); i < _ARRSZ; i++)
        _operation();
    return NULL;
}

static unsigned long long
_test_arraycopy_sc() {
    _test_arraycopy_pre();

    ut = _utime();
    for (i = 0; i < _ARRSZ; i++)
        _operation();
    ut = _utime() - ut;

    _test_arraycopy_post();

    return ut;
}

static unsigned long long
_test_arraycopy_dc() {
    pthread_t thd;
    void*     ret;
    _test_arraycopy_pre();

    ut = _utime();
    if (pthread_create(&thd,
               NULL,
               &_test_arraycopy_worker,
               (void*)ia))
        assert(0);

    for (i = 0; i < (_ARRSZ / 2); i++)
        _operation();

    if (pthread_join(thd, &ret))
        assert(0);

    ut = _utime() - ut;

    _test_arraycopy_post();

    return ut;
}

#undef _test_arraycopy_pre
#undef _test_arraycopy_post

int
main(int argc, char* argv[]) {
    _init();
    printf(">> SC : %llu ", _test_arraycopy_sc());
    printf(">> DC : %llu\n", _test_arraycopy_dc());
    return 0;
}

[Test 1]
x86

>> SC : 59346 >> DC : 38566
>> SC : 59195 >> DC : 39028
>> SC : 49529 >> DC : 38160
>> SC : 49722 >> DC : 38457
>> SC : 49952 >> DC : 37457

ARM

>> SC : 102295 >> DC : 94147
>> SC : 102264 >> DC : 94025
>> SC : 102173 >> DC : 94116
>> SC : 102172 >> DC : 94116
>> SC : 102325 >> DC : 94177

Change the '_operation' macro as follows.

#define _operation()                                    \
    do {                                                \
        if (i > _ARRSZ / 2)                             \
            ia[i] = (_arr[i] & 0xff) << 8 ^ _arr[i];    \
        else                                            \
            ia[i] = (_arr[i] & 0xff) << 16 ^ _arr[i];   \
    } while (0)

[Test 2]
x86

>> SC : 60696 >> DC : 40523
>> SC : 56907 >> DC : 45355
>> SC : 55066 >> DC : 42329
>> SC : 54931 >> DC : 40651
>> SC : 57022 >> DC : 41879

ARM

>> SC : 164514 >> DC : 112671
>> SC : 163971 >> DC : 112854
>> SC : 164521 >> DC : 112976
>> SC : 163940 >> DC : 112732
>> SC : 164245 >> DC : 112671

Interesting result, isn't it?
For code that accesses memory heavily (Test 1), ARM gains little from multi-core (in this case, dual-core) optimization.
But when the code is less memory-bound (Test 2), the optimization shows quite good results.
And x86 seems to handle memory access from multiple cores quite well.

So developers should consider this characteristic of ARM when optimizing code for multi-core.
(I'm sure that ARM will improve this someday! :-) )

* Things to consider regarding this kind of optimization *
Cache, cache coherence, memory controller, bus, etc.

[ Tests for later versions of ARM (ex. Cortex-A15) will be added over time... ]
