OOP SW design is usually compose of one main control routine (henceforth MCR) and lots of sub modules.
But, MCR don't need to (must not need to) know inside implementation of every sub modules.
Sometimes, MCR don't need to know even existence of some sub modules.
The problem is, most sub modules require initialization.
How can sub modules whose existence is now known, be initialized.
Due to this issue, principle of information hiding is sometimes broken.
Let's see below example.

FILE : main.c
-------------
int main(int argc, char* argv[]) {
        ...

}

FILE : moduleA.c
----------------
...

FILE : moduleB.c
----------------
...

Assume that, each module requires initialization and main.c don't need to know existence of each module.
How can we resolve this issue?
Easiest way is calling initialization function of each module with damaging principle of information hiding little bit, like mentioned above.

FILE : main.c
-------------
extern void moduleA_init();
extern void moduleB_init();

int main(int argc, char* argv[]) {
        ...
        moduleA_init();
        moduleB_init();
        ...
}

FILE : moduleA.c
----------------
...
void moduleA_init() { ... }

FILE : moduleB.c
----------------
...
void moduleB_init() { ... }

At above code, main.c becomes to know existence of moduleA and moduleB.
That is, in terms of modules, principle of information hiding is damaged although it's very little.
Additionally, global symbol space is dirtier.
Regarding maintenance, whenever new module is added, modifying main.c is unavoidable.
But, main.c doesn't have any dependency on newly added module.
With this and that, this way is uncomfortable.
How can we clean up these?
Using constructor leads us to better way.

Functions in constructor are executed before main function.
So, it is very useful for this case.
Easiest way is setting every initialization function as constructor.
But, in this case, we cannot control the moment when module is initialized at.
Therefore, it is better that each module's initialization function is registered to MCR, and MCR calls these registered function at right moment.
Following pseudo code is simple implementation of this concept.

FILE : main.c
-------------
void register_initfn(void (*fn)()) {
        list_add(initfn_list, fn);
}

int main(int argc, char* argv[]) {
        ...
        /* initialize modules */
        foreach(initfn_list, fn)
                (*fn)();
        ...
}

FILE : module.h
---------------
extern void register_initfn(void (*fn)());
#define MODULE_INITFN(fn)                               \
        static void __##fn##__() __attribute__ ((constructor)); \
        static void __##fn##__() { register_initfn(&fn); }

FILE : moduleA.c
----------------
...
#include "module.h"
...
static void _myinit() { ... }
MODULE_INITFN(_myinit)

FILE : moduleB.c
----------------
...
#include "module.h"
...
static void _myinit() { ... }
MODULE_INITFN(_myinit)

Now, MCR don't need to know existence of each modules.
And, MCR can also control the moment of each module's initialization.
In addition, adding new module doesn't require any modification of MCR side.
It is closer to OOP's concept, isn't it?

We can improve this concept by customizing memory section.
Here is rough description of this.

* Declare special memory section for initializer function.
    - In gcc, ld script should be modified.

* Put initializer function into this section.
    - __attribute__ ((__section__("xxxx"))) can be used.

* MCR can read this section and call these functions at appropriate moment.

Actually, this way is better than using constructor in terms of SW design.
Linux kernel uses this concept in it's driver model.
(For deeper analysis, kernel source code can be good reference.)
But, in many cases, using this concept may lead to over-engineering.
So, if there isn't any other concern, using constructor is enough.

Separating mechanism from policy is very important issue of SW design.
This is also true for microscopic area - implementing function.
Here is one of simplest example - function that calculates area of rectangle.

int rect_area(int left, int top, int right, int bottom) {
        return (right - left) * (bottom - top);
}

Simple, isn't it?
But, this function doesn't have any exception/error handling.
Let's add some of them.

int rect_area(int left, int top, int right, int bottom) {
        if (left >= right)
                left = right;
        if (top >= bottom)
                top = bottom;
        return (right - left) * (bottom - top);
}

It's seems good.
Soon after, another function that calculates area of rectangle is needed.
But, this function should return error value if input rectangle is invalid.
I think quick solution is

int rect_area2(int left, int top, int right, int bottom) {
        if (left > right || top > bottom)
                return -1;
        return (right - left) * (bottom - top);
}

But, in this solution, code for calculating area of rectangle is duplicated.
At above example, this is just one-line code. So, duplicating is not a big deal.
But, it's not good in terms of code structure.
Why did this happen?
Calculating rectangle area is 'Mechanism'.
But, exception/error handling is 'Policy' at this example.
So, above example should be implemented like below.

static inline int _rect_area(int left, int top, int right, int bottom) {
        return (right - left) * (bottom - top);
}

int rect_area(int left, int top, int right, int bottom) {
        if (left >= right)
                left = right;
        if (top >= bottom)
                top = bottom;
        return _rect_area(left, top, right, bottom);
}
int rect_area2(int left, int top, int right, int bottom) {
        if (left > right || top > bottom)
                return -1;
        return _rect_area(left, top, right, bottom);
}

Can you know the difference?
'_rect_area' is implementation of pure 'Mechanism'.
And policy is implemented at each interface function.
Even for simple function, developer should consider CONCEPT of separating Mechanism from Policy.

This test is done under following environment.

x86 :

intel Core(TM)2 Duo T9400 2.53Mhz
GCC-4.4.5

ARM :

OMAP4430 (Cortax-A9)
Android NDK platform 9

Test code.

/*
 * Test Configuration
 */
#define _ARRSZ    1024*1024*8

static int _arr[_ARRSZ];

static void
_init() {
    int i;
    for (i = 0; i < _ARRSZ; i++)
        _arr[i] = i;
}

static unsigned long long
_utime() {
    struct timeval tv;
    if (gettimeofday(&tv, NULL))
        assert(0);
    return (unsigned long long)(tv.tv_sec * 1000000)
        + (unsigned long long)tv.tv_usec;
}

#define _test_arraycopy_pre()               \
    int  i;                                 \
    unsigned long long ut;                  \
    int* ia = malloc(_ARRSZ * sizeof(*ia));

#define _test_arraycopy_post()              \
    free(ia);

#define _operation()            \
    do {                        \
        ia[i] = _arr[i];        \
    } while (0)

static void*
_test_arraycopy_worker(void* arg) {
    int     i;
    int*    ia = arg;
    for (i = (_ARRSZ / 2); i < _ARRSZ; i++)
        _operation();
    return NULL;
}

static unsigned long long
_test_arraycopy_sc() {
    _test_arraycopy_pre();

    ut = _utime();
    for (i = 0; i < _ARRSZ; i++)
        _operation();
    ut = _utime() - ut;

    _test_arraycopy_post();

    return ut;
}

static unsigned long long
_test_arraycopy_dc() {
    pthread_t thd;
    void*     ret;
    _test_arraycopy_pre();

    ut = _utime();
    if (pthread_create(&thd,
               NULL,
               &_test_arraycopy_worker,
               (void*)ia))
        assert(0);

    for (i = 0; i < (_ARRSZ / 2); i++)
        _operation();

    if (pthread_join(thd, &ret))
        assert(0);

    ut = _utime() - ut;

    _test_arraycopy_post();

    return ut;
}

#undef _test_arraycopy_pre
#undef _test_arraycopy_post

int
main(int argc, char* argv[]) {
    _init();
    printf(">> SC : %lld ", _test_arraycopy_sc());
    printf(">> DC : %lld\n", _test_arraycopy_dc());
    return 0;
}

[Test 1]
x86

>> SC : 59346 >> DC : 38566
>> SC : 59195 >> DC : 39028
>> SC : 49529 >> DC : 38160
>> SC : 49722 >> DC : 38457
>> SC : 49952 >> DC : 37457

ARM

>> SC : 102295 >> DC : 94147
>> SC : 102264 >> DC : 94025
>> SC : 102173 >> DC : 94116
>> SC : 102172 >> DC : 94116
>> SC : 102325 >> DC : 94177

Change '_operation' macro to as follows

#define _operation()                                    \
    do {                                                \
        if (i > _ARRSZ / 2)                             \
            ia[i] = (_arr[i] & 0xff) << 8 ^ _arr[i];    \
        else                                            \
            ia[i] = (_arr[i] & 0xff) << 16 ^ _arr[i];   \
    } while (0)                                         \

[Test 2]
x86

>> SC : 60696 >> DC : 40523
>> SC : 56907 >> DC : 45355
>> SC : 55066 >> DC : 42329
>> SC : 54931 >> DC : 40651
>> SC : 57022 >> DC : 41879

ARM

>> SC : 164514 >> DC : 112671
>> SC : 163971 >> DC : 112854
>> SC : 164521 >> DC : 112976
>> SC : 163940 >> DC : 112732
>> SC : 164245 >> DC : 112671

Interesting result, isn't it?
For heavily-memory-accessing-code (Test 1), ARM does not show good statistics for multi-core (in this case, dual-core) optimization.
But, if not (Test 2), optimization shows quite good results.
And, x86 seems to handle memory accessing from multi-core, quite well.

So, developers should consider ARM's characteristic when optimize codes for multi-core.
(I'm sure that ARM will improve this someday! :-) )

* Things to consider regarding this kind of optimization *
Cache, Cache coherence, Memory Controller, Bus etc...

[ Test for later version of ARM (ex Cortax-A15) will be listed continuously... ]

'Domain > ARM' 카테고리의 다른 글

[ARM] Sample code for unwinding stack with DWARF2 information.  (0) 2010.04.07
[ARM] Sample code for unwinding stack in Thumb mode.  (0) 2010.04.07
[ARM] Unwinding Stack.  (0) 2009.09.15
[ARM] Long jump.  (0) 2007.06.05
[ARM] .init_array section  (0) 2007.03.18

+ Recent posts