* First of all, check your kernel configuration.

- CONFIG_MODULES=y
- CONFIG_MODULE_UNLOAD=y    <- Optional

* Create your Kernel module code and Makefile


If you are lucky, everything just works. But...


CASE: error at insmod?

insmod (on Android) : insmod: init_module 'hello.ko' failed (Exec format error)
kernel message      : unknown relocation: 27
In that case, check the kernel functions your module uses (ex. printk).
You can check the relocation type with the following command (ex. printk)

readelf -r hello.ko | grep printk

The result may look like this

xxxxxxxx  xxxxxxxx R_ARM_PLT32       00000000   printk

For your information : relocation number 27 is R_ARM_PLT32.

Then, what's the problem?
"ELF for the ARM Architecture" (IHI0044E_aaelf.pdf) says that R_ARM_PLT32 is deprecated.
And the Linux kernel doesn't handle R_ARM_PLT32 in the apply_relocate() function of module.c.
There are two workarounds for this issue.

* Update the toolchain.
In my case, I hit this issue with gcc-4.4.3, and gcc-4.7 resolved it.
(You can see that the relocation type changes to R_ARM_CALL instead of R_ARM_PLT32.)

* fpic / fno-pic HACK.
If upgrading the toolchain is difficult, here is a workaround (tested with gcc-4.4.3).
Add -fpic or -fno-pic to CFLAGS_MODULE when building your module, like below.

make CFLAGS_MODULE=-fpic xxxx modules

Then, just as with the toolchain update above, the relocation type changes :-)





Big Picture
-----------

     0 +-----------------------+      Block group details
       |                       |     /
  1024 +-----------------------+    /   +-------------------------------+
       |      super block      |   /    |       Super block copy        |
  2048 +-----------------------+  /     +-------------------------------+
       |                       | /      |    Group Descriptor Table     |
       +-----------------------+/       +-------------------------------+
       |     Block group 0     |        |         Block Bitmap          |
       +-----------------------+        +-------------------------------+
       |     Block group 1     |        |         Inode Bitmap          |
       +-----------------------+        +-------------------------------+
       |          ...          |        |         Inode Record          |
       +-----------------------+        +-------------------------------+
       |     Block group n     | +---+--|                               |-----+
       +-----------------------+ |   |  |          Inode Table          |     |
                                 |   |  |                               |<-+  |
                                 |   |  +-------------------------------+  |  |
                                 |   |  |             ....              |  |  |
                                 |   |  +-------------------------------+  |  |
                                 |   +->|         File Content          |  |  |
                                 |      +-------------------------------+  |  |
                                 |      |              ...              |  |  |
                                 |      +-------------------------------+  |  |
                                 +----->|       Directory entries       |--+  |
                                        +-------------------------------+     |
                                        |              ...              |     |
                                        +-------------------------------+     |
                                        |         File Content          |<----+
                                        +-------------------------------+



Documents that analyze the raw on-disk blocks of ext4 are hard to find, so I summarized what I learned from reading the Linux kernel source (focusing especially on the directory part).
==> This is just personal analysis, so I can't guarantee correctness.. :-)


The super block contains the basic information of the file system. For details of the ext4 super block, "struct ext4_super_block" in the Linux kernel source is a good reference. Here are a few of its fields.

s_log_block_size    : log2 of the file system block size, relative to 1KB (block size = 1024 << s_log_block_size). Usually 2, i.e. 4KB
s_blocks_per_group  : number of blocks in a 'block group'
s_inodes_per_group  : number of inodes in a 'block group'
s_inode_size        : size of an on-disk inode in bytes. Usually 256 in case of ext4
s_inodes_count      : total inode count
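
As a small illustration of how s_log_block_size is decoded, here is a minimal sketch. It assumes the value has already been converted from little-endian to host byte order; the function name ext4_block_size is made up for this example and is not a kernel API.

```c
#include <stdint.h>

/*
 * Block size in bytes from the raw s_log_block_size field.
 * Assumption: the value is already in host byte order.
 * 0 -> 1KB, 1 -> 2KB, 2 -> 4KB, ...
 */
uint32_t ext4_block_size(uint32_t s_log_block_size)
{
    return UINT32_C(1024) << s_log_block_size;
}
```

With the usual value 2, this gives the 4KB block size assumed in the rest of this post.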



ext4 keeps the traditional indirect-block scheme and adds extents on top of it.
(details for indirect block : http://computer-forensics.sans.org/blog/2008/12/24/understanding-indirect-blocks-in-unix-file-systems/)

Please read the following article before moving forward.
http://computer-forensics.sans.org/blog/2010/12/20/digital-forensics-understanding-ext4-part-1-extents

The following comes from ext4.h


... (skip) ...
/*
 * Constants relative to the data blocks
 */
#define EXT4_NDIR_BLOCKS 12
#define EXT4_IND_BLOCK EXT4_NDIR_BLOCKS
#define EXT4_DIND_BLOCK (EXT4_IND_BLOCK + 1)
#define EXT4_TIND_BLOCK (EXT4_DIND_BLOCK + 1)
#define EXT4_N_BLOCKS (EXT4_TIND_BLOCK + 1)

... (skip) ...
struct ext4_inode {
        ... (skip) ...
        __le32 i_blocks_lo; /* Blocks count */
        ... (skip) ...
        __le32 i_block[EXT4_N_BLOCKS]; /* pointers to blocks */
        ... (skip) ...
};


An ext4 inode uses 60 bytes (sizeof i_block) to locate its data blocks.
EXT4_N_BLOCKS == 15. Therefore 15 * 4 = 60 bytes are used as pointers to blocks.
With the traditional direct/indirect block scheme, this gives 15 slots: 12 direct, 1 indirect, 1 double-indirect and 1 triple-indirect.


i_blocks_lo, together with i_blocks_high, holds the number of blocks allocated to the inode.
One thing to note is that the block size here is NOT the file system block size stored in the super block (usually 4KB) but 512 bytes.
Expressions like "<< 9" and ">> 9" appear often in ext4/inode.c, and they convert to and from this 512-byte unit.
For additional information, please see
inode_add_bytes() : fs/stat.c ==> the number '512' shows up explicitly.
reserve_backup_gdb() : ext4/resize.c ==> "inode->i_blocks += reserved_gdb * sb->s_blocksize >> 9;"
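
The 512-byte unit can be sketched like this. It is only an illustration: both halves are assumed to be in host byte order already, the function names are made up for this example, and special cases such as the huge-file flag are ignored.

```c
#include <stdint.h>

/*
 * Combine the two on-disk halves into one count of 512-byte units.
 * Assumption: both values are already in host byte order.
 */
uint64_t inode_sectors(uint32_t i_blocks_lo, uint16_t i_blocks_high)
{
    return ((uint64_t)i_blocks_high << 32) | i_blocks_lo;
}

/* Bytes allocated to the inode: "<< 9" multiplies by 512. */
uint64_t inode_allocated_bytes(uint32_t i_blocks_lo, uint16_t i_blocks_high)
{
    return inode_sectors(i_blocks_lo, i_blocks_high) << 9;
}
```

For example, a file that occupies exactly one 4KB file system block has i_blocks_lo == 8, because 8 * 512 = 4096.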


One more thing to consider is whether i_blocks_lo includes the blocks used for meta data.
The following lines are found in ext4_ext_swap_inode_data() : ext4/migrate.c

/*
 * Update i_blocks with the new blocks that got
 * allocated while adding extents for extent index
 * blocks.
 *
 * While converting to extents we need not
 * update the orignal inode i_blocks for extent blocks
 * via quota APIs. The quota update happened via tmp_inode already.
 */
spin_lock(&inode->i_lock);
inode->i_blocks += tmp_inode->i_blocks;

That is, i_blocks_lo also counts the blocks used for meta data.
Note : Meta data is NOT real file data, so those blocks don't have a logical block index.

When extents are used, the 60 bytes in the inode are laid out as follows.

|<- 12 bytes ->|<- 12 bytes ->|<- 12 bytes ->|<- 12 bytes ->|<- 12 bytes ->|
+--------------+--------------+--------------+--------------+--------------+
|    header    |    extent0   |    extent1   |    extent2   |    extent3   |
+--------------+--------------+--------------+--------------+--------------+

Here is the related code (from ext4_extents.h)

/*
 * ext4_inode has i_block array (60 bytes total).
 * The first 12 bytes store ext4_extent_header;
 * the remainder stores an array of ext4_extent.
 */

/*
 * This is the extent on-disk structure.
 * It's used at the bottom of the tree.
 */
struct ext4_extent {
        __le32 ee_block;    /* first logical block extent covers */
        __le16 ee_len;      /* number of blocks covered by extent */
        __le16 ee_start_hi; /* high 16 bits of physical block */
        __le32 ee_start_lo; /* low 32 bits of physical block */
};

/*
 * This is index on-disk structure.
 * It's used at all the levels except the bottom.
 */
struct ext4_extent_idx {
        __le32 ei_block;    /* index covers logical blocks from 'block' */
        __le32 ei_leaf_lo;  /* pointer to the physical block of the next *
                             * level. leaf or next index could be there */
        __le16 ei_leaf_hi;  /* high 16 bits of physical block */
        __u16 ei_unused;
};

/*
 * Each block (leaves and indexes), even inode-stored has header.
 */
struct ext4_extent_header {
        __le16 eh_magic;      /* probably will support different formats */
        __le16 eh_entries;    /* number of valid entries */
        __le16 eh_max;        /* capacity of store in entries */
        __le16 eh_depth;      /* has tree real underlying blocks? */
        __le32 eh_generation; /* generation of the tree */
};


One extent represents a run of contiguous physical blocks.

ee_start_hi / ee_start_lo
    start block index. Remember that ext4 uses 48 bits for the block index.
    sizeof(ee_start_hi) == 2, sizeof(ee_start_lo) == 4

ee_len
    length of the contiguous run, in blocks. Its type is __le16, but the top bit is used to mark uninitialized (preallocated) extents, so one extent covers at most 32768 blocks. With a 4KB block size, that is at most 4K * 32768 = 128MB.

eh_max
    maximum number of entries the current block can hold. If the number of entries would exceed this value, the extent/index node must be split.
    The kernel calculates it like this:

    size = (inode->i_sb->s_blocksize - sizeof(struct ext4_extent_header))
            / sizeof(struct ext4_extent);
    --- or ---
    size = sizeof(EXT4_I(inode)->i_data);
    size -= sizeof(struct ext4_extent_header);
    size /= sizeof(struct ext4_extent_idx);
    --- or ---
    size = sizeof(EXT4_I(inode)->i_data);
    size -= sizeof(struct ext4_extent_header);
    size /= sizeof(struct ext4_extent);
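
To make the field handling concrete, here is a minimal user-space sketch of decoding one extent. It assumes the fields were already converted to host byte order. The struct and function names (extent_raw, extent_pblock, extent_len) are made up for this example. The constant mirrors EXT_INIT_MAX_LEN from ext4_extents.h, where a length above 32768 marks an uninitialized extent.

```c
#include <stdint.h>

#define EXT_INIT_MAX_LEN (1U << 15)   /* 32768 */

/* Same field order as struct ext4_extent, host byte order assumed. */
struct extent_raw {
    uint32_t ee_block;
    uint16_t ee_len;
    uint16_t ee_start_hi;
    uint32_t ee_start_lo;
};

/* 48-bit physical start block: high 16 bits + low 32 bits. */
uint64_t extent_pblock(const struct extent_raw *ex)
{
    return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start_lo;
}

/* Number of blocks covered, with the "uninitialized" marker removed. */
uint32_t extent_len(const struct extent_raw *ex)
{
    if (ex->ee_len <= EXT_INIT_MAX_LEN)
        return ex->ee_len;
    return ex->ee_len - EXT_INIT_MAX_LEN;
}
```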



So, assuming a 4KB block size, the i_block array of ext4_inode alone can map
    worst case : there are no contiguous blocks at all (that is, each extent covers only one block, ee_len = 1)
        4K * 4 = 16KB
    best case : ee_len = 32768 for every extent
        128M * 4 = 512MB
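
The arithmetic above can be checked with a few lines of C. This is only a sketch of the numbers quoted in this post (12-byte header, 12-byte extent, 60-byte i_block, at most 32768 blocks per extent); the macro and function names are made up for this example.

```c
#include <stdint.h>

#define HEADER_SIZE    12u     /* sizeof(struct ext4_extent_header) */
#define EXTENT_SIZE    12u     /* sizeof(struct ext4_extent) */
#define I_BLOCK_SIZE   60u     /* sizeof(i_block) in the inode */
#define MAX_EXTENT_LEN 32768u  /* max blocks covered by one extent */

/* Extents that fit in the inode itself: (60 - 12) / 12 = 4 */
uint32_t extents_in_inode(void)
{
    return (I_BLOCK_SIZE - HEADER_SIZE) / EXTENT_SIZE;
}

/* Extents that fit in one full leaf block */
uint32_t extents_in_block(uint32_t block_size)
{
    return (block_size - HEADER_SIZE) / EXTENT_SIZE;
}

/* Worst case: every extent covers a single block */
uint64_t worst_case_bytes(uint32_t block_size)
{
    return (uint64_t)extents_in_inode() * block_size;
}

/* Best case: every extent covers MAX_EXTENT_LEN blocks */
uint64_t best_case_bytes(uint32_t block_size)
{
    return (uint64_t)extents_in_inode() * MAX_EXTENT_LEN * block_size;
}
```

With a 4KB block size this gives 4 extents in the inode, 340 extents per leaf block, 16KB in the worst case and 512MB in the best case.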


To overcome this limit, the extents are organized as a tree (the extent tree).
The layout of the ext4 extent tree is described in ols2007v2-pages-21-34.pdf from kernel.org
(http://kernel.org/doc/ols/2007/ols2007v2-pages-21-34.pdf)

This is how the limit above is overcome.


Only 60 bytes are available for the extents stored in the inode. But a leaf node uses one entire block, so after the first 12-byte header the rest of the block is an array of "struct ext4_extent".
The number of valid entries in the array is stored in header->eh_entries.
Therefore, the visiting algorithm is

struct ext4_extent_header *eh = (struct ext4_extent_header *)block;
struct ext4_extent         *e = (struct ext4_extent *)((char *)block + sizeof(*eh));
struct ext4_extent      *eend = e + le16_to_cpu(eh->eh_entries);
while (e < eend) {
        ... do something ..
        e++;
}


An index node block, like a leaf node, starts with the 12-byte header, followed by an array of "struct ext4_extent_idx".
Therefore, the visiting algorithm is

struct ext4_extent_header *eh = (struct ext4_extent_header *)block;
struct ext4_extent_idx    *ei = (struct ext4_extent_idx *)((char *)block + sizeof(*eh));
struct ext4_extent_idx *eiend = ei + le16_to_cpu(eh->eh_entries);
while (ei < eiend) {
        ... do something ..
        /* ((u64)le16_to_cpu(ei->ei_leaf_hi) << 32) | le32_to_cpu(ei->ei_leaf_lo) is the child block. */
        ei++;
}


Then, how can we tell whether a given extent block is a leaf node or an index node?

if "header->eh_depth > 0" then "index node"
else if "header->eh_depth == 0" then "leaf node"

So, the algorithm for walking the extent tree is

struct ext4_extent_header *eh = (struct ext4_extent_header *)block;
if (eh->eh_depth)
        do_index_node(eh, block);
else
        do_leaf_node(eh, block);
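
Putting the leaf and index cases together, a recursive walk could look like the sketch below. It is a user-space illustration, not kernel code. It assumes a little-endian host (so no byte swapping), a 4KB block size, initialized extents only, and a hypothetical in-memory image (struct mem_image, mem_read_block) standing in for real disk reads. The struct names are shortened copies of the kernel structures.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u

/* 12-byte on-disk structures, host byte order assumed */
struct ext_header { uint16_t eh_magic, eh_entries, eh_max, eh_depth; uint32_t eh_generation; };
struct ext_leaf   { uint32_t ee_block; uint16_t ee_len, ee_start_hi; uint32_t ee_start_lo; };
struct ext_idx    { uint32_t ei_block, ei_leaf_lo; uint16_t ei_leaf_hi, ei_unused; };

/* Hypothetical block reader: returns a pointer to physical block 'pblock' */
typedef const uint8_t *(*read_block_fn)(void *ctx, uint64_t pblock);

struct mem_image { const uint8_t *base; };

const uint8_t *mem_read_block(void *ctx, uint64_t pblock)
{
    const struct mem_image *img = ctx;
    return img->base + pblock * BLOCK_SIZE;
}

/* Returns the total number of data blocks mapped below 'node'. */
uint64_t walk_extent_tree(const uint8_t *node, read_block_fn rd, void *ctx)
{
    struct ext_header eh;
    uint64_t total = 0;

    memcpy(&eh, node, sizeof(eh));
    for (uint16_t i = 0; i < eh.eh_entries; i++) {
        const uint8_t *entry = node + sizeof(eh) + (size_t)i * sizeof(struct ext_leaf);

        if (eh.eh_depth > 0) {          /* index node: descend into child */
            struct ext_idx ix;
            memcpy(&ix, entry, sizeof(ix));
            uint64_t child = ((uint64_t)ix.ei_leaf_hi << 32) | ix.ei_leaf_lo;
            total += walk_extent_tree(rd(ctx, child), rd, ctx);
        } else {                        /* leaf node: real extents */
            struct ext_leaf ex;
            memcpy(&ex, entry, sizeof(ex));
            total += ex.ee_len;         /* assumes ee_len <= 32768 */
        }
    }
    return total;
}
```

The walk starts from the 60-byte i_block area of the inode, which has the same header + entries layout as a full block.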


Now let's look at the directory structure (assuming extents are used).
A directory is also a file, so it follows the basic structure described so far.
A file's blocks hold the file content, and a directory's blocks hold the directory content in the following format.

Structure across 2 blocks
-------------------------

|<------------------------- block 0 ------------------------->|<------------------ block 1--------------------->|
+----+-----------+--------+-----------------------------------+-----------+--------+----------------------------+
|... | struct DE | <name> |  padding                          | struct DE | <name> | ...                        |
+----+-----------+--------+-----------------------------------+-----------+--------+----------------------------+
|... | rec_len = 8 + name + 4 byte align + "len of padding"   |           same as normal (see above)            |
+----+-----------+--------+-----------------------------------+-------------------------------------------------+

< NOTE : '8' is sizeof(struct ext4_dir_entry_2) excluding the 'name' field. >

In most cases, struct DE means struct ext4_dir_entry_2.


#define EXT4_NAME_LEN 255

struct ext4_dir_entry_2 {
        __le32 inode;                 /* Inode number */
        __le16 rec_len;               /* Directory entry length */
        __u8 name_len;                /* Name length */
        __u8 file_type;
        char name[EXT4_NAME_LEN];     /* File name */
};


As you can see above, the maximum length of a file name in ext4 is 255.

A directory block stores ext4_dir_entry_2 structures back to back, so adding 'rec_len' to the address of the current DE (directory entry) gives the next DE.
Therefore, the rec_len of the last DE in a block always points to "the end of the current block / the beginning of the next block".

One important thing is the meaning of the 'inode' value: 0 means an invalid directory entry.
This shows up in the delete algorithm (for details : ext4_delete_entry() at ext4/namei.c from kernel)

< Case : Entry in the middle of block - DE1 - is deleted >
+-----+-----+-----+-----+       +-----+-----------+-----+
| ... | DE0 | DE1 | ... |  ==>  | ... |    DE0    | ... |
+-----+-----+-----+-----+       +-----+-----------+-----+
--> In this case, DE0->rec_len += DE1->rec_len.

< Case : The first entry of block - DE0 - is deleted >
+-----+-----+       +---------------+-----+
| DE0 | ... |  ==>  | DE0 <invalid> | ... |
+-----+-----+       +---------------+-----+
--> DE0 is marked as invalid by the "DE0->inode = 0" statement.

In summary, a rough algorithm for reading a directory block is
(For details : ext4_readdir() at ext4/dir.c from kernel)

struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)block;
struct ext4_dir_entry_2 *deend = (struct ext4_dir_entry_2 *)((char *)block + block_size); /* block size is usually 4KB */
while (de < deend) {
        if (de->inode) { /* skip invalid (inode == 0) entries */
                ... do something ...
        }
        de = (struct ext4_dir_entry_2 *)((char *)de + le16_to_cpu(de->rec_len));
}
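
Here is a self-contained user-space sketch of the same loop, working on a raw block buffer. It assumes a little-endian host and the simple rec_len encoding (no special handling for very large block sizes). The names dirent_hdr, count_dir_entries and put_dir_entry are made up for this example, and put_dir_entry exists only to build a test block. The sanity check on rec_len is my addition so that a corrupted block cannot loop forever.

```c
#include <stdint.h>
#include <string.h>

/* Fixed 8-byte part of struct ext4_dir_entry_2 (name excluded) */
struct dirent_hdr {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Count the valid (inode != 0) entries of one directory block. */
unsigned count_dir_entries(const uint8_t *block, uint32_t block_size)
{
    unsigned n = 0;
    uint32_t off = 0;

    while (off + sizeof(struct dirent_hdr) <= block_size) {
        struct dirent_hdr de;

        memcpy(&de, block + off, sizeof(de));
        if (de.rec_len < sizeof(de) || off + de.rec_len > block_size)
            break;              /* corrupted record: stop */
        if (de.inode != 0)      /* inode 0 marks an invalid entry */
            n++;
        off += de.rec_len;      /* jump to the next entry */
    }
    return n;
}

/* Test helper: write one entry at 'off', return the next offset. */
uint32_t put_dir_entry(uint8_t *block, uint32_t off, uint32_t inode,
                       const char *name, uint16_t rec_len)
{
    struct dirent_hdr de = { inode, rec_len, (uint8_t)strlen(name), 0 };

    memcpy(block + off, &de, sizeof(de));
    memcpy(block + off + sizeof(de), name, de.name_len);
    return off + rec_len;
}
```

Note how the last entry gets a rec_len that reaches the end of the block, exactly as described above.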


... to be continued...

The kernel build system defines KBUILD_MODNAME automatically from each Makefile.
You can easily find the string below in the build command of each source file of a module.

-D"KBUILD_MODNAME=KBUILD_STR(xxxxx)"
('xxxxx' is the module name)

This name xxxxx comes from the Makefile.
Below is a simple example.

[ in Makefile ]
xxxxx-y := a.o b.o c.o d.o

In the above case, KBUILD_MODNAME of a.c, b.c, c.c and d.c becomes xxxxx.
This can be easily confirmed by checking the command line used to build each object - a.o, b.o, c.o and d.o.
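
To see what the -D option does, the sketch below reproduces it in plain user-space C. It assumes KBUILD_STR(s) simply stringifies its argument; here both macros are defined in the source as a fallback, while in a real kernel build they come from the compiler command line. The function names are made up for this example.

```c
/*
 * In a kernel build these come from the command line, e.g.
 *   -D"KBUILD_MODNAME=KBUILD_STR(xxxxx)"
 * The fallback definitions below are assumptions for this demo only.
 */
#ifndef KBUILD_STR
#define KBUILD_STR(s) #s
#endif
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME KBUILD_STR(xxxxx)
#endif

/* KBUILD_MODNAME expands to a string literal ... */
const char *module_name(void)
{
    return KBUILD_MODNAME;
}

/* ... so it can be glued to other literals, e.g. as a log prefix. */
const char *module_log_prefix(void)
{
    return KBUILD_MODNAME ": ";
}
```

Because the value is an ordinary string literal, every source file that is part of module xxxxx sees the same name at compile time.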

In Linux, a kernel crash usually produces a so-called Oops report.
This Oops report includes lots of useful information for debugging.

Even so, I have seen lots of developers struggle to analyze the register, memory and stack dumps in the report.
To analyze the dumped information, it has to be matched against the source code.
But this is not an easy process.
So, I want to introduce a way to do this easily.

The main idea is to write a tool that parses the Oops report and passes the result to a debugging tool.
Here is my case.
I use TRACE32 software (ARM simulator) as the debugging tool.
What I did was implement a simple Perl script that parses the Oops report and generates a cmm file that sets the register and memory contents given by the report.
For example, the auto-generated cmm file looks like this.

R.S cpsr 0x20000013
R.S r0 0x0
R.S r1 0x0
...
D.S 0xc035a248 %long 0xe3a02000
D.S 0xc035a24c %long 0xe3a03020
...

Now it's time to use TRACE32 PowerView for ARM to analyze the report.
Launch t32marm in simulator mode -> load the 'vmlinux' that produced the Oops -> run the auto-generated cmm script
Finally, I can see the stack frames with local variables and the interpreted memory contents by virtue of T32.

I'm sorry that I can't share the parsing tool - the Perl script - due to ... as you know, something like legal issues... (I hate this.)
I hope this post is helpful for others...
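
The original Perl script can't be shared, but the output side is simple enough to sketch. The C functions below are NOT that script; they only show how the two kinds of cmm lines seen in the example above (R.S for registers, D.S ... %long for memory words) could be produced once the values have been parsed from the Oops report. The function names are made up for this example.

```c
#include <stdio.h>
#include <stdint.h>

/* "R.S <reg> 0x<value>" : set one register */
int cmm_set_register(char *out, size_t n, const char *reg, uint32_t value)
{
    return snprintf(out, n, "R.S %s 0x%x", reg, (unsigned)value);
}

/* "D.S 0x<addr> %long 0x<value>" : set one 32-bit memory word */
int cmm_set_memory(char *out, size_t n, uint32_t addr, uint32_t value)
{
    return snprintf(out, n, "D.S 0x%08x %%long 0x%08x",
                    (unsigned)addr, (unsigned)value);
}
```

Writing one such line per register and per dumped memory word gives a script that restores the crash state inside the simulator.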

In an SMP system, to save power, not all cpus are active when the system is not heavily loaded.
But let's imagine this case.
The system is not heavily loaded - only cpu0 is active - but at some moment IRQs are raised very often within a short time.
In this case, some of the raised IRQs may be dropped.

Let's assume an IC misbehaves if an interrupt it raised is not handled by the MPU, and the system is in the state above.
Then, the system works abnormally when it is not loaded, but works well when it is loaded.
(This is the opposite of the usual case. In most cases, a system misbehaves when it is loaded.)
The default IRQ affinity usually allows all cores - an IRQ can be handled by any cpu.
And more than one cpu is active when the system is loaded. So, there is less chance for an IRQ to be dropped - there are several cpus to handle it!

This case shows a very interesting kind of system issue!

The kernel has sleep functions - ex. msleep() - and schedule_timeout() is used inside them.
schedule_timeout() calculates the expiration time based on the current jiffies.
But some functions disable local irq (or irqs in general) for a while.
For example, printk does this - disabling local irq.
However, jiffies is updated by the local timer irq (in case of SMP, one designated core - usually cpu0 - is used).
So, disabling local irq on that timer core may prevent the system from updating jiffies.
Calling schedule_timeout() at this moment sets a timer whose expiry time is earlier than it should be, because jiffies is behind real time.
So, for example, msleep(100) may wake up after 50ms in this scenario.
This is very dangerous.
Moreover, lots of kernel code uses jiffies directly.
So, I'm not sure that all that code is safe when jiffies is behind.
Anyway, here is my sample fix for this potential issue of schedule_timeout().
I couldn't find any fix for this issue even in kernel-3.0.
So, I may have missed something.
But, as for me, this is definitely an issue to consider when the variable jiffies is used directly.
I hope that my sample fix for schedule_timeout() is helpful for others.
(In my opinion, fundamentally, all code that uses jiffies directly should be re-examined to see whether it is really safe when jiffies is behind.)
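
The arithmetic behind the msleep(100) example can be modeled in a few lines of user-space C. This is only a model of the calculation, not kernel code. It assumes HZ = 1000 so that one tick equals 1ms, and the function name is made up for this example.

```c
/*
 * All values are in ticks (jiffies).
 * stale_jiffies : the value the 'jiffies' variable holds right now
 * real_jiffies  : the value it would hold if no tick update had been missed
 * Returns how many ticks really pass before the timer fires.
 */
long effective_sleep_ticks(long timeout, unsigned long stale_jiffies,
                           unsigned long real_jiffies)
{
    /* what schedule_timeout() computes: expire = timeout + jiffies */
    unsigned long expire = (unsigned long)timeout + stale_jiffies;

    return (long)(expire - real_jiffies);
}
```

If jiffies lags 50 ticks behind, a 100-tick timeout fires after only 50 real ticks. With an up-to-date value the full 100 ticks pass.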

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 6811f4b..6c89958 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -290,6 +290,8 @@ extern unsigned long preset_lpj;

 #endif

+extern unsigned long exact_jiffies(void);
+
 /*
  * Convert various time units to each other:
  */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 56f87fa..37e7bbf 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -99,6 +99,32 @@ static ktime_t tick_init_jiffy_update(void)
        return period;
 }

+/**
+ * exact_jiffies - return real jiffies value
+ *
+ * You can't sure that value from %jiffies varaible is real current time.
+ * Jiffies may not be updated for a while due to several reasones.
+ * So, to get exact value, current ktime and %last_jiffies_update should be used
+ */
+unsigned long exact_jiffies(void)
+{
+       unsigned long exact = jiffies;
+       if (tick_period.tv64
+           && last_jiffies_update.tv64) {
+               unsigned long seq, ticks;
+               ktime_t delta;
+               do {
+                       seq = read_seqbegin(&xtime_lock);
+                       delta = ktime_sub(ktime_get(), last_jiffies_update);
+                       ticks = ktime_divns(delta, ktime_to_ns(tick_period));
+                       /* +1 to compensate loss at division */
+                       exact = jiffies + ticks + 1;
+               } while (read_seqretry(&xtime_lock, seq));
+       }
+       return exact;
+}
+EXPORT_SYMBOL_GPL(exact_jiffies);
+
 /*
  * NOHZ - aka dynamic tick functionality
  */
diff --git a/kernel/timer.c b/kernel/timer.c
index c4714a6..628a714 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1304,7 +1304,6 @@ void run_local_timers(void)
  * without sampling the sequence number in xtime_lock.
  * jiffies is defined in the linker script...
  */
-
 void do_timer(unsigned long ticks)
 {
        jiffies_64 += ticks;
@@ -1457,7 +1456,7 @@ signed long __sched schedule_timeout(signed long timeout)
                }
        }

-       expire = timeout + jiffies;
+       expire = timeout + exact_jiffies();

        setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
        __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
@@ -1467,6 +1466,13 @@ signed long __sched schedule_timeout(signed long timeout)
        /* Remove the timer from the object tracker */
        destroy_timer_on_stack(&timer);

+       /*
+        * Reaching here means "timer interrupt is issued
+        *   and 'jiffies' is updated."
+        * So, 'jiffies' here is recently-updated-value
+        *   and 'jiffies' can be directly used
+        *   instead of using 'exact_jiffies()'
+        */
        timeout = expire - jiffies;

  out:

done.

To use sysfs, enable kernel config : CONFIG_GPIO_SYSFS
Core file related to this : gpiolib.c in the kernel
sysfs nodes can be found at /sys/class/gpio

[ Prerequisite ]
Writing a value to '/sys/class/gpio/export' calls the 'gpio_request()' function of gpiolib in the kernel.
So, if the given gpio is already requested by another owner, exporting fails.
Therefore, a gpio must not be owned by anyone in the kernel if it is to be controlled from user space.

[ How to use at user space ]
Assumption : current working directory is '/sys/class/gpio'

#> echo [gpio num] > export
=> exports the given gpio to sysfs. A 'gpioN' link is newly created if the gpio number is valid.

#> cd gpioN
=> Then, several nodes (active_low, direction, value etc) can be found.
Writing or reading these nodes is a useful way to debug/control a gpio from user space.
(attribute functions for each node are defined at gpiolib.c)

[ Detail Example ]

* at /sys/class/gpio
export, unexport : gpio number
#> echo 86 > export
#> echo 86 > unexport

* at /sys/class/gpio/gpioN
direction : in, out, low, high
value : 0, 1
edge : none, rising, falling, both
active_low : 0, 1
#> echo in > direction
#> echo 1 > value
#> echo both > edge

< See gpio.txt of Kernel document for details about each sysfs node. >
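
The same steps can be done from a C program instead of the shell. The sketch below is a minimal example under a few assumptions: the helper names (sysfs_write, gpio_export_input) are made up, the base directory is passed in so that it is normally "/sys/class/gpio", and error handling is reduced to a return code.

```c
#include <stdio.h>

/* Write 'value' to the sysfs node at 'path'. Returns 0 on success, -1 on error. */
int sysfs_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)     /* the write may only fail at close time */
        ok = 0;
    return ok ? 0 : -1;
}

/* Export gpio 'num' and set it as input. 'base' is normally "/sys/class/gpio". */
int gpio_export_input(const char *base, int num)
{
    char path[128];
    char buf[16];

    snprintf(path, sizeof(path), "%s/export", base);
    snprintf(buf, sizeof(buf), "%d", num);
    if (sysfs_write(path, buf) != 0)
        return -1;          /* e.g. gpio already owned inside the kernel */

    snprintf(path, sizeof(path), "%s/gpio%d/direction", base, num);
    return sysfs_write(path, "in");
}
```

Calling gpio_export_input("/sys/class/gpio", 86) corresponds to "echo 86 > export" followed by "echo in > direction" in gpio86.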

The Linux kernel uses lots of sections to modularize its code structure easily.
For kernel modules and parameters, the following sections are used.

[ Module ]
macro   : module_init()
section : device_initcall (section name : .initcall6.init) <see init.h>
[ Parameter ]
macro   : module_param_named(), module_param(), core_param() etc.
section : __param <see moduleparam.h>

The core_param() macro works like other module parameters. But it is NOT for a module BUT for a kernel boot parameter.
A kernel parameter doesn't have any prefix, unlike module parameters.
A module parameter's name has <module name> as its prefix.
For example, parameter <param> of module <module> has the name <module>.<param>
See the source code for details.

KBUILD_MODNAME is the preprocessor value for the module name.
It is defined through complex script processing. See Makefile.lib and Makefile.mod* in scripts/.
But in most cases, you can guess it. That is, the name that looks like the module name is set as KBUILD_MODNAME.
(ex. <mod name>-y, <mod name>-m in Makefile)
So, usually, deep analysis of the above scripts is not required.

Note:
It's possible that one object gets linked into more than one module.
In that case KBUILD_MODNAME will be set to foo_bar, where foo and bar are the names of the modules.

Module and parameter initialization.

[ Built-in module ]
module initialization : do_initcalls() <see 'main.c'> => initcall section is used
parameter sysfs node  : param_sysfs_init() -> param_sysfs_builtin() => __param section is used
[ Dynamic module ]
module & parameter initialization is done in the system call
SYSCALL_DEFINE3(init_module, ...) => load_module()

Each parameter has a permission mask, so the parameter value can be read or written at runtime through its sysfs node.

/sys/module/<module name>/parameters/<parameter name>

But a module parameter whose permission is 0 is ignored by sysfs (not shown in sysfs).
( See module_param_sysfs_setup(...) function. )
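
The two naming rules (boot parameter name and sysfs node path) can be sketched as simple string formatting. The function names below are made up for this example, and the module/parameter pair is passed in by the caller.

```c
#include <stdio.h>

/* sysfs node of a module parameter: /sys/module/<module>/parameters/<param> */
int param_sysfs_path(char *out, size_t n, const char *module, const char *param)
{
    return snprintf(out, n, "/sys/module/%s/parameters/%s", module, param);
}

/* boot command line form of a module parameter: <module>.<param>=<value> */
int param_boot_arg(char *out, size_t n, const char *module,
                   const char *param, const char *value)
{
    return snprintf(out, n, "%s.%s=%s", module, param, value);
}
```

For example, the parameter 'time' of 'printk' appears as printk.time=1 on the kernel command line and as /sys/module/printk/parameters/time in sysfs.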

⚫ Sending kobject uevent to user space from kernel
    ⚬ Concept
        • Execute user-space binary with predefined + customized environment variables.
        • Send message to pre-defined netlink socket unit created by kernel.

    ⚬ From source code (hotplug)
        • kernel node [kernel/sysctl.c]
        : /proc/sys/kernel/hotplug

        • kobject_uevent_env() [kobject_uevent.c]
            default user process executable image path
                -> CONFIG_UEVENT_HELPER_PATH
        • kernel/sysctl.c
            'uevent_helper' value can be RWed through 'hotplug' node of sysctl.

    ⚬ From source code (netlink)
        • kobject_uevent_env() [kobject_uevent.c]
            : #if defined(CONFIG_NET)
            write message to netlink socket unit (NETLINK_KOBJECT_UEVENT)
        • netlink.h
            netlink units
        • Creating in userspace
            socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT)
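
A minimal user-space sketch of the netlink side is shown below. It assumes the usual uevent message layout, a sequence of NUL-terminated strings ("action@devpath" followed by KEY=value pairs), and that the kernel broadcasts to netlink multicast group 1. The function names uevent_open and uevent_get are made up for this example.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Open a socket that receives kernel uevents. Returns the fd, or -1 on error. */
int uevent_open(void)
{
    struct sockaddr_nl addr;
    int fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = 1;                 /* assumed: kernel uevent group */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Find "KEY=value" in a received message and return a pointer to value, or NULL. */
const char *uevent_get(const char *msg, size_t len, const char *key)
{
    size_t klen = strlen(key);
    const char *p = msg;
    const char *end = msg + len;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        size_t slen;

        if (!nul)
            break;                      /* unterminated tail: ignore it */
        slen = (size_t)(nul - p);
        if (slen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=')
            return p + klen + 1;
        p = nul + 1;                    /* next string */
    }
    return NULL;
}
```

A receiver would call recv() on the fd returned by uevent_open() and pass the received buffer and its length to uevent_get(), e.g. to read ACTION or SUBSYSTEM.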
⚬ Concept
    - Creating Kernel Thread
    - Fork by calling 'do_fork' directly (because, this is inside kernel!)
    - Exec by calling 'do_execve'

⚬ Code walkthrough
    • kmod.h
        call_usermodehelper / call_usermodehelper_fns
            call_usermodehelper_setup(...)
            call_usermodehelper_setfns(...)
            call_usermodehelper_exec(...)

    • kmod.c
        call_usermodehelper_setup()
            -> setup subprocess infos.
                : path to execute, arguments, env variables etc
            -> set 'work' function of 'workqueue' to '__call_usermodehelper'
                : INIT_WORK(&sub_info->work, __call_usermodehelper);
        call_usermodehelper_setfns()
            -> setup subprocess functions
        call_usermodehelper_exec()
            -> queue this work to 'khelper_wq' workqueue.
        __call_usermodehelper() /* <- in work queue (khelper_wq) */
            pid = kernel_thread(____call_usermodehelper, sub_info, CLONE_VFORK | SIGCHLD);
        kernel_thread(...)
            : set pc to 'kernel_thread_helper' - assembly function
            pid = do_fork(flags|CLONE_VM|CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
        kernel_thread_helper
            : Shuffle the argument into the correct register before calling the thread function
            : set pc to given thread function (in this example, '____call_usermodehelper()' function.)
        ____call_usermodehelper()
            kernel_execve(sub_info->path, sub_info->argv, sub_info->envp);

    • sys_arm.c
        kernel_execve()
            do_execve(...)
            : make it to user space process!
