Preface
IPC (InterProcess Communication) mechanisms were introduced
in UNIX System V.2 by AT&T. They consist of:
- semaphores,
- shared memory,
- message queues.
All of the above are often called "System V IPC". We're interested in the shared memory
implementation. For details of use I recommend N. Matthew and R. Stones' book
"Beginning Linux Programming". Basically, the idea is quite simple: one program requests
a memory block and gets two things from the operating system: a pointer to that memory
and a special unique identifier. Next, every other process that wants to gain access to
the shared memory must use that unique identifier to "attach" the memory, and gets a pointer in
its own address space. Every process can use the shared memory exactly as if it had been
allocated by the malloc() function. If one process writes to the shared memory, the
changes are immediately visible to every other process having access to it. When work
with the shared memory is done, it has to be detached by all processes and deallocated
by exactly one. Shared memory is a very efficient way of passing data between many unrelated
processes, but it has some limits:
- it doesn't have its own synchronization method (thus programmers use semaphores,
or messages sent via pipes, to synchronize),
- the number of shared memory blocks and the total size of the blocks are limited.
Because of these limits (especially the limit on the number of shared memory blocks) we can't treat
a request for a new shmem block the same as a malloc() call. That's not all: we
must remember that every request for memory (no matter which type) is very
expensive. If we want to build an efficient, dynamic data structure, we have to take care of
appropriate management of one continuous data block.
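For reference, a minimal sketch of the System V calls described above (request, attach, use, detach, deallocate); the key and size values are only illustrative:
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Request one shared memory block; key and size are arbitrary here. */
    int shmid = shmget((key_t)0x1234, 4096, IPC_CREAT | 0666);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }

    /* Attach the block: the kernel maps it into this process's address space. */
    void *mem = shmat(shmid, NULL, 0);
    if (mem == (void *)-1) {
        perror("shmat");
        return 1;
    }

    strcpy((char *)mem, "hello");   /* use it just like malloc()'ed memory */

    shmdt(mem);                     /* every process detaches... */
    shmctl(shmid, IPC_RMID, NULL);  /* ...and exactly one deallocates */
    return 0;
}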
Let's characterize what we want from our data structure:
- it has to be multi-accessible. More precisely: we want to allow one process
to write to it and many (at least two) processes to read it at the same time,
- the writing process must have the fastest possible random access to every part of the data,
- the writing process must be able to delete or change existing portions of data and to add new
ones,
- adding, searching and deleting data must be as fast as possible,
- every "reading process" must be able to read the data in some logical order: from
the "beginning" to the "end",
- it has to be guaranteed that the "path" from the logical beginning of the memory to its logical "end"
is available at every moment,
- when a "reading process" finishes reading the memory, it has to check whether the amount of data it
read equals the amount stored in the memory (the writing process could have made some changes
while the "reading process" was working).
Multi-accessibility
Multi-accessibility to our data structure is provided by the
shared memory IPC mechanism. We must take into consideration that shared memory is
a special address range, created by IPC and appearing in each process's user space.
The point is that the pointers to the same shared memory
segment differ from process to process. So, if we want to "recover" the
original producer's pointer value (the reason is explained later), we
have to store it in the shared memory itself. Thus we define the SHMEM
header structure:
typedef struct shmem_header_t {
    void   *producer_mem_point; /* producer's own address of the segment start */
    void   *head;               /* pointer to the first used bucket (or NULL) */
    size_t  n_buckets;          /* total number of buckets in the segment */
    size_t  used_buckets;       /* number of buckets currently in use */
    size_t  bucket_size;        /* size of a single bucket in bytes */
} shmem_header_t;
A structure of this kind is placed at the "top" of every shared memory block. Because
the above type is known at compilation time, every "memory consumer" can cast its own shared
memory pointer to this type and gain access to all the information stored in the header.
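For illustration, a consumer's attach-and-cast could look roughly like this (the key value is only illustrative, not taken from the netraf sources):
#include <stdio.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

typedef struct shmem_header_t {           /* header defined above */
    void *producer_mem_point, *head;
    size_t n_buckets, used_buckets, bucket_size;
} shmem_header_t;

int main(void)
{
    /* Attach an existing segment read-only and read its header. */
    int shmid = shmget((key_t)0x1234, 0, 0666);
    shmem_header_t *hdr = (shmem_header_t *)shmat(shmid, NULL, SHM_RDONLY);
    if (hdr == (void *)-1) {
        perror("shmat");
        return 1;
    }
    printf("buckets: %zu used, %zu total, %zu bytes each\n",
           hdr->used_buckets, hdr->n_buckets, hdr->bucket_size);
    shmdt(hdr);
    return 0;
}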
Random access
Fast random access is achieved with a hash table.
Instead of storing the "real data", every hash-table bucket stores a pointer to the appropriate
place in shared memory. The hash table is created in the "producer's" private memory and is thus
accessible only to the producer.
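One possible shape of such a table is sketched below; the type and field names (and the collision chaining) are illustrative assumptions, not the actual netraf structures:
#include <stddef.h>

/* One entry of the producer-private hash table: it maps a key to the bucket's
 * location inside the shared memory segment, not to the data itself. */
typedef struct ht_entry_t {
    unsigned long      key;        /* key of the stored bucket */
    void              *shm_bucket; /* producer-side pointer into shared memory */
    struct ht_entry_t *next;       /* chaining for hash collisions */
} ht_entry_t;

typedef struct hash_table_t {
    size_t       size;             /* number of slots */
    ht_entry_t **slots;            /* array of chains, lives in private memory */
} hash_table_t;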
Modifying data
When the shared memory block is created (let's assume N buckets),
a corresponding cyclic list (built of N elements pointing to the free buckets in shared memory)
is created in the producer's private memory. Operating on the memory can then be described
as follows:
- adding a new bucket to memory:
  - a pointer to a free bucket is taken from the cyclic list's "walking head",
  - the "walking head" is moved to the next logical position,
  - the appropriate data is copied to shared memory,
  - every used shared memory bucket contains a pointer to the next used bucket (or NULL),
    so the appropriate pointers (in the newly allocated bucket, in the logically previous bucket
    and possibly the "head" pointer in the SHMEM header structure) are set,
  - using a hash function (we're using Phong's congruential linear hash implementation),
    a hash code is generated from the key of the bucket being added,
  - the pointer taken from the "free nodes list" (in step 1) is stored in the hash table,
- searching for a bucket (testing whether it exists):
  - a hash code is generated from the key of the bucket being searched for,
  - if that slot of the hash table isn't empty and the keys (the searched one and the one found
    in the hash table) are equal, the bucket exists,
- deleting a bucket:
  - check whether the bucket exists,
  - update the "pointer to the next used bucket" in the logically previous bucket in shmem,
  - store the freed bucket's pointer in the "free nodes list" element pointed to by the "walking tail",
  - move the "walking tail" to the next logical position.
As you can see, all of the above functions boil down to a few pointer operations and
hash-code calculations. In fact, the only real cost is the memory copy (when adding a new
bucket). A sketch of the "add" path is given below.
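This is a simplified sketch under the assumptions made so far; all names (free_node_t, bucket_hdr_t, ht_insert(), hash()) are illustrative, not the actual netraf code:
#include <stddef.h>
#include <string.h>

typedef struct shmem_header_t {       /* header defined earlier in the text */
    void *producer_mem_point, *head;
    size_t n_buckets, used_buckets, bucket_size;
} shmem_header_t;

typedef struct free_node_t {          /* element of the producer-private cyclic free list */
    void               *bucket;       /* free bucket inside the shared segment */
    struct free_node_t *next;
} free_node_t;

typedef struct bucket_hdr_t {         /* stored at the start of every shmem bucket */
    unsigned long key;
    void         *next_used;          /* pointer to the next used bucket, or NULL */
} bucket_hdr_t;

static free_node_t  *walking_head;    /* cyclic free list, producer-private */
static bucket_hdr_t *last_used;       /* logically previous (last added) bucket */

extern shmem_header_t *hdr;                         /* attached segment */
extern unsigned long   hash(unsigned long key);     /* e.g. the hash mentioned above */
extern void ht_insert(unsigned long h, unsigned long key, void *bucket);

void shmem_add(unsigned long key, const void *data, size_t len)
{
    /* 1. take a free bucket and advance the walking head */
    bucket_hdr_t *b = walking_head->bucket;
    walking_head = walking_head->next;

    /* 2. copy the data just after the per-bucket header
       (len must fit into bucket_size minus the header) */
    b->key = key;
    b->next_used = NULL;
    memcpy(b + 1, data, len);

    /* 3. link the bucket into the used list (and the SHMEM header if it is the first one) */
    if (last_used)
        last_used->next_used = b;
    else
        hdr->head = b;
    last_used = b;
    hdr->used_buckets++;

    /* 4. register it in the producer-private hash table */
    ht_insert(hash(key), key, b);
}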
Reading data
As mentioned above, every used shared memory bucket has a pointer
to the logically next bucket. Every "consumer" that connects to the shared memory can read
the "head" pointer (to the first used bucket) and then walk the whole list until NULL... But a little
nuance is hiding here. When we were talking about multi-accessibility we said
that the pointer to the beginning of the shared memory block can have different values in different
processes.
Let's assume that we have two buckets in shared memory. The first bucket
has a pointer called "next" pointing to the second bucket. We have two independent
processes which hold pointers to the first bucket. The first process is the "writer" and it created both
buckets (in particular, it set the value of the "next" pointer in the first bucket). The second process is
the "reader" and it is trying to read the buckets' data from shared memory. If it tries to read the first
bucket, everything will be OK. But then it tries to read data from the place pointed to by the "next"
pointer of the first bucket and it gets a segmentation fault. This is because both processes
have different address spaces assigned by the system. The way for the "reader" process to obtain correct
pointers is to compute the difference between the "writer's" and the "reader's" pointers to the first
bucket and then add that difference to every pointer value. That's why the
"producer_mem_point" pointer is stored in the SHMEM header.
Error immunity
Taking into consideration that our data structure is multi-accessible
(meaning that many unrelated processes can operate on one portion of data at the same time), we
have to prevent deadlocks, races and starvation. Normally we would use semaphores or
some other synchronization technique, but in this case it isn't necessary (or rather it is plainly
unwanted). Let's review how our shared memory is used:
- one memory producer with permission to write (holding all the "helper" structures described
above), doing random memory modifications,
- many (in the netraf project only two) readers, "walking" through the list.
Please notice that the efficiency of the
netrafd daemon
strongly depends on the random access speed of the shared memory structure. If we used a semaphore
for synchronization, it could happen that one of the "reading processes" acquires the semaphore
and starts reading. During that time the system scheduler can assign CPU time to the "writing process".
If there were some ethernet packets waiting for processing, the "writing process" would try to acquire
the semaphore and would hang. What's more, there could be more "reading processes" waiting, so during
that time the "writing process" would lose many packets (I'm omitting the starvation situation: the IPC semaphore
implementation assures that the processes waiting in the queue for the semaphore are chosen
randomly). That is unacceptable. Our shared data structure must allow the "writing process" to do its
job immediately in every time quantum. This entails the following problem: what happens if, during
reading, the "writing process" changes some data (maybe several times)? After a moment of thought,
we can identify the following situations:
- just after reading, there could be more data buckets in memory than the "reading process" read,
- there could be fewer data buckets in memory than the "reading process" read,
- the number of buckets read and the number stored in the shared memory structure are equal, but the data
differs.
Solving the above problems isn't easy in the general case. But in our case it is simple: we
must remember that the "reading processes" restart their work periodically (in particular
netrafg restarts the reading process from scratch
about 10 times per second). So... we've decided that we can put the "used_buckets" variable
into the SHMEM header structure and apply a comparison
rule to every reader: depending on the reader's application (some readers read
often, such as netrafg, and some of them read the
memory relatively rarely, such as netrafd)
there is some "epsilon" value. If the absolute difference between the number of buckets read and the
"used_buckets" value is greater than epsilon, all the buckets read
are treated as "false" and the reading is restarted immediately.
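A sketch of that comparison rule on the reader's side might look like this (the EPSILON value and the helper names are illustrative assumptions, not the actual netraf code):
#include <stddef.h>

typedef struct shmem_header_t {       /* header defined earlier in the text */
    void *producer_mem_point, *head;
    size_t n_buckets, used_buckets, bucket_size;
} shmem_header_t;

#define EPSILON 2   /* tolerated drift; chosen per reader (netrafg vs netrafd) */

extern size_t read_all_buckets(shmem_header_t *hdr);  /* walks the list, returns buckets read */

void reader_run(shmem_header_t *hdr)
{
    for (;;) {
        size_t n_read = read_all_buckets(hdr);
        size_t stored = hdr->used_buckets;
        size_t diff = n_read > stored ? n_read - stored : stored - n_read;

        if (diff > EPSILON)
            continue;          /* too much changed under us: discard and re-read now */

        /* ... hand the data over and wait for the next periodic run (~10 times/s) ... */
        break;
    }
}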
M.S.