netraf
A Network Analyzer and Traffic Logger.


Shared Memory Model


Preface

    IPC (InterProcess Communication) mechanisms were introduced in UNIX System V.2 by AT&T. They consist of:

  • semaphores,
  • shared memory,
  • message queues,

All of the above are often called "System V IPC". We're interested in the shared memory implementation. For details of use I recommend N. Matthew and R. Stones' book, "Beginning Linux Programming". Basically, the idea is quite simple: one program requests a memory block and gets two things from the operating system: a pointer to that memory and a special unique identifier. Next, every other process that wants to gain access to the shared memory must use that unique identifier to "attach" the memory, and gets a pointer in its own address space. Every process can then use the shared memory exactly as if it had been allocated by the malloc() function. If one process writes to the shared memory, the changes are immediately visible to every other process having access to it. When work with the shared memory is done, it has to be detached by all processes and deallocated by exactly one. Shared memory is a very efficient way of passing data between many unrelated processes (a minimal usage sketch follows the list below), but it has some limits:

  • it doesn't have its own synchronization method (thus programmers use semaphores, or messages sent via pipes, to synchronize),
  • the number of shared memory blocks and the total size of the blocks are limited,
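
To make the above mechanism concrete, here is a minimal, hedged sketch of the System V shared memory calls involved. The key, size and permission values are arbitrary examples for illustration, not netraf's actual ones.

/* minimal System V shared memory round trip; illustrative values only */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
   /* the "producer" requests a block and gets its unique identifier */
   int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
   if (shmid == -1) { perror("shmget"); return EXIT_FAILURE; }

   /* any process knowing shmid can attach the block and gets
      a pointer valid in its own address space */
   void *mem = shmat(shmid, NULL, 0);
   if (mem == (void *)-1) { perror("shmat"); return EXIT_FAILURE; }

   /* ... use mem exactly as if it came from malloc() ... */

   shmdt(mem);                     /* detach (done by every process)   */
   shmctl(shmid, IPC_RMID, NULL);  /* deallocate (done by exactly one) */
   return EXIT_SUCCESS;
}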

Because of the above limits (especially the limit on the number of shared memory blocks), we can't treat a request for a new shmem block the same as a malloc() call. That's not all: we must remember that every request for memory (regardless of its type) is very expensive. If we want to build an efficient, dynamic data structure, we have to take care of appropriate management of one contiguous data block.

    Let's characterize what we want from our data structure:
  • it has to be multi-accessible. More precisely: we want to allow one process to write to it and many (at least two) processes to read it at the same time,
  • the writing process must have the fastest possible random access to every part of the data,
  • the writing process must be able to delete or change existing portions of data and add new ones,
  • adding, searching and deleting data must be as fast as possible,
  • every "reading process" must be able to read the data in some logical order: from the "beginning" to the "end",
  • it has to be guaranteed that the "path" from the logical beginning of the memory to the logical "end" is available at every moment,
  • when a "reading process" finishes reading the memory, it has to check whether the amount of data it read equals the amount of data stored in the memory (the writing process could have made some changes while the "reading process" was working),

Multi-accessibility

    Multi-accessibility to our data structure is provided by the shared memory IPC mechanism. We must take into consideration that shared memory is a special address range, created by IPC, which appears in each process's user space. The point is that the pointers to the same shared memory segment generally differ between processes. So, if we want to "recover" the original producer pointer value (the reason is explained later), we have to store it in the shared memory. Thus we define the SHMEM header structure:

/** header of shared memory */
typedef struct shmem_header_t{

   /** producer's base address of the segment; based on this pointer
   every reader can calculate the offset to its own mapped memory block */
   void *producer_mem_point;

   /** pointer to the first used bucket */
   void *head;

   /** number of buckets allocated in memory */
   size_t n_buckets;

   /** number of "used" buckets */
   size_t used_buckets;

   /** we don't know what kind of structures will be stored,
   thus we need their size */
   size_t bucket_size;

} shmem_header_t;

This kind of structure is placed at the "top" of every shared memory block. Because the above type is known at compile time, every "memory consumer" can cast its own shared memory pointer to this type and gain access to all the information stored in the header.
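
As a hedged illustration of that cast (the function name is ours; it assumes the shmem_header_t definition above is in scope and that "mem" is the pointer this process got back from shmat()):

#include <stdio.h>

/* read the header fields through the consumer's own mapping */
static void print_header_info(const void *mem)
{
   const shmem_header_t *hdr = (const shmem_header_t *)mem;
   printf("allocated buckets: %zu, used: %zu, bucket size: %zu bytes\n",
          hdr->n_buckets, hdr->used_buckets, hdr->bucket_size);
}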


Random access

    Fast random access is achieved with a hash table. Instead of storing the "real data", every hash table bucket stores a pointer to the appropriate place in shared memory. The hash table is created in the "producer's" private memory and is therefore accessible only to the producer.
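
A hypothetical shape of one slot of that producer-private hash table might look as follows (the names and the key type are our illustration; the real netraf structures may differ):

/* one hash table slot: no payload, only a key and a pointer
   into the shared memory block */
typedef struct hash_entry_t {
   unsigned long key;    /* e.g. a connection/host identifier     */
   void *shm_bucket;     /* bucket inside the shared memory block */
} hash_entry_t;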


Modifying data

    When the shared memory block is created (let's assume N buckets), a corresponding cyclic list (built of N elements pointing to the free buckets in shared memory) is created in the producer's private memory. Operating on the memory can therefore be described as follows:

  • adding a new bucket to memory:
    1. a pointer to a free bucket is taken from the cyclic list's "walking head",
    2. the "walking head" is moved to the next logical position,
    3. the appropriate data is copied to shared memory,
    4. every used shared memory bucket contains a pointer to the next used bucket (or NULL), so the appropriate pointers (in the newly allocated bucket, in the logically previous bucket and possibly the "head" pointer in the SHMEM header structure) are set,
    5. using the hash function (we're using Phong's linear congruential hash implementation), a hash code is generated from the key of the bucket being added,
    6. the pointer taken from the "free nodes list" (in step 1) is stored in the hash table,

  • searching for a bucket (testing whether it exists):
    1. a hash code is generated from the key of the bucket being searched for,
    2. if the hash table slot for that hash code isn't empty, and the keys (the searched one and the one found in the hash table) are equal, the bucket exists.

  • deleting a bucket:
    1. check whether the bucket exists,
    2. update the "pointer to next used bucket" in the logically previous bucket in shmem,
    3. store the freed bucket's pointer in the "free nodes list" element pointed to by the "walking tail",
    4. move the "walking tail" to the next logical position,

As you can see, all of the above operations boil down to some pointer operations and hash code calculations. In fact, the only real cost is copying memory (when adding a new bucket).
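
Below is an illustrative sketch of the producer-private cyclic "free nodes list" and its "walking head"/"walking tail" moves; the type and function names are ours, not necessarily netraf's.

/* cyclic list of free buckets kept in the producer's private memory */
typedef struct free_node_t {
   void *bucket;               /* a free bucket inside the shared memory block */
   struct free_node_t *next;   /* next node of the cyclic list                 */
} free_node_t;

typedef struct free_list_t {
   free_node_t *walking_head;  /* free buckets are taken from here */
   free_node_t *walking_tail;  /* freed buckets are returned here  */
} free_list_t;

/* steps 1-2 of "adding": take a free bucket, advance the walking head */
static void *take_free_bucket(free_list_t *fl)
{
   void *bucket = fl->walking_head->bucket;
   fl->walking_head = fl->walking_head->next;
   return bucket;
}

/* steps 3-4 of "deleting": store the freed bucket at the walking tail,
   then advance the walking tail */
static void return_free_bucket(free_list_t *fl, void *bucket)
{
   fl->walking_tail->bucket = bucket;
   fl->walking_tail = fl->walking_tail->next;
}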


Reading data

    As mentioned above, every used shared memory bucket has a pointer to the logically next bucket. Every "consumer" that connects to the shared memory can read the "head" pointer (to the first used bucket) and then traverse the whole list until NULL... But a little nuance is hiding here. When we were talking about multi-accessibility, we said that the pointer to the beginning of the shared memory block can have different values in individual processes.
    Let's assume that we have two buckets in shared memory. The first bucket has a pointer called "next" pointing to the second bucket. We have two independent processes which hold pointers to the first bucket. The first process is the "writer" and it created both buckets (in particular, it set the value of the "next" pointer in the first bucket). The second process is the "reader" and it tries to read the buckets' data from shared memory. If it reads the first bucket, everything is fine. But then it tries to read data from the place pointed to by the "next" pointer of the first bucket and gets a segmentation fault. This is because both processes have different address spaces assigned by the system. The way to obtain correct pointers in the "reader" process is to calculate the difference between the "writer's" and the "reader's" pointer values for the first bucket, and then add that difference to every pointer value. That's why we store the "producer_mem_point" pointer in the SHMEM header.
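
A hedged sketch of that translation (the function name is ours; it assumes the shmem_header_t definition shown earlier is in scope and that "local_base" is what this reader got back from shmat() for the same segment):

#include <stddef.h>

/* map a pointer written by the producer to this reader's address space */
static void *to_local(const shmem_header_t *hdr, void *local_base,
                      void *producer_ptr)
{
   if (producer_ptr == NULL)
      return NULL;
   ptrdiff_t diff = (char *)local_base - (char *)hdr->producer_mem_point;
   return (char *)producer_ptr + diff;
}

A reader can then walk the list by translating hdr->head and every "next" pointer it reads in the same way.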


Error immunity

    Taking into consideration that our data structure is multi-accessible (meaning that many unrelated processes can operate on one portion of data at the same time), we have to prevent deadlocks, races and starvation. Normally we would use semaphores or other synchronization techniques, but in this case it isn't necessary (or rather, it is plainly unwanted). Let's review how our shared memory is used:

  • one memory producer with permission to write (holding all the "helper" structures described above), performing random memory modifications,
  • many (in the netraf project only two) readers, "walking" through the list,

Please notice that the efficiency of the netrafd daemon strongly depends on the random access speed of the shared memory structure. If we were using a semaphore for synchronization, it could happen that one of the "reading processes" takes the semaphore and starts reading. During that time, the system's scheduler can assign CPU time to the "writing process". If there were some Ethernet packets waiting to be processed, the "writing process" would try to take the semaphore and would hang. What's more, there could be more "reading processes" waiting, so during that time the "writing process" would lose many packets (I'm omitting the starvation scenario: the IPC semaphore implementation assures that processes waiting in the queue for the semaphore are chosen randomly). That is unacceptable. Our shared data structure must allow the "writing process" to do its job immediately in every time quantum. This entails the following problem: what happens if, during reading, the "writing process" changes some data (maybe several times)? After a moment of thinking, we can specify the following situations:

  • just after reading, there could be more data buckets in memory than the "reading process" read,
  • there could be fewer data buckets in memory than the "reading process" read,
  • the number of buckets read and the number stored in the shared memory structure are equal, but the data differs,

    Solving the above problems isn't easy in the general case. But in our case it is simple; we must remember that the "reading processes" restart their work periodically (in particular, netrafg restarts its reading pass from scratch about 10 times per second). So... we've decided to put the "used_buckets" variable into the SHMEM header structure and apply a comparison rule in every reader: depending on the reader's purpose (some readers read often, such as netrafg; some of them read the memory relatively rarely, such as netrafd), there is some "epsilon" value. If the difference between the number of buckets read and the "used_buckets" value (in absolute value) is greater than epsilon, all buckets read are treated as "false" and the reading pass is restarted immediately.
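
A hedged sketch of that comparison rule (names are ours; it assumes the shmem_header_t definition shown earlier is in scope, "buckets_read" is what the current reading pass counted, and "epsilon" is chosen per reader):

#include <stddef.h>

/* returns nonzero if the reading pass must be thrown away and restarted */
static int must_restart(const shmem_header_t *hdr,
                        size_t buckets_read, size_t epsilon)
{
   size_t used = hdr->used_buckets;
   size_t diff = (buckets_read > used) ? buckets_read - used
                                       : used - buckets_read;
   return diff > epsilon;
}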

M.S.

back to "Theory" section


Copyright © 2005, M.K., T.J., M.S.
netraf is Open Source software, distributed under the terms of the New BSD License.
waldson.com activity involved