Autobiography
Nikita Danilov received PhD from Moscow Institute of Physics and Technology. His 1996 doctoral thesis was in Computer Science.
While a student he worked as a software developer for (now defunct) ACT Financial Systems Ltd. In this capacity he participated in developing portable GUI front-end for {OPENLimits} -- risk management and analysis system.
In 1994 he joined Server Ltd. to design and develop a series of large client-server systems for governments. These were usually built over databases (Informix or ORACLE) functioning as backends with front-ends written with the help of RAD tools and 4GLs. This definitely was not bleeding research edge, but in such systems value of modularization, reusable designs, and clean interfaces, as well as paramount importance of simplicity are very clear.
In 1997, still employed by Server Ltd., he worked as a consultant for Charles Schwab \& Co., Inc., Capital Market & Trading Group, Jersey City, NJ. His task was to develop high performance server for some proprietary trading protocol. This project had all sorts of interesting stuff from message parsing to load balancing in a thread pool.
At this time or slightly earlier he became interested in Linux kernel. In April 2000 Nikita joined Namesys to work on reiserfs file system
. Reiserfs designed from scratch file system for Linux that uses balanced trees to efficiently store all data and meta-data. He worked on maintenance and bug fixing for reiserfs v3 and then became involved into design and implementation of reiser4 -- next version of reiserfs. Reiser4, while probably among the most complex subsystems of the Linux kernel provides a lot of novel functionality that hopefully justifies its complexity.
Subject
Reiser4 - A Tree Based File System
Nikita Y. Danilov Naming System Venture (Namesys)
Reiser4 is a new version of the Reiserfs file system. Reiser4 is a complete re-write, but it shares the main architectural properties of Reiserfs.
The most unusual and striking design decision, shared by Reiserfs and Reiser4 is that single global balanced tree is used to store all file system data and meta-data. The balanced tree allows one to store and later retrieve pieces of information ( items ), each identified by some unique identifier ( key ).
Each file system object (file or directory) is represented as a set of items. Each item has a specific key and is stored in the balanced tree.
Keeping all file system in the tree has following advantages over traditional organization of UNIX file system: efficient handling of small and very small files, and support for large directories.
Reiser4 has some new features:
- plugin infrastructure: Reiser4 is organized as a set of modules (called plugins ) communicating through a well-defined set of internal interfaces.
- wandering logs: Reiser4 uses this novel technique to reduce the overhead associated with journalling.
- allocate on flush: blocks numbers are assigned to tree nodes (including unformatted nodes containing raw user data) not when node is allocated but at later time; this decreases fragmentation and produces a more optimal on-disk layout of the file system.
- scalability: Reiser4 is designed to scale up to many CPUs without much overhead.
There basically are two types of operations performed on a balanced tree: lookups, and modifications that update or insert item with given key.
Reiser4 implements a set of tree operations and a method to control their concurrency. We implemented bottom-to-top tree balancing with:
- per-node read-write locking
- prioritized locking with deadlock-detection protocol
- non-recursive balancing ( carry algorithm).
This talk will focus on the balancing algorithms, with some mention of other features.
|