An Example Design of a Distributed Bank Application

The objective is to design and implement a system that maintains consistency among cached copies of data items in several concurrently running processes on different processors. There are n clients and one server. The server maintains a database of accounts. Each record has a name, an account number, and a balance; each record will be referred to as an object of the database. For each object, the server keeps track of which client has that object and what that client is doing with it (reading or writing). For this project, each transaction does a write/update.

I assume that strict data coherence is demanded; the trade-off is performance. I will have one centralized server with a synchronized critical section that controls clients' access to the database. The advantage is that data will be kept up-to-date; the disadvantage is that there could be a bottleneck at the point of centralization. Also, the server will have to collect and maintain information, because we need to prevent two or more clients from interfering with each other (accessing the same object at the same time).

Clients ask the server for data records that they want to modify. One way to do this would be: before giving data to a client, the server locks the data so no one else can access it while it is being updated. This would ensure that the server's data is consistent at all times, but concurrency would be overly restricted: clients should be allowed to work on different objects at the same time. This can be implemented as follows: the server maintains a lock table to keep track of what clients are doing with what objects. Upon receiving a request, the server checks the lock table to see if anyone else has a write lock; if so, the client waits; otherwise the server grants this client a write lock and ships the data record.
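A minimal sketch of how the lock table might look in code (the class name `LockTable` and its methods are my own invention, not part of the design; the cell values 'W' and 'C' correspond to write locks and cache locks):

```python
import threading

class LockTable:
    """Tracks which client holds which kind of lock on which record.
    Cell values: None (no lock), 'W' (write lock), 'C' (cache lock)."""

    def __init__(self, n_clients, n_records):
        self.table = [[None] * n_records for _ in range(n_clients)]
        self.mutex = threading.Lock()   # serializes access to the table itself

    def try_write_lock(self, client, record):
        """Grant a write lock on `record` to `client` unless another
        client already holds one; return True on success."""
        with self.mutex:
            for row in self.table:
                if row[record] == 'W':
                    return False        # someone else is writing; client waits
            self.table[client][record] = 'W'
            return True

    def demote_to_cache_lock(self, client, record):
        """After the client writes back, remember that the record is
        still sitting in that client's cache."""
        with self.mutex:
            self.table[client][record] = 'C'

    def cache_holders(self, record):
        """Clients that should be told when `record` changes."""
        with self.mutex:
            return [c for c, row in enumerate(self.table) if row[record] == 'C']
```

A second client asking for a write lock on the same record is refused until the first client's write lock is demoted or released.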
When the client sends back the updated data, the server demotes the client's lock to a cache lock, then takes a lock of its own, updates its own data, and unlocks, ensuring the consistency of the database at the server level. Management of locks only needs to be done at the server level.

Once a client is given the data, and implicitly the permission to change it, another problem needs to be addressed: cache coherence. Each client has a cache, a local array smaller than the database (say one-tenth its size), in which it can store records retrieved from the server. A cache is coherent only if the data it contains is an exact replica of the data in the storage system that it replicates. A cache may lose coherence as a result of an update/write operation (the only type of transaction in this project). For each write operation performed by a client, the local cache is checked first; if the needed data record is there, it is updated in the cache and the block is transferred to the server. This is called write-through, and it ensures that the next time the data record is needed, we won't be working with a stale copy. The server will need to listen for messages from the client that say, "Here is new (updated) data"; upon receiving such a message, the server, as mentioned earlier, will lock its data while updating it.

We still need to address the following problem: suppose the server gets updated data from one client and updates its centralized database, but another client that needs that data already has it in its cache. That cached copy is now stale. To prevent this situation, every time the server updates its database, it could broadcast a message to all the clients, letting them know about the change.
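The client-side write-through step can be sketched in a few lines (this is an illustration only; the function name `write_record`, the `send_to_server` callback, and the message tuple are my own, not part of the design):

```python
def write_record(cache, record_id, new_balance, send_to_server):
    """Write-through: update the record in the local cache (if present)
    and always push the new value to the server, which locks its own
    copy while applying the update."""
    if record_id in cache:
        cache[record_id]['balance'] = new_balance   # keep the cached copy fresh
    send_to_server(('UPDATE', record_id, new_balance))
```

The point is that the server is contacted on every write, so its copy never lags behind the client's.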
However, many of these clients may not be interested in that record, so I would like to do the following: let the server check its lock table and inform only those clients that have cache locks on the modified record. (A "cache lock on object X" means that X is in the client's cache.)

Now there are two options: 1) write-update and 2) write-invalidate. In write-update, the server says, "Here is the new value," and the client updates its cache. In write-invalidate, the server says, "Your data's value is old/invalid," removes the client's cache lock, and the client deletes the data record from its cache. I will use write-update, because the record size is small (only three fields) and because, if a client soon wants that record again, its up-to-date value will already be in the cache, saving an extra access to the server.

With regard to cache replacement policy, I would like to implement LRU: when the cache is full, replace the record that was least recently used. To maintain the scheme described above, upon removing a record from its cache, the client sends a message to the server, so that the server can remove that client's cache lock on that object. The client is saying, "I don't need this item, so don't bother me if/when its value changes in the future." We have now ensured data consistency and cache coherence at both the clients and the server.

To summarize, the types of messages being passed in this system are:

client: "I want to write object X."
server: "Here is object X."
client: "Here is the new value of object X."
server: "X has been changed; update your cache."
client: "I don't need X anymore."

I will use different tags to differentiate between the different types of messages. There will be a client function executed by each client and a server function executed by the server. The server can be single-threaded or multi-threaded.
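The LRU cache with eviction notification might be sketched as follows, using an ordered dictionary so that the least recently used record is always at the front (the class name `ClientCache` and the `notify_server` callback are my own; the `('DONT_NEED', …)` message corresponds to "I don't need X anymore"):

```python
from collections import OrderedDict

class ClientCache:
    """LRU cache of records; on eviction, tells the server to drop this
    client's cache lock so the server stops sending write-updates for it."""

    def __init__(self, capacity, notify_server):
        self.capacity = capacity
        self.records = OrderedDict()          # least recently used first
        self.notify_server = notify_server    # hypothetical callback to the server

    def put(self, record_id, record):
        if record_id in self.records:
            self.records.move_to_end(record_id)
        elif len(self.records) >= self.capacity:
            victim, _ = self.records.popitem(last=False)  # evict the LRU record
            self.notify_server(('DONT_NEED', victim))     # "I don't need this item"
        self.records[record_id] = record

    def get(self, record_id):
        if record_id not in self.records:
            return None                       # miss: must ask the server
        self.records.move_to_end(record_id)   # mark as most recently used
        return self.records[record_id]

    def apply_update(self, record_id, record):
        """Write-update message from the server: refresh the cached copy."""
        if record_id in self.records:
            self.records[record_id] = record
```

Evicting through `notify_server` is what keeps the server's lock table honest: a record absent from the cache never carries a cache lock.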
If the server is single-threaded, it is easy to implement, easy to guarantee ordered entry to the critical section (using a queue), and easy to guarantee no starvation. However, concurrency is constrained: if two clients want to retrieve different objects at the same time, they should be allowed to. Therefore, I would like to have a multi-threaded server with one thread per client; each thread handles the requests of one client, as outlined below.

client i
--------
loop
    Decide whether to make a request during this time interval.
    Let R be a random number representing a record in the database.
    Generate request to write record R.
    Check cache for record R. Is it there?
        Yes - update cache, write through to server
        No  - send write-request to server for record R
              receive record R (and permission to write)
              Is cache full?
                  Yes - throw out a record (LRU policy), inform server to remove cache-lock
                  No  - put R in cache
              perform the transaction
              update cache, write through to server
    If there is a message from the server: "Here is new value of R", then update cache
end loop

server thread i
---------------
loop
    get message from client i
    if message is client i: "I want to write object R."
        write_lock on record R
        check lock table. Does another client have a write lock?
            No  - give write lock to client i by putting 'W' in the lock table (row i, column R);
                  send data to client i
            Yes - client i has to wait; recheck lock table
        write_unlock on record R
    else if message is client i: "Here is the new value of object R."
        write_lock on record R
        update record R; demote client i's write-lock to cache-lock
        send R to all clients with cache-lock on R
        write_unlock on record R
    else if message is client i: "I don't need R anymore."
        write_lock on record R
        remove client i's cache-lock on R
        write_unlock on record R
end loop

Note: The lock table is just a two-dimensional matrix that indicates which clients have write-locks or cache-locks on which records.
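One way the server-side loop above might look in code, with a single mutex playing the role of the write_lock around each case (a sketch only: the `Server` class, the `send` callback, and the message tags are my own inventions standing in for the tagged messages described earlier):

```python
import threading

# Hypothetical message tags for the three client-to-server message types.
WRITE_REQUEST, WRITE_BACK, DONT_NEED = 'WRITE_REQUEST', 'WRITE_BACK', 'DONT_NEED'

class Server:
    """Database plus lock state; `handle` is one iteration of a
    per-client server thread's loop, dispatching on the message tag."""

    def __init__(self, db, send):
        self.db = db                    # record id -> value
        self.writer = {}                # record id -> client holding the write lock
        self.cached_by = {}             # record id -> set of clients caching it
        self.mutex = threading.Lock()   # guards db, writer, and cached_by
        self.send = send                # send(client, message), hypothetical transport

    def handle(self, client, msg):
        tag, record, *payload = msg
        with self.mutex:
            if tag == WRITE_REQUEST:
                if record in self.writer:
                    return False        # another client is writing; caller retries
                self.writer[record] = client
                self.send(client, ('DATA', record, self.db[record]))
            elif tag == WRITE_BACK:
                self.db[record] = payload[0]
                del self.writer[record]                               # demote write lock
                self.cached_by.setdefault(record, set()).add(client)  # ...to cache lock
                for c in self.cached_by[record] - {client}:
                    self.send(c, ('UPDATE', record, self.db[record])) # write-update
            elif tag == DONT_NEED:
                self.cached_by.get(record, set()).discard(client)
            return True
```

Returning False on a held write lock corresponds to "client i has to wait; recheck lock table" in the pseudocode.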
When someone gets a write lock on record R, no one else can access record R, either in the database or in the lock table, until that write lock is released (unlocked). Also, we are guaranteed no deadlock, because any process that locks a record unlocks it before trying to lock a different record. One problem with my design is that there will be busy-waiting; in a real database system, we would want to further maximize concurrency. Also, a real database system would have to allow concurrent reads, whereas the only transactions in this project are writes.
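The busy-waiting could be removed by having a blocked client thread sleep on a condition variable until a write lock is released, instead of repeatedly rechecking the lock table (a sketch of that alternative, not part of the design above; the `WriteLocks` class is my own):

```python
import threading

class WriteLocks:
    """Per-record write locks where a blocked thread sleeps on a
    condition variable instead of busy-waiting on the lock table."""

    def __init__(self):
        self.held = set()                 # records currently write-locked
        self.cond = threading.Condition()

    def acquire(self, record):
        with self.cond:
            while record in self.held:    # wait() releases the mutex and
                self.cond.wait()          # sleeps until some lock is released
            self.held.add(record)

    def release(self, record):
        with self.cond:
            self.held.discard(record)
            self.cond.notify_all()        # wake threads waiting on any record
```

Waking every waiter on each release is wasteful with many records; one condition variable per record would be finer-grained, at the cost of more bookkeeping.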