In this assignment, you are expected to design and implement a simple ﬁle system (SFS) that can be mounted by the user under a directory in the users machine. This assignment along with the supporting ﬁles we will provide you will only be functional on Linux. Make sure you don’t waste time coding it on a macOS and trying to run it on Linux later.
2 Some speciﬁcations to simplify your life
The SFS introduces many limitations such as restricted ﬁlename lengths, no user concept, no protection among ﬁles, no support for concurrent access, etc. You could introduce additional restrictions in your design. However, such restrictions should be reasonable to not oversimplify the implementation and should be documented in your submission. Even with the said restrictions, the ﬁle system you are implementing is highly useable for embedded applications. The following are the restrictions you will put on your code (this will also help you in coding and simplify your implmentation) : • Max ﬁle name of 16 characters. • Max extension name of 3 characters. So in total your ﬁle name can have 20 characters (16 characters for name, 1 character for the ”.” period, and ﬁnally 3 characters for the extension such as ”txt”) • Single level directory, so there will not be subdirectories (only a single root directory) • You will implement an inode system • You will implement bit map to keep track of free data blocks and ﬁnd a free data block before increasing the size of a ﬁle • The size of one data block is 1024 bytes • The maximum number of data blocks on the disk is 1024
3 About the supporting ﬁles
We will provide you with supporting ﬁles that include the code structure of solving the assignment. You will edit the C ﬁles we give you. This is because will also provide you with test ﬁles to test your code. The test code will call functions inside your C ﬁle, so please don’t change the main function names. You are free to implement as many helper functions and other C ﬁles as you wish (just make sure you edit your Makeﬁle to account for your C ﬁles). In fact, through the test ﬁles, you will be able to get a sense of what your ﬁnal grade for this assignment will be!
3.1 Emulated disk
Your ﬁle system is implemented over an emulated disk system, which is provided to you. The following is a schematic that illustrates the overall concept of the mountable simple ﬁle system.
The gray colored modules in the above schematic are provided by the Linux OS. The blue colored modules will be given to you. You are expected to implement the yellow colored module.
The disk emulator given to you provides a constant-cost disk (CCdisk). This CCdisk is a ”virtual” disk space that is separate from your computer disk space. It can be considered as an array of sectors (blocks of ﬁxed size). You can randomly access any given sector for reading or writing. The CCdisk is implemented as a ﬁle on the actual ﬁle system. Therefore, the data you store in the CCdisk is persistent across program invocations.
To mimic the real disk, the CCdisk is divided into sectors of ﬁxed size. For example, we can split the space into 1024 byte sectors. The number of sectors times the size of a sector gives the total size of the disk. In addition to holding the actual ﬁle and directory data, we need to store auxiliary data (meta data) that describes the ﬁles and directories in the disk (refer to File attributes section in the lecture notes). The structure and number of bytes spent on meta data storage depends on the ﬁle system design, which is the concern in this assignment.
3.2 API for the assignment
As mentioned earlier, the test ﬁles we will give you will be calling your C ﬁle functions. This means your C ﬁle will be an API for the test ﬁles. We will give you the API function names and you will have to ﬁll them. The following are the API functions you will implement:
0 void mksfs(int fresh); // creates the file system 1 int sfs_getnextfilename(char *fname); // get the name of the next file in directory 2 int sfs_getfilesize(const char* path); // get the size of the given file 3 int sfs_fopen(char *name); // opens the given file 4 void sfs_fclose(int fileID); // closes the given file 5 void sfs_fwrite(int fileID, char *buf, int length); // write buf characters into disk 6 void sfs_fread(int fileID, char *buf, int length); // read characters from disk into buffer 7 void sfs_fseek(int fileID, int loc); // seek to the location from beginning 8 int sfs_remove(char *file);// removes a file from the filesystem Listing 1: Function names for your C code(you will implement this)
The following table descibes how these functions are supposed to behave
Function Explanation mksfs(int fresh) Formats the virtual disk implemented by the disk emulator and creates an instance of the simple ﬁle system on top of it. The mksfs() has a fresh ﬂag to signal that the ﬁle system should be created from scratch. If ﬂag is false, the ﬁle system is opened from the disk (i.e., we assume that a valid ﬁle system is already there in the ﬁle system). The support for persistence is important so you can reuse existing data or create a new ﬁle system. sfs getnextﬁlename(char *fname) Copies the name of the next ﬁle in the directory into fname and returns non zero if there is a new ﬁle. Once all the ﬁles have been returned, this function returns 0. So, you should be able to use this function to loop through the directory. In implementing this function, you need to ensure that the function remembers the current position in the directory at each call. Remember we have a single-level directory. sfs getﬁlesize(char *path) Returns the size of a given ﬁle. sfs fopen(char *name) Opens a ﬁle and returns the index that corresponds to the newly opened ﬁle in the ﬁle descriptor table. If the ﬁle does not exist, it creates a new ﬁle and sets its size to 0. If the ﬁle does exists, the ﬁle is opened in append mode (i.e., set the ﬁle pointer to the end of the ﬁle). sfs fclose(int ﬁleID) Closes a ﬁle. This means you will remove the entry from the open ﬁle descriptor table. sfs fwrite(int ﬁleID, char *buf, int length) Writes the given number of bytes of buﬀered data in buf into the open ﬁle, starting from the current ﬁle pointer. This in eﬀect could increase the size of the ﬁle by the given number of bytes (it may not increase the ﬁle size by the number of bytes written if the write pointer is located at a location other than the end of the ﬁle). sfs fread(int ﬁleID, char *buf, int length) Reads bytes of data into buf starting from the current ﬁle pointer. sfs fseek(int ﬁleID, int loc) Moves the read/write pointer (a single pointer in SFS) to the given location. sfs remove(char *ﬁle) Removes the ﬁle from the directory entry, releases the ﬁle allocation table entries and releases the data blocks used by the ﬁle, so that they can be used by new ﬁles in the future.
A ﬁle system is somewhat diﬀerent from other OS components because it maintains data structures in memory as well as disk. The disk data structures are important to manage the space in disk and allocate and de-allocate the disk space. Also, the disk data structures indicate where a ﬁle is allocated. This information is necessary to access the ﬁle. The following is what each function will be doing
4 About inode
On-disk data structures of the ﬁle system include a super block, the root directory, free block list, and inode table. The ﬁgure below shows a schematic of the on-disk organization of SFS.
Figure 1: superblock
The super block deﬁnes the ﬁle system geometry. It is also the ﬁrst block in SFS. So the super block needs to have some form of identiﬁcation to inform the program what type of ﬁle system format is followed for storing the data. The ﬁgure below shows the proposed structure for the super block. We expect your ﬁle system to implement these features, but some modiﬁcations are acceptable provided they are well documented. Each ﬁeld in the ﬁgure is 4 bytes long. For instance, the magic number ﬁeld is 4 bytes long. With a 1024-byte long block , we can see that there will plenty of unused space in the super block.
Figure 2: Super block structure
A ﬁle or directory in SFS is deﬁned by an inode. Remember we simpliﬁed the SFS by just having a single root directory (no subdirectories). This root directory is pointed to by an inode, which is pointed to by the super block. The inode structure we use here is slightly simpliﬁed too. It does not have the double and triple indirect pointers. It has direct and single indirect pointers. With the inode all the meta information (size, mode, ownership) can be associated with the inode. So, the directory entry can be pretty simple. The ﬁgure below shows the simpliﬁed inode structure.
Figure 3: inode structure
We are suggesting the inode structure shown above to maintain a similarity to the UNIX ﬁle system. However, the simpliﬁcation made to the SFS inode makes it impossible to read or write the SFS using UNIX software or vice-versa.
The directory is a mapping table to convert the ﬁle name to the inode. Remember a ﬁle name can have an extension too. You limit the extension to 3 characters max. A directory entry is a structure that contains two ﬁelds (at least): inode and ﬁle name. You could add other ﬁelds (if you ﬁnd necessary). Remember the inode also has some attributes such as mode, etc. Depending on the number of entries you have in the directory, the directory could be spanning across multiple blocks in the disk. The inode pointing to the root directory is stored in the super block so we know how to access the root directory. We assume that the SFS root directory would not grow larger than the max ﬁle size we could accommodate in SFS.
In addition to the on-disk data structures, we need a set of in-memory data structures to implement the ﬁle system. The in-memory data structures improve the performance of the ﬁle system by caching the on disk information in memory. Two data structures should be used in this assignment: directory table and inode cache. The directory table keeps a copy of the directory block in memory. Dont make the simpliﬁcation of limiting the root directory to a single block (this would severely restrict the size of the disk by limiting the number of ﬁles in disk). Instead, you could either read the whole directory into the memory or have a cache for the currently used directory block. The later one could be hard to get right.
When you want to create, delete, read, or write a ﬁle, ﬁrst operation is to ﬁnd the appropriate directory entry. Therefore, directory table is a highly accessed data structure and is a good candidate to keep in memory. Another data structure to cache in the memory is the free block list. In your assignment you will use bit map.
The ﬁgure below shows the in-memory data structure and how it connects to the other components. We need at least a table that combines the open ﬁle descriptor tables (the per-process one and system-wide one) in a UNIX-like operating system. We simplify the situation because we assume that only one process is accessing a ﬁle at any given time.
Figure 4: In-memory data structure
When a ﬁle is opened we create an entry in this table. The index of the newly created entry is the ﬁle descriptor that is returned by the ﬁle opening activity. That is the return value of the sfs fopen() is precisely this index.
The entry created in the ﬁle descriptor table has at least two pieces of important information: inode number and a read/write pointer. The inode number is the one that corresponds to the ﬁle. Remember just like there is an inode for the root directory, there is one inode associated with each ﬁle. When a ﬁle is opened, that inode number is recorded in this table entry. The read/write pointer is also set according to the ﬁle system operating rule. For instance, in this assignment (SFS), you are going to set the read/write pointer to the end of the ﬁle at open so that data written into the ﬁle will be appended to the ﬁle.
In SFS, sfs fseek() is a direct way of setting the read/write pointer value. The interesting problem you could be faced with is what to do when you perform a read or write after setting the read/write pointer. Speciﬁcally, if we have a single pointer then sfs fread() would also advance the write pointer, and similarly, the sfs fwrite() would advance the read pointer as well. We can simplify the complexity and let it be that way. You could opt to implement the SFS with two independent read and write pointers as well. In that case, the sfs fseek() needs to have a parameter to specify whether the read, write, or both pointers should be moved by the seek operation.
As shown in the ﬁgure above, we have in-memory data structures and on-disk data structures in a ﬁle system. The in-memory data structures are activated as soon as the ﬁle system is up and running and they are updated every time a ﬁle system operation is carried out. While designing and implementing a given ﬁle system operation you need to think of the actions that should be carried out on the in-memory and ondisk data structures. In addition to the Open File Descriptor Table, we have variety of diﬀerent caches for inodes, disk blocks and the root directory. Your design could implement all of them or some of them. File system performance is not a concern for this assignment correct operation is what we need.
5 Suggested steps for your code
5.1 Creating a ﬁle • Allocate and initialize an inode. You need to somehow remember the state of the inode table to know which and inode could be allocated for the newly created ﬁle. Simply remembering the last inode used is not correct because as you delete ﬁles, some inodes in the middle of the table will become unused and available for reuse. • Write the mapping between the inode and ﬁle name in the root directory. You could simply update the memory and disk copies. • When no disk data block allocated set the ﬁle size to 0. This can also open the ﬁle for transactions (read and write). Note that the SFS API does not have a separate create() call. So you can do this activity as part of the open() call.
5.2 Writing to a ﬁle • Allocate disk blocks (mark them as allocated in your free block list). • Modify the ﬁle’s inode to point to these blocks. • Write the data the user gives to these blocks. • Flush all modiﬁcations to disk. • Note that all writes to disk are at block sizes. If you are writing few bytes into a ﬁle, this might actually end up writing a block to next. So if you are writing to an existing ﬁle, it is important you read the last block and set the write pointer to the end of ﬁle. The bytes you want to write goes to the end of the previous bytes that are already part of the ﬁle. After you have written the bytes, you ﬂush the block to the disk.
5.3 Seek on a ﬁle • Modify the read and write pointers in memory. There is nothing to be done on disk.
6 What to submit & test ﬁles
We have given you a Makeﬁle, disk emulator (C and Header), SFS test ﬁles, and FUSE wrappers. The Makeﬁle shown below has three conﬁgurations. The ﬁrst two tests use hand coded test ﬁles to test your implementation. The last test (FUSE wrappers) is a full test that uses kernel functions to manipulate your ﬁle system.
0 CFLAGS = -c -g -Wall -std=gnu99 ‘pkg-config fuse –cflags –libs‘ 1 LDFLAGS = ‘pkg-config fuse –cflags –libs‘ 2 3 # Uncomment one of the following lines to run the corresponding scenario 4 #SOURCES= disk_emu.c sfs_api.c sfs_test.c sfs_api.h 5 #SOURCES= disk_emu.c sfs_api.c sfs_test2.c sfs_api.h 6 #SOURCES= disk_emu.c sfs_api.c fuse_wrappers.c sfs_api.h 7 8 OBJECTS=$(SOURCES:.c=.o) EXECUTABLE=First_Lastname_sfs 9 10 all: $(SOURCES) $(HEADERS) $(EXECUTABLE) 11 12 \$(EXECUTABLE): $(OBJECTS) 13 gcc $(OBJECTS) $(LDFLAGS) -o $@ 14 15 .c.o: 16 gcc $(CFLAGS) $< -o $@ 17 clean: 18 rm -rf *.o *~ $(EXECUTABLE) You will be editing sfs api.c and sfs api.h. So you will be submitting all the ﬁles we give. You are free to change the Makeﬁle however you like. This means if you want to structure your code into more C ﬁles, that is ﬁne, just remember to edit the Makeﬁle to work approprialty with your code if it needs to. 7 7 Grading Functionality Grade Illustrated design (pseudo code, UMLs, your favorite strategy). You don’t need to do this if your code is working. If your code does not pass any test, this is a must. 20% File creation, read/write working, able to retrieve previously written data, but does not pass any test. 40% Random reads and writes working with seeking, deletes working and ﬁle persistence working 50% Passes test 1 60% Passes test 2 75% Passes Fuse test 90% Memory leaks handled 95% Excellent code structure and comments (this will show us your understanding of what functions do, and how your functions work together to achieve the required functionality)