The Filesystem API

Onto what you actually came for, filesystems! Before we can talk about how a filesystem works we need to understand the requirements it must satisfy. That’s where the filesystem interface shows up. All a filesystem truly does is define a series of callbacks corresponding to the filesystem interface. We can almost think of this as a kind of API that we can interact with.

This is why we can have many filesystems, anything from ext4 (the bread and butter filesystem that most of your data is probably on if you use a popular linux distro like ubuntu), to something more exotic like zfs (a powerful filesystem popular in the freeBSD community) or even something that doesn’t even provide access to data on a disk like procfs (a filesystem that reports information about running process, it should be mounted at /proc in most linux systems).

In all of these filesystems, once they are mounted (a term that means to give a filesystem a path. We’ll revisit this concept later) we can interact with them using the same interface. This means that all of those commands you learnt in the previous section can be used regardless of filesystem (with some caveats)!

We’ll refer to filesystems that don’t provide access to actual disk data as virtual filesystems. Note that in general, the term virtual filesystem refers to any filesystem that provides an interface between the kernel and the user. Our usage of the term is in some ways broader (our virtual filesystems aren’t necessarily going to be connected to the kernel) and in some ways more restrictive (technically even filesystems like ext4 and zfs are virtual since they use the kernel to manage the actual disk I/O).

To get a list of functions that a filesystem must implement I like to look at the FUSE documentation. FUSE (which stands for Filesystems in USErspace) is a library that makes it easy to write filesystems without having to write kernel code. A filesystem implemented with FUSE must register itself by filling out the fuse_operations struct. We can find the documentation for this struct at this link: https://libfuse.github.io/doxygen/structfuse__operations.html#abac8718cdfc1ee273a44831a27393419. Below is an excerpt from the documentation highlighting a few relevant fields of the struct.

int(* 	getattr )(const char *, struct stat *, struct fuse_file_info *fi)
int(* 	mkdir )(const char *, mode_t)
int(* 	unlink )(const char *)
int(* 	rename )(const char *, const char *, unsigned int flags)
int(* 	link )(const char *, const char *)
int(* 	chmod )(const char *, mode_t, struct fuse_file_info *fi)
int(* 	chown )(const char *, uid_t, gid_t, struct fuse_file_info *fi)
int(* 	truncate )(const char *, off_t, struct fuse_file_info *fi)
int(* 	open )(const char *, struct fuse_file_info *)
int(* 	read )(const char *, char *, size_t, off_t, struct fuse_file_info *)
int(* 	write )(const char *, const char *, size_t, off_t, struct fuse_file_info *)
int(* 	readdir )(const char *, void *, fuse_fill_dir_t, off_t, struct fuse_file_info *, enum fuse_readdir_flags)
int(* 	access )(const char *, int)
int(* 	create )(const char *, mode_t, struct fuse_file_info *)
int(* 	ioctl )(const char *, unsigned int cmd, void *arg, struct fuse_file_info *, unsigned int flags, void *data)

There’s many other callbacks that can be registered, but a filesystem doesn’t necessarily need to succesfully handle every callback. In the case of FUSE, any unimplemented callbacks will all return -1 regardless of their inputs and set the errno to some generic value.

Implementing a virtual filesystem

To further understand how filesystems work, let’s implement a virtual filesystem! We’ll implement a very simple filesystem, a subset of the features provided by devfs (a filesystem used to managed devices, usually mounted at /dev). The features we’ll be implemented will be those provided by the special file /dev/zero. /dev/zero is a file that returns an infinite stream of 0s (the byte value, not the ascii character) when read from and ignores all data written to it.

In our filesystem we will have a single file /zero that will follow the requirements of /dev/zero specified above. To implement this, we’ll be writing some javascript code to be loaded into our simulator. Note that we’ll provide code that takes care of the nitty gritty details. You can focus on just implementing the core read/write logic.

Note that both read and write take in a (for now) opaque file descriptor object and a buffer of type Uint8Array (essentially char[] if you’re a c person). Conventionally, read and write respectively return the number of bytes read or written and -1 on error (since we’re using javascript we’ll return errors as strings for greater expresiveness).

Once you’ve written your code, press run below to mount your file system at /dev.

If you’re stuck check out these hints below:

Click to expand/hide hints

When someone tries to write to the file we’ll return the size of the buffer (try the .length property) to indicate that we successfully consumed the data, but we won’t actually do anything with the data itself.
When someone tries to read to the file we’ll zero out the contents of the buffer (try the .fill method).
Need to debug with print statements? Try console.log("hello world!"). Right click the page, click on the console tab and you’ll be able to see the output of console.log when you run the code and interact with the filesystem via the shell.
Still stuck? Click here to show the solution.
```
 
```

// populates MyVFS with generic error functions for all callbacks
class MyVFS extends DefaultFS{};

// implement a few other callbacks like readdir
// -- snip --

MyVFS.prototype.read = function (fd, buffer) {
    if (fd.path != "/zero")
        return "ENOENT";
MYVFS_READ
};

MyVFS.prototype.write = function (fd, buffer) {
    if (fd.path != "/zero")
        return "ENOENT";
MYVFS_WRITE
};

Try commands such as echo hi > /dev/zero and hexdump -c 10 /dev/zero (Shows the hex values of the first 10 bytes of the file - change 10 to something else to see a different number of bytes).

Do not use cat to read the file, as cat will continue reading until the file indicates that it’s reached the end of the file (EOF), by return 0. Since our implementation of read should never return 0, the file never ends and cat will hang as our shell simulator models an operating system that can have at most 2 running processes at any given time - the shell/kernel and at most one other process - and has no support for pre-emption.

In our simiulator the filesystem acts similar to how a filesystem in the kernel would present itself (a series of functions that can be called from the kernel), which is different from the way FUSE works (a series of functions that can are run in the context of a running filesystem process via IPC).

Aside from /dev/zero there’s also a few other interesting virtual files. In particular, take a look at /dev/stdin//dev/stdout//dev/stderr. These files provide access to stdin, stdout, and stderr respectively. Underneath the hood devfs presents these files as symlinks to the current process’s file descriptors via procfs. (You can check this with file /dev/stdin).

In our shell simulator above, run ls -a. You’ll see an entry .shellfs. In the simiulator .shellfs/stdin, .shellfs/stdout, and .shellfs/stderr provide access to the standard I/O mechanisms. Try running cat /.shellfs/stdin and confirm that the behaviour is the same as running cat. To explore the topic of managing filesystems and mounting them, see section 9.

Read/Write offsets

You might have noticed in the FUSE function prototypes above that the read and write functions take in a off_t paramemter. off_t is usually a synonym of size_t and defines an offset at which to read or write from. In the case of FUSE, the FUSE library tracks the offset for it’s internal representation of a file descriptor and will add the result of read and write calls (when successfull) to the offset. This means that if you read 1 byte, then perform another read of 1 byte, the first read will give you the first byte of the file and the second read will give you the second byte of the file. In our javascript library, the offset needs to be managed by the filesystem itself (this is also true for filesystems implemented in the kernel, where the off_t parameter is usually a off_t* parameter instead). To accomplish this, you can use the fd.offset field and increment it by the number of bytes read/written.

We’ll discuss this more when we talk about reading and writing from disks in section 6.