A distinct feature of Unix and Unix-like platforms is the idea that “everything is a
file”. This means that filesystems are used for more than structuring data on storage
devices. For example, hardware information on Linux is generally not exposed to
userspace programs through a dedicated programmatic API – instead, the /sys directory
provides this information, and it has to be queried by iterating through and reading
its contents.
I believe that using filesystems as interfaces in this way is a bad idea. Although this paradigm is likely easier to implement, it comes with a host of issues (in particular, incompatibility with locales, the obfuscation of meaning, and inefficiency) which outweigh its limited benefits.
Locales
Filesystems are inherently human-oriented. Because we use a variety of locales (languages, currencies, formats, etc.), and often directly interact with filesystems, they have to be locale-independent. However, using a filesystem as an interface necessitates the declaration of specific names for files and directories. Programs using such an interface will look for these entries by their exact names, and as they are a part of the interface, they use the locale the interface was designed in (usually English). This locale might be different from that of the user of the system.
For example, on Linux, the /dev filesystem is required for many core operations (such
as showing and using terminals like /dev/tty1). Programs expect that /dev exists, and
often query for entries within it using hardcoded paths (e.g. /dev/stdout, which is
almost universally known). But note that the very name “dev” is specific to
Latin-script locales: a Linux user who speaks Mandarin, for example, cannot use this
interface directly because it was not designed for them. This defeats the primary
intention of a filesystem, which is to offer an easily-understood view of a computer’s
contents to humans.
Guarantees and Meaning
Filesystems provide no guarantees or information about the contents of their entries (beyond basic file type information and permissions control). This means that a file contained within a filesystem cannot be guaranteed to have a fixed format or structure. However, interfaces are built upon such guarantees. When a filesystem is used as an interface, these guarantees have to be translated into restrictions upon file and directory contents. But because filesystems were not designed for such use, these restrictions and guarantees cannot be expressed explicitly in the filesystem, and must instead be implicitly communicated and understood.
Furthermore, information about the meaning of a file or directory cannot be expressed directly in the filesystem. As such, using filesystems as interfaces prevents expressing the meaning of parts of the interface directly.
For example, “thermal zones” detected by Linux are exposed in the
/sys/class/thermal/thermal_zone* directories. These devices provide temperature
readings through files named temp. Here is a sample reading from a thermal zone on my
laptop:
$ cat /sys/class/thermal/thermal_zone*/temp | head -n 1
31800
What does 31800 mean? It might not be obvious at a glance, but it is a temperature in
millidegrees Celsius (so 31.8 °C), here with 1 fractional digit of precision. There are
several issues with this interface: the meaning of the contents of this file is not
clearly expressed (making it difficult to determine e.g. units), and guarantees about
these contents are not explicit either (such as the fact that it will be a decimal
integer). Programs using this interface are forced to implicitly assume the meaning of
these contents, and are unable to make use of the guarantees provided by the interface,
and so have to consider situations (e.g. that the contents of a temp file do not
represent an integer) that cannot occur.
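To make these implicit assumptions concrete, here is a minimal Python sketch of what a careful consumer of a temp file ends up writing. The function name parse_millicelsius is hypothetical, and the unit (millidegrees Celsius) comes from the discussion above, not from anything the file itself states.

```python
def parse_millicelsius(raw: str) -> float:
    """Convert the text of a thermal zone `temp` file to degrees Celsius.

    Assumption: the contents are a decimal integer in millidegrees Celsius.
    The file itself carries no unit or format information.
    """
    text = raw.strip()
    try:
        millidegrees = int(text, 10)
    except ValueError:
        # The interface is said never to produce this, but nothing in the
        # filesystem lets the program rely on that.
        raise ValueError(f"unexpected temp contents: {raw!r}") from None
    return millidegrees / 1000.0
```

Both the unit conversion and the error branch exist only because the program has to guess at the file's meaning and format. Nothing here rejects a negative value either; whether that counts as a valid reading is left to the caller.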
Furthermore, programs (and programmers) will assume simple meanings and guarantees on the basis of the names of entries in the interface. For example, one may assume that (in any reasonable environment) temperature readings are positive – however, dysfunctional thermal zones often output negative readings, which are not well-defined or expected and so can introduce subtle bugs (or worse) into unsuspecting programs. These interfaces are not well-documented either, making it even more difficult to correctly use (and implement) them.
Filesystems as interfaces are simply not capable of expressing the necessary degree of meaning and guarantees to be used easily and correctly.
Efficiency
Using filesystems as interfaces is simply less efficient, because they make the primary method of access to the interface (partially) human-oriented, even though they are mostly used by programs, not humans.
On Linux, this is most clearly visible in stringification. Much of the data
communicated in /sys is made human-readable. For example, the temperature readings
from the previous example were provided as human-readable text, rather than
machine-oriented byte sequences. In most cases, programs are using this data rather
than humans, and because programs require different representations (i.e. binary data
instead of ASCII text), they need to convert it back and forth.
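The round trip can be sketched in a few lines of Python. The fixed-width binary form below (a little-endian signed 32-bit integer) is an assumption chosen for illustration, not a format any existing interface provides, and Python hides most of the actual cost; the point is only to show the two representations side by side.

```python
import struct

reading = 31800  # millidegrees Celsius, as in the earlier sample

# Textual form, as a sysfs-style file presents it: ASCII digits plus a newline.
as_text = f"{reading}\n".encode("ascii")

# Hypothetical binary form: a little-endian signed 32-bit integer.
as_binary = struct.pack("<i", reading)

# The consumer has to parse the text back into an integer...
from_text = int(as_text.decode("ascii").strip(), 10)

# ...whereas the binary form only needs to be reinterpreted.
(from_binary,) = struct.unpack("<i", as_binary)
```

The textual form has a variable length and has to be validated and parsed, while the binary form has a known size and layout.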
A more general example is the use of filesystem path names. Because these names are strings, both programs using this interface and the operating system providing it are required to work with strings where this would otherwise be unnecessary. Compared to integers, strings are much less efficient to work with. And some issues which are handled relatively well by the use of strings (such as backward-compatibility) can be addressed with integers as well, although this requires additional thought in design.
A Solution
The best solution, in my opinion, is to provide a low-level interface to the contents of
these virtual filesystems using new system calls. An approach similar to io_uring,
where multiple commands are stored in memory and executed by the kernel from there,
seems to be quite fitting. This would allow new interfaces to be added by adding to the
memory layout of the commands rather than changing system calls. It would also allow
concurrent access to the devices easily.
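As a rough illustration of commands stored in memory, here is a Python sketch of a fixed-size command record and a buffer holding several of them. Everything in it is hypothetical: the Opcode values, the field layout, and the function names are invented for this example, and it does not reflect io_uring's actual structures.

```python
import struct
from enum import IntEnum


class Opcode(IntEnum):
    READ_TEMPERATURE = 1
    # New operations would be added here, without adding a system call.


# Hypothetical layout: opcode (u32), object id (u32), caller-chosen tag (u64).
COMMAND = struct.Struct("<IIQ")


def encode_commands(commands):
    """Pack (opcode, object_id, tag) tuples into one contiguous buffer."""
    buffer = bytearray(COMMAND.size * len(commands))
    for index, (opcode, object_id, tag) in enumerate(commands):
        COMMAND.pack_into(buffer, index * COMMAND.size, opcode, object_id, tag)
    return bytes(buffer)


def decode_commands(buffer):
    """Inverse of encode_commands, as the consuming side would read it."""
    return [
        (Opcode(opcode), object_id, tag)
        for opcode, object_id, tag in COMMAND.iter_unpack(buffer)
    ]
```

Objects are identified by integers, not by path strings, and extending the interface means defining a new opcode while the record layout and the entry point stay the same.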
This would resolve the issues detailed previously. Because this interface is not exposed to the user directly, locales and human access are no longer a concern. The interface would be easily capable of communicating the necessary meaning (e.g. by providing a trait-like system for exposing capabilities of objects) to use the system safely. Because the interface is directly involved with low-level details such as memory layouts, very efficient formats for sharing data can be used, in comparison to the textual interfaces of filesystems.
On the other hand, developing such a system would require tremendous time and effort.
It may be possible to develop it incrementally, starting off as a C library that
internally uses the filesystem-based interfaces, but exposes the newer API. Once it
becomes relatively mature, the API can be exposed through a few new system calls,
modeled off io_uring.
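A first step in that direction might look like the sketch below, which reads the existing filesystem interface internally but hands callers a typed value. The text suggests a C library; Python is used here only to keep the example short. ThermalReading and read_thermal_zones are hypothetical names, and the millidegree unit is the same assumption as before.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ThermalReading:
    zone: str          # directory name, e.g. "thermal_zone0"
    millicelsius: int  # assumed unit: millidegrees Celsius

    @property
    def celsius(self) -> float:
        return self.millicelsius / 1000.0


def read_thermal_zones(root="/sys/class/thermal"):
    """Return a typed reading for every thermal_zone* directory under root."""
    readings = []
    for zone_dir in sorted(Path(root).glob("thermal_zone*")):
        try:
            raw = (zone_dir / "temp").read_text()
            readings.append(ThermalReading(zone_dir.name, int(raw.strip(), 10)))
        except (OSError, ValueError):
            # Skip zones that cannot be read or do not contain an integer.
            continue
    return readings
```

Callers would then write something like `for r in read_thermal_zones(): print(r.zone, r.celsius)`, and the string handling stays confined to one place that could later be replaced by a system call.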