For my ViperVM project, I want to be able to manage memory at a very low level (pinning pages into RAM, using a custom allocator instead of malloc, allocating huge pages, setting the execution flag, etc.). However, using system calls through the legacy libc interface from Haskell is not appealing and imposes some constraints because the libc is unnecessarily stateful (cf. errno). In this post I present the module I am working on that performs system calls directly.
I am aware that by bypassing the libc we lose a lot of portability, at least until someone does the same work for other architectures and operating systems. Currently only Linux on x86-64 is beginning to be supported. Anyway, we have to start somewhere in order to build user space applications on safer software layers.
A few decades ago, computer programs were written specifically for each architecture, and applications had to deal with devices explicitly as there were only a few of them (remember sound card configuration in old games, where the whole list of supported sound cards was displayed). Nevertheless, the BIOS offered a standard interface for the most common devices, for instance to manage the display (cf. VESA), keyboards, etc. To call a “function” provided by the BIOS, programs had to put the appropriate parameters into registers and execute an instruction triggering a software interrupt. The appropriate BIOS interrupt handler then performed the requested operation and returned values in registers to indicate the results to the application. Operating systems such as DOS provided custom interrupt handlers to add higher-level functionality (access to the file system, etc.). Modern operating systems still use the same mechanism. Applications now rely on the OS and its hardware drivers instead of the BIOS, but the same “trap” mechanism is used, either with software interrupts or with an explicit syscall instruction (cf. x86-64).
Most applications do not invoke syscalls explicitly but use the wrappers provided by libc implementations alongside other “standard” libraries (libpthread, etc.). These standard libraries were designed a long time ago and still bear the marks of a time when multicore computers were not widespread.
I can’t get errno satisfaction
A good example to show how bad it can get is error management in libc: a global variable “errno” is used to indicate if the libc call executed last has produced an error. This approach is bad in at least three ways: (1) it has weak multithreading support, (2) it does not encourage systematic error checking, (3) it produces additional overhead (memory access).
- To support multithreading, that is several threads calling into the libc concurrently, and to avoid mixing up “errno” values for different calls, the global variable is in fact aliased and stored in each thread's local storage (TLS). Implementations of user-space threads (also called green threads) also have to deal with this mandatory global variable at the core of all POSIX C programs. For instance, if we look into the sources of the runtime system for GHC (Glasgow Haskell Compiler), we can see that errno receives special treatment. In the “schedule” function in rts/Schedule.c we even have the following comment:
// And save the current errno in this thread.
// XXX: possibly bogus for SMP because this thread might already
// be running again, see code below.
- Checking for errors returned by libc functions is cumbersome because error codes are not directly returned by functions. In general, if a libc function returns -1 or NULL, then it signals that an error occurred whose code is stored in “errno”. However, this is not an unbreakable rule. From the manual of “errno”:
“For some system calls and library functions (e.g., getpriority(2)), -1 is a valid return on success. In such cases, a successful return can be distinguished from an error return by setting errno to zero before the call, and then, if the call returns a status that indicates that an error may have occurred, checking to see if errno has a nonzero value.”
A language is not considered good for what it allows but for what it makes easy to do. C and its standard library make error checking laborious and error-prone.
- “errno” is stored in memory, which is notoriously slower than registers (even if it is presumably cached and maybe even forwarded in the instruction pipeline). It makes no sense for such a volatile value that is used once before being discarded.
Note: the whole “errno” issue is very similar to the flag register EFLAGS in x86 architectures, which contains the carry flag, the overflow flag, etc. This register is likely to be aliased to support out-of-order or concurrent instruction execution, just like “errno”. Notice how in the Itanium ISA (IA-64), Intel avoided this issue by using several flag registers (“predicate” registers).
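To make the second point concrete, here is a minimal sketch of the reset-then-check idiom that the errno manual prescribes for getpriority, written against GHC's Foreign.C.Error module. The names getPriority and prioProcess are mine, for illustration only. The sketch assumes Linux with glibc, where id_t is a 32-bit unsigned integer and PRIO_PROCESS is 0.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Error (Errno, eOK, getErrno, resetErrno)
import Foreign.C.Types (CInt (..), CUInt (..))

-- libc wrapper: int getpriority(int which, id_t who);
-- Assumption: id_t is an unsigned 32-bit integer (Linux/glibc).
foreign import ccall unsafe "getpriority"
  c_getpriority :: CInt -> CUInt -> IO CInt

-- Assumption: PRIO_PROCESS is 0, as in <sys/resource.h> on Linux.
prioProcess :: CInt
prioProcess = 0

-- Hypothetical wrapper: -1 may be a legitimate priority, so the only
-- way to detect a failure is to clear errno first and inspect it afterwards.
getPriority :: CInt -> CUInt -> IO (Either Errno CInt)
getPriority which who = do
  resetErrno                      -- errno := 0 before the call
  r <- c_getpriority which who
  if r /= -1
    then pure (Right r)           -- unambiguous success
    else do
      e <- getErrno               -- -1: consult the global state
      pure (if e == eOK then Right r else Left e)
```

Calling `getPriority prioProcess 0` queries the calling process. Three steps and a piece of hidden state are needed to recover what a single return value could have carried.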
What can we do about it?
Exceptions aside, the most common way to indicate that an error has occurred during the execution of a function is to use a sum type. In Haskell we can use the following types:
data Maybe a = Nothing | Just a
data Either a b = Left a | Right b
In particular, if the function only indicates whether it fails or succeeds without returning a value, we will use “Maybe Error” as a return type: Nothing will indicate that no error occurred, “Just err” will indicate that the error “err” occurred. For instance, we could have this prototype for the “close” system call (used to close a file descriptor):
close :: FileDescriptor -> IO (Maybe Error)
Similarly, we can use the Either sum type to return a value or an error. By convention we use the “Right” constructor to return the “right” value (i.e. not an error) and the “Left” constructor to return an error. So the return type used by most system calls will be “Either Error a” where “a” depends on the syscall return value type. For instance, for the “pipe” syscall that returns the two ends (file descriptors) of a newly created channel:
pipe :: IO (Either Error (FileDescriptor,FileDescriptor))
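Once errno is out of the picture, these sum types can be built directly from the raw value a syscall leaves in its return register. On Linux, the kernel signals failure by returning the negated error number, a value between -4095 and -1. Any other value is a success. The following is only a sketch of that decoding: the Error type and the helper names are hypothetical and are not necessarily the ones used in ViperVM.

```haskell
import Data.Int (Int64)

-- Hypothetical error type wrapping the (positive) error number.
newtype Error = Error Int64
  deriving (Eq, Show)

-- Linux convention: a raw return value in [-4095, -1] is a negated errno.
isError :: Int64 -> Bool
isError r = r >= -4095 && r <= -1

-- For syscalls that return nothing useful on success (e.g. close).
toMaybeError :: Int64 -> Maybe Error
toMaybeError r
  | isError r = Just (Error (negate r))
  | otherwise = Nothing

-- For syscalls that return a value on success; 'f' converts the raw
-- register content into the expected result type.
toEither :: (Int64 -> a) -> Int64 -> Either Error a
toEither f r
  | isError r = Left (Error (negate r))
  | otherwise = Right (f r)
```

For instance, a raw return value of -9 (EBADF on Linux) from close would be decoded as `Just (Error 9)`, and 0 as `Nothing`.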
Wrapping system calls using FFI
Now that we know how we want to access system calls, we have to wrap them into functions with appropriate prototypes. My first idea was to wrap the “syscall” libc function, which can invoke any system call. However, the only thing it adds is exactly what we want to avoid: setting errno…
So we have to use the (not so) hard way, that is to use the OS syscall convention directly. With Linux on x86-64 architectures, the syscall number is to be found in RAX register and parameters (up to 6) in RDI, RSI, RDX, R10, R8 and R9 registers (see: man 2 syscall).
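As a purely descriptive model of this convention (it does not perform any system call), the register assignment can be written down as data. The Reg type and the syscallRegs function are illustrative names of mine.

```haskell
import Data.Word (Word64)

-- Registers involved in the Linux x86-64 syscall convention.
data Reg = RAX | RDI | RSI | RDX | R10 | R8 | R9
  deriving (Eq, Show)

-- Argument registers, in parameter order.
argRegs :: [Reg]
argRegs = [RDI, RSI, RDX, R10, R8, R9]

-- Compute which value goes into which register for a given syscall
-- number and argument list. Returns Nothing if there are more than
-- six arguments, which the convention cannot express.
syscallRegs :: Word64 -> [Word64] -> Maybe [(Reg, Word64)]
syscallRegs number args
  | length args > length argRegs = Nothing
  | otherwise = Just ((RAX, number) : zip argRegs args)
```

For example, “write” is syscall number 1 on x86-64, so writing 5 bytes from a buffer at address 0x1000 to file descriptor 1 corresponds to `syscallRegs 1 [1, 0x1000, 5]`, that is RAX = 1, RDI = 1, RSI = 0x1000 and RDX = 5.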
The first approach is to write a function that converts between the calling convention used by C and the syscall convention. Using the Haskell FFI, we can then easily call this function as if it were an ordinary C function.
Wrapping system calls using PrimOps
In order to bypass the FFI that is known to be a little bit slow, we can directly use the calling convention used internally by GHC and provide a new “primop” for system calls.
Internally, GHC uses the following (virtual) registers for the STG calling convention: Base, Sp, Hp, R1, R2, R3, R4, R5, R6 and SpLim. Function parameters (up to 6) are passed in the R1-R6 virtual registers and, if there are more than 6 parameters, the remaining ones are stored on the stack accessible through Sp (“stack pointer”). The first additional parameter is stored at Sp and so on upwards for the other parameters. There is no return convention because every function performs a tail-call (a jump) to the next one, whose address is stored on the top of the stack (Sp).
On x86-64 architectures, the virtual registers are respectively mapped to R13, RBP, R12, RBX, R14, RSI, RDI, R8, R9 and R15. After the syscall, we need to put the RAX value (the value returned by the syscall) into RBX (that is, R1) and to jump to the address stored at the top of the stack (Sp), that is [RBP].
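The mapping is easier to keep in mind as a table. The sketch below only restates it in Haskell and derives where the n-th parameter of a function lives. It assumes 8-byte stack slots, and the type and function names are hypothetical.

```haskell
-- STG virtual registers, in the order listed above.
data VReg = Base | Sp | Hp | R1 | R2 | R3 | R4 | R5 | R6 | SpLim
  deriving (Eq, Show, Enum, Bounded)

-- The x86-64 registers they are pinned to.
data X86Reg = R13 | RBP | R12 | RBX | R14 | RSI | RDI | R8 | R9 | R15
  deriving (Eq, Show)

stgToX86 :: VReg -> X86Reg
stgToX86 v = case v of
  Base  -> R13
  Sp    -> RBP
  Hp    -> R12
  R1    -> RBX
  R2    -> R14
  R3    -> RSI
  R4    -> RDI
  R5    -> R8
  R6    -> R9
  SpLim -> R15

-- Where a parameter lives: in a register, or at a byte offset from Sp.
data Location = InReg X86Reg | OnStack Int
  deriving (Eq, Show)

-- Location of the n-th parameter (0-based): the first six are in
-- R1-R6, the following ones on the stack starting at Sp and going
-- upwards (assuming 8-byte slots).
paramLocation :: Int -> Maybe Location
paramLocation n
  | n < 0     = Nothing
  | n < 6     = Just (InReg (stgToX86 (toEnum (fromEnum R1 + n))))
  | otherwise = Just (OnStack ((n - 6) * 8))
```

Here `paramLocation 0` is `Just (InReg RBX)` and `paramLocation 6` is `Just (OnStack 0)`. Note that R3 is already mapped to RSI, which is also the second syscall argument register, so a primop wrapper has to be careful about the order in which it shuffles registers.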
I have implemented the two approaches in my ViperVM package. The most interesting files for the subject at stake are :
The other files in ViperVM.Arch.X86_64.Linux.* are syscalls wrapped as presented above (returning either Maybe or Either). At the time of writing, I still have a lot of them to wrap.
This implementation is only for Linux on x86-64 architectures. We could easily do the same thing for other architectures and operating systems. It may allow us to define a more up-to-date common interface to operating systems (instead of the POSIX one). In addition, we are no longer tied to the C language and C libraries for the interaction with the OS. This is one step towards a full Haskell software stack.
It could be fun and useful to support inline assembly directly in Haskell files, even more so if custom calling conventions could be defined, so that the Haskell compiler's register allocator could optimize register use and we could delete the useless calling convention conversion functions written in assembly language.