diff --git a/examples/imfs-grate/README.md b/examples/imfs-grate/README.md new file mode 100644 index 0000000..34ec05f --- /dev/null +++ b/examples/imfs-grate/README.md @@ -0,0 +1,120 @@ +## In Memory File System + +[Upstream Repository](https://github.com/stupendoussuperpowers/imfs) + +The In Memory File System (IMFS) provides a self-contained implementation of a POSIX-like FS backed by memory. It serves as a backbone that can later be integrated as a grate to sandbox any FS calls made by a cage. IMFS exposes POSIX-like APIs and maintains its own inode and file descriptor tables to provide an end-to-end FS interface. + +New IMFS features are usually tested natively on Linux in a sandboxed manner before being tested in `lind-3i` with a grate function wrapping the new functionality. + +### File System APIs + +IMFS mirrors POSIX system calls with an added `cageid` parameter. For example: + +``` +open(const char* pathname, int flags, mode_t mode) +-> +imfs_open(int cageid, const char* pathname, int flags, mode_t mode) +``` + +The behaviours of these APIs closely match those of their corresponding Linux system calls. They follow the semantics described in the man pages, including types, return values, and error codes. This makes IMFS easy to integrate into a grate and easy to test in native environments. + +When running this module on Linux, the `cageid` parameter should be stubbed as a constant in the range `[0, 128)`, like so: + +``` +#define CAGEID 0 + +int fd = imfs_open(CAGEID, "/testfile.txt", O_RDONLY, 0); +imfs_close(CAGEID, fd); +``` +### Utility Functions + +In addition to POSIX APIs, IMFS also provides helper functions for moving files in and out of memory. + +- `load_file(char *path)` Load a single file into IMFS at `path`, recursively creating any required folders. 
+ +- `dump_file(char *path, char *actual_path)` Copy the IMFS file at `path` to the host filesystem at `actual_path`. + +- `preloads(const char *preload_files)` Copy files from the host to IMFS, `preload_files` being a `:`-separated list of filenames. + +These utility functions are called before executing any child cages, and after they exit. The IMFS grate is responsible for calling these to stage files into memory (`load_file`, `preloads`) and to persist results back (`dump_file`). + +The accompanying example grate reads the `PRELOADS` environment variable to determine which files are meant to be staged. + +## Implementation + +### Inodes + +IMFS maintains an array of `Node` objects, each of which serves as an inode representing an FS object (file, directory, symlink, or pipe). Nodes are allocated using a free list together with an index that tracks the next unused slot in the array. + +The structure of the node is specialized according to its type: + +- Directories contain references to child nodes. +- Symlinks maintain a pointer to the target node. +- Regular files store data in fixed-size `Chunk`s, each of which stores up to 1024 bytes of data. These chunks are organized as a singly linked list. + +### File Descriptors + +Each cage has its own array of `FileDesc` objects, each of which represents a file descriptor. The file descriptors used by these FS calls are indices into this array. + +File descriptor allocation begins at index 3. The management of the standard descriptors (`stdin`, `stdout`, `stderr`) is delegated to the enclosing grate. + +Descriptors are allocated using `imfs_open` or `imfs_openat`. Each file descriptor object stores: + +- A pointer to the associated node. +- The current file offset. 
+- The open flags. + +## Building + +Build Requirements: + +- `make` +- Python3 for tests + +### Native Build + +- `make lib` to build as a library +- `make imfs` to build with the main function +- `make debug` to build with debug symbols + +### Lind Integration Build + +The following compile flags are used when compiling IMFS for a Lind build: + +- `-DLIB` to omit the main function +- `-DDIAG` to enable diagnostic logging +- `-D_GNU_SOURCE` to support `SEEK_HOLE` and `SEEK_DATA` operations in `imfs_lseek()` + +## Grate Integration + +The grate implementation currently provides syscall wrappers for the following FS syscalls: + +- [`open`](https://man7.org/linux/man-pages/man2/open.2.html) +- [`close`](https://man7.org/linux/man-pages/man2/close.2.html) +- [`read`](https://man7.org/linux/man-pages/man2/read.2.html) +- [`write`](https://man7.org/linux/man-pages/man2/write.2.html) +- [`fcntl`](https://man7.org/linux/man-pages/man2/fcntl.2.html) + +## Testing + +POSIX compliance is validated through `pjdfstest`, a widely adopted file system test suite for both BSD and Linux. The tests are executed natively on Linux, which required modifications to `pjdfstest` in order to support a persistent test runner capable of maintaining FS state in memory. + +`pjdfstest` provides a comprehensive list of assertions, each designed to verify a specific FS property. This approach allows for easier detection of edge cases. + +The test suite is invoked using: + +- `make test` run all tests +- `make test-` run all tests for a particular feature + +## Example Usage: Running `tcc` with IMFS Grate + +Check out the documentation [here](https://github.com/stupendoussuperpowers/lind-wasm/tree/ea95e1742c4c497ae7d859603869d8612f695ad7/imfs_grate). + +## Future Work + +- Currently, only a handful of the most common logical branches are supported for most syscalls. For example, not all flags are supported for `open`. 
+- Access control is not implemented; by default, all nodes are created with mode `0755`, allowing any user or group to access them. +- `mmap` is yet to be implemented. +- Performance testing for reading and writing. +- Integrating FD table management with the `fdtables` crate. + diff --git a/examples/imfs-grate/build.conf b/examples/imfs-grate/build.conf new file mode 100644 index 0000000..45ae931 --- /dev/null +++ b/examples/imfs-grate/build.conf @@ -0,0 +1,3 @@ +ENTRY=imfs_grate.c +MAX_MEMORY=1570242560 +EXTRA_CFLAGS="-D_GNU_SOURCE -DLIB" diff --git a/examples/imfs-grate/compile_grate.sh b/examples/imfs-grate/compile_grate.sh new file mode 100755 index 0000000..d3c062e --- /dev/null +++ b/examples/imfs-grate/compile_grate.sh @@ -0,0 +1,74 @@ +#!/usr/bin/env bash + +# Usage: ./compile_grate.sh +# +# Builds and outputs a WebAssembly binary for lind. +# +# Expected directory structure: +# / +# ├── build.conf (Required: ENTRY, Optional: MAX_MEMORY, EXTRA_CFLAGS) +# └── src/ +# └── *.c (Source files to compile) +# +# Outputs to /output/: +# - .wasm +# - .cwasm + +set -euo pipefail + +if [[ $# -ne 1 ]]; then + echo "Usage: $0 " + exit 1 +fi + +TARGET="$1" + +# Enter the example directory +pushd "$TARGET" >/dev/null + +# Now everything is relative to the example dir +echo "[cwd] $(pwd)" + +# Load per-example config +if [[ ! 
-f build.conf ]]; then + echo "missing build.conf" + exit 1 +fi +source build.conf + +CLANG_DIR="${CLANG:-/home/lind/lind-wasm/clang+llvm-18.1.8-x86_64-linux-gnu-ubuntu-18.04}" +CLANG="$CLANG_DIR/bin/clang" +SYSROOT="${SYSROOT:-/home/lind/lind-wasm/src/glibc/sysroot}" +WASM_OPT="${WASM_OPT:-/home/lind/lind-wasm/tools/binaryen/bin/wasm-opt}" +WASMTIME="${WASMTIME:-/home/lind/lind-wasm/src/wasmtime/target/release/wasmtime}" + +SRC_DIR="src" +mkdir -p output +OUT="output/${ENTRY%.c}" + +MAX_MEMORY="${MAX_MEMORY:-268435456}" +EXTRA_CFLAGS="${EXTRA_CFLAGS:-}" +EXTRA_WASM_OPT="${EXTRA_WASM_OPT:-}" + +echo "[build] $OUT (max-mem=$MAX_MEMORY)" + +"$CLANG" -pthread \ + --target=wasm32-unknown-wasi \ + --sysroot "$SYSROOT" \ + -Wl,--import-memory,--export-memory,--max-memory="$MAX_MEMORY",\ +--export=__stack_pointer,--export=__stack_low,--export=pass_fptr_to_wt \ + $EXTRA_CFLAGS \ + "$SRC_DIR"/*.c \ + -g -O0 -o "$OUT.wasm" + +"$WASM_OPT" \ + --asyncify \ + --epoch-injection \ + --debuginfo \ + $EXTRA_WASM_OPT \ + "$OUT.wasm" -o "$OUT.wasm" + +"$WASMTIME" compile "$OUT.wasm" -o "$OUT.cwasm" + +# Return to original directory +popd >/dev/null diff --git a/examples/imfs-grate/src/imfs.c b/examples/imfs-grate/src/imfs.c new file mode 100644 index 0000000..587b663 --- /dev/null +++ b/examples/imfs-grate/src/imfs.c @@ -0,0 +1,1433 @@ +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "imfs.h" + +// Global state for the IMFS +struct IMFState { + Node nodes[MAX_NODES]; + int next_node; + int free_list[MAX_NODES]; + int free_list_size; +}; + +static struct IMFState g_state; + +#define g_next_node g_state.next_node +#define g_nodes g_state.nodes +#define g_free_list g_state.free_list +#define g_free_list_size g_state.free_list_size + +// Each Process (Cage) has its own FD Table, all of which are initialized +// in memory when imfs_init() is called. 
Nodes are allocated using +// g_next_node and g_free_list, as described below. +// +// The free list tracks "holes" in the g_nodes table, caused by nodes that were deleted. +// When creating a new node, we check which index the free list points to and +// create the node there. If the free list is empty, we use +// the global g_next_node index. + +static FileDesc g_fdtable[MAX_PROCS][MAX_FDS]; + +// We use the same logic for fd allocations. +static int g_next_fd[MAX_PROCS]; +static int g_fd_free_list[MAX_PROCS][MAX_FDS]; +static int g_fd_free_list_size[MAX_PROCS]; + +static Node *g_root_node = NULL; + +// +// String Utils +// + +static size_t str_len(const char *name) { + int i = 0; + while (name[i] != '\0') { + i++; + } + return i; +} + +static char *str_rchr(const char *s, const char c) { + char *last = 0; + + while (*s != '\0') { + if (*s == (char)c) { + last = (char *)s; + } + s++; + } + + if (c == '\0') { + return (char *)s; + } + + return last; +} + +static void split_path(const char *path, int *count, + char namecomp[MAX_DEPTH][MAX_NODE_NAME]) { + *count = 0; + + int i = 0; + if (path[i] == '/') + i++; + + int current_len = 0; + while (path[i] != '\0') { + if (path[i] == '/') { + namecomp[*count][current_len] = '\0'; + (*count)++; + current_len = 0; + } else { + namecomp[*count][current_len++] = path[i]; + } + + i++; + } + namecomp[*count][current_len] = '\0'; + (*count)++; +} + +static int str_compare(const char *a, const char *b) { + int a_len = 0; + while (a[a_len] != '\0') + a_len++; + int b_len = 0; + while (b[b_len] != '\0') + b_len++; + + if (a_len != b_len) + return 0; + int i = 0, j = 0; + while (a[i] != '\0' && b[j] != '\0') { + if (a[i] != b[j]) + return 0; + i++; + j++; + } + return 1; +} + +static void str_ncopy(char *dst, const char *src, int n) { + size_t i; + for (i = 0; i < n && src[i] != '\0'; i++) { + dst[i] = src[i]; + } +} + +static void mem_cpy(void *dst, const void *src, size_t n) { + size_t i; + unsigned char *d 
= dst; + const unsigned char *s = src; + + for (i = 0; i < n; i++) { + d[i] = s[i]; + } +} + +// Return a buffer that contains the entire file in. Avoids having to call +// realloc over and over for preloaded files. +static char *read_full_file(char *path, size_t *out_size) { + FILE *fp = fopen(path, "rb"); + + fseek(fp, 0, SEEK_END); + long size = ftell(fp); + rewind(fp); + + char *buf = malloc(size); + + size_t read = fread(buf, 1, size, fp); + fclose(fp); + *out_size = (size_t)size; + + return buf; +} + +// +// IMFS Utils +// + +void imfs_copy_fd_tables(int srcfd, int dstfd) { + for (int i = 0; i < MAX_FDS; i++) { + g_fdtable[dstfd][i] = g_fdtable[srcfd][i]; + } +} + +static Node *imfs_create_node(const char *name, NodeType type, mode_t mode) { + if (g_free_list_size == -1 && g_next_node >= MAX_NODES) { + errno = ENOMEM; + return NULL; + } + + int node_index; + if (g_free_list_size == -1) + node_index = g_next_node++; + else + node_index = g_free_list[g_free_list_size--]; + + if (g_nodes[node_index].type != M_NON) { + errno = ENOMEM; + return NULL; + } + + g_nodes[node_index].in_use = 0; + g_nodes[node_index].doomed = 0; + g_nodes[node_index].type = type; + g_nodes[node_index].total_size = 0; + g_nodes[node_index].d_count = 0; + g_nodes[node_index].r_head = NULL; + g_nodes[node_index].r_tail = NULL; + g_nodes[node_index].parent_idx = -1; + g_nodes[node_index].mode = g_nodes[node_index].type | (mode & 0777); + g_nodes[node_index].owner = GET_UID; + g_nodes[node_index].group = GET_GID; + + clock_gettime(CLOCK_REALTIME, &g_nodes[node_index].atime); + clock_gettime(CLOCK_REALTIME, &g_nodes[node_index].btime); + clock_gettime(CLOCK_REALTIME, &g_nodes[node_index].ctime); + clock_gettime(CLOCK_REALTIME, &g_nodes[node_index].mtime); + + str_ncopy(g_nodes[node_index].name, name, MAX_NODE_NAME); + int length = str_len(name); + g_nodes[node_index].name[length] = '\0'; + + /* If directory, allocate initial children array */ + if (type == M_DIR) { + 
g_nodes[node_index].d_capacity = MAX_NODES; + g_nodes[node_index].d_children = calloc(g_nodes[node_index].d_capacity, sizeof(DirEnt)); + if (!g_nodes[node_index].d_children) { + errno = ENOMEM; + return NULL; + } + g_nodes[node_index].d_count = 0; + } + return &g_nodes[node_index]; +} + +static int imfs_allocate_fd(int cage_id, Node *node, int flags) { + if (!node) + return -1; + + int i; + if (g_fd_free_list_size[cage_id] > -1) { + i = g_fd_free_list[cage_id][g_fd_free_list_size[cage_id]--]; + } else { + i = g_next_fd[cage_id]++; + } + + if (i == MAX_FDS) { + errno = EMFILE; + return -1; + } + + g_fdtable[cage_id][i] = (FileDesc){ + .node = node, + .offset = 0, + .link = NULL, + .status = 1, + .flags = flags, + }; + + node->in_use++; + + clock_gettime(CLOCK_REALTIME, &node->atime); + + return i; +} + +static FileDesc *get_filedesc(int cage_id, int fd) { + if (g_fdtable[cage_id][fd].link) + return g_fdtable[cage_id][fd].link; + + return &g_fdtable[cage_id][fd]; +} + +// +// These two functions are used to perform a Node lookup. The implementation for +// this is to start from the '/' REG and iteratively go through their child +// nodes. +// +// imfs_find_node_namecomp() takes as input an array of path name components. +// imfs_find_node() takes as input a pathname which is then split by '/' +// +// The runtime should likely be improved by using a different method like a hash +// table. 
+// +static Node * +imfs_find_node_namecomp(int cage_id, int dirfd, + const char namecomp[MAX_DEPTH][MAX_NODE_NAME], + int count) { + FileDesc *fd = get_filedesc(cage_id, dirfd); + if (count == 0) + return g_root_node; + + Node *current; + if (dirfd == AT_FDCWD) + current = g_root_node; + else + current = fd->node; + + for (int i = 0; i < count && current; i++) { + Node *found = NULL; + for (size_t j = 0; j < current->d_count; j++) { + if (str_compare(namecomp[i], + current->d_children[j].name) == 1) { + switch (current->d_children[j].node->type) { + case M_LNK: + found = current->d_children[j] + .node->l_link; + break; + case M_DIR: + case M_REG: + found = + current->d_children[j].node; + break; + default: + found = NULL; + } + break; + } + } + + if (!found) { + return NULL; + } + + current = found; + } + + return current; +} + +static Node *imfs_find_node(int cage_id, int dirfd, const char *path) { + if (!path || !g_root_node) + return NULL; + + if (path[0] == '/' && path[1] == '\0') + return g_root_node; + + int count; + char namecomps[MAX_DEPTH][MAX_NODE_NAME]; + + split_path(path, &count, namecomps); + + return imfs_find_node_namecomp(cage_id, dirfd, namecomps, count); +} + +static int add_child(Node *parent, Node *node) { + if (!parent || !node || parent->type != M_DIR) + return -1; + + /* grow children array if needed */ + if (parent->d_count >= parent->d_capacity) { + size_t newcap = parent->d_capacity ? 
parent->d_capacity * 2 : MAX_NODES; + DirEnt *n = realloc(parent->d_children, newcap * sizeof(DirEnt)); + if (!n) + return -1; + parent->d_children = n; + parent->d_capacity = newcap; + } + + size_t new_count = parent->d_count + 1; + + parent->d_children[parent->d_count].node = node; + + str_ncopy(parent->d_children[parent->d_count].name, node->name, + MAX_NODE_NAME); + int length = str_len(node->name); + parent->d_children[parent->d_count].name[length] = '\0'; + + parent->d_count = new_count; + node->parent_idx = parent->index; + + return 0; +} + +static Pipe *get_pipe(int cage_id, int fd) { + FileDesc *fdesc = get_filedesc(cage_id, fd); + if (fdesc->node->type != M_PIP) { + return NULL; + } + + return fdesc->node->p_pipe; +} + +static int imfs_dup_fd(int cage_id, int oldfd, int newfd) { + if (newfd == oldfd) + return newfd; + + int i; + if (newfd != -1) { + i = newfd; + goto allocate; + } + + if (g_fd_free_list_size[cage_id] > -1) { + i = g_fd_free_list[cage_id][g_fd_free_list_size[cage_id]--]; + } else { + i = g_next_fd[cage_id]++; + } + + if (i == MAX_FDS) { + errno = EMFILE; + return -1; + } + +allocate: + + if (g_fdtable[cage_id][i].node || g_fdtable[cage_id][i].link) + imfs_close(cage_id, i); + + g_fdtable[cage_id][i] = (FileDesc){ + .link = &g_fdtable[cage_id][oldfd], + .node = NULL, + .offset = 0, + }; + + return i; +} + +static int remove_child(Node *node) { + size_t total_nodes = g_nodes[node->parent_idx].d_count; + int remove_idx; + + for (int i = 0; i < total_nodes; i++) { + if (str_compare(g_nodes[node->parent_idx].d_children[i].name, + node->name)) { + remove_idx = i; + break; + } + } + + for (int i = remove_idx; i < total_nodes - 1; i++) { + g_nodes[node->parent_idx].d_children[i] = + g_nodes[node->parent_idx].d_children[i + 1]; + } + + g_nodes[node->parent_idx].d_count--; + + return 0; +} + +// +// Most FS APIs contain duplicated workflows, these functions deal with that. +// This allows for exports FS APIs to be brief. 
For e.g., the difference between +// write() and pwrite() is only on how offsets are used and updated. The rest of +// the logic remains the same. +// + +static int imfs_remove_file(Node *node) { + remove_child(node); + + node->doomed = 1; + + if (!node->in_use) { + g_free_list[++g_free_list_size] = node->index; + node->type = M_NON; + } + + return 0; +} + +static int imfs_remove_pipe(Node *node) { + node->doomed = 1; + + g_free_list[++g_free_list_size] = node->index; + node->type = M_NON; + + return 0; +} + +static int imfs_remove_dir(Node *node) { + if (node == g_root_node || node->d_count > 2) { + errno = EBUSY; + return -1; + } + + if (!node->in_use) { + g_free_list[++g_free_list_size] = node->index; + /* free dynamic children storage if allocated */ + if (node->d_children) { + free(node->d_children); + node->d_children = NULL; + node->d_capacity = 0; + } + node->type = M_NON; + } + + remove_child(node); + node->doomed = 1; + return 0; +} + +static int imfs_remove_link(Node *node) { + if (!node->in_use) { + g_free_list[++g_free_list_size] = node->index; + node->type = M_NON; + } + + remove_child(node); + node->doomed = 1; + return 0; +} + +static ssize_t __imfs_pipe_read(int cage_id, int fd, void *buf, size_t count, + int pread, off_t offset) { + Pipe *_pipe = get_pipe(cage_id, fd); + + LOG("[pipe] [read] offset=%zd status=%d\n", count, + _pipe->writefd->status); + while (_pipe->writefd->status && _pipe->offset <= 0) { + }; + + int to_read = _pipe->offset; + mem_cpy(buf, _pipe->data, to_read); + _pipe->offset = 0; + + return to_read; +} + +static ssize_t imfs_new_read(int cage_id, int fd, void *buf, size_t count, + int pread, off_t offset) { + FileDesc *fdesc = get_filedesc(cage_id, fd); + + if ((fdesc->flags & O_ACCMODE) == O_WRONLY) { + errno = EBADF; + return -1; + } + + Node *node = fdesc->node; + off_t use_offset = pread ? 
offset : fdesc->offset; + + if (use_offset >= node->total_size) + return 0; + + if (use_offset + count > node->total_size) + count = node->total_size - use_offset; + + size_t read = 0; + size_t local_offset = use_offset; + Chunk *c = node->r_head; + + while (c && local_offset >= 1024) { + local_offset -= 1024; + c = c->next; + } + + while (read < count && c) { + size_t available = c->used - local_offset; + size_t to_copy = count - read; + if (to_copy > available) { + to_copy = available; + } + + mem_cpy(buf + read, c->data + local_offset, to_copy); + + read += to_copy; + local_offset = 0; + c = c->next; + } + + if (!pread) + fdesc->offset += read; + + return read; +} + +static ssize_t __imfs_readv(int cage_id, int fd, const struct iovec *iov, + int len, off_t offset, int pread) { + int ret, fin = 0; + for (int i = 0; i < len; i++) { + ret = imfs_new_read(cage_id, fd, iov[i].iov_base, + iov[i].iov_len, pread, offset); + if (ret == -1) + return ret; + else + fin += ret; + } + + return fin; +} + +static ssize_t __imfs_pipe_write(int cage_id, int fd, const void *buf, + size_t count, int pread, off_t offset) { + Pipe *_pipe = get_pipe(cage_id, fd); + + mem_cpy(_pipe->data, buf, count); + _pipe->offset += count; + LOG("[pipe] offset=%zd\n", count); + + return count; +} + +static ssize_t imfs_new_write(int cage_id, int fd, const void *buf, + size_t count, int pread, off_t offset) { + FileDesc *fdesc = get_filedesc(cage_id, fd); + + if ((fdesc->flags & O_ACCMODE) == O_RDONLY) { + errno = EBADF; + return -1; + } + + Node *node = fdesc->node; + off_t use_offset = pread ? offset : (fdesc->flags & O_APPEND ? 
node->total_size : fdesc->offset); + + size_t written = 0; + + size_t chunk_offset = 0; + Chunk *c = node->r_head; + size_t local_offset = use_offset; + + while (c && local_offset >= 1024) { + local_offset -= 1024; + chunk_offset += c->used; + if (!c->next) + break; + c = c->next; + } + + while (written < count) { + if (!c) { + Chunk *new_chunk = calloc(1, sizeof(Chunk)); + if (!new_chunk) + return -1; + if (node->r_tail) + node->r_tail->next = new_chunk; + node->r_tail = new_chunk; + if (!node->r_head) + node->r_head = new_chunk; + c = new_chunk; + } + + size_t space = 1024 - local_offset; + size_t to_copy = count - written; + if (to_copy > space) + to_copy = space; + + // Zero-fill any hole before writing + if (local_offset > c->used) { + memset(c->data + c->used, 0, local_offset - c->used); + } + + mem_cpy(c->data + local_offset, buf + written, to_copy); + + if (local_offset + to_copy > c->used) + c->used = local_offset + to_copy; + + written += to_copy; + local_offset = 0; + c = c->next; + } + + if (use_offset + written > node->total_size) + node->total_size = use_offset + written; + + if (!pread) + fdesc->offset += written; + + clock_gettime(CLOCK_REALTIME, &node->mtime); + + return written; +} + +static ssize_t __imfs_writev(int cage_id, int fd, const struct iovec *iov, + int count, off_t offset, int pread) { + int ret, fin = 0; + for (int i = 0; i < count; i++) { + ret = imfs_new_write(cage_id, fd, iov[i].iov_base, + iov[i].iov_len, pread, offset + fin); + if (ret == -1) + return ret; + else + fin += ret; + } + return fin; +} + +static int __imfs_stat(int cage_id, Node *node, struct stat *statbuf) { + if (node == NULL) + return -1; + + *statbuf = (struct stat){ + .st_dev = GET_DEV, + .st_ino = node->index, + .st_mode = node->mode, + .st_nlink = 1, + .st_uid = GET_UID, + .st_gid = GET_GID, + .st_rdev = 0, + .st_size = node->total_size, + .st_blksize = 512, + .st_blocks = node->total_size / 512, +#ifdef __APPLE__ + .st_atimespec = node->atime, + .st_mtimespec = 
node->mtime, + .st_ctimespec = node->ctime, + .st_birthtimespec = node->btime, +#else + .st_atim = node->atime, + .st_mtim = node->mtime, + .st_ctim = node->ctime, +#endif + }; + + return 0; +} + +// +// Exported Utility Functions +// + +void load_file(char *path) { + FILE *fp = fopen("preloads.log", "a"); + + fprintf(fp, "\n[load_file] loading=%s\n", path); + + char split_path[4096]; + strcpy(split_path, path); + + for (char *p = split_path + 1; *p; p++) { + if (*p == '/') { + *p = '\0'; + int ret = imfs_mkdir(0, split_path, 0755); + *p = '/'; + fprintf(fp, "[load_file] mkdir=%d\n", ret); + } + } + + int imfs_fd = imfs_open(0, path, O_CREAT | O_WRONLY, 0777); + fprintf(fp, "[load_file] created file: %s\n", path); + + size_t size; + char *data = read_full_file(path, &size); + + imfs_write(0, imfs_fd, data, size); + free(data); + + imfs_close(0, imfs_fd); + fclose(fp); +} + +void dump_file(char *path, char *actual_path) { + char split_path[4096]; + strcpy(split_path, path); + + for (char *p = split_path + 1; *p; p++) { + if (*p == '/') { + *p = '\0'; + int ret = mkdir(split_path, 0755); + *p = '/'; + } + } + + int fd = open(actual_path, O_CREAT | O_WRONLY | O_TRUNC, 0777); + int ifd = imfs_open(0, path, O_RDONLY, 0); + + ssize_t nread; + char buf[1024]; + + while (1) { + nread = imfs_read(0, ifd, buf, 1024); + + if (nread <= 0) { + break; + } + + write(fd, buf, (size_t)nread); + } + + close(fd); + imfs_close(0, ifd); +} + +void preloads(const char *env) { + if (!env) { + fprintf(stderr, "no preloads.\n"); + return; + } + + char *list = strdup(env); + if (!list) { + return; + } + + fprintf(stderr, "Loading all files\n"); + char *line = strtok(list, ":"); + + FILE *fp = fopen("preloads.log", "a"); + + while (line) { + fprintf(fp, "Loading= %s\n", line); + + struct stat st; + if (stat(line, &st) < 0) { + line = strtok(NULL, ":"); + continue; + } + + if (strlen(line) > 0) { + if (S_ISREG(st.st_mode)) + load_file(line); + } + fprintf(fp, "Loaded {%s}\n", line); + line 
= strtok(NULL, ":"); + } + + fclose(fp); + free(list); +} + +void imfs_init(void) { + g_free_list_size = -1; + + for (int cage_id = 0; cage_id < MAX_PROCS; cage_id++) { + for (int i = 0; i < MAX_FDS; i++) { + g_fdtable[cage_id][i] = (FileDesc){ + .node = NULL, + .offset = 0, + }; + } + } + + for (int i = 0; i < MAX_NODES; i++) { + g_nodes[i] = (Node){ + .type = M_NON, + .index = i, + .in_use = 0, + .d_count = 0, + .total_size = 0, + .mode = 0, + }; + /* ensure dir children pointer is NULL and capacity zero */ + g_nodes[i].d_children = NULL; + g_nodes[i].d_capacity = 0; + } + + for (int i = 0; i < MAX_PROCS; i++) { + g_fd_free_list_size[i] = -1; + } + + for (int i = 0; i < MAX_PROCS; i++) { + g_next_fd[i] = 3; + } + + Node *root_node = imfs_create_node("/", M_DIR, 0755); + root_node->parent_idx = root_node->index; + + Node *dot = imfs_create_node(".", M_LNK, 0); + if (!dot) + exit(1); + dot->l_link = root_node; + + Node *dotdot = imfs_create_node("..", M_LNK, 0); + if (!dotdot) + exit(1); + + if (add_child(root_node, dot) != 0) + exit(1); + if (add_child(root_node, dotdot) != 0) + exit(1); + dotdot->l_link = root_node; + + g_root_node = &g_nodes[0]; +} + +// +// FS Entrypoints +// + +int imfs_fcntl(int cage_id, int fd, int op, int arg) { + FileDesc *fdesc = get_filedesc(cage_id, fd); + + if (!fdesc) { + return -1; + } + + switch (op) { + case F_GETFL: + return fdesc->flags; + default: + return -1; + } +} + +int imfs_openat(int cage_id, int dirfd, const char *path, int flags, + mode_t mode) { + if (!path) { + errno = EINVAL; + return -1; + } + + if (dirfd == -1) { + errno = EBADF; + return -EBADF; + } + + int count; + char namecomp[MAX_DEPTH][MAX_NODE_NAME]; + + split_path(path, &count, namecomp); + + char *filename = namecomp[count - 1]; + + Node *parent_node; + + parent_node = + imfs_find_node_namecomp(cage_id, dirfd, namecomp, count - 1); + + if (!parent_node || parent_node->type != M_DIR) { + errno = ENOTDIR; + return -ENOTDIR; + } + + Node *node = 
imfs_find_node(cage_id, dirfd, path); + + // New File + if (!node) { + if (!(flags & O_CREAT)) { + errno = ENOENT; + return -ENOENT; + } + + if (str_len(filename) > MAX_NODE_NAME - 1) { + errno = ENAMETOOLONG; + return -ENAMETOOLONG; + } + + if (str_len(filename) > 64) { + errno = ENAMETOOLONG; + return -ENAMETOOLONG; + } + + node = imfs_create_node(filename, M_REG, mode); + if (!node) { + return -ENOMEM; + } + + if (add_child(parent_node, node) != 0) { + errno = ENOMEM; + node->type = M_NON; + return -ENOMEM; + } + } else { + // File Exists + if (/*flags & O_EXCL ||*/ flags & O_CREAT) { + errno = EEXIST; + return -EEXIST; + } + + if (node->type == M_DIR && !(flags & O_DIRECTORY)) { + errno = EISDIR; + return -EISDIR; + } + + // Check for file access based on flags and mode. + + switch (O_ACCMODE & flags) { + case O_RDONLY: + if (!(node->mode & S_IRUSR)) { + errno = EACCES; + return -EACCES; + } + break; + case O_RDWR: + if (!(node->mode & S_IWUSR) || + !(node->mode & S_IRUSR)) { + errno = EACCES; + return -EACCES; + } + break; + case O_WRONLY: + if (!(node->mode & S_IWUSR)) { + errno = EACCES; + return -EACCES; + } + break; + default: + break; + } + } + + return imfs_allocate_fd(cage_id, node, flags); +} + +int imfs_open(int cage_id, const char *path, int flags, mode_t mode) { + return imfs_openat(cage_id, AT_FDCWD, path, flags, mode); +} + +int imfs_creat(int cage_id, const char *path, mode_t mode) { + return imfs_open(cage_id, path, O_WRONLY | O_CREAT | O_TRUNC, mode); +} + +int imfs_close(int cage_id, int fd) { + if (fd < 0 || fd >= MAX_FDS || !g_fdtable[cage_id][fd].node) { + errno = EBADF; + return -1; + } + + FileDesc *fdesc = get_filedesc(cage_id, fd); + fdesc->node->in_use--; + + if (fdesc->node->doomed) { + /* if it's a directory, free its children buffer */ + if (fdesc->node->type == M_DIR && fdesc->node->d_children) { + free(fdesc->node->d_children); + fdesc->node->d_children = NULL; + fdesc->node->d_capacity = 0; + } + fdesc->node->type = M_NON; + 
g_free_list[++g_free_list_size] = fdesc->node->index; + } + + g_fd_free_list[cage_id][++g_fd_free_list_size[cage_id]] = fd; + + // Reclaim anonymous pipe if both fd's are closed. + if (fdesc->node->type == M_PIP) { + if (!fdesc->node->p_pipe->readfd->status && + !fdesc->node->p_pipe->writefd->status) { + imfs_remove_pipe(fdesc->node); + } + } + + *fdesc = (FileDesc){ + .node = NULL, + .offset = 0, + .status = 0, + }; + + return 0; +} + +ssize_t imfs_write(int cage_id, int fd, const void *buf, size_t count) { + return imfs_new_write(cage_id, fd, buf, count, 0, 0); +} + +ssize_t imfs_pwrite(int cage_id, int fd, const void *buf, size_t count, + off_t offset) { + return imfs_new_write(cage_id, fd, buf, count, 1, offset); +} + +ssize_t imfs_writev(int cage_id, int fd, const struct iovec *iov, int count) { + return __imfs_writev(cage_id, fd, iov, count, 0, 0); +} + +ssize_t imfs_pwritev(int cage_id, int fd, const struct iovec *iov, int count, + off_t offset) { + return __imfs_writev(cage_id, fd, iov, count, offset, 1); +} + +ssize_t imfs_read(int cage_id, int fd, void *buf, size_t count) { + return imfs_new_read(cage_id, fd, buf, count, 0, 0); +} + +ssize_t imfs_pread(int cage_id, int fd, void *buf, size_t count, off_t offset) { + return imfs_new_read(cage_id, fd, buf, count, 1, offset); +} + +ssize_t imfs_readv(int cage_id, int fd, const struct iovec *iov, int count) { + return __imfs_readv(cage_id, fd, iov, count, 0, 0); +} + +ssize_t imfs_preadv(int cage_id, int fd, const struct iovec *iov, int count, + off_t offset) { + return __imfs_readv(cage_id, fd, iov, count, offset, 1); +} + +int imfs_mkdirat(int cage_id, int fd, const char *path, mode_t mode) { + if (!path) { + errno = EINVAL; + return -1; + } + + Node *parent; + + char namecomp[MAX_DEPTH][MAX_NODE_NAME]; + int count; + + split_path(path, &count, namecomp); + char *filename = namecomp[count - 1]; + + if (str_compare(filename, ".") || str_compare(filename, "..")) { + errno = EINVAL; + return -1; + } + + // 
Invalid path (parent doesn't exist) + parent = imfs_find_node_namecomp(cage_id, fd, namecomp, count - 1); + if (!parent) { + errno = EINVAL; + return -1; + } + + Node *node; + + // Invalid path (directory already exists) + node = imfs_find_node_namecomp(cage_id, fd, namecomp, count); + if (node) { + errno = EEXIST; + return -1; + } + + // Node creation failed + node = imfs_create_node(filename, M_DIR, mode); + if (!node) { + return -1; + } + + // Add new node to parent, and add . & .. to new node. + if (add_child(parent, node) != 0) { + errno = ENOMEM; + node->type = M_NON; + return -1; + } + + Node *dot = imfs_create_node(".", M_LNK, 0); + if (!dot) + return -1; + dot->l_link = node; + + Node *dotdot = imfs_create_node("..", M_LNK, 0); + if (!dotdot) + return -1; + + if (add_child(node, dot) != 0) + return -1; + if (add_child(node, dotdot) != 0) + return -1; + + dotdot->l_link = &g_nodes[node->parent_idx]; + + LOG("Created Node: \n"); + LOG("Index: %d \n", node->index); + LOG("Name: %s\n", node->name); + LOG("Type: %d\n", node->type); + + return 0; +} + +int imfs_mkdir(int cage_id, const char *path, mode_t mode) { + return imfs_mkdirat(cage_id, AT_FDCWD, path, mode); +} + +int imfs_linkat(int cage_id, int olddirfd, const char *oldpath, int newdirfd, + const char *newpath, int flags) { + Node *oldnode = imfs_find_node(cage_id, olddirfd, oldpath); + + if (!oldnode) { + errno = EINVAL; + return -1; + } + + char namecomp[MAX_DEPTH][MAX_NODE_NAME]; + int count; + + Node *newnode = imfs_find_node(cage_id, newdirfd, newpath); + if (newnode != NULL) { + errno = EINVAL; + return -1; + } + + split_path(newpath, &count, namecomp); + + char *filename = namecomp[count - 1]; + + Node *newnode_parent = + imfs_find_node_namecomp(cage_id, newdirfd, namecomp, count - 1); + newnode = imfs_create_node(filename, M_LNK, 0); + + newnode->l_link = oldnode; + + if (add_child(newnode_parent, newnode) != 0) { + errno = ENOMEM; + newnode->type = M_NON; + return -1; + } + + 
clock_gettime(CLOCK_REALTIME, &newnode->ctime); + + return 0; +} + +int imfs_link(int cage_id, const char *oldpath, const char *newpath) { + return imfs_linkat(cage_id, AT_FDCWD, oldpath, AT_FDCWD, newpath, 0); +} + +int imfs_symlink(int cage_id, const char *oldpath, const char *newpath) { + return imfs_linkat(cage_id, AT_FDCWD, oldpath, AT_FDCWD, newpath, 0); +} + +int imfs_rename(int cage_id, const char *oldpath, const char *newpath) { + int count; + char namecomp[MAX_DEPTH][MAX_NODE_NAME]; + + split_path(oldpath, &count, namecomp); + Node *current_node = + imfs_find_node_namecomp(cage_id, AT_FDCWD, namecomp, count); + if (!current_node) { + errno = ENOENT; + return -1; + } + + split_path(newpath, &count, namecomp); + Node *new_parent = + imfs_find_node_namecomp(cage_id, AT_FDCWD, namecomp, count - 1); + char *new_filename = namecomp[count - 1]; + if (!new_parent) { + errno = ENOENT; + return -1; + } + + str_ncopy(current_node->name, new_filename, MAX_NODE_NAME); + int length = str_len(new_filename); + current_node->name[length] = '\0'; + + // Remove node from old parent.
+ remove_child(current_node); + + // Add node to new parent + add_child(new_parent, current_node); + + return 0; +} + +int imfs_chown(int cage_id, const char *pathname, uid_t owner, gid_t group) { + Node *node = imfs_find_node(cage_id, AT_FDCWD, pathname); + if (!node) { + return -1; + } + + node->owner = owner; + node->group = group; + + clock_gettime(CLOCK_REALTIME, &node->ctime); + return 0; +} + +int imfs_chmod(int cage_id, const char *pathname, mode_t mode) { + Node *node = imfs_find_node(cage_id, AT_FDCWD, pathname); + + if (!node) { + errno = ENOENT; + return -1; + } + + node->mode = (node->mode & ~0777) | mode; + + return 0; +} + +int imfs_fchmod(int cage_id, int fd, mode_t mode) { + FileDesc *fdesc = get_filedesc(cage_id, fd); + + if (!fdesc || !fdesc->node) { + errno = ENOENT; + return -1; + } + + fdesc->node->mode = (fdesc->node->mode & ~0777) | mode; + + return 0; +} + +int imfs_remove(int cage_id, const char *pathname) { + Node *node = imfs_find_node(cage_id, AT_FDCWD, pathname); + + if (!node) { + errno = ENOENT; + return -1; + } + + switch (node->type) { + case M_DIR: + return imfs_remove_dir(node); + case M_LNK: + return imfs_remove_link(node); + case M_REG: + return imfs_remove_file(node); + default: + return 0; + } +} + +int imfs_rmdir(int cage_id, const char *pathname) { + return imfs_remove(cage_id, pathname); +} + +int imfs_unlink(int cage_id, const char *pathname) { + return imfs_remove(cage_id, pathname); +} + +off_t imfs_lseek(int cage_id, int fd, off_t offset, int whence) { + FileDesc *fdesc = get_filedesc(cage_id, fd); + + if (!fdesc->node) { + errno = EBADF; + return -1; + } + + off_t ret = fdesc->offset; + + // SEEK_HOLE and SEEK_DATA need to be reworked. 
Unclear as to what it is + // they do + switch (whence) { + case SEEK_SET: + ret = offset; + break; + case SEEK_CUR: + ret += offset; + break; + case SEEK_END: + ret = fdesc->node->total_size + offset; + break; +#ifdef _GNU_SOURCE + case SEEK_HOLE: + while (*(char *)(fdesc->node + ret)) { + ret++; + } + break; + case SEEK_DATA: + while (!*(char *)(fdesc->node + ret)) { + ret++; + } + break; +#endif + default: + // Unknown whence: fail with -1, leaving the offset untouched. + errno = EINVAL; + return -1; + } + + fdesc->offset = ret; + + return ret; +} + +int imfs_dup(int cage_id, int fd) { return imfs_dup_fd(cage_id, fd, -1); } + +int imfs_dup2(int cage_id, int oldfd, int newfd) { + return imfs_dup_fd(cage_id, oldfd, newfd); +} + +int imfs_lstat(int cage_id, const char *pathname, struct stat *statbuf) { + Node *node = imfs_find_node(cage_id, AT_FDCWD, pathname); + if (!node) { + errno = ENOENT; + return -1; + } + return __imfs_stat(cage_id, node, statbuf); +} + +int imfs_stat(int cage_id, const char *pathname, struct stat *statbuf) { + LOG("cage=%d pathname=%s\n", cage_id, pathname); + Node *node = imfs_find_node(cage_id, AT_FDCWD, pathname); + if (!node) { + errno = ENOENT; + return -1; + } + if (node->type == M_LNK) + return __imfs_stat(cage_id, node->l_link, statbuf); + return __imfs_stat(cage_id, node, statbuf); +} + +int imfs_fstat(int cage_id, int fd, struct stat *statbuf) { + FileDesc *fdesc = get_filedesc(cage_id, fd); + if (!fdesc || !fdesc->node) { + errno = EBADF; + return -1; + } + Node *node = fdesc->node; + if (node->type == M_LNK) + return __imfs_stat(cage_id, node->l_link, statbuf); + return __imfs_stat(cage_id, node, statbuf); +} + +I_DIR *imfs_opendir(int cage_id, const char *name) { + int fd = imfs_open(cage_id, name, O_DIRECTORY, 0); + if (fd < 0) + return NULL; + + // The stream must be heap-allocated; assigning through a NULL pointer + // is undefined behaviour. + I_DIR *dirstream = malloc(sizeof(I_DIR)); + if (!dirstream) { + imfs_close(cage_id, fd); + errno = ENOMEM; + return NULL; + } + + Node *node = get_filedesc(cage_id, fd)->node; + + *dirstream = (I_DIR){ + .fd = fd, + .node = node, + .size = 0, + .offset = 0, + .filepos = 0, + }; + + return dirstream; +} + +struct dirent *imfs_readdir(int cage_id, I_DIR *dirstream) { + struct dirent *ret = malloc(sizeof(struct dirent)); + + Node *dirnode = dirstream->node; + + if (dirstream->offset >= dirnode->d_count) { + free(ret); + return
NULL; + } + + // Next entry + + struct DirEnt nextentry = dirnode->d_children[dirstream->offset++]; + + int ino = nextentry.node->index; + int _type = nextentry.node->type; + size_t namelen = str_len(nextentry.name); + + *ret = (struct dirent){ + .d_ino = ino, // 8 + .d_reclen = 32, // 24 + // .d_namlen = namelen, // 32 + X + .d_type = _type, // 36 + X + }; + + str_ncopy(ret->d_name, nextentry.name, namelen); + ret->d_name[namelen] = '\0'; + + return ret; +} + +// pipe and pipe2 have only undergone limited testing. Since IMFS doesn't support +// multi-processing on native builds, these need to be tested out in Lind. +int imfs_pipe(int cage_id, int pipefd[2]) { + Node *pipenode = imfs_create_node("APIP", M_PIP, 0); + pipefd[0] = imfs_allocate_fd(cage_id, pipenode, O_RDONLY); + pipefd[1] = imfs_allocate_fd(cage_id, pipenode, O_WRONLY); + + pipenode->p_pipe = mmap(NULL, sizeof(Pipe), PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, -1, 0); + + pipenode->p_pipe->offset = 0; + pipenode->p_pipe->readfd = get_filedesc(cage_id, pipefd[0]); + pipenode->p_pipe->writefd = get_filedesc(cage_id, pipefd[1]); + + return 0; +} + +int imfs_pipe2(int cage_id, int pipefd[2], int flags) { + return imfs_pipe(cage_id, pipefd); +} + +int imfs_mkfifo(int cage_id, const char *pathname, mode_t mode) { + errno = EOPNOTSUPP; + return -1; +} + +int imfs_mknod(int cage_id, const char *pathname, mode_t mode, dev_t dev) { + errno = EOPNOTSUPP; + return -1; +} + +int imfs_bind(int cage_id, int sockfd, const struct sockaddr *addr, + socklen_t length) { + errno = EOPNOTSUPP; + return -1; +} + +int imfs_pathconf(int cage_id, const char *pathname, int name) { + return PC_CONSTS[name]; +} + +int imfs_fpathconf(int cage_id, int fd, int name) { return PC_CONSTS[name]; } + diff --git a/examples/imfs-grate/src/imfs.h b/examples/imfs-grate/src/imfs.h new file mode 100644 index 0000000..64a521d --- /dev/null +++ b/examples/imfs-grate/src/imfs.h @@ -0,0 +1,216 @@ + +#include +#include +#include +#include
+ +#include +#include + +#ifdef DIAG +#define LOG(...) printf(__VA_ARGS__) +#else +#define LOG(...) ((void)0) +#endif + +#define MAX_NODE_NAME 65 +#define MAX_NODE_SIZE 4096 +#define MAX_FDS 1024 +#define MAX_NODES 1024 +#define MAX_DEPTH 10 +#define MAX_PROCS 128 + +// These are stubs for the stat call, for now we return +// a constant. These can be reappropriated later. +#define GET_UID 501 +#define GET_GID 20 +#define GET_DEV 1 + +typedef struct Node Node; +typedef struct FileDesc FileDesc; +typedef struct Pipe Pipe; +typedef struct Chunk Chunk; + +// Used for pathconf(3) +static int PC_CONSTS[] = { + 0, + 10, + 10, + 10, + MAX_NODE_NAME - 1, // _PC_NAME_MAX + MAX_DEPTH *MAX_NODE_NAME, // _PC_PATH_MAX + 10, + 10, + 10, + 10, +}; + +typedef enum { + M_REG = S_IFREG, + M_DIR = S_IFDIR, + M_LNK = S_IFLNK, + M_PIP, + // Indicated free node + M_NON = 0, +} NodeType; + +#define d_children info.dir.children +#define d_count info.dir.count +#define d_capacity info.dir.capacity +#define l_link info.lnk.link +#define r_data info.reg.data +#define r_head info.reg.head +#define r_tail info.reg.tail +#define p_pipe info.pip.pipe + +typedef struct DirEnt { + char name[MAX_NODE_NAME]; + struct Node *node; +} DirEnt; + +typedef struct Node { + NodeType type; + int index; /* Index in the global g_nodes */ + + size_t total_size; /* Total size of a reg file */ + + char name[MAX_NODE_NAME]; /* File name */ + // struct Node *parent; /* Parent node */ + int parent_idx; + int in_use; /* Number of FD's attached to this node */ + int doomed; + mode_t mode; + + uid_t owner; + gid_t group; + + struct timespec atime; + struct timespec mtime; + struct timespec ctime; + struct timespec btime; + + union { + // M_REG + struct { + Chunk *head; /* First data node */ + Chunk *tail; /* Last data node */ + } reg; + + // M_LNK + struct { + struct Node *link; /* Point to linked node. */ + } lnk; + + // M_DIR + struct { + struct DirEnt *children; /* Directory contents (dynamically allocated). 
*/ + size_t count; /* len(children) including . and .. */ + size_t capacity; /* allocated capacity for children */ + } dir; + + // M_PIP + struct { + Pipe *pipe; + } pip; + } info; +} Node; + +typedef struct FileDesc { + int status; + int flags; + struct FileDesc *link; + Node *node; + int offset; /* How many bytes have been read. */ +} FileDesc; + +// This is an internal representation of the DIR* struct, +// the internal implementation of which changes quite often. +// We need this only to enable readdir() through opendir(). +typedef struct I_DIR { + int fd; + Node *node; + size_t size; + size_t offset; + off_t filepos; +} I_DIR; + +typedef struct Pipe { + FileDesc *readfd; + FileDesc *writefd; + char data[1024]; + off_t offset; +} Pipe; + +// Data for reg files is stored in Chunks of size 1024 bytes; these are +// connected through a linked list. +typedef struct Chunk { + char data[1024]; + size_t used; + Chunk *next; +} Chunk; + +int imfs_open(int cage_id, const char *path, int flags, mode_t mode); +int imfs_openat(int cage_id, int dirfd, const char *path, int flags, + mode_t mode); +int imfs_creat(int cage_id, const char *path, mode_t mode); +ssize_t imfs_read(int cage_id, int fd, void *buf, size_t count); +ssize_t imfs_write(int cage_id, int fd, const void *buf, size_t count); +int imfs_close(int cage_id, int fd); +int imfs_mkdir(int cage_id, const char *path, mode_t mode); +int imfs_mkdirat(int cage_id, int fd, const char *path, mode_t mode); +int imfs_rmdir(int cage_id, const char *path); +int imfs_remove(int cage_id, const char *path); +int imfs_link(int cage_id, const char *oldpath, const char *newpath); +int imfs_linkat(int cage_id, int olddirfd, const char *oldpath, int newdirfd, + const char *newpath, int flags); +int imfs_unlink(int cage_id, const char *path); +off_t imfs_lseek(int cage_id, int fd, off_t offset, int whence); +int imfs_dup(int cage_id, int oldfd); +int imfs_dup2(int cage_id, int oldfd, int newfd); + +ssize_t imfs_pwrite(int cage_id, int
fd, const void *buf, size_t count, + off_t offset); +ssize_t imfs_pread(int cage_id, int fd, void *buf, size_t count, off_t offset); + +int imfs_lstat(int cage_id, const char *pathname, struct stat *statbuf); +int imfs_stat(int cage_id, const char *pathname, struct stat *statbuf); +int imfs_fstat(int cage_id, int fd, struct stat *statbuf); + +I_DIR *imfs_opendir(int cage_id, const char *name); +struct dirent *imfs_readdir(int cage_id, I_DIR *dirstream); + +ssize_t imfs_readv(int cage_id, int fd, const struct iovec *iov, int count); +ssize_t imfs_preadv(int cage_id, int fd, const struct iovec *iov, int count, + off_t offset); +ssize_t imfs_writev(int cage_id, int fd, const struct iovec *iov, int count); +ssize_t imfs_pwritev(int cage_id, int fd, const struct iovec *iov, int count, + off_t offset); + +int imfs_symlink(int cage_id, const char *oldpath, const char *newpath); +int imfs_rename(int cage_id, const char *oldpath, const char *newpath); + +int imfs_chown(int cage_id, const char *pathname, uid_t owner, gid_t group); +int imfs_chmod(int cage_id, const char *pathname, mode_t mode); +int imfs_fchmod(int cage_id, int fd, mode_t mode); + +int imfs_mkfifo(int cage_id, const char *pathname, mode_t mode); +int imfs_mknod(int cage_id, const char *pathname, mode_t mode, dev_t dev); + +int imfs_bind(int cage_id, int sockfd, const struct sockaddr *addr, + socklen_t addrlen); + +int imfs_pathconf(int cage_id, const char *pathname, int name); +int imfs_fpathconf(int cage_id, int fd, int name); + +int imfs_pipe(int cage_id, int pipefd[2]); +int imfs_pipe2(int cage_id, int pipefd[2], int flags); + +int imfs_fcntl(int cage_id, int fd, int op, int arg); + +void imfs_copy_fd_tables(int srcfd, int dstfd); + +void preloads(const char *); +void load_file(char *); +void dump_file(char *, char *); + +void imfs_init(); diff --git a/examples/imfs-grate/src/imfs_grate.c b/examples/imfs-grate/src/imfs_grate.c new file mode 100644 index 0000000..d1fa868 --- /dev/null +++ 
b/examples/imfs-grate/src/imfs_grate.c @@ -0,0 +1,343 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "imfs.h" + +// Dispatcher function +int pass_fptr_to_wt(uint64_t fn_ptr_uint, uint64_t cageid, uint64_t arg1, + uint64_t arg1cage, uint64_t arg2, uint64_t arg2cage, + uint64_t arg3, uint64_t arg3cage, uint64_t arg4, + uint64_t arg4cage, uint64_t arg5, uint64_t arg5cage, + uint64_t arg6, uint64_t arg6cage) { + if (fn_ptr_uint == 0) { + return -1; + } + + int (*fn)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, + uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, + uint64_t) = + (int (*)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, + uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, + uint64_t))(uintptr_t)fn_ptr_uint; + + return fn(cageid, arg1, arg1cage, arg2, arg2cage, arg3, arg3cage, arg4, + arg4cage, arg5, arg5cage, arg6, arg6cage); +} + +// Stores list of files on host that need to be copied in before cage execution. +const char *preload_files; + +// This is a util function to enable logging syscall invocations along with inputs and return value. 
+static inline void sys_log_args(const char *name, uint64_t arg1, uint64_t arg2, + uint64_t arg3, uint64_t arg4, uint64_t arg5, + uint64_t arg6, int ret) { + char buf[512]; + size_t pos = 0; + + pos += snprintf(buf + pos, sizeof(buf) - pos, "%s (", name); + + uint64_t args[6] = {arg1, arg2, arg3, arg4, arg5, arg6}; + int first = 1; + + for (int i = 0; i < 6; i++) { + // 0xdeadbeefdeadbeef marks an unused argument slot; skip it. + if (args[i] == 0xdeadbeefdeadbeefULL) + continue; + + if (!first) + pos += snprintf(buf + pos, sizeof(buf) - pos, ", "); + + // Cast so the argument matches %llu on every platform. + pos += snprintf(buf + pos, sizeof(buf) - pos, "%llu", + (unsigned long long)args[i]); + first = 0; + } + + snprintf(buf + pos, sizeof(buf) - pos, ") = %d\n", ret); + + fprintf(stderr, "%s", buf); +} + +#define SYS_LOG(name, ret) \ + sys_log_args((name), arg1, arg2, arg3, arg4, arg5, arg6, (ret)) + +/* + * These functions are the wrappers for FS related syscalls. + * + * IMFS registers the open, lseek, read, write, close, fcntl, and unlink syscalls. +*/ + +int open_grate(uint64_t cageid, uint64_t arg1, uint64_t arg1cage, uint64_t arg2, + uint64_t arg2cage, uint64_t arg3, uint64_t arg3cage, + uint64_t arg4, uint64_t arg4cage, uint64_t arg5, + uint64_t arg5cage, uint64_t arg6, uint64_t arg6cage) { + int thiscage = getpid(); + + // Copy the pathname string into the grate's memory. + char *pathname = malloc(256); + + if (pathname == NULL) { + perror("malloc failed"); + exit(EXIT_FAILURE); + } + + // This is an API provided by `lind_syscall.h` which is used to copy buffers + // from one cage's memory to another's. + // + // This is useful for syscall wrappers where arguments passed by reference + // must be copied into the grate before the operation and copied back to the cage + // afterward. + // + // Syntax: + // + // copy_data_between_cages( + // thiscage, ID of the cage initiating the call. + // srcaddr, Virtual address in srccage where the data starts. + // srccage, Cage that owns the source data. + // destaddr, Destination virtual address in destcage. + // destcage, Cage that will receive the copied data.
+ // len, Number of bytes to copy for memcpy mode. + // copytype, Type of copy: 0 = raw (memcpy), 1 = bounded string (strncpy). + // ); + copy_data_between_cages(thiscage, arg1cage, arg1, arg1cage, + (uint64_t)pathname, thiscage, 256, 1); + + // Call imfs_open() from the IMFS library + int ifd = imfs_open(cageid, pathname, arg2, arg3); + + SYS_LOG("OPEN", ifd); + + free(pathname); + return ifd; +} + +int fcntl_grate(uint64_t cageid, uint64_t arg1, uint64_t arg1cage, + uint64_t arg2, uint64_t arg2cage, uint64_t arg3, + uint64_t arg3cage, uint64_t arg4, uint64_t arg4cage, + uint64_t arg5, uint64_t arg5cage, uint64_t arg6, + uint64_t arg6cage) { + int ret = imfs_fcntl(cageid, arg1, arg2, arg3); + SYS_LOG("FCNTL", ret); + return ret; +} + +int unlink_grate(uint64_t cageid, uint64_t arg1, uint64_t arg1cage, + uint64_t arg2, uint64_t arg2cage, uint64_t arg3, + uint64_t arg3cage, uint64_t arg4, uint64_t arg4cage, + uint64_t arg5, uint64_t arg5cage, uint64_t arg6, + uint64_t arg6cage) { + int thiscage = getpid(); + + char *pathname = malloc(256); + + if (pathname == NULL) { + perror("malloc failed"); + exit(EXIT_FAILURE); + } + + copy_data_between_cages(thiscage, arg1cage, arg1, arg1cage, + (uint64_t)pathname, thiscage, 256, 1); + + int ret = imfs_unlink(cageid, pathname); + + SYS_LOG("UNLINK", ret); + + // Release the temporary copy of the path, as open_grate does. + free(pathname); + return ret; +} + +int close_grate(uint64_t cageid, uint64_t arg1, uint64_t arg1cage, + uint64_t arg2, uint64_t arg2cage, uint64_t arg3, + uint64_t arg3cage, uint64_t arg4, uint64_t arg4cage, + uint64_t arg5, uint64_t arg5cage, uint64_t arg6, + uint64_t arg6cage) { + int ret = imfs_close(cageid, arg1); + SYS_LOG("CLOSE", ret); + return ret; +} + +off_t lseek_grate(uint64_t cageid, uint64_t arg1, uint64_t arg1cage, + uint64_t arg2, uint64_t arg2cage, uint64_t arg3, + uint64_t arg3cage, uint64_t arg4, uint64_t arg4cage, + uint64_t arg5, uint64_t arg5cage, uint64_t arg6, + uint64_t arg6cage) { + int thiscage = getpid(); + + int fd = arg1; + off_t offset = (off_t)arg2; + int
whence = (int)arg3; + + off_t ret = imfs_lseek(cageid, fd, offset, whence); + + SYS_LOG("LSEEK", ret); + + return ret; +} + +// Read: Copy memory from grate to cage. +// Write: Copy memory from cage to grate. +int read_grate(uint64_t cageid, uint64_t arg1, uint64_t arg1cage, uint64_t arg2, + uint64_t arg2cage, uint64_t arg3, uint64_t arg3cage, + uint64_t arg4, uint64_t arg4cage, uint64_t arg5, + uint64_t arg5cage, uint64_t arg6, uint64_t arg6cage) { + int thiscage = getpid(); + + int fd = (int)arg1; + int count = (size_t)arg3; + + ssize_t ret = 4321; + + char *buf = malloc(count); + + if (buf == NULL) { + fprintf(stderr, "Malloc failed"); + exit(1); + } + + ret = imfs_read(cageid, arg1, buf, count); + // Sometimes read() is called with a NULL buffer, do not call cp_data in + // that case. Only the bytes actually read are copied back, so a failed or + // short read never exposes uninitialized grate memory to the cage. + if (arg2 != 0 && ret > 0) { + copy_data_between_cages( + thiscage, arg2cage, (uint64_t)buf, thiscage, arg2, arg2cage, + ret, + 0 // Use copytype 0 so exactly ret bytes are copied + // instead of stopping at '\0' + ); + } + + SYS_LOG("READ", ret); + + free(buf); + + return ret; +} + +int write_grate(uint64_t cageid, uint64_t arg1, uint64_t arg1cage, + uint64_t arg2, uint64_t arg2cage, uint64_t arg3, + uint64_t arg3cage, uint64_t arg4, uint64_t arg4cage, + uint64_t arg5, uint64_t arg5cage, uint64_t arg6, + uint64_t arg6cage) { + int thiscage = getpid(); + int count = arg3; + int ret = 1604; + + char *buffer = malloc(count); + + if (buffer == NULL) { + perror("malloc failed."); + exit(1); + } + + copy_data_between_cages(thiscage, arg2cage, arg2, arg2cage, + (uint64_t)buffer, thiscage, count, 0); + + // stdin, stdout and stderr are not managed by IMFS; pass them through. + if (arg1 < 3) { + ret = write(arg1, buffer, count); + free(buffer); + return ret; + } + + ret = imfs_write(cageid, arg1, buffer, count); + free(buffer); + + SYS_LOG("WRITE", ret); + + return ret; +} + +int main(int argc, char *argv[]) { + // Should be at least two inputs (at least one grate file and one cage + // file) + if (argc < 2) { + fprintf(stderr, "Usage: %s \n", argv[0]); + exit(EXIT_FAILURE); + } + + // Create a
semaphore to synchronize the grate and cage lifecycles. + // + // In this model, we call register_handler on the desired syscalls from the + // grate rather than the newly forked child process. + // + // We use an unnamed semaphore to ensure that the cage only calls exec once the + // grate has completed the necessary setup. + sem_t *sem = mmap(NULL, sizeof(*sem), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0); + sem_init(sem, 1, 0); + + int grateid = getpid(); + + // Initialize imfs data structures. + imfs_init(); + + // Load files into memory before execution + preload_files = getenv("PRELOADS"); + preloads(preload_files); + + pid_t cageid = fork(); + if (cageid < 0) { + perror("fork failed"); + exit(EXIT_FAILURE); + } else if (cageid == 0) { + // Wait for grate to complete setup actions. + sem_wait(sem); + + if (execv(argv[1], &argv[1]) == -1) { + perror("execv failed"); + exit(EXIT_FAILURE); + } + } + int ret; + uint64_t fn_ptr_addr; + + // OPEN + fn_ptr_addr = (uint64_t)(uintptr_t)&open_grate; + ret = register_handler(cageid, 2, 1, grateid, fn_ptr_addr); + + // LSEEK + fn_ptr_addr = (uint64_t)(uintptr_t)&lseek_grate; + ret = register_handler(cageid, 8, 1, grateid, fn_ptr_addr); + + // READ + fn_ptr_addr = (uint64_t)(uintptr_t)&read_grate; + ret = register_handler(cageid, 0, 1, grateid, fn_ptr_addr); + + // WRITE + fn_ptr_addr = (uint64_t)(uintptr_t)&write_grate; + ret = register_handler(cageid, 1, 1, grateid, fn_ptr_addr); + + // CLOSE + fn_ptr_addr = (uint64_t)(uintptr_t)&close_grate; + ret = register_handler(cageid, 3, 1, grateid, fn_ptr_addr); + + // FCNTL + fn_ptr_addr = (uint64_t)(uintptr_t)&fcntl_grate; + ret = register_handler(cageid, 72, 1, grateid, fn_ptr_addr); + + // UNLINK + fn_ptr_addr = (uint64_t)(uintptr_t)&unlink_grate; + ret = register_handler(cageid, 87, 1, grateid, fn_ptr_addr); + + // Notify cage that it can proceed with execution. 
+ sem_post(sem); + + int status; + int w; + while (1) { + w = wait(&status); + if (w > 0) { + printf("[Grate] terminated, status: %d\n", status); + break; + } else if (w < 0) { + perror("[Grate] [Wait]"); + } + } + + // Clean up the semaphore once the cage has exited. + sem_destroy(sem); + munmap(sem, sizeof(*sem)); + + return 0; +} diff --git a/examples/imfs-grate/tests/imfs-test.c b/examples/imfs-grate/tests/imfs-test.c new file mode 100644 index 0000000..83432e2 --- /dev/null +++ b/examples/imfs-grate/tests/imfs-test.c @@ -0,0 +1,521 @@ +#include +#include +#include +#include +#include +#include + +#define PASS() \ + do { \ + printf("PASS: %s\n", __func__); \ + total_tests++; \ + } while (0) + +#define FAIL(msg) \ + do { \ + printf("FAIL: %s - %s\n", __func__, msg); \ + failures++; \ + return 1; \ + } while (0) + +static int total_tests = 0; +static int failures = 0; + +int test_basic_write_read() { + int fd = open("test1.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open create"); + + char wbuf[] = "Hello"; + if (write(fd, wbuf, 5) != 5) + FAIL("write"); + if (close(fd) != 0) + FAIL("close after write"); + + fd = open("test1.txt", O_RDONLY); + if (fd < 0) + FAIL("open readonly"); + + char rbuf[6] = {0}; + if (read(fd, rbuf, 5) != 5) + FAIL("read"); + if (strcmp(rbuf, "Hello") != 0) + FAIL("data mismatch"); + if (close(fd) != 0) + FAIL("close after read"); + + unlink("test1.txt"); + PASS(); + return 0; +} + +int test_multiple_writes() { + int fd = open("test2.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + if (write(fd, "First ", 6) != 6) + FAIL("write 1"); + if (write(fd, "Second ", 7) != 7) + FAIL("write 2"); + if (write(fd, "Third", 5) != 5) + FAIL("write 3"); + close(fd); + + fd = open("test2.txt", O_RDONLY); + char buf[20] = {0}; + if (read(fd, buf, 18) != 18) + FAIL("read"); + if (strcmp(buf, "First Second Third") != 0) + FAIL("content mismatch"); + close(fd); + + unlink("test2.txt"); + PASS(); + return 0; +} + 
+int test_partial_reads() { + int fd = open("test3.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + char data[] = "0123456789"; + if (write(fd, data, 10) != 10) + FAIL("write"); + close(fd); + + fd = open("test3.txt", O_RDONLY); + char buf1[5], buf2[5], buf3[5]; + if (read(fd, buf1, 3) != 3) + FAIL("read 1"); + if (read(fd, buf2, 4) != 4) + FAIL("read 2"); + if (read(fd, buf3, 3) != 3) + FAIL("read 3"); + + if (memcmp(buf1, "012", 3) != 0) + FAIL("chunk 1"); + if (memcmp(buf2, "3456", 4) != 0) + FAIL("chunk 2"); + if (memcmp(buf3, "789", 3) != 0) + FAIL("chunk 3"); + close(fd); + + unlink("test3.txt"); + PASS(); + return 0; +} + +int test_read_past_eof() { + int fd = open("test4.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + write(fd, "short", 5); + close(fd); + + fd = open("test4.txt", O_RDONLY); + char buf[100]; + ssize_t n = read(fd, buf, 100); + if (n != 5) + FAIL("read past EOF should return actual bytes"); + + n = read(fd, buf, 100); + if (n != 0) + FAIL("read at EOF should return 0"); + close(fd); + + unlink("test4.txt"); + PASS(); + return 0; +} + +int test_write_expands_file() { + int fd = open("test5.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + write(fd, "ABC", 3); + write(fd, "DEF", 3); + write(fd, "GHI", 3); + close(fd); + + fd = open("test5.txt", O_RDONLY); + char buf[10] = {0}; + if (read(fd, buf, 9) != 9) + FAIL("read expanded file"); + if (strcmp(buf, "ABCDEFGHI") != 0) + FAIL("expanded file content"); + close(fd); + + unlink("test5.txt"); + PASS(); + return 0; +} + +int test_lseek_seek_set() { + int fd = open("test6.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + write(fd, "0123456789", 10); + + if (lseek(fd, 0, SEEK_SET) != 0) + FAIL("lseek to start"); + char buf[3]; + read(fd, buf, 2); + if (memcmp(buf, "01", 2) != 0) + FAIL("read from start"); + + if (lseek(fd, 5, SEEK_SET) != 5) + FAIL("lseek to offset 5"); + read(fd, buf, 2); + if 
(memcmp(buf, "56", 2) != 0) + FAIL("read from offset 5"); + + close(fd); + unlink("test6.txt"); + PASS(); + return 0; +} + +int test_lseek_seek_cur() { + int fd = open("test7.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + write(fd, "0123456789", 10); + lseek(fd, 0, SEEK_SET); + + read(fd, NULL, 3); // advance to position 3 + + if (lseek(fd, 2, SEEK_CUR) != 5) + FAIL("lseek forward from current"); + char buf[3]; + read(fd, buf, 2); + if (memcmp(buf, "56", 2) != 0) + FAIL("read after seek_cur forward"); + + if (lseek(fd, -4, SEEK_CUR) != 3) + FAIL("lseek backward from current"); + read(fd, buf, 2); + if (memcmp(buf, "34", 2) != 0) + FAIL("read after seek_cur backward"); + + close(fd); + unlink("test7.txt"); + PASS(); + return 0; +} + +int test_lseek_seek_end() { + int fd = open("test8.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + write(fd, "0123456789", 10); + + if (lseek(fd, 0, SEEK_END) != 10) + FAIL("lseek to end"); + if (lseek(fd, -3, SEEK_END) != 7) + FAIL("lseek from end"); + + char buf[4]; + read(fd, buf, 3); + if (memcmp(buf, "789", 3) != 0) + FAIL("read from end offset"); + + close(fd); + unlink("test8.txt"); + PASS(); + return 0; +} + +int test_lseek_beyond_eof() { + int fd = open("test9.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + write(fd, "data", 4); + + if (lseek(fd, 10, SEEK_SET) != 10) + FAIL("lseek beyond EOF"); + if (write(fd, "X", 1) != 1) + FAIL("write after seek beyond EOF"); + + close(fd); + + fd = open("test9.txt", O_RDONLY); + char buf[12]; + ssize_t n = read(fd, buf, 11); + if (n != 11) + FAIL("read file with hole"); + if (memcmp(buf, "data", 4) != 0) + FAIL("data before hole"); + // Check that bytes 4-9 are zeros (the hole) + for (int i = 4; i < 10; i++) { + if (buf[i] != 0) + FAIL("hole not zero-filled"); + } + if (buf[10] != 'X') + FAIL("data after hole"); + + close(fd); + unlink("test9.txt"); + PASS(); + return 0; +} + +int test_overwrite_data() { + int 
fd = open("test10.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + write(fd, "AAAAAAAAAA", 10); + lseek(fd, 3, SEEK_SET); + write(fd, "BBBB", 4); + + close(fd); + + fd = open("test10.txt", O_RDONLY); + char buf[11] = {0}; + read(fd, buf, 10); + if (strcmp(buf, "AAABBBBAAA") != 0) + FAIL("overwritten data mismatch"); + + close(fd); + unlink("test10.txt"); + PASS(); + return 0; +} + +int test_append_mode() { + int fd = open("test11.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + write(fd, "Initial", 7); + close(fd); + + fd = open("test11.txt", O_WRONLY | O_APPEND); + if (fd < 0) + FAIL("open append"); + write(fd, " Data", 5); + close(fd); + + fd = open("test11.txt", O_RDONLY); + char buf[20] = {0}; + read(fd, buf, 12); + if (strcmp(buf, "Initial Data") != 0) + FAIL("append content"); + close(fd); + + unlink("test11.txt"); + PASS(); + return 0; +} + +int test_append_ignores_lseek() { + int fd = open("test12.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + write(fd, "12345", 5); + close(fd); + + fd = open("test12.txt", O_WRONLY | O_APPEND); + if (fd < 0) + FAIL("open append"); + + lseek(fd, 0, SEEK_SET); // Try to seek to beginning + write(fd, "67890", 5); // Should still append to end + close(fd); + + fd = open("test12.txt", O_RDONLY); + char buf[11] = {0}; + read(fd, buf, 10); + if (strcmp(buf, "1234567890") != 0) + FAIL("append should ignore lseek"); + close(fd); + + unlink("test12.txt"); + PASS(); + return 0; +} + +int test_empty_file() { + int fd = open("test13.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + close(fd); + + fd = open("test13.txt", O_RDONLY); + char buf[10]; + ssize_t n = read(fd, buf, 10); + if (n != 0) + FAIL("read from empty file should return 0"); + close(fd); + + unlink("test13.txt"); + PASS(); + return 0; +} + +int test_write_zero_bytes() { + int fd = open("test14.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + ssize_t 
n = write(fd, "data", 0); + if (n != 0) + FAIL("write 0 bytes should return 0"); + + close(fd); + + fd = open("test14.txt", O_RDONLY); + char buf[10]; + n = read(fd, buf, 10); + if (n != 0) + FAIL("file should be empty after 0-byte write"); + close(fd); + + unlink("test14.txt"); + PASS(); + return 0; +} + +int test_multiple_open_same_file() { + int fd1 = open("test15.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd1 < 0) + FAIL("open 1"); + + int fd2 = open("test15.txt", O_RDWR); + if (fd2 < 0) + FAIL("open 2"); + + write(fd1, "AAA", 3); + write(fd2, "BBB", 3); + + close(fd1); + close(fd2); + + int fd = open("test15.txt", O_RDONLY); + char buf[7] = {0}; + read(fd, buf, 6); + + if (strcmp(buf, "AAABBB") != 0 && strcmp(buf, "BBB") != 0) { + FAIL("unexpected content with multiple fds"); + } + close(fd); + + unlink("test15.txt"); + PASS(); + return 0; +} + +int test_rdonly_write_fails() { + int fd = open("test16.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open create"); + write(fd, "data", 4); + close(fd); + + fd = open("test16.txt", O_RDONLY); + if (fd < 0) + FAIL("open rdonly"); + + ssize_t n = write(fd, "x", 1); + if (n >= 0) + FAIL("write to O_RDONLY should fail"); + + close(fd); + unlink("test16.txt"); + PASS(); + return 0; +} + +int test_wronly_read_fails() { + int fd = open("test17.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644); + if (fd < 0) + FAIL("open wronly"); + + char buf[10]; + ssize_t n = read(fd, buf, 10); + if (n >= 0) + FAIL("read from O_WRONLY should fail"); + + close(fd); + unlink("test17.txt"); + PASS(); + return 0; +} + +int test_large_write_read() { + int fd = open("test18.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); + if (fd < 0) + FAIL("open"); + + char wbuf[4096]; + for (int i = 0; i < 4096; i++) { + wbuf[i] = (char)(i % 256); + } + + if (write(fd, wbuf, 4096) != 4096) + FAIL("large write"); + close(fd); + + fd = open("test18.txt", O_RDONLY); + + char rbuf[4096]; + + if (read(fd, rbuf, 4096) != 4096) + FAIL("large read"); + + if 
(memcmp(wbuf, rbuf, 4096) != 0) + FAIL("large data mismatch"); + close(fd); + + unlink("test18.txt"); + PASS(); + return 0; +} + +int main(int argc, char *argv[]) { + // Uses the file-scope counters updated by PASS() and FAIL(); a local + // `failures` here would shadow the global and always report success. + + // Basic operations + + test_basic_write_read(); + test_multiple_writes(); + test_partial_reads(); + test_read_past_eof(); + test_write_expands_file(); + + // lseek tests + test_lseek_seek_set(); + test_lseek_seek_cur(); + test_lseek_seek_end(); + test_lseek_beyond_eof(); + + // Overwrite and append + test_overwrite_data(); + test_append_mode(); + test_append_ignores_lseek(); + + // Edge cases + test_empty_file(); + test_write_zero_bytes(); + test_multiple_open_same_file(); + + // Error conditions + test_rdonly_write_fails(); + test_wronly_read_fails(); + + // Large data + test_large_write_read(); + + // PASS() increments total_tests and FAIL() increments failures, so the + // number of tests run is their sum. + printf("\n====================================\n"); + printf("%d/%d Tests Passed.\n", total_tests, total_tests + failures); + printf("====================================\n"); + + return failures > 0 ? 1 : 0; +}
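
Review note on the test summary above: the pass/fail arithmetic is easier to keep consistent when each test is counted once in the runner instead of inside `PASS()` and `FAIL()`. The following is a minimal sketch of that pattern and is not part of the diff; `RUN_TEST`, `tests_run`, `tests_failed`, `run_all`, and the two sample tests are hypothetical names introduced only for illustration.

```c
#include <stdio.h>

/* Hypothetical counters: every test is counted exactly once, by the runner. */
static int tests_run = 0;
static int tests_failed = 0;

/* Runs one test function; a non-zero return value is treated as a failure. */
#define RUN_TEST(fn)                                                           \
	do {                                                                   \
		tests_run++;                                                   \
		if ((fn)() != 0) {                                             \
			tests_failed++;                                        \
			printf("FAIL: %s\n", #fn);                             \
		} else {                                                       \
			printf("PASS: %s\n", #fn);                             \
		}                                                              \
	} while (0)

/* Placeholder tests standing in for the real open/read/write cases. */
static int sample_passing_test(void) { return 0; }
static int sample_failing_test(void) { return 1; }

/* Runs every test and returns the process exit status (0 = all passed). */
int run_all(void) {
	RUN_TEST(sample_passing_test);
	RUN_TEST(sample_failing_test);

	printf("%d/%d Tests Passed.\n", tests_run - tests_failed, tests_run);
	return tests_failed > 0 ? 1 : 0;
}
```

With this shape, `main` would only need to `return run_all();`, and the individual tests can report failure through their return value alone.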