[RFC] kernel/process: switch ProcessId to use u64 unique identifier #4777
lschuermann wants to merge 1 commit into master from
Conversation
This switches the unique numeric identifier of a process instance within the ProcessId type (colloquially also called the process ID) to be an 8-byte unsigned integer instead of usize. Additionally, it changes the kernel to panic on the off chance that the counter used for assigning new process IDs rolls over.

This change is motivated by recent discussions around new IPC mechanisms. These mechanisms consist of a "discovery" phase, where applications search for, and potentially authenticate, a given process, and a "communication" phase, where they actually exchange data between processes. The discovery phase emits a unique process handle that a subsequent communication phase will use to uniquely identify a previously discovered and authenticated process. Virtually the only good candidate for such a handle on a Tock system is the numeric process ID, as it refers to a particular instance of an application and does not require us to keep track of extra state or mappings.

However, depending on which architecture we are running on, this numeric process ID is generated from a counter that is either 4 or 8 bytes wide (`usize`). While Tock systems are generally limited in the number of applications they can concurrently run, this identifier refers to an _instance_ of a process. As such, the counter can be controlled and increased by processes themselves. If this counter is based on a 4-byte value, it is unlikely, but not impossible, that malicious or colluding processes could force it to overflow to a known, previously assigned value within a somewhat reasonable timeframe (multiple days). We should not prevent such overflows by panicking the kernel or by preventing process restarts, as that endangers the availability of the system, especially when deployed in critical and long-running scenarios.
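To put the two counter widths in perspective, here is a rough back-of-the-envelope calculation. The sustained rate of 1,000 process restarts per second is an assumption chosen purely for illustration, not a measured number:

```rust
// Hypothetical sustained adversarial restart rate (illustrative only).
const RESTARTS_PER_SEC: u64 = 1_000;

/// Days until a 4-byte counter wraps at the assumed restart rate.
fn days_until_wrap_u32() -> u64 {
    (u32::MAX as u64 / RESTARTS_PER_SEC) / 86_400
}

/// Years until an 8-byte counter wraps at the assumed restart rate.
fn years_until_wrap_u64() -> u64 {
    (u64::MAX / RESTARTS_PER_SEC) / (86_400 * 365)
}

fn main() {
    // A 4-byte counter wraps within weeks under sustained restarts...
    println!("u32 wraps after ~{} days", days_until_wrap_u32());
    // ...while an 8-byte counter outlives any plausible deployment.
    println!("u64 wraps after ~{} years", years_until_wrap_u64());
}
```

At this assumed rate, a `u32` counter wraps after about 49 days, while a `u64` counter would take roughly 585 million years, which is what makes panic-on-overflow acceptable for the 8-byte case.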
At the same time, while unlikely, explicitly permitting counter overflows forces us to think about, model, and either handle or explicitly ignore many types of attacks (e.g., confused deputy) that could arise from such a forced counter overflow. This affects not just IPC, but any kernel or userspace subsystem that explicitly or implicitly relies on process ID uniqueness.

This PR instead proposes to make this numeric counter consistent across platforms and debug/release builds: it is always an 8-byte unsigned integer, and in contrast to the previous implementation (which only panics on debug builds), an overflow of this counter always panics the kernel. However, given the astronomical size of this counter, it is virtually impossible that a correct kernel implementation will ever reach this case.

This does come with some drawbacks. I don't believe performance will be impacted significantly (comparisons and increments of 64-bit numbers are decently efficient even on 32-bit systems). However, it does add a single word of memory everywhere a process ID is used. It also forces backwards-incompatible changes to a few userspace APIs (legacy IPC being a notable example, although we're in the process of removing that).
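As a rough sketch of the proposed scheme (not the actual patch; all names here are illustrative), an always-64-bit, panic-on-overflow identifier counter might look like this:

```rust
/// Sketch: a process-identifier counter that is always 64 bits wide,
/// independent of the platform's `usize` width, and that panics on the
/// (virtually unreachable) overflow in both debug and release builds.
struct ProcessIdentifierCounter {
    next: u64,
}

impl ProcessIdentifierCounter {
    const fn new() -> Self {
        ProcessIdentifierCounter { next: 0 }
    }

    /// Hand out the next unique process identifier.
    fn next_id(&mut self) -> u64 {
        let id = self.next;
        // `checked_add` + `expect` panics in debug *and* release builds,
        // unlike a plain `+`, which silently wraps in release builds.
        self.next = self
            .next
            .checked_add(1)
            .expect("process identifier counter overflow");
        id
    }
}

fn main() {
    let mut counter = ProcessIdentifierCounter::new();
    let first = counter.next_id();
    let second = counter.next_id();
    // Identifiers are strictly increasing and never reused.
    assert!(second > first);
}
```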
It's not clear to me why an app would want to use IPC to communicate with a process and not an application. As an app author, I would want to communicate with a service, and I don't care if the service restarts.
I'm surprised you didn't mention this. I'm wary of adding another state variable in Tock that has security implications.
If the service restarts, everything you previously coordinated with it is gone and you've got to restart. So I'd argue that you definitely care. Similarly, if a client restarts, a response destined for it is no longer relevant.
Well, that's if the service makes it my problem to care. And maybe I get an error that I need to handle, and the only way to handle that error is to re-initialize, etc. That's OK, I suppose. Maybe I was too quick; we could have services, apps, or processes. Right now we have services, which I think makes sense. I can also see using apps. I don't quite see wanting to use processes, at least not in the Tock use case.
We do have a policy about what happens on an app crash, and a system worried about this should just not keep restarting any app that crashes too many times. |
I do think it might make sense to standardize this, and I would choose a
Echoing @brghena's point here: we're interested in identifying a particular process instance to keep state related to this instance. While initially motivated by IPC, we also use ProcessId for these purposes in the kernel (e.g., as part of grants).
However, for many use-cases (including IPC), ProcessId instead is a handle to a given process instance, and as such is somewhat orthogonal to application IDs. Without going too much into the details of how new IPC systems should work in Tock, they will likely use both AppId and ProcessId. Specifically, they will want to have one stable, persistent, and authenticated identifier uniquely associated with a given application, and one identifier that can be used for actual transactions, referring to a particular process instance, in a way that is efficient to look up.
This is exactly what I am trying to bring to our attention here. I'm worried that we are currently using ProcessId assuming uniqueness properties that don't actually exist. Such a mismatch commonly leads to serious security vulnerabilities known as "confused deputy attacks". The current (and likely future) IPC mechanisms are a great example of this, and I'm almost certain we would find others in the kernel. This PR is trying to rectify this mismatch between assumed and actual guarantees. Notably, other larger kernels (like Linux) don't have this issue and can recycle PIDs, because they keep extra, unique state around for tracking shared IPC resources between processes. Naïvely, this is in conflict with Tock's heapless kernel architecture.
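To make the uniqueness concern concrete, here is a simplified sketch (illustrative names, not Tock's actual API) of how a unique per-instance identifier lets a lookup reject a stale handle, whereas a reused slot index or recycled identifier could silently resolve to a different process instance:

```rust
// Illustrative sketch of a process handle carrying a unique instance
// identifier alongside a reusable table index.
#[derive(Clone, Copy)]
struct ProcessId {
    index: usize,    // slot in the process table; reused across restarts
    identifier: u64, // unique per process *instance*; never reused
}

struct ProcessSlot {
    identifier: u64,
}

fn lookup(table: &[Option<ProcessSlot>], id: ProcessId) -> Option<&ProcessSlot> {
    // The slot index alone is ambiguous: after a restart, the same slot may
    // hold a new instance. Comparing the unique identifier catches this.
    table
        .get(id.index)?
        .as_ref()
        .filter(|slot| slot.identifier == id.identifier)
}

fn main() {
    // Slot 0 currently holds instance 7; instance 3 ran there earlier.
    let table = [Some(ProcessSlot { identifier: 7 })];
    let stale = ProcessId { index: 0, identifier: 3 };
    let fresh = ProcessId { index: 0, identifier: 7 };
    assert!(lookup(&table, stale).is_none()); // stale handle rejected
    assert!(lookup(&table, fresh).is_some()); // current instance found
}
```

If the identifier counter can wrap, the `filter` check above stops being sound: a recycled identifier makes a stale handle indistinguishable from a fresh one.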
I'm not entirely opposed, but we should think about the implications this has on Tock's threat model (specifically concerning availability) if we limit the total number of application (re)starts. Instead, I'm more in favor of evaluating the actual overheads of using
Is it difficult to evaluate the overhead of this change? If some of the boards already build with this PR, then we should be able to check the binary sizes to see how much they've bloated, correct? If the code size bloat isn't significant then I would not expect the runtime increase to be significant. Unless there's a lot of code size overhead, I think 64-bit is the way to go.
Discussed on call today that this does actually have a larger code size impact than expected: ~500 bytes. Leon is going to work on this to see if that can trivially be reduced. |
This is not the case. See tock/doc/reference/trd-appid.md, lines 57 to 58 (at 78c76c7):
I disagree.
Today, I agree, however, with:
The separation is much less clear.
I don't believe this is quite true. We do guarantee
I'm not sure I understand the need to allow a process (or processes) to intentionally restart itself (themselves) billions of times. However, if it is important that we just eliminate this concern, and we manage the overhead in a way we find suitable, I'm not that opposed to making the counter a

What I am pushing back on is then using that change as an indicator that we should use
While AppID requires that no two running processes share the same application ID, it does not uniquely identify a given process. There can be multiple, different process instances spawned by the same application over time, all carrying the same application ID. As such, while AppID is an appropriate tool to govern access to persistent state shared by all process instances of an application over time, it is not appropriate to address a particular process instance and its inherent, implicit ephemeral state.

However, reliably addressing a particular process instance is important for many mechanisms both in the kernel and in userspace. For example, sensitive kernel APIs to interact with processes (such as scheduling upcalls or terminating them) relate not to applications, but to given process instances, and hence take ProcessIds instead of application IDs. Concretely, when a process starts a security-sensitive operation, and a capsule later wants to inform that particular process instance of the result of that operation, it will use its ProcessId to schedule a callback. If it is possible that this ProcessId may now refer to another process instance, perhaps even of another application altogether, then we would leak sensitive data back to another process. Similar issues exist around process management, etc.

This makes ProcessIds security sensitive; we need to carefully reason about which guarantees they give us and about how we use them in practice. I agree that we need to carefully consider when to use AppID or ProcessId, and the relationship between them, as they have fundamentally different use cases, semantics, and guarantees.

As an aside: you might be wondering why other OSes don't run into similar issues. E.g., on Linux, PIDs can be recycled, so why isn't that problematic? The answer is twofold: on the one hand, this is a real problem in practice. PID reuse is a well-known security-relevant issue that causes real vulnerabilities.
On the other hand, Linux uses properly unique identifiers within the kernel itself to refer to process-related resources (pointers to kernel objects). In fact, the solution to the aforementioned PID reuse issues is to extend this notion of unique kernel-backed identifiers into userspace.

Also, the following is incorrect:
Currently, we only (accidentally?) perform a panic-on-overflow in debug builds, while in release builds we silently wrap around. If we were to commit to a panic-on-overflow approach that guarantees process ID uniqueness over a kernel instance's lifetime, I want us to be aware of, and document, the implications that this has on the availability of the Tock kernel, particularly in the face of adversarial processes.
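The debug/release discrepancy described above can be demonstrated with Rust's explicit arithmetic methods: `wrapping_add` mirrors what a plain `+` does in release builds (where overflow checks are off by default), while `checked_add` surfaces the overflow in every build profile. Function names here are illustrative:

```rust
// What release builds do for a plain `counter + 1`: silent wraparound.
fn next_identifier_wrapping(counter: u32) -> u32 {
    counter.wrapping_add(1) // u32::MAX + 1 == 0
}

// What a build-profile-independent panic-on-overflow policy builds on:
// the overflow case is explicit in debug *and* release builds.
fn next_identifier_checked(counter: u32) -> Option<u32> {
    counter.checked_add(1) // None on overflow
}

fn main() {
    // The counter silently wraps back to an already-assigned value...
    assert_eq!(next_identifier_wrapping(u32::MAX), 0);
    // ...whereas the checked variant makes the overflow observable.
    assert_eq!(next_identifier_checked(u32::MAX), None);
}
```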
I think the following could occur if IPC were to use AppID as a handle:
I think there's also a worse case:
Now, that's less bad than it sounds, as for some reason Services A and B shared an AppID, so presumably we would trust them equally.
To be clear, that's not a statement that ProcessIds are good. If they're problematic enough, we shouldn't use them for IPC. But I think neither AppID nor ShortID meets our handle needs either, unless I'm mistaken. That's mostly why I'm listing scenarios: so you can correct me if I'm wrong.
Certainly something has to handle process crashes. But rather than discuss ProcessIds vs. ShortIds vs. SomethingElse, I'd be more interested in deciding what conceptual model we are using, and then mapping that to an implementation. I would say currently IPC uses a named service model, and it seems to me like the service itself should be responsible for maintaining continuity of functionality. Or, the restarted service could respond with a "service failed, please restart" error. Alternatively, the kernel knows if a process restarts, and could maybe somehow inject that error onto the using process. Or, we decide the failure is entirely the problem of the using process.
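The "service failed, please restart" alternative floated above could be sketched as an explicit error surfaced to the client. All names below are illustrative, not an existing Tock API:

```rust
// Hypothetical IPC error model: a restart since discovery is reported to
// the client rather than silently resolving to the new instance.
enum IpcError {
    /// No service with the requested name is currently running.
    ServiceNotFound,
    /// The service process restarted since discovery; any state the client
    /// negotiated with the previous instance is gone.
    ServiceRestarted,
}

fn handle_ipc_result(result: Result<(), IpcError>) -> &'static str {
    match result {
        Ok(()) => "proceed",
        // The only sound recovery is to rediscover and re-initialize.
        Err(IpcError::ServiceRestarted) => "rediscover service",
        Err(IpcError::ServiceNotFound) => "give up or retry later",
    }
}

fn main() {
    assert_eq!(handle_ipc_result(Err(IpcError::ServiceRestarted)), "rediscover service");
}
```

Either way, the kernel needs a reliable way to tell that the instance behind a handle changed, which is exactly what unique ProcessIds provide.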
Those are then the same service, without some other identifier for services. This is consistent with IPC today.
Ah, yes, thank you for correcting that.
We can make this even better by implementing

I do agree we need
Pull Request Overview
I want to open this RFC to start a discussion about this change, whether or not it is a good idea, and what a possible pathway for integrating it into the kernel could be. This does not actively block any IPC development, but we might design our interfaces to expect that we will be passing around `u64` instead of `usize` values.

Testing Strategy
This PR is not tested.
TODO or Help Wanted
The PR needs
Documentation Updated
/docs, or no updates are required.

Formatting
make prepush.

AI Use
code in this PR, if any, and I have manually checked and personally certify the entire contents of this PR.
Wrote this myself.