// Copyright 2023 The Fuchsia Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

//! Lock ordering for Netstack3 core.
//!
//! This module contains the "lock ordering" for Netstack3 core: it describes
//! the order in which additional locks can be acquired while other locks are
//! held. In general, code that is written to avoid deadlocks must respect the
//! same lock ordering.
//!
//! Netstack3 core relies on the [`lock_order`] crate to ensure that all
//! possible code paths respect the lock ordering defined here. By leveraging
//! the types and traits from `lock_order`, we can guarantee at compile time
//! that there are no opportunities for deadlock. The way this works is that
//! each lock in [`CoreCtx`] and its associated per-device state gets assigned a
//! type in this module. Then, for a pair of locks `A` and `B`, where `B` is
//! allowed to be acquired while `A` is locked, there is a corresponding
//! declaration of the [`lock_order::relation::LockAfter`] trait (done via the
//! [`impl_lock_after`] macro).
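//!
//! For example, one of the edge declarations appearing later in this file is:
//!
//! ```no_compile
//! // `IpState<Ipv4>` may be acquired while `FilterState<Ipv6>` is held.
//! impl_lock_after!(FilterState<Ipv6> => IpState<Ipv4>);
//! ```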
//!
//! Notionally, the lock ordering forms a [directed acyclic graph (DAG)], where
//! the nodes in the graph are locks that can be acquired, and each edge
//! represents a `LockAfter` implementation. For a lock `B` to be acquired while
//! some lock `A` is held, the node for `B` in the graph must be reachable from
//! `A`, either via a direct edge or via an indirect path that traverses other
//! nodes. This can also be thought of as a [strict partial order] between
//! locks, where `A < B` means `B` can be acquired with `A` held.
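//!
//! In code, this ordering typically surfaces as trait bounds: a function that
//! needs to acquire some lock `B` is generic over a current lock level `L`
//! that comes before `B`. A minimal sketch (the function, `State`, and `B`
//! are illustrative, not real netstack definitions):
//!
//! ```no_compile
//! fn do_work_with_b<L: LockBefore<B>>(locked: &mut Locked<&State, L>) {
//!     // Acquiring `B` here is fine because `L < B` in the lock order.
//! }
//! ```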
//!
//! Ideally we'd represent the lock ordering here that way too, but it doesn't
//! work well with the blanket impls of `LockAfter` emitted by the
//! `impl_lock_after` macro. We want to use `impl_lock_after` since it helps
//! prevent deadlocks (see documentation for [`lock_order::relation`]). The
//! problem can be illustrated by this reduced example. Suppose we have a lock
//! ordering DAG like this:
//!
//! ```text
//! ┌────────────────────┐
//! │LoopbackRx          │
//! └┬──────────────────┬┘
//! ┌▽─────────────┐ ┌─▽────────────┐
//! │IpState<Ipv4> │ │IpState<Ipv6> │
//! └┬─────────────┘ └┬─────────────┘
//! ┌▽─────────────────▽┐
//! │DevicesState       │
//! └───────────────────┘
//! ```
//!
//! With the blanket `LockAfter` impls, we'd get this:
//!
//! ```no_compile
//! impl<X> LockAfter<X> for DevicesState where IpState<Ipv4>: LockAfter<X> {}
//! // and
//! impl<X> LockAfter<X> for DevicesState where IpState<Ipv6>: LockAfter<X> {}
//! ```
//!
//! Since `X` could be `LoopbackRx`, we'd have duplicate impls of
//! `LockAfter<LoopbackRx> for DevicesState`, which the compiler rejects as
//! conflicting implementations.
//!
//! To work around this, we pick a different graph for the lock ordering that
//! won't produce duplicate blanket impls. The property we need is that every
//! path in the original graph is also present in our alternate graph. Luckily,
//! graph theory proves that this is always possible: a [topological sort] has
//! precisely this property, and every DAG has at least one topological sort.
//! For the graph above, that could look something like this:
//!
//! ```text
//! ┌────────────────────┐
//! │LoopbackRx          │
//! └┬───────────────────┘
//! ┌▽─────────────┐
//! │IpState<Ipv4> │
//! └┬─────────────┘
//! ┌▽─────────────┐
//! │IpState<Ipv6> │
//! └┬─────────────┘
//! ┌▽──────────────────┐
//! │DevicesState       │
//! └───────────────────┘
//! ```
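//!
//! Expressed with [`impl_lock_after`], that linearization is simply a chain
//! of edges (a sketch in terms of the example types above):
//!
//! ```no_compile
//! impl_lock_after!(LoopbackRx => IpState<Ipv4>);
//! impl_lock_after!(IpState<Ipv4> => IpState<Ipv6>);
//! impl_lock_after!(IpState<Ipv6> => DevicesState);
//! ```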
//!
//! Note that every possible lock ordering path in the original graph is present
//! (directly or transitively) in the second graph. There are additional paths,
//! like `IpState<Ipv4> -> IpState<Ipv6>`, but those don't matter since
//! a) if they become load-bearing, they need to be in the original DAG, and
//! b) load-bearing or not, the result is still a valid lock ordering graph.
//!
//! The lock ordering described in this module is likewise a modification of
//! the ideal graph. Instead of specifying the actual DAG, we appease the Rust
//! compiler so we can use the transitive blanket impls instead. Since the
//! graph doesn't have multiple paths between any two nodes (that's what causes
//! the duplicate blanket impls), it ends up being a [multitree]. While we could
//! turn it into a (linear) total ordering, we keep the tree structure so that
//! we can more faithfully represent that some locks can't be held at the same
//! time by putting them on different branches.
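//!
//! For example, later in this file both `IpStatePmtuCache<Ipv4>` and
//! `IpStateFragmentCache<Ipv4>` are declared lock-after `IpState<Ipv4>` on
//! separate branches; neither is reachable from the other, so code can never
//! hold both at once.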
//!
//! If, in the future, someone comes up with a better way to declare the lock
//! ordering graph while preserving cycle rejection, we should be able to
//! migrate the definitions here without affecting the usages.
//!
//! [`CoreCtx`]: crate::CoreCtx
//! [directed acyclic graph (DAG)]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
//! [strict partial order]: https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_orders
//! [topological sort]: https://en.wikipedia.org/wiki/Topological_sorting
//! [multitree]: https://en.wikipedia.org/wiki/Multitree

pub(crate) use lock_order::Unlocked;

use core::convert::Infallible as Never;
use core::marker::PhantomData;

use lock_order::impl_lock_after;
use lock_order::relation::LockAfter;
use net_types::ip::{Ipv4, Ipv6};

// This lock level is used to provide unlocked access to stack state that does
// not require any locking.
//
// Implementations of UnlockedAccess must ensure that they do not provide
// access to any state that is held by a lock, in order not to sidestep the
// global lock order declared below.
pub enum UnlockedState {}
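
// As a rough sketch, an `UnlockedAccess` implementation keyed on this marker
// might look like the following, assuming the `Data`/`Guard`/`access` shape
// of `lock_order::lock::UnlockedAccess` (`StackState` and `Counters` are
// illustrative stand-ins, not real definitions):
//
//   impl UnlockedAccess<UnlockedState> for StackState {
//       type Data = Counters;
//       type Guard<'l> = &'l Counters where Self: 'l;
//       fn access(&self) -> Self::Guard<'_> {
//           // Counters are safe to read without holding any lock.
//           &self.counters
//       }
//   }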

pub struct IcmpAllSocketsSet<I>(PhantomData<I>, Never);
pub struct IcmpSocketState<I>(PhantomData<I>, Never);
pub struct IcmpBoundMap<I>(PhantomData<I>, Never);

pub struct IcmpTokenBucket<I>(PhantomData<I>, Never);

pub struct TcpAllSocketsSet<I>(PhantomData<I>, Never);
pub struct TcpSocketState<I>(PhantomData<I>, Never);
pub struct TcpDemux<I>(PhantomData<I>, Never);
pub struct UdpAllSocketsSet<I>(PhantomData<I>, Never);
pub struct UdpSocketState<I>(PhantomData<I>, Never);
pub struct UdpBoundMap<I>(PhantomData<I>, Never);

pub struct IpDeviceConfiguration<I>(PhantomData<I>, Never);
pub struct IpDeviceGmp<I>(PhantomData<I>, Never);
pub struct IpDeviceAddresses<I>(PhantomData<I>, Never);
pub struct IpDeviceFlags<I>(PhantomData<I>, Never);
pub struct IpDeviceDefaultHopLimit<I>(PhantomData<I>, Never);

pub enum Ipv4DeviceAddressState {}

pub enum Ipv6DeviceRouterSolicitations {}
pub enum Ipv6DeviceRouteDiscovery {}
pub enum Ipv6DeviceLearnedParams {}
pub enum Ipv6DeviceSlaac {}
pub enum Ipv6DeviceAddressDad {}
pub enum Ipv6DeviceAddressState {}
pub struct NudConfig<I>(PhantomData<I>, Never);

// This is not a real lock level, but it is useful for writing bounds that
// require "before IPv4" or "before IPv6".
pub struct IpState<I>(PhantomData<I>, Never);
pub struct IpStatePmtuCache<I>(PhantomData<I>, Never);
pub struct IpStateFragmentCache<I>(PhantomData<I>, Never);
pub struct IpStateRulesTable<I>(PhantomData<I>, Never);
pub struct IpStateRoutingTables<I>(PhantomData<I>, Never);
pub struct IpStateRoutingTable<I>(PhantomData<I>, Never);
// Lock level attributed to the state of an individual raw IP socket.
pub struct RawIpSocketState<I>(PhantomData<I>, Never);
// Lock level attributed to the collection of all raw IP sockets.
pub struct AllRawIpSockets<I>(PhantomData<I>, Never);

// Lock level attributed to all multicast forwarding state.
pub struct IpMulticastForwardingState<I>(PhantomData<I>, Never);
// Lock level attributed to the multicast routing table.
pub struct IpMulticastRouteTable<I>(PhantomData<I>, Never);
// Lock level attributed to the table of pending multicast packets.
pub struct IpMulticastForwardingPendingPackets<I>(PhantomData<I>, Never);

pub enum DeviceLayerState {}
pub enum AllDeviceSockets {}
pub enum AnyDeviceSockets {}
pub enum DeviceSocketState {}
pub enum DeviceSockets {}
pub struct EthernetDeviceIpState<I>(PhantomData<I>, Never);
pub enum EthernetDeviceDynamicState {}
pub enum PureIpDeviceDynamicState {}

pub enum EthernetIpv4Arp {}
pub enum EthernetIpv6Nud {}
pub enum EthernetTxQueue {}
pub enum EthernetTxDequeue {}
// We do not actually have a dedicated RX queue for ethernet, but we want a
// clear separation between the ethernet layer and the layers above it
// (IP/ARP) without naming any specific protocol. To do this, we introduce
// this lock level to mark the "boundary" between ethernet-level RX path work
// and upper-level RX path work.
//
// Note that if/when an RX queue is implemented for ethernet, this lock level
// may be trivially used.
pub enum EthernetRxDequeue {}

pub enum LoopbackRxQueue {}
pub enum LoopbackRxDequeue {}
pub enum LoopbackTxQueue {}
pub enum LoopbackTxDequeue {}

pub enum PureIpDeviceTxQueue {}
pub enum PureIpDeviceTxDequeue {}
// Note: Pure IP devices do not have an RX queue. This lock marker exists to
// provide separation between the "device" layer and the IP layer. If an RX
// queue is introduced in the future, this lock level may be trivially used.
pub enum PureIpDeviceRxDequeue {}

pub struct FilterState<I>(PhantomData<I>, Never);

impl LockAfter<Unlocked> for LoopbackTxDequeue {}
impl_lock_after!(LoopbackTxDequeue => EthernetTxDequeue);
impl_lock_after!(EthernetTxDequeue => PureIpDeviceTxDequeue);
impl_lock_after!(PureIpDeviceTxDequeue => LoopbackRxDequeue);
impl_lock_after!(LoopbackRxDequeue => EthernetRxDequeue);
impl_lock_after!(EthernetRxDequeue => PureIpDeviceRxDequeue);
impl_lock_after!(PureIpDeviceRxDequeue => IcmpAllSocketsSet<Ipv4>);
impl_lock_after!(IcmpAllSocketsSet<Ipv4> => IcmpAllSocketsSet<Ipv6>);
impl_lock_after!(IcmpAllSocketsSet<Ipv6> => IcmpSocketState<Ipv4>);
impl_lock_after!(IcmpSocketState<Ipv4> => IcmpBoundMap<Ipv4>);
impl_lock_after!(IcmpBoundMap<Ipv4> => IcmpTokenBucket<Ipv4>);
impl_lock_after!(IcmpTokenBucket<Ipv4> => IcmpSocketState<Ipv6>);
impl_lock_after!(IcmpSocketState<Ipv6> => IcmpBoundMap<Ipv6>);
impl_lock_after!(IcmpBoundMap<Ipv6> => IcmpTokenBucket<Ipv6>);
impl_lock_after!(IcmpTokenBucket<Ipv6> => AllRawIpSockets<Ipv4>);
impl_lock_after!(AllRawIpSockets<Ipv4> => AllRawIpSockets<Ipv6>);
impl_lock_after!(AllRawIpSockets<Ipv6> => RawIpSocketState<Ipv4>);
impl_lock_after!(RawIpSocketState<Ipv4> => RawIpSocketState<Ipv6>);
impl_lock_after!(RawIpSocketState<Ipv6> => TcpAllSocketsSet<Ipv4>);

// Ideally we'd have separate impls `LoopbackRxDequeue =>
// TcpAllSocketsSet<Ipv4>` and for `Ipv6`, but that doesn't play well with the
// blanket impls. Linearize IPv4 and IPv6, and TCP and UDP, like for `IpState`
// below.
impl_lock_after!(TcpAllSocketsSet<Ipv4> => TcpAllSocketsSet<Ipv6>);
impl_lock_after!(TcpAllSocketsSet<Ipv6> => TcpSocketState<Ipv4>);
impl_lock_after!(TcpSocketState<Ipv4> => TcpSocketState<Ipv6>);
impl_lock_after!(TcpSocketState<Ipv6> => TcpDemux<Ipv4>);
impl_lock_after!(TcpDemux<Ipv4> => TcpDemux<Ipv6>);
impl_lock_after!(TcpDemux<Ipv6> => UdpAllSocketsSet<Ipv4>);
impl_lock_after!(UdpAllSocketsSet<Ipv4> => UdpAllSocketsSet<Ipv6>);
impl_lock_after!(UdpAllSocketsSet<Ipv6> => UdpSocketState<Ipv4>);
impl_lock_after!(UdpSocketState<Ipv4> => UdpSocketState<Ipv6>);
impl_lock_after!(UdpSocketState<Ipv6> => UdpBoundMap<Ipv4>);
impl_lock_after!(UdpBoundMap<Ipv4> => UdpBoundMap<Ipv6>);
impl_lock_after!(UdpBoundMap<Ipv6> => IpMulticastForwardingState<Ipv4>);
impl_lock_after!(IpMulticastForwardingState<Ipv4> => IpMulticastRouteTable<Ipv4>);
impl_lock_after!(IpMulticastRouteTable<Ipv4> => IpMulticastForwardingPendingPackets<Ipv4>);
impl_lock_after!(IpMulticastForwardingPendingPackets<Ipv4> => IpMulticastForwardingState<Ipv6>);
impl_lock_after!(IpMulticastForwardingState<Ipv6> => IpMulticastRouteTable<Ipv6>);
impl_lock_after!(IpMulticastRouteTable<Ipv6> => IpMulticastForwardingPendingPackets<Ipv6>);
impl_lock_after!(IpMulticastForwardingPendingPackets<Ipv6> => IpStateRulesTable<Ipv4>);
impl_lock_after!(IpStateRulesTable<Ipv4> => IpStateRulesTable<Ipv6>);
impl_lock_after!(IpStateRulesTable<Ipv6> => IpStateRoutingTables<Ipv4>);
impl_lock_after!(IpStateRoutingTables<Ipv4> => IpStateRoutingTables<Ipv6>);
impl_lock_after!(IpStateRoutingTables<Ipv6> => IpStateRoutingTable<Ipv4>);
impl_lock_after!(IpStateRoutingTable<Ipv4> => IpStateRoutingTable<Ipv6>);
impl_lock_after!(IpStateRoutingTable<Ipv6> => IpDeviceConfiguration<Ipv4>);
impl_lock_after!(IpDeviceConfiguration<Ipv4> => IpDeviceConfiguration<Ipv6>);
impl_lock_after!(IpDeviceConfiguration<Ipv6> => DeviceLayerState);
impl_lock_after!(DeviceLayerState => Ipv6DeviceRouteDiscovery);
impl_lock_after!(Ipv6DeviceRouteDiscovery => Ipv6DeviceSlaac);
impl_lock_after!(Ipv6DeviceSlaac => Ipv6DeviceAddressDad);
impl_lock_after!(Ipv6DeviceAddressDad => IpDeviceGmp<Ipv4>);

impl_lock_after!(IpDeviceGmp<Ipv4> => IpDeviceGmp<Ipv6>);
impl_lock_after!(IpDeviceGmp<Ipv6> => FilterState<Ipv4>);

impl_lock_after!(FilterState<Ipv4> => FilterState<Ipv6>);
impl_lock_after!(FilterState<Ipv6> => IpState<Ipv4>);
impl_lock_after!(IpState<Ipv4> => IpState<Ipv6>);

impl_lock_after!(IpState<Ipv4> => IpStatePmtuCache<Ipv4>);
impl_lock_after!(IpState<Ipv6> => IpStatePmtuCache<Ipv6>);
impl_lock_after!(IpState<Ipv4> => IpStateFragmentCache<Ipv4>);
impl_lock_after!(IpState<Ipv6> => IpStateFragmentCache<Ipv6>);

impl_lock_after!(IpState<Ipv6> => LoopbackTxQueue);
impl_lock_after!(LoopbackTxQueue => LoopbackRxQueue);
impl_lock_after!(LoopbackTxQueue => EthernetIpv4Arp);
impl_lock_after!(EthernetIpv4Arp => EthernetIpv6Nud);
impl_lock_after!(EthernetIpv6Nud => AllDeviceSockets);

impl_lock_after!(AllDeviceSockets => AnyDeviceSockets);
impl_lock_after!(AnyDeviceSockets => EthernetDeviceIpState<Ipv4>);
impl_lock_after!(EthernetDeviceIpState<Ipv4> => IpDeviceAddresses<Ipv4>);
impl_lock_after!(IpDeviceAddresses<Ipv4> => IpDeviceAddresses<Ipv6>);
impl_lock_after!(IpDeviceAddresses<Ipv6> => IpDeviceFlags<Ipv4>);
impl_lock_after!(IpDeviceFlags<Ipv4> => IpDeviceFlags<Ipv6>);
impl_lock_after!(IpDeviceFlags<Ipv6> => Ipv4DeviceAddressState);
impl_lock_after!(Ipv4DeviceAddressState => Ipv6DeviceAddressState);
impl_lock_after!(Ipv6DeviceAddressState => IpDeviceDefaultHopLimit<Ipv4>);
impl_lock_after!(IpDeviceDefaultHopLimit<Ipv4> => EthernetDeviceIpState<Ipv6>);
impl_lock_after!(EthernetDeviceIpState<Ipv6> => IpDeviceDefaultHopLimit<Ipv6>);
impl_lock_after!(IpDeviceDefaultHopLimit<Ipv6> => Ipv6DeviceRouterSolicitations);
impl_lock_after!(Ipv6DeviceRouterSolicitations => Ipv6DeviceLearnedParams);
impl_lock_after!(Ipv6DeviceLearnedParams => NudConfig<Ipv4>);
impl_lock_after!(NudConfig<Ipv4> => NudConfig<Ipv6>);
impl_lock_after!(NudConfig<Ipv6> => EthernetDeviceDynamicState);
impl_lock_after!(EthernetDeviceDynamicState => EthernetTxQueue);
impl_lock_after!(EthernetTxQueue => PureIpDeviceDynamicState);
impl_lock_after!(PureIpDeviceDynamicState => PureIpDeviceTxQueue);

impl_lock_after!(AnyDeviceSockets => DeviceSockets);
impl_lock_after!(DeviceSockets => DeviceSocketState);