Rack-scale Networking
Updated: 2023-02-28
Description
Bryan and Adam are joined by a number of members of the Oxide networking team to talk about the networking software that drives the Oxide rack. It turns out that rack-scale networking is hard... and has enormous benefits!
We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from February 27th, 2023.
In addition to Bryan Cantrill and Adam Leventhal, speakers included Ryan Goodfellow, Levon Tarver, Ben Naecker, and Arjen Roodselaar.
Links
- Intel Tofino Series
- P4 (programming language) - Wikipedia
- p4lang/p4c: P4_16 reference compiler
- oxidecomputer/p4: A P4 compiler
- The quote crate: Rust quasi-quoting
- RIFT WG - Routing In Fat Trees | IETF Community Wiki
Here's (much of) the live chat from the show:
- ahl https://github.com/oxidecomputer/oxide-and-friends/blob/master/2021_11_29.md
- ahl That's the Sidecar switch episode
- bcantrill https://p4.org/
- admchl What does "at line rate" mean?
- Riking Line rate = As fast as the packets could possibly come. 1Gbit, 10Gbit, 100Gbit, etc
- admchl Do you need ASICs to hit that speed? I assume x86_64 is not going to be fast enough for these specialised operations?
- levon Yes, the Tofino 2 is the ASIC
- bcantrill You need ASICs
- bnaecker Yes, you really can't do these kinds of operations on a general purpose CPU.
- rng_drizzt Yeah, you need specialized silicon here.
- JustinAzoff Right, also often across all ports at the same time in both directions. A 48-port 10 Gbps switch will have an aggregate line rate of 960 Gbps (10 × 48 × 2)
- duckman So the advantage is being able to offload compute to the switch?
- bnaecker Yes, and specifically that you can separate the data plane (operations on the packets) from the control plane (decisions about what operations to allow or make).
- tahnok What's TCAM?
- levon Ternary Content Addressable Memory
- bnaecker https://en.wikipedia.org/wiki/Content-addressable_memory#Ternary_CAMs
- ryaeng Sure beats logging into a number of Cisco switches and making changes at the console.
- admchl This is my favourite episode in a long time, this is all really fascinating.
- rng_drizzt the first Sidecar episode was nearly 1.5 years ago 🤯, right after we cut the first rev
- levon That episode blew my mind
- duckman This sounds like a big deal on the scale of eBPF
- duckman Or bigger
- bnaecker It is extremely useful for understanding the processing pipelines. As long as you only run single-packet integration tests 🙂
- od0 just want to go out and find things to write P4 code for
- JustinAzoff yeah, one way to think about that sort of thing is that XDP can be used to run little programs on a NIC, whereas P4 is kind of like that, but running on effectively a NIC with 48+ ports
- bcantrill https://github.com/oxidecomputer/p4
- SyntheticGate sidecar is the "codename" of our switch box
- SyntheticGate "gimlet" is our server sled
- bcantrill https://github.com/oxidecomputer/propolis
- wmf So you have P4 and OPTE in the hypervisor at the same time?
- bnaecker OPTE is in the host kernel.
- arjenroodselaar The P4 runtime Ry described exists only in the test bed, where it simulates the switches at a high level. OPTE is part of the production environment.
- arjenroodselaar The rough difference between P4 and OPTE is that P4 works on individual packets without much concept of a session (so it can't reason about TCP streams, packet order, etc., and therefore offers no firewall-like functionality), while OPTE aims to operate on streams of packets.
- JustinAzoff So you can run 100 VMs on a test system and wire them up to your virtual switch compiled by x4c?
- arjenroodselaar Correct.
- bcantrill OPTE == Oxide Packet Transformation Engine
- admchl Gimlet?
- rng_drizzt Compute server
- rng_drizzt The Sidecar switch is actually just a PCIe peripheral to a Gimlet.
- bnaecker The Gimlet managing the Sidecar is often called a "Scrimlet" for "Sidecar-attached Gimlet"
- Riking and "how do i reconfigure this giant network without hosing my ability to reconfigure this giant network"
- ShaunO can identify with that - we seriously struggle to keep our own products inter-operating, let alone anyone else's
- levon It can feel like a Sisyphean task.
- a172 Set up a much smaller/simpler network in parallel, accessible from "not your network", that gets you to the management interface.
- levon It's a whole new world when you can look at the actual table definitions in P4
- rng_drizzt Owning all the layers here is immensely beneficial
- levon Those DTrace probes have been very helpful
- bnaecker Those probes turned out to be everywhere. They are in: SQL queries, HTTP queries, log messages, Propolis hypervisor state, virtual storage system, networking protocol messages, the P4 emulator, and probably more that I'm forgetting about.
- levon For those unfamiliar with the DTrace tool, or the rationale behind leveraging DTrace over other tracing / debugging tools: https://www.cs.princeton.edu/courses/archive/fall05/cos518/papers/dtrace.pdf
- bcantrill https://github.com/oxidecomputer/progenitor
- ahl some notes on rust codegen: https://github.com/ahl/codegen-template
- arjenroodselaar DDM! Bring us home!
- a172 it astonishes me how many "cloud" type architectures are built on v4 only or v4 first.
- a172 IPv6 is older than Wi-Fi
- a172 It solves real problems. PLEASE use it.
- nyanotech yessss fina...
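JustinAzoff's aggregate line-rate figure from the chat works out as follows, assuming every port runs at its full rate in both directions at once (full duplex). The same formula applies to any port count and per-port speed.

```latex
\text{aggregate line rate} = N_{\text{ports}} \times R_{\text{port}} \times 2
  = 48 \times 10\ \text{Gbit/s} \times 2 = 960\ \text{Gbit/s}
```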
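To make the TCAM answer and bnaecker's data-plane/control-plane split a little more concrete, here is a minimal software model in Rust. It is a sketch under stated assumptions, not how the Tofino or Oxide's P4 code works: `Tcam`, `TernaryEntry`, and `Action` are hypothetical names, keys are 32-bit IPv4 destination addresses, and priority is simply insertion order. Each entry stores a value and a mask; a key bit is compared only where the mask bit is 1, and a 0 means "don't care" (the third state that makes the memory ternary). Real TCAM hardware compares every entry in parallel in one lookup; the linear scan below only models the result.

```rust
/// Hypothetical forwarding decision attached to a table entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Forward(u16), // egress port number
    Drop,
}

/// One ternary entry: key bits are compared only where `mask` has a 1.
#[derive(Debug, Clone, Copy)]
struct TernaryEntry {
    value: u32,
    mask: u32,
    action: Action,
}

/// Software model of a TCAM; entries earlier in the Vec have higher priority.
struct Tcam {
    entries: Vec<TernaryEntry>,
}

impl Tcam {
    fn new() -> Self {
        Tcam { entries: Vec::new() }
    }

    /// Control-plane side: install an entry at the lowest priority so far.
    fn insert(&mut self, value: u32, mask: u32, action: Action) {
        // Clear "don't care" bits so the comparison in lookup is exact.
        self.entries.push(TernaryEntry { value: value & mask, mask, action });
    }

    /// Data-plane side: return the action of the highest-priority match.
    fn lookup(&self, key: u32) -> Option<Action> {
        self.entries
            .iter()
            .find(|e| (key & e.mask) == e.value)
            .map(|e| e.action)
    }
}

fn main() {
    let mut tcam = Tcam::new();
    // 10.1.0.0/16 -> port 1; installed first so it beats the /8 below.
    tcam.insert(0x0A01_0000, 0xFFFF_0000, Action::Forward(1));
    // 10.0.0.0/8 -> port 2
    tcam.insert(0x0A00_0000, 0xFF00_0000, Action::Forward(2));
    // Catch-all: a zero mask matches every key.
    tcam.insert(0, 0, Action::Drop);

    println!("10.1.2.3    -> {:?}", tcam.lookup(0x0A01_0203));
    println!("10.2.3.4    -> {:?}", tcam.lookup(0x0A02_0304));
    println!("192.168.0.1 -> {:?}", tcam.lookup(0xC0A8_0001));
}
```

Installing the more specific 10.1.0.0/16 entry ahead of 10.0.0.0/8 is what produces longest-prefix behavior in this model; with a real TCAM the control plane is typically likewise responsible for placing entries so that the intended match wins.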
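arjenroodselaar's distinction between per-packet P4 processing and flow-aware OPTE processing can be illustrated with a toy comparison. This is neither P4 nor OPTE code: `FlowKey`, `stateless_allow_inbound`, and `FlowTable` are hypothetical names, and the model ignores protocols, timeouts, and connection teardown. The stateless check sees one packet at a time, so a rule such as "admit inbound packets from source port 443" admits any packet that claims that port. The flow table remembers which connections were opened from the inside and admits an inbound packet only if it is the reverse of one of them.

```rust
use std::collections::HashSet;

/// Hypothetical 4-tuple identifying one direction of a connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FlowKey {
    src_ip: u32,
    dst_ip: u32,
    src_port: u16,
    dst_port: u16,
}

impl FlowKey {
    /// The key a reply packet would carry: source and destination swapped.
    fn reversed(&self) -> FlowKey {
        FlowKey {
            src_ip: self.dst_ip,
            dst_ip: self.src_ip,
            src_port: self.dst_port,
            dst_port: self.src_port,
        }
    }
}

/// Stateless, per-packet rule: the decision uses only this packet's headers.
fn stateless_allow_inbound(pkt: &FlowKey, allowed_src_port: u16) -> bool {
    pkt.src_port == allowed_src_port
}

/// Stateful, flow-aware rule: remembers outbound flows, admits only their replies.
struct FlowTable {
    outbound: HashSet<FlowKey>,
}

impl FlowTable {
    fn new() -> Self {
        FlowTable { outbound: HashSet::new() }
    }

    /// Record a packet leaving the inside network; outbound is always allowed here.
    fn on_outbound(&mut self, pkt: FlowKey) -> bool {
        self.outbound.insert(pkt);
        true
    }

    /// Admit an inbound packet only if it replies to a flow opened from inside.
    fn on_inbound(&self, pkt: &FlowKey) -> bool {
        self.outbound.contains(&pkt.reversed())
    }
}

fn main() {
    let client = 0x0A00_0005; // 10.0.0.5 (inside)
    let server = 0xC633_6407; // 198.51.100.7 (outside)

    let mut flows = FlowTable::new();
    let request = FlowKey { src_ip: client, dst_ip: server, src_port: 50_000, dst_port: 443 };
    flows.on_outbound(request);

    let reply = request.reversed();
    // Unsolicited packet to a different client port, still claiming source port 443.
    let unsolicited = FlowKey { src_ip: server, dst_ip: client, src_port: 443, dst_port: 22 };

    println!("reply:       stateless={} stateful={}",
        stateless_allow_inbound(&reply, 443), flows.on_inbound(&reply));
    println!("unsolicited: stateless={} stateful={}",
        stateless_allow_inbound(&unsolicited, 443), flows.on_inbound(&unsolicited));
}
```

Here the stateless rule admits both packets, while the flow table admits only the genuine reply; that retained per-flow state is what makes firewall-like behavior possible.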