Let's say you have a fiber optic line running between two buildings, or between two spaces you rent in the same building. You use trunk ports on the switches connected to the fiber in order to "stretch" your L2 Ethernet network and its different VLANs, so that the computers, servers, printers, cameras, etc. in both offices can easily communicate with their peers.
But there is next to no physical security in the shared wiring closets and common cabling paths, and so you are concerned about someone tapping into the fiber line and capturing data.
First you may ask, is that really possible? And if so, is it easy to do? Would you need NSA-level hardware and skills or could anyone do it?
Turns out, it is both possible and easy, so you're right to worry! But there is a solution...
Tapping on copper cables
It's so simple to tap into copper network cables that back in the 80s, it was the official way to expand your 10BASE5 Ethernet network. IT people used so-called "vampire taps" that literally bite through the insulation of the cable to make contact with the conductors inside, enabling a new connection without interrupting ongoing data transmission.
This TikTok video shows one of these devices in action.
Nowadays, people might still use copper cables instead of fiber for distances under 100 m, and it's a bit more difficult to tap into modern 1000BASE-T cables. In any case, since our typical LAN-to-LAN scenario involves optical fibers running along poorly protected cable runways, let's see if optical fibers are vulnerable too.
Tapping on optical fiber
For a number of years it was commonly believed that tapping into optical fibers was possible, but that it required both expertise and expensive hardware.
But in 2015, the late Kevin Mitnick demonstrated just how easy it is to tap into a fiber and sniff the traffic, using a $200 optical "clip-on coupler".
These couplers were originally designed for line technicians: using so-called "optical talk sets", they let technicians talk to each other over long distances and coordinate while installing fibers, even without access to the ends of the fiber.
They work by exposing a portion of the fiber and bending it slightly, which lets some 2-3% of the light escape (while still carrying 100% of the data) and also allows light to be injected into the fiber. Of course, it's better to have some practice and a steady hand in order to expose the fiber core without damaging it.
In true hacker fashion, these couplers are creatively used by attackers to read data from the fiber, and inject data of their own.
This opens many classical man-in-the-middle attack scenarios such as forcing the downgrade of crypto protocols, redirecting traffic, etc. on top of simple sniffing.
Encrypt ALL the things!
So it seems there's not much you can do to prevent vampire tapping onto your easily-accessible cable paths. In fact, they might already be there...
As with any confidentiality issue, the best solution lies in encryption. If your whole network traffic is encrypted, the data vampires might still suck on your network cables, but instead of draining the very blood of your company, they will only gather encrypted gibberish of no use to them.
But how exactly can you encrypt "everything" in such a LAN-to-LAN scenario? And at what cost?
If you had routers on both ends of the fiber, with each site having its own dedicated IP subnets with no overlap, you could easily set up a VPN tunnel such as IPsec or WireGuard between the two routers and solve your problem.
That said, if you used a "stretched L2" approach like in our scenario, it would not be easy to go back and re-segment all your networking, moving from bridging to routing traffic.
You could instead try to enforce a policy that every application-level stream on your network, even print jobs, realtime video, DNS, etc., only uses a "secure" (encrypted) version of its protocol. It's certainly an impressive feat if you've managed to achieve that in a typical office environment ☺️. Even so, you'd likely still want some kind of encrypted tunnel on the fiber, just in case someone accidentally sets up an insecure application stream...
The idea we had to solve this problem, with the least disruption to existing networks and protocols, was to combine 802.1q trunk links with VPN-like encryption between the two ends of the fiber. Instead of plugging the fiber directly into your switches, you plug it into a box with two network interfaces that "swallows" all your 802.1q traffic (VLANs and all) on one side and transfers it over the fiber, through an encrypted tunnel, to a second "mirror" box that decrypts the packets, turns them back into 802.1q frames and "spits them out" onto its local network. This was dubbed the "wormhole" project.
The quest for the encrypted trunk: MACsec
If you search for ways to "encrypt/secure an 802.1q trunk" you will probably read about MACsec, aka the IEEE 802.1AE standard, which on paper seems to do exactly what we want, with the added benefit that if you use MACsec-capable switches on each end of the fiber, you don't need additional equipment to do the secure tunneling.
MACsec creates pairs of point-to-point Secure Channels (one for Tx, one for Rx) between two devices over an untrusted connection, with keys negotiated by a protocol called MKA (MACsec Key Agreement) from, for example, pre-shared keys (or a PKI). Over these Secure Channels, MACsec frames are sent; they are only slightly modified Ethernet frames whose layer-3+ payload is encrypted using GCM-AES-128 (or GCM-AES-256 on newer hardware). Since they are essentially Ethernet frames, MACsec natively supports 802.1q VLAN headers, for example.
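As an aside, these building blocks (Secure Channels, Secure Associations, GCM-AES keys) also exist natively in the Linux kernel and can be played with through iproute2. Here is a minimal, statically-keyed sketch between two directly connected Linux hosts - the interface name, peer MAC address and keys are placeholders, and there is no MKA here (a real deployment would negotiate keys, for instance with wpa_supplicant, or use switch hardware):
# MACsec interface on top of the physical link (encryption enabled)
/sbin/ip link add link eth0 macsec0 type macsec encrypt on
# our transmit Secure Association, with its 128-bit key (key id 01)
/sbin/ip macsec add macsec0 tx sa 0 pn 1 on key 01 11111111111111111111111111111111
# the peer's Secure Channel (identified by its MAC address and port) and its receive SA
/sbin/ip macsec add macsec0 rx port 1 address c6:19:52:8f:e6:a0
/sbin/ip macsec add macsec0 rx port 1 address c6:19:52:8f:e6:a0 sa 0 pn 1 on key 02 22222222222222222222222222222222
/sbin/ip link set dev macsec0 up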
In order to test these promises, we bought a pair of the smallest Cisco switches that support MACsec: the Catalyst 3560-CX WS-C3560CX-8XPD-S, at ~1600€ each.
Testing MACsec
Unfortunately, during our testing we found MACsec underwhelming for our LAN-to-LAN scenario.
First, it's difficult to find accurate MACsec documentation since it supports a variety of use cases (securing a link between a workstation and a switch, securing a link between two switches...) and as usual with Cisco, there are configuration directives that may or may not exist on a given Cisco switch, depending on its IOS image, feature level, generation, etc. So it was a bit of a pain to get it working for our trunk ports.
Then we found out something that, in hindsight, is pretty clear in the MACsec specification: MACsec does not hide the real MAC addresses of the devices talking on your network from an eavesdropping attacker. Preserving the original MAC addresses of (for example) two computers talking to each other makes sense so that traffic can go through MACsec-unaware bridges, but we expected to only see the MAC addresses of the two switches directly connected to the untrusted network and terminating the MACsec tunnel. This could be considered a minor inconvenience, but since MAC addresses are assigned by device manufacturers, they give an attacker pretty good intel on what brands of computers, printers, appliances, etc. you have on your network, which may help them plan a targeted attack. And if you're worried about industrial espionage, you probably don't want a spying competitor to know what brands of components you use on your R&D VLANs either.
Last but not least, on several occasions during our offensive testing, we were able to disturb the MKA/MACsec traffic enough (using not-so-transparent software bridges and resetting MKA sessions on the switches) that half of the traffic (corresponding to one of the two Secure Channels) was sent fully unencrypted, despite the linksec policy must-secure setting on the ports, which according to Cisco documentation should never let traffic flow in the clear on the interface ("Must-Secure imposes that only MACsec encrypted traffic can flow. Hence, until the MKA session is secured, traffic is dropped."). Moreover, the switches reported no errors at all and happily kept sending/receiving clear traffic on MACsec must-secure interfaces. The end-user devices (laptops) had no way to know their traffic was being read, since their communications continued as usual (minus the loss of a few ICMP or UDP packets during MKA renegotiation).
This behaviour was observed with the most recent IOS firmware at the time. Although it required both 1) an attacker on the fiber doing a man-in-the-middle and dropping some frames, and 2) an admin action on the switch CLI itself (clear mka sessions), given that MKA sessions have an expiry and must be renegotiated after some time or when a port goes down, we think it's likely that the behavior can be triggered just by "sitting on the wire", with no CLI access to the switch.
The bug tracker from Cisco showed bugs and behaviour that, although apparently not applying to our model, looked close to what we found:
We stopped our experiments with the Cisco switches there, since the goal of this mission was to "find a reliable way to encrypt LAN-to-LAN fiber links at high speed", not to "break MACsec", but we may come back to it sometime in the future :)
Native Linux solution: VXLAN + WireGuard
Having set MACsec aside, we decided to PoC something that would use only native and well-known Linux kernel features on commodity hardware.
Since WireGuard is the state of the art for in-kernel tunneling, was there a way to shove 802.1q L2 traffic inside a WireGuard tunnel, and what speeds could be reached on commodity hardware costing approximately the same as the Cisco switches we had just tested?
The "missing bit" to go from an L2 trunk to a L3 tunnel was solved by using VXLAN.
VXLAN
VXLAN is a protocol used to carry L2 frames, encapsulated in UDP (port 4789 or 8472 depending on the implementation), to a distant endpoint called a VTEP ("VXLAN Tunnel Endpoint"). It works by hooking onto L2 forwarding tables (like the one of a Linux bridge), and sending to the remote VTEP the frames that must be broadcast on the segment, along with the frames that have no known local destination ("flood and learn"). The remote VTEP decapsulates the L2 frame from the UDP packet and sends it onto its local network.
The typical real-world use case of VXLAN is "datacenter bridging": VMs in datacenter 1 behave as if they were on the same Ethernet segment as VMs in datacenter 2, so they can really share the same IP subnet (respond to ARP "who-has", etc.).
Despite its name, VXLAN is only "inspired" by the concept of L2 VLANs: it is not something that will, out of the box, listen on a trunk port and carry 802.1q-tagged frames to the remote endpoint. But it's very possible to do so using Linux's wonderful networking stack! You just need to map each 802.1q VLAN ID to a VXLAN Network Identifier (VNI) and do the same thing in reverse on the other VTEP.
Most of the documentation you find online about injecting L2 VLAN info into a VXLAN tunnel has you creating a Linux bridge + a VLAN interface + a VXLAN interface per VLAN that you want to transmit.
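For a single, hypothetical VLAN 42, that per-VLAN pattern looks something like this (interface names and VTEP addresses are placeholders):
# 802.1q sub-interface extracting VLAN 42 from the trunk
/sbin/ip link add link ${TRUNK_IFACE} name ${TRUNK_IFACE}.42 type vlan id 42
# one VXLAN interface per VLAN, with its own VNI (here 42 as well)
/sbin/ip link add vxlan42 type vxlan id 42 local ${VTEP_LOCALIP} remote ${VTEP_REMOTEIP} dstport 4789
# one bridge per VLAN, tying the two together
/sbin/ip link add br42 type bridge
/sbin/ip link set dev ${TRUNK_IFACE}.42 master br42
/sbin/ip link set dev vxlan42 master br42
/sbin/ip link set dev ${TRUNK_IFACE}.42 up
/sbin/ip link set dev vxlan42 up
/sbin/ip link set dev br42 up
Repeat that for every VLAN you carry, and you quickly end up with a lot of interfaces to manage.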
If you use all 4096 VLANs, or want to be able to add/drop VLANs on your network without repeating all those steps each time, you can use the iproute2 commands bridge and ip with some recent feature flags (vlan_filtering, vlan_default_pvid, vlan_tunnel + tunnel_info) to spare yourself some time and have less clutter on your Linux "wormhole" boxes. The vlan_default_pvid flag (set to 0 below) is very important to keep the 802.1q headers "inside the bridge" and have them reach the vxlan interface, where they will then be mapped to a VXLAN VNI.
With the following commands you can have a single bridge and a single VXLAN interface that handle every VLAN coming their way.
# Create a Linux bridge with the right options
/sbin/ip link add br0 type bridge vlan_filtering 1 vlan_default_pvid 0 vlan_stats_enabled 1 vlan_stats_per_port 1
# Enslave the trunk eth interface (connected to the switch trunk port)
# to the bridge
/sbin/ip link set dev ${TRUNK_IFACE} master br0
# Create a vxlan interface and enslave it to the same bridge
/sbin/ip link add vxlan0 type vxlan vni ${VXLAN_DEFAULT_VNI} local ${VTEP_LOCALIP} remote ${VTEP_REMOTEIP} dstport 4789
/sbin/ip link set dev vxlan0 master br0
# Activate vlan tunneling !
/sbin/bridge link set dev vxlan0 vlan_tunnel on
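One easy-to-forget detail, not shown above (so treat it as an assumption about your setup): the bridge and its member interfaces must be administratively up before any frame is forwarded.
# bring the whole chain up
/sbin/ip link set dev ${TRUNK_IFACE} up
/sbin/ip link set dev vxlan0 up
/sbin/ip link set dev br0 up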
Then, adding a specific VLAN ID to the bridge so that it's extracted and mapped to a VXLAN VNI requires these 3 commands:
/sbin/bridge vlan add vid $vid dev ${TRUNK_IFACE}
/sbin/bridge vlan add vid $vid dev vxlan0
/sbin/bridge vlan set dev vxlan0 vid $vid tunnel_info id $vid
You can easily run a little script to do the mapping once and for all for every possible VLAN ID:
# extract vlans from trunk, map them to same vxlan vid
for vid in $(seq 2 4095); do
/sbin/bridge vlan add vid $vid dev ${TRUNK_IFACE}
/sbin/bridge vlan add vid $vid dev vxlan0
/sbin/bridge vlan set dev vxlan0 vid $vid tunnel_info id $vid
done
WireGuard
Plugging this VXLAN configuration into a WireGuard tunnel is surprisingly easy. We won't cover setting up a WireGuard tunnel between two Linux hosts since there are many great resources online (and it just works out of the box).
Indeed, if you've already set up a wg0 interface over the (insecure) fiber connection, with (secure) IP addresses for both of your "wormholes", you can specify the remote peer's WireGuard IP address directly as ${VTEP_REMOTEIP}. Linux in-kernel networking will then do its magic and dutifully forward your 802.1q frames over the wire, encapsulated in UDP VXLAN, and encapsulated again in UDP WireGuard.
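For reference, here is a minimal sketch of such a wg0 setup on one of the wormholes - the fiber and WireGuard addresses, port and key path are all assumptions, and the mirror box gets the symmetric configuration:
# WireGuard interface riding on the untrusted fiber (fiber IPs 192.0.2.1/.2 in this sketch)
/sbin/ip link add wg0 type wireguard
wg set wg0 private-key /etc/wireguard/wormhole-a.key listen-port 51820 \
    peer ${PEER_PUBLIC_KEY} endpoint 192.0.2.2:51820 allowed-ips 10.255.255.2/32
# "secure" addresses, used as the VXLAN tunnel endpoints
/sbin/ip addr add 10.255.255.1/30 dev wg0
/sbin/ip link set dev wg0 up
# the VXLAN interface from the previous section then simply uses:
#   VTEP_LOCALIP=10.255.255.1  VTEP_REMOTEIP=10.255.255.2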
Having read that, you might worry, like we did, about the potential performance cost of so many levels of encapsulation, with encryption on top.
Performance
So, the final packets that will transit "on the wire" between our wormholes will end up looking something like Eth/IP/UDP/WG/IP/UDP/VXLAN/Eth/802.1q/IP/Payload. That's quite an overhead indeed!
But this overhead is only present during the transit on the fiber, which in our scenario is a local, short-distance (read: low-latency) optical fiber, typically using at least 10 Gbps SFP+ optical transceivers. So latency and bandwidth should be good, and we can maximize the Maximum Transmission Unit (MTU) on the fiber interfaces, so that a typical Ethernet data payload of 1500 bytes "sits" easily in the payload of our jumbo WireGuard+VXLAN frame: there will be no need for fragmentation and retransmits.
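As a rough sanity check (approximate figures, IPv4 outer headers assumed), the per-packet overhead added in front of a full 1500-byte payload is:
# inner Ethernet header + 802.1q tag     : 14 + 4      = 18 bytes
# VXLAN + UDP + IP                       : 8 + 8 + 20  = 36 bytes
# WireGuard header + tag + UDP + IP      : 32 + 8 + 20 = 60 bytes
# => about 1500 + 18 + 36 + 60 = 1614 bytes, comfortably below a 9000-byte jumbo MTU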
Regarding WireGuard encryption, we did a little research and felt confident after reading a few resources that we could reach high speeds with the right hardware offloads, although 10 Gbps seemed to be the "record".
The ethtool feature flags matching the offloads that enabled good performance for VXLAN+WireGuard were determined to be: tx-udp_tnl-segmentation, generic-segmentation-offload, generic-receive-offload, rx-vlan-offload and tx-vlan-offload.
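If you want to check or force them, ethtool can do it by feature name - the short aliases (gso, gro, rxvlan, txvlan) are documented, and recent ethtool versions also accept the full kernel feature strings such as tx-udp_tnl-segmentation (most 10G drivers already default to "on"):
# list the relevant offloads as reported by the driver
/sbin/ethtool -k ${UNTRUSTED_IFACE} | grep -E 'udp_tnl|generic-(segmentation|receive)|vlan-offload'
# enable them explicitly if needed
/sbin/ethtool -K ${UNTRUSTED_IFACE} gso on gro on rxvlan on txvlan on
/sbin/ethtool -K ${UNTRUSTED_IFACE} tx-udp_tnl-segmentation on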
The next step was to find server hardware with 10 Gbps SFP+ ports supporting UDP Segmentation Offload (Generic Segmentation Offload), UDP Receive Coalescing (Generic Receive Offload) and VXLAN offloading capabilities.
The search was over when we found the SuperMicro SuperServer 5019D-4C-FN8TP, which had everything we needed: an Intel Xeon D-2123IT SoC that directly handles 2x 10 Gbps SFP+ and 2x 10GBASE-T ports, with both the CPU and the NICs supporting the aforementioned offloads. It cost about 1300€ (you then have to add ECC RAM and a local hard drive yourself).
Test setup
Our test setup was like this:
We planned to measure throughput using iperf3 (example invocation after the list) in the following conditions:
- Between the Supermicro wormholes on the untrusted fiber
- Between the Supermicro wormholes through the WireGuard tunnel built over the untrusted fiber (to measure the WireGuard encryption penalty)
- Between a pair of 1 Gbps laptops, one on each side of the wormholes, to measure end-device to end-device performance
- Between two pairs of such laptops, each pair on a different VLAN and simultaneously trying to use its 1 Gbps of maximum bandwidth, just to be sure we were scaling... and it was a good intuition to run this test, as you will read further.
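Each of these measurements is just a plain iperf3 client/server run (addresses here are placeholders); for the two-pair test, a second server/client pair simply runs on another port:
# on the receiving end
iperf3 -s
# on the sending end: 30-second TCP test towards the receiver
iperf3 -c 10.255.255.2 -t 30
# second simultaneous pair: iperf3 -s -p 5202  /  iperf3 -c <other_receiver> -p 5202 -t 30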
We also measured the base performance of the four laptops with a classical setup (just the network switches linked by a trunk port, no VXLAN or WireGuard, no "wormholing"): they were able to reach 942 Mbps.
Just out of the box, with no particular tuning, here were the first test results:
- 9.81 Gbits/sec between the Supermicros on the untrusted fiber (no crypto) - so it seems we really did buy 10 Gbps-capable NICs, nice!
- 8.18 Gbits/sec between the Supermicros on wg0 (ChaCha20-Poly1305 encryption) - a 17% performance penalty, which seemed expected from encrypting...
- 874 Mbits/sec between two laptops (compared to 942 Mbps in a classical setup) - an 8% "end-user" performance penalty, which did not seem so bad
- But only 658 Mbits/sec when there were 4 laptops connected in pairs (again compared to 942 Mbps for each pair) - oops, something seemed off!
It looked like we were stalling somewhere around ~1 Gbps of shared bandwidth for all the end-user devices. That seemed like a waste of our "next to 10 Gbps" bandwidth, so surely there was something to do about it.
Tuning Linux networking
So we activated all the network-related tuning we had thought of beforehand:
# enable Jumbo frames (9000 bytes MTU) on 10 Gbps fiber
/sbin/ip li set dev ${UNTRUSTED_IFACE} mtu 9000
# https://cromwell-intl.com/open-source/performance-tuning/ethernet.html
/sbin/ip link set dev ${UNTRUSTED_IFACE} txqueuelen 13888
/sbin/ethtool -G ${UNTRUSTED_IFACE} rx 4096 tx 4096
# set an 8920-byte MTU on the wg0 interface to account for (at most) 80 bytes of wireguard overhead:
# 40-byte IPv6 (or 20-byte IPv4) header, 8-byte UDP header, 4-byte type, 4-byte key index, 8-byte nonce, 16-byte authentication tag
/sbin/ip li set dev wg0 mtu 8920
We added some sysctl tuning for 10 Gbps Ethernet as well:
# Maximum receive socket buffer size
net.core.rmem_max = 134217728
# Maximum send socket buffer size
net.core.wmem_max = 134217728
# Minimum, initial and max TCP Receive buffer size in Bytes
net.ipv4.tcp_rmem = 4096 87380 134217728
# Minimum, initial and max buffer space allocated
net.ipv4.tcp_wmem = 4096 65536 134217728
# Maximum number of packets queued on the input side
net.core.netdev_max_backlog = 300000
# Auto tuning
net.ipv4.tcp_moderate_rcvbuf = 1
# Don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# If you are using jumbo frames set this to avoid MTU black holes.
net.ipv4.tcp_mtu_probing = 1
After that we had the following test results:
- 9.91 Gbits/sec between the Supermicros on the untrusted fiber (no crypto) - 100 Mbps better than 9.81 Gbits/sec!
- 8.41 Gbits/sec between the Supermicros on wg0 (WireGuard) - more than 200 Mbps better than 8.18 Gbits/sec!!
- 942 Mbits/sec between two laptops - now equivalent to the legacy setup without wormholes - yay!
- still 658 Mbits/sec when there were 4 laptops connected in pairs - something was still off...
Having tuned everything network-related we could think of, it then occurred to us that a good part of the networking (bridging, forwarding, encapsulating) was really done in-kernel (read: "in memory") and not on the NICs. So surely some default Linux performance setting was stalling us around ~1 Gbps.
Last tunables
We then set the CPU governor to performance and tuned virtual memory sysctls to match what Red Hat's tuned tool does when setting the throughput-performance profile:
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
vm.swappiness = 10
# set cpu/power options
governor=performance
energy_perf_bias=performance
min_perf_pct=100
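For completeness, here is a sketch of how these can be applied by hand without installing tuned - the intel_pstate sysfs path and the availability of cpupower are assumptions about the platform:
# virtual memory tuning
/sbin/sysctl -w vm.dirty_ratio=40 vm.dirty_background_ratio=10 vm.swappiness=10
# CPU frequency governor and energy/performance bias (0 = maximum performance)
cpupower frequency-set -g performance
cpupower set -b 0
# keep the intel_pstate driver from scaling below 100%
echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct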
And lo and behold, here were the tests results:
- 9.86 Gbits/sec between the Supermicros on the untrusted fiber (no crypto) - a bit lower than the previous test
- 9.71 Gbits/sec between the Supermicros on wg0 - the crypto impact was becoming negligible!
- 942 Mbits/sec between two laptops (still equivalent to the legacy setup without wormholes)
- finally 942 Mbits/sec even when there were 4 laptops connected in pairs - job done!
In fact, the performance was so good we felt the need to vampire-tap the fiber ourselves to make sure everything was still encrypted - and it was ;-)
Conclusion
So, we were able to build a fully open-source pair of appliances that strongly encrypts a 10 Gbps 802.1q trunk at almost wire speed (less than a 2% performance penalty), defeating any spying vampire tapping into the underlying network link. And each appliance costs less than a flagship smartphone.
This truly speaks volumes about the performance of today's affordable server hardware and the maturity of the Linux networking stack. You can chain network technologies like trunking, bridging, routing, VXLAN and WireGuard almost like you chain CLI commands, in true UNIX fashion, and the kernel makes it "just work". It just takes quite a bit of time and trial & error to find the right feature flags and tuning settings to get the most out of it. We hope this blog article will do its part for future researchers as well.
If you liked this article and enjoy building secure-yet-performant infrastructure, speak French and live near Paris, note that we are hiring an experienced person to join our Infrastructure team! More info (in French): https://www.synacktiv.com/apt-search-sysadmin