Troubleshoot eBPF mode
This document gives general troubleshooting guidance for the eBPF data plane.
To understand basic concepts, we recommend the following video by Tigera Engineers: Opening the Black Box: Understanding and troubleshooting Calico's eBPF Data Plane.
Troubleshoot access to services
- Verify that eBPF mode is correctly enabled. Examine the log for a `calico-node` container; in the extremely rare case that eBPF mode is not supported, it logs an `ERROR` that says `BPF data plane mode enabled but not supported by the kernel. Disabling BPF mode.` If eBPF mode is correctly enabled, you should see an `INFO` log that says `BPF enabled, starting BPF endpoint manager and map manager.` (A log-check sketch follows this list.)
- In eBPF mode, forwarding external client access to services (typically NodePorts) from node to node is implemented using VXLAN encapsulation. If NodePorts time out when the backing pod is on another node, check that your underlying network fabric allows VXLAN traffic between the nodes. VXLAN is a UDP protocol; by default it uses port 4789. Note that this VXLAN traffic is separate from any overlay network that you may be using for pod-to-pod traffic. (A packet-capture sketch follows this list.)
- In DSR mode, Calico Cloud requires that the underlying network fabric allows one node to respond on behalf of another.
  - In AWS, to allow this, the Source/Dest check must be disabled on the node's NIC (an AWS CLI sketch follows this list). However, note that DSR only works within AWS; it is not compatible with external traffic through a load balancer, because the load balancer expects the traffic to return from the same host.
  - In GCP, the "Allow forwarding" option must be enabled. As with AWS, traffic through a load balancer does not work correctly with DSR because the load balancer is not consulted on the return path from the backing node.
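To quickly confirm which of the two log messages above appears, you can grep the `calico-node` logs. Replace `<calico-node-name>` with a pod name from `kubectl get pod -n calico-system`:
kubectl logs -n calico-system <calico-node-name> | grep -E "BPF enabled|Disabling BPF mode"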
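To check whether the VXLAN traffic used for node-to-node service forwarding is reaching the node that hosts the backing pod, you can capture UDP port 4789 on that node while reproducing the NodePort connection. This is a sketch that assumes tcpdump is installed on the node (or run from a host-networked debug pod); `<node-interface>` is a placeholder for the node's main interface:
tcpdump -ni <node-interface> udp port 4789
If no VXLAN packets arrive while the connection times out, a firewall or security group between the nodes is likely blocking UDP port 4789.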
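For the AWS note above, the Source/Dest check can be disabled with the AWS CLI; `<instance-id>` is a placeholder for the node's EC2 instance ID, and some environments manage this setting through infrastructure tooling instead:
aws ec2 modify-instance-attribute --instance-id <instance-id> --no-source-dest-check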
The `calico-node -bpf` tool
To inspect Calico Cloud's internal data structures, you can use the `calico-node -bpf` tool. The tool is embedded in the cnx-node container image and displays information about the eBPF data plane from within a `calico-node` pod only. Use `kubectl get pod -o wide -n calico-system` to find the name of a `calico-node` pod and use that name in the following commands in place of `<calico-node-name>`.
To run the tool, use:
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf <args>
For example, to show the tool's help:
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf help
Available Commands:
arp Manipulates arp
cleanup Removes all calico-bpf programs and maps
completion Generate the autocompletion script for the specified shell
connect-time Manipulates connect-time load balancing programs
conntrack Manipulates connection tracking
counters Show and reset counters
help Help about any command
ifstate Manipulates ifstate
ipsets Manipulates ipsets
nat Manipulates network address translation (nat)
policy Dump policy attached to interface
profiling Show and reset profiling data
routes Manipulates routes
version Prints the version and exits
(Since the tool is embedded in the main `calico-node` binary, the `--help` option is not available, but running `calico-node -bpf help` does work.)
For example, to dump the BPF conntrack table, use:
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf conntrack dump
...
Debug access to services
Inspect the BPF NAT table to verify that the service is correctly programmed. To dump the BPF NAT table:
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf nat dump
10.96.0.10 port 53 proto 6 id 4 count 2 local 0
4:0 192.168.129.66:53
4:1 192.168.129.68:53
10.96.0.10 port 53 proto 17 id 6 count 2 local 0
6:0 192.168.129.66:53
6:1 192.168.129.68:53
10.96.0.10 port 9153 proto 6 id 5 count 2 local 0
5:0 192.168.129.66:9153
5:1 192.168.129.68:9153
10.105.77.92 port 5473 proto 6 id 0 count 2 local 0
0:0 10.128.1.192:5473
0:1 10.128.1.195:5473
10.105.187.231 port 8081 proto 6 id 2 count 1 local 0
2:0 192.168.105.131:8081
10.109.136.88 port 7443 proto 6 id 1 count 1 local 0
1:0 192.168.129.72:7443
10.109.139.39 port 443 proto 6 id 7 count 2 local 0
7:0 192.168.129.67:5443
7:1 192.168.129.69:5443
10.96.0.1 port 443 proto 6 id 3 count 1 local 0
3:0 10.128.0.255:6443
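Each entry shows a service frontend (cluster IP, port, and protocol) followed by its backends. To confirm that the programmed backends match what Kubernetes expects, compare them against the Service and its endpoints. For example, assuming the `10.96.0.10` entry above is the cluster DNS service (as it typically is):
kubectl get svc -A | grep 10.96.0.10
kubectl get endpoints -n kube-system kube-dns
The backend IPs and ports in the NAT dump should match the endpoint addresses reported by Kubernetes.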
Inspect the BPF conntrack table to verify that connections are being tracked. To dump the BPF conntrack table:
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf conntrack dump
TCP 192.168.58.7:49178 -> 10.111.57.87:80 -> 192.168.105.136:80 Active ago 4.486606371s CLOSED <--- example of connection to service with per-packet NAT
TCP 10.128.1.194:41513 -> 10.128.1.192:179 Active ago 26.442759238s ESTABLISHED <--- example of connection without NAT or with connect-time NAT
TCP 192.168.58.7:42818 -> 10.111.57.87:80 -> 192.168.105.136:80 Active ago 1m15.987585857s CLOSED
TCP 10.128.1.192:58208 -> 10.109.136.88:7443 -> 192.168.58.5:7443 Active ago 4.935017508s ESTABLISHED
UDP 162.142.125.240:30603 -> 10.128.1.192:18989 Active ago 1m0.816678617s
UDP 127.0.0.1:48611 -> 127.0.0.53:53 Active ago 17.789851961s
Note that traffic originating within the cluster uses connect-time load balancing. By default, connect-time load balancing is only enabled for TCP traffic. When connect-time load balancing is used, the conntrack table does not show the NAT resolution, because that happens when the application calls `connect()`.
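If you need the conntrack table to show the NAT resolution for traffic that originates inside the cluster, you can temporarily turn off connect-time load balancing while you debug. This is a sketch only; depending on your version, the FelixConfiguration field is `bpfConnectTimeLoadBalancing` (values such as `TCP` and `Disabled`) or the older boolean `bpfConnectTimeLoadBalancingEnabled`, and the change applies cluster-wide, so revert it when you are done:
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"bpfConnectTimeLoadBalancing":"Disabled"}}'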
Check if Calico Cloud is dropping packets
If you suspect that Calico Cloud is dropping packets, you can use the `calico-node -bpf` tool to check the BPF counters. Since the eBPF data plane is split into programs that are attached to interfaces, you must check the counters on the relevant interface. You can either dump counters for all interfaces or use `--iface=<interface name>` to dump counters for a specific interface. An increasing `by policy` counter in the `Dropped` category indicates that Calico Cloud is dropping packets due to policy, and you should check your policy configuration.
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf counters dump --iface=eth0
+----------+--------------------------------+---------+--------+-----+
| CATEGORY | TYPE | INGRESS | EGRESS | XDP |
+----------+--------------------------------+---------+--------+-----+
| Accepted | by another program | 0 | 0 | N/A |
| | by failsafe | 0 | 0 | N/A |
| | by policy | 0 | 4 | N/A |
| Dropped | NAT source collision | 0 | 0 | N/A |
| | resolution failed | | | |
| | QoS control limit | 0 | 0 | N/A |
| | by policy | 0 | 11 | N/A |
| | failed decapsulation | 0 | 0 | N/A |
| | failed encapsulation | 0 | 0 | N/A |
| | failed to create conntrack | 0 | 0 | N/A |
| | fragment of yet incomplete | 0 | 0 | N/A |
| | packet | | | |
| | fragment out of order within | 0 | 0 | N/A |
| | host | | | |
| | fragments not supported | 0 | 0 | N/A |
| | incorrect checksum | 0 | 0 | N/A |
| | malformed IP packets | 0 | 0 | N/A |
| | packets hitting blackhole | 0 | 0 | N/A |
| | route | | | |
| | packets with unknown route | 0 | 0 | N/A |
| | packets with unknown source | 0 | 0 | N/A |
| | packets with unsupported IP | 0 | 0 | N/A |
| | options | | | |
| | too short packets | 0 | 0 | N/A |
| Other | packets hitting NAT source | 0 | 0 | N/A |
| | collision | | | |
| Redirect | neigh | 0 | 0 | N/A |
| | peer | 0 | 0 | N/A |
| | plain | 20 | 0 | N/A |
| Total | packets | 34 | 22 | N/A |
+----------+--------------------------------+---------+--------+-----+
eBPF program debug logs
Sometimes it is necessary to examine the logs that are emitted by the eBPF programs themselves. Although the logs can be very verbose (because the programs log every packet), they can be invaluable for diagnosing eBPF program issues. To enable the logs, set the `bpfLogLevel` Felix configuration setting to `Debug`.
Enabling logs in this way has a significant impact on eBPF program performance.
To reduce the performance impact in production clusters, you can target logging to specific traffic and/or specific interfaces using the `bpfLogFilters` Felix configuration setting. Filters are pcap expressions.
Note that the filters are applied to the original packet, before any NAT or encapsulation. Therefore, to log a packet that is sent to a service and makes its way via different devices, you must filter on both the service IP and port and the backend pod IP and port.
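For example, here is a sketch of enabling debug logs only for DNS traffic on an interface named `eth0`. The interface name and pcap expression are placeholders, `bpfLogFilters` is assumed to be a map of interface name to filter expression (check the FelixConfiguration reference for your version), and the patch assumes the Calico API server is available so that FelixConfiguration can be edited with kubectl:
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"bpfLogLevel":"Debug","bpfLogFilters":{"eth0":"udp port 53"}}}'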
The logs are emitted to the kernel trace buffer, and they can be examined using the following command:
kubectl exec -n calico-system <calico-node-name> -- bpftool prog tracelog
Logs have the following format:
<...>-84582 [000] .Ns1 6851.690474: 0: ens192---E: Final result=ALLOW (-1). Program execution time: 7366ns
The parts of the log are explained below:
- `<...>-84582` gives an indication of which program (or kernel process) was handling the packet. For packets that are being sent, this is usually the name and PID of the program that is actually sending the packet. For packets that are received, it is typically a kernel process, or an unrelated program that happens to trigger the processing.
- `6851.690474` is the log timestamp.
- `ens192---E` is the Calico Cloud log tag. For programs attached to interfaces, the first part contains the first few characters of the interface name. The suffix is either `-I` or `-E`, indicating "Ingress" or "Egress". "Ingress" and "Egress" have the same meaning as for policy:
  - A workload ingress program is executed on the path from the host network namespace to the workload.
  - A workload egress program is executed on the workload-to-host path.
  - A host endpoint ingress program is executed on the path from an external node to the host.
  - A host endpoint egress program is executed on the path from the host to an external host.
- You may also see `ens192---X`, which indicates an XDP program. Calico Cloud uses XDP programs to implement `doNotTrack` policies on host devices only.
- `Final result=ALLOW (-1). Program execution time: 7366ns` is the message. In this case, it logs the final result of the program. Note that the reported execution time is massively distorted by the time spent logging.
Debugging policy issues
If you suspect that Calico Cloud is dropping packets due to policy, you can use the `calico-node -bpf` tool to dump the policy that is attached to a specific interface.
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf policy dump <interface> <type> [--asm]
Where:
- `<interface>` is the name of the interface, for example `eth0` or `caliXXXXXX`.
- `<type>` is the location of the policy: `ingress`, `egress`, `xdp`, or `all`.
Here is a dump of an ingress policy. Note that the ingress policy for a pod is attached to the `tc/tcx` egress hook of the host side of the caliX veth pair, while the ingress policy for host endpoints is attached to the `tc/tcx` ingress hook of the host interface. Similarly for egress policy.
IfaceName: calic31b4f7fc58
Hook: tc egress
Error:
Policy Info:
// Start of tier default
// Start of policy default/knp.default.allow-nginx-from-ubuntu
// Start of rule action:"allow" protocol:{name:"tcp"} dst_ports:{first:80 last:80} src_ip_set_ids:"s:nzE8vwTu69FSscx2FDKjb20D9dZxEyVxsWFqwA" original_src_selector:"projectcalico.org/orchestrator == 'k8s' && app == 'ubuntu-client'" rule_id:"29vMYcPWr7reSxxN"
// IPSets src_ip_set_ids:<0x303904ae5eae5418>
// count = 9
// End of rule 29vMYcPWr7reSxxN
// End of policy default/knp.default.allow-nginx-from-ubuntu
// End of tier default: deny
// Start of rule action:"allow" rule_id:"aBMQCbsUMESPKGRp"
// count = 0
// End of rule aBMQCbsUMESPKGRp
Rules that use selectors refer to IP sets. You can dump the contents of an IP set using the `ipsets` command and check whether the IP set contains the expected members:
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf ipsets dump
You can see how many packets have matched each rule. In this example, 9 packets have matched rule `29vMYcPWr7reSxxN` and 0 packets have matched rule `aBMQCbsUMESPKGRp`.
Adding `--asm` also shows the eBPF assembly code for the program.
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf ipsets dump
IP set 0x303904ae5eae5418
192.168.58.5/32
IP set 0xffef9f925a8a4ca4
192.168.129.66/32
192.168.129.68/32
Debugging calico-node not ready
If you notice that a `calico-node` pod is not ready, check its logs for errors. The most likely reason for `calico-node` not being ready in eBPF mode is that Calico Cloud is not able to update a program attached to an interface. Look for the following type of warning:
2025-09-22 22:39:59.801 [WARNING][10374] felix/bpf_ep_mgr.go 2107: Failed to apply policy to endpoint, leaving it dirty
One reason for this type of error is that the eBPF programs provided in the cnx-node image are not compatible with the verifier used by your kernel. Each kernel must ensure that the eBPF programs it loads are safe to run; however, the capabilities of the verifier differ between kernel versions. We test the eBPF programs with a range of kernels, but it is impossible to test them all. If the verifier rejects a program, you may see errors such as the following:
265: (79) r1 = *(u64 *)(r10 -72) ; R1_w=ctx(off=0,imm=0) R10=fp0
266: (79) r2 = *(u64 *)(r10 -128) ; R2_w=scalar(umin=14,umax=74,var_off=(0x2; 0x7c)) R10=fp0
267: (b7) r5 = 0 ; R5_w=0
268: (85) call bpf_skb_store_bytes#9
invalid access to map value, value_size=1512 off=8 size=0
R3 min value is outside of the allowed memory range
processed 1102 insns (limit 1000000) max_states_per_insn 2 total_states 38 peak_states 38 mark_read 27
-- END PROG LOAD LOG --
libbpf: prog 'calico_tc_skb_ipv4_frag': failed to load: -13
libbpf: failed to load object '/usr/lib/calico/bpf/to_wep_no_log.o'
2025-07-31 17:36:15.708 [WARNING][45] felix/bpf_ep_mgr.go 2124: Failed to apply policy to endpoint, leaving it dirty error=attaching program to wep: loading generic v4 tc hook program: error loading program: error loading object permission denied
attaching program to wep: loading generic v4 tc hook program: error loading program: error loading object permission denied name="enif327a56b833" wepID=&types.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"calico-system/csi-node-driver-tggmr", EndpointId:"eth0"}
2025-07-31 17:36:15.708 [WARNING][45] felix/bpf_ep_mgr.go 2124: Failed to apply policy to endpoint, leaving it dirty error=attaching program to wep: loading generic v4 tc hook program: error loading program: error loading object permission denied
If you see errors of this type, please open an issue on the Calico Cloud GitHub repository, including details of your kernel version and distribution.
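To collect the kernel and distribution details for the report, `kubectl get nodes -o wide` prints the OS image and kernel version for every node; you can also run `uname -a` directly on the affected node:
kubectl get nodes -o wide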
Poor performance
A number of problems can reduce the performance of the eBPF data plane.
- Verify that you are using the best networking mode for your cluster. If possible, avoid using an overlay network; a routed network with no overlay is considerably faster. If you must use one of Calico Cloud's overlay modes, use VXLAN, not IPIP. IPIP performs poorly in eBPF mode due to kernel limitations.
- If you are not using an overlay, verify that the Felix configuration parameters `ipipEnabled` and `vxlanEnabled` are set to `false`. These parameters control whether Felix configures itself to allow IPIP or VXLAN, even if you have no IP pools that use an overlay. The parameters also disable certain eBPF mode optimisations for compatibility with IPIP and VXLAN. To examine the configuration (a patch sketch for disabling both parameters follows this list):
kubectl get felixconfiguration -o yaml
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: FelixConfiguration
  metadata:
    creationTimestamp: "2020-10-05T13:41:20Z"
    name: default
    resourceVersion: "767873"
    uid: 8df8d751-7449-4b19-a4f9-e33a3d6ccbc0
  spec:
    ...
    ipipEnabled: false
    ...
    vxlanEnabled: false
kind: FelixConfigurationList
metadata:
  resourceVersion: "803999"
- If you are running your cluster in a cloud such as AWS, your cloud provider may limit the bandwidth between nodes in your cluster. For example, most AWS nodes are limited to 5 Gbps per connection.
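If both parameters are `true` but you have confirmed that no IP pool uses IPIP or VXLAN, you can disable them. A sketch, assuming the Calico API server is available (otherwise edit the FelixConfiguration with calicoctl):
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"ipipEnabled":false,"vxlanEnabled":false}}'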
Runtime profiling
Setting `bpfProfiling` to `Enabled` enables collection of runtime profiling data for eBPF programs. It collects the average execution time and number of executions for each eBPF program attached to each interface. The profiling data can be examined using the `calico-node -bpf profiling e2e` command. The command resets the profiling data after dumping it.
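A sketch of enabling profiling and then dumping the data; the patch step assumes the Calico API server is available (otherwise edit the FelixConfiguration with calicoctl):
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"bpfProfiling":"Enabled"}}'
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf profiling e2e
The output looks like the following: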
+----------------+-------------+-----+-------------+-------+-------------+-------+-------------+-------+
| IFACE | INGRESS NEW | # | INGRESS EST | # | EGRESS NEW | # | EGRESS EST | # |
+----------------+-------------+-----+-------------+-------+-------------+-------+-------------+-------+
| lo | --- | --- | --- | --- | 142.263 ns | 10272 | --- | --- |
| eth0 | 2492.344 ns | 32 | 1535.443 ns | 16114 | 6296.421 ns | 749 | 1503.339 ns | 10982 |
| eni76136be4c77 | 5031.436 ns | 149 | 1194.923 ns | 1421 | 4950.196 ns | 138 | 1437.015 ns | 1432 |
| eni80d5c04bc95 | 7773.459 ns | 74 | 1508.973 ns | 641 | 4907.333 ns | 69 | 1715.848 ns | 646 |
| eth1 | 136.250 ns | 24 | --- | --- | 75.320 ns | 25 | --- | --- |
| eni5f8ab1cfc29 | 107.250 ns | 36 | 1068.596 ns | 1514 | 189.528 ns | 36 | 1104.335 ns | 1658 |
| bpfout.cali | 440.000 ns | 1 | --- | --- | 206.000 ns | 1 | --- | --- |
+----------------+-------------+-----+-------------+-------+-------------+-------+-------------+-------+
Debug high CPU usage
If you notice `calico-node` using high CPU:
- Check if `kube-proxy` is still running. If `kube-proxy` is still running, you must either disable `kube-proxy` or ensure that the Felix configuration setting `bpfKubeProxyIptablesCleanupEnabled` is set to `false`. If the setting is `true` (its default), Felix attempts to remove `kube-proxy`'s iptables rules; if `kube-proxy` is still running, Felix will continually try to remove the rules, which can cause high CPU usage.
- If your cluster is very large, or your workload involves significant service churn, you can increase the interval at which Felix updates the services data plane by increasing the `bpfKubeProxyMinSyncPeriod` setting. The default is 1 second. Increasing the value has the trade-off that service updates happen more slowly. (A patch sketch follows this list.)
- Calico Cloud supports endpoint slices, similarly to `kube-proxy`. If your Kubernetes cluster supports endpoint slices and they are enabled, you can enable endpoint slice support in Calico Cloud with the `bpfKubeProxyEndpointSlicesEnabled` configuration flag.
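As a sketch of the second item above, this increases `bpfKubeProxyMinSyncPeriod` to 10 seconds (the value is illustrative, and the patch assumes the Calico API server is available; otherwise use calicoctl):
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"bpfKubeProxyMinSyncPeriod":"10s"}}'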