eBPF in kernel lockdown mode
----------------------------

Abstract:

Linux has a new 'lockdown' security mode in which changes to the running kernel
require verification with a cryptographic signature, and access to kernel
memory that may leak to userspace is restricted.

Lockdown's 'integrity' mode requires just the signature, while 'confidentiality'
mode, in addition to requiring a signature, prevents the system from leaking
kernel information to userspace.

Work needs to be done to add cryptographic signatures for eBPF bytecode. The
signature would then be passed to the kernel via sys_bpf(), reusing the kernel
module signing infrastructure.

The main eBPF loader, libbpf, may perform relocations on the received bytecode
for things like CO-RE (Compile Once, Run Everywhere), thus invalidating the
signature made over the original bytecode.

Such modifications to the signed bytecode thus need to move from libbpf to the
kernel, so that they are done after the signature is verified.

--------------------------------------------------------------------------------

eBPF in kernel lockdown mode
----------------------------

Linux now has a new LSM (Linux Security Module) that implements restrictions on
what root can do to reduce the possibility that unauthorized unsigned code
runs. This module is called 'lockdown'[1].

Lockdown has two modes: 'integrity', which requires that the kernel, modules
and whatever else runs in kernel space be signed, and 'confidentiality', which,
in addition to requiring a signature, disables kernel features that may leak
confidential information from the kernel.
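
A tool can check which mode is active before deciding what to load. The sketch
below assumes the kernel exposes the mode in securityfs at
/sys/kernel/security/lockdown, in the usual form where the active mode is the
bracketed word (for example "none [integrity] confidentiality"). The function
names are illustrative only.

```python
from pathlib import Path

# Assumed securityfs location of the lockdown state file.
LOCKDOWN_PATH = Path("/sys/kernel/security/lockdown")


def parse_lockdown(text):
    """Return the active lockdown mode: the word enclosed in [brackets]."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def current_lockdown_mode(path=LOCKDOWN_PATH):
    """Read the mode from securityfs; None if the file cannot be read."""
    try:
        return parse_lockdown(path.read_text())
    except OSError:
        return None
```

Calling current_lockdown_mode() returns "none", "integrity" or
"confidentiality" on a kernel built with the lockdown LSM, and None when the
file is missing or unreadable.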

This presentation is intended to provide a problem statement, present some of
the ideas being discussed, offer a reading list, and foster awareness of this
security feature so that BPF can be used in environments where 'lockdown' mode
is required.

The kernel eBPF subsystem was already modified to avoid reading kernel
memory[2], by restricting the bpf_probe_read() BPF helper when the kernel is in
lockdown confidentiality mode.

As with other subsystems[3], there may be more places where such checks need
to be added, such as uprobes, kprobes, etc.

Whether probing can be allowed in selected areas needs to be investigated, so
that BPF tracing programs remain valuable in lockdown mode. For instance, XDP
programs can look at kernel memory, but only in a very restricted fashion: they
are limited to the network packet.

One approach to ruling out confidential areas for eBPF programs in
confidentiality mode is to "add support for privileged applications with an
appropriate signature that implement policy on the userland side." [8]

The next area to look at in making eBPF usable in lockdown mode is signature
verification[4] by the kernel when loading BPF bytecode.

This would initially cover tools that come with pre-compiled bytecode[5];
dynamic generation in tools such as bpftrace is harder to achieve and is thus
left to a second step.

The main eBPF loader, libbpf, may perform changes, e.g., relocations, on the
received bytecode for things like CO-RE (Compile Once, Run Everywhere)[6], thus
invalidating the signature made over the original bytecode.
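
To make the problem concrete, the sketch below encodes a single BPF load
instruction, patches its offset field the way a field-offset relocation might,
and compares digests of the two byte strings. It is an illustration only, not
libbpf's relocation code: the helper names are made up here, the encoding
assumes the little-endian layout of struct bpf_insn, and SHA-256 stands in for
whatever digest the signature would actually cover.

```python
import hashlib
import struct

# struct bpf_insn on little-endian hosts:
# u8 opcode, u8 regs (src_reg << 4 | dst_reg), s16 off, s32 imm
INSN = struct.Struct("<BBhi")

BPF_LDX_MEM_DW = 0x79  # BPF_LDX | BPF_MEM | BPF_DW


def load_field(dst, src, off):
    """Encode `dst = *(u64 *)(src + off)` as one 8-byte BPF instruction."""
    return INSN.pack(BPF_LDX_MEM_DW, (src << 4) | dst, off, 0)


def relocate_offset(insn, new_off):
    """Rewrite only the offset field, keeping opcode, registers and imm."""
    opcode, regs, _old_off, imm = INSN.unpack(insn)
    return INSN.pack(opcode, regs, new_off, imm)


signed = load_field(dst=0, src=1, off=8)       # bytes the signer saw
patched = relocate_offset(signed, new_off=16)  # bytes after relocation

digest_signed = hashlib.sha256(signed).hexdigest()
digest_patched = hashlib.sha256(patched).hexdigest()
```

The two digests differ even though only two bytes of the instruction changed,
so a signature computed over the original bytes no longer matches what reaches
the kernel.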

Such modifications to the signed bytecode thus need to move from libbpf to the
kernel, so that they are done after the signature is verified.

Here we should probably reuse the infrastructure for kernel module signing and
verification, with sys_bpf() being changed to receive the signature, check it,
and then pass the bytecode to a component that will do the changes now
performed in libbpf.
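
The ordering matters more than the mechanism: the signature is checked against
the bytes exactly as they were signed, and only then are they modified. The
sketch below shows that ordering under loud assumptions: an HMAC with a demo
key stands in for the PKCS#7 verification the module signing code performs,
and load_program() and its relocate callback are hypothetical names, not an
existing kernel or libbpf interface.

```python
import hashlib
import hmac

# Stand-in only: the kernel would verify against keys in its trusted keyring.
DEMO_KEY = b"demo-key"


def sign(bytecode, key=DEMO_KEY):
    """Produce the stand-in signature over the unmodified bytecode."""
    return hmac.new(key, bytecode, hashlib.sha256).digest()


def load_program(bytecode, signature, relocate, key=DEMO_KEY):
    """Verify the signature over the original bytes, then relocate them."""
    if not hmac.compare_digest(sign(bytecode, key), signature):
        raise PermissionError("signature does not match the bytecode")
    # Modifications happen only after verification has succeeded.
    return relocate(bytecode)
```

If the relocation ran first, as it does in libbpf today, the same check would
reject the program, because the bytes being verified would no longer be the
bytes that were signed.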

Another idea being considered is to use a UMH (user mode helper), as is done
with bpfilter[7], for doing the bytecode modifications, with the parts of
libbpf that do the changes moved to this new component.

A signed userspace component would act as the helper that does the libbpf
relocations, receiving commands through pipes set up by the kernel after it
performs signature verification.

This could happen either in sys_bpf(), when receiving any bytecode or metadata
(BTF) from userspace, or when treating BPF ELF files as executable files going
through the kernel loader. The latter would help cover the whole ELF file,
including metadata such as where to insert the BPF bytecode in each of the ELF
sections.

Restricting even further what an unsigned eBPF program can do in kernel space
in lockdown mode may be an alternative to requiring a signature: only programs
that need to use more capabilities would need to be signed. This could help
with a subset of observability tools.

Another feature that may be worth having would be for the tool writer to state
what a given signed BPF bytecode is allowed to do in each of the lockdown
modes. This is to some degree similar to capability dropping, informing the
BPF verifier of what is acceptable in each mode via some signed metadata.
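
One possible shape for such metadata is a per-mode allowlist of BPF helpers.
The sketch below is purely hypothetical: no such metadata format exists, the
table and the function name are invented for illustration, and the choice of
which helpers appear in each mode is an example rather than a safety
assessment.

```python
# Hypothetical metadata a tool author might sign alongside the bytecode:
# for each lockdown mode, the set of BPF helpers the program may call.
ALLOWED_HELPERS = {
    "none": {"bpf_probe_read", "bpf_get_current_pid_tgid", "bpf_ktime_get_ns"},
    "integrity": {"bpf_probe_read", "bpf_get_current_pid_tgid", "bpf_ktime_get_ns"},
    "confidentiality": {"bpf_get_current_pid_tgid", "bpf_ktime_get_ns"},
}


def disallowed_helpers(used, mode, policy=ALLOWED_HELPERS):
    """Return the helpers a program uses that the policy forbids in `mode`.

    An unknown mode allows nothing, so every helper is reported.
    """
    return sorted(set(used) - policy.get(mode, set()))
```

A verifier-side check would then reject the program whenever the returned
list is non-empty for the currently active lockdown mode.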

The dynamic case, where tools such as bpftrace are involved, would come later,
after the pre-built, signed bytecode case is solved, but could involve a
variation of an approach described in a recent blog post by Matthew Garrett[8]:

"Add support for privileged applications with an appropriate signature that
 implement policy on the userland side. This is actually possible already,
 though not straightforward. Lockdown is implemented in the LSM layer, which
 means the policy can be imposed using any other existing LSM. As an example, we
 could use SELinux to impose the confidentiality restrictions on most processes
 but permit processes with a specific SELinux context to use them, and then use
 EVM[9] to ensure that any process running in that context has a legitimate
 signature. This is quite a few hoops for a general purpose distribution to jump
 through."

The bpftool utility needs to gain the functionality now found in the Linux
kernel module signing utility, scripts/sign-file.c, and add a companion ELF
section with the signature for each of the BPF ELF bytecode and BTF sections.
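
For reference, sign-file appends the PKCS#7 signature to the module, followed
by a 12-byte struct module_signature and a magic string. The sketch below
reproduces that trailer layout to show what a companion section could carry.
Whether BPF would reuse this exact layout is an assumption made here, and
append_signature() and split_signature() are illustrative names; the PKCS#7
blob is treated as opaque bytes rather than generated.

```python
import struct

MODULE_SIG_MAGIC = b"~Module signature appended~\n"
# struct module_signature: algo, hash, id_type, signer_len, key_id_len,
# three pad bytes, then a big-endian 32-bit signature length.
MODULE_SIG = struct.Struct(">BBBBB3xI")
PKEY_ID_PKCS7 = 2


def append_signature(payload, pkcs7):
    """Lay out payload + signature + trailer + magic, as sign-file does."""
    trailer = MODULE_SIG.pack(0, 0, PKEY_ID_PKCS7, 0, 0, len(pkcs7))
    return payload + pkcs7 + trailer + MODULE_SIG_MAGIC


def split_signature(blob):
    """Return (payload, pkcs7), or (blob, None) if no signature is appended."""
    if not blob.endswith(MODULE_SIG_MAGIC):
        return blob, None
    body = blob[: -len(MODULE_SIG_MAGIC)]
    if len(body) < MODULE_SIG.size:
        return blob, None
    *_, sig_len = MODULE_SIG.unpack(body[-MODULE_SIG.size:])
    body = body[: -MODULE_SIG.size]
    if sig_len > len(body):
        return blob, None
    cut = len(body) - sig_len
    return body[:cut], body[cut:]
```

Splitting recovers the exact bytes that were signed, which is what the kernel
would need in order to verify before any relocation takes place.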

We'd have a new subcommand:

  bpftool sign file FILENAME

Then libbpf, when loading the bytecode, should add that signature to the
bpf_attr struct passed to sys_bpf(BPF_PROG_LOAD).

In the future, new BPF helpers would start out disabled in 'confidentiality'
mode, a conservative approach until enough time has passed to allow for
assessing their safety.

References:

[1] "Why lock down the kernel?", Matthew Garrett
    https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Why-lock-down-the-kernel_Matthew-Garrett.pdf

[2] "bpf: Restrict bpf when kernel lockdown is in confidentiality mode", David Howells
    https://git.kernel.org/torvalds/c/9d1f8be5cf42

[3] "powerpc/xmon: Restrict when kernel is locked down", Christopher M. Riedl
    https://git.kernel.org/torvalds/c/69393cb03ccd
    "ACPI: configfs: Disallow loading ACPI tables when locked down", Jason A. Donenfeld
    https://git.kernel.org/torvalds/c/75b0cea7bf30

[4] "kexec: do not verify the signature without the lockdown or mandatory signature"
    http://git.kernel.org/torvalds/c/fd7af71be542

[5] "BPF: The Status of BTF", description of runqslower, Arnaldo Carvalho de Melo
    http://vger.kernel.org/~acme/bpf/devconf.cz-2020-BPF-The-Status-of-BTF-producers-consumers/#/33

[6] "BPF CO-RE (Compile Once – Run Everywhere)", Andrii Nakryiko
    http://vger.kernel.org/bpfconf2019_talks/bpf-core.pdf

[7] "Rethinking bpfilter and user-mode helpers", Jonathan Corbet, June 12, 2020
    https://lwn.net/Articles/822744/

[8] "Linux kernel lockdown, integrity, and confidentiality", Matthew Garrett, April 21, 2020
    https://mjg59.dreamwidth.org/55105.html

[9] "evm: re-release", Mimi Zohar, March 15, 2011
    http://git.kernel.org/torvalds/c/66dbc325afce
