A real upstream Samba bug

Building the lab’s offline-join path for the print emulator turned up a genuine bug in Samba: net offlinejoin requestodj segfaults with the default kerberos method. It is worth documenting because the crash is the kind that hides from automation, and because the fix is a one-line setting.

Offline domain join (ODJ) lets you provision a machine’s membership on the DC, hand it a blob, and have the machine join without ever talking to the DC live. It is how Windows does djoin.exe /requestODJ. Samba implements it as net offlinejoin requestodj.

With kerberos method set to anything that builds the default keytab (secrets and keytab, which is the default, or system keytab), that command segfaults:

INTERNAL ERROR: Signal 11: Segmentation fault
PANIC: Signal 11: Segmentation fault
...
#4 ads_search+0x3
#5 ads_find_machine_acct+0x130
#6 ads_get_service_principal_names+0x45
#7 ads_keytab_create_default+0xdd
#8 libnet_Join+0x13b0
#9 NetRequestOfflineDomainJoin_l+0x229

Reproduced on Samba 4.17.12 (Debian bookworm) and observed on other 4.x members.

requestodj is an offline operation, so it never opens a live connection to a DC. But with the default kerberos method, libnet_Join goes on to call ads_keytab_create_default(), which tries to enumerate the machine’s service principal names from the DC over LDAP. That needs a connected ADS_STRUCT handle, and in the offline path there is none: it is NULL, because no bind ever happened. ads_search() dereferences the NULL handle and the process dies.

In short: the offline path calls into code that assumes an online connection.

The crash is silently survivable. secrets.tdb (the part that records the join) is written before the keytab step that crashes. So the process panics, the keytab is never built, and yet:

net ads testjoin # -> "Join is OK"

Automation that checks only join status sees success and moves on. The missing keytab surfaces much later, as a confusing Kerberos failure, far from the panic that caused it. A crash that leaves the status check reporting success is worse than one that fails loudly.
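One way to close that gap is to have automation check the keytab as well as the join status. The following is a minimal sketch, not the lab's actual check: the function name check_join, the TESTJOIN_CMD override, and the default keytab path /etc/krb5.keytab are all assumptions made for illustration. It only confirms that the keytab file exists and is non-empty; it does not validate the entries inside it.

```shell
#!/bin/sh
# Sketch: a join counts as healthy only if the status check passes AND a
# non-empty keytab file exists. All names and paths here are illustrative.

check_join() {
    # Assumed keytab location; pass the real path as the first argument.
    keytab="${1:-/etc/krb5.keytab}"
    # Command used to test the join; overridable so the check can be exercised
    # on a machine without Samba installed.
    testjoin_cmd="${TESTJOIN_CMD:-net ads testjoin}"

    # Step 1: the status check that the crash does not affect.
    if ! $testjoin_cmd >/dev/null 2>&1; then
        echo "join: FAILED (testjoin)"
        return 1
    fi

    # Step 2: the artefact the crash prevents from being written.
    if [ ! -s "$keytab" ]; then
        echo "join: INCOMPLETE (keytab missing or empty: $keytab)"
        return 2
    fi

    echo "join: OK"
    return 0
}
```

Calling check_join /path/to/keytab after the join step and treating any non-zero return as a failure makes the missing keytab visible at the point where it goes missing, rather than at the first Kerberos authentication.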

Set kerberos method = dedicated keytab, and point dedicated keytab file at the keytab to use. That skips the default-keytab build entirely, so the crashing branch is never reached:

kerberos method = secrets and keytab # SIGSEGV, exit 1 (join survives by ordering luck)
kerberos method = dedicated keytab # exit 0, clean

For members that mount a pre-exported keytab, which is exactly the lab’s print emulator, dedicated keytab is the right setting anyway. The lab uses it, so its ODJ path is clean.
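A provisioning script could guard against the crashing configuration before it runs requestodj at all. The sketch below is an assumption-laden illustration: kerberos_method and odj_preflight are hypothetical helper names, /etc/samba/smb.conf is an assumed default path, and the parsing is deliberately naive (case-sensitive, single file, no include handling). It accepts only dedicated keytab, the one setting shown above to exit cleanly, and treats everything else, including an unset parameter, as unverified.

```shell
#!/bin/sh
# Sketch: refuse to run the offline join unless smb.conf selects the
# kerberos method that is known to avoid the default-keytab build.

kerberos_method() {
    # Assumed config location; pass the real path as the first argument.
    conf="${1:-/etc/samba/smb.conf}"
    # Print the value of the last uncommented "kerberos method = ..." line,
    # with trailing whitespace removed. Prints nothing if it is not set.
    sed -n 's/^[[:space:]]*kerberos method[[:space:]]*=[[:space:]]*//p' "$conf" \
        | tail -n 1 \
        | sed 's/[[:space:]]*$//'
}

odj_preflight() {
    method=$(kerberos_method "$1")
    case "$method" in
        "dedicated keytab")
            return 0
            ;;
        *)
            echo "kerberos method is '${method:-unset}'; expected 'dedicated keytab'" >&2
            return 1
            ;;
    esac
}
```

A script would call odj_preflight first and run the offline join only if it returns 0. For anything beyond a sketch, asking Samba itself for the effective value is more robust than parsing the file by hand.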

Standing up a realistic lab is itself a test. Faking the protocols at production fidelity drove the code down a path the happy-path setup never exercises, and that path had a real defect in it. The “testjoin OK” masking is the part worth internalising: when a step can fail without changing the status your automation checks, your automation is not actually checking the thing you care about.

A draft bug report for filing upstream lives in the repository at docs/upstream-samba-requestodj-segfault.md.