A real upstream Samba bug
Building the lab’s offline-join path for the print emulator turned up a genuine
bug in Samba: net offlinejoin requestodj segfaults with the default
kerberos method. It is worth documenting because the crash is the kind that
hides from automation, and because the workaround is a one-line setting.
What happens
Offline domain join (ODJ) lets you provision a machine’s membership on the DC,
hand it a blob, and have the machine join without ever talking to the DC live. It
is what Windows does with djoin.exe /requestODJ. Samba implements it as
net offlinejoin requestodj.
With kerberos method set to anything that builds the default keytab (secrets and keytab, which is the default, or system keytab), that command
segfaults:
    INTERNAL ERROR: Signal 11: Segmentation fault
    PANIC: Signal 11: Segmentation fault
    ...
    #4 ads_search+0x3
    #5 ads_find_machine_acct+0x130
    #6 ads_get_service_principal_names+0x45
    #7 ads_keytab_create_default+0xdd
    #8 libnet_Join+0x13b0
    #9 NetRequestOfflineDomainJoin_l+0x229

Reproduced on Samba 4.17.12 (Debian bookworm) and observed on other 4.x members.
Why it crashes
requestodj is an offline operation, so it never opens a live connection to a
DC. But with the default kerberos method, libnet_Join goes on to call
ads_keytab_create_default(), which tries to enumerate the machine’s service
principal names from the DC over LDAP. That needs a connected ADS_STRUCT
handle, and in the offline path there is none: it is NULL, because no bind ever
happened. ads_search() dereferences the NULL handle and the process dies.
In short: the offline path calls into code that assumes an online connection.
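
The failure mode can be shown with a toy model. This is not Samba's code and not the upstream fix. It is a minimal sketch of an online-only step that checks for a missing connection instead of assuming one. The names Connection and build_default_keytab are hypothetical.

```python
from typing import List, Optional


class Connection:
    """Hypothetical stand-in for a bound directory connection."""

    def search(self, attribute: str) -> List[str]:
        # A real connection would query the DC; the toy returns nothing.
        return []


def build_default_keytab(conn: Optional[Connection]) -> str:
    """Toy version of a keytab step that needs to query the DC."""
    # In the offline path no bind ever happened, so there is no connection.
    # Checking for that here replaces a crash with a stated outcome.
    if conn is None:
        return "skipped: offline, no connection to query"
    conn.search("servicePrincipalName")
    return "built"
```

Without the None check, the offline call would fail inside search(), which is the Python analogue of the NULL dereference in the trace.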
Why it is dangerous, not just annoying
The crash is silently survivable. secrets.tdb (the part that records the
join) is written before the keytab step that crashes. So the process panics, the
keytab is never built, and yet:
    net ads testjoin    # -> "Join is OK"

Automation that checks only join status sees success and moves on. The missing keytab surfaces much later, as a confusing Kerberos failure, far from the panic that caused it. A crash that lies about having failed is worse than one that fails loudly.
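
A post-join check that would have caught this needs to look at more than join status. The sketch below is one possible shape for it, not the lab's actual automation. It assumes the caller kept the exit status of the requestodj run and knows where the keytab should be. The function names and the keytab path are hypothetical.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def default_runner(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def join_problems(
    keytab_path: str,
    odj_exit_status: int,
    run: Callable[[Sequence[str]], int] = default_runner,
) -> List[str]:
    """Return a list of problems; an empty list means the join looks usable."""
    problems = []
    # 1. The join command itself must have exited cleanly.
    if odj_exit_status != 0:
        problems.append(f"requestodj exited with status {odj_exit_status}")
    # 2. Join status: necessary, but on its own it masks the crash.
    if run(["net", "ads", "testjoin"]) != 0:
        problems.append("net ads testjoin failed")
    # 3. The artifact the crash skips: the keytab must exist and be non-empty.
    if not os.path.isfile(keytab_path) or os.path.getsize(keytab_path) == 0:
        problems.append(f"keytab missing or empty: {keytab_path}")
    return problems
```

Called as join_problems("/etc/krb5.keytab", status), where the path is an assumption about the member's layout, it reports the non-zero exit and the missing keytab even though testjoin passes.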
The workaround
Set kerberos method = dedicated keytab (with a dedicated keytab file). That
skips the default-keytab build entirely, so the crashing branch is never reached:
    kerberos method = secrets and keytab   # SIGSEGV, exit 1 (join survives by ordering luck)
    kerberos method = dedicated keytab     # exit 0, clean

For members that mount a pre-exported keytab, which is exactly the lab’s print
emulator, dedicated keytab is the right setting anyway. The lab uses it, so its
ODJ path is clean.
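
A provisioning script could refuse to run requestodj unless the setting is the one known to be safe. The sketch below is a hypothetical preflight helper. It accepts only dedicated keytab, the one value this page confirms avoids the crash, and treats everything else as unsafe. How the configured value is obtained (for example by parsing smb.conf or asking testparm) is left to the caller and is an assumption.

```python
# The only value this page confirms avoids the crashing branch.
KNOWN_SAFE_METHOD = "dedicated keytab"


def requestodj_is_known_safe(kerberos_method: str) -> bool:
    """Return True only for the setting confirmed to skip the keytab build.

    Case and surrounding whitespace are normalised before comparing.
    Any other value, including ones this page does not discuss, is
    conservatively reported as not known to be safe.
    """
    return kerberos_method.strip().lower() == KNOWN_SAFE_METHOD
```

Allow-listing the one confirmed value, rather than block-listing the two known-bad ones, means an unexpected setting fails closed.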
The broader lesson
Standing up a realistic lab is itself a test. Faking the protocols at production fidelity drove the code down a path the happy-path setup never exercises, and that path had a real defect in it. The “testjoin OK” masking is the part worth internalising: when a step can fail without changing the status your automation checks, your automation is not actually checking the thing you care about.
A draft bug report for filing upstream lives in the repository at
docs/upstream-samba-requestodj-segfault.md.