
Simulated Hospital — setup, integration & notes

The activity plane is Google’s Simulated Hospital (“Simhospital”): a synthetic HL7v2 generator that behaves like a hospital that is always open. It is what makes the EMR a populated, moving system rather than an empty shell. This page is the deep dive — the setup choices and the things that cost real time to get right.

The official eu.gcr.io/simhospital-images/simhospital image is no longer anonymously pullable, and upstream ships no Dockerfile (it builds via Bazel) and no go.mod. So gh-simhospital/build/ synthesises a Go-modules build:

  • A pinned upstream commit, fetched in a multi-stage Dockerfile.
  • A committed go.mod / go.sum (generated once with go mod tidy) so the build is reproducible, not re-resolved each time.
  • GOTOOLCHAIN=auto, because a transitive dependency now requires a newer Go toolchain than the base image ships; with auto, the go command can fetch that toolchain itself.

Simhospital takes your own doctors and locations:

  • config/doctors.yml — ordering/attending physicians. These are the lab’s cast, the same names that exist as AD accounts, so every message names someone you can look up in the directory.
  • config/locations.yml — the wards (ED, ICU, Nephrology, Cardiology, Maternity, Pediatrics, Oncology). Pathways reference these by key, and each location’s poc (point-of-care) code populates the HL7 PV1 assigned patient location (PV1-3).
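PV1-3 is an HL7 v2 PL (person location) field whose first components are point of care ^ room ^ bed ^ facility. Here is a minimal sketch of how one location entry could render into it. The Location struct, its field names, and the facility code GH are hypothetical and are not Simhospital’s locations.yml schema.

```go
package main

import (
	"fmt"
	"strings"
)

// Location is a hypothetical stand-in for one entry in config/locations.yml.
// The field names are illustrative, not Simhospital's actual schema.
type Location struct {
	POC      string // point of care, e.g. "ICU"
	Room     string
	Bed      string
	Facility string
}

// assignedLocation renders an HL7 v2 PL value for PV1-3 (Assigned Patient
// Location) as point of care ^ room ^ bed ^ facility.
func assignedLocation(l Location) string {
	parts := []string{l.POC, l.Room, l.Bed, l.Facility}
	// Drop trailing empty components; HL7 v2 lets a sender omit them.
	for len(parts) > 0 && parts[len(parts)-1] == "" {
		parts = parts[:len(parts)-1]
	}
	return strings.Join(parts, "^")
}

func main() {
	icu := Location{POC: "ICU", Room: "12", Bed: "A", Facility: "GH"}
	// PV1-1 set ID, PV1-2 patient class (I = inpatient), PV1-3 location.
	fmt.Println("PV1|1|I|" + assignedLocation(icu))
}
```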

Out of the box Simhospital generates UK patients (NHS numbers, London addresses, GBR) — which jars against a US cast admitting to “General Hospital.” A locale pack (config/locales/us/data.yml, wired in via -data_config_file) plus a set of source patches (build/patches/) make the patients coherent:

  • US demographics — names and Chicago-area addresses (Evanston, Oak Park, Cicero, Naperville…), country = USA, 5-digit ZIPs.
  • National ID — a US SSN (not an NHS number), carried as an SS-typed PID-3 repetition and as an identifier on the FHIR Patient.
  • Insurance — an IN1 segment on admit (Medicare / commercial), carried across the financial-lifecycle ADT messages.
  • Guarantor — a GT1 segment; for a minor the guarantor is a parent.
  • Cast cameo — occasionally a patient is named after a recognizable character from the medical-show universe (show-weighted, low probability), a wink for demos. The attending clinicians still come from doctors.yml.
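PID-3 is a repeating CX field. The fifth component of each repetition (CX.5) carries the identifier type code, which is what makes the SSN repetition "SS-typed". The following is a minimal sketch that builds the field and then picks the SSN out by type code rather than by position. It uses made-up values: the MRN, the assigning authorities GH and SSA, and the sample SSN are illustrative, not what the patches emit.

```go
package main

import (
	"fmt"
	"strings"
)

// Identifier is one CX repetition of PID-3. Only CX.1 (ID), CX.4 (assigning
// authority) and CX.5 (identifier type code) are filled in.
type Identifier struct {
	ID        string
	Authority string
	TypeCode  string // e.g. "MR" for a medical record number, "SS" for an SSN
}

func (c Identifier) String() string {
	// CX.2 and CX.3 (check digit and check digit scheme) stay empty.
	return strings.Join([]string{c.ID, "", "", c.Authority, c.TypeCode}, "^")
}

// pid3 joins identifier repetitions with the HL7 repetition separator "~".
func pid3(ids ...Identifier) string {
	reps := make([]string, len(ids))
	for i, id := range ids {
		reps[i] = id.String()
	}
	return strings.Join(reps, "~")
}

// ssn returns the first repetition typed "SS", selecting by type code rather
// than by position in the field.
func ssn(field string) (string, bool) {
	for _, rep := range strings.Split(field, "~") {
		comps := strings.Split(rep, "^")
		if len(comps) >= 5 && comps[4] == "SS" {
			return comps[0], true
		}
	}
	return "", false
}

func main() {
	field := pid3(
		Identifier{ID: "MRN0001", Authority: "GH", TypeCode: "MR"},
		Identifier{ID: "123-45-6789", Authority: "SSA", TypeCode: "SS"},
	)
	fmt.Println("PID|1||" + field)
	if id, ok := ssn(field); ok {
		fmt.Println("SSN:", id)
	}
}
```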

The interface engine maps these through to OpenEMR: the SSN and address onto patient_data, IN1 into insurance_data, and the GT1 guarantor as the policy subscriber (so a pediatric patient shows a parent as the policyholder).
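The guarantor-as-subscriber rule can be sketched as a small selection function. Everything here is hypothetical and illustrates the idea only; it is not the interface engine’s actual mapping. That includes the Person and Subscriber types, the relationship strings, and the fallback that treats the patient as their own subscriber when no guarantor is supplied.

```go
package main

import "fmt"

// Person is a hypothetical, trimmed-down name taken from PID-5 or GT1-3.
type Person struct {
	Family, Given string
}

// Subscriber is a hypothetical policyholder record as an engine might write it.
type Subscriber struct {
	Person       Person
	Relationship string // patient's relationship to the subscriber, e.g. "self" or "child"
}

// policySubscriber uses the GT1 guarantor as the subscriber when one is
// present, and otherwise treats the patient as their own subscriber.
func policySubscriber(patient Person, guarantor *Person, relationship string) Subscriber {
	if guarantor != nil {
		return Subscriber{Person: *guarantor, Relationship: relationship}
	}
	return Subscriber{Person: patient, Relationship: "self"}
}

func main() {
	child := Person{Family: "Doe", Given: "Sam"}
	parent := Person{Family: "Doe", Given: "Pat"}
	s := policySubscriber(child, &parent, "child")
	fmt.Printf("subscriber %s %s (patient is %s)\n", s.Person.Given, s.Person.Family, s.Relationship)
}
```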

Pathways — what segments the traffic carries


A pathway is a YAML patient journey. The live distribution spans 24 active pathways across three files, with percentage_of_patients summing to exactly 100 so the distribution manager runs them all:

  • gh_pathways.yml — the original six (ED chest pain, AKI dialysis, surgical ICU, peds fever, maternity, onc infusion), now rebalanced to a combined 38%.
  • gh_clinical_pathways.yml — 13 service-line journeys (49%): NSTEMI, CKD exacerbation, neutropenic fever, upper-GI bleed, COPD, ischemic stroke, hip fracture, urosepsis, DKA, preeclampsia, bronchiolitis, suicidal-ideation hold, anaphylaxis.
  • gh_fax_pathways.yml — 5 fax-workflow journeys (13%); see below.

Two more files stay at 0% (run on demand, not in the live mix): the gh_negative_pathways.yml malformed-message set and gh_fhir_demo.yml.
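The 100% invariant is easy to check mechanically. This sketch sums the per-file weights quoted above and shows a generic cumulative-weight draw. It is not Simhospital’s distribution manager. Grouping weights by file is also a simplification, because percentage_of_patients actually lives on individual pathways.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// fileWeights groups the live distribution by file. In the real config the
// percentage_of_patients values sit on individual pathways; summing them per
// file here is a simplification.
var fileWeights = map[string]float64{
	"gh_pathways.yml":          38,
	"gh_clinical_pathways.yml": 49,
	"gh_fax_pathways.yml":      13,
	"gh_negative_pathways.yml": 0, // on demand only
	"gh_fhir_demo.yml":         0, // on demand only
}

// total sums every weight; the invariant to check is total == 100.
func total(w map[string]float64) float64 {
	sum := 0.0
	for _, v := range w {
		sum += v
	}
	return sum
}

// pick maps a roll in [0, total) onto a key by walking cumulative weights.
// Keys are sorted so a given roll always lands on the same key; zero-weight
// keys add nothing to the running sum and so are never chosen.
func pick(w map[string]float64, roll float64) string {
	keys := make([]string, 0, len(w))
	for k := range w {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	cum := 0.0
	for _, k := range keys {
		cum += w[k]
		if roll < cum {
			return k
		}
	}
	return "" // roll was outside [0, total)
}

func main() {
	if t := total(fileWeights); t != 100 {
		fmt.Printf("weights sum to %g, not 100\n", t)
		return
	}
	fmt.Println(pick(fileWeights, rand.Float64()*total(fileWeights)))
}
```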

The enrichment that fills out each chart is entirely pathway-driven and needs no code:

| Want this segment | Add this to the pathway |
|---|---|
| AL1 (allergy) | an allergies: list on an admission/registration step → rides on the ADT |
| DG1 (diagnosis) | a diagnoses: list on an update_person step → emits an ADT^A08 |
| PR1 (procedure) | a procedures: list on an update_person step → emits an ADT^A08 |
| OBX lab trends | historical_data (backdated results) + result steps |
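Because the enrichment is data-driven, the quickest check is to look at which segments actually arrive. Here is a minimal sketch that tallies segment IDs in one HL7 v2 message and reports which expected ones are missing. The sample ADT^A08 is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentCounts tallies segment IDs in an HL7 v2 message. Segments are
// CR-separated on the wire; LF and CRLF are tolerated for hand-written samples.
func segmentCounts(msg string) map[string]int {
	msg = strings.ReplaceAll(msg, "\r\n", "\r")
	msg = strings.ReplaceAll(msg, "\n", "\r")
	counts := map[string]int{}
	for _, seg := range strings.Split(msg, "\r") {
		if len(seg) >= 3 {
			counts[seg[:3]]++
		}
	}
	return counts
}

// missingSegments lists which of the wanted segment IDs never appear.
func missingSegments(msg string, want ...string) []string {
	counts := segmentCounts(msg)
	var missing []string
	for _, id := range want {
		if counts[id] == 0 {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	// A made-up ADT^A08 carrying a diagnosis but no procedure.
	a08 := "MSH|^~\\&|SIMHOSP|SFAC|OIE|GH|20240101120000||ADT^A08|1|T|2.3\r" +
		"PID|1||MRN0001^^^GH^MR\r" +
		"PV1|1|I|ICU\r" +
		"DG1|1||I21.4^NSTEMI^I10"
	fmt.Println(segmentCounts(a08))
	fmt.Println("missing:", missingSegments(a08, "DG1", "PR1"))
}
```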

gh_fax_pathways.yml models the clinical episodes that, in a real hospital, generate fax traffic; these are useful for exercising the Print & Fax emulation (faxart). Simhospital emits HL7, not faxes, so each pathway models the episode together with the fax artifact that episode would produce. Notably, the clinical_note step in this build lands in OpenEMR as a clearly named procedure order, so the fax artifact is visible in the chart:

| Pathway | Episode | Fax artifact (shows in OpenEMR as) |
|---|---|---|
| fax_referral_cardiology | registration + AFib diagnosis | Referral Letter |
| fax_discharge_summary | admit → pneumonia → discharge | (discharge summary to PCP) |
| fax_results_to_referrer | order + results | Laboratory Report |
| fax_prior_auth | OA diagnosis + knee replacement | Prior Authorization Request |
| fax_pharmacy_script | bronchitis + prescription | Prescription |

So a faxart demo can point at a recognizable, named order on a patient that arrived as synthetic HL7 traffic, rather than a hand-made document.

Gotcha 1 — let Simhospital compute lab values


Simhospital semantically validates results: a specified abnormal_flag must match the flag computed from the value and its reference range. Hand-picking a value and a flag that disagree (or omitting the flag on an out-of-range value) is a fatal error that crash-loops the container. The YAML lints fine; only a live run catches it.

The robust fix, used here: drop explicit results: value lists and keep just the order_profile. Simhospital then generates in-spec values and derives the abnormal flags itself — still realistic, always valid.
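The consistency rule is why hand-picked values are fragile. Here is a minimal sketch of such a check. It assumes simple HL7 OBX-8 style flags (L, H, or empty for normal) against a single low/high range. Simhospital’s own flag vocabulary and validation code may differ, and the potassium range is illustrative.

```go
package main

import "fmt"

// abnormalFlag derives an OBX-8 style flag from a value and its reference
// range: "L" below the range, "H" above it, "" inside it.
func abnormalFlag(value, low, high float64) string {
	switch {
	case value < low:
		return "L"
	case value > high:
		return "H"
	default:
		return ""
	}
}

// checkFlag applies a consistency rule like the one described above: a
// declared flag must equal the one computed from the value and the range.
func checkFlag(value, low, high float64, declared string) error {
	if want := abnormalFlag(value, low, high); want != declared {
		return fmt.Errorf("value %g in range %g-%g computes flag %q, but %q was declared",
			value, low, high, want, declared)
	}
	return nil
}

func main() {
	// Hypothetical potassium result against a 3.5-5.1 reference range.
	fmt.Println(checkFlag(6.2, 3.5, 5.1, "H")) // flag agrees with the value
	fmt.Println(checkFlag(6.2, 3.5, 5.1, ""))  // flag omitted on an out-of-range value
}
```

Letting the generator derive the flag, as the fix above does, removes the second value from the comparison entirely, so there is nothing left to disagree.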

Pathway delay steps run on wall-clock time, not accelerated simulated time: a transfer scheduled “2h later” actually fires two hours later. The practical consequences:

  • A short burst only produces the early events (admits, and the allergies that ride on them). Diagnoses (A08) and discharges (A03) arrive much later.
  • To verify the late events quickly, send crafted MLLP messages directly, or use a no-delay pathway in deterministic mode.
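Sending a crafted message only needs MLLP framing: a vertical tab (0x0B) before the message, and a file separator plus carriage return (0x1C 0x0D) after it. Here is a minimal Go sketch. The sample message is made up. Whether oie:6661 is reachable from where you run it (inside the compose network versus the host) is an assumption about your setup.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"net"
	"time"
)

const (
	startBlock = 0x0b // VT
	endBlock   = 0x1c // FS
	carriageRt = 0x0d // CR
)

// frame wraps one HL7 message in MLLP: <VT> message <FS><CR>.
func frame(msg string) []byte {
	b := make([]byte, 0, len(msg)+3)
	b = append(b, startBlock)
	b = append(b, msg...)
	return append(b, endBlock, carriageRt)
}

// unframe strips the MLLP envelope from one received block.
func unframe(b []byte) (string, error) {
	if len(b) < 3 || b[0] != startBlock || !bytes.HasSuffix(b, []byte{endBlock, carriageRt}) {
		return "", errors.New("not an MLLP frame")
	}
	return string(b[1 : len(b)-2]), nil
}

// send writes one framed message and blocks until the framed reply arrives.
func send(addr, msg string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(10 * time.Second)); err != nil {
		return "", err
	}
	if _, err := conn.Write(frame(msg)); err != nil {
		return "", err
	}
	// Read up to and including the FS byte, then the trailing CR.
	r := bufio.NewReader(conn)
	block, err := r.ReadBytes(endBlock)
	if err != nil {
		return "", err
	}
	cr, err := r.ReadByte()
	if err != nil {
		return "", err
	}
	return unframe(append(block, cr))
}

func main() {
	msg := "MSH|^~\\&|TEST|LAB|OIE|GH|20240101120000||ADT^A08|T1|T|2.3\rPID|1||MRN0001^^^GH^MR"
	fmt.Printf("%q\n", frame(msg))
	// To deliver it (the address is an assumption about your network):
	//   ack, err := send("oie:6661", msg)
}
```

Because send waits for the framed reply, it also works as a quick probe of whether the engine is ACKing at all.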

The compose command sends HL7 to the engine over MLLP:

```yaml
- -output=${SIM_OUTPUT:-mllp}
- -mllp_destination=${MLLP_DEST:-oie:6661}
- -pathways_per_hour=${PATHWAYS_PER_HOUR:-120}
```

SIM_OUTPUT=stdout falls back to logging on the bus-free path. To seed a batch, raise PATHWAYS_PER_HOUR (e.g. to 2500), run make restart, let it run, then dial it back to ~120 for a living trickle. The MLLP sender blocks on each ACK, so the engine’s auto-ACK must be working or throughput collapses; see the gh-integration notes on the mandatory responseGenerationProperties.
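The sender is waiting for an ordinary HL7 v2 accept acknowledgment whose MSA-2 echoes the inbound MSH-10 (message control ID). The sketch below builds a minimal one so you can recognise a healthy reply on the wire. It is not how the engine generates its auto-ACK, which is configured in the engine itself. Some field choices are illustrative, such as prefixing the ACK’s own control ID with ACK.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildACK returns a minimal HL7 v2 accept ACK for an inbound message: the
// sender and receiver fields are swapped and MSA-2 echoes MSH-10 (message
// control ID) so the sender can match the ACK to what it sent.
func buildACK(inbound, timestamp string) (string, error) {
	msh := strings.SplitN(inbound, "\r", 2)[0]
	f := strings.Split(msh, "|")
	// f[0] is "MSH" and f[1] is MSH-2, so MSH-n sits at f[n-1].
	if len(f) < 12 || f[0] != "MSH" {
		return "", errors.New("first segment is not a complete MSH")
	}
	header := strings.Join([]string{
		"MSH", f[1],
		f[4], f[5], // ACK MSH-3/4: the original receiving app/facility
		f[2], f[3], // ACK MSH-5/6: the original sending app/facility
		timestamp, "", "ACK", "ACK" + f[9], f[10], f[11],
	}, "|")
	return header + "\rMSA|AA|" + f[9], nil
}

func main() {
	in := "MSH|^~\\&|SIMHOSP|SFAC|OIE|GH|20240101||ADT^A01|42|T|2.3\rPID|1"
	ack, err := buildACK(in, "20240101120000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", ack)
}
```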

Simhospital can emit FHIR R4 alongside HL7. Two things were essential to learn:

  1. Resource generation is step-triggered. A pathway must end with a generate_resources: {} step or the FHIR writer is never invoked (it silently logs written: 0).
  2. Observations break the marshaller. With a result/lab step, the google/fhir JSON marshaller errors on the Observation’s Quantity value (JSONRawValue: invalid character ...). Upstream pins google/fhir@a54aa66 (~2020) via Bazel; the from-source build resolves the modern v0.7.4, whose Quantity marshalling differs. Pinning the old library would drag in incompatible 2020-era protobuf APIs and break the build.

So make fhir-sample runs a no-delay fhir_demo pathway (deterministic mode, no lab steps) and produces valid bundles of Patient, Encounter, AllergyIntolerance, Condition, Location, and Practitioner: everything except Observation.
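To confirm that a fhir-sample run produced what it should, it is enough to inventory the resourceType values in each Bundle. Here is a minimal sketch using only encoding/json. The sample bundle is made up. Whether every individual bundle carries all six types depends on the pathway.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// bundle decodes only the fields needed to inventory a FHIR R4 Bundle.
type bundle struct {
	ResourceType string `json:"resourceType"`
	Entry        []struct {
		Resource struct {
			ResourceType string `json:"resourceType"`
		} `json:"resource"`
	} `json:"entry"`
}

// resourceCounts tallies entry[].resource.resourceType in a Bundle.
func resourceCounts(raw []byte) (map[string]int, error) {
	var b bundle
	if err := json.Unmarshal(raw, &b); err != nil {
		return nil, err
	}
	if b.ResourceType != "Bundle" {
		return nil, fmt.Errorf("resourceType is %q, want Bundle", b.ResourceType)
	}
	counts := map[string]int{}
	for _, e := range b.Entry {
		counts[e.Resource.ResourceType]++
	}
	return counts, nil
}

// absent returns the wanted resource types that do not appear, sorted.
func absent(counts map[string]int, want ...string) []string {
	var out []string
	for _, t := range want {
		if counts[t] == 0 {
			out = append(out, t)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	raw := []byte(`{"resourceType":"Bundle","type":"collection","entry":[
		{"resource":{"resourceType":"Patient","id":"p1"}},
		{"resource":{"resourceType":"Encounter","id":"e1"}}]}`)
	counts, err := resourceCounts(raw)
	if err != nil {
		fmt.Println("bad bundle:", err)
		return
	}
	fmt.Println(counts)
	fmt.Println("absent:", absent(counts, "Patient", "Encounter", "AllergyIntolerance",
		"Condition", "Location", "Practitioner"))
	if counts["Observation"] > 0 {
		fmt.Println("unexpected Observation entries")
	}
}
```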

  • The built image bundles upstream’s malformed messages (InvalidNhsNum, InvalidOru_MissingPlacerAndFiller). A pathway emits one with a hardcoded_message step that selects by regex (there is no name: field). config/pathways/gh_negative_pathways.yml defines these at 0% so they never fire in the live mix — fire them on demand to exercise the engine’s error handling.
  • For repeatable demos, switch the manager with -pathway_manager_type=deterministic -pathway_names=<comma-list> so the named pathways run in a fixed order instead of the weighted distribution.
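The deterministic flags are just a comma-joined list. Regex selection has one practical trap: an unanchored pattern can match more than one bundled message. The sketch below builds the flags and models selection under a stated assumption, namely that the regex is matched against message names like InvalidNhsNum. Check the hardcoded_message step’s actual semantics before relying on it. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// deterministicArgs builds the two flags quoted above for a fixed-order run.
func deterministicArgs(names ...string) []string {
	return []string{
		"-pathway_manager_type=deterministic",
		"-pathway_names=" + strings.Join(names, ","),
	}
}

// matching is a hypothetical model of regex selection: it assumes the pattern
// is matched against bundled message names, which may not be exactly how
// Simhospital applies it.
func matching(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	fmt.Println(deterministicArgs("fax_referral_cardiology", "fax_prior_auth"))
	bundled := []string{"InvalidNhsNum", "InvalidOru_MissingPlacerAndFiller"}
	loose, _ := matching("Invalid", bundled)         // unanchored: matches both names
	exact, _ := matching("^InvalidNhsNum$", bundled) // anchored: matches one name
	fmt.Println(loose, exact)
}
```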

Simhospital feeds the integration plane, which writes into OpenEMR. To bring it up with the rest, see Run the clinical ecosystem.