Splitting a 2,000-Line automations.yaml into 8 Files

June 13, 2026

Table of Contents

Why one file stops scaling
A 40-line splitter that classifies by name
Eight buckets the data chose for itself
The validator is the actual hero
The one-line payoff

My automations.yaml had quietly grown into a 60 KB monster. 2,144 lines. 63 automations stacked end to end in a single file, averaging about 34 lines each. Scrolling it was a chore, editing it was nerve-racking, and finding the one heating automation I wanted meant a full-text search and a prayer. The file worked perfectly — that was the problem. There was no pressure to touch it, and every reason not to, because the moment you refactor a live config you risk silently dropping the automation that keeps the heating from freezing or the alarm from going half-deaf.

So this is a small story about a refactor I almost didn't do, and the one thing that made me trust it: I wrote a 40-line script to split the file, and then I wrote a second, longer script whose only job was to prove the first one hadn't lost anything.

Why one file stops scaling

Home Assistant wires automations in through one line in configuration.yaml: automation: !include automations.yaml. That single include is convenient right up until it isn't. With 63 automations in there, a few patterns had emerged on their own. Heating was by far the biggest group: 21 automations, a third of the whole file, driving 11 thermostats with day and night modes and room-by-room overrides. Alarm was the next slab at 13. Then came a cluster around the AC·THOR power diverter, a handful of camera and detection rules, time-based switching, system housekeeping, and a long tail of one-offs.

None of that structure was visible. It was just 2,144 lines in the order I'd happened to add each one. I wanted the heating rules in a heating file and the ability to open exactly the slice I was working on.

A 40-line splitter that classifies by name

The splitter is deliberately boring. It's about 40 lines of Python: PyYAML for parsing and dumping, plus os and collections.defaultdict from the standard library. It loads the YAML list, walks each automation, decides which bucket it belongs to, and dumps one file per bucket into an automations/ directory.

Here's the honest quirk: the classification isn't driven by any tag or metadata field, because there isn't one. It's substring matching on the automation's German alias, and the first rule that matches wins. If the alias contains "Heizung" it's heating; "Alarm" goes to alarm; "AC Thor" (written with a space or a hyphen) goes to ac_thor; camera and detection words go to cameras; push and notification words to notifications; the time-control vocabulary to time_control; sync and reachability words to system; and everything the rules don't catch falls through to a misc bucket. I'd rather say that plainly than dress it up as something smarter.

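import os
from collections import defaultdict

import yaml  # PyYAML: the one dependency outside the standard library

# safe_load assumes no Home Assistant-only tags (such as !secret) in this file
with open('automations.yaml', encoding='utf-8') as f:
    automations = yaml.safe_load(f) or []

categories = defaultdict(list)
for automation in automations:
    alias = automation.get('alias', 'Unknown')
    if 'Heizung' in alias:
        categories['heating'].append(automation)
    elif 'Alarm' in alias:
        categories['alarm'].append(automation)
    elif 'AC Thor' in alias or 'AC-Thor' in alias:
        categories['ac_thor'].append(automation)
    # ... cameras / notifications / time_control / system / misc
    else:
        categories['misc'].append(automation)

os.makedirs('automations', exist_ok=True)
for category, autos in categories.items():
    # Write UTF-8 explicitly so the umlauts don't depend on the platform default
    with open(f'automations/{category}.yaml', 'w', encoding='utf-8') as f:
        yaml.dump(autos, f, default_flow_style=False,
                  allow_unicode=True, sort_keys=False)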

Two of those dump arguments matter more than they look. allow_unicode=True keeps the German umlauts intact instead of mangling them into escape sequences, so a name like "Büro" survives the round trip. And sort_keys=False preserves the original key order inside each automation — the trigger, condition and action stay in the order I wrote them, instead of getting alphabetised into nonsense.
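Because the splitter is an if/elif chain, rule order is part of the classification: an alias containing both "Heizung" and "Alarm" lands in heating, and since Python's in operator is case-sensitive, a lowercase "heizung" falls through to misc. Here is a minimal sketch of the same first-match rule written as an ordered table. It assumes only the three rules shown in the splitter, because the other keyword lists aren't spelled out in this post; RULES and classify are hypothetical names.

```python
# Ordered (keywords, bucket) rules: the first match wins, like the if/elif chain.
# Only the three rules shown in the splitter are filled in; the cameras,
# notifications, time_control and system keywords are not listed in the post.
RULES = [
    (("Heizung",), "heating"),
    (("Alarm",), "alarm"),
    (("AC Thor", "AC-Thor"), "ac_thor"),
]


def classify(alias: str) -> str:
    """Return the first bucket whose keyword occurs in the alias, else 'misc'."""
    for keywords, bucket in RULES:
        # Plain substring test: case-sensitive, no tokenising
        if any(keyword in alias for keyword in keywords):
            return bucket
    return "misc"
```

Keeping the rules in one list makes precedence visible at a glance, and reordering two tuples becomes a deliberate, reviewable change rather than a reshuffle of elif branches.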

Eight buckets the data chose for itself

I didn't pick eight up front — I'd half-expected to hand-carve a dozen or so files by feel. The committed splitter emits eight category files: heating, alarm, ac_thor, cameras, notifications, time_control, system, and misc. Eight buckets is what the data wanted, not a number I imposed. The distribution is lopsided in a way that's genuinely useful to see once it's broken out: heating is 21 automations (33%), alarm is 13 (21%), AC·THOR is 5 (8%), and then it thins fast into pairs and singletons. By estimated size the heating file lands around 700 lines on its own, alarm around 440, cameras around 270, ac_thor around 170.
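Those percentages are each bucket's share of the 63 automations. Here is a small helper that turns the splitter's categories dict into rows sorted largest first. The name bucket_report is hypothetical, and the sketch assumes the dict maps bucket names to lists of automations, as the splitter builds it.

```python
def bucket_report(categories: dict[str, list]) -> list[tuple[str, int, int]]:
    """Return (bucket, count, rounded percent) rows, largest bucket first."""
    total = sum(len(autos) for autos in categories.values())
    if total == 0:
        return []
    rows = [
        (name, len(autos), round(100 * len(autos) / total))
        for name, autos in categories.items()
    ]
    # sorted() is stable even with reverse=True, so ties keep insertion order
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

With 63 automations in total, the heating, alarm and ac_thor buckets come out as ("heating", 21, 33), ("alarm", 13, 21) and ("ac_thor", 5, 8), which match the figures above.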

That imbalance is the real finding. A 700-line heating file controlling 11 thermostats with near-identical day/night/presence logic is begging to be templated into a handful of parameterised scripts — that's the obvious next refactor, and now that it's isolated I can actually see the shape of it. The AC·THOR file, by contrast, came out clean and self-contained: temperature monitoring, power management, and a bit of system detection, all in one place.

The validator is the actual hero

Splitting a live config is only as trustworthy as your proof that nothing fell out. So before I changed a single line of configuration.yaml, I wrote a second script — about 160 lines — whose entire purpose is to re-read the original file and every split file and assert the round trip is lossless. It runs five checks. One: the total automation count matches. Two: the set of unique aliases matches, with no automation missing and none invented. Three: any duplicate aliases are identical between original and split. Four: every automation id value matches as a set. Five: a per-alias deep dictionary comparison to catch silent content drift.

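# original / split: automation lists parsed from automations.yaml and from automations/*.yaml
original_aliases = [a.get('alias') for a in original]
split_aliases = [a.get('alias') for a in split]
original_count, split_count = len(original), len(split)
if original_count == split_count:
    print(f"COUNT CHECK: {split_count} automations (matches original)")

missing_in_split = set(original_aliases) - set(split_aliases)
extra_in_split   = set(split_aliases) - set(original_aliases)
# IDs compared as sets; per-alias deep compare flags content drift
original_alias_set = set(original_aliases)
original_by_alias = {a.get('alias'): a for a in original}
split_by_alias = {a.get('alias'): a for a in split}
content_mismatches = []
for alias in original_alias_set:
    if alias in split_by_alias and original_by_alias[alias] != split_by_alias[alias]:
        content_mismatches.append(alias)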

The detail I'm proudest of is the deliberate split between errors and warnings. A count mismatch, a missing alias, an extra alias, an id-set divergence — those are hard errors, and the script exits 1 and refuses to bless the new layout. But the per-automation deep comparison is treated as a warning, and it exits 0 anyway. That's a judgement call I made on purpose: when you re-serialize YAML through the library, the bytes reformat — quoting style shifts, some flow collapses to block — but the parsed structure is functionally identical. If I treated every reformat as a failure, the validator would scream on every clean split and I'd learn to ignore it. So a lost automation aborts the whole thing; a cosmetic re-quote just gets a note.
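Compressed into one function, the five checks and the error/warning policy look roughly like the following. This is not the 160-line script itself. It assumes both sides are already parsed into lists of dicts (for example with yaml.safe_load), the names validate and exit_code are hypothetical, and check three is read as "the same aliases are duplicated on both sides".

```python
from collections import Counter


def validate(original: list[dict], split: list[dict]) -> tuple[list[str], list[str]]:
    """Run the five checks on two parsed automation lists; return (errors, warnings)."""
    errors: list[str] = []
    warnings: list[str] = []

    # 1. Total automation count must match.
    if len(original) != len(split):
        errors.append(f"count: {len(original)} original vs {len(split)} split")

    # 2. Unique alias sets must match: nothing missing, nothing invented.
    orig_counts = Counter(a.get("alias") for a in original)
    split_counts = Counter(a.get("alias") for a in split)
    for alias in sorted(set(orig_counts) - set(split_counts), key=str):
        errors.append(f"missing alias: {alias}")
    for alias in sorted(set(split_counts) - set(orig_counts), key=str):
        errors.append(f"extra alias: {alias}")

    # 3. Whatever aliases are duplicated must be duplicated on both sides.
    orig_dupes = {a for a, n in orig_counts.items() if n > 1}
    split_dupes = {a for a, n in split_counts.items() if n > 1}
    if orig_dupes != split_dupes:
        errors.append(f"duplicate aliases differ: {sorted(orig_dupes ^ split_dupes, key=str)}")

    # 4. The set of automation ids must match.
    if {a.get("id") for a in original} != {a.get("id") for a in split}:
        errors.append("id sets differ")

    # 5. Per-alias deep comparison of the parsed dicts: a warning, not an error.
    orig_by_alias = {a.get("alias"): a for a in original}
    split_by_alias = {a.get("alias"): a for a in split}
    for alias, automation in orig_by_alias.items():
        if alias in split_by_alias and split_by_alias[alias] != automation:
            warnings.append(f"content differs: {alias}")

    return errors, warnings


def exit_code(errors: list[str], warnings: list[str]) -> int:
    """Print findings; only hard errors produce a non-zero exit code."""
    for message in warnings:
        print("WARNING:", message)
    for message in errors:
        print("ERROR:", message)
    return 1 if errors else 0
```

A script would end with sys.exit(exit_code(*validate(original, split))), so a lost automation fails the run while content drift only prints a warning.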

The one-line payoff

With the validator green, the cutover is almost anticlimactic. You swap one line in configuration.yaml — !include automations.yaml becomes !include_dir_merge_list automations/ — and Home Assistant now merges every file in that directory into one automation list, exactly as if they were still concatenated.

Then you reload without a full restart: run ha core check to validate the config first, then call the automation.reload service, which re-reads every automation without restarting Home Assistant. For me that's a small POST to the REST API at something like http://homeassistant.local:8123, with no reboot and no downtime. The automations come back exactly as they were, just spread across eight readable files instead of one 2,144-line wall.
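If you'd rather script the check-then-reload step, here is a minimal sketch using only the standard library. It assumes a long-lived access token, the host from this post, and two REST endpoints: POST /api/config/core/check_config (which relies on the config integration, normally loaded via default_config) and POST /api/services/automation/reload. The names ha_request and check_then_reload are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://homeassistant.local:8123"  # host from this post; adjust to yours


def ha_request(path: str, token: str) -> urllib.request.Request:
    """Build an authenticated POST to the Home Assistant REST API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=b"{}",  # empty JSON body: neither call needs parameters
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def check_then_reload(token: str) -> None:
    """Validate the configuration, and only reload automations if it is valid."""
    with urllib.request.urlopen(ha_request("/api/config/core/check_config", token)) as resp:
        result = json.load(resp)
    if result.get("result") != "valid":
        raise RuntimeError(f"config check failed: {result.get('errors')}")
    # Reload automations in place; Home Assistant itself keeps running
    with urllib.request.urlopen(ha_request("/api/services/automation/reload", token)) as resp:
        resp.read()
```

Call it as check_then_reload(token). If the check returns anything other than "valid", the reload never runs.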

If you take one habit from this, make it a naming convention: prefix new aliases with their category, like "Heating: <room> night mode", and make sure the splitter has a rule for every prefix you adopt (the current one matches "Heizung", not "Heating"), so the substring classifier stays deterministic as the config grows. The splitter only stays honest if the names do. The rest of how this house runs on Home Assistant, from the Docker-on-Azure install on up, is built on exactly this kind of small, defensible step, and the next one is finally templating that 700-line heating file.
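A convention only holds if something enforces it. Here is a minimal lint sketch that flags aliases not following the "<prefix>: ..." pattern. It assumes a hypothetical KNOWN_PREFIXES list, seeded with the keywords the splitter is known to match.

```python
# Hypothetical prefix list: the keywords the splitter is known to match.
# Extend it with your other buckets and keep it in sync with the splitter rules.
KNOWN_PREFIXES = ("Heizung", "Alarm", "AC Thor", "AC-Thor")


def aliases_breaking_convention(aliases: list[str]) -> list[str]:
    """Return aliases that do not start with '<known prefix>:'."""
    return [
        alias for alias in aliases
        if not any(alias.startswith(f"{prefix}:") for prefix in KNOWN_PREFIXES)
    ]
```

Run it over the aliases before each split and treat a non-empty result as a failure. As written, it also flags "Heating: ..." until that prefix is added to both the list and the splitter.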