Coding Standards

CRITICAL RULES (NEVER VIOLATE)

  1. NEVER swallow exceptions - Always handle specific exceptions or crash loudly
  2. NEVER use string concatenation for structured data (URLs, SQL, HTML, JSON)
  3. NEVER use getattr/setattr unless literally no alternative exists
  4. ALWAYS fail fast - Crash immediately on unexpected state

Repository Instructions

If the repository has a README.md, read it and refer to it. If there is CLAUDE.md or CODEX.md, read it and follow it.

References Folder (if present)

If you see a references/ folder: - DO NOT edit the reference files inside - Look for references/fetch.sh which fetches/updates references - Feel free to add to fetch.sh for new references

Example fetch.sh might include:

# Fetch API docs and convert to markdown
curl -L https://example.com/api-docs.html | pandoc -f html -t markdown > api.md

# Clone specific files
curl -O https://raw.githubusercontent.com/user/repo/main/implementation.py

Only fetch.sh is version controlled. Fetched files are gitignored.

Internet use OK

Feel free to fire HTTP queries for testing, fetching documentation, source code for reference, etc. Especially to add to the references/ folder.

If useful for testing etc., just fire them right away without asking. Also start servers, experiment, etc.

One-off Scripts

For temporary/experimental scripts, make their throwaway nature obvious:

Wrong: test_api.py in repo root Right: throwaway/2024-01-15/test_api.py with header # THROWAWAY SCRIPT - DO NOT REUSE

General across languages

Code Brevity

Minimize code length aggressively. Prefer: - One-liners over multi-line when readable - List/dict comprehensions over loops - Ternary operators over if/else blocks - Built-in functions over manual implementations

This is more important than some traditional "clean code" rules.

# Wrong - unnecessary loop:
operands = []
for op_id in operand_ids:
    if expr := _parse_single_component(store, op_id):
        operands.append(expr)

# Right - list comprehension:
operands = [expr for op_id in operand_ids if (expr := _parse_single_component(store, op_id))]

No Trailing Whitespace

Remove all trailing whitespace. Empty lines should be truly empty.

DRY (Don't Repeat Yourself)

Be aggressive about eliminating repetition. The longer the repeated pattern, the more important to refactor it. Use whatever abstraction fits: loops, functions, decorators, context managers, etc.

Example using loops:

# Wrong:
if category is not None:
    habit_data["category"] = category
if goal_type is not None:
    habit_data["goal_type"] = goal_type
if target_value is not None:
    habit_data["target_value"] = target_value

# Right:
for key, value in {
    "category": category,
    "goal_type": goal_type,
    "target_value": target_value,
}.items():
    if value is not None:
        habit_data[key] = value

Example using mappings:

# Wrong - repetitive if/elif:
if operator_id == AND_OPERATOR_ID:
    return _parse_boolean_expression(store, "AND", node.children[1:])
elif operator_id == OR_OPERATOR_ID:
    return _parse_boolean_expression(store, "OR", node.children[1:])
elif operator_id == NOT_OPERATOR_ID:
    return _parse_boolean_expression(store, "NOT", node.children[1:])

# Right - use mapping:
OPERATORS = {AND_OPERATOR_ID: "AND", OR_OPERATOR_ID: "OR", NOT_OPERATOR_ID: "NOT"}
if operator_id in OPERATORS:
    return _parse_boolean_expression(store, OPERATORS[operator_id], node.children[1:])

Particular case: No redundant special cases for empty structures

Do not implement redundant special cases for empty lists/dicts/structures if they do not change behavior.

Wrong (function formats a list as <1 2 3>):

def format_numbers(xs: list[int]):
    if not xs:      # <-- BAD: redundant special case
        return '<>'  # Same result as general case below

    result = '<'
    for i, n in enumerate(xs):
        if i > 0:
            result += ' '
        result += str(n)
    result += '>'
    return result

The special case if not xs is redundant because the loop naturally handles empty lists, producing the same <> output.

CORRECTED:

def format_numbers(xs: list[int]):
    result = '<'
    for i, n in enumerate(xs):
        if i > 0:
            result += ' '
        result += str(n)
    result += '>'
    return result

Exception Handling

FORBIDDEN:

try:
    risky_operation()
except Exception:  # NEVER do this
    pass  # ABSOLUTELY FORBIDDEN

Wrong:

try:
    risky_operation()
except Exception as e:  # Too broad
    logger.error(f"Something went wrong: {e}")

Right:

try:
    risky_operation()
except (ValueError, KeyError) as e:  # Specific exceptions
    logger.error(f"Data validation failed: {e}")
    raise  # Re-raise or handle appropriately

Only catch Exception at the very outer boundary (e.g., request handlers) and ALWAYS log it.

Early bail-out

Use early bail-out pattern. Including making functions to enable using it when it makes things nicer.

Wrong:

def process_data(data):
    if data is not None and len(data) > 0:
        validate_data(data)
        transformed = transform_data(data)
        result = analyze_data(transformed)
        save_results(result)
        return result
    else:
        logger.error("No data provided")
        raise ValueError("Data cannot be empty")

Right:

def process_data(data):
    if not data:  # Early bail-out
        logger.error("No data provided")
        raise ValueError("Data cannot be empty")

    validate_data(data)
    transformed = transform_data(data)
    result = analyze_data(transformed)
    save_results(result)
    return result

DO NOT do:

async def _handle_interfaces_removed(self, path: str, interfaces: list[str]) -> None:
    """Handle interfaces being removed (e.g., adapter disappearing)."""
    if path == self._adapter_path and "org.bluez.Adapter1" in interfaces:
        logger.warning(f"Bluetooth adapter removed: {path}")
        # Clean up adapter
        if self._adapter_properties_iface:
            self._adapter_properties_iface.off_properties_changed(self._handle_adapter_properties_changed)
        self._adapter_path = None
        # ... bunch more code in this branch, nothing outside it ...

Instead, DO:

async def _handle_interfaces_removed(self, path: str, interfaces: list[str]) -> None:
    """Handle interfaces being removed (e.g., adapter disappearing)."""
    if path != self._adapter_path or "org.bluez.Adapter1" not in interfaces:
        return  # Early bail-out if not the adapter we're interested in
    logger.warning(f"Bluetooth adapter removed: {path}")
    # Clean up adapter
    if self._adapter_properties_iface:
        self._adapter_properties_iface.off_properties_changed(self._handle_adapter_properties_changed)
    self._adapter_path = None
    ...

This just saved us an indentation level. This can be especially nice in helper functions.

Document Current State Only

No historical comments like # This used to work this way but we changed it. Don't keep broken code "for backward compatibility". It was broken. Delete it.

Avoid redundant docstrings:

# Wrong - docstring just repeats what's obvious from signature:
def _parse_boolean_expression(store: NodeStore, operator: str, operand_ids: list[NodeId]) -> BooleanSearch | None:
    """
    Parse a boolean expression with the given operator and operands.

    Args:
        store: The NodeStore
        operator: The boolean operator ("AND", "OR", "NOT")
        operand_ids: List of operand node IDs

    Returns:
        The parsed boolean expression or None
    """
    operands = [expr for op_id in operand_ids if (expr := _parse_single_component(store, op_id))]
    return BooleanSearch(operator, operands) if operands else None

# Right - no docstring, or only document non-obvious behavior:
def _parse_boolean_expression(store: NodeStore, operator: str, operand_ids: list[NodeId]) -> BooleanSearch | None:
    operands = [expr for op_id in operand_ids if (expr := _parse_single_component(store, op_id))]
    return BooleanSearch(operator, operands) if operands else None

DO NOT assemble non-plaintext by string concatenation (e.g., URL parameters)

Do not assemble URLs with plain string concat, e.g. "&".join([f"{k}={v}" for k, v in params.items()]). Use proper libraries:

Wrong (various languages):

# Python
url = f"https://api.example.com/search?q={query}&limit={limit}"  # BAD: no escaping
html = f"<div title='{title}'>{content}</div>"  # BAD: manual string concat
html = f'<p class="{html.escape(css_class)}">'  # STILL BAD: manual string concat
sql = f"SELECT * FROM users WHERE name = '{username}'"  # BAD: SQL injection
// JavaScript
const url = `https://api.example.com/search?q=${query}&limit=${limit}`;  // BAD
const html = `<div title="${title}">${content}</div>`;  // BAD
const sql = `SELECT * FROM users WHERE id = ${userId}`;  // BAD
# Bash
URL="https://api.example.com/search?q=$QUERY"  # BAD
SQL="SELECT * FROM users WHERE name = '$NAME'"  # BAD

Right:

# URLs: Use requests (preferred) or urllib
response = requests.get("https://api.example.com/search", params={"q": query, "limit": limit})

# HTML: Use template engines or proper HTML builders
from jinja2 import Template
template = Template("<div title='{{ title }}'>{{ content }}</div>")
html = template.render(title=title, content=content)

# SQL: Use parameterized queries
cursor.execute("SELECT * FROM users WHERE name = %s", [username])

# JSON: Use json module
data = json.dumps({"name": name, "value": value})

This applies to ANY structured format. If it has special characters or escaping rules, use a library.

CLI and Shell Tools

Examples of tools you can use without asking: rg, jq, tree, ag. Feel free to use any standard development tools.

Avoid One-off Variables

Don't create variables used only once:

# Wrong:
data = [update.dict() for update in updates]
await self._post_webhook({"type": "update", "data": data})

# Right:
await self._post_webhook({
    "type": "update",
    "data": [update.dict() for update in updates]
})

Self-describing Variable Names

Include units/formats in names:

# Wrong:
timeout: int
devices: list[str]

# Right:
timeout_secs: int
device_macs: list[str]

# Better (type encodes unit):
timeout: datetime.timedelta

Python

Code Style Philosophy

Optimize for brevity and minimal cognitive load. Fewer lines, fewer characters, less to hold in working memory.

Formatting

  1. Check for .pre-commit-config.yaml - use whatever formatter is configured there
  2. If no pre-commit, use black
  3. Remove unused imports before finishing

Core Rules

Use Modern Python

# Type hints - ALWAYS use new syntax
str | None                  # NOT Optional[str]
list[int]                   # NOT List[int]

# Features to use aggressively
f"{var=}"                   # Shows var='value'
text.removeprefix("pre_")   # NOT text[4:]
dict1 | dict2              # Merge dicts
if (n := len(items)) > 10:  # Walrus operator
match status:               # Pattern matching
    case "ok": return True
    case _: raise ValueError(f"Unknown {status=}")

# Use enums for fixed string sets
from enum import Enum
class Operator(Enum):
    AND = "AND"
    OR = "OR"
    NOT = "NOT"

# Wrong - stringly typed:
operator: str  # "AND", "OR", "NOT"

# Right - use enum:
operator: Operator

Self-referencing Types

Use typing.Self or from __future__ import annotations:

class X:
    def foo(self) -> Self:  # NOT -> "X"
        return self

Walrus Operator

Use := to combine assignment and test:

# Wrong:
missing = configured - available_interfaces
if missing:
    logger.warning(f"Interfaces not found: {missing}")

# Also wrong:
expr = _parse_single_component(store, op_id)
if expr:
    operands.append(expr)

# Right:
if missing := configured - available_interfaces:
    logger.warning(f"Interfaces not found: {missing}")

if expr := _parse_single_component(store, op_id):
    operands.append(expr)

Code Patterns

NEVER use hasattr / getattr / setattr

ABSOLUTELY FORBIDDEN when you control the code:

# WRONG - I HATE THIS:
if hasattr(piece, 'get_display_name'):
    return f"Temperature {piece.get_display_name()}"
return f"Temperature {piece.hardware_id}"

Right:

return f"Temperature {piece.get_display_name()}"  # You KNOW it exists

HTML Templating

As soon as you start doing nontrivial html operations/concatting, switch from manual html stitching to jinja2 or other templating engine that contextually makes sense.

BAD: already WAY TOO COMPLEX for manual html stitching - AND prone to escaping issues:

menu_html = '<nav class="menu">\n'
for page_id, page_title in menu_items:
    url = "/" if page_id == "index" else f"/{page_id}"
    active_class = ' class="active"' if page_id == active_page else ""
    menu_html += f'    <a href="{url}"{active_class}>{page_title}</a>\n'
menu_html += '</nav>\n'
html = f"""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>{title}</title></head>
<body>{content}</body>
</html>"""

This should have switched to jinja2 about 10 minutes ago already.

Logging

Inside exception handlers, logger methods automatically include exception info:

Wrong:

try:
    risky_operation()
except ValueError as e:
    logger.error(f"Operation failed: {e}")  # BAD: duplicates exception info

Right:

try:
    risky_operation()
except ValueError:
    logger.error("Operation failed")  # Good: exception details auto-included

Testing

Test files should be located in the same directory as the module they're testing, with the name pattern test_*.py.

When writing unit tests, make them be pytest tests, NOT executable files with __main__ section.

PyHamcrest

Use pyhamcrest when testing sensor collections or complex matching scenarios. For example:

# Instead of multiple `.next()` and assert calls:
assert_that(sensors, has_items(
    has_properties(unique_id="battery_level", state=50.0, icon="mdi:battery-50"),
    has_properties(unique_id="battery_state", state="discharging", icon="mdi:battery-minus"),
    has_properties(unique_id="battery_power", state=-10.0, unit_of_measurement="W", device_class=DeviceClass.POWER),
    has_properties(unique_id="battery_time_to_empty", state=3600, unit_of_measurement="s", device_class=DeviceClass.DURATION),
    has_properties(unique_id="battery_time_to_full", state=7200, unit_of_measurement="s", device_class=DeviceClass.DURATION),
))

This approach provides more readable and concise assertions, making it easier to verify complex object collections.

When looking for whether a sequence contains one element which meets some properties, use has_item.

DO NOT do:

xs = [x for x in capture_updates if x.unique_id == "bluetooth_enabled"]
assert any(x.state == True and x.icon == "mdi:bluetooth" for x in xs)

ALSO DO NOT DO:

from hamcrest import assert_that, has_items, has_properties
assert_that(
    capture_updates,
    has_items(
        has_properties(
            unique_id="bluetooth_enabled",
            state=True,
            icon="mdi:bluetooth"
        )
    )
)

Instead, DO do this:

from hamcrest import assert_that, has_item, has_properties
assert_that(
    capture_updates,
    has_item(
        has_properties(
            unique_id="bluetooth_enabled",
            state=True,
            icon="mdi:bluetooth"
        )
    )
)

When to use PyHamcrest vs standard assertions

Use standard Python assertions for basic checks that don't benefit from Hamcrest's matchers:

# Use standard assertions when Hamcrest doesn't add value:
assert value == 200
assert user.name == "John"
assert foo is True
assert not bar
assert len(items) > 0

Use pyhamcrest when it makes the assertion more clear, expressive, or when you're doing complex checks:

# Use Hamcrest for these cases:
# String content checking
assert_that(text, contains_string("success"))

# Dictionary content validation
assert_that(data, has_entries(status="ok", count=greater_than(0)))

# Multiple conditions
assert_that(
    response.text,
    all_of(
        contains_string("success"),
        contains_string("data")
    )
)

Access properties directly when using Hamcrest instead of using has_property when it doesn't add value:

WRONG - unnecessarily verbose:

assert_that(user, has_property("name", contains_string("John")))

RIGHT - clearer and more direct:

assert_that(user.name, contains_string("John"))

The rule of thumb is: if you're just doing a single test on an object and it's a basic equality/truthiness check, use standard assertions. Use Hamcrest when you need its matchers to simplify complex assertions.

If you notice you'd like to test your changes (which is of course highly encouraged), rather than writing one-off blobs of throwaway Python, feel free to suggest creating a new actual test file.

Handling Unhandled Cases

ALWAYS handle the else case in switches/type checks. Crash on unexpected inputs.

Actively check that the program stays within understood guardrails. As soon as something unexpected happens → CRASH.

match msg:
    case SystemMessage(): return {"role": "system", "content": msg.content}
    case UserMessage(): return {"role": "user", "content": msg.content}
    case AssistantMessage(): return {"role": "assistant", "content": msg.content}
    case _: raise TypeError(f"Unexpected message type: {type(msg)}")

Sometimes let natural exceptions serve as crashes:

# If this should NEVER fail (you control all callers):
operator_map = {AND_OPERATOR_ID: "AND", OR_OPERATOR_ID: "OR", NOT_OPERATOR_ID: "NOT"}
return _parse_boolean_expression(store, operator_map[operator_id], node.children[1:])
# KeyError here means a programming error - let it crash

# But if it's user input or external data, be explicit:
if operator_id not in operator_map:
    raise ValueError(f"Invalid operator: {operator_id}")

Sentinel Objects

Using None as a default is fine when it means "nothing special" or "use default behavior":

def format_data(data: str, formatter: Formatter | None = None):
    if formatter is None:
        return data  # No formatting, just return as-is
    return formatter.format(data)

Use sentinel objects when there's a semantic difference between passing None and not passing anything:

# Example: JSON API where {"key": null} differs from {} (no key)
_UNSET = object()  # Sentinel value

def update_json_api(endpoint: str, key: str, value: Any = _UNSET):
    payload = {}
    if value is not _UNSET:
        # This handles both None and actual values
        payload[key] = value  # {"key": null} if value is None
    # If value is _UNSET, key is omitted entirely: {}
    return requests.post(endpoint, json=payload)

Re-exporting Modules

Do not create new __init__.py files that re-export things from sibling/child modules, i.e. __all__ == ["ThingFromSubmoduleA", "ThingFromSubmoduleB", ...]

If you find yourself in a codebase that already has a well-established file like that, it's OK to continue using and adding to it.

But DO NOT create such a file yourself without my explicit permission.

Final Rule

When in doubt, CRASH. Better to fail loudly than silently corrupt state.