The Language for Orchestration Logic.

Heddle is a strictly typed, domain-specific language (DSL) built to eliminate the maintainability problems endemic to traditional data orchestration and microservices.

Designed for Backend Developers, Data Engineers, and Data Scientists, Heddle bridges the gap between the rigorous safety of functional data pipelines and the pragmatic utility of imperative Python code.

Heddle introduces a Host-Core Architecture to give you the best of both worlds:

  1. The Functional Core (Heddle DSL): A clean, declarative workflow pipeline where data flows immutably.
  2. The Imperative Host (Python): Atomic, reusable "Steps" where you can leverage the Python ecosystem.

By separating the control flow from the computation, Heddle maximizes developer productivity and fosters a highly modular, reusable codebase.

Example
// onboarding.he

import "fhub/postgres" pg
import "fhub/email" mail
import "local/transform" tf

// 1. Define Strict Schemas
schema User {
  id: int
  username: string
  email: string
  signup_date: timestamp
  is_active: bool
}

// 2. Configure Persistent Resources
resource PrimaryDB = pg.connection {
  host: "db.internal"
  port: 5432
}

resource Mailgun = mail.provider {
  api_key: "sk-live-..."
}

// 3. Define Reusable Steps (FFI to Python)
step fetch_active_users: void -> User = pg.query <connection=PrimaryDB> {
  query: "SELECT * FROM users WHERE status = 'active'"
}

step enrich_user_data: User -> User = tf.enrich_geo_data

step send_welcome: User -> void = mail.send_template <provider=Mailgun> {
  template_id: "welcome_v2"
}

// 4. Orchestrate Logic Safely
workflow UserOnboarding {
  fetch_active_users
    | enrich_user_data
    | send_welcome
}
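The `|` operator composes steps into a typed, left-to-right pipeline. As a rough mental model (not the actual runtime, and with stand-in step bodies), the workflow above behaves like function composition in plain Python:

```python
# Hypothetical sketch: Heddle's pipe operator as left-to-right composition.
# The step names mirror the workflow above; the bodies are illustrative stand-ins.
from functools import reduce

def fetch_active_users(_):
    # Stand-in for pg.query: returns a batch of user rows.
    return [{"id": 1, "username": "ada", "email": "ada@example.com"}]

def enrich_user_data(users):
    # Stand-in for tf.enrich_geo_data: adds a derived field per row.
    return [{**u, "region": "eu"} for u in users]

def send_welcome(users):
    # Stand-in for mail.send_template: records which addresses were mailed.
    return [u["email"] for u in users]

def pipeline(*steps):
    # `a | b | c` in Heddle corresponds to c(b(a(x))).
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

user_onboarding = pipeline(fetch_active_users, enrich_user_data, send_welcome)
```

Unlike this sketch, the real compiler checks that each step's output type matches the next step's input type before anything runs.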

Seamless Python Integration & Radical Reuse

Heddle brings Simplicity & Order by strictly separating declarative orchestration from imperative execution. Atomic, reusable steps encapsulate your side effects and logic within the Python Host Environment, allowing seamless reuse of data science and engineering functions. Workflows then define a strict, typed computational graph.

Example
# local/transform.py
from typing import TypedDict
from heddle.core import Table

class GeocodingConfig(TypedDict):
    # Add configuration properties bound from Heddle
    pass

def enrich_geo_data(config: GeocodingConfig, input_table: Table) -> Table:
    # input_table is a zero-copy Apache Arrow table!
    # Perform your pandas, polars, or pyarrow operations here
    arrow_table = input_table.arrow_table

    # ... computation ...

    return Table(arrow_table)
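To make the binding concrete, here is a hypothetical sketch of how the runtime might call such a step: the config dict comes from the Heddle step declaration, and the table flows in as the pipeline input. `Table` here is a simplified stand-in for `heddle.core.Table`, and the list-of-dicts payload is an assumption for illustration only (the real wrapper holds a pyarrow table).

```python
# Stand-in for heddle.core.Table: wraps whatever columnar payload the step sees.
class Table:
    def __init__(self, arrow_table):
        self.arrow_table = arrow_table  # in Heddle, a zero-copy Arrow table

def enrich_geo_data(config, input_table):
    # Stand-in transform: tag each row with a region from the bound config.
    rows = [{**row, "region": config.get("default_region", "unknown")}
            for row in input_table.arrow_table]
    return Table(rows)

# The runtime binds `config` from the step declaration, then invokes the step:
result = enrich_geo_data({"default_region": "eu"}, Table([{"id": 1}]))
```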

Embedded PRQL Native Transforms

Need relational transforms without dropping into Python? Heddle natively embeds PRQL (Pipelined Relational Query Language) backed by DataFusion with native Arrow integration for massive, typed inline transformations.

Example
// analytics.he
import "fhub/aws" aws

step fetch_active_users: void -> User = ...
step sink_to_s3: User -> void = aws.s3_sink { bucket: "analytics" }

workflow Analytics {
  fetch_active_users
    | ( // Embedded PRQL Native Transform
      from input
      filter is_active == true
      select { id, email }
    )
    | sink_to_s3
}
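The embedded PRQL block performs two relational transforms: a row filter and a column projection. A rough plain-Python equivalent of those semantics, assuming rows arrive as dicts (the real implementation compiles the PRQL to a DataFusion plan over Arrow data):

```python
def prql_transform(rows):
    # filter is_active == true
    active = [r for r in rows if r["is_active"]]
    # select { id, email }
    return [{"id": r["id"], "email": r["email"]} for r in active]

rows = [
    {"id": 1, "email": "a@x.io", "is_active": True},
    {"id": 2, "email": "b@x.io", "is_active": False},
]
```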

Row-Level Recovery & Time-Travel Debugging

Heddle evaluates workflows over immutable HeddleFrames that record execution history. If a step fails, handlers catch row-level errors and recover gracefully without aborting the batch, preserving throughput and reliability.

Example
import "std/log" log
import "local/recovery" rec
import "fhub/external" external

step process_data: User -> Result = external.api_call { ... }

// Handlers provide row-level error recovery
handler api_error_handler {
  * // Capture the exact row that failed
  | rec.fallback_strategy
  | log.error { message: "Recovered from row-level failure" }
}

workflow robust_pipeline {
  fetch_active_users
    // If process_data fails on any row, it triggers the handler
    // without crashing the entire batch execution.
    | process_data ? api_error_handler
    | sink_to_s3
}
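The `step ? handler` semantics can be sketched in plain Python: the runtime applies the step per row, routes any failing row to the handler, and keeps the rest of the batch moving. The function names below are illustrative stand-ins, not the actual runtime API.

```python
def apply_with_handler(step, handler, rows):
    out = []
    for row in rows:
        try:
            out.append(step(row))
        except Exception:
            out.append(handler(row))  # row-level recovery; the batch continues
    return out

def process_data(row):
    # Stand-in for external.api_call: fails on malformed rows.
    if row.get("email") is None:
        raise ValueError("missing email")
    return {**row, "status": "processed"}

def fallback_strategy(row):
    # Stand-in for rec.fallback_strategy.
    return {**row, "status": "recovered"}

result = apply_with_handler(process_data, fallback_strategy,
                            [{"email": "a@x.io"}, {"email": None}])
```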

Engineered for Resilience & Performance

A Host-Core Architecture built for developer productivity and unmatched performance.

Strictly Typed DSL

Catch configuration and type-mismatch errors at compile time with our custom LSP, before your multi-hour distributed data job starts. Primitives include int, string, float, bool, timestamp, and void.

Distributed Execution via Ray

Execute workflows locally for debugging, then scale to a massive distributed cluster seamlessly. Features dynamic micro-batching, Numba JIT optimization for pure functions, and detached persistent actors.
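To illustrate the micro-batching idea without a Ray dependency, here is a minimal sketch that splits a row stream into batches sized by a per-row cost estimate. The sizing heuristic is an assumption for illustration, not Heddle's actual scheduling policy.

```python
def micro_batches(rows, cost, budget):
    # Emit batches whose cumulative estimated cost stays within `budget`.
    batch, spent = [], 0.0
    for row in rows:
        c = cost(row)
        if batch and spent + c > budget:
            yield batch
            batch, spent = [], 0.0
        batch.append(row)
        spent += c
    if batch:
        yield batch

# Uniform cost of 1.0 per row and a budget of 2.0 yields pairs of rows.
batches = list(micro_batches(range(6), cost=lambda r: 1.0, budget=2.0))
```

In the distributed setting, each emitted batch would be dispatched to a Ray actor rather than yielded locally.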

Zero-Copy Memory via Apache Arrow

Data flows between Heddle pipelines and Python steps via Arrow Tables and Ray's Plasma store. Zero serialization overhead keeps handoffs between the DSL and Python fast.
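The underlying principle, shown here with stdlib types only: a view over an existing buffer exposes the same memory without copying bytes, which is what Arrow and Plasma apply to columnar data at scale.

```python
buf = bytearray(b"columnar-data")
view = memoryview(buf)      # no copy: the view shares buf's memory
buf[0:8] = b"COLUMNAR"      # a mutation is visible through the view
```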

Time-Travel Debugging

Because data frames are immutable, the runtime persists execution history. If a step fails, you can inspect the exact state of the data at each prior step.
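A hypothetical sketch of this mechanism: each step produces a new immutable frame that links to its predecessor, so any prior state stays reachable. The class below is illustrative, not the actual HeddleFrame API.

```python
from types import MappingProxyType

class Frame:
    def __init__(self, data, step_name, parent=None):
        self.data = MappingProxyType(dict(data))  # read-only view of the state
        self.step_name = step_name
        self.parent = parent                      # link to the prior frame

    def apply(self, step_name, fn):
        # A step never mutates; it yields a new frame linked to this one.
        return Frame(fn(dict(self.data)), step_name, parent=self)

    def history(self):
        frame, names = self, []
        while frame:
            names.append(frame.step_name)
            frame = frame.parent
        return list(reversed(names))

root = Frame({"count": 1}, "source")
f2 = root.apply("double", lambda d: {**d, "count": d["count"] * 2})
```

Walking `parent` links from a failed frame is what makes "time travel" to any earlier state possible.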

Build the Ecosystem: Radical Reusability

Heddle's true power lies in reusability. Stop rewriting the same Kafka reader or Postgres writer from scratch ten times a week.

The Functions Hub (fhub)

We are actively seeking Contributors to build fhub—an open-source standard library of reusable Heddle Steps for modern tools like dbt, Snowflake, HuggingFace, and OpenAI.

Define Once, Run Anywhere

Define a Step once and reuse it across multiple workflows. Build an internal ecosystem of Lego-like data connectors and transformations.