Skip to content

SQL in cells

Expanse runs SQL cell formulas through Apache DataFusion, a fast, in-process query engine. Write a query with the =SQL(…) kernel and the result spills into the grid.

=SQL("SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY total DESC")

SQL runs in the desktop app. It is not available in the in-browser (WASM) build yet.

A query’s FROM targets are connected data sources, registered once and then queryable by name:

  • Parquet files, by path.
  • PostgreSQL databases, including whole catalogs (catalog.schema.table).
  • Results of other connected tables, so one query can build on another.

Connections are configured in ~/.config/expanse/sources.json and loaded when the workbook opens.

You can also run a literal query with no source at all:

=SQL("SELECT 1 + 1") # a single cell: 2

Selecting straight from a cell range or a sheet, like =SQL("SELECT * FROM A1:C100") or FROM Sheet1, is not supported yet. Promote the data to a named connected table first, then query that table by name.

Queries are read-only by default: SELECT, WITH (CTEs), VALUES, and EXPLAIN are allowed. The full DataFusion SELECT surface works, including:

  • Joins: inner, left, right, full, cross.
  • Aggregates: COUNT, SUM, AVG, MIN, MAX, APPROX_DISTINCT, PERCENTILE_CONT, and more.
  • Window functions: ROW_NUMBER, RANK, LAG, LEAD, and the rest.
  • GROUP BY / HAVING / ORDER BY, string, math, and date functions, and CAST.

For the complete function list, see the DataFusion SQL reference.

Write statements (INSERT, UPDATE, DELETE) and DDL (CREATE, DROP, ALTER) are off by default, and transaction / session statements (BEGIN, COMMIT, SET) are always blocked, so a formula can’t mutate a shared connection out from under you.

  • A query that returns multiple rows or columns spills into the neighbouring cells, Excel-style from the formula’s anchor.
  • A single-value result (one row, one column) collapses into the one cell, so =SUM(SQL(…)) composes naturally.
  • Headers are not added automatically. Select them explicitly if you want a header row.
  • Results are capped at 10,000 rows per query; above that the result is truncated.
  • Direct range / sheet queries (FROM A1:C100, FROM Sheet1): not yet. Use a connected table.
  • HTTP / JSON API sources: planned, not wired up. Use Postgres or Parquet today.
  • “Terabyte scale”: aspirational. Today queries materialise in memory and cap at 10,000 rows, with no streaming or incremental refresh yet. DataFusion can go much further; Expanse’s integration will grow into it.