SQL in cells
Expanse runs SQL cell formulas through Apache DataFusion,
a fast, in-process query engine. Write a query with the =SQL(…) kernel and the result
spills into the grid.
=SQL("SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC")SQL runs in the desktop app. It is not available in the in-browser (WASM) build yet.
What you can query
Section titled “What you can query”A query’s FROM targets are connected data sources, registered once and then queryable by name:
- Parquet files, by path.
- PostgreSQL databases, including whole catalogs (
catalog.schema.table). - Results of other connected tables, so one query can build on another.
Connections are configured in ~/.config/expanse/sources.json and loaded when the workbook opens.
You can also run a literal query with no source at all:
=SQL("SELECT 1 + 1") # a single cell: 2Querying a range or sheet
Section titled “Querying a range or sheet”Selecting straight from a cell range or a sheet, like =SQL("SELECT * FROM A1:C100") or
FROM Sheet1, is not supported yet. Promote the data to a named connected table first, then
query that table by name.
Supported SQL
Section titled “Supported SQL”Queries are read-only by default: SELECT, WITH (CTEs), VALUES, and EXPLAIN are
allowed. The full DataFusion SELECT surface works, including:
- Joins: inner, left, right, full, cross.
- Aggregates:
COUNT,SUM,AVG,MIN,MAX,APPROX_DISTINCT,PERCENTILE_CONT, and more. - Window functions:
ROW_NUMBER,RANK,LAG,LEAD, and the rest. GROUP BY/HAVING/ORDER BY, string, math, and date functions, andCAST.
For the complete function list, see the DataFusion SQL reference.
Write statements (INSERT, UPDATE, DELETE) and DDL (CREATE, DROP, ALTER) are off by
default, and transaction / session statements (BEGIN, COMMIT, SET) are always blocked, so a
formula can’t mutate a shared connection out from under you.
Results
Section titled “Results”- A query that returns multiple rows or columns spills into the neighbouring cells, Excel-style from the formula’s anchor.
- A single-value result (one row, one column) collapses into the one cell, so
=SUM(SQL(…))composes naturally. - Headers are not added automatically. Select them explicitly if you want a header row.
- Results are capped at 10,000 rows per query; above that the result is truncated.
Limits and what’s not built yet
Section titled “Limits and what’s not built yet”- Direct range / sheet queries (
FROM A1:C100,FROM Sheet1): not yet. Use a connected table. - HTTP / JSON API sources: planned, not wired up. Use Postgres or Parquet today.
- “Terabyte scale”: aspirational. Today queries materialise in memory and cap at 10,000 rows, with no streaming or incremental refresh yet. DataFusion can go much further; Expanse’s integration will grow into it.