Using syntaqlite from Python
This tutorial walks you through using syntaqlite as a Python library. By the end you'll have a script that formats SQL, validates it against a schema, traces column lineage, and inspects the AST, all from Python.
1. Install
pip install syntaqlite
The pip package includes both the syntaqlite CLI binary and a native C
extension that exposes the library API directly to Python. Requires Python 3.10+.
Note: On platforms where the C extension isn't available (e.g. Windows arm64), pip still installs the CLI binary. Only the library functions below won't be importable.
2. Format a query
import syntaqlite
sql = "select id,name,email from users where active=1 order by name"
print(syntaqlite.format_sql(sql))
Output:
SELECT id, name, email FROM users WHERE active = 1 ORDER BY name;
Customize formatting with keyword arguments:
print(syntaqlite.format_sql(sql, line_width=40, indent_width=4, keyword_case="lower"))
select id, name, email
from users
where
active = 1
order by
name;
3. Validate against a schema
Pass table definitions to validate column and table references:
import syntaqlite
from syntaqlite import Table
schema = [Table("users", ["id", "name", "email", "active"])]
result = syntaqlite.validate(
"SELECT nme FROM users WHERE active = 1",
tables=schema,
)
for d in result.diagnostics:
print(f"{d.severity}: {d.message}")
error: unknown column 'nme'
You can also load schema directly from DDL:
result = syntaqlite.validate(
"SELECT nme FROM users WHERE active = 1",
schema_ddl="CREATE TABLE users (id INT, name TEXT, email TEXT, active INT)",
)
For human-readable output with source locations, use render=True:
print(syntaqlite.validate(
"SELECT nme FROM users WHERE active = 1",
tables=schema,
render=True,
))
error: unknown column 'nme'
--> <input>:1:8
|
1 | SELECT nme FROM users WHERE active = 1
| ^~~
= help: did you mean 'name'?
3b. Column lineage
When validating a SELECT, the result includes column lineage: which source table and column each output column traces back to:
result = syntaqlite.validate(
"SELECT u.name, u.email FROM users u",
tables=[Table("users", ["id", "name", "email"])],
)
for col in result.lineage.columns:
print(f" {col.name} <- {col.origin}")
name <- users.name
email <- users.email
The lineage object also lists the relations referenced by the query:
for rel in result.lineage.relations:
print(f" {rel.name} ({rel.kind})")
users (table)
4. Parse and inspect the AST
syntaqlite.parse() returns typed Python objects. Each statement is a class
with named attributes — you get IDE autocomplete and isinstance checks instead
of string-keyed dict access:
import syntaqlite
stmt = syntaqlite.parse("SELECT id, name FROM users WHERE active = 1")[0]
print(type(stmt).__name__) # SelectStmt
print(stmt.from_clause) # TableRef(...)
print(stmt.where_clause) # BinaryExpr(...)
for col in stmt.columns:
print(f" {type(col.expr).__name__}: {col.expr.column}")
SelectStmt
TableRef(...)
BinaryExpr(...)
ColumnRef: id
ColumnRef: name
Walking the AST
Because nodes are typed, you can write recursive visitors with isinstance.
Here's a function that counts arithmetic operators in a query:
import syntaqlite
from syntaqlite.nodes import BinaryExpr
from syntaqlite.enums import BinaryOp
def count_ops(node, target_ops):
"""Walk an AST node tree and count specific binary operators."""
count = 0
if isinstance(node, BinaryExpr):
if BinaryOp(node.op) in target_ops:
count += 1
count += count_ops(node.left, target_ops)
count += count_ops(node.right, target_ops)
return count
sql = "SELECT a + b - c, d - e + f + g FROM t WHERE x - 1 > y + 2"
stmt = syntaqlite.parse(sql)[0]
adds = 0
subs = 0
for col in stmt.columns:
adds += count_ops(col.expr, {BinaryOp.PLUS})
subs += count_ops(col.expr, {BinaryOp.MINUS})
if stmt.where_clause:
adds += count_ops(stmt.where_clause, {BinaryOp.PLUS})
subs += count_ops(stmt.where_clause, {BinaryOp.MINUS})
print(f"additions: {adds}, subtractions: {subs}")
additions: 4, subtractions: 3
Error recovery
The parser recovers from errors and continues parsing. Error nodes are returned alongside valid statements:
from syntaqlite.nodes import Error
stmts = syntaqlite.parse("SELECT FROM; SELECT 1")
for s in stmts:
if isinstance(s, Error):
print(f"error: {type(s).__name__}")
else:
print(f"ok: {type(s).__name__}")
error: Error
ok: SelectStmt
5. Tokenize
import syntaqlite
for tok in syntaqlite.tokenize("SELECT 1"):
print(f" {tok['text']!r:10s} offset={tok['offset']} len={tok['length']}")
'SELECT' offset=0 len=6
' ' offset=6 len=1
'1' offset=7 len=1
Next steps
- Python API reference — all functions, parameters, and return types
- CLI reference — the
syntaqlitecommand installed alongside the library - Formatting options — line width, keyword casing, and more