Using syntaqlite from Python
This tutorial walks you through using syntaqlite as a Python library. By the end you'll have a script that formats SQL, validates it against a schema, and inspects the AST — all from Python.
1. Install
pip install syntaqlite
The pip package includes the syntaqlite CLI binary and a pure-Python
client library. Requires Python 3.10+.
2. Create a Syntaqlite instance
All operations go through a Syntaqlite
instance. Create one and reuse it across many calls:
import syntaqlite
sq = syntaqlite.Syntaqlite()
# ... use sq ...
sq.close()
Or use the context-manager form to close automatically:
with syntaqlite.Syntaqlite() as sq:
# ... use sq ...
pass
3. Format a query
sql = "select id,name,email from users where active=1 order by name"
print(sq.format_sql(sql))
Output:
SELECT id, name, email FROM users WHERE active = 1 ORDER BY name;
Customize formatting with keyword arguments:
print(sq.format_sql(sql, line_width=40, indent_width=4, keyword_case="lower"))
select id, name, email
from users
where
active = 1
order by
name;
4. Analyze against a schema
Build a Schema with the tables you want to check against and pass it to
analyze():
from syntaqlite import Schema, Table
schema = Schema(tables=[Table("users", ["id", "name", "email", "active"])])
result = sq.analyze(
"SELECT nme FROM users WHERE active = 1",
schema,
)
for d in result.diagnostics:
print(f"{d.severity}: {d.message}")
error: unknown column 'nme'
Schema also accepts raw DDL, so you can point at an existing schema
file instead of listing columns by hand:
schema = Schema(ddl="CREATE TABLE users (id INT, name TEXT, email TEXT, active INT)")
result = sq.analyze("SELECT nme FROM users WHERE active = 1", schema)
For human-readable output with source locations, switch the output format:
from syntaqlite import RenderOptions, AnalysisOutput
print(sq.analyze(
"SELECT nme FROM users WHERE active = 1",
schema,
output=AnalysisOutput.TEXT,
render_options=RenderOptions(source_name="query.sql"),
))
error: unknown column 'nme'
--> <input>:1:8
|
1 | SELECT nme FROM users WHERE active = 1
| ^~~
= help: did you mean 'name'?
Column lineage
When validating a SELECT, the result includes column lineage — which source table and column each output column traces back to:
schema = Schema(tables=[Table("users", ["id", "name", "email"])])
result = sq.analyze("SELECT u.name, u.email FROM users u", schema)
for col in result.lineage.columns:
print(f" {col.name} <- {col.origin}")
name <- users.name
email <- users.email
The lineage object also lists the catalog relations referenced in FROM
and the physical tables accessed after CTE/view expansion:
for rel in result.lineage.relations:
print(f" relation: {rel.name} ({rel.kind})")
for t in result.lineage.physical_tables:
print(f" physical table: {t}")
relation: users (table)
physical table: users
result.lineage is None when the query had no resolvable body (for
example, a CREATE TABLE statement).
5. Parse and inspect the AST
sq.parse() returns typed Python objects. Each statement is a class with
named attributes — you get IDE autocomplete and isinstance checks instead
of string-keyed dict access:
stmt = sq.parse("SELECT id, name FROM users WHERE active = 1")[0]
print(type(stmt).__name__) # SelectStmt
print(stmt.from_clause) # TableRef(...)
print(stmt.where_clause) # BinaryExpr(...)
for col in stmt.columns:
print(f" {type(col.expr).__name__}: {col.expr.column}")
SelectStmt
TableRef(...)
BinaryExpr(...)
ColumnRef: id
ColumnRef: name
Walking the AST
Because nodes are typed, you can write recursive visitors with isinstance.
Here's a function that counts arithmetic operators in a query:
from syntaqlite.nodes import BinaryExpr
from syntaqlite.enums import BinaryOp
def count_ops(node, target_ops):
"""Walk an AST node tree and count specific binary operators."""
count = 0
if isinstance(node, BinaryExpr):
if node.op in target_ops:
count += 1
count += count_ops(node.left, target_ops)
count += count_ops(node.right, target_ops)
return count
sql = "SELECT a + b - c, d - e + f + g FROM t WHERE x - 1 > y + 2"
stmt = sq.parse(sql)[0]
adds = 0
subs = 0
for col in stmt.columns:
adds += count_ops(col.expr, {BinaryOp.PLUS})
subs += count_ops(col.expr, {BinaryOp.MINUS})
if stmt.where_clause:
adds += count_ops(stmt.where_clause, {BinaryOp.PLUS})
subs += count_ops(stmt.where_clause, {BinaryOp.MINUS})
print(f"additions: {adds}, subtractions: {subs}")
additions: 4, subtractions: 3
Error recovery
The parser recovers from errors and continues. Error nodes are returned alongside valid statements:
from syntaqlite.nodes import Error
stmts = sq.parse("SELECT FROM; SELECT 1")
for s in stmts:
if isinstance(s, Error):
print(f"error: {type(s).__name__}")
else:
print(f"ok: {type(s).__name__}")
error: Error
ok: SelectStmt
6. Tokenize
for tok in sq.tokenize("SELECT 1"):
print(f" {tok['text']!r:10s} offset={tok['offset']} len={tok['length']} {tok['category']}")
'SELECT' offset=0 len=6 keyword
' ' offset=6 len=1 other
'1' offset=7 len=1 number
Next steps
- Python API reference — the full
Syntaqliteclass, parameters, and return types - CLI reference — the
syntaqlitecommand installed alongside the library - Formatting options — line width, keyword casing, and more