Skip to content

sparksql magic

A JupyterLab extension providing the following features via %%sparksql and %%trino magics:

  • SQL formatter
  • Automatic extraction of database schemas
  • Auto-completion triggered by tab or dot for:
    • table names
    • table aliases
    • table joins
    • nested column names
    • functions
  • Syntax highlighting for:
    • line magic
    • cell magic
    • Python strings

Execute and output your query results into an interactive data grid

Output as JSON

Auto suggest column names and sub-fields

Auto suggest JOINs on matching column names

Format and show syntax highlighting in Notebook code cells

To format SQL statements in the cell, right-click in the cell and select Format Sql Cell or hit <CTRL+Q>.

Works in Python strings

While inside a notebook, you can have a multi-line Python string containing SQL and enjoy the same features (syntax highlighting, code completion and SQL formatting) as in a sparksql cell by wrapping your string with --start-sparksql and --end-sparksql. Here is an example:

# declare a python string
sql = """
--start-sparksql
SELECT
    *
FROM
    table AS t
--end-sparksql
"""
print(sql)

Capture your Spark query as a Dataframe or a temporary view

Usage

Parameter usage example:

%%sparksql -c -l 10 --dataframe df
<QUERY>
Parameter Description
--database NAME Spark database to use.
-l LIMIT, --limit LIMIT The maximum number of rows to display. A value of zero is equivalent to --output skip
-r all|local|none, --refresh all|local|none Force the regeneration of the schema cache file. The local option will only update tables/views created in the local Spark context.
-d NAME, --dataframe NAME Capture results in a Spark dataframe named NAME.
-c, --cache Cache dataframe.
-e, --eager Cache dataframe with eager load.
-v VIEW, --view VIEW Create or replace a temporary view named VIEW.
-o sql|json|html|aggrid|grid|text|schema|skip|none, --output sql|json|html|aggrid|grid|text|schema|skip|none Output format. Defaults to html. The sql option prints the SQL statement that will be executed (useful to test jinja templated statements).
-s, --show-nonprinting Replace none printable characters with their ascii codes (LF -> \x0a)
-j, --jinja Enable Jinja templating support.
-b, --dbt Enable DBT templating support.
-t LIMIT, --truncate LIMIT Truncate output.
-m update|complete, --streaming_mode update|complete The mode of streaming queries.
-x, --lean-exceptions Shortened exceptions. Might be helpful if the exceptions reported by Spark are noisy such as with big SQL queries.