Technical lineage support for Snowflake SQL scripting
Updated
In the ever-evolving landscape of data management, ensuring the accuracy and completeness of your data lineage is crucial. With the newly added support for Snowflake SQL Stored Procedures (Snowflake Scripting) we improve both the accuracy and completeness by computing the design lineage of your SQL Stored Procedures including variable tracking.
In this article, we’ll explore the different integration options available with Snowflake and how they help you to get the lineage you need.
Snowflake lineage integrations
In our technical lineage integration with Snowflake we have two modes: SQL and SQL-API.
SQL mode is the integration in which we analyze the SQL statements and extract the lineage out of them. You can use the Java Database Connectivity (JDBC) integration or provide the SQL Statements using the SQL Directory option. This is called “design lineage” as it shows lineage as designed in these SQL scripts.
In 2023 we also introduced SQL-API mode; in this mode we extract the lineage as seen by
Snowflake and we complement it by analyzing the SQL statements that construct the views.
The lineage provided by Snowflake is what we call query lineage; it is the lineage seen
by the database engine. Query lineage will only show lineage if the SQL statement has
been executed and for the executed path. Looking at the below example:
CREATE OR REPLACE PROCEDURE example_procedure (input_variable INT)
RETURNS TEXT
LANGUAGE SQL
AS
BEGIN
CASE
WHEN :input_variable < 100 THEN
INSERT INTO table_A (column1, column2)
SELECT value1, value2
FROM table_X;
RETURN 'Inserted into table_A';
WHEN :input_variable >= 100 THEN
INSERT INTO table_B (column1, column2)
SELECT value1, value2
FROM table_Y;
ELSE
RETURN 'Input variable did not match any table';
END CASE;
END;
If this procedure has only been executed so far with input_variable lower than 100, only lineage will be seen for the first path while using the SQL-API mode:
Support for SQL stored procedures in SQL mode
Up until now we did not support SQL Stored Procedures when using SQL mode as this was only available with the SQL-API mode. Our customers expressed some concerns with query lineage, however, in specific situations:
If the SQL statement was not executed yet or was not executed in a recent period, lineage was missing
If the SQL Statement contains conditions and not all paths were triggered, lineage was missing
This is a valid concern and we are dedicated to providing accurate and complete lineage, hence the addition of support for SQL Stored Procedures. Looking at the above example this always provides the below lineage, regardless of if and what path has been executed:
Not only do you get the lineage from the SQL statements, but we also added support to track the variables in your stored procedures. Explore our supported SQL statements section for detailed examples.
How to get started
If you are an existing user of the Snowflake lineage integration with SQL mode you only need to use a recent version of the harvester. The automatic harvesting was introduced in Edge 2024.5.1-10 and CLI Harvester 2024.05 so you need at least this version.
If you are new to Snowflake lineage, dive into our documentation, which contains all the details you need to get started.
You can see in our documentation examples of what we support and the resulting lineage.
Questions? Reach out to our community of experts.