Technical lineage support for Snowflake SQL scripting

Updated 

In the ever-evolving landscape of data management, ensuring the accuracy and completeness of your data lineage is crucial. With the newly added support for Snowflake SQL Stored Procedures (Snowflake Scripting) we improve both the accuracy and completeness by computing the design lineage of your SQL Stored Procedures including variable tracking.


In this article, we’ll explore the different integration options available with Snowflake and how they help you to get the lineage you need.

Snowflake lineage integrations

In our technical lineage integration with Snowflake we have two modes: SQL and SQL-API.

SQL mode is the integration in which we analyze the SQL statements and extract the lineage out of them. You can use the Java Database Connectivity (JDBC) integration or provide the SQL Statements using the SQL Directory option. This is called “design lineage” as it shows lineage as designed in these SQL scripts.


In 2023 we also introduced SQL-API mode; in this mode we extract the lineage as seen by 

Snowflake and we complement it by analyzing the SQL statements that construct the views.

The lineage provided by Snowflake is what we call query lineage; it is the lineage seen

by the database engine. Query lineage will only show lineage if the SQL statement has

been executed and for the executed path. Looking at the below example:


CREATE OR REPLACE PROCEDURE example_procedure (input_variable INT)

  RETURNS TEXT

  LANGUAGE SQL

AS

BEGIN

    CASE

        WHEN :input_variable < 100 THEN

            INSERT INTO table_A (column1, column2)

            SELECT value1, value2

            FROM table_X;

            RETURN 'Inserted into table_A';

        WHEN :input_variable >= 100 THEN

            INSERT INTO table_B (column1, column2)

            SELECT value1, value2

            FROM table_Y;

        ELSE

            RETURN 'Input variable did not match any table';

    END CASE;

END;


If this procedure has only been executed so far with input_variable lower than 100, only lineage will be seen for the first path while using the SQL-API mode:

Support for SQL stored procedures in SQL mode


Up until now we did not support SQL Stored Procedures when using SQL mode as this was only available with the SQL-API mode. Our customers expressed some concerns with query lineage, however, in specific situations:

  • If the SQL statement was not executed yet or was not executed in a recent period, lineage was missing

  • If the SQL Statement contains conditions and not all paths were triggered, lineage was missing


This is a valid concern and we are dedicated to providing accurate and complete lineage, hence the addition of support for SQL Stored Procedures. Looking at the above example this always provides the below lineage, regardless of if and what path has been executed:

Not only do you get the lineage from the SQL statements, but we also added support to track the variables in your stored procedures. Explore our supported SQL statements section for detailed examples.

How to get started

If you are an existing user of the Snowflake lineage integration with SQL mode you only  need to use a recent version of the harvester. The  automatic harvesting was introduced in Edge 2024.5.1-10 and CLI Harvester 2024.05 so you need at least this version. 

If you are new to Snowflake lineage, dive into our documentation, which contains all the details you need to get started.

You can see in our documentation examples of what we support and the resulting lineage. 

Questions? Reach out to our community of experts.