
262 Messages

Monday, May 16th, 2022 4:21 PM

Pushdown Sampling & Profiling

Hi all

With pushdown sampling, is data profiling carried out on the sample (say, 10k rows) that the data source returns to the Jobserver? Or does the Jobserver, as a separate step, fetch the entire table into the cache and profile all of that data?

I raised a ticket with Support because sampling and profiling a Snowflake schema of around 2,000 tables with pushdown sampling enabled (10k rows, XL warehouse) was taking approximately 48 hours and timed out most of the time.

Support says (I'm not sure whether I need to read these two together):

“…try is to increase the push down sampling value.”

“The time the jobs take also depend on the size of the tables, depending on their size it is not unreasonable for the schema refresh with sampling & profiling to take a long time.”

664 Messages • 10.6K Points • 2 years ago

@noor.shaik Does this thread from the Developers category help at all?

664 Messages • 10.6K Points • 2 years ago

@noor.shaik I got this from Support: “If Push Down Sampling is being used, it will be limited to the rows entered as a value to the property, eg pushdownsampling=10000 means 10000 rows per table.”
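If Support's statement holds, the number of rows handed back for profiling is capped per table, so a 2,000-table schema at 10,000 rows per table tops out at 20 million rows. The sketch below illustrates that reading under stated assumptions: it uses Snowflake's fixed-size `SAMPLE (<n> ROWS)` clause, and `PushdownConfig`, `sample_query` and `max_rows_profiled` are hypothetical names. It is not the product's actual query or implementation, which the thread does not show.

```python
# Hypothetical illustration only: the real pushdown query issued by the
# Jobserver is not documented in this thread.
from dataclasses import dataclass


@dataclass
class PushdownConfig:
    # Hypothetical stand-in for the "pushdownsampling" property Support quoted.
    pushdownsampling: int = 10_000


def sample_query(table: str, cfg: PushdownConfig) -> str:
    """Build a Snowflake fixed-size row-sampling query for one table (assumption)."""
    # SAMPLE (<n> ROWS) asks Snowflake for at most n rows from the table.
    return f"SELECT * FROM {table} SAMPLE ({cfg.pushdownsampling} ROWS)"


def max_rows_profiled(num_tables: int, cfg: PushdownConfig) -> int:
    """Upper bound on rows returned for profiling if the limit applies per table."""
    return num_tables * cfg.pushdownsampling


if __name__ == "__main__":
    cfg = PushdownConfig()
    print(sample_query("MY_DB.MY_SCHEMA.ORDERS", cfg))
    print(max_rows_profiled(2000, cfg))
```

This only bounds the volume returned to the Jobserver; it says nothing about how long Snowflake takes to produce each sample, which may still depend on table size.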
