Transcript
00:00Let's dive into the topic of data sampling.
00:03To optimize performance, Tableau Prep samples large datasets and returns a subset of records.
00:09The subset of rows returned from your data sources may or may not be representative of your data,
00:15depending on the settings that you've chosen.
00:18Your sample may be optimized for speed without necessarily being representative of your data.
00:24So it's important that you make sure your settings are to your liking,
00:28and if they aren't returning the right results, you can change them right within the input step.
00:33Let's take a quick look at the different options for data sampling.
00:37Now you can see the data sampling options inside your data sample tab in your input step.
00:43Once you click on the data sample tab, you'll have two main sections to be concerned about.
00:47You can select the amount of data to include in the flow, and you can choose the sampling method.
00:53Now the default sample amount is chosen by Prep Builder.
00:57Prep Builder will determine the number of rows to return for your dataset.
01:01This is an automatic algorithm that Tableau Prep uses to optimize the data sample.
01:07Alternatively, you can also choose to use all data to run through your flow.
01:12This will retrieve all rows regardless of the size of your dataset.
01:16And as a result, this can cause performance issues.
01:20Note that even if you pick use all data, there will be some limitations.
01:24Data will still be limited to 1 million rows or fewer for the aggregate and union steps,
01:30and 3 million rows or fewer for the join and pivot steps.
01:35Now this doesn't mean that the data will be limited when you actually go and run your flow.
01:40It just means that when you're building the workflow, it's going to limit the amount of data that it uses to profile your datasets.
01:48That means when you're filtering, aliasing, and doing other cleaning steps, you may be missing data depending on which option you're choosing.
01:56You can also choose a fixed number of rows to pull.
02:00It's recommended that this is less than a million records for performance reasons.
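The authoring-time limits mentioned for "use all data" can be sketched as a small lookup. This is only an illustration of the numbers quoted in the transcript, not Tableau Prep's actual logic: the names `AUTHORING_ROW_CAPS` and `rows_profiled` are hypothetical, and the sketch assumes that step types not named in the transcript are left uncapped.

```python
# Row caps quoted in the transcript for "use all data" while building a flow.
# Names here are hypothetical; this is not a Tableau Prep API.
AUTHORING_ROW_CAPS = {
    "aggregate": 1_000_000,
    "union": 1_000_000,
    "join": 3_000_000,
    "pivot": 3_000_000,
}


def rows_profiled(step_type: str, total_rows: int) -> int:
    """Rows available for profiling at a step while authoring the flow.

    Assumption: step types not listed above are not capped in this sketch.
    """
    cap = AUTHORING_ROW_CAPS.get(step_type.lower())
    return total_rows if cap is None else min(total_rows, cap)


# A 5-million-row input is trimmed at a join step during authoring,
# while a small input passes through unchanged.
print(rows_profiled("join", 5_000_000))
print(rows_profiled("union", 500))
```

As the transcript notes next, these caps apply only while building the flow, not when the flow is actually run.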
02:05Now when we talk about sampling method, Quick Select is the default.
02:10Using Quick Select, a sample is returned as quickly as possible.
02:14It will use the first N rows, or cached data that's available from a prior query, to develop your sample.
02:22Now this option is less accurate but often quicker than the random sample.
02:27The random sample will return the number of rows requested,
02:31but it will look at all records and return a representative sample of data.
02:36Now using a random sample may impact performance, but only before your data has been cached.
02:43Once you run your flow more than once, some of the data will be cached in memory,
02:47and you'll be able to run the flow again with faster processing time.
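The difference between the two sampling methods described above can be sketched in plain Python. This is a toy illustration of the general idea (take the first rows versus draw from all rows), not how Tableau Prep implements either option; the function names `quick_select` and `random_sample` and the toy data are assumptions made for the example.

```python
import random


def quick_select(rows, n):
    """Return the first n rows without looking at the rest (fast, may be biased)."""
    return rows[:n]


def random_sample(rows, n, seed=None):
    """Consider every row and draw n of them uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


# Toy dataset sorted by region: the first 900 rows are all "North".
rows = [("North", i) for i in range(900)] + [("South", i) for i in range(100)]

first_rows = quick_select(rows, 50)
random_rows = random_sample(rows, 50, seed=0)

# The first-rows sample only ever sees one region, because the data is sorted.
print({region for region, _ in first_rows})
# The random sample draws from the whole dataset, so it can include both regions.
print({region for region, _ in random_rows})
```

On sorted or grouped data, the first-rows approach can miss entire categories, which is the "less accurate but often quicker" trade-off mentioned in the transcript.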
02:52Now that we've looked at the data sampling options,
02:54let's jump into Tableau Prep and choose data sampling options for our own datasets.
02:59Okay, so we're back in Tableau Prep, and we're going to choose data sampling for all of our inputs.
03:05For our school level inputs, we're going to choose random samples,
03:09and for our district level inputs, we're going to use all data.
03:13Now technically, our data is not huge, so we'll really be pulling all of our dataset through our flow anyway,
03:21but this will get you some good practice at setting your data sampling.
03:25Let's go ahead and click Plan of High School Grads.
03:28We'll pull this up so we can see it a little bit better.
03:31Choose Data Sample, and we're going to choose a random sample for our sampling method.
03:36Again, this will be a more thorough sample than the Quick Select,
03:41but that won't matter too much for performance in our case because our datasets are not very large.
03:47You will want to consider this for datasets that are large, however, as it could impact performance.
03:53Let's go to our other school inputs and change to random sample.
03:57So we'll click on Educator Evaluation Performance.
04:02We'll go to Data Sample, and we'll choose Random Sample.
04:06Then we'll go into our SAT Performance.
04:09We'll go to our Data Sample tab, choose Random Sample here.
04:13We'll go to Teacher Data, click Data Sample, and click Random Sample here.
04:19Alright, so that's all of our school inputs, these top four,
04:22and then the bottom four are going to be our inputs at district level.
04:26So we'll click on Per Pupil Expenditures.
04:30We'll click Data Sample, and instead of Random Sample here,
04:35we're going to want to change the amount of data to include in the flow.
04:40Now for these datasets, we're going to want to choose Use All Data.
04:44Now all of the data will be used in our flow.
04:47You'll notice how sampling method is grayed out.
04:50That's because you're not sampling if you're pulling in all of the data.
04:53A sample is a subset of your total dataset, but if we're using our total dataset,
04:58there's no need for sampling.
05:00Let's go ahead and click on our Advanced Course Data Source,
05:04choose Data Sample, and choose Use All Data.
05:08And we'll go to our Teacher Salaries dataset, click Data Sample, and choose Use All Data.
05:15And up next, we'll talk about how to refresh your data input connections.