Transcript
00:00Let's talk a bit about value distribution and profiling your data inside the clean step.
00:06The profile pane allows you to visualize the distribution of your data by plotting the frequency of each distinct value as bins in a histogram.
00:15This is a great way to identify outliers and null values in your data.
00:21Field bins are shown in two different variations, summary and detail views.
00:26The Summary view shows a continuous view of values, with both their range and the frequency with which they appear in the column.
00:35The Detail view, by contrast, gives a discrete view of individual values within the column.
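To make the difference between the two views concrete, here is a minimal Python sketch. It is an illustration only, not how Tableau Prep is implemented: the function names `detail_view` and `summary_view`, the fixed bin width, and the sample percentages are all assumptions made up for this example.

```python
from collections import Counter


def detail_view(values):
    """Discrete view: one count per distinct value (None gets its own entry)."""
    return Counter(values)


def summary_view(values, bin_width):
    """Continuous view: count of non-null values per fixed-width bin."""
    counts = Counter()
    for v in values:
        if v is None:
            continue  # nulls have no position on a numeric axis
        lower = (v // bin_width) * bin_width
        counts[(lower, lower + bin_width)] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample: percentage of graduates per school.
percentages = [0, 0, None, 12, 31, 34, 35, 38, 47, 62]

detail = detail_view(percentages)
summary = summary_view(percentages, 10)
```

In this sample the detail counts expose the repeated zeros and the null directly, while the summary counts show which range holds the most values.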
00:41Note that you can always click the distribution on the right-hand side of your detail view to skip to the desired values that you want to look at.
00:49To modify the view state, you can use your view state selection.
00:53This is located inside the More Options menu, which is available on each field pane to the right-hand side, and it's represented as three dots.
01:03If you select those three dots, you can go down your drop-down and see the view state selection.
01:09The checkmark will denote which view state you are on right now, and you can select the other to change over.
01:15Note that not all view states are available for all data types, but you'll be able to switch between the two for most of your fields.
01:24Note that bins are created by Tableau, which leverages the min and max of a field's values and calculates the bins automatically.
01:33Changing the view state does not impact the data in the flow and does not change how your bins are created.
01:39It's just a way to better profile your data and get several views of how it can be represented.
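As a rough sketch of what binning from a field's min and max can look like, the Python below splits the range into equal-width bins. This is an assumption for illustration: the transcript only says that the min and max are used, so the equal-width rule, the `equal_width_bins` name, and the `bin_count` parameter are hypothetical and not Tableau's documented algorithm.

```python
def equal_width_bins(values, bin_count):
    """Split [min, max] of the non-null values into equal-width bins.

    Returns (edges, counts). The input list is only read, never modified,
    which mirrors the point that profiling does not change the data.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bin_count
    if width == 0:
        width = 1  # all values identical: avoid dividing by zero below
    counts = [0] * bin_count
    for v in present:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bin_count + 1)]
    return edges, counts


# Hypothetical sample values.
values = [0, 0, None, 12, 31, 34, 35, 38, 47, 62]
edges, counts = equal_width_bins(values, 4)
```

Calling the function again with a different `bin_count` changes only the returned edges and counts; `values` itself stays the same.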
01:45Now that we've talked high-level about value distribution, let's jump into Tableau Prep and profile our own data using the Summary and Detail views.
01:55Alright, so we're back in Tableau Prep, and we're going to take a quick peek at the Summary and Detail views in one of our clean steps.
02:03Let's click on the clean step for the Plan of High School Grads data file.
02:08Now right away, we can see that we've got a bunch of different views automatically generated.
02:13Our numeric fields use the Summary view, which gives us this nice little bell curve for a bunch of our values,
02:20so that we can see which bins hold the most values.
02:25We've also got the Detail View for a couple of our fields that are string type.
02:29Now Tableau will automatically assign the default view.
02:33Let's take a quick look at the Summary view for 4-year public college.
02:37From this view, we can tell that the most common bin is 30-40% of students,
02:43and the least common bins are located at the higher end of the spectrum.
02:48This is interesting, but what would happen if we changed over to our Detail View?
02:53I'm going to hit the More Options menu.
02:55I'm going to go down and choose Detail View.
02:57Now we can quickly tell that we have a lot of null values and zero values in our data set.
03:05So this tells us that this field may not be populated correctly,
03:09or it's simply that there are a lot of schools that don't have a lot of 4-year public college graduates.
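The check we just did by eye can be sketched in a few lines of Python. This is a hedged illustration only: the `null_and_zero_share` helper and the sample values are hypothetical, and a high share does not by itself tell you which of the two explanations above is the right one.

```python
def null_and_zero_share(values):
    """Return the fraction of null and of zero entries in a numeric field."""
    total = len(values)
    if total == 0:
        return {"null": 0.0, "zero": 0.0}
    nulls = sum(1 for v in values if v is None)
    zeros = sum(1 for v in values if v is not None and v == 0)
    return {"null": nulls / total, "zero": zeros / total}


# Hypothetical sample values for one numeric field.
share = null_and_zero_share([0, 0, None, 12, 31, 34, 35, 38, 47, 62])
```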
03:16Let's switch this back to the Summary view,
03:18because I think this view works better for this type of data.
03:22Now let's go over to School Name and see what kind of view we're showing here.
03:27This is our Detail View, and that makes sense for this type of data.
03:31And additionally, we can see a profile of how many values are represented in the data for each individual school name.
03:37Now if the bar is highlighted fully, we can see that, for example, the school name Bedford High is available in all of our data sets.
03:48But something like Bay State Academy Charter Public School is only available in one of our data sources.
03:54When we hover over, we can see that there's only one row available for this particular school name.
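Counting rows per distinct value, as the Detail view does here, can be sketched in Python as follows. The row counts are invented for the example (three source files is an assumption), and `rows_per_value` and `sparse_values` are hypothetical helper names.

```python
from collections import Counter


def rows_per_value(rows):
    """Count how many rows carry each distinct value."""
    return Counter(rows)


def sparse_values(counts, expected):
    """List values that appear in fewer rows than expected, sorted by name."""
    return sorted(name for name, n in counts.items() if n < expected)


# Hypothetical data: one row per school per source file, three files assumed.
school_names = [
    "Bedford High",
    "Bedford High",
    "Bedford High",
    "Bay State Academy Charter Public School",
]

counts = rows_per_value(school_names)
gaps = sparse_values(counts, expected=3)
```

Here `gaps` holds the names that are missing from at least one source, which is the same signal as a bar that is only partly highlighted.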
04:01Using the Distribution View, we can quickly skip to different values that may have gaps.
04:06If we go ahead and hit More Options, we can see that we don't have the Summary option for this view.
04:12That's because the data type is a string,
04:15and we won't be able to do the same type of distribution with string values.
04:19The same can be said for our File Paths field, which also only allows the Detail View.
04:25This again brings up the point that our data types are very important,
04:30not only for manipulating our data, doing calculations and joins,
04:35but also for profiling our data and trying to find null values and outliers.
04:41Next up, we're going to talk more about how to find fields and values inside our data.