Transcript
00:00Now let's talk a bit more about data types and sizes.
00:04One of the first steps in evaluating data is to examine data size, field types, and unique values.
00:12This can be done at several stages in your flow, but the simplest approach is to add a clean step to perform these checks.
00:20Let's take a look at your Tableau Prep view to see how to add a clean step.
00:24You can see here that our flow has the clean steps added in, but how did they get there?
00:29How do we move from our input step to a clean step?
00:32There are two main options.
00:34First is automatic.
00:37A gray box will appear next to your input step, labeled View and Clean Data.
00:42If you select the gray outline, your clean step will be added automatically.
00:47Tableau Prep is prompting you to perform a clean because it knows that's the next logical step in your flow.
00:53Alternatively, you can add a clean step manually by clicking the plus sign next to your input step and then choosing clean step.
01:02Again, there are many steps where you can change data size, field types, and unique values, but the clean step has the most robust structure for performing these tasks.
01:11One of the first things you should look at in your clean step is the field data types.
01:17Data types can be modified by selecting from the header.
01:21When you click on the icon that shows the data type, you'll get a drop down.
01:25From that drop down, you can change the data type and the data role.
01:29We'll talk a little bit more about data roles later, but you can see your data types available right under that heading.
01:34You can choose number, date and time, date, or string.
01:39Data types are assigned automatically upon connection to your data source, but as in Tableau Desktop, they may not be perfectly assigned at the outset.
01:47This is one reason that you'll be modifying the data type in your clean step.
01:51If you're converting a field to date or date and time, Prep will automatically use DATEPARSE.
01:57This will help get your data into the right shape to be a date or a date time field.
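As a rough illustration of what parsing a string into a date involves, here is a minimal Python sketch. It is not how Tableau Prep implements DATEPARSE; the candidate formats and the `parse_date` helper are assumptions made up for this example.

```python
from datetime import datetime

# Hypothetical candidate formats; the lesson does not say which formats Prep tries.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]

def parse_date(text):
    """Try each format in turn; return a date, or None if nothing matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

print(parse_date("2016-09-01"))
print(parse_date("09/01/2016"))
print(parse_date("not a date"))
```

Values that match none of the formats come back as `None`, which is the kind of result worth checking for after any string-to-date conversion.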
02:01If you'd rather modify your data type inside your input step, this can be done, but it's limited to certain sources.
02:09The sources where you can modify data types inside the input step include Microsoft Excel, Text Files, Box, Dropbox, Google Drive, and OneDrive.
02:20If your data source is not in that list, you should modify the field data type inside your clean step.
02:26Inside the cleaning step, you can see your data size very quickly by looking at your profile pane in the top left-hand corner.
02:34This will show the number of fields in your data set and the row count for your data set as a whole.
02:39Note that the displayed row count is rounded, but if you hover over it, you'll get the exact row count for your data set.
02:46Finally, you can see the unique values of each field by hovering over that field.
02:53For example, when we hover over the two-year private college field, we can see that there are 76 unique values included in this field.
03:01This is similar to doing a distinct count inside Tableau Desktop to get the unique number of values included in a column.
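The three checks described here (field count, row count, and distinct values per field) can be sketched outside of Prep in a few lines of Python. The sample data, column names, and the `profile` helper below are hypothetical and only meant to show the idea.

```python
import csv
import io

# Hypothetical sample data; the column names and values are illustrative only.
SAMPLE_CSV = """School Name,School Code,Year
Alpha College,A1,2015
Beta College,B2,2015
Alpha College,A1,2016
Beta College,B2,2016
Gamma College,G3,2016
"""

def profile(csv_text):
    """Return (field_count, row_count, distinct_counts) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    # A set keeps one copy of each value, so its size is the distinct count.
    distinct = {name: len({row[name] for row in rows}) for name in fields}
    return len(fields), len(rows), distinct

field_count, row_count, distinct_counts = profile(SAMPLE_CSV)
print(field_count, row_count, distinct_counts)
```

In this sample there are five rows but only three distinct school names, which is the same pattern the profile pane shows when a school appears in more than one record.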
03:09Now that we've talked at a high level about data types and sizes, let's jump into Tableau Prep and apply some of these techniques to our own flow.
03:17All right, so we're back in Tableau Prep, and the first thing that we're going to do is add our clean steps to all of our inputs.
03:24Now, to do this easily, you can just click on the gray box that automatically appears after your input step that says View and Clean Data.
03:31Let's go ahead and just click on these gray boxes, one after another, to add all of the clean steps that we're going to need to profile our data.
03:41Again, there are a few ways that you can add these.
03:44The easiest is to just click the gray box, but you can also click the plus sign to the right-hand side of your input and choose Clean Step.
03:52Now, you'll see that the data takes a moment to populate, because Tableau is calculating and profiling the data as we speak.
03:59Let's go back up to our first step so that we can get a peek at what the data looks like.
04:04Note that this might take a little bit to come up the first time you add a cleaning step, but once the data is cached, this will run pretty quickly for you.
04:12Now, we can see that our clean step has been added.
04:15I can open up my Changes pane to see if there are any changes applied, but this is brand new, so we're going to have nothing in there.
04:22Now, we're going to talk in-depth about the cleaning step in the next few lessons,
04:25but for now, we're just going to utilize the cleaning step to look at the few data profiling methods that we talked about in this lesson.
04:32First, let's take a quick peek at our data size.
04:36If we look at the top left-hand corner of our view, we can see that we have six fields and 2,000 rows.
04:43If I hover over the 2,000 rows, I can see that the exact count for our rows is 1,963 rows.
04:49Additionally, if I hover over a specific field, I can see the unique number of values for that field.
04:57For school name, I have 471 unique values.
05:01School code, I have 421.
05:03And for four-year private college, I have 568 unique values.
05:08Now, you might be thinking, well, there are only 471 school names, but there are almost 2,000 records.
05:15How does that make sense?
05:16Well, Tableau Prep is doing a unique count for school name, but that doesn't mean that we don't have multiple records of that particular school.
05:25And if we remember, we unioned a few different files together to get this one particular data set.
05:32We can see that this is multiple years' worth of data, which makes sense when we're looking at our unique count:
05:37roughly five files multiplied by our unique count of 471 schools gets us to approximately those 2,000 records.
05:48So that's a quick check that we can do just to make sure that our data looks right and feels right
05:52before we continue on the path of cleansing and manipulating our data.
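The arithmetic behind this quick check can be written out as a short Python sketch using the figures quoted in the lesson. The assumption that each school appears at most once per file is mine, not something the lesson states.

```python
# Figures quoted in the lesson.
exact_rows = 1963
unique_schools = 471
files_unioned = 5  # "roughly five files", per the lesson

# Assumption: each school appears at most once per file, so this is a ceiling.
upper_bound = files_unioned * unique_schools
avg_records_per_school = exact_rows / unique_schools

print(upper_bound)
print(round(avg_records_per_school, 2))
```

The exact row count sits below the ceiling and works out to a little over four records per school, which fits a union of several yearly files where not every school appears in every year.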
05:56Additionally, we can change our field data types by just clicking on the field type shown over the field
06:02and then selecting a different data type.
06:04Now, we're not going to perform any of these changes now.
06:07We're saving those for our clean step deep dive, but it's important to understand these steps
06:12so that we can quickly profile and analyze our data before we dive in and start manipulating.
06:19Next up, we're going to talk a little bit more about value distribution inside the clean step
06:23and how to further profile your data.
06:25Let's go.