Category
📚
LearningTranscript
00:00Up next, we're going to talk about input cleaning.
00:03Now you might be thinking, why are we cleaning data in the input step when we have a clean step available?
00:09Well, there's a few reasons that you might want to clean your data up front.
00:13Oftentimes, performing cleaning and filtering operations up front within the data will make the rest of your flow faster to perform.
00:20In addition, cleaning the data before you bring it into other steps will help you combine, join, and union your data cleaner without a full clean step in between.
00:30In general, the heavier lifting that you can do at the beginning of the flow is better for your flow overall, for performance and accuracy reasons.
00:39Let's take a look at the different cleaning operations that you can perform inside the input step.
00:44First and foremost, you can always remove fields.
00:46To remove fields, all you need to do is uncheck the field from your view, and it will be removed from the rest of the flow.
00:53Now you can select the top box to uncheck all of your fields, or you can individually select them.
00:58In addition to removing fields, you can also change your data types.
01:03For example, you can change fields from string to number, string to date, or any other combination that you can think of.
01:09To change the type, simply click on the type next to the field and select from the drop-down the type you'd like it to be.
01:15Next, you can rename your fields up front.
01:19This is a great way to clean up your fields to make them easier to identify downstream in your flow, as oftentimes database or file fields do not have proper naming.
01:28To rename, simply double-click on the existing name and enter a new name for your field.
01:34Finally, you can apply filters up front inside the input step.
01:38When you click to filter values, you'll get your standard calculation dialog box.
01:43From here, you'll put in a calculation that's Boolean in nature, meaning true or false, and the condition that produces a false result will be filtered out of your flow.
01:51Note that this is the only type of filter available inside the input step.
01:56You'll have many more filters available in other steps, especially the cleaning step.
02:00Using these four cleaning operations in your input step can drastically improve your workflow performance and clean your data for your other steps in your flow.
02:09Now that we've talked about the input cleaning operations, let's dive into Tableau Prep and apply some in our own workflow.
02:16Okay, so we're back inside Tableau Prep, and we're going to apply some of our cleaning operations inside our input step.
02:23Let's go ahead and click on Plans of High School Grads and expand the view so we can see the data.
02:29The first thing we're going to do is remove some unwanted fields.
02:32To start, let's deselect all the fields in the view, and let's just select the ones that we want to keep.
02:38We're going to keep School Name, School Code, 4-Year Private College, 4-Year Public College, and Total Count.
02:46Now the other fields, since they are deselected, will not continue on in the flow.
02:51One more that we'll want to keep is File Paths, and the reason for that is we're going to be parsing out our years here so that we can track which piece of data that we selected.
03:01Let's go to the other pieces of our flow and apply some more cleaning steps.
03:05We'll click on Educator Evaluation, expand the view, and we're going to rename a field.
03:11So let's go to Org Code, and we're going to change our field name.
03:15You'll notice that we always have the original field name available so that we can always tell what this used to be,
03:20but our field name will be what we use going forward.
03:23So we're going to change Org Code to now be School Code because that's how it's referenced in some other data sources,
03:30and when we union and join our data, it'll make it much cleaner to have the same name.
03:35We'll also want to change School Code to String Type because that's what it is in some of the other sources.
03:40You'll see how we're tracking our changes as we go along.
03:43We can see our Change Type in our Changes section, and we can also see our Rename field.
03:48If we want to see our changes in depth, we can always click on our Changes view, and we can see the changes here.
03:54If we click on our Prior Changes, you'll see that our view changes back to the state that it was before the change was applied.
04:00This is a nice way to go back in time and see what your data looked like before you applied the change.
04:06Then you can click on the Latest Change to see how the data looks now.
04:09All right, let's go to the next data source and perform some more input cleaning.
04:14So click on SAT Performance.
04:17We'll expand the view, and we're going to change our School Code to be String Type to match the other data source.
04:23So we'll click on the Type, go down to String, and select it.
04:28You'll see our data types will change.
04:29We'll get our leading zeros back because this isn't coded as a number.
04:33This will be important for our Joins as our other data sources have leading zeros,
04:38and we wouldn't get a match unless we were matching with this string applied.
04:42In addition, we won't be able to perform the Joins on different types, so that's also important.
04:47Now let's go to our Teacher Data Input.
04:49We're again going to change our School Code to String Type to match the other data sources.
04:54And then we're going to place a filter on this data input.
04:58Let's click Filter Values, and our dialog box for our calculation will show up.
05:03Now again, these aren't going to be normal calculations.
05:05They all have to result in a Boolean because they are used directly for filtering.
05:09But the dialog box and a lot of the functions will be very similar to our calculations contained inside Tableau Desktop,
05:16so they should be pretty straightforward.
05:17So our calculation is going to state the total number of teachers, FTE, if those are greater than 50,
05:26that is the data that we want to select.
05:29So essentially, any school that has less than 50 teachers will be filtered out of this data set.
05:35We're doing this because we only want schools that are of comparable size,
05:39and we want to eliminate any of those very tiny outlier schools that have less than 50 teachers.
05:45Let's go ahead and hit Save, and our data will update.
05:48Again, you can see our changes being tracked actively inside our view.
05:52We can also see them in our Changes view.
05:55And if you'd like to edit them, you can just go ahead, click the Edit button,
05:59and it will allow you to change what you've done.
06:01You can also click the X, and it will remove the change directly from your view.
06:07All right, let's go ahead and go down to Per Pupil Expenditures.
06:11Take a look at our view, and we're going to change District Code to String Type.
06:16Again, this will help with joins and combining our data to make it much cleaner.
06:21Next, we'll go to our Advanced Course Data Source, select that,
06:25and we'll again change our district code to a string.
06:28And we won't be changing teacher salaries,
06:30as we'll be doing some other modifications to the CSV data source.
06:34So for now, all of our input cleaning is complete.
06:37We're going to save our flow so we don't lose any of our changes,
06:40and we're good to go.
06:42Next up, we're going to be talking about text file configuration.