Category
📚
LearningTranscript
00:00Up next, let's talk more about wildcard unions.
00:04Wildcard unions allow you to combine files or tables within a folder or directory at the input stage.
00:11Once you bring an input step in, you'll see that there's a multiple files tab.
00:16Inside the multiple files tab, you'll have two options, single table and wildcard union.
00:22Note that the wildcard union is different from the union step.
00:26The union step is a completely different step from the input step that allows for unioning between different data sources.
00:34But wildcard unions are ideal for lots of input files from the same directory or multiple tabs from the same Excel file.
00:42The union step itself has a maximum of 10 data sources that it can union, but that is not a limiting factor for wildcard unions.
00:50Wildcard unions do have limitations in regards to database tables.
00:55They are limited to only certain databases, including Amazon Redshift, Microsoft SQL Server, MySQL, Oracle, and Postgres DB.
01:05A couple other key things to note about wildcard unions.
01:08When you're refreshing your source and you want to bring in more files, all you have to do is refresh your input step and the new files will come in.
01:15Additionally, if you're saving your workflow as a package flow file, new data won't be pulled in from new files and tables that are added to the directory.
01:23You'll need to re-add them manually because the data in a package flow file gets bundled into the file.
01:29Whereas a normal flow file will just point at a directory or location and pull in the new data.
01:35Now let's talk a little bit about customizing your wildcard unions and how you can get them working optimally for your specific use case.
01:43Let's take a look at the input dialog.
01:45We can see the first dropdown is Search In.
01:48The Search In box allows you to select the directory or schema to use to find tables and files for the union.
01:55From there you can choose to include subfolders by selecting the Include Subfolders box.
02:00This will include all files contained within subdirectories of the parent folder.
02:05From there you can select Files, Sheets, and Tables.
02:09These dropdowns will allow you to include or exclude specific files, sheets, or tables from the data source that you've selected.
02:17Matching patterns allow you to include files, sheets, and tables which contain specific text.
02:23You'll put a partial piece of text inside the matching pattern box followed by an asterisk.
02:29If you leave this box blank you'll end up unioning all files from this directory or from your Excel workbook.
02:36Note the example here where we're looking at our Plans for High School Grad School matching pattern.
02:42You can see that we just have the beginning piece of this which says Plans for High School Grad School followed by an asterisk.
02:49So if we look at our directory we can see that a match will come up for the first set of files but not the second set.
02:56Because we're specifically calling out school and not district.
03:00Since our matching pattern is specific we'll get only these top five files.
03:05If we were more general and just left it as Plans for High School Grad, then we'd end up pulling in all of these files.
03:11Or if we left it blank we'd also grab all of the files for the union.
03:16You can see the files that matched down in the Included Files and Tables section.
03:20This will give you a preview of the files or tables matched based on the wildcard settings.
03:25You'll end up tweaking various settings here and changing your matching pattern until you get exactly what you'd like to have inside your wildcard union.
03:33Note that when you do wildcard unions your input steps will have a plus sign denoting that the sources are unioned together.
03:40You'll also get a new field inside your flow which denotes the file names or table names.
03:46This will allow you to filter and parse out your data based on the fields that have been unioned.
03:51So even if you union in something that you don't need and you can't tune it out with the matching pattern you can always do so with a regular filter later on.
04:00A quick pro tip about wildcard unions is that CSV's union automatically in the same directory as well as sheets within Excel workbooks.
04:09Now that we've talked a bit about wildcard unions let's go into our own Tableau Prep flow and create wildcard unions for our inputs.
04:17So we're back in Tableau Prep and we're looking at the flow that we left off with in the last lesson.
04:22Let's go ahead and click on our Plans for High School Grad data source.
04:26We'll select the input step and we'll bring up the data.
04:29Now you can see we're on the multiple files tab and we have single table selected.
04:34We're going to want to choose wildcard union.
04:36Now our directory is already set up because that's where our original file was located.
04:41We don't have any subfolders so we'll skip that option and we'd like to include files.
04:46Now we can tune our matching pattern to match exactly what we'd like.
04:50Now we do want the plans for high school grad and this is the school part of the flow but we won't want to call out the specific dates because we want to pull in more than one file.
05:00So we'll go ahead and we'll replace that with an asterisk.
05:03Let's hit enter and see what happens.
05:06You can see it's recalculating which files are going to be included.
05:10This is looking good.
05:11We have all five of our school level plans for high school grad files and each of our files only has one sheet in them so we won't need to define the matching pattern from there.
05:21Let's go ahead and hit apply to apply our wildcard union.
05:25You'll notice that a few new fields have been added down in our profile pane.
05:29We have a table names and a file paths field that we can reference now.
05:34We'll be using these later to parse out some dates that we can use for filtering and to establish which tables we're pulling from.
05:41Let's go ahead and apply some more wildcard unions to our inputs.
05:45Let's go to educator evaluation.
05:47We'll pull this up so we can see it a little better.
05:50Click wildcard union.
05:52We won't include subfolders.
05:54We do want to include files.
05:56And again we're going to take off the back half of this file name and just add an asterisk right after school.
06:03Let's hit enter.
06:04We'll see the files come in down here on the bottom.
06:07That's looking good.
06:08We only have four files for educator evaluation performance for the school level.
06:12We'll hit apply and our wildcard union will be applied.
06:16Let's go ahead and do a few more.
06:19We'll click on our SAT performance data source.
06:22Pull this up.
06:23Choose wildcard union.
06:25We won't include subfolders.
06:27We'll go down to our matching pattern.
06:29Right after school we're going to add the asterisks.
06:32Hit enter.
06:33Now we've got all five of our files.
06:35Let's hit apply there.
06:37And our wildcard union is set.
06:39Next we'll go to the teacher data data source.
06:42We'll drag this up.
06:44Select wildcard union.
06:46Go down to our matching pattern.
06:48Go down to our matching pattern.
06:49Right after school we're going to put our asterisks.
06:52Hit enter.
06:53And we'll see our data sources come in.
06:55There we go.
06:56We've actually got six files for this specific data set.
06:59Let's hit apply.
07:00And our wildcard union will be applied.
07:03Next we'll go to our per pupil expenditure source.
07:07We'll choose wildcard union.
07:09Drag this up.
07:11Now we've got a different set because these are district level sources.
07:15So we're going to stop right after district and we'll add an asterisks there.
07:20Let's hit enter and see what pulls in.
07:22Alright.
07:23We've got all five of our files that we want.
07:25Let's go ahead and hit apply.
07:27And our data pulled in.
07:29Alright.
07:30We'll go to our advanced course completion source.
07:32Choose wildcard union.
07:34Right after district we're going to add an asterisks.
07:37We'll hit enter.
07:39And we've got two files coming in.
07:41Let's hit apply.
07:42And that's all set.
07:44Now it should be noted if you're having any trouble with the wildcards finding exactly
07:48which files you're looking for.
07:50Make sure you're spelling your matching pattern completely and using the asterisks.
07:54But also make sure that all your files are located within the same directory.
07:58If they're located within different directories, your matching pattern won't know where to look.
08:02This should be pretty straightforward because the file package that we've given you should contain all the files in one location.
08:08But if you're having trouble, just make sure that you check that out.
08:11Alright.
08:12So our wildcard unions are applied.
08:14I'm going to go ahead and click save so we can save our changes.
08:17Next up, we're going to be diving into a pro tip around input joins.
08:23Really?
08:24Maybe, but.
08:25I'.
08:26I'm going to go ahead and click save so I can follow the filter.
08:29Maybe.
08:30Maybe, maybe.
08:32Maybe.
08:33Maybe.
08:34Maybe.
08:35Maybe.
08:36Maybe.
08:37Maybe.
08:38Maybe.
08:40Maybe.
08:41Maybe.
08:42Something that we've got here in my notes.