- Describe the dataset we are working with.
- Identify and define the different parts of a worksheet.
Before we go any further, we need to make sure we have the resources required to complete the tutorial. These are:
- A Tableau Public account
- The Tableau Public software
- The dataset
You can create a Tableau Public account by clicking the Sign Up button at the top of the Tableau Public website.
The Tableau Public website is also where you can download the application. The download button should be in the center of the page. Tableau Public is available for both Windows and Mac (Linux users will need a VM), and the application takes approximately 1.5 GB of disk space. If you are having difficulties or are concerned that your computer may not meet the system requirements, Tableau has a list of technical specifications.
Download the Data
- Follow this link to view the data on GitHub.
- Download the data by clicking the green Clone or download button (labeled Code on newer versions of GitHub) and choosing Download ZIP.
- Unzip the data. On a Mac you can do this by double-clicking the downloaded zip file. On Windows you can right-click it and choose Extract All.
- Move the folder to your Desktop or somewhere else you can easily locate it.
In this tutorial we will be visualizing Skyrim mod data. For reference, Skyrim is a role-playing video game set in a vast fantasy world that is often heavily modded. Mods are modifications that players can add to a game. They can do anything from totally overhauling the alchemy system (Complete Alchemy and Cooking Overhaul) or redoing the user interface (SkyUI) to replacing all dragons with Thomas the Tank Engine characters (Really Useful Dragons) or simply adding a single weapon (Longclaw) to the game. (If you only follow one of those links, I recommend Really Useful Dragons.)
This data has been scraped from Nexus Mods, the primary site for hosting and downloading Skyrim mods. The data has been cleaned and modified slightly to make it easier to use. For more information on how the data was collected, potential issues, and what cleanup was performed, view the notes on this dataset.
Exploring the Data
Before we start working with the data, it is important to look it over so we know what we are working with. The folder you downloaded should contain three files: README.md, mods.csv, and tags.csv. Find and open mods.csv. CSV stands for comma-separated values, a common plain-text format for storing tabular data. Each line in the file is a new record, and commas separate the individual values (fields) within that record. Most computers recognize a CSV and will open it in spreadsheet software like Microsoft Excel or LibreOffice Calc. Note that in mods.csv, each record/row is a mod. The file contains the 5,000 most-endorsed mods at the time of collection.
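To make the "one line per record, commas between values" idea concrete, here is a minimal sketch using Python's built-in csv module. The header names and rows in it are made up for illustration (they are assumptions, not real records from mods.csv). It also shows a detail of the CSV format worth knowing: a value that itself contains a comma is wrapped in double quotes so it is not split apart.

```python
import csv
import io

# Hypothetical excerpt shaped like mods.csv. These header names and rows are
# invented for illustration; they are not real records from the dataset.
sample = """Mod ID,Name,Category,Endorsements,Country
101,Example Armor,Armour,250,Not Specified
102,"Example Mod, Deluxe",Gameplay,90,Canada
"""

# csv.reader splits each line into fields at the commas while respecting
# double quotes, so the comma inside "Example Mod, Deluxe" is not a separator.
rows = list(csv.reader(io.StringIO(sample)))
header, records = rows[0], rows[1:]

print(header)        # the column names from the first line
print(len(records))  # one record per remaining line
for record in records:
    # zip pairs each column name with the value in the same position
    print(dict(zip(header, record)))
```

Spreadsheet software and Tableau perform the same kind of parsing when they open the file, which is why the data appears neatly split into columns.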
Some columns of interest:
- Mod ID - This is a unique identifier for each mod. Although it is not something we would want to visualize directly, its uniqueness is valuable in a different way, which we will make use of in this tutorial.
- Name - This is the name of the mod. As such, it provides some insight into what the mod is about and could provide interesting context to some visualizations.
- Category - There are 58 different categories in this dataset. We can use these to group and visualize the data.
- Endorsements - Every Nexus user who downloads a mod has the option to endorse that mod, the equivalent of giving it a like or thumbs up.
- Unique Downloads - This provides a count of how many different Nexus users have downloaded the mod from the site.
- Total Downloads - This provides a count of the number of times the mod has been downloaded from the site. Many users will download a given mod more than once for a variety of reasons, like the mod being updated.
- Country - This is the country listed in the user profile of the user who uploaded the mod. While many of them are Not Specified, there are enough that are specified for us to use this geographical data to create a map.
Once you are satisfied looking over mods.csv you can close it. There is no need to open up tags.csv unless you really want to. It has some of the same data, but each mod has a record in tags.csv for each tag it has been given. For the 5,000 mods, we have 24,364 records of tags.
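The "one record per tag" layout of tags.csv means a mod appears on as many rows as it has tags. The sketch below illustrates this with a handful of invented (mod ID, tag) pairs; the row shape and tag names are assumptions, not the actual contents of tags.csv. Counting rows per mod ID recovers how many tags each mod has.

```python
from collections import Counter

# Hypothetical rows shaped like tags.csv: one (mod ID, tag) pair per record.
# The real file's column names and tag values may differ.
tag_rows = [
    (101, "Armour"),
    (101, "Lore-Friendly"),
    (102, "Gameplay"),
    (102, "Overhaul"),
    (102, "Immersion"),
]

# Count how many tag records belong to each mod ID.
tags_per_mod = Counter(mod_id for mod_id, _tag in tag_rows)

print(len(tag_rows))      # total tag records
print(len(tags_per_mod))  # distinct mods that have at least one tag
print(tags_per_mod[102])  # number of tags on mod 102
```

Applied to the real figures, 24,364 tag records across 5,000 mods works out to roughly 4.9 tags per mod on average.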
As we go through this tutorial you can imagine that you work for a news/entertainment website. You have been given this data and instructed to create a visualization that will allow users to explore the dataset as part of an entertainment article about video games and/or mods.
Loading Data into Tableau
- Open your Tableau application
- Under Connect, select Text file. Note that although your computer will open a CSV as a spreadsheet, the CSV is still, ultimately, a text file.
- In the File Explorer (Windows) or Finder (Mac) window that appears, navigate to your mods.csv and open it.
- You should now see your data displayed in Tableau.
Tableau has already created a blank worksheet for us. Click the orange Sheet 1 tab at the bottom left of the screen to view it. Note the three buttons next to it for creating new worksheets, new dashboards, and new stories.
We will now look over some of the basic parts of the worksheet.
Data - Tables
As of early 2020, a Tableau Public update changed how this pane is labeled. The headings Dimensions and Measures no longer appear in it; instead, the whole pane is labeled Tables. However, the terms dimension and measure are still used throughout the application, so it is important to know what they mean.
This is a list of the columns from the file. Note familiar names from the dataset, such as Country, which Tableau treats as a dimension, and Endorsements, which it treats as a measure. Tableau has broken down our spreadsheet into fields the software can work with, classifying each column as either a dimension or a measure. (You can change this manually by right-clicking an item.) A measure is numeric, quantitative data: anything you could take a sum or an average of and have the result mean something, such as Endorsements. A dimension holds categorical data, such as Category or Country, that we can use to group and break down the measures.
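One way to see the difference between dimensions and measures is to aggregate a measure grouped by a dimension, which is roughly what happens when you place one of each in a Tableau view. The following plain-Python sketch is an analogy, not how Tableau is implemented, and the three records in it are invented for illustration. It also shows why a numeric column like Mod ID is not a useful measure: summing IDs means nothing, but counting the distinct IDs gives the number of mods.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: Category behaves like a dimension, Endorsements like
# a measure. These values are made up and do not come from mods.csv.
mods = [
    {"Mod ID": 101, "Category": "Armour", "Endorsements": 250},
    {"Mod ID": 102, "Category": "Gameplay", "Endorsements": 90},
    {"Mod ID": 103, "Category": "Gameplay", "Endorsements": 110},
]

# Group the measure's values by the dimension.
by_category = defaultdict(list)
for mod in mods:
    by_category[mod["Category"]].append(mod["Endorsements"])

# Sums and averages of a measure are meaningful within each group.
for category, values in sorted(by_category.items()):
    print(category, "sum:", sum(values), "avg:", mean(values))

# Summing Mod IDs would be meaningless, but counting distinct IDs is not:
# because every ID is unique, the count equals the number of mods.
mod_count = len({mod["Mod ID"] for mod in mods})
print("mods:", mod_count)
```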
Next to each column name, there is an icon indicating the data type. The globe icon indicates that it is geographic data, the # indicates that it is numeric data, Abc indicates that it is textual data, and the calendar icon indicates that it is a date.
A few of the names here are in italics. These are fields that Tableau generates automatically rather than columns from the file.
Pages, Filters, and Marks
These three sections, found in a column to the right of the Tables list, are typically called either shelves or cards.
We will not be working with the Pages shelf in this tutorial, but it is useful to make note of. You can place a dimension (or measure) here to create a new "page" for each category in that dimension. Users can then cycle through these pages. An example would be showing change over time and having users cycle through years.
We will work with the Filters shelf and the Marks shelf, so for now make a note of their existence.
Rows and Columns
These are found at the top of the worksheet and will help define how our data is displayed. Fortunately, we will usually be able to let Tableau handle the specifics.
The view, also called the canvas, is where our data will be displayed. Since we have not yet told Tableau to display anything, the canvas should contain the text Drop field here.
The Show Me button at the top right can be toggled on and off to show a list of the types of charts we can use and to let us switch between them. It highlights chart types as they become available and shows which dimensions and measures each type of chart requires.
Renaming the Worksheet
Sheet 1 is not a very informative name. This will become more of an issue as we add sheets - it is difficult to remember what Sheet 1 is as opposed to Sheet 2 or Sheet 3. Therefore, we should go ahead and rename our first worksheet.
Find the Sheet 1 tab at the bottom of the application window, right-click it, and choose Rename. Then type in whatever name you like; for this example we will use Map. Press Enter to save the new name.
It is useful to know that Tableau does support the use of Undo and Redo. There are arrows at the very top left that you can use, or you can use the keyboard shortcuts. Undo is ⌘ + Z (Mac) or ctrl + Z (Windows). Redo is ⌘ + Y (Mac) or ctrl + Y (Windows).