Charting the Top Tags
Now that we have joined our datasets, our goal is to use the joined data to chart the top twenty tags.
- Create a bar chart.
- Apply a "top" filter to a worksheet.
- Add formatting to a bar chart.
- Click the New Worksheet button on the bottom panel of the application (to the right of the tab for the Map worksheet and the left of the buttons for New Dashboard and New Story).
- Right click Sheet 2 and rename it something more descriptive, like Top Tags.
- Note the Data tab on the left. There should be two options: mods and tags+. If tags+ is not currently selected, click on it. This is the joined dataset we created in the last section.
- Note that there are now two sublists for dimensions and measures, one for mods and one for tags. The automatically generated fields (in italics) are displayed as normal.
- Locate Tag under Dimensions for tags.csv and drag it onto the blank worksheet. The sheet will display a list of the various mod tags along with some placeholder text Abc.
We also want to see how many mods are associated with each tag.
- Locate tags.csv (Count). Earlier we discussed how mods.csv (Count), previously Number of Records, means the same thing as the number of mods. Because each mod may have many records in tags.csv, this is not quite the case for tags.csv (Count) taken as a whole. However, for any given tag, the same mod will only ever be listed once. This means that once the data is broken down by tag, we can treat tags.csv (Count) as the number of mods.
- Drag tags.csv (Count) on top of the Abc placeholder text. (The column with the placeholder text will show a black outline when you hover over it with tags.csv (Count).)
- Click on the Show Me tab on the top right and choose the bar chart option. It should have a red/orange outline, which means that it is the recommended chart type.
We now have a bar chart displaying the information we want: how many mods use each tag.
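The reasoning behind treating tags.csv (Count) as the number of mods can be checked outside Tableau. The following is a minimal Python sketch and not part of the Tableau workflow. The sample rows, the mod ids, and the tag names are invented for illustration and are not taken from the actual tags.csv. It assumes, as described above, that each (mod, tag) pair appears at most once.

```python
from collections import Counter, defaultdict

# Hypothetical rows shaped like tags.csv: one row per (mod, tag) pair.
# The ids and tag names are made up for illustration.
rows = [
    (1, "Weapons"),
    (1, "Armor"),
    (2, "Weapons"),
    (3, "Weapons"),
    (3, "Quests"),
]

# Count rows per tag: this is what tags.csv (Count) measures per tag.
rows_per_tag = Counter(tag for _, tag in rows)

# Count distinct mods per tag: this is what we actually care about.
mods_by_tag = defaultdict(set)
for mod_id, tag in rows:
    mods_by_tag[tag].add(mod_id)
mods_per_tag = {tag: len(mods) for tag, mods in mods_by_tag.items()}

# The two agree as long as no (mod, tag) pair is repeated.
print(dict(rows_per_tag) == mods_per_tag)
```

If a (mod, tag) pair were duplicated in the data, the row count for that tag would exceed the number of distinct mods, and the shortcut would no longer hold.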
Since we want to see the top tags, it makes sense to sort our bar chart to show the most common tags at the top. In fact, as a general rule it makes sense to sort bar charts in increasing or decreasing order. The main exception would be if the categories (in this case tags, but they could be anything) have a natural order that would be unwise to break up. The list of tags has no such order.
- Hover over the "Tag" label (above the bar chart) and click on the arrow that appears. Since "Tag" is such a short word, almost the whole label will be covered by the dropdown arrow.
- Choose Field -> CNT(tags.csv) to sort based on the number of records using each tag.
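For readers who think in code, the sort above is roughly equivalent to ordering the tags by their counts. This is only an illustrative Python sketch: the tag names and counts are invented, and it assumes the sort is in descending order so that the most common tags come first.

```python
# Hypothetical tag counts; the names and numbers are invented.
tag_counts = {"Armor": 12, "Weapons": 40, "Quests": 7, "Textures": 25}

# Sort tags by count, largest first, mirroring the sorted bar chart.
sorted_tags = sorted(tag_counts.items(), key=lambda item: item[1], reverse=True)

# Print a crude text bar chart: one '#' per five mods.
for tag, count in sorted_tags:
    print(f"{tag:<10} {'#' * (count // 5)} {count}")
```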
While we can see which tags are the most popular, the bar chart is still displaying all of the tags, and there are quite a lot. We can simplify this by having Tableau filter out everything except the top twenty we want to see.
- Locate Tag under Dimensions for tags.csv and drag it onto the Filters shelf.
- In the popup menu, go to the Top tab on the far right.
- Choose the option for By field.
- The defaults should be Top, 10, by tags.csv, and Count. This is almost exactly what we want. Change the 10 to 20.
- Click OK.
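The "top" filter configured above keeps only the tags with the highest counts. A minimal Python sketch of the same idea follows. The tag list is invented, and TOP_N is set to 2 rather than 20 only to keep the example small.

```python
from collections import Counter

# Hypothetical tag column, one entry per (mod, tag) row; values are invented.
tags = ["Weapons"] * 5 + ["Armor"] * 3 + ["Quests"] * 2 + ["Music"]

TOP_N = 2  # the tutorial uses 20; 2 keeps this example small

counts = Counter(tags)

# most_common returns (tag, count) pairs, largest count first,
# and drops everything past the first TOP_N entries.
top_tags = counts.most_common(TOP_N)
print(top_tags)
```

How Tableau breaks ties at the cutoff is not covered here, and it may differ from how `most_common` handles them.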
The filtered chart:
Finally, we want to add some formatting to make our chart cleaner and more usable. The types of formatting we can do with a bar chart (and many other similar types of charts) are very different from what we can do with a map. For this chart we will focus on removing distracting or extraneous information.
Edward Tufte, a famous figure in data visualization, coined the term "chartjunk" in his book The Visual Display of Quantitative Information. In essence, chartjunk is unnecessary ink that distracts a viewer from the information presented in a visualization. We have quite a bit of chartjunk we can remove from this visualization. We can remove an extraneous label, the unnecessary grid lines, and the bottom axis.
If you would like more information about how to design visualizations, I strongly recommend reading Tufte's book. It is currently on reserve at Bizzell Memorial Library (which means you can ask for it at the front desk and check it out for four hours at a time). Tufte, E. (1983). The Visual Display of Quantitative Information. Graphics Press.
- At the top of the bar chart, note the word "Tag" above the row labels (this is the label we hovered over to sort the chart). We can remove this extraneous label.
- Right click on "Tag" and choose Hide Field Labels for Rows.
- Next we will remove extra grid lines. Right click on a blank area of the worksheet and choose Format....
- On the formatting tab that appears on the left there is a list of five symbols. Click on the lines symbol.
- Change Grid Lines to None. The dropdown may already say None, but this is misleading: the lines are only removed once you select None manually. Note that the dropdown provides additional options for line style, width, and color.
- For good measure, set Zero Lines, Axis Rulers, and Axis Ticks to None as well.
- Unfortunately, this change is deceptive: it does not remove every grid line. Under Format Lines, choose Columns instead of Sheet. The Grid Lines dropdown here is not set to None!
- Change Grid Lines to None here as well.
When I first made and saved a bar chart, I was very surprised to see that the saved visualization still showed grid lines even though I had set the Sheet lines to None. It took a bit of experimentation and exploration before I found the problem with the Columns lines.
- Right click on the bottom axis and click (to uncheck) Show Header. (Clicking anywhere in the bottom axis should work.)