Creating an Endorsements Scatter Plot

Our goal for the first scatter plot is to compare the endorsements a mod receives with the total number of times that mod is downloaded. How are the two measures related to each other? My initial assumption is that they would be heavily correlated, with high total downloads corresponding to high endorsements, but how related will they be and how much variation will there be? Another interesting question is: Which mods stand out within these measures? There are sure to be a few outliers, data points that do not fall within the same area as the majority of points.

Learning Objectives

  • Create a scatter plot.
  • Use Columns and Rows to add dimensions/measures to a worksheet.
  • Format scatter plot points.
  • Format chart axes.

Initial Scatter Plot

  1. Click on the New Worksheet button.
  2. Right click the new worksheet and rename it Endorsements.
  3. Note that on the left we are still using the joined mod and tag (tags+) data. Click mods to switch to just the mod data - we will not be using tag data for this plot.
     [Figure: The data source options]

What happens if we forget to switch from tags+ to mods? We will be creating a point in the plot for each mod. In mods, one record corresponds to one mod, so this is straightforward. In tags+, many mods have multiple records (one per tag), so the default aggregation would add together the endorsements and downloads across those records, inflating each mod's values in proportion to the number of tags it uses. Changing the aggregation from sum to average would correct the inflation, at the cost of an extra step, but it would not solve a second problem: some mods may have no tags at all, and those mods would not appear in the tags+ data. The only solution for this issue (that does not involve creating a new dataset) is to use the mods dataset.
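
To make the inflation concrete, here is a minimal Python sketch with made-up numbers. The mod names, tags, and values are all hypothetical, and the join is written as a plain inner join, which is an assumption about how tags+ was built (it matches the behavior described above, where untagged mods are missing).

```python
# Hypothetical data: one record per mod, as in the mods source.
mods = [
    {"mod_id": 1, "name": "Mod A", "endorsements": 100, "downloads": 1000},
    {"mod_id": 2, "name": "Mod B", "endorsements": 50, "downloads": 400},
    {"mod_id": 3, "name": "Mod C", "endorsements": 10, "downloads": 90},  # untagged
]
# Hypothetical tag records: Mod A has three tags, Mod B has one, Mod C has none.
tags = [(1, "armor"), (1, "lore"), (1, "quest"), (2, "armor")]

# Assumed inner join: one output record per (mod, tag) pair.
mods_by_id = {m["mod_id"]: m for m in mods}
joined = [{**mods_by_id[mod_id], "tag": tag} for mod_id, tag in tags]


def total(records, mod_id, field):
    """Sum a field over every record belonging to one mod."""
    return sum(r[field] for r in records if r["mod_id"] == mod_id)


def average(records, mod_id, field):
    """Average a field over every record belonging to one mod."""
    values = [r[field] for r in records if r["mod_id"] == mod_id]
    return sum(values) / len(values)


sum_from_mods = total(mods, 1, "endorsements")      # the true value
sum_from_join = total(joined, 1, "endorsements")    # counted once per tag
avg_from_join = average(joined, 1, "endorsements")  # averaging undoes the inflation
mods_in_join = {r["mod_id"] for r in joined}        # the untagged mod is missing

print(sum_from_mods, sum_from_join, avg_from_join, sorted(mods_in_join))
```

In this toy data, Mod A's endorsements are tripled by the sum because it has three tags. Averaging recovers the true value, but Mod C never makes it into the joined data at all.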

Adding Data

Scatter plots rely more heavily on Rows and Columns than our previous charts did. However, it is not necessary to know in advance how we want the data arranged. We will demonstrate this by adding both our measures to the Rows field.

  1. Locate Endorsements under Measures and drag it onto Rows.
  2. Locate Total Downloads under Measures and drag it onto Rows as well.
     [Figure: The Rows box with Endorsements and Total Downloads and the empty Columns box]
  3. Click on Show Me and choose the scatter plot option. Note that this is currently the recommended option. After choosing scatter plot, Tableau will automatically move one of our measures to Columns.
     [Figure: Changing chart type to scatter plot]
  4. For this chart we want endorsements to be plotted along the y-axis. Theoretically this means we would be observing how endorsements vary depending on number of total downloads. If endorsements were along the x-axis instead, we would be observing how total downloads vary depending on number of endorsements. In some ways this is all semantics, but it still does matter. Check to make sure that Endorsements is in Rows (and that Total Downloads has been moved to Columns).
     [Figure: The Rows and Columns boxes]
  5. If Tableau moved Endorsements instead of Total Downloads to Columns, click the Swap Rows and Columns button from the toolbar. This is a handy button to be aware of when making many types of charts.
     [Figure: The Swap Rows and Columns button]

This is the resulting chart:

[Figure: A scatter plot with one point]

Displaying Multiple Points

Unfortunately, we only have one point on our chart. This is because Tableau does not know how to group our data. By default, it is treating everything in the dataset as one group, and therefore one point, taking the sum of all endorsements and the sum of all total downloads from the dataset. If we want it to do something different, we will have to tell Tableau how to group the data. Do we want a point for each category? For each country? For each mod? Since we do want a point for each mod, we need some kind of unique identifier so that no mods will be accidentally grouped together. This is another situation where Mod ID is extremely useful. In a pinch we could perhaps use the mod names, but we do not have a guarantee that these are always different.
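
The grouping idea can be sketched outside Tableau in a few lines of Python. This is only an illustration of the aggregation logic, not of how Tableau works internally; the records, names, and the `group_points` helper are hypothetical. Two of the made-up mods deliberately share a name to show why a name is a risky grouping key.

```python
from collections import defaultdict

# Hypothetical records; note that two different mods share the name "Lanterns".
records = [
    {"mod_id": 101, "name": "Lanterns", "endorsements": 120, "downloads": 3000},
    {"mod_id": 102, "name": "Lanterns", "endorsements": 45, "downloads": 800},
    {"mod_id": 103, "name": "Maps", "endorsements": 300, "downloads": 9500},
]

# No grouping field: everything is summed into a single (x, y) point.
single_point = (
    sum(r["downloads"] for r in records),
    sum(r["endorsements"] for r in records),
)


def group_points(records, key):
    """Return one (downloads, endorsements) point per distinct value of `key`."""
    sums = defaultdict(lambda: [0, 0])
    for rec in records:
        sums[rec[key]][0] += rec["downloads"]
        sums[rec[key]][1] += rec["endorsements"]
    return {k: tuple(v) for k, v in sums.items()}


points_by_id = group_points(records, "mod_id")  # one point per mod
points_by_name = group_points(records, "name")  # the two "Lanterns" mods merge

print(single_point)
print(len(points_by_id), len(points_by_name))
```

Grouping by the unique ID yields three points, while grouping by name silently merges the two mods that share a name into one inflated point.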

  1. Locate Mod ID under Dimensions.
  2. Drag it onto the Detail box on the Marks shelf.
     [Figure: The Detail box]

Now we should see a point for each mod. As we can see, endorsements and total downloads are indeed heavily correlated. The most extreme outlier in our dataset, the point at the top right, is for the mod SkyUI, a complete overhaul of the Skyrim user interface.
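
If you want a number to go with the visual impression, one common choice is Pearson's correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand in Python. The sample values are invented for illustration and are not taken from the mod dataset, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented sample: total downloads and endorsements for five imaginary mods.
downloads = [1000, 5000, 20000, 80000, 300000]
endorsements = [30, 210, 650, 3100, 9800]

r = pearson_r(downloads, endorsements)
print(round(r, 3))
```

A value close to 1 would support the "heavily correlated" reading of the plot. Keep in mind that a single extreme point can pull the coefficient strongly, so it complements the chart rather than replacing it.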

[Figure: A scatter plot with many points, most going diagonally from the bottom left to the top right]


Formatting the Scatter Plot

We will apply several formatting options to make the scatter plot cleaner and easier to use. We will format the points, remove unnecessary grid lines, modify the tooltip, and edit our axes.

Data Points

The circle outlines we are currently using for our data points feel very cluttered to me. Changing them to a simpler shape would make it easier to examine the visualization and understand what is happening. Something the outlines do very well, however, is indicate where there is a lot of point overlap. In the center the overlap is so extreme that we basically have a solid blue blob. If we use a simpler shape, like solid circles, such a blob would not have the same meaning, since we could achieve it with far fewer mods. We will mitigate this slightly by making our points partially transparent. That way we will see darker areas where there is overlap.

  1. Click on the Shape box on the Marks shelf.
  2. Choose your preferred shape. I picked the solid circle, as I find it a lot cleaner than the circle outline, but you can pick whatever you like best.
     [Figure: The Shape popup menu]
  3. Click on the Color box on the Marks shelf. There are options for color, opacity, and effects.
  4. Change the opacity to 40%. At this level of opacity, single data points like SkyUI are still easy to see, while areas of considerable overlap show up as darker regions.
     [Figure: The Color popup menu]
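
To see why partial transparency reveals overlap, consider how opacity accumulates when identical marks are stacked. The sketch below assumes standard "over" alpha blending, where each new mark covers a fraction of whatever is still showing through. That is an assumption about the rendering, not something taken from Tableau's documentation, and `stacked_opacity` is a hypothetical helper.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n identical marks drawn on top of each other,
    assuming standard 'over' alpha blending."""
    return 1 - (1 - alpha) ** n


# How dark does a spot get as more 40%-opaque marks pile up on it?
for n in (1, 2, 3, 5, 10):
    print(n, round(stacked_opacity(0.40, n), 3))
```

Under this assumption a lone mark stays clearly lighter than a pile of marks, and the darkness keeps increasing with each additional overlapping point. With fully opaque marks, by contrast, one point and a hundred stacked points look identical.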

Grid Lines

As with the bar chart, we have unnecessary grid lines - chartjunk. We will remove these.

  1. Right click any blank space on the worksheet and choose Format....
  2. Click on the Lines symbol near the top of the Format tab that opens on the left where our dimensions and measures are normally shown.
  3. Set Grid Lines, Zero Lines, Axis Rulers, and Axis Ticks to None. Make sure that you have selected the Sheet tab, as opposed to the Rows or Columns tabs.
     [Figure: Removing unnecessary lines]

Note that this time we do not need to switch to the Columns tab to change Grid Lines there. If you check, you will see that Grid Lines is already set to None. It would seem that the issue we had with bar charts is not an issue for scatter plots.

Tooltip

The biggest issue with our tooltip is that it currently displays Mod ID in addition to endorsements and total downloads. Although Mod ID is useful for telling Tableau how to group and plot our data, it is not a particularly meaningful value, either to us or our audience. Mod name would be much more informative and interesting.

  1. Locate Name under Dimensions and drag it onto the Tooltip box on the Marks shelf.
  2. Click on the Tooltip box and remove the line displaying the Mod ID. There is no reason to have this cluttering our tooltip when it is not useful.
  3. (Optional) Remove the label for Name and center the value, as we did with Country in the map tooltip.
     [Figure: Editing the tooltip]
  4. Click OK.

Axes

Finally, we will make a few edits to the axes. There are two main changes worth making. The first is making the range fixed. The Endorsements axis currently goes up to somewhere between 700,000 and 800,000, but if we filtered the dataset, perhaps showing only mods in the category Books and Scrolls, the range would adjust to fit that limited dataset. To anyone who did not notice the changed range, it would look like mods from Books and Scrolls had more endorsements (and total downloads) than they actually do. Keeping the range fixed eliminates this problem. It does mean we may be left with extra whitespace depending on what subset of mods we are looking at, but I find this a small price to pay.
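
A small Python sketch shows the effect. It models an automatic range as simply running from zero to the largest visible value, which is a simplification and not Tableau's actual range logic. The endorsement counts, the fixed upper bound of 800,000, and the helper names are all hypothetical.

```python
def automatic_range(values):
    """Simplified automatic axis: from zero up to the largest visible value."""
    return (0, max(values))


def relative_height(value, axis_range):
    """How far up the axis (0.0 to 1.0) a value is drawn."""
    low, high = axis_range
    return (value - low) / (high - low)


FIXED_RANGE = (0, 800_000)  # hypothetical fixed upper bound

# Hypothetical endorsement counts for the filtered Books and Scrolls subset.
books_and_scrolls = [4_000, 900]

top_mod = max(books_and_scrolls)
auto_height = relative_height(top_mod, automatic_range(books_and_scrolls))
fixed_height = relative_height(top_mod, FIXED_RANGE)

print(auto_height, fixed_height)
```

With the automatic range, the best mod in the subset is drawn at the very top of the chart, just as the most-endorsed mod overall was before filtering. With the fixed range it sits near the bottom, which reflects its true standing.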

The other change we will make is to the axis ticks themselves. We have a tick mark every 100,000 endorsements, making for a fairly busy axis, and these extra tick marks are not really helping us or our viewers. As long as I have a general idea of the range, that is good enough to estimate what is happening in the scatter plot. If I want to know exactly how many endorsements or downloads a mod has, I can hover over the data point and examine the tooltip. We will therefore set larger gaps between tick marks.

  1. Right click on the y-axis (Endorsements) and choose Edit Axis....
     [Figure: Choosing Edit Axis for the (vertical) y-axis]
  2. In the General tab, set the Range to Fixed. There is no need to change the default values.
     [Figure: Editing the axis for Endorsements - general]
  3. Switch to the Tick Marks tab.
  4. Set the Major Tick Marks to Fixed. Here we are less concerned about having them change on us (since the range will not change), but we cannot define the tick interval without choosing this setting.
  5. Change the Tick Interval to 250,000.
     [Figure: Editing the axis for Endorsements - tick marks]
  6. Click the "X" symbol at the top right of the menu to dismiss it.
  7. Right click the x-axis (Total Downloads) and choose Edit Axis....
  8. Set the Range to Fixed. There is no need to change the default values here either.
     [Figure: Editing the axis for downloads - general]
  9. Switch to the Tick Marks tab and set Major Tick Marks to Fixed. There is no need to change the tick interval this time, since it defaults to 10 million.
     [Figure: Editing the axis for downloads - tick marks]
  10. Close the window.
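
As a quick check on how much the larger interval thins out the Endorsements axis, the Python sketch below lists the tick positions for both intervals. The axis maximum of 750,000 is a hypothetical stand-in for the "somewhere between 700,000 and 800,000" default, and `tick_positions` is an illustrative helper, not a Tableau function.

```python
def tick_positions(axis_max, interval):
    """Tick positions from zero up to axis_max at a fixed interval."""
    return list(range(0, axis_max + 1, interval))


AXIS_MAX = 750_000  # hypothetical upper end of the Endorsements axis

ticks_before = tick_positions(AXIS_MAX, 100_000)
ticks_after = tick_positions(AXIS_MAX, 250_000)

print(len(ticks_before), ticks_before)
print(len(ticks_after), ticks_after)
```

Under these assumptions the axis drops from eight labeled ticks to four, which is still enough to read the general scale while leaving exact values to the tooltip.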

Finished Plot

[Figure: The finished, formatted scatter plot]