
Radial Family Trees in Tableau

Recently I shared a visualization mapping the family trees of three Ancient Mythologies (see below). This was one of the more complicated visualizations I have ever worked on, and a few people have asked how it was built, so I’m going to do my best to explain the approach. Unlike some of the other tutorials I have written, this won’t be a step-by-step guide on how to build one of these from scratch. Instead, I’ll explain some of the techniques, provide a template, and go through detailed instructions on how to use that template to build your own. Here is the visualization I mentioned.

Image and Link to a visualization on Tableau Public

The tree we are going to be building today is a little less complex than some of the ones represented above. We are going to build a family tree for the eight main Houses from Game of Thrones, using the tree found here as our base. When we’re finished, it should look something like this.

An image of the Family Tree we are going to create in this tutorial

You can find all of the files needed to build this here. Start by downloading the following files.

  • Family Tree Template.twbx
  • Family_Data.xlsx
  • Lineage.xlsx

Save all of the files locally, and then open up the Tableau Workbook (Family Tree Template). Edit the data source and edit both of the connections (Family Tree Data and Lineage) to point to your locally saved files.

I’ve also saved copies of the completed files you can use for reference (Family Tree Template_Complete.twbx and Family_Data_Complete.xlsx).

The Data Source

If you look at the Data Source in the Family Tree Template, you’ll see it’s made up of 5 Tables.

  • Data: This is the main table. It is the Data sheet in the Family_Data Excel file. There are a number of calculated fields in that sheet, which we’ll discuss a little later on.
  • Densification: This is our densification table. It is the Densification sheet in the Family_Data Excel file. There are records for 7 different “Types”, which allow us to create calculations for each element of the chart and then combine them in the same view.
  • Lineage: This file is used only for highlighting the lineages in the family tree. It is built using a Tableau Prep workflow and is stored as a separate file (Lineage.xlsx).
  • Father: This is also the Data sheet from our Excel file. There is a self-join from the Father field in our main table to the Name field in this second instance of the table. This is done to bring in data about the location of the Parent nodes.
  • Mother: Same as above, but in this case the self-join is from the Mother field in our main table to the Name field in the third instance of this table.

For all of the fields coming from the Mother and Father tables, I have renamed the fields in the data source (ex. Column (Mother)), and hidden any fields that aren’t being used. We’ll talk more about each of the fields in our Data Source when we get to the section on filling out the template.

Elements of the Chart

As I mentioned earlier there are 7 “Types” in our densification table that represent 7 different elements of our chart.

An image highlighting the 7 different chart elements
Shapes
  • Base: This is just a colored circle, the base of each node. 1 record in the densification table
  • Base_II: This is just a white circle, sized slightly smaller than the Base circle. 1 record in the densification table
  • Selection: This is a colored diamond, sized slightly smaller than the Base_II circle to highlight which person is selected. 1 record in the densification table
Lines
  • Parent_Line: This line is made up of 3 sections and is only present when a person has two known parents. There is a straight line coming from each of the parents and then a curved line connecting them. There are a total of 104 records in the densification table, 2 for each of the straight lines, and 100 for the curved line.
  • Group_Line: This is a curved line that connects all of the nodes within a Group (a Group is a set of nodes with the same parents). There are 100 records in the densification table.
  • Parent_Drop_Line: This is a straight line that connects the Parent_Line to the Group_Line. If there is no Parent_Line (only 1 known parent), then this is a straight line from the Parent node to the Group_Line. There are 2 records in the densification table.
  • Child_Line: This is a straight line that connects the Group_Line to the Child node. There are 2 records in the densification table.

How it’s Built

I’m not going to go into too much detail here. The techniques I used were very similar to the ones used in previous tutorials. For each element of this chart, there is a separate set of calculations (grouped by Folder in the Tableau workbook). If you’re interested in learning more about these techniques, I would recommend checking out the 3-part series on this site titled “Fun with Curves in Tableau”, especially the 1st installment.

As I mention in that post, and many others on this site, to create any type of radial chart in Tableau, you really only need two inputs. For each point in your visualization, you need to calculate the distance from the center of the circle, and you need to calculate the position of the point around the circle (represented as a percentage from 0 to 100%).

So for each of these elements, I have built calculated fields to calculate the distance and position, and then I have plugged those results into the X and Y calculations to translate those values into coordinates that can be plotted on the chart. If you have questions about how any of these calculations work, please feel free to reach out to me on Twitter or LinkedIn and I’d be happy to talk through them.
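To make the distance-and-position idea concrete, here is a minimal Python sketch of the same math. The template does this with Tableau calculated fields; the function name and the orientation (position 0 at the top, moving clockwise) are my own assumptions, not the template’s exact calculations.

```python
import math

def radial_to_xy(distance, position):
    """Convert a radial chart point to X/Y coordinates.

    distance: how far the point sits from the center of the circle.
    position: where the point sits around the circle, as a fraction
              from 0 to 1 (i.e., 0% to 100%).
    """
    angle = 2 * math.pi * position   # fraction of the circle -> radians
    x = distance * math.sin(angle)
    y = distance * math.cos(angle)   # cos on Y puts position 0 at 12 o'clock
    return x, y

# A point 3 units from the center, 25% of the way around the circle,
# lands a quarter turn clockwise from the top (3 o'clock).
print(radial_to_xy(3, 0.25))
```

The X and Y calculations in the workbook perform this same conversion for each element type.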

Once all of those X and Y coordinates were calculated, I created 3 “Final” calculations that are used to combine all of the different chart elements together.

  • Final_Y: This is a case statement that returns the value of the appropriate Y calculation for each element “Type”
  • Final_X_Line: This is a case statement that returns the value of the appropriate X calculation for each of the Line elements
  • Final_X_Mark: This is a case statement that returns the value of the appropriate X calculation for each of the Shape elements

Then I created a dual axis chart and set the Mark Type to Line for the Final_X_Line axis, and Shape for the Final_X_Mark axis, allowing me to combine the 3 shape elements and the 4 line elements all together in the same chart.

Using the Template

Just a quick warning, even with the template, this is still somewhat difficult to build. This is not a plug and play template and requires a fair amount of manual tweaking to make the chart work. Start by opening up the “Family_Data.xlsx” file and take a look at the different sheets. Updates need to be made to all 3 of the blue sheets to customize your chart.

Data sheet

This is the sheet that ultimately is used as the data source, but it pulls in data from the other 2 tabs and performs some calculations. The 5 black columns are where you enter your Family Tree data. The 3 blue columns need to be filled out for every record. The 10 red columns are calculations and should not be modified.

  • Name: The name of the person
  • NameID: A Unique Identifier for each person
  • Group: The name of the group. A group is any set of persons who share the same parents. You can name the groups whatever you would like, but make sure it is the same for all persons in the group
  • Mother: The Mother of the person. If the Mother is unknown but the Father is known, enter the Father’s name in this field. If both parents are unknown, enter the Name of the person again here. Do not leave blank
  • Father: The Father of the person. If the Father is unknown but the Mother is known, enter the Mother’s name in this field. If both parents are unknown, enter the Name of the person again here. Do not leave blank
  • Color: This can be whatever value you would like, but it will be used to assign color to the nodes and lines in your chart.
  • Order: This is the Order of persons within a group. If there are 4 people in the same group, they should be numbered 1, 2, 3, and 4.
  • Level: This is the level where the group will appear on the chart. If you look at the image below, you’ll see that the chart we are building has 4 Levels. The level value should be the same for all persons in the same group. Also, make sure that the level for each row is higher than the levels of both of its parents. You can use columns I and J in the template to see the levels for each parent.
An image highlighting each level of the Family Tree
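Because the parent-level rule is easy to get wrong when entering data by hand, here is a small Python sketch of the check described above. The column names mirror the Data sheet, but the row structure and function are mine for illustration; the template itself surfaces the parent levels in columns I and J.

```python
# Sanity check for the Level rule: every person's Level must be greater
# than the Level of both parents. Unknown parents point back at the
# person themselves, per the template's convention.
rows = [
    {"Name": "House Stark", "Mother": "House Stark", "Father": "House Stark", "Level": 1},
    {"Name": "Rickard Stark", "Mother": "House Stark", "Father": "House Stark", "Level": 2},
    {"Name": "Eddard Stark", "Mother": "Rickard Stark", "Father": "Rickard Stark", "Level": 3},
]

levels = {r["Name"]: r["Level"] for r in rows}

def bad_levels(rows):
    """Return names whose Level is not above both parents' Levels."""
    bad = []
    for r in rows:
        for parent in (r["Mother"], r["Father"]):
            # A self-referencing parent means "unknown" -- skip the check.
            if parent != r["Name"] and levels[parent] >= r["Level"]:
                bad.append(r["Name"])
                break
    return bad

print(bad_levels(rows))  # an empty list means the sheet passes
```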
Groups sheet

The next tab in the Excel file is for the Groups. Any group that is assigned in the Data sheet will also need to be added to the Groups sheet (1 record per Group). Similar to the Data sheet, you enter your data in the Black columns (the Groups), and then update the Blue columns. Do not edit/update any of the Red columns.

Order: This is the order of Groups within a level. If there are 5 Groups in the same level, these should be numbered 1, 2, 3, 4, and 5.

Parent_Line_Split: This should be a decimal value between 0 and 1, representing how far along the distance between the parent nodes and the child nodes the Parent_Line should be drawn. In example 1 below, this value is set to .25, so the Parent_Line appears 25% of the way from the Parent Nodes to the Child Nodes. In example 2 below, the value is set to .5, so the Parent_Line appears halfway between the Parent Nodes and the Child Nodes. I like to use .3 as the default and then adjust if this line ends up overlapping with other lines.

Two images, one shown with the setting at 25%, another at 50%

Group_Line_Split: This is exactly like the Parent_Line_Split, but it determines where the Group_Line will be placed. I like to use .6 as the default and then adjust if the line overlaps with another Group.
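If it helps to see the split math written out, here is a tiny Python sketch of how I read these two fields: both place a line some fraction of the way between the parent level’s distance and the child level’s distance. The function name and the linear interpolation are my assumptions, not the template’s exact calculation.

```python
def line_radius(parent_distance, child_distance, split):
    """Place a line `split` fraction of the way from the parent level's
    distance out to the child level's distance."""
    return parent_distance + split * (child_distance - parent_distance)

# Parents on a level at distance 3, children on a level at distance 6:
print(line_radius(3, 6, 0.3))  # Parent_Line at the default .3 -> 3.9
print(line_radius(3, 6, 0.6))  # Group_Line at the default .6  -> 4.8
```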

Offset_Mother/Offset_Father: These fields are only used if one of the Parents has multiple partners. You can use these fields to slightly offset the Parent_Lines coming from the Parent Nodes so they don’t overlap. If both Parents have only one partner, enter 0 in both of these fields. If just the Mother has multiple partners, enter a small value in the Offset_Mother field and a 0 in the Offset_Father field. You can use positive numbers to offset the lines in one direction, and negative numbers to offset them in the other. You’ll want to enter very small values here, somewhere between .001 and .01. Note that the effect grows as the Level increases, so the value in these fields should be smaller for higher levels. In my Ancient Mythologies viz, Zeus (the blue node) had 8 different partners, so these fields were used to offset those lines (see below).

An example of how this technique was used in the Mythologies visualization

Drop_Line_Offset: The last option in this sheet is to customize the position of the Parent_Drop_Line. This should be a value between -.5 and .5. By default, if you enter a value of 0, the line will be in the center, halfway between the parents. When you enter a value, that value is added to .5 to determine how far along the Parent_Line the Parent_Drop_Line should be placed. In the example below, in the Red section, the value is 0, so the line is placed right in the center (0 + .5 = 50% of the way along the line). In the Blue section, the value is -.25 (-.25 + .5 = 25% of the way along the line). In the Orange section, the value is .5 (.5 + .5 = 100% of the way along the line).

Examples illustrating the results of different values
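Written as code, the arithmetic described above is just an offset added to the midpoint (a sketch for illustration only; the function name is mine):

```python
def drop_line_fraction(offset):
    """How far along the Parent_Line the Parent_Drop_Line lands:
    the Drop_Line_Offset (between -.5 and .5) is added to the .5 midpoint."""
    return 0.5 + offset

print(drop_line_fraction(0))      # 0.5  -> centered between the parents
print(drop_line_fraction(-0.25))  # 0.25 -> 25% of the way along the line
print(drop_line_fraction(0.5))    # 1.0  -> all the way at one end
```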
Levels sheet

The last tab in the Excel sheet is for the Levels. Any Level that is assigned in the Data sheet needs to be added to the Levels Sheet. And, as was the case with the other tabs, you will enter your data in the black columns (Levels) and make your updates in the blue columns.

Distance: This is how far from the center of the circle that level will be plotted. Make sure that the values increase as the Levels go up

Position_Start/Position_End: These are used to set the range for the position of all nodes in that level. For example, in my Mythologies viz, I used .1 for the Position_Start and .9 for the Position_End. So each chart started at 10% (around the circle) and ended at 90%, resulting in that kind of Pac-Man shape. If you want the nodes to go all the way around the circle, use 0 and 1.

One other note about Levels. Each “Level” can contain sub-levels where you can set individual ranges (Position_Start/Position_End). In the example below, I only had 4 nodes (across 2 Groups) in Level 4 of the Egyptian family tree. If I put these Groups in the same Level they would have been evenly spread across whatever range I had entered. But by putting them in separate Sub-Levels, I was able to assign different ranges to each group and have better control over the placement.

An image demonstrating how sub-levels were used in the original Mythologies viz

Building Our Chart

Hopefully I haven’t lost you yet, because we’re just getting to the fun part. Let’s get building! As you’re filling out the template, you can refer to the completed file in the Google Drive, titled “Family_Data_Complete.xlsx”.

When I’m using this template, I like to build it one level at a time and make adjustments as necessary. So let’s start with Level 1.

Level 1

In this Tree, instead of using people for the 1st level, I’m going to use the name of each of the major Houses. So in the Data sheet, I’m going to add 8 records, one for each House. I’m going to put them all in the same Group (Houses) and just enter the House Name in the Mother/Father fields since they don’t have Parents. I’m going to assign each of them their own color by entering the House Name in the Color field as well. For the Order, there are 8 records in this group, so I will number them 1 thru 8, and then assign them all to Level 1. The Data should look like this.

Next, in the Groups sheet, I will enter the new Group (Houses) in column A, set the Order to 1 since there is only 1 Group in this Level, and set the rest of the values to 0 (there will be no Parent or Group lines in this first Level because there are no Parents). The Groups sheet should look like this.

And finally, in the Levels sheet, I will enter my new Level (1) in column A, set the Distance to 3 (you can enter any number in here, but spacing each level by 2 to 3 seems to work pretty well) and I’m going to set my Position_Start to .1 and my Position_End to .9 to get that Pac-Man shape. The Levels data should look like this.

Now if we save our Excel file and refresh the Extract in our Tableau Workbook, we should have something like this.

Level 2

Next up, we have the Heads of each of the Major Houses. Looking at the Family Tree on UsefulCharts, we can see that there is one Head listed for 5 of the Houses, 2 Heads listed for 2 of the Houses (Lannister & Tyrell), and none listed for 1 of the Houses (Martell). We’ll just use an “Unknown” placeholder for House Martell. So our next Level will have 10 nodes.

We’ll enter the names of all of these Characters in Column A. In the Group column, we’ll enter “Head(s) of House….[Insert House Name]”. For Mother & Father, since we want our Level 1 nodes to connect to these new Level 2 nodes, we’ll use the House Names. For Color, once again we’ll use the House Name (we’ll use this on Color for every Level). For the Order, we’ll add a sequential number to each value in each Group. For Groups that only have 1 member, we’ll enter a 1. For Groups that have 2 members, we’ll enter a 1 for the first record and a 2 for the second record. And finally, we’ll assign a value of 2 for the Level for all of these nodes.

The Data sheet for these records should look like this.

Next, on the Groups sheet, we need to add our 8 new Groups in Column A. Since these Groups are all on the same Level, we’ll enter a sequential number in the Order column, 1 thru 8. For the Parent_Line_Split and Group_Line_Split, we’ll use my typical default values of .3 and .6 respectively, and then enter 0’s for the rest of the columns. The Groups sheet should look like this for those records.

And finally, on the Levels sheet, we’ll add our new Level (2) to Column A. For the Distance for Level 2 we’ll use 6 (adding 3 to the Distance from Level 1), and we’ll use the same Position_Start/Position_End values. The Levels tab should look like this for Level 2.

Now, if we save our Excel file and refresh our extract, it should look something like this.

Now you may notice a few issues with our work so far.

First, it appears as if the 2nd Level is out of Order. Notice in Level 1, we have House Arryn (light blue), then House Tully (dark blue), then House Stark (grey), then House Targaryen (black). But in the 2nd Level, it goes Arryn/Stark/Targaryen/Tully. So let’s adjust the Order in our Groups sheet so that the Level 2 nodes line up better with the Level 1 nodes.

I’ve adjusted the Order field to match the Order of the Houses.

Now save and refresh the extract.

That’s better, but there is still an issue. Look at the Group_Lines for House Baratheon (yellow) and House Lannister (red). They are overlapping. So let’s use our Group_Line_Split field to fix that. In the row for “Head of House Baratheon”, let’s change that from a .6 to a .5 to bring that line inward a bit.

If I save the file and refresh the extract, we can see that that Group Line has shifted inwards and is no longer overlapping.

Level 3

Next up, we have 3 children for Hoster Tully, 3 children for Rickard Stark, 3 children for Aerys Targaryen II, 3 children for Stefon Baratheon, 3 children for Tywin and Joanna Lannister, 1 child for Luthor and Olenna Tyrell, and 3 Children of the Unknown Head of House Martell. So let’s add all of those to Column A in our Data sheet. Each of these groups of children will be their own Group in the Group column, and we’ll add the Mother and Father names to those columns. Remember, if only one parent is known, enter that name in both the Mother and Father column. Again we’ll use the House Name for the Color. For the Order, number each accordingly within each group (1 thru 3 for groups with 3 children, 1 for groups with 1 child). And then assign Level 3 to all of these records. The Data sheet for Level 3 should look like this.

If you are looking at the Family Tree on UsefulCharts, you may have noticed that there is a child for Jon Arryn. However, the mother of this child is going to be in Level 3, so instead of adding them here, we’ll add that child to the next level, Level 4.

Next, we’ll add each of our new Groups to the Groups sheet in Column A. Then, we’ll set the order so it aligns with the previous levels. Finally, we’ll use our default values of .3 and .6 for the Parent_Line_Split and Group_Line_Split and set the other 3 columns to 0. The data in this tab for Level 3, should look like this.

And finally, we’ll add Level 3 to our Levels sheet with a distance of 9 and the same Position_Start/Position_End values as our previous levels. The Levels sheet should look like this for Level 3.

Once you save the Excel file and refresh the extract, the Tree should look like this.

Once again we have some overlap between House Baratheon (yellow) and House Lannister (red). And the lines for House Baratheon and House Targaryen (black) are extremely close. So let’s make some edits to fix both of these. I want to make these so they kind of cascade as they go around the circle, so I’ll keep the Group_Line_Split as it is, at .6 for House Targaryen (in the row for Group = Children of Aerys II), then for House Baratheon, I’ll reduce it slightly to .525 (in the row for Group = Children of Stefon Baratheon), and then reduce it slightly more for House Lannister to .45 (in the row for Group = Tywin and Joanna Lannister).

The revised data in the Groups sheet should look like this.

And now, when I save the file and refresh the extract, there is a little extra room for each of these lines.

Level 4

In our final level, we have 1 child for Jon Arryn and Lysa Tully, 5 children for Eddard Stark and Catelyn Tully, 1 child for Rhaegar Targaryen and Lyanna Stark, 3 children for Jamie and Cersei Lannister, 2 children for Mace Tyrell, and 1 child each for Doran and Oberyn Martell. We’ll add all of these names to column A in our Data sheet, assign an appropriate group, update the Mother and Father, and enter the House name in the Color field. Notice that for the children of Jamie and Cersei Lannister, I assigned House Baratheon, as they were raised as Baratheons. Then assign sequential numbers for each group in the Order field, and assign Level 4 to all of these records.

The data for level 4 in the Data sheet should look like this.

Then update the Groups and Level sheets the same as we had done for the previous levels. The data in those tabs should look like this when you’re finished.

Save and refresh and your Tree should look like this.

It’s difficult to notice at first glance, but we do have one instance of overlapping lines here. In the Stark section (grey) there are actually 2 groups. The first 5 are children of Eddard Stark and Catelyn Tully. The last one is Jon Snow, child of Rhaegar Targaryen and Lyanna Stark, but raised as a Stark. So I’m going to update the Group_Line_Split for the row for Rhaegar Targaryen and Lyanna Stark, and set it to .5 instead of .6. And we’re done!!! Almost.

You could stop here, but one of the elements of the visualization I had created, which I think is really important, is the ability to highlight the lineage of any member of the tree, either by clicking on the node, or by selecting it in the parameter. This template already has that functionality built in, but in order for it to work, you need to update the Lineage.xlsx file.

Updating the Lineage File

The Lineage file has 3 key fields: a concatenated list of all ancestors (parents/grandparents/great-grandparents and so-on), a concatenated list of all spouses or co-parents, and a concatenated list of children. To update this file, start by downloading and opening the Lineage_Builder.tfl file.

First, edit the connection for the “Data” input so that it points to the Data sheet in your local copy of the Family_Data.xlsx file. Then, edit the Output step so it points to your locally saved copy of the Lineage.xlsx file

When running this workflow with new data you will most likely need to make some modifications to the flow. A lot of the steps in this flow rely on pivoting the data, assigning ranks, and then creating crosstabs with those rank values as the headers. Because those ranks will be dynamic depending on your data, you will need to modify some of the calculations. Also, depending on how many generations you are trying to map, you may need to add a few calculations to the first branch of the flow. So let’s go through all of that now. In the flow, you’ll notice there are 3 branches, one for each of our 3 lineage fields.

1st Branch – Parents

This branch of the flow is set up to support up to 6 generations. If your data has fewer than that (you should be able to tell from the number of levels you used), you do not need to make any major changes to the flow outside of a few calculations, and you can skip ahead to the Modifying Calculations section below. If your data has more than 6 levels, you’ll need to add a few steps.

Adding Levels

If you look at the first Group in this flow, you’ll notice that the steps repeat.

Image demonstrating the repetitiveness of the flow

The first step pivots the Mother and Father columns so there is a row for each instead of a column for each. Then we rename the field “Level_1_Parent”.

The second step joins that “Level_1_Parent” field back to our Data sheet to get those people’s parents. We remove the extra Name and NameID fields and then we repeat the process.

The third step once again pivots the Mother and Father columns, and this time we rename that field “Level_2_Parent”.

Then we join back to the data again to get those people’s parents. And on and on it goes. If you need to add a Level 7, follow the steps below.

  • From the last step in the first box, Pivot Level 6 Parents, add a join. This should be a Left join from that step, to the Data input
  • Remove the extra Name and NameID fields
  • From the Join step, add a Pivot step
  • Add the Mother and Father fields to the Pivot Values column
  • Remove the Pivot Names field
  • Rename the Pivot Values to “Level_7_Parent”
  • Repeat for any additional levels
  • In the first step in the second Group (Create Lineage) named Pivot All Parents, add any new levels to the Pivot Values column
Modifying Calculations

The steps in the “Create Lineage” group will pivot all of the parents/grandparents/great-grandparents and so on and then remove duplicates. It will then assign a sequential number to each of these ancestors. So if person 1 has 10 ancestors, there will be 10 rows, numbered 1 thru 10. For person 2, the numbering will start over at 1. Then the flow creates a crosstab of this data, so there is now a column for each of those numbers. In this sample data, we only have 6 columns (named 1, 2, 3,…)

An image showing the numeric columns that need to be addressed

The last step in this branch of the flow just combines all of these numbered fields into a single, pipe-delimited, concatenated field. Check to see how many numbered fields you have, and then edit the “Lineage_Parent” calculation accordingly.

An image showing the current calculation for the sample data

For example, if your flow has 10 of these columns, you’ll need to add an additional 4 rows to this calculation (one row for [7], one row for [8], and so on). In this same step of the flow, you’ll also want to remove all of those numeric columns. The version as is removes 1 thru 6, but if there are additional columns, you’ll want to remove those as well.
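For reference, here is a plain-Python sketch of what this branch of the flow ultimately produces: walking up the tree one generation at a time and concatenating the distinct ancestors into a single pipe-delimited field. This is my own re-implementation of the idea, not the Prep flow itself, and the sample parents dictionary is made up for illustration.

```python
# The parents dict mirrors the Data sheet's Mother/Father columns.
# Self-references mean "unknown parent", per the template's convention.
parents = {
    "Eddard Stark": ("Rickard Stark", "Rickard Stark"),  # only father known
    "Rickard Stark": ("House Stark", "House Stark"),
    "House Stark": ("House Stark", "House Stark"),       # no known parents
}

def lineage_parents(name):
    """Collect every ancestor of `name`, nearest generation first,
    and join them into one pipe-delimited string."""
    seen, queue = [], [name]
    while queue:
        person = queue.pop(0)
        for parent in parents.get(person, ()):
            if parent != person and parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return "|".join(seen)

print(lineage_parents("Eddard Stark"))  # Rickard Stark|House Stark
```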

2nd & 3rd Branches

In these branches, you do not need to add any steps, no matter how many children or partners any of the people in your data have had. You will, however, have to update the last step in both of these branches, the same way we did for the 1st Branch.

Modifying Calculations

In the 2nd branch of the flow, go to the 2nd-to-last step, labelled “Crosstab”, and check to see how many number columns were created. In our sample data, there was only 1, but if any of the people in your data had children with multiple partners, there may be additional columns. Then go to the last step in the 2nd branch, labelled “Create Strings”, and modify the “Lineage_Spouse” calculation the same way we did for the 1st Branch. To make this a little easier, I’ve left up to 6 rows in the calculation and commented them out. You can just remove the // from the start of any line to include that row in your calculation. In this step of the flow, you’ll also want to remove any of the extra number columns.

Repeat all of these steps for the 3rd branch of the flow to get a concatenated list of children.

Other Notes

So that’s pretty much it. Just a few other quick things I want to mention before we wrap it up.

There is a parameter called “Rotation”. You can use this to rotate your viz. In my original Mythologies viz, you’ll notice the first Tree has the opening to the right, the second Tree has the opening to the left, and the third Tree has the opening to the right again. For the first and third, I set this parameter to .25, which rotates the viz 1/4 of the way around, or 25%. For the second, I set this parameter to .75, rotating the Tree 75%.
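In code terms, rotating the viz just means shifting every point’s position around the circle before converting it to coordinates. A small Python sketch of my reading of the parameter (the template’s actual calculation may differ):

```python
import math

def rotated_xy(distance, position, rotation):
    """Convert a point to X/Y after applying a Rotation fraction:
    the rotation is added to the point's position around the circle,
    wrapping past 100% back to 0."""
    angle = 2 * math.pi * ((position + rotation) % 1.0)
    return distance * math.sin(angle), distance * math.cos(angle)

# With rotation = .25, a point that started at position 0 ends up
# a quarter turn (25%) around the circle.
x, y = rotated_xy(3, 0.0, 0.25)
```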

There is also a parameter called “Thickness”. You can use this to make the white circles (Base_II), larger or smaller. The higher the value, the thicker the colored rings will appear (white circles will shrink). The lower the value, the thinner the colored rings will appear (white circles will grow).

And lastly, you may notice that there are 2 filters applied. If you remove these filters, you may not notice any changes to the chart in Desktop, but it may result in some weird behavior once published to Tableau Public. For some reason, Tableau Public seems to have an issue trying to render identical marks that are stacked on top of each other. Lines end up thicker than they should, and other mark types sometimes get fuzzy and weird. There also seems to be an issue when too many points are plotted too closely together.

The “Extra_Point_Filter” limits the number of points that are used in all of the curved lines. Remember, there are 100 points in our Densification table dedicated to these lines, but sometimes the lines end up being very short. This filter allows all 100 points to be used for the longest line, compares the length of all other lines to that longest line, and uses the appropriate number of points (ex. if a line is half as long as the longest line, only 50 points will be used instead of 100).
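A small Python sketch of that proportional logic (my own approximation of the filter, not the template’s actual calculated field; the minimum of 2 points is my assumption, to always keep a line’s endpoints):

```python
def points_to_keep(line_length, longest_length, max_points=100):
    """Scale the number of densification points to the line's length:
    the longest curve gets all 100 points, shorter curves get
    proportionally fewer."""
    return max(2, round(max_points * line_length / longest_length))

print(points_to_keep(5, 10))   # half as long as the longest -> 50 points
print(points_to_keep(10, 10))  # the longest line keeps all 100
```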

The “Extra Lines Filter” removes extra Parent and Group Lines. The way this is set up, it will draw these lines for every node, so if there are 5 nodes in a Group, 5 identical lines will be drawn and “stacked” on top of each other. This calculation removes all of those extra lines.

There are also 2 Dashboard Actions set up in the template. One is just to remove the default highlighting when you click on a node. You can read more about that technique here. The second is a Parameter Action to save the selection when a node is clicked. Clicking the same node again will remove that selection.

I think that’s about it. If you made it to the end of this post, and have had a chance to test out this process, please let us know. As always, we would love to see what you were able to create with it!


Creating Useful Documentation

I’ve written a lot of documentation, and it’s a task that few people enjoy; I am no exception. I’ve also read a lot of documentation and unwound a lot of undocumented reporting, so I know firsthand that documentation is often overlooked and underappreciated. Good documentation can be invaluable for maintainability, training, and knowledge transfer. I’ve certainly come back to a project six months later and forgotten why something was done a certain way, or dug my way through Tableau Workbooks and ETL code to find out where a certain piece of logic was coming from.

I find the most useful documentation in day-to-day work is the documentation that is right where you need it. Making documentation part of your workflow can save your own sanity, and pay dividends in time saved, either your own or whoever inherits your work.

This isn’t to say that a full technical document isn’t helpful or needed. These provide invaluable information on the business context, interactivity, use cases, logic, and more. However, documenting your process right in the tool where you are working will save immense amounts of time and confusion down the line, is easier to keep up-to-date, and can even be used at the end of a project to make creating technical and user documents easier.

So, where does this living documentation, well, live? Ideally, it lives everywhere the data is touched. Keep in mind, this type of documentation isn’t meant to be redundant, but to add context that isn’t immediately apparent.

Data Prep Stage

What to Include

  • Authors
  • Date created or modified
  • Designed purpose and limitations of the data source
  • Data lineage and dependencies. If you’re pulling from a well-established database, this may be less important than if you’re connecting to spreadsheets or processes that need to be updated or maintained.
  • Data freshness timestamp, if applicable
  • Call out any inclusion/exclusion criteria, transformations done, business decisions, or logic explanations

Ensure this makes it to the users of the data! If they won’t open the workflow or see the SQL, then passing this information downstream is key.

Some tools, like dbt’s “Exposures”, include features to surface this type of information to others.

Ways to Document

Commenting and annotating code and workflows is helpful to quickly orient yourself or others on what is happening, where, and why. Naming conventions for fields, subqueries, views, etc. will also go a long way.

Using a very simple example based on the Superstore data set, I’ve shown some ways to document data preparation below. Most of the queries and workflows we create will be much more complex than this, so documentation becomes more important. For this example, I used Ken Flerlage’s SQL Server Superstore instance. If you need a server to connect to for learning how to use data prep tools, check out his post on FlerlageTwins.com!

How this might look in SQL:

  • Comments should clarify any changes, as well as any logic whose purpose is unclear or might look unnecessary
  • Aliases should be easy to identify
  • Clean formatting to allow easy reading to locate key information
  • All fields are prefixed with the source table alias
/* 
Author: Jacqui Moore
Date Created: 2023-01-19
Purpose: All Orders and returns for West Region
Modified: 2023-01-20 Ticket ABC-123
*/

SELECT 
	 o.[Order ID]
	,o.[Order Date]
	,o.[Ship Date]
	,o.[Customer ID]
	,o.[Customer Name]
	,o.Segment
	,o.[Product ID]
	,o.[Product Name]
	,o.[Category]
	,o.[Sub-Category]
	,o.Sales as [Amount Sold]
	,o.Quantity as [Quantity Sold]
--	,o.Discount as [Discount as Sold]  --Removed per ABC-123
--	,o.Profit as [Profit as Sold] -- Removed per ABC-123
	,r.Returned
FROM 
	SuperstoreUS.dbo.Orders o
LEFT JOIN 
	SuperstoreUS.dbo.[Returns] r
	ON r.[Order ID] = o.[Order ID]
WHERE 
	o.Region = 'West'

How this might look in Alteryx:

  • Comment header to indicate name, purpose, creator, and important information about a workflow
  • Containers can be used to create a “Read Me” for additional information
  • Tools are annotated descriptively
  • Calculated fields are commented with assumptions, or additional context the next person needs to know
  • Containers are used to segment the steps and provide additional context on the processing of the data
Screenshot of Alteryx workflow showing annotations and comments

How this might look in Tableau Prep:

  • While there are fewer ways to add notes with Tableau Prep, you can add a description to each step
screenshot of tableau prep workflow
  • Groups can be used to create a cleaner flow, with the ability to drill in on steps, and act similar to Alteryx containers in some ways
screenshot of tableau prep workflow
  • Calculations can be commented using // at the start of a comment line
screenshot of tableau prep calculation

Other helpful things to include

  • If you’ve used a macro, tool group, or snippet of code from somewhere else, include a link to the original source
  • If you’ve used a macro or tool group, include a brief description of the purpose and what operations are being performed

Visualization Stage

On the Data Source

  • Give your Tableau Data Source a descriptive name
tableau desktop data source view
  • Pre-filter any data in the data source, whenever possible
  • Rename the tables, if the names aren’t clear
  • If you are using Custom SQL, comment that code
  • If the data source is published, a description containing some of the high level information from the data source section is helpful context for users who might try to later connect to the data

On the Data Pane

  • Rename fields to use ‘friendly’ names, such as the common nomenclature for the field among the business users
  • Set the right data types
tableau field type menu
  • Add a comment to fields if your data source will be used for Ask Data or for business users who are less familiar with the data and/or Tableau
tableau comment properties

This will appear on hover in the data source pane and in Ask Data on Server.

tableau comment display
  • If the Table names are enough context to group the fields, then that is fine, but if the data source has a lot of fields, using Folders can be useful
tableau folders
tableau folders
  • Having a naming convention that makes it clear when LODs or Parameters are being used can be very helpful, but can also sometimes be less friendly with displaying the field names on views

Did you know the field descriptions are searchable? Yep, you can come up with a tag system and include it in the description, and search right in the Tableau Desktop data pane. Field descriptions are also visible on Ask Data.

tableau field comments
searching fields
  • In addition to the items above, calculations can be commented in the calculation window, just like any other type of code
commented calculation
  • When you’re ready to publish, clean up:
    • Delete calculations you ended up not using, copies of fields, etc.
    • Hide all unused fields
    • Hide fields that aren’t meant to be used (such as id fields that you need, but don’t mean anything to the user, or base fields that were replaced with LOD calculations). If they can’t be hidden, putting them in a folder labeled “Do Not Use” is also helpful. If it will mess up someone’s analysis to use that field, hide it.

Sheets

  • Name the sheets descriptively, with leading names that help identify the section, dashboard, etc.
sheet names
  • Color coding your tabs can be very helpful. People use the colors for different things, but I like to use it to show when certain filters will apply
sheet colors
  • I like to use the colors to show filters because, when changing filter settings to apply to specific sheets, you can see these colors, making it much easier to select the right sheets for the right filters
sheet colors in filter menu
  • If you aren’t using the captions for display on a dashboard, you can use those to add notes on how a more complicated sheet is working
captions
  • You can include a sheet with a “Read Me” for developers, containing data source or workbook level information. This sheet doesn’t get published, but can contain a wealth of knowledge

Dashboards

  • Layout containers are awesome. Use layout containers! They help a lot with development and layout, save you from floating many items, and keep things organized
  • Name the containers so you can identify them in the layout pane. This has saved me on complicated dashboards, and is definitely worth the time it takes to do it.
named containers
  • Include clear chart headings, axis headings, and helper text so the user knows what they are looking at, and have answers to any logic questions
  • When you’re done, “Hide all sheets” will clean up your workbook. Delete any unused sheets that you don’t need to keep for a reason.
hide sheets

For The End User

So far, the types of documentation I’ve covered are for developers (or yourself). But, whether you create functional user documentation or not, having documentation baked into the dashboard will be appreciated by the end user. For some users, it’s the only type of documentation they ever even see.

  • Tool tips can contain descriptions of what the metrics mean, text indicating what actions are available, and more. Don’t neglect tooltips!
tool tips
  • Titles, labels and helper text are types of text that are displayed directly on the dashboard, and are important. These are things like clear axis labels, text describing interactivity, color legends, descriptive titles, and so on.
helper text
  • Overlays can be helpful for complicated dashboards with a lot of interactivity, where the visible helper text would be redundant, or just too much.
overlay instructions
  • Include in the header or footer of the dashboard things like:
    • Data refresh date
    • Date range included, if different from the refresh date
    • Business points of contact
    • Developer point of contact

And now that I have thoroughly talked about one of the most tedious parts of development, go forth and do good data!

Categories
How-To's Tableau Techniques

Happy New Year! Your Dashboard is Broken…

Hi Jacqui,

Hope you had a Happy New Year! Can you please look at <the super important dashboard>? It seems to be broken. Everything is blank…

Thanks!

Have you ever come into the office on the first day of the new year and found that your dashboards are blank, broken, or still looking at last year? Don’t worry, you’re in good company. But it doesn’t have to be that way. Using calculations, you can avoid some of the issues that can happen at the start of a new period.

The Challenge: When you have dashboards or views that filter on a specific year, or on the current year and prior year, you will need to update filters and colors, and hide previous years, when the new year rolls around.

I have an example dashboard here: a simple dashboard showing the current year and previous year, with YoY Growth, and a monthly trend chart:

When the new year rolls around, it’s going to have new colors, and my YoY Growth sheet is going to need to be updated. I used a relative date filter, but if I had hard-coded the year in filters or calculations, that would need to be updated as well.

The Solution:

Rather than using the date field in your views, you can use calculations to ensure your rollover to the new year goes smoothly.

If I use a calculated field to determine the current and prior year, I avoid the issues above.

  • Create a calculation called “Period”
//Period
IF DATEDIFF('year',[Ship Date],TODAY()) = 0 THEN 'Current Year'
ELSEIF DATEDIFF('year',[Ship Date],TODAY()) = 1 THEN 'Prior Year'
END
  • Replace the year with this new calculation anywhere you are using it. In my example, I’ve replaced the Color and the Filter to use the “Period” calculation.

The dashboard looks the same, but now, when the year rolls over, I don’t need to make any updates. Without my touching anything, the dashboard has rolled over to 2023 seamlessly.
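
If it helps to see the rollover behavior outside Tableau, here is a small Python sketch of the same logic. The function names are just for illustration; the key point is that DATEDIFF('year') compares calendar-year parts, not elapsed days, which is what makes the rollover automatic.

```python
from datetime import date

def datediff_year(start, end):
    # Tableau's DATEDIFF('year', start, end): difference of calendar-year parts,
    # so DATEDIFF('year', #2022-12-31#, #2023-01-01#) = 1 even though only a day passed
    return end.year - start.year

def period(ship_date, today):
    """Mirror of the [Period] calculation; returns None where Tableau returns NULL."""
    diff = datediff_year(ship_date, today)
    if diff == 0:
        return 'Current Year'
    elif diff == 1:
        return 'Prior Year'
    return None
```

On January 2nd, 2023, orders shipped in 2023 fall into ‘Current Year’ and 2022 orders into ‘Prior Year’, with no filter edits needed.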

Now, it is possible that your stakeholders would like to see the previous year until the first month of the new year is complete. To do this, we just need to incorporate a lag into our calculation.

There are several ways to approach this, depending on what kind of lag you want to include. Here, I’m saying: if the current month is January, keep looking at the prior two years; otherwise, look at the current year and prior year.

//Period With January Lag
IF MONTH([Current Date]) = 1 THEN 
    IF DATEDIFF('year',[Ship Date],[Current Date]) = 1 THEN 'Current Year'
    ELSEIF DATEDIFF('year',[Ship Date],[Current Date]) = 2 THEN 'Prior Year'
    END
ELSE
    IF DATEDIFF('year',[Ship Date],[Current Date]) = 0 THEN 'Current Year'
    ELSEIF DATEDIFF('year',[Ship Date],[Current Date]) = 1 THEN 'Prior Year'
    END
END

Now, if the current date is in January, it will still show me the previous two years. This prevents the blank dashboard when you arrive on January 2nd.
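
To sanity-check the lag, here is the same branching logic sketched in Python (hypothetical function names, same structure as the calculation above):

```python
from datetime import date

def datediff_year(start, end):
    # Tableau's DATEDIFF('year', ...): difference of calendar-year parts
    return end.year - start.year

def period_with_january_lag(ship_date, current_date):
    """Mirror of [Period With January Lag]."""
    diff = datediff_year(ship_date, current_date)
    if current_date.month == 1:
        # In January, keep showing the two most recent completed years
        if diff == 1:
            return 'Current Year'
        if diff == 2:
            return 'Prior Year'
    else:
        if diff == 0:
            return 'Current Year'
        if diff == 1:
            return 'Prior Year'
    return None
```

On January 2nd, 2023, this labels 2022 as ‘Current Year’ and 2021 as ‘Prior Year’; from February 1st onward, 2023 becomes ‘Current Year’.
<test>
assert period_with_january_lag(date(2022, 6, 1), date(2023, 1, 2)) == 'Current Year'
assert period_with_january_lag(date(2021, 6, 1), date(2023, 1, 2)) == 'Prior Year'
assert period_with_january_lag(date(2023, 1, 1), date(2023, 1, 2)) is None
assert period_with_january_lag(date(2023, 1, 15), date(2023, 2, 1)) == 'Current Year'
assert period_with_january_lag(date(2022, 6, 1), date(2023, 2, 1)) == 'Prior Year'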

On February 1st, my dashboard will roll over seamlessly:

In addition, we can solve a couple of other issues you may run into.

If your analysis is for Year to Date (YTD):

We can modify this calculation to handle YTD filters, by adding a second part to the prior year calculation:

//Period To Date
IF DATEDIFF('year',[Ship Date],[Current Date]) = 0 THEN 'Current Year'
ELSEIF DATEDIFF('year',[Ship Date],[Current Date]) = 1 
    AND [Ship Date]<=DATEADD('year',-1,[Current Date]) 
THEN 'Prior Year'
END

We will end up with a dashboard that will always compare Current YTD to Prior YTD. This can also be combined with the lag logic from earlier.
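
A quick Python sketch of the YTD cap, assuming (as I believe Tableau does) that DATEADD('year', -1, ...) maps February 29th to February 28th:

```python
from datetime import date

def datediff_year(start, end):
    # Tableau's DATEDIFF('year', ...): difference of calendar-year parts
    return end.year - start.year

def dateadd_year(d, n):
    # Sketch of Tableau's DATEADD('year', n, d); Feb 29 falls back to Feb 28
    try:
        return d.replace(year=d.year + n)
    except ValueError:
        return d.replace(year=d.year + n, day=28)

def period_to_date(ship_date, current_date):
    """Mirror of [Period To Date]: prior year is capped at the same point in the year."""
    diff = datediff_year(ship_date, current_date)
    if diff == 0:
        return 'Current Year'
    elif diff == 1 and ship_date <= dateadd_year(current_date, -1):
        return 'Prior Year'
    return None
```

With a current date of June 15th, 2023, prior-year rows after June 15th, 2022 drop out, so the two periods always cover the same portion of the year.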

If you only want to show the last COMPLETE month:

Often we will see the trend line taking a deep dive when a new month starts:

This can be avoided by setting up a lag, so you are looking at only the last complete month. We do this using DATETRUNC.

//Period with Complete Month Lag
IF DATEDIFF('year',[Ship Date],[Current Date]) = 0 
    AND [Ship Date]<DATETRUNC('month',[Current Date])
    THEN 'Current Year'
ELSEIF DATEDIFF('year',[Ship Date],[Current Date]) = 1 
    AND [Ship Date]<DATEADD('year',-1,DATETRUNC('month',[Current Date])) 
THEN 'Prior Year'
END

Now, we won’t see the line drop at the start of a new month, and we won’t see a blank dashboard on day one of the new year.
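
Here is the complete-month version sketched in Python, with DATETRUNC('month') modeled as the first day of the month (helper names are illustrative):

```python
from datetime import date

def datediff_year(start, end):
    # Tableau's DATEDIFF('year', ...): difference of calendar-year parts
    return end.year - start.year

def datetrunc_month(d):
    # Sketch of Tableau's DATETRUNC('month', d): first day of d's month
    return d.replace(day=1)

def dateadd_year(d, n):
    # Sketch of Tableau's DATEADD('year', n, d); Feb 29 falls back to Feb 28
    try:
        return d.replace(year=d.year + n)
    except ValueError:
        return d.replace(year=d.year + n, day=28)

def period_complete_month(ship_date, current_date):
    """Mirror of [Period with Complete Month Lag]: only complete months are shown."""
    diff = datediff_year(ship_date, current_date)
    month_start = datetrunc_month(current_date)
    if diff == 0 and ship_date < month_start:
        return 'Current Year'
    elif diff == 1 and ship_date < dateadd_year(month_start, -1):
        return 'Prior Year'
    return None
```

With a current date of June 15th, 2023, June rows from both years are excluded, so the partial month never drags the trend line down.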

These are not the only ways to perform these calculations, and they may not even be the best ways to write them. However, you can take the concepts behind these calculations and apply them to a number of use cases, including:

  • Showing the last complete week, or month
  • Showing comparisons of specific time frames, such as last 30 days vs. prior 30 days.

For more of this, and so many other date calculations, check out this post over on the Flerlage Twins site.

Will Perkins also did a great presentation on use cases for DATEDIFF, which is well worth watching!

Now, go forth, and enjoy the last time you will spend the first week of the new year updating your dashboards 🙂

Categories
How-To's Tableau Techniques

Gradient Chord Chart

Chord charts are a great way to display and quantify relationships between entities, but there are some limitations. Recently, I built a series of chord charts to show which actors have appeared together in films, and how often. Once I had built out the charts and attempted to add some color, I hit a wall. In some cases, when building these relationship diagrams, there is a logical directional flow to the relationship. Something moves from Entity 1 (Source) to Entity 2 (Target). In those cases, it’s easy to assign meaningful color to the chart. You can assign it by Source if you want to highlight where things are flowing from, or you can assign it by Target if you want to highlight where things are flowing to. But what if there is no flow to the relationship? How do you assign color then?

This roadblock got me thinking. If I want to add a unique color to each Entity in my chart, and there is no directional flow, then the color of the chord should be a blend of the two colors. Or better yet, it should transition from one color to the other. I have seen others do some really cool stuff with applying gradients to “solid” chart types. Ken Flerlage has an excellent post here about applying gradients to Bar Charts and Area Charts. There’s another great post from Ivett Kovacs about gradients here. Some different techniques, but the foundation is the same. If you want to apply a gradient to an otherwise “solid” chart, use a series of thin lines to “color in” the chart. So that’s exactly what I did.

I’ll warn you ahead of time, this is going to be a long post. But if you’re just interested in building one of these charts, and not so much in how to build it, I’ve got some templates for you to use. Just follow the instructions in the “Setting Up Your Data Source” section below, and swap out the data source in the template workbook. You can download the template here, and download the sample data here.

Setting Up Your Data Source

In the sample data (link above), there are two tabs: one for our data, and one for densification. You do not need to make any changes to the Densification table, but I’ll talk through its setup as we build each section of this chart. In the Base Data, you should only update the first four columns, not the calculated fields (red text) in the next five columns. Here is a quick explanation of each field and what it represents.

From_ID: Unique Identifier for each Source entity. This should be a sequential number starting at 1 and ending at however many entities you have in your data. For each entity, you may have multiple rows, one for each relationship, but they should all have the same From_ID.

From_Name: A display name for each Source entity.

To_Name: A display name for the other entity in the relationship, or the “Target”.

Value: The measure being displayed. In the sample data, this value field represents the number of movies that the two actors appeared in together.

The following fields should not be changed. They will update automatically based on the first four columns, but here is a quick description of what they are calculating.

To_ID: This is a lookup to get the correct ID for the Target entity.

From_Total: This is a sum of the Value field for each Source entity.

From_Run_Total: This is a running total of the Value field.

Unique_Relationship: This is a unique identifier for the relationship.

Unique_Relationship_Order: This is used to identify the first and second occurrence of each Unique_Relationship (there will be two rows for each relationship, one where Actor 1 is the Source and Actor 2 is the Target, and one where Actor 2 is the Source and Actor 1 is the Target).

If you are building these charts from scratch, I would still recommend downloading the template to see how the Data Source is set up. In order to get the running total value for the Target, or the end of each chord, we are doing a self-join in the Physical Layer on the Base Data table (see below for join conditions). Then, we are renaming the [From_Run_Total] field from the right side of that join (Base Data1) to [To_Run_Total]. And then we are joining to the Densification table using a join calculation (value of 1 on each side). Here is what the joins look like in our data source.

Building the Chart

This chart is actually made up of four different sections, identified by the ShapeType field in the densification table. There are the outer polygons (Outer_Poly), the gradient chords (Inner_Lines), the borders for the gradient chords (Inner_Lines_Border), and a small rounded polygon on the end of each chord (Lines_End_Poly) to fill the gap between the chords and the outer polygons.

An image showing the 4 unique sections that make up the gradient chord chart

Before we start working on any of these individual sections, there are a number of calculations that we are going to need for all of them.

First, let’s create a parameter that will let us control the spacing between each of the outer polygons. Call this parameter [Section Spacing], set the data type to Float, and set the value very low, around 0.01. Once you have the chart built, you can set this higher for more spacing, or lower for less spacing.

Now let’s use that parameter, along with the Max of our running total field from the data source (which represents the grand total of the [Value] field), to calculate the width of our spacing between polygons. We’ll call this [Section Spacing Width]

Section Spacing Width = [Section Spacing]*{MAX([From Run Total])}

Now we need an adjusted grand total that accounts for all of the spaces as well. We’ll call this [Max Run Total – Adj] and it will be our grand total plus the number of spaces * the width of the spaces.

Max Run Total – Adj = {MAX([From Run Total])+MAX([From ID]*[Section Spacing Width])}

Next, we want to calculate the position around the circle where each of our sections, or entities, will start. We’ll do this by subtracting the [Value] field from the running total, adding the spacing for all previous sections, and then dividing that by our adjusted grand total. Call this field [Section_Start].

Section_Start = { FIXED [From ID] : MIN(([From Run Total]-[Value]) + (([From ID]-1)*[Section Spacing Width]))} / [Max Run Total – Adj]

Now we need to do the same thing to calculate the position around the circle where each section ends. The only difference between this and the previous calc is that we are going to use the running total without subtracting the value. Call this field [Section_End].

Section_End = { FIXED [From ID] : MAX([From Run Total] + (([From ID]-1)*[Section Spacing Width]))} / [Max Run Total – Adj]

Now for an easy one. Let’s calculate the width of each section by subtracting the [Section_Start] from the [Section_End]. Call this [Section_Width].

Section_Width = [Section_End]-[Section_Start]
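
To see how these fractions work, here’s a small Python sketch using two made-up entities (the row values here are illustrative; the running total runs across all rows, as in the source data):

```python
section_spacing = 0.01  # the [Section Spacing] parameter

# (From_ID, Value, From_Run_Total): two entities with two relationships each
rows = [
    (1, 2, 2), (1, 3, 5),
    (2, 1, 6), (2, 4, 10),
]

grand_total = max(rt for _, _, rt in rows)        # {MAX([From Run Total])}
spacing_width = section_spacing * grand_total     # [Section Spacing Width]
# [Max Run Total - Adj]: grand total plus room for one space per entity
max_run_total_adj = grand_total + max(fid for fid, _, _ in rows) * spacing_width

def section_start(fid):
    # [Section_Start]: where this entity's arc begins, as a fraction of the circle
    return min((rt - v) + (fid - 1) * spacing_width
               for f, v, rt in rows if f == fid) / max_run_total_adj

def section_end(fid):
    # [Section_End]: where this entity's arc ends
    return max(rt + (fid - 1) * spacing_width
               for f, v, rt in rows if f == fid) / max_run_total_adj

def section_width(fid):
    return section_end(fid) - section_start(fid)
```

Each start/end is a fraction of one full trip around the circle: entity 1 spans 0.0 to about 0.49, entity 2 spans 0.5 to about 0.99, and the leftover fractions are the gaps between the polygons.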

Next, we need to do the same thing to get the start and end positions for each of the “sub-sections”, or each of the entity’s relationships. The calculations are almost identical, the only difference is that we are fixing the Level of Detail calculations on [From_ID] and [To_ID], instead of just [From_ID]. Call these calculations [From_SubSection_Start] and [From_SubSection_End].

From_SubSection_Start = { FIXED [From ID],[To ID] : MIN(([From Run Total]-[Value]) + (([From ID]-1)*[Section Spacing Width]))} / [Max Run Total – Adj]

From_SubSection_End = { FIXED [From ID],[To ID] : MAX([From Run Total] + (([From ID]-1)*[Section Spacing Width]))} / [Max Run Total – Adj]

And just like before, we’ll create a simple calculation to get the “width” of these sub-sections. Call this calculation [SubSection_Width].

SubSection_Width = [From_SubSection_End]-[From_SubSection_Start]

Next, we need to do the same thing, but need to calculate the start and end position for the other end, or Target, of each relationship. The calculations are the same as above except we’ll use the [To_Run_Total] instead of [From_Run_Total] and [To_ID] instead of [From_ID]. Call these calculations [To_SubSection_Start] and [To_SubSection_End].

To_SubSection_Start = { FIXED [From ID],[To ID] : MIN(([To Run Total]-[Value]) + (([To ID]-1)*[Section Spacing Width]))} / [Max Run Total – Adj]

To_SubSection_End = { FIXED [From ID],[To ID] : MAX([To Run Total] + (([To ID]-1)*[Section Spacing Width]))} / [Max Run Total – Adj]

And finally, we need a simple calculation to get the total number of points for each of our shape types. Call this calculation [Max Point].

Max Point = { FIXED [Shape Type] : MAX([Points])}

Densification

Before we move on, let’s take a look at our densification table. In this table, we have 5 fields.

Shapetype: Corresponds to the 4 sections of the chart mentioned previously.

Points: Used, along with Side, to calculate the positions of every point, in every element of the chart.

Order: Used on Path to tell Tableau how to connect our Points.

Side: Used to differentiate between the interior and exterior “lines” for the outer polygons and chords.

Line ID: Used on Detail to segment our lines appropriately. For all sections of the chart, other than the Inner_Lines, this value will be 1, since we want one continuous line for the polygons, and for the borders of the chords. For the Inner_Lines, we have values from 1 to 1,000, so we can “color” our chart with up to 1,000 lines per chord.

These fields are used in slightly different ways in each section of the chart, so we’ll talk about them more as we start building.

Building the Outer Polygons

First, let’s take another quick look at our densification table. For the outer polygons, identified by Shapetype=Outer_Poly, we have a total of 50 records. The Order field, used to tell Tableau how to connect the points, is a sequential number from 1 to 50. Then there is the Points field, which is used to calculate the position of each point around the circle. This number goes from 1 to 25 where the Side field = Min, and repeats where the Side field = Max. This will allow us to draw two parallel lines (Min and Max), and then connect them together to create those outer polygons. And lastly, the Line ID has a value of 1 because we are “drawing” one continuous “line” for each of these polygons. Clear as mud, right? Here’s a quick illustration to help visualize how these fields function together.

This image shows side by side how the Points and Order fields are used

Before we start building the calculations for this section, we need a few parameters. The first is just going to be a generic value for the radius of our circle. The second is going to determine the thickness of these outer polygons.

Create a parameter called [Radius], set the data type to Float, and set the Current value to 10. This can be any number you want; we just need something to determine the size of our chart.

Next, create a parameter called [Outer_Poly_Thickness], set the data type to Float, and set the Current value to 1. You can play around with this number to get the thickness you want, but I would recommend setting it somewhere around 10% of the [Radius] value.

Now for our calculations. Going back to Part 1 of the Fun with Curves in Tableau Series, we know that we need 2 inputs to plot points around a circle. We need the distance of each point from the center of the circle, and we need the position of each point around the circle. Let’s start with the distance.

In the image from earlier in this section, we see that we need two lines that follow the same path, one for Min and one for Max. So these two lines will have different distances from the center. For the Min line, we’ll just use the [Radius] of our circle. For the Max line, we’ll use the [Radius] plus our [Outer_Poly_Thickness] parameter. Create a calculated field called [Outer_Poly_Distance].

Outer_Poly_Distance = IF [Side]="Min" THEN [Radius] ELSE [Radius]+[Outer_Poly_Thickness] END

Next, we need to calculate the position of the points. Each of our polygons is going to be a different length, depending on the [Value] field in our data source, so we need to calculate the spacing of each point (to evenly space 25 points along the two “lines”). Earlier, we calculated the start and end position for each of these polygons, and we’ll use those to calculate the spacing, or “width”, that each point covers. Create a calculated field called [Outer_Poly_Point_Width].

Outer_Poly_Point_Width = [Section_Width]/([Max Point]-1)

And now we’ll use that, along with the start position of each polygon, to calculate the position for each of our 25 points. Point 1 will be positioned at the start of the polygon, Point 25 will be positioned at the end of the polygon, and the remaining 23 points will be equally spaced between those two points. Call this calculation [Outer_Poly_Point_Position].

Outer_Poly_Point_Position = [Section_Start]+(([Points]-1)*[Outer_Poly_Point_Width])

Now we have the two inputs needed to calculate all of our points for these outer polygons. We just need to plug them into our X and Y calculations.

Outer_Poly_X = [Outer_Poly_Distance]* SIN(2*PI() * [Outer_Poly_Point_Position])

Outer_Poly_Y = [Outer_Poly_Distance]* COS(2*PI() * [Outer_Poly_Point_Position])
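
Put together, the distance and position act like polar coordinates. Here is a Python sketch of one point (using SIN for X and COS for Y places position 0 at the top of the circle, moving clockwise); the parameter values are just the defaults suggested above:

```python
import math

radius = 10.0      # [Radius] parameter
thickness = 1.0    # [Outer_Poly_Thickness] parameter
max_point = 25     # [Max Point] for the Outer_Poly shape type

def outer_poly_point(point, side, section_start, section_width):
    """Position of one of the 25 points along the Min or Max line of a section."""
    # [Outer_Poly_Distance]: Min line sits on the radius, Max line outside it
    distance = radius if side == 'Min' else radius + thickness
    point_width = section_width / (max_point - 1)          # [Outer_Poly_Point_Width]
    position = section_start + (point - 1) * point_width   # [Outer_Poly_Point_Position]
    x = distance * math.sin(2 * math.pi * position)        # [Outer_Poly_X]
    y = distance * math.cos(2 * math.pi * position)        # [Outer_Poly_Y]
    return x, y
```

For a section starting at 0 with width 0.25, point 1 on the Min line lands at the top of the circle (0, 10), and point 25 on the Max line lands a quarter-turn clockwise at (11, 0).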

Let’s pause here and build a quick view to test out our calculations so far.

  • Drag [ShapeType] to filter and filter on “Outer_Poly”
  • Change Mark Type to Polygon
  • Right click on [Outer_Poly_X], drag it to Columns, and when prompted, choose Outer_Poly_X without aggregation
  • Right click on [Outer_Poly_Y], drag it to Rows, and when prompted, choose Outer_Poly_Y without aggregation
  • Drag [From_ID] to color
  • Right click on [Order], convert to Dimension, and then drag to Path

If all of the calculations are correct, your sheet should look something like this.

An image showing what the outer polygons should look like

One section down, three more to go.

Building the Line End Polygons

Next, let’s build the small rounded polygons that are going to go at the end of each of our chords. The calculations will be pretty similar to the previous section. For those outer polygons, we essentially drew two curved lines that followed the same path, and then connected those lines by using the Polygon Mark Type, and using our [Order] field on Path. Now, we basically want to do the same thing, but with a few minor differences. Instead of drawing two lines, we only need to draw one. Tableau will automatically connect our first and last point, resulting in a small semi-circle. The other difference is that instead of drawing one polygon for each entity (actor in this example), we need to draw a polygon for each relationship.

To draw these polygons, we need the exact same five calculations as in the previous section, but with some modifications. To start, let’s build a parameter to control the spacing between these polygons and the outer polygons. Call this parameter [Poly_Spacing], set the data type to Float, and set the current value to 0.25. Again, once the view is built, you can play with this parameter to get the spacing you want.

Next, we need to calculate the distance of our points from the center of the circle. We’ll do this by subtracting our [Poly_Spacing] value from the [Radius]. Call this calculation [Line_End_Poly_Distance].

Line_End_Poly_Distance = [Radius]-[Poly_Spacing]

Next, we need to calculate the position for our 25 points, just like we did in the previous section. The only difference here is that instead of using the section start and section end, we’re going to use the sub-section start and sub-section end, since we need to create these for each relationship. Call this calculation [Line_End_Poly_Point_Width].

Line_End_Poly_Point_Width = [SubSection_Width]/([Max Point]-1)

Using that, along with our Subsection start calculation, we can calculate the position of all 25 points for these polygons. Call this calculation [Line_End_Poly_Point_Position].

Line_End_Poly_Point_Position = [From_SubSection_Start]+(([Points]-1)*[Line_End_Poly_Point_Width])

And now we just need to plug these inputs into our X and Y calculations.

Line_End_Poly_X = [Line_End_Poly_Distance]* SIN(2*PI() * [Line_End_Poly_Point_Position])

Line_End_Poly_Y = [Line_End_Poly_Distance]* COS(2*PI() * [Line_End_Poly_Point_Position])

Once again, let’s pause to check our work.

  • Drag [ShapeType] to filter and filter on “Line_End_Poly”
  • Change Mark Type to Polygon
  • Right click on [Line_End_Poly_X], drag it to Columns, and when prompted, choose Line_End_Poly_X without aggregation
  • Right click on [Line_End_Poly_Y], drag it to Rows, and when prompted, choose Line_End_Poly_Y without aggregation
  • Drag [From_ID] to Color
  • Drag [To_ID] to Detail
  • Right click on [Order], convert to Dimension, and then drag to Path

When complete, the line end polygons should look something like this.

An image showing what the line end polygons should look like

We’re halfway there! Kind of…

Building the Inner Line Borders

Now we need to build our chords. Let’s start with the borders since those are a little more straightforward, and will make the last section a little easier to understand.

Going back to Part 2 of the Fun with Curves in Tableau Series, we know that we need 3 sets of coordinates to create our bezier curves: one for the start of the line, one for the end of the line, and one for the mid-point. Since we’re building a chord chart, we’ll use 0,0 for the mid-point. So that just leaves the start and end of our lines.

As always, we’ll need 2 inputs to calculate these coordinates: the distance and the position. The distance is easy. It’s going to be the same distance as our Line End Polygons, [Radius] - [Poly_Spacing]. The position is a little trickier.

For each of our chords, we’re going to need 2 lines, one for the interior, or Min, line, and one for the exterior, or Max line.

An image showing the two lines, the Min and Max, that will be used to create the chords

Notice that the Min line starts at the end of the source sub-section and ends at the start of the target sub-section. The opposite is true for the Max line: it starts at the start of the source sub-section and ends at the end of the target sub-section. We have already calculated the positions for all of those sections, so we can plug those into our X and Y calcs to get our coordinates. We’ll need two sets of calculations, one for the starting coordinates and one for the ending coordinates.

Chord_X_Start = IF [Side]="Min" THEN ([Radius]-[Poly_Spacing])* SIN(2*PI() * [From_SubSection_End])
ELSE ([Radius]-[Poly_Spacing])* SIN(2*PI() * [From_SubSection_Start])
END

Chord_Y_Start = IF [Side]="Min" THEN ([Radius]-[Poly_Spacing])* COS(2*PI() * [From_SubSection_End])
ELSE ([Radius]-[Poly_Spacing])* COS(2*PI() * [From_SubSection_Start])
END

Chord_X_End = IF [Side]="Min" THEN ([Radius]-[Poly_Spacing])* SIN(2*PI() * [To_SubSection_Start])
ELSE ([Radius]-[Poly_Spacing])* SIN(2*PI() * [To_SubSection_End])
END

Chord_Y_End = IF [Side]="Min" THEN ([Radius]-[Poly_Spacing])* COS(2*PI() * [To_SubSection_Start])
ELSE ([Radius]-[Poly_Spacing])* COS(2*PI() * [To_SubSection_End])
END

These calculations may look complicated, but they’re all relatively simple. They are all using the same distance input (Radius – Poly Spacing), and then depending on if it’s the Min line or the Max line, and if it’s the Start or End of the line, we’re plugging in the appropriate position field (From_SubSection_Start, From_SubSection_End, To_SubSection_Start, and To_SubSection_End).
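If it helps to see the trigonometry outside of Tableau, here's a minimal Python sketch of what these four fields compute. The function name and the sample radius/spacing values are mine, not from the workbook; only the formula mirrors the calculations above.

```python
import math

def chord_point(radius, poly_spacing, position):
    """Coordinates of a chord endpoint.

    `position` is the fraction of the way around the circle (0 to 1),
    standing in for the From/To SubSection Start/End fields.
    """
    r = radius - poly_spacing  # same distance input as the Line End Polygons
    return (r * math.sin(2 * math.pi * position),
            r * math.cos(2 * math.pi * position))

# Position 0 is the top of the circle; position .25 is a quarter turn clockwise.
top = chord_point(radius=20, poly_spacing=1, position=0)
quarter = chord_point(radius=20, poly_spacing=1, position=0.25)
```

Using SIN for X and COS for Y is what makes position 0 land at the top of the circle and increase clockwise, consistent with the earlier sections of this tutorial.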

Those four calculations will give us the 2 remaining sets of coordinates needed to draw our bezier curves. So let’s plug all of our coordinates into the bezier X and Y calculations. But first, we need to calculate [T] to evenly space points along our lines.

T = ([Points]-1)/([Max Point]-1)

Chord_X = ((1-[T])^2*[Chord_X_Start] + 2*(1-[T])*[T]*0+[T]^2*[Chord_X_End])

Chord_Y = ((1-[T])^2*[Chord_Y_Start] + 2*(1-[T])*[T]*0+[T]^2*[Chord_Y_End])
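As a quick sanity check on the math, here's the same [T] spacing and quadratic bezier blend in Python. This is a sketch with made-up endpoint values; only the formulas mirror the workbook.

```python
def bezier(t, start, mid, end):
    """Quadratic bezier: (1-t)^2*start + 2*(1-t)*t*mid + t^2*end."""
    return (1 - t) ** 2 * start + 2 * (1 - t) * t * mid + t ** 2 * end

# T = ([Points]-1)/([Max Point]-1) spaces points evenly from 0 to 1.
max_point = 50
ts = [(p - 1) / (max_point - 1) for p in range(1, max_point + 1)]

# With the mid-point at the circle's center (0), the line is pulled toward it.
xs = [bezier(t, 19.0, 0.0, -19.0) for t in ts]
```

At t=0 the blend returns the start coordinate, at t=1 the end coordinate, and in between the curve bows toward the (0,0) mid-point.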

Let’s pause again and make sure our calculations are working as expected.

  • Drag [Shape Type] to the Filter shelf and filter on "Inner_Lines_Border"
  • Change the Mark Type to Line
  • Right click on [Chord_X], drag it to Columns, and when prompted, choose Chord_X without aggregation
  • Right click on [Chord_Y], drag it to Rows, and when prompted, choose Chord_Y without aggregation
  • Drag [From ID] to Color
  • Drag [To ID] to Detail
  • Drag [Line ID] to Detail
  • Right click on [Order], convert to Dimension, and then drag it to Path

When finished, your bezier curves should look something like this. There should be two lines for each relationship in your data source.

An image showing what the chord borders should look like

We’re getting close!

Building the Inner Lines

Here’s the fun part. This is the section that really makes this chart unique. And guess what…we already built all of the calculations needed to make this work. Go the sheet you built in the previous section, and change the filter from Inner_Lines_Border to Inner_Lines…I’ll wait. Pretty cool right? These may look like polygons, but really, each of these chords is being “colored” with 1,000 individual lines. Go ahead and click on a few to see for yourself.

This is what one of those chords would look like if we had used 50 lines instead of 1,000.

An image demonstrating how the gradient lines are used

So how does this work? How were we able to use the exact same calculations for the chord borders and these inner lines, when all we did was change a filter? It's all in how we set up the densification table.

For both of these sections, we used the [Points] field in the same way…to calculate the position of each point along the chords. The [Line ID] and [Order] fields are where the magic happens.

Let’s look at the chord borders first.

An image showing how the Order and Line ID fields are used for the chord borders

For these lines, the [Order] field has sequential numbers from 1 to 50, and the [Line ID] field has just two values, 1 and 2. With [Line ID] on Detail, and [Order] on Path, this will draw 2 curved lines.

Now look at the inner lines.

An image showing how the Order and Line ID fields are used for the chord gradients

For these lines, the [Order] field has just two values, 1 and 2, and the [Line ID] field has sequential numbers from 1 to 50 (up to 1,000 in the final chart). In this case, when you put [Line ID] on Detail, and [Order] on Path, it will draw 50 straight lines.

So using the same exact calculations, but getting a little creative with our data source, we can draw both the borders of our chords, and “color” them in. Now let’s bring all of these sections together. We’re almost done!
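The two densification layouts described above can be sketched in a few lines of Python (50 points per chord here, with hypothetical field names), showing that the row counts are identical and only the roles of [Line ID] and [Order] are swapped:

```python
n = 50  # points per chord (1,000 in the final chart)

# Borders: 2 lines, each traced by 50 ordered points -> 2 curved lines.
border_rows = [{"line_id": line, "order": p}
               for line in (1, 2) for p in range(1, n + 1)]

# Inner lines: 50 lines, each with just 2 ordered points -> 50 straight lines.
inner_rows = [{"line_id": p, "order": o}
              for p in range(1, n + 1) for o in (1, 2)]
```

With [Line ID] on Detail and [Order] on Path, the first layout draws a few long paths while the second draws many two-point segments, even though both feed the same X/Y calculations.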

Building the Final Chart

So now we have all of the calculations done for the four sections of our chart. Time to bring them together. In this viz we have two different Mark Types that we are trying to use together: Polygons and Lines. So we'll need to use a dual axis and create a calculation for each mark type. All of our sections will share the same Y calculation, but we'll have an X calculation for the polygon sections and an X calculation for the line sections. These will all be simple case statements to use the appropriate X and Y values for each section of the chart.

Final_X_Line

CASE [Shape Type]
WHEN "Inner_Lines" THEN [Chord_X]
WHEN "Inner_Lines_Border" THEN [Chord_X]
END

Final_X_Poly

CASE [Shape Type]
WHEN "Outer_Poly" THEN [Outer_Poly_X]
WHEN "Line_End_Poly" THEN [Line_End_Poly_X]
END

Final_Y

CASE [Shape Type]
WHEN "Inner_Lines" THEN [Chord_Y]
WHEN "Inner_Lines_Border" THEN [Chord_Y]
WHEN "Outer_Poly" THEN [Outer_Poly_Y]
WHEN "Line_End_Poly" THEN [Line_End_Poly_Y]
END

Now to build our view. Let’s start with the Line sections.

  • Change Mark Type to Line
  • Right click on [Final_X_Line], drag it to Columns, and when prompted, choose Final_X_Line without aggregation
  • Right click on [Final_Y], drag it to Rows, and when prompted, choose Final_Y without aggregation
  • Drag [From ID] to Detail
  • Drag [To ID] to Detail
  • Drag [Shape Type] to Detail
  • Drag [Line ID] to Detail
  • Drag [Order] to Path

When this part is complete, your chart should look something like this.

An image showing what the chart should look like with the gradient lines

Now, let’s add our polygons.

  • Right click on [Final_X_Poly], drag it to the right of [Final_X_Line] on the Columns shelf, and when prompted, choose Final_X_Poly without aggregation
  • Right click on the green [Final_X_Poly] pill on the columns shelf and select “Dual Axis”
  • Right click on one of the axes, and select “Synchronize Axis”
  • On the Marks Card, go to the Final_X_Poly card and change the Mark Type to Polygon
  • Remove the [To ID] field from Detail
  • Remove the [Line ID] field from Detail
  • Drag the [Unique Relationship] field to Detail

At this point, your chart should look something like this.

An image showing the chart with all 4 sections included

Woohoo, our chart is built! Before we start fine tuning our design, there are a few filters we should add. All of these will improve performance, but one of them is needed to accurately color our chart.

First, for our outer polygons, there are actually a number of identical polygons stacked on top of each other. The way our data is set up, it’s creating one of these polygons for each relationship for each entity. But we only want one polygon per entity. This will not only improve performance, but also fix some weird display issues that happen on Tableau Public. It doesn’t happen all the time, but sometimes when you have multiple identical polygons stacked on top of each other, the curved lines end up looking jagged and weird. So this will help fix that. This filter will include anything that’s not an outer polygon, and only the first outer polygon for each entity. We’ll call this [Single_Outer_Poly_Filter].

Single_Outer_Poly_Filter = [Shape Type]!="Outer_Poly" or ([Shape Type]="Outer_Poly" and [To ID]={ FIXED [From ID] : MIN([To ID])})

Drag that to the Filter shelf and filter on True.
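If the LOD logic is hard to picture, this Python sketch mimics what [Single_Outer_Poly_Filter] keeps, using a few made-up rows: everything that is not an outer polygon passes through, and each entity keeps only the outer polygon with its minimum To ID.

```python
# Hypothetical sample rows; entity 1 has two stacked outer polygons.
rows = [
    {"shape": "Outer_Poly", "from_id": 1, "to_id": 2},
    {"shape": "Outer_Poly", "from_id": 1, "to_id": 3},  # duplicate polygon
    {"shape": "Outer_Poly", "from_id": 2, "to_id": 1},
    {"shape": "Inner_Lines", "from_id": 1, "to_id": 2},
]

# Minimum To ID per entity, playing the role of { FIXED [From ID] : MIN([To ID]) }.
min_to = {}
for r in rows:
    if r["shape"] == "Outer_Poly":
        min_to[r["from_id"]] = min(min_to.get(r["from_id"], r["to_id"]), r["to_id"])

kept = [r for r in rows
        if r["shape"] != "Outer_Poly" or r["to_id"] == min_to[r["from_id"]]]
```

The duplicate outer polygon for entity 1 is the only row dropped, which is exactly the de-duplication the filter performs in the workbook.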

This next calculation helps with performance and also ensures an accurate color calculation in the next section. For each of our chords, as it stands now, there are actually two sets of lines stacked on top of each other. That's because each unique relationship has two rows in our source data (one where Actor 1 is the Source and Actor 2 is the Target, and one where Actor 2 is the Source and Actor 1 is the Target). This calculation will keep all of the records for our two polygon sections, but for the line sections, it will only keep the first occurrence of each relationship. Call it [Unique_Relationship_Filter].

Unique_Relationship_Filter = CONTAINS([Shape Type],"Poly") or (not CONTAINS([Shape Type],"Poly") and [Unique Relationship Order]=1)

Drag that to the Filter shelf and filter on True.

This last one is purely for performance and is completely optional. Depending on how large this chart will be in your viz, you may not need 1,000 lines to color the chords. You can use a parameter and a calculated field to set the number of lines you want to use. First, create a parameter called [Gradient Lines], set the data type to float, set the max value to 1,000, and set the current value to about 750. Then create a simple calc called [Gradient_Line_Filter].

Gradient_Line_Filter = [Points]<=[Gradient Lines]

Drag this field to the Filter shelf and filter on True. Then, right click on the blue pill, and select “Add to Context”.

Alright, now we’re ready to move on to Design!

Adding Color to the Chart

This part took me a little while to figure out. The goal is to have a unique color for each of the entities, and then have the chord between them transition from one entity's color to the other's. We have our [From ID] field, which is a sequential number for each entity, so that's what we'll use to color our polygons.

To color our chord borders, we’re going to use a parameter with a value lower than our first [From ID], and I’ll talk more about this a little later on. For now, create a parameter called [Color Range Start], set the data type to Float, and set the current value to .5.

To color the gradient lines, we'll need a few calculations. The first thing we need to do is calculate evenly spaced values between the ID for the Source entity and the ID for the Target entity. For example, if we have a chord going from entity 1 to entity 2, we'll take the difference of those IDs and divide by the number of lines (1,000). Same thing if we have a chord going from entity 1 to entity 5. Let's create that calc and call it [Color Step Size].

Color Step Size = ABS([From ID]-[To ID])/([Max Point]-1)

Now we’ll multiply that value by the [Points] field and add it to our [From ID] field to get evenly spaced values between the Source ID and the Target ID. Call this [Step Color Num].

Step Color Num = [From ID] + (([Points]-1)*[Color Step Size])
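In plain Python, those two calculations work out like this (the sample IDs and point count are mine):

```python
def step_color_num(from_id, to_id, points, max_point):
    """Evenly spaced color values from from_id toward to_id along a chord."""
    step = abs(from_id - to_id) / (max_point - 1)   # [Color Step Size]
    return from_id + (points - 1) * step            # [Step Color Num]

# A chord from entity 1 to entity 5, colored with 1,000 lines:
colors = [step_color_num(1, 5, p, 1000) for p in range(1, 1001)]
```

The first line gets exactly the Source ID, the last line lands on the Target ID, and everything in between climbs in equal steps, which is what produces the smooth color transition across the chord.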

And finally, a simple case statement to use the [Step Color Num] value for the inner lines, the [From ID] value for the polygons, and the [Color Range Start] value for the chord borders. Call this [Color].

Color

CASE [Shape Type]
WHEN "Inner_Lines" THEN [Step Color Num]
WHEN "Line_End_Poly" THEN [From ID]
WHEN "Outer_Poly" THEN [From ID]
WHEN "Inner_Lines_Border" THEN [Color Range Start]
END

Now, on the Marks Card, click on the “All” card. Then, right click on the [Color] field, drag that onto Color, and when prompted, choose Color without aggregation. Notice something kind of funky happens when you add the Color.

An image showing the issue caused by sorting on the color measure

Our Border Lines are rising to the top and crossing over all of the Inner Lines, which is not what we want. The reason this is happening is that Tableau uses the order of fields on the Marks Card to sort. When we added [Color], it moved to the top of the list of fields on the Marks Card, so that is the first thing being used to sort. Since our Borders all currently have the lowest value of any mark (based on the parameter we created earlier, it should be .5), those lines are coming to the top.

To fix that, on the Marks Card, on the Final_X_Line card, drag the [Color] field all the way to the bottom. The order of the fields on the Marks Card should be as follows to get the correct sorting.

  1. From ID
  2. To ID
  3. Shape Type
  4. Line ID
  5. Order
  6. Color

Now your chart should look something like this.

An image with the color sorting resolved

It’s starting to come together!

Picking the Right Colors

You could use any of the sequential palettes available in Tableau, and they would work just fine. But for my viz, I wanted unique colors for each of the entities, not different shades of the same color. The palette I ended up using is called "Inferno". You can create any custom palette you would like, but you want to make sure you pick a palette that is "Perceptually Uniform". There is a lot behind these palettes, but basically it means that each color in the palette is close to its neighbors, and the perceived amount of change from one color to the next is consistent throughout the entire palette. Here are some great examples of Perceptually Uniform color scales.

Examples of perceptually uniform color palettes

Here is the color palette that I used, and if you’re not sure how to add custom color palettes to Tableau Desktop, here’s an article from the Tableau Knowledge Base.

<color-palette custom='true' name='Inferno' type='ordered-sequential'>
<color>#F3E660</color>
<color>#FAC72C</color>
<color>#FCA50F</color>
<color>#F98511</color>
<color>#EE6925</color>
<color>#DD513A</color>
<color>#C73E4C</color>
<color>#AE2F5C</color>
<color>#932667</color>
<color>#781C6D</color>
<color>#5C136E</color>
<color>#420F68</color>
<color>#24094F</color>
<color>#090625</color>
<color>#FFFFFF</color>
</color-palette>

Just a few things to keep in mind when building your palette. I would recommend ordering the colors in your preferences file from lightest to darkest. Also, on the very last line, add a line for White (#FFFFFF); this is what is used for the chord border color. And make sure that you set type='ordered-sequential'.

Once you add your custom palette, go ahead and select it from the Color options on the Mark Card. If you ordered your colors from lightest to darkest, you’ll also want to check the “Reversed” box in the color options.

Now that I’ve added my custom palette and applied it to the viz, this is what it looks like.

An image with the custom Inferno palette applied to the chord chart

Just one more note about color. We built that [Color Range Start] parameter to select a value that can be used for coloring the borders. In my sample data, I only had 5 entities, so the largest value in my [Color] field is 5. When I use .5 in that parameter, it gives me the full color range from my palette, and a white border. If you have more or fewer entities, that might not be the case. You may need to play with this parameter a bit to make it work correctly for your chart.

If you pick a number that is too low, you will not get the full color range from your palette. This is what it looks like if I set that value to -1.

An image showing the result when the color range start value is too low

On the other side of that, if you pick a number that is too high, the white from your palette will start to "bleed" into your chart. This is what it looks like if I set the value to .9.

An image showing the result when the color range start value is too high

It’s not an exact science, so just play with that parameter until you have a white border and your full color range. If you don’t have the full color range, increase that parameter value. If the white is bleeding into your chart, decrease that value.

Finishing Touches

We are almost done! Just a few more steps that I would recommend. First, hide the null indicator in the bottom right of the chart, by right clicking on it and selecting “Hide Indicator”. Then clean up your borders and grid lines, and last, but definitely not least, I would recommend creating a calculated field to determine the thickness of your lines. For the gradient lines, we want a very thin line, but for the borders, it should be a little bit thicker. Create a calculated field called [Line Size].

Line Size = IF [Shape Type]="Inner_Lines_Border" THEN 2 ELSE 1 END

Then, on the Marks Card, on the Final_X_Line card, right click on the [Line Size] field, drag it to Size, and then select Line Size without aggregation when prompted. Then, similar to what we did with Color, drag that field all the way down to the bottom on the Marks card, below [Color]. Again, you can play with these to get the look that you want. Double click on the Size Legend, and play with the slider. This is how mine are set.

An image showing the Line Size legend

And that’s it! We are done! If you have made it this far, please let me know on Twitter or LinkedIn. This was the most complicated topic we’ve covered so far, and I would love to get your thoughts, and to see what you came up with. As always, thanks for reading, and check back soon for more fun Tableau tutorials!

Categories
How-To's Tableau Techniques

Fun With Curves in Tableau Part 2+: Controlling Bezier Curves

This post is an add-on to Part 2 of the Fun with Curves in Tableau series.

Recently, after using the post above to create a chord chart, someone had reached out asking if there was a way to tone down the curves, or, as they had put it, to make them less “bouncy”. I started racking my brain trying to figure out how to make that happen. In my mind, the only possible way to change the intensity of the curves would be to calculate a new mid-point for each of the lines, rather than using the center of the circle (0,0).

For the next hour or so, I wrote and tested 9 new calculated fields that would allow users to “smooth” their curves. Excitedly, I jumped back on Twitter to share my update, and found that this person had not only figured out how to do it, but the method they came up with was exponentially easier than what I had done. So a huge thank you to Anne-Sophie Pereira De Sá for figuring out Method 1 below for controlling your curves.

Luckily, the work I had done wasn’t a complete waste, as it led me to two more methods for controlling the intensity of the curves. So in this post we’re going to cover a total of three different approaches depending on what you want your curves to look like.

Also, just want to mention that these methods are really only relevant if you’re building a radial chart, like a chord chart. If you are using the Part 2 tutorial to build something like an Arc Chart, there are much easier methods for controlling the intensity of the curves (you can modify the intensity of the curves by adjusting the Y_Mid value).

If you want to follow along, I have updated the sample workbook for Part 2 of the series, which can be downloaded here.

The Methods

Method 1: Smoothing

The first method will allow you to tone down the intensity of your curves, and it is incredibly easy to implement. It uses a parameter to set the intensity of your curve using values from 0 (standard curve) to 1 (no curve). The height of each curve is changed proportionally (e.g., .1 will reduce the height of each arc by roughly 10% from its standard curve height).

A gif demonstrating Method 1

Method 2: Consistent Scaling

This method allows you to create consistent curves (same arch height) for all of your lines, regardless of where the start and end points are on the circle. It’s much more complicated, but the result is pretty nice. Again, this method uses a parameter with values between 0 and 1 to set the intensity of the curve.

Method 3: Dependent Scaling

The final method allows you to set the arch height based on the length of the line, so shorter lines on the outside of the circle will have a lower arch, and longer lines going through the middle of the circle will have a higher arch. The approach is similar to Method 2, sharing many of the same calculations, and also uses a parameter with values between 0 and 1 to set the intensity of the curves.

Method 1 walk-thru

As I mentioned above, this method is incredibly easy to implement (thank you again Anne). All you need is a new parameter and a few small tweaks to the Bezier X and Y calculations. First, let’s create our parameter.

  • Create a new Parameter called “Smooth Factor”
    • Set Data type to “Float”
    • Set Current Value to 0
    • Set Allowable values to “Range”
    • Set Minimum to 0
    • Set Maximum to 1
    • Set Step size to .1
An image showing the correct parameter settings

Next, let’s modify our calculations. The only difference between the calculations discussed in the original post, and the calculations used here, are the exponents. The original calculation has an exponent of 2 in two sections of the calculation. Anne had figured out that if you replaced this exponent with a 1, you got a straight line, and if you replaced it with anything between a 1 and a 2, it would tone down the curve of the line proportionally.

Here is our original calculation for X

((1-[T])^2*[Circle_X_Start] + 2*(1-[T])*[T]*0+[T]^2*[Circle_X_End])

And here is our new calculation for X

((1-[T])^(2-[Smooth Factor])*[Circle_X_Start] + 2*(1-[T])*[T]*0+[T]^(2-[Smooth Factor])*[Circle_X_End])

The only difference between the two calculations is that we replaced ^2 with ^(2-[Smooth Factor]) in both places. That's it. So here are our new calculated fields for X and Y, keeping the naming consistent with the original post.

Circle_Bezier_X_Smooth = ((1-[T])^(2-[Smooth Factor])*[Circle_X_Start] + 2*(1-[T])*[T]*0+[T]^(2-[Smooth Factor])*[Circle_X_End])

Circle_Bezier_Y_Smooth = ((1-[T])^(2-[Smooth Factor])*[Circle_Y_Start] + 2*(1-[T])*[T]*0+[T]^(2-[Smooth Factor])*[Circle_Y_End])

Now, assuming you had already created your chord chart using the original post, just replace [Circle_Bezier_X] with [Circle_Bezier_X_Smooth] and replace [Circle_Bezier_Y] with [Circle_Bezier_Y_Smooth] and you’re done. Just play around with the parameter to get the intensity of curve that you’re looking for.
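To see why the exponent trick works, here's the smoothed blend as a Python sketch (sample values are mine). At a Smooth Factor of 1, both exponents drop to 1 and the formula collapses to plain linear interpolation, i.e. a straight line between the endpoints:

```python
def bezier_smooth(t, start, end, smooth):
    """Quadratic bezier through (0, 0) with the smoothing exponent.

    smooth = 0 gives the standard curve; smooth = 1 gives a straight line.
    """
    e = 2 - smooth
    return (1 - t) ** e * start + 2 * (1 - t) * t * 0 + t ** e * end

# With smooth = 1 the halfway point is the simple average of the endpoints.
mid = bezier_smooth(0.5, 19.0, -19.0, smooth=1)
```

Note that the endpoints are unaffected for any smooth value (at t=0 and t=1 the middle term vanishes), so only the bow of the curve changes.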

Method 2 walk-thru

As I mentioned earlier, this method is a lot more complicated. It involves calculating a new mid-point for each of our lines using some geometry and trigonometry. But similar to the first method, we’ll need a parameter to set the intensity of our curves, so let’s start there.

  • Create a new parameter called “Consistent Factor”
    • Set Data type to “Float”
    • Set Current Value to .5
    • Set Allowable values to “Range”
    • Set Minimum to 0
    • Set Maximum to 1
    • Set Step size to .1

Now here comes the tricky part. And I am sure there is a much easier solution out there, but this is what I was able to come up with.

As we discussed in the original post, to draw these curved lines, we’re essentially creating a bunch of triangles and letting the Bezier X and Y calcs do the rest of the work.

A gif demonstrating that each line is basically just a triangle

Each of these triangles has a different height depending on where, around the circle, the starting and ending points appear. We're using 0,0 as the mid-point, so all of the triangles meet in the middle. But if we want to apply a consistent curve to all of our lines, then we need to calculate a new mid-point for each of our triangles. And the result of that new mid-point calculation should be a bunch of triangles with the same height.

You may remember from earlier posts that in order to plot points around a circle, we need two inputs: the distance from the center of the circle, and the position around the circle. The position around the circle is much easier to calculate, so let's start there. Let's create a calculated field called "Mid_Point_Position".

Mid_Point_Position = IF([Position_End]-[Position_Start])*360 >180 THEN (([Position_End]-[Position_Start])/2)+[Position_Start]+.5 ELSE (([Position_End]-[Position_Start])/2)+[Position_Start] END

Now this calculation looks a little complicated, but basically what it’s doing is calculating the center of the vertex angle for each of our triangles by comparing the start and end position of each of our lines. Because this is a circle, this comparison might result in an angle that is greater than 180 degrees, which isn’t possible for a triangle. Really, it just means that our starting point and ending point are farther apart (more than .5) so the direction of the triangle flips. So when that happens, we want the inverse of that position, which we can get by adding “.5” to the calculation. Otherwise, we are just subtracting the start position from the end position, splitting the difference, and then adding it back to the start position to get the position that is directly in the middle of the start and end.

Ok, if I haven’t lost you yet, let’s keep moving. Now we need to calculate the distance from the center for each of our new mid-points. To get there, first we need to calculate the height of our existing triangles (that meet in the middle at 0,0). Unfortunately, it’s not that easy. There are a lot of different ways to calculate different parts of a triangle, so we have to work with what we got. We know the length of two sides of the triangle (equal to the radius of the circle), and we know the vertex angle. That alone isn’t enough to calculate the height (as far as I know), but it gets us a step closer. With that information, we can calculate the length of the base.

First, a quick look at the parts of one of our triangles.

An image showing the different parts of a triangle, including the Height, Base, Vertex Angle, and side length

Now let’s calculate the Vertex angle, which is going to be very similar to the position calculation we just did.

Vertex_Angle = IF ([Position_End]-[Position_Start])*360 >180 then 360-(([Position_End]-[Position_Start])*360) else ([Position_End]-[Position_Start])*360 END

Now, with that, along with our circle’s radius, we can calculate the length of the base.

Tri_Base = SQRT([Radius]^2 + [Radius]^2 - 2*[Radius]*[Radius]*COS(RADIANS([Vertex_Angle])))

And now that we know the length of all three sides of the triangle, we can calculate the height.

Tri_Height = SQRT([Radius]^2 - ([Tri_Base]^2/4))
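The three triangle calculations translate to Python like this (a sketch; the function name and sample positions are mine). The base comes from the law of cosines and the height from the Pythagorean theorem:

```python
import math

def triangle(radius, pos_start, pos_end):
    """Vertex angle, base, and height of the triangle formed by a chord.

    Positions are fractions of the circle (0 to 1), as in the workbook.
    """
    angle = (pos_end - pos_start) * 360
    if angle > 180:                      # flipped triangle: use the inverse angle
        angle = 360 - angle
    # Law of cosines for the base, then Pythagoras for the height.
    base = math.sqrt(2 * radius ** 2 - 2 * radius ** 2 * math.cos(math.radians(angle)))
    height = math.sqrt(radius ** 2 - base ** 2 / 4)
    return angle, base, height

# A 90-degree vertex (a quarter of the circle) with radius 20:
angle, base, height = triangle(20, 0.0, 0.25)
```

For that quarter-circle example the base is 20√2 ≈ 28.28 and the height is √200 ≈ 14.14, which you can verify by hand from a right isosceles triangle with legs of 20.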

Now we have everything we need to calculate the distance from the center for each of our mid-points. We want all of our triangles to be the same height, so first we need a consistent height to use. We can get there by multiplying our radius by our newly created [Consistent Factor] parameter. Our radius = 20, so if our factor equals 0, the height will be 0, meaning no curve. If our factor is .5, our height will be 10. Easy.

Now, if we take the height of our existing triangle, and then subtract our new standardized height, that will give us the distance from center, our second and final input.

Con_Mid_Point_Distance = [Tri_Height] - ([Radius]*[Consistent Factor])

A gif showing all triangles with a consistent height

You can see in the gif above that all triangles now have the same height.

Now we have the position and distance from center for our mid points. All that we have to do is calculate the coordinates for the mid-points using those inputs, and then plug those coordinates into our Bezier X and Y formulas.

Con_Circle_X_Mid = [Con_Mid_Point_Distance]* SIN(2*PI() * [Mid_Point_Position])

Con_Circle_Y_Mid = [Con_Mid_Point_Distance]* COS(2*PI() * [Mid_Point_Position])

Circle_Bezier_X_Con = ((1-[T])^2*[Circle_X_Start] + 2*(1-[T])*[T]*[Con_Circle_X_Mid]+[T]^2*[Circle_X_End])

Circle_Bezier_Y_Con = ((1-[T])^2*[Circle_Y_Start] + 2*(1-[T])*[T]*[Con_Circle_Y_Mid]+[T]^2*[Circle_Y_End])

Now, again assuming you had already built your chord chart using the original post, just replace [Circle_Bezier_X] with [Circle_Bezier_X_Con] and replace [Circle_Bezier_Y] with [Circle_Bezier_Y_Con] and play with the parameter to get the curve intensity you want.
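Putting Method 2 together outside of Tableau, a sketch like this computes the new mid-point coordinates end to end (the function name and sample values are mine):

```python
import math

def consistent_mid(radius, pos_start, pos_end, factor):
    """Mid-point that gives every triangle the same height, radius * factor."""
    span = pos_end - pos_start
    flipped = span * 360 > 180           # arc covers more than half the circle
    pos_mid = span / 2 + pos_start + (0.5 if flipped else 0)
    angle = 360 - span * 360 if flipped else span * 360
    base = math.sqrt(2 * radius ** 2 * (1 - math.cos(math.radians(angle))))
    height = math.sqrt(radius ** 2 - base ** 2 / 4)
    dist = height - radius * factor      # [Con_Mid_Point_Distance]
    return (dist * math.sin(2 * math.pi * pos_mid),
            dist * math.cos(2 * math.pi * pos_mid))

# With factor = 0 the mid-point lands on the chord itself, so there is no curve.
x0, y0 = consistent_mid(20, 0.0, 0.25, factor=0)
```

For a quarter-circle chord from (0,20) to (20,0), a factor of 0 puts the mid-point at (10,10), exactly halfway along the straight chord; raising the factor pulls the mid-point toward the center, deepening the curve.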

Method 3 walk-thru

This method is going to be extremely similar to Method 2. The only thing we are changing is the value that we subtract from our triangle height to get the distance from the center for our new mid-points. If you went through Method 2 above, you are already 90% there. Like the other two methods, we need a parameter to control the intensity of our curves. Let's create one called "Dependent Factor".

  • Create a new parameter called “Dependent Factor”
    • Set Data type to “Float”
    • Set Current Value to .5
    • Set Allowable values to “Range”
    • Set Minimum to 0
    • Set Maximum to 1
    • Set Step size to .1

Now we need most of the same calculations that we used in Method 2. I’ll list them here again in case you haven’t already tried that method, but for explanations of these calcs, please read through method 2 in the previous section.

Mid_Point_Position = IF([Position_End]-[Position_Start])*360 >180 THEN (([Position_End]-[Position_Start])/2)+[Position_Start]+.5 ELSE (([Position_End]-[Position_Start])/2)+[Position_Start] END

Vertex_Angle = IF ([Position_End]-[Position_Start])*360 >180 then 360-(([Position_End]-[Position_Start])*360) else ([Position_End]-[Position_Start])*360 END

Tri_Base = SQRT([Radius]^2 + [Radius]^2 - 2*[Radius]*[Radius]*COS(RADIANS([Vertex_Angle])))

Tri_Height = SQRT([Radius]^2 - ([Tri_Base]^2/4))

Here is the main difference between this method and the previous one. Instead of multiplying our factor (parameter) by a consistent value, like radius, we are going to multiply it by the base length of our triangles. This will result in a higher value for longer lines, and a lower value for shorter lines. Just want to point out that I divided the base length by 2 in these calculations, to tone down the curves a little more, but that is optional. Here is the calculation for the distance from center for each of our new mid-points.

Dep_Mid_Point_Distance = [Tri_Height] - (([Tri_Base]/2)*[Dependent Factor])

Similar to the gif in Method 2, the heights of the triangles are now dependent on the length of their base.

Now we just need to calculate those new mid-points and plug them into our bezier X and Y calculations.

Dep_Circle_X_Mid = [Dep_Mid_Point_Distance]* SIN(2*PI() * [Mid_Point_Position])

Dep_Circle_Y_Mid = [Dep_Mid_Point_Distance]* COS(2*PI() * [Mid_Point_Position])

Circle_Bezier_X_Dep = ((1-[T])^2*[Circle_X_Start] + 2*(1-[T])*[T]*[Dep_Circle_X_Mid]+[T]^2*[Circle_X_End])

Circle_Bezier_Y_Dep = ((1-[T])^2*[Circle_Y_Start] + 2*(1-[T])*[T]*[Dep_Circle_Y_Mid]+[T]^2*[Circle_Y_End])

Again, assuming you had already built your chord chart using the original post, or one of the previous methods, you just need to replace [Circle_Bezier_X] with [Circle_Bezier_X_Dep] and replace [Circle_Bezier_Y] with [Circle_Bezier_Y_Dep] and play with the parameter to get the curve intensity you are looking for.

That’s it. Three methods for controlling the intensity of your bezier curves. These certainly aren’t necessary when building out your chord charts, but they do produce some nice effects if you are looking to really customize your visualization. Personally, I lean towards Method 1 because of it’s simplicity. If you are looking for something more complicated, Method 2 and Method 3 are both nice, but if I had to pick, I would go with #3.

As always, thank you so much for reading and keep an eye on our blog for more fun Tableau tutorials.

Categories
How-To's It Depends Tableau Techniques

It Depends: KPI Swapping with BAN Selectors

Welcome to another installment of "It Depends". In this post we're going to look at two different ways to use BANs to swap KPIs in your dashboard. If you're not familiar with the term "BANs", we're talking about the large summarized numbers, or Big Ass Numbers, that are common in business dashboards.

When I build a KPI dashboard, I like to give my users the ability to dig into each and every one of their key metrics, and the techniques we cover in this post are a great way to provide that kind of flexibility. Here is a really simple example of what we’re talking about.

A gif demonstrating how the measures change when each BAN is selected

In the dashboard above, we have four BANs across the top: Sales, Quantity, Profit, and Profit Ratio. Below that, we have a bar chart by Sub-Category and a line chart showing the current vs. previous year. When a user clicks on any of the BANs in the upper section, the bar chart and the line chart will both update to display that metric. A few other things change along with the metric: the dashboard title, the chart titles, and the formatting on all of the labels.

We’re going to cover two different methods for building this type of flexible KPI dashboard. A lot of what we cover is going to be the same, regardless of which method you choose, but there are some pretty big differences in how both the BANs and the Dashboard Actions are constructed in each method.

For this exercise we’re going to use Superstore Data, so you can connect to that source directly in Tableau Desktop. If you would like to follow along in the Sample Workbook, you can download that here.

The Two Methods

Measure Names/Values – In the first method we’re going to use Measure Names and Measure Values to build our BANs. When a user clicks on one of the Measure Values, we will have a dashboard action that passes the Measure Name to a parameter.

Individual BANs – In the second method, we’re going to use separate worksheets for each one of our BANs. When a user clicks on one of the BANs, we’ll pass a static value that identifies that metric (similar to the Measure Name) to a parameter. With this method, we’ll need a separate dashboard action for each of our BANs.

Method Comparison

So at this point you may be wondering: why would you waste time building out separate worksheets and separate dashboard actions when it can all be done with a single sheet and a single action? Fair question. As you’re probably aware, Measure Names and Measure Values cannot be used in calculated fields, so with the Measure Names/Values method, you are going to be pretty limited in what you can do with your BANs. Let’s take another look at the BANs in the example dashboard from earlier.

An image of BANs with growth indicators and color applied

Numbers alone aren’t always very helpful. It’s important to have context, something to compare those numbers to. Anytime I put a BAN on a dashboard, I like to add some kind of indicator, like percent to a goal, or growth versus previous periods. Another thing I like to do is to use color to make it very obvious which metric is selected and being displayed in the rest of the dashboard. Neither of these are possible with the first method as they both require calculated fields that reference either the selected measure or the value of that measure.

Unlike some of our other “It Depends” posts, the decision here is pretty easy.

A decision tree for which method to use. If you want to add anything other than the measure to your BAN, or want to apply color to show the selection, use Method 2, otherwise you can use Method 1

Method 2 does take a little more time to set up, but in my opinion, it’s usually the way to go. Beyond the two decision points above, the second method also provides a lot more flexibility when it comes to formatting. But if you’re looking for something quick and these other considerations aren’t all that important to you or your users, by all means, go with the first one.

Methods in Practice

This section is going to focus only on building the BANs and setting up the dashboard actions. We’ll walk through how to do that with both of the methods first, and then we’ll move onto setting up the rest of the dashboard, since those steps will be the same for both methods.

Before we get started, let’s build out just a couple of quick calculations that we’ll be using in one or both methods.

First, let’s calculate the most recent date in our data source. Often, in real world scenarios, you’ll be able to use TODAY() instead of the most recent date, but since this Superstore Data only goes through the end of 2021, we’re going to calculate the latest date.

Max Date: {FIXED : MAX([Order Date])}

Now, let’s calculate the Relative Year for each date in our data source. So everything in the most recent year will have a value of 0, everything in the previous year will have a value of -1, and so on.

Relative Year: DATEDIFF("year",[Max Date],[Order Date])

And lastly, we’re working with full years of data here, but that’s usually not the case. In my BANs, I want to be able to show a Growth Indicator, but in a real world scenario, that growth should be based on the value of that metric at the same point in time during the previous year. So let’s build a Year to Date Filter.

Year to Date Filter: DATEDIFF("day",DATETRUNC("year",[Order Date]),[Order Date])<=DATEDIFF("day",DATETRUNC("year",[Max Date]),[Max Date])

That calculation determines the day of the year for each order (by comparing the order date to the start of that year), and then compares it to the day of the year of the most recent order. Again, in a real world scenario, you would probably use TODAY() instead of the [Max Date] calculation.
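If it helps to see the logic outside of Tableau, here is a small Python sketch of the same date calculations. The function and variable names are just for illustration, not anything in the workbook.

```python
from datetime import date

def relative_year(order_date, max_date):
    # DATEDIFF("year", ...) counts calendar-year boundaries, so every date
    # in the latest year returns 0, the prior year -1, and so on
    return order_date.year - max_date.year

def year_to_date_filter(order_date, max_date):
    # Keep only orders that fall on or before the latest order's day of year
    day_of_year = lambda d: (d - date(d.year, 1, 1)).days
    return day_of_year(order_date) <= day_of_year(max_date)
```

So with a max date of 2021-06-15, an order placed on 2020-06-01 has a relative year of -1 and passes the year-to-date filter, while an order placed on 2020-12-31 does not.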

Finally, we need one parameter that will store our selection when we click on any of the BANs. Create a parameter, call it “Selected Metric”, set the Data Type to “String”, and set the Current Value to “Sales”.

An image showing what the Parameter settings should look like

Ok, that’s enough for now, let’s start building.

Measure Names/Values

Follow the steps below to build your BANs using the Measure Names/Values method. I’m going to provide the steps on how the ones in the sample dashboard were built, but feel free to format however you would like.

Building

  • Right click on [Relative Year] and select “Convert to Dimension”
  • Drag [Relative Year] to filter shelf and filter on 0 (for current year)
  • Drag Measure Names to Columns
  • Drag Measure Values to Text
  • Drag Measure Names to Filter and select Sales, Quantity, Profit, and Profit Ratio
  • Right click on Measure Names on the Column Shelf and de-select “Show Header”
  • Drag Measure Names to “Text” on the Marks Card

Formatting

  • Change Fit to “Entire View”
  • Click on “Text” on the Marks Card and change the Horizontal Alignment to Center
  • Click on “Text” on the Marks Card, click on the ellipses next to “Text” and format
    • Position Measure Names above Measure Values and set font size to 12
    • Change font size of Measure Values to 28
    • Set desired color
  • On the Measure Values Shelf, below the Marks Card, right click on each Measure and format appropriately (Currency, Percentage, etc.)
  • Go to Format > Borders and add Column Dividers (increase Level to get dividers between BANs)
  • Click on Tooltip on the Marks Card and de-select all checkboxes to “turn off”

When you’re done building and formatting your BANs, your worksheet should look something like this

An image showing what the BANs worksheet should look like with Method 1

Now we just need to add this to our dashboard, and then add a Parameter Action that will pass the Measure Name from our BANs worksheet to our [Selected Metric] parameter.

  • Go to Dashboard > Actions and click “Add Action”
  • Select “Change Parameter” when prompted
  • Give your Parameter Action a descriptive Name
  • Under Source Sheets, select the BANs worksheet that you created in the previous steps
  • Under Target Parameter, select the “Selected Metric” parameter we created earlier
  • Under Source Field, select “Measure Names”
  • Under Aggregation, select “None”
  • Under Run Action on, choose “Select”
  • Under Clearing the Selection Will, select “Keep Current Value”

The Parameter Action should look something like this.

An image showing what the Parameter Action settings should look like

One last formatting recommendation I would make is to use one of the methods described in this post to remove the blue box highlight when you click on one of the BANs: either the Filter Technique or the Transparent Technique.

So that’s it for this method…for now. We’re going to switch over to setting up the BANs and dashboard actions for Method 2 first, and then we’ll regroup and walk through the rest of the dashboard setup. If you plan on using Method 1, please skip ahead to the “Setting up the Dashboard” section below.

Individual BANs

Follow the steps below to build your BANs using the Individual BANs method. I’m going to walk through how to build one of the BANs, and then you’ll need to repeat that process for each one in your dashboard. A couple of other things we’ll do in these BANs include adding growth indicators vs the previous year, and adding color to show when that BAN’s measure is selected/not selected. And as I mentioned in the previous example, I’m going to cover how I formatted these BANs in the sample workbook, but feel free to format however you see fit.

Let’s start by building our “Sales” BAN.

Building

  • Right click on [Relative Year] and select “Convert to Dimension”
  • Drag [Relative Year] to filter shelf and filter on 0 and -1 (for current and prior year)
  • Drag [Year to Date Filter] to filter shelf and filter on True
  • Drag Relative Year to Columns and make sure that 0 is to the right of -1
  • Drag your Measure (Sales) to Text on the Marks Card
  • Add Growth Indicator
    • Drag your Measure (Sales) to Detail
    • Right click and select “Add Table Calculation”
    • Under Calculation Type, select “Percent Difference From”
    • Next to “Relative To”, make sure that “Previous” is selected
    • Drag the Measure with the Table Calculation from Detail onto Text on the Marks Card
  • Right click on the “-1” in the Header and select “Hide”
  • Right click on Relative Year on the Column Shelf and de-select “Show Header”
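The growth indicator added above relies on Tableau’s “Percent Difference From” quick table calculation, which divides the change by the absolute value of the comparison (previous) value. A minimal Python sketch of that formula:

```python
def percent_difference_from(previous, current):
    # Tableau's "Percent Difference From" quick table calculation:
    # (current - previous) / ABS(previous)
    return (current - previous) / abs(previous)
```

For example, prior-year sales of 100 and current-year sales of 125 gives 0.25, which the growth indicator will display as a 25.0% increase once we format it below.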

Formatting

  • Change Fit to “Entire View”
  • Click on “Text” on the Marks Card and change the Horizontal Alignment to Center
  • Click on “Text” on the Marks Card, click on the ellipses next to “Text” and format
    • Insert a line above your Measure and add a label for it (ex. “Sales”). Set font size to 12.
    • Change font size of the measure (ex. SUM(Sales)) to 28
    • Change font size of growth indicator (ex. % Difference in SUM(Sales)) to 14
  • Right click on your measure on the Marks Card and format appropriately (for Sales, set to Currency)
  • Right click on your growth measure on the Marks Card, select Format, select Custom, and then paste in the string below
    • ▲ 0.0%;▼ 0.0%; 0%
    • When the growth is positive, this will display an upward-facing triangle, along with a percentage set to 1 decimal place
    • When the growth is negative, this will display a downward-facing triangle, along with a percentage set to 1 decimal place
    • When there is 0 growth, this will display 0% with no indicator
  • Click on Tooltip on the Marks Card and de-select all checkboxes to “turn off”
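That three-part format string follows Tableau’s positive;negative;zero custom format pattern. As a rough Python analogue (purely illustrative, and note the negative section formats the magnitude while the arrow supplies the sign):

```python
def growth_label(value):
    # Mirrors the custom format "▲ 0.0%;▼ 0.0%; 0%"
    if value > 0:
        return f"\u25b2 {value:.1%}"       # upward triangle, 1 decimal place
    elif value < 0:
        return f"\u25bc {abs(value):.1%}"  # downward triangle, magnitude only
    return "0%"                            # zero growth, no indicator
```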

When you’re done building and formatting your Sales BAN, it should look something like this.

An image showing what the Sales BAN worksheet should look like

There are a couple of additional steps before we move on to the dashboard actions. First, we need a field that we can pass from this BAN to our parameter. Create a calculated field called “Par Value – Sales”, and in the calculation, just type the word “Sales” (with quotes).

An example of the Par Value calculation

Par Value – Sales: "Sales"

And then drag the [Par Value – Sales] field to Detail on your Sales BAN worksheet.

Just a quick note here. If I was building this for a client, I would probably use a numeric parameter, and pass a number from this BAN instead of a text value. It’s a little cleaner and better for performance, but for simplicity and continuity, we’ll use the same parameter we used in Method 1. Ok, back to it.

Now, we need one more calculated field to test if this measure is the currently selected one. This is just a simple boolean calc, and we’ll call it “Metric Selected – Sales”.

Metric Selected – Sales: [Selected Metric]="Sales"

Now drag that field to Color on your Sales BAN worksheet. Set the [Selected Metric] Parameter to “Sales” (so the result of the calculation is True) and assign a Color. Now, set the [Selected Metric] Parameter to anything else (so the result of the calculation is False) and assign a color.

Now that our Sales BAN is built, we just need to add it to our dashboard and then add a Parameter Action that will pass our [Par Value – Sales] field to our [Selected Metric] parameter when a user clicks on the Sales BAN.

  • Go to Dashboard > Actions and click “Add Action”
  • Select “Change Parameter” when prompted
  • Give your Parameter Action a descriptive Name
  • Under Source Sheets, select the Sales BAN worksheet that you created in the previous steps
  • Under Target Parameter, select the “Selected Metric” parameter we created earlier
  • Under Source Field, select [Par Value – Sales]
  • Under Aggregation, select “None”
  • Under Run Action on, choose “Select”
  • Under Clearing the Selection Will, select “Keep Current Value”

The Parameter Action should look something like this.

An image showing what the parameter action settings should look like

And just like with Method 1, I would recommend using one of the methods described in this post to remove the blue box highlight when you click on the Sales BAN. You could use either the Transparent or the Filter Technique, but with this method, I would really recommend the Filter Technique.

Now, repeat every step from the “Individual BANs” header above to this step, for each of your BANs. I warned you it would take a little longer to set up, but it’s totally worth it. And once you’re comfortable with this technique, it moves very quickly. To save some time, you can probably duplicate your Sales BAN worksheet and swap out some of the metrics and calculations, but be careful you don’t miss anything.

Setting up the Dashboard

Now our BANs are built and our dashboard actions are in place. Whichever method you chose, the rest of the build is the same. We just have a few steps left to finish our flexible KPI dashboard. Here’s what we’re going to do next.

  • Adjust our other worksheets to use the selected metric
  • Dynamically format the measures in our labels and tooltips
  • Update our Headers and Titles to display the selected metric

Show Selected Metric in Worksheets

The first thing we need to do here is to create a calculated field that will return the correct measure based on what is in the parameter. So users will click on a BAN, let’s say “Sales”. The word “Sales” will then get passed to our [Selected Metric] parameter. Then our calculation will test that parameter, and when that parameter’s value is “Sales”, we want it to return the value of the [Sales] Measure. Same thing for Quantity, Profit, etc. So let’s create a CASE statement with a test for each of our BAN measures, and call it “Metric Calc”.

Metric Calc

CASE [Selected Metric]
WHEN "Sales" THEN SUM([Sales])
WHEN "Quantity" THEN SUM([Quantity])
WHEN "Profit" THEN SUM([Profit])
WHEN "Profit Ratio" THEN [Profit Ratio]
END
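In Python terms, the CASE statement is just a dispatch on the parameter value. Here is a sketch using a list of dicts as a stand-in for Superstore rows, and assuming [Profit Ratio] is defined as SUM(Profit)/SUM(Sales), as in the standard Superstore workbook:

```python
def metric_calc(selected_metric, rows):
    # Each row is a dict with "Sales", "Quantity", and "Profit" keys
    total = lambda field: sum(row[field] for row in rows)
    if selected_metric == "Sales":
        return total("Sales")
    if selected_metric == "Quantity":
        return total("Quantity")
    if selected_metric == "Profit":
        return total("Profit")
    if selected_metric == "Profit Ratio":
        return total("Profit") / total("Sales")
    return None  # a CASE with no matching WHEN returns NULL
```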

Now, we just need to use this measure in all of our worksheets, instead of a static measure. In our Bar Chart, we’re going to drag this measure to Columns. In our Line Chart, we’re going to drag this measure to Rows.

An image showing the Metric Calc field being used in the dynamic charts in the dashboard

Now, whenever you click on a BAN in the dashboard, these charts will reflect the measure that you clicked on. Pretty cool right? But there is a glaring problem that needs to be addressed.

Dynamic Formatting on Labels/Tooltips

In our example, we have 4 possible measures that could be viewed in the bar chart and line chart: Sales, Quantity, Profit, and Profit Ratio. So 4 possible measures, with 3 different number formats.

  • Sales = Currency
  • Quantity = Whole Number
  • Profit = Currency
  • Profit Ratio = Percentage

At the time of writing this post, Tableau only allows you to assign one number format per measure. But luckily, as with all things Tableau, there is a pretty easy way to do what we want. We’re going to create one calculated field for each potential number format: Currency, Whole Number, and Percentage.

Metric Label – Currency: IF [Selected Metric]="Sales" OR [Selected Metric]="Profit" THEN [Metric Calc] END

Metric Label – Whole: IF [Selected Metric]="Quantity" THEN [Metric Calc] END

Metric Label – Percentage: IF [Selected Metric]="Profit Ratio" THEN [Metric Calc] END

Here’s how these calculations are going to work together. When a user selects:

  • Sales
    • [Metric Label – Currency] = [Metric Calc]
    • [Metric Label – Whole] = Null
    • [Metric Label – Percentage] = Null
  • Quantity
    • [Metric Label – Currency] = Null
    • [Metric Label – Whole] = [Metric Calc]
    • [Metric Label – Percentage] = Null
  • Profit
    • [Metric Label – Currency] = [Metric Calc]
    • [Metric Label – Whole] = Null
    • [Metric Label – Percentage] = Null
  • Profit Ratio
    • [Metric Label – Currency] = Null
    • [Metric Label – Whole] = Null
    • [Metric Label – Percentage] = [Metric Calc]

No matter what BAN is selected, ONE of these calculations will return the appropriate value and TWO of these calculations will be Null. In Tableau, Null values do not occupy any space in labels and tooltips. So all we need to do is line up these three calculations together. The two Null values will collapse, leaving just the populated value. Here’s how you do that.

  • Go to the Metric by Sub-Category sheet (or one of the dynamic charts in your workbook)
  • If [Metric Calc] is on Label, remove it
  • Format your measures
    • Right click on [Metric Label – Currency] in the Data Pane, select “Default Properties” and then “Number Format”
    • Set the appropriate format
    • Repeat for [Metric Label – Whole] and [Metric Label – Percentage]
  • Drag all 3 [Metric Label] fields to “Label” on the Marks Card
  • Click on “Label” on the Marks Card and click on the ellipses next to “Text”
  • Align all of the [Metric Label] fields on the first row in the Edit Label box, with no spaces between fields (see image below)

Your Label should look like this.

An image showing the correct layout, with all of the dynamic labels next to each other

If all of your calculations and default formatting are correct, your Chart labels should now be dynamic. When you click on Sales or Profit, your labels should show as currency. When you click on Quantity, your labels should show as whole numbers. And when you click on Profit Ratio, your labels should show as Percentages. You can repeat this same process for all of your Labels and Tooltips.
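The null-collapsing trick can be sketched in Python as well: exactly one of the three label fields is non-null, so lining them up yields a single, correctly formatted value. The specific format strings here are illustrative.

```python
def metric_label(selected_metric, value):
    # One field per number format; the other two stay None (Tableau's NULL)
    currency = f"${value:,.0f}" if selected_metric in ("Sales", "Profit") else None
    whole = f"{value:,.0f}" if selected_metric == "Quantity" else None
    percentage = f"{value:.1%}" if selected_metric == "Profit Ratio" else None
    # Nulls occupy no space in a Tableau label; skipping None mimics that
    return "".join(s for s in (currency, whole, percentage) if s is not None)
```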

Displaying Parameter in Titles

Saving the easiest for last! One last thing you’ll want to do is to update your chart titles so that they describe what is in the view. The view is going to be changing, so the titles need to change as well. Luckily this is incredibly easy.

  • Double click on any chart title, or text box being used as a title
  • Place your cursor where you would like the [Selected Metric] value to appear
  • In the top right corner, select “Insert”
  • Choose the [Selected Metric] parameter

It should look like this. Just repeat for all of your other titles (this works in Tooltips as well).

An image demonstrating how to insert the parameter into a chart title

Finally, I think we’re done! Those are two different methods for building really flexible KPI dashboards. This is a model that I use all the time, and users always love it! It’s incredibly powerful and just a really great way to let your users explore their data without having to build different dashboards for each of your key indicators.

As always, thank you so much for reading, and see you next time!

Categories
How-To's Tableau Techniques

Robinson Projection in Tableau

When it comes to visualizing data, it’s no secret that the Mercator Projection has its issues. Certain countries, especially in the northern hemisphere, appear much larger than they are in reality, and it gets worse the farther you move from the equator. Just look at the difference in the size of Greenland between the Mercator and Robinson projections.

That’s because this particular projection was created for one specific purpose…to aid in navigation. And it is perfect for that purpose, but not so perfect for visualizing data. But when it comes to Tableau, it’s our only option…or is it?

I’d like to use this opportunity to point out that I did not come up with the methods discussed in this blog post, but they are techniques that I use pretty frequently, and something I get asked about a lot. So the goal of this post is to combine information from various sources, provide you with all of the files needed to create your own Robinson projections, and walk through, step by step, multiple approaches to building these maps in Tableau. For more information, I recommend checking out this blog post by Ben Jones, and this post by John Emery.

I also want to use this opportunity to point out that these methods are more complicated and less performant than using Tableau’s standard mapping, so I would recommend using them with caution.

Different Methods

This post is going to cover three different methods for creating Robinson Projection maps in Tableau.

  • Using a Shapefile
  • Using a CSV File (for countries)
  • Using a CSV File (for cities)

You can find all of the files needed for these methods here, as well as a sample workbook with all of the maps built.

Method 1: Using a Shapefile

This is by far the easiest method, but it’s a little less flexible than using a CSV File. To get started, download the following 4 files from the link above, and place them all in the same folder somewhere on your machine.

  • Country_ShapeFile_Robinson.DBF
  • Country_ShapeFile_Robinson.PRJ
  • Country_ShapeFile_Robinson.shp
  • Country_ShapeFile_Robinson.SHX

Now, let’s set up our data source.

Building the Data Source

One quick note before we start. This particular Shapefile only has the 2-digit codes for countries. In order to relate this file to your data, you’ll need to make sure that your source also contains these codes (I would not recommend trying to relate on country name). You can find a full list of codes here. Now let’s build our data source.

  • Connect to your Data
    • If you do not have data and just want to practice these techniques, you can use the Happiness_Scores.xlsx (from the World Happiness 2022 Report) file in the Google Drive
  • Connect to the Shapefile
    • From the Data Source Page, click “Add” at the top left to add a new connection
    • In the “To a File” options, select “Spatial File”
    • Navigate to the location where you stored the files above and select the Country_ShapeFile_Robinson.shp file
    • Drag the Shapefile into the data source pane
  • Update the relationship
    • Click on the noodle connecting the two sources
    • In the lower section, select the country code in each file (called ISO in Shapefile)

When complete, your data source should look like this

Building the View

  • Create a new worksheet
  • Double-click on the [Geometry] field
  • Add the 2-digit country code from your primary source onto Detail (in the sample data it’s called ISO A2)
  • Add whatever measure you would like to Color and assign your desired palette (in this example, I placed [Happiness Score] on color and assigned a Viridis palette)

When complete, your worksheet should look something like this.

Well, that’s a bit strange, isn’t it? Here’s an extremely quick explanation of what’s happening. Tableau only supports the Mercator projection. Also, in Tableau, you are limited to just the Map Mark Type for Shapefiles. When you add the [Geometry] field, Tableau generates the latitude and longitude for each polygon in your Shapefile and plots them on the map. But the coordinates that make up those polygons in our Shapefile are for a Robinson projection. So here you can see our Robinson Projection overlaid on Tableau’s Mercator Projection. There’s some magic happening behind the scenes in those Shapefiles that allows this to happen. You can read more about it in John’s post.

So now, let’s get rid of the Mercator Projection (or at least hide it).

First, you’ll want to remove the “Search” Map option. This search is based on the positions of countries in the Mercator Projection, not our new Robinson Projection, so if you try to search for a country, it’s going to bring you to the wrong place. I’d recommend disabling all of the Map options, but that’s up to you.

  • In the upper toolbar, click on Map
  • Click on Map Options
  • Deselect the box labelled “Show Map Search”
  • Optional – Deselect all other options

Next we’ll want to remove all of the Map Layers

  • In the upper toolbar, click on Map
  • Click on Map Layers
  • Deselect all of the boxes in the Map Layers Window along the left side of the screen

The Map Layers Window should look like this

Finally, let’s remove our Worksheet Shading, just in case we want to put this on a colored dashboard, or lay it over a background image

  • In the upper toolbar, click on Format
  • Click on Shading
  • Change the first option (on the Sheet tab, under Default, labelled “Worksheet”) from White to None

And that should do it! At this point, your map should look something like this.

Wait a second…something’s still not right. Antarctica has no permanent population, so how did they respond to the World Happiness Survey? Well, they didn’t. In fact, there are several countries that are showing up on this map as yellow that aren’t in the World Happiness data. But because we’re using relationships, and because these countries exist in the Shapefile, they are being shown on the map and colored the same as the lowest value in the data source. This can be misleading, so let’s get rid of those.

  • Drag the measure (the same one currently on color) to the filter shelf
  • Click on “Special”
  • Select “Non-Null” Values

Ok, now we should be all set.

I’ve got to be honest, something about this still bothers me. I would love to be able to see the countries that are missing data, but unfortunately, Tableau does not have the capability to ignore null values in the color application. But we have a few options.

Option 1: Create bins off of your measure and assign a color to each bin. Null will have its own bin

Option 2: The same as option 1 but use a calculated field to set thresholds instead of creating bins

Option 3: Duplicate our map worksheet, remove the measure from filter, remove the measure from color, set the color to how we want the “missing” values to appear, and then stack these two maps on top of each other on our dashboard (set to floating and set x, y, w, and h the same). The result would look something like this

Much better! Any of these methods will work, but I’m partial to the 3rd. This is mainly because you can use a diverging or continuous color palette instead of having to figure out which colors to assign to each “bin” with Option 1 and 2.

Now let’s move on to the CSV File method

Method 2: Using a CSV File (for countries)

This method has a lot of similarities to the first one, so I won’t go into too much detail. The two main differences are: we are going to use a csv instead of a shapefile, and we are going to use the polygon mark type instead of a map.

To get started, download the CountryShapes.csv file from the Google Drive.

Building the Data Source

We’re going to set up our data source the same way as we did in Method 1. Connect to your data, add a connection for the csv file, and set up the relationship. This csv file, unlike the shapefile, has a ton of different country identifier fields. Do some prep work in advance to figure out which of these identifiers match what is in your data source, and set up the relationship using that field. When you’re finished, it should look something like this.

Building the View

Building the view is a little more complicated with this method. This csv file has the coordinates to “draw” all of the countries. But take a look at what happens if we try to use those coordinates as-is.

Now that looks an awful lot like a Mercator Projection. Well, that’s because it is. This file has all of the coordinates to “draw” each country, but they are based on the Mercator Projection. So we need to re-calculate those latitude and longitude values. We’re going to build two new calculated fields, R_Lat (for our new latitude), and R_Lon (for our new longitude).

R_Lat

IF [Latitude]=0 THEN [Latitude]

ELSEIF ABS([Latitude])<5 THEN 0.5072 * (0+(0.0620-0) * ((ABS([Latitude])-0)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<10 THEN 0.5072 * (0.0620+(0.1240-0.0620) * ((ABS([Latitude])-5)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<15 THEN 0.5072 * (0.1240+(0.1860-0.1240) * ((ABS([Latitude])-10)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<20 THEN 0.5072 * (0.1860+(0.2480-0.1860) * ((ABS([Latitude])-15)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<25 THEN 0.5072 * (0.2480+(0.3100-0.2480) * ((ABS([Latitude])-20)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<30 THEN 0.5072 * (0.3100+(0.3720-0.3100) * ((ABS([Latitude])-25)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<35 THEN 0.5072 * (0.3720+(0.4340-0.3720) * ((ABS([Latitude])-30)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<40 THEN 0.5072 * (0.4340+(0.4958-0.4340) * ((ABS([Latitude])-35)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<45 THEN 0.5072 * (0.4958+(0.5571-0.4958) * ((ABS([Latitude])-40)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<50 THEN 0.5072 * (0.5571+(0.6176-0.5571) * ((ABS([Latitude])-45)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<55 THEN 0.5072 * (0.6176+(0.6769-0.6176) * ((ABS([Latitude])-50)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<60 THEN 0.5072 * (0.6769+(0.7346-0.6769) * ((ABS([Latitude])-55)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<65 THEN 0.5072 * (0.7346+(0.7903-0.7346) * ((ABS([Latitude])-60)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<70 THEN 0.5072 * (0.7903+(0.8435-0.7903) * ((ABS([Latitude])-65)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<75 THEN 0.5072 * (0.8435+(0.8936-0.8435) * ((ABS([Latitude])-70)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<80 THEN 0.5072 * (0.8936+(0.9394-0.8936) * ((ABS([Latitude])-75)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<85 THEN 0.5072 * (0.9394+(0.9761-0.9394) * ((ABS([Latitude])-80)/5)) * SIGN([Latitude])

ELSEIF ABS([Latitude])<90 THEN 0.5072 * (0.9761+(1-0.9761) * ((ABS([Latitude])-85)/5)) * SIGN([Latitude])

END

R_Lon

IF [Latitude]=0 THEN [Longitude]

ELSEIF ABS([Latitude])<5 THEN [Longitude] * (1-((ABS([Latitude])-0)/5) * (1-0.9986))

ELSEIF ABS([Latitude])<10 THEN [Longitude] * (0.9986-((ABS([Latitude])-5)/5) * (0.9986-0.9954))

ELSEIF ABS([Latitude])<15 THEN [Longitude] * (0.9954-((ABS([Latitude])-10)/5) * (0.9954-0.9900))

ELSEIF ABS([Latitude])<20 THEN [Longitude] * (0.9900-((ABS([Latitude])-15)/5) * (0.9900-0.9822))

ELSEIF ABS([Latitude])<25 THEN [Longitude] * (0.9822-((ABS([Latitude])-20)/5) * (0.9822-0.9730))

ELSEIF ABS([Latitude])<30 THEN [Longitude] * (0.9730-((ABS([Latitude])-25)/5) * (0.9730-0.9600))

ELSEIF ABS([Latitude])<35 THEN [Longitude] * (0.9600-((ABS([Latitude])-30)/5) * (0.9600-0.9427))

ELSEIF ABS([Latitude])<40 THEN [Longitude] * (0.9427-((ABS([Latitude])-35)/5) * (0.9427-0.9216))

ELSEIF ABS([Latitude])<45 THEN [Longitude] * (0.9216-((ABS([Latitude])-40)/5) * (0.9216-0.8962))

ELSEIF ABS([Latitude])<50 THEN [Longitude] * (0.8962-((ABS([Latitude])-45)/5) * (0.8962-0.8679))

ELSEIF ABS([Latitude])<55 THEN [Longitude] * (0.8679-((ABS([Latitude])-50)/5) * (0.8679-0.8350))

ELSEIF ABS([Latitude])<60 THEN [Longitude] * (0.8350-((ABS([Latitude])-55)/5) * (0.8350-0.7986))

ELSEIF ABS([Latitude])<65 THEN [Longitude] * (0.7986-((ABS([Latitude])-60)/5) * (0.7986-0.7597))

ELSEIF ABS([Latitude])<70 THEN [Longitude] * (0.7597-((ABS([Latitude])-65)/5) * (0.7597-0.7186))

ELSEIF ABS([Latitude])<75 THEN [Longitude] * (0.7186-((ABS([Latitude])-70)/5) * (0.7186-0.6732))

ELSEIF ABS([Latitude])<80 THEN [Longitude] * (0.6732-((ABS([Latitude])-75)/5) * (0.6732-0.6213))

ELSEIF ABS([Latitude])<85 THEN [Longitude] * (0.6213-((ABS([Latitude])-80)/5) * (0.6213-0.5722))

ELSEIF ABS([Latitude])<90 THEN [Longitude] * (0.5722-((ABS([Latitude])-85)/5) * (0.5722-0.5322))

END
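Both fields implement the same idea: a piecewise-linear interpolation over the Robinson projection’s coefficient table, sampled at 5-degree latitude bands. Here is a compact Python sketch of the equivalent transform, using the same coefficients as the calculated fields above. (Unlike the ELSEIF chains, this sketch also handles a latitude of exactly ±90, which the calculated fields leave NULL.)

```python
# Y scales latitude, X scales longitude; one entry per 5 degrees from 0 to 90
Y = [0.0, 0.0620, 0.1240, 0.1860, 0.2480, 0.3100, 0.3720, 0.4340, 0.4958,
     0.5571, 0.6176, 0.6769, 0.7346, 0.7903, 0.8435, 0.8936, 0.9394, 0.9761, 1.0]
X = [1.0, 0.9986, 0.9954, 0.9900, 0.9822, 0.9730, 0.9600, 0.9427, 0.9216,
     0.8962, 0.8679, 0.8350, 0.7986, 0.7597, 0.7186, 0.6732, 0.6213, 0.5722, 0.5322]

def robinson(lat, lon):
    """Return (r_lat, r_lon) for a lat/lon pair, matching R_Lat and R_Lon."""
    a = abs(lat)
    band = min(int(a // 5), 17)   # which 5-degree band the latitude falls in
    frac = (a - 5 * band) / 5     # position within that band, 0..1
    y = Y[band] + (Y[band + 1] - Y[band]) * frac
    x = X[band] + (X[band + 1] - X[band]) * frac
    sign = 1 if lat >= 0 else -1
    return 0.5072 * y * sign, lon * x
```

The 0.5072 factor is just the vertical scaling used by these particular calculated fields; the shape of the projection comes from the coefficient table.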

Ok, so those calculations are a bit intense. But luckily, they are the only calculations that we need. Now let’s build our Robinson Projection.

  • Right click on [R_Lon] and drag it to Columns. When prompted, choose R_Lon without any aggregation
  • Right click on [R_Lat] and drag it to Rows. When prompted, choose R_Lat without any aggregation
  • Right click on your measure (Happiness score in the sample data) and drag it to Color. Choose MIN() when prompted
  • Drag [Polygon ID] to Detail
  • Drag [Sub Polygon ID] to Detail
  • Right click on [Point ID] and drag it to Path. When prompted, choose Point ID without any aggregation
  • Right click on the R_Lat axis, select “Edit Axis”, and fix the axis to start at -0.6 and end at 0.6
  • Right click on the R_Lon axis, select “Edit Axis”, and fix the axis to start at -200 and end at 200

It should look something like this

You’ve probably noticed that we have the same problem as we did before. Countries that were not in our data are showing up in the map and colored the same as the lowest value. To remove these, you can follow the same process as we did with the first method (drag measure to filter, click on All Values, click on “Special”, select “Non-null values”). Another way to remove these is to just drag the country field from your data file to Detail.

Now, if we want to view those missing countries, but not apply the color, we could do exactly what we did with the first method (duplicate and stack). Or…we can do something a bit different. Let’s add a background image.

From the Google Drive, download the MapBackground.png file.

Now, let’s add that as a background image to our map

  • In the upper toolbar, click on Map
  • Click on Background Images
  • Select your data source (the same one used for the map)
  • Click “Add Image”
  • Give your background image a name like “Background Map”
  • Click “Browse” and select the MapBackground.png file
  • Under “X Field” make sure R_Lon is selected and set the Left = -180 and the Right = 180
  • Under the “Y Field” make sure R_Lat is selected and set the Bottom = -.51 and the Top = .52
  • Adjust the “Washout” for a lighter or darker map

The Background Image options should look like this when you’re done

You can also find or create your own background images; you’ll just have to play with them a bit to get them lined up correctly.

At this point, you’re pretty much done, but I would recommend a few finishing touches (similar to the first method)

  • Disable all Map Options
  • Hide the X and Y axes
  • Remove all Lines (Grid lines/Zero lines/etc.)
  • Remove Worksheet Shading (change White to Null)

And now your map should look something like this

Alright, we’re going to cover one more method, but I promise, this one will be short. These two methods are great if you’re mapping countries, but what if you want to map cities?

Method 3: Using a CSV File (for cities)

In order to plot cities on a Robinson Projection, your data source will need to have the Latitude and Longitude values. Don’t have those? Well, let Tableau do the work for you. Here’s a link to a quick tutorial on how to export the generated Latitude and Longitude values from Tableau

How to get latitude and longitude values from Tableau

Once you have those in your data source, connect to your file, and let’s start building. If you don’t have a file, you can download the “OlympicGames_Cities.xlsx” file from our Google Drive.

Building the View

So we have the coordinates for each of our cities, but similar to the last example, if we try to plot these in Tableau, it’s going to plot them using the Mercator Projection. So we are going to use the same R_Lon and R_Lat calculations that we used in the previous method. They’re crazy long, so I’m not going to paste them here, but you can find them earlier in the post.

Once you have those calculations built, the process is nearly identical to Method 2.

  • Right click on [R_Lon] and drag it to Columns. When prompted, choose R_Lon without any aggregation
  • Right click on [R_Lat] and drag it to Rows. When prompted, choose R_Lat without any aggregation
  • Drag City to Detail
  • Right click on the R_Lat axis, select “Edit Axis”, and fix the axis to start at -0.6 and end at 0.6
  • Right click on the R_Lon axis, select “Edit Axis”, and fix the axis to start at -200 and end at 200
  • Make any other design adjustments desired (change mark type to circle, add field to color, adjust size, adjust transparency, etc)

Now, just follow the same exact process from Method 2 to add your Background Map and to put on the finishing touches. And you should have something like this

Well that’s it for this post. I hope you enjoyed it, and as always, please reach out with any questions or to share your creations. We would love to see them. Thanks for reading, and see you next time!

Categories
How-To's It Depends Tableau Techniques

Building an Org Chart in Tableau: Two methods

Many of us have been asked at some point to build an org chart, or something like it, in Tableau. And, like most of you, I started off with some ideas on how it could work, played with it a little bit, and then went to trusty ol’ Google to see what the community had come up with. And as usual, the community delivered. I found two posts that set me in the right direction, even though they weren’t quite working for what I needed to do. So, credit and a huge thank you to Jeff Schaffer for his post on the subject from 2017, and to Nathan Smith for his post.

Starting with the data…

In order to build an org chart, you will need, at minimum, two fields:

  1. Employee
  2. Manager

Ideally you will have unique IDs for these records, and additional information such as departments and titles. But those two fields are all you really need.

Next, you will need to shape your data to create the hierarchical relationships between the Employee, their direct subordinates, and all of their supervisors. There are two approaches you can take to model the data. Whether you can transform the data using Tableau Prep, Alteryx, SQL, etc. will probably be the main factor in the decision. Both methods will produce the same end result from the user’s perspective.

Method 1: Preparing the data outside of Tableau Desktop

Using this method, we will prepare the data in Tableau Prep* to create a table that has one record for each employee-supervisor relationship and one record for each employee-subordinate relationship. We will then use the output to build the org chart visual in Tableau Desktop.

*I’ve used Prep to demonstrate because it does a nice job of visually showing what is happening, and many Tableau Creators have access to Tableau Prep. You can use the same concepts in your data prep tool of choice.

  • Pro: If the hierarchy becomes deeper, you can make the change once in the workflow and the Tableau dashboard will not need to be updated to scale with your organization. (If using Alteryx or SQL, this can be fully automated)
  • Con: You need the ability and access to use a data preparation tool and refresh the data on a schedule.

Learn how to use this method here >

Method 2: Preparing the data in Tableau Desktop

Using this method, we will create a data source in Tableau Desktop with one record for each employee (with one column for each supervisor in the hierarchy), and one record for each employee-subordinate relationship. We will then use the data source to build the org chart visual.

  • Pro: You can do all the data preparation you need right within Tableau Desktop, with no other tools or schedulers necessary.
  • Con: There will be more to update in the event the organizational hierarchy gets deeper.

Learn how to use this method here >

The end result

What I ended up with was an interactive org chart dashboard that thrilled my stakeholders, complete with name search, PDF downloads, and a lot of interactivity. I’ve published a couple of variations with fewer bells and whistles to my Tableau Public profile.

An interactive org chart navigator dashboard:

Org Chart - Interactive

And, a static vertical layout for printing to PDF:

Org Chart - Printable

Categories
How-To's It Depends Tableau Techniques

It Depends: Techniques for Filtering on Multiple Selections with Dashboard Actions in Tableau

Welcome to installment #3 of the “It Depends” blog series. If you’re not familiar with the series, each installment will cover a question where there is no clear definitive answer. We’ll talk through all of the different scenarios and approaches and give our recommendations on what approach to use and when. The question we are tackling this week is “How can dashboard actions be used to filter on multiple selections in a sheet?”. Pretty easy, right? Just use a filter action…or a set action…or set controls…or a parameter action. There are clearly a lot of different ways to accomplish this, but which one should you use? And I think you know the answer…It depends!

But before we start…why does this even matter? Why not just throw a quick filter on the dashboard and call it a day? For me, it’s about user experience. When a user sees a mark on a dashboard and they want to dig into it, would they rather a) mouse over to a container filled with filters, find the right filter, click on a drop-down, search for the value they want to filter, click that value, and then hit apply, or b) just click on the mark? Don’t get me wrong, quick filters are important, and often essential, but they can also be a bit clunky and can hinder performance. So whenever possible, I opt for dashboard actions.

Using dashboard actions to filter from a single mark is pretty easy, but the process gets a little more complicated when you want to be able to select multiple marks. And as developers, we have a choice on whether we want to push that burden onto our users. We have to decide what’s more important: ease of use for our users, or ease of setup for ourselves. We’ll also need to take performance into consideration. User experience and performance will be the two biggest decision points for which of these four methods to implement. And those methods are, as I mentioned above: Filter Actions, Set Actions, Set Controls, and Parameter Actions. There are no restrictions on these four approaches. Any one of them could be used in any scenario, but it’s up to you to weigh the importance of those two decision points and determine which method gives you the right balance for your specific situation.

The Four Methods

Filter Action – This is the easiest of the four methods to implement and provides users with a better user experience than quick filters, both in terms of interacting with the dashboard and performance. But there are some downsides. First off, in order to select multiple marks, users would either need to hold down the CTRL key while making their selections, or those marks would need to be adjacent so that users could click and drag to select multiple marks together. Not ideal. Also, with Filter Actions, the selections cannot be used in other elements of your dashboard (calculated fields, text boxes, etc), like they can with other methods. And finally, and this one might be a bit personal, you can’t leverage any of the highlighting tips that I discussed in Installment 1 of this series, Techniques for Disabling the Default Highlighting in Tableau. Technically, you could use the “Highlight Technique” but because you can’t use the selected values in a calculated field, there would be no clear way to identify which mark(s) have been selected. Full disclosure, I never use this technique, but I’m going to include it because technically it will accomplish what we’re trying to do.

Set Action – This method provides a somewhat similar user experience to Filter Actions. Better performance than quick filters, but users still need to hold CTRL, or click and drag, to select multiple marks. The main benefit of this approach over Filter Actions is its flexibility. You can use it to filter some views, and in other views you can compare the selected values to the rest of the population. This type of analysis isn’t possible with Filter Actions. With Filter Actions, your only option is to filter out all of the data that is not selected. With Set Actions, you can segment your data into selected and not selected. You can also use those segments in calculated fields, which is another huge benefit over Filter Actions.

Set Controls – This method provides all of the same benefits of Set Actions, but with one major improvement. Users do not need to hold CTRL or click and drag. They can click on individual marks and add them one by one to their set. This method is a little more difficult to set up than the previous two, but in my opinion, it is 100% worth it. It’s possible that it may be marginally worse for performance, but I have never had performance issues using this method. The only things that I don’t like about this approach are that you can’t easily toggle marks in and out of the set (you can do this with a second set action in the ‘menu’, but it’s a bit clunky), and you can’t leverage the Filter Technique for disabling the default highlighting (we’ll talk more about this later).

Parameter Action – In my opinion this approach provides the best user experience. Users can select marks one by one to add to their “filter”, and can also, depending on how you set it up, toggle marks in and out of that filter. The main downside here is performance. This technique relies on somewhat complicated string calculations that can really hurt performance when you’re dealing with large or complex data sets. It’s also the most difficult to implement. But when performance isn’t a concern, I love this approach.

Method Comparison

So which method should you use? Let’s take a look at those two decision points that I mentioned earlier, User Experience and Performance.

If neither of these decision points is important, then you can ignore this entire post and just use a quick filter. But that’s usually not the case. If Performance is important, but you’re less concerned with User Experience, you can use either a Set Action or a Filter Action (but I would recommend Set Actions over Filter Actions). If User Experience is important and Performance is not a concern, you can use a Parameter Action. And if both User Experience and Performance are important, then Set Controls are the way to go. But as I mentioned earlier, you are not limited to any of these methods in any scenario, and there could be other elements that influence your decision. So let’s do a closer comparison of these methods.

*Flexibility refers to how the selected values can be used in other elements of the dashboard
**Default Highlight Techniques refer to which technique (from Installment 1) can be used with each method

So now we have a pretty good idea which method we should use in any given scenario. Now let’s look at how to implement each of these.

Methods in Practice

For each of these methods we are going to use this ridiculously simple Sales dashboard. We’re going to use the bar chart on the left (Sales by Subcat) to filter the line chart on the right (Sales by Quarter). If you’d like to follow along, you can download the sample workbook here.

Filter Action

As I mentioned earlier, this is the easiest method to set up, but will require your users to either hold CTRL or click and drag to select multiple marks.

  • Go to Dashboard > Actions and click “Add Action”
  • Select “Filter” from the list of options
  • Give your Action a descriptive Name (ex. SubCat Filter)
  • Under “Source Sheets” select the worksheet that will drive the action. In this example it is our bar chart “Sales by SubCat”.
  • Under “Target Sheets” select the worksheet that will be affected by the action. In this example it is our line chart “Sales by Quarter”
  • Under “Run action on” choose “Select”
  • Under “Clearing the selection will” choose “Show All Values”
  • Click “OK”

When you’re finished, your “Add Filter Action” box should look like this

And now when we click on a mark, hold CTRL and click multiple marks, or click and drag to select multiple marks, our line chart will be filtered to just the selected value(s). And when we click in the white space on that sheet, or on the selected value (when only one mark is selected), the filter will clear and the line chart will revert to show all values.

Set Action

Filtering with Set Actions is slightly more involved but still pretty straightforward. For this method we need to create our set, we need to add that set to the Filter Shelf on our line chart, and then we need to add the dashboard action to update that set.

  • Go to the sheet that will be affected by the action. In this case it is our line chart (Sales by Quarter).
  • Create a set for the field that will be used for the filter. In this case it is our [Sub-Category] field.
    • Right click on [Sub-Category]
    • Click on “Create”
    • Click on “Set”
    • Click the “All” option to select all values
    • Click “OK”
  • Add the filter to the sheet that will be affected by the action (Sales by Quarter).
    • Drag the [Sub-Category Set] to the filter shelf on the “Sales by Quarter” worksheet
    • It should default to “Show Members in Set”, but just in case, click on the drop-down on the [Sub-Category Set] pill on the filter shelf and make sure that is the option selected
  • Add the dashboard action to update the [Sub-Category Set]
    • Navigate back to the dashboard
    • Go to Dashboard > Actions and click “Add Action”
    • Select “Change Set Values” from the list of options
    • Give your Action a descriptive Name (ex. SubCat Set Action Filter)
    • Under “Source Sheets” select the worksheet that will drive the action. In this example it is our bar chart “Sales by SubCat”.
    • Under “Target Set” select the set that was used as the filter in the previous steps. In this case it is our [Sub-Category Set]
    • Under “Run action on” choose “Select”
    • Under “Running the action will” choose “Assign values to set”
    • Under “Clearing the selection will” choose “Add all values to set”
    • Click “OK”

When you’re finished your “Add Set Action” box should look like this.

The way this Set Action configuration works is that each time you make a new selection, the contents of the set are being completely overwritten with the newly selected values. That’s what the “Assign values to set” option does. And when you clear the selection, by clicking on white space in the sheet, or the last selected value, the contents of the set are replaced again with all of the values. That’s what the “Add all values to set” option does.

I would recommend one additional step if you’re using this method to override the default highlighting. When using Set Actions you are somewhat limited on what techniques you can use for this, but the “Highlight Technique” works great. You can read about how to use that technique here. Once you’ve added the Highlight Action, just put the [Sub-Category Set] field on color on your “Sales by Subcat” sheet and select the colors you want to display for marks that are selected (in the set) and marks that are not selected (out of the set). When you’re done, your dashboard should look and function like this. Keep in mind that similar to Filter Actions, users will need to hold CTRL and click, or click and drag to select multiple marks.

Set Controls

Setting up our filter with Set Controls is going to be very similar to Set Actions, but with one major difference. The way Set Controls work is that they allow you to select marks one by one and either add them to your set or remove them from your set. This is a great feature, but it makes filtering with them a little tricky.

If we were to start with all of our values in the set, we couldn’t just click on a value in our bar chart to add it to the set, since it’s already there (as well as all of the other values). So we need to start with the set empty and then start adding values when we click on them. But if our set is empty, and we use that set as a filter, as we did in the previous example, then our line chart will be blank until we start adding values. And we don’t want that. We want the line chart to show all of the values, until we start selecting values, and then we want it to just show those values. And we can accomplish this with a calculated field. So first, we’re going to create our set, then we’ll create our calculated field, then we’ll add that field as a filter to our line chart, and then we’ll add the action to update the set.

  • Go to the sheet that will be affected by the action. In this case it is our line chart (Sales by Quarter).
  • Create a set for the field that will be used for the filter. In this case it is our [Sub-Category] field. (skip this step if you followed along with the Set Action example)
    • Right click on [Sub-Category]
    • Click on “Create”
    • Click on “Set”
    • Click the “All” option to select all values
    • Click “OK”
  • Create a calculated field called [SubCat Filter]
    • { FIXED : COUNTD([Sub-Category Set])}=1 OR [Sub-Category Set]
  • Add the filter to the sheet that will be affected by the action (Sales by Quarter).
    • Drag the [SubCat Filter] field to the filter shelf on the “Sales by Quarter” worksheet
    • Filter on “True”
  • Add the dashboard action to update the [Sub-Category Set]
    • Navigate back to the dashboard
    • Go to Dashboard > Actions and click “Add Action”
    • Select “Change Set Values” from the list of options
    • Give your Action a descriptive Name (ex. SubCat Set Control Filter)
    • Under “Source Sheets” select the worksheet that will drive the action. In this example it is our bar chart “Sales by SubCat”.
    • Under “Target Set” select the set that was used as the filter in the previous steps. In this case it is our [Sub-Category Set]
    • Under “Run action on” choose “Select”
    • Under “Running the action will” choose “Add values to set”
    • Under “Clearing the selection will” choose “Remove all values from set”
    • Click “OK”

When you’re finished your “Add Set Action” box should look like this.

So there are a couple of things we should cover here, starting with the calculated field. Here it is again.

{ FIXED : COUNTD([Sub-Category Set])}=1 OR [Sub-Category Set]

Sets only have two possible values: IN or OUT. The set may contain hundreds or even thousands of values from the source field, but the sets themselves can only have these two values. So if we use a FIXED Level of Detail expression and count the distinct values, the result will be either 1 or 2. If the set is empty, the value for every record will be OUT, so the result of the LOD will be 1. Similarly, if the set contains all values, the value for every record will be IN, so the result of the LOD will still be 1. But if some values are in the set (IN) and other values are not in the set (OUT), then the result of the LOD will be 2 (IN and OUT).

So the first part of this calculated field ({ FIXED : COUNTD([Sub-Category Set])}=1) will be true for all records when the set is empty or when it contains all values. The second part (OR [Sub-Category Set]) will only be true for records in the set. So when we start with an empty set, the overall result of this calculated field will be True for every record, so everything will be included in our line chart. As soon as we add a value to our set, the first part becomes false for every record, but the second part becomes true for the values in our set. Because we are using an OR operator, the overall result, once we click on some values, will be true for the selected values and false for the rest.
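To see why this works, here is the same logic sketched in Python. The function and variable names are my own; `membership` stands in for each record's IN/OUT set value, and `distinct` plays the role of the FIXED COUNTD.

```python
def subcat_filter(records, selected):
    """Sketch of { FIXED : COUNTD([Sub-Category Set]) } = 1 OR [Sub-Category Set].
    records: every record's Sub-Category value; selected: the set contents."""
    # IN (True) / OUT (False) membership for every record
    membership = [r in selected for r in records]
    # The FIXED LOD counts distinct membership values across ALL records
    distinct = len(set(membership))          # 1 or 2
    # A record passes if everything shares one value ... OR it is in the set
    return [distinct == 1 or m for m in membership]

subcats = ["Phones", "Binders", "Storage", "Tables"]
print(subcat_filter(subcats, set()))         # empty set -> [True, True, True, True]
print(subcat_filter(subcats, {"Phones"}))    # -> [True, False, False, False]
```

With an empty set every record is OUT, so the distinct count is 1 and everything passes; as soon as one value is selected, the count becomes 2 and only the selected records pass.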

Next, let’s look at the Set Control options. We are starting with an empty set. Each time we click on a new mark, that value will be added to our set. That’s what the “Add values to set” option does. Unlike the “Assign values to set”, it does not override the set, it just adds new values to the existing ones. And then when we click on some white space in the sheet, or the last selected value, the set will go back to being empty. That’s what the “Remove all values from set” option does.

And just like in the previous example, I would recommend using the “Highlight Technique” covered here, and then adding the [SubCat Filter] field to color on the bar chart (Sales by SubCat).

And now your dashboard should look and function like this. Notice that you no longer need to CTRL click, or click and drag, to select multiple values. Nice!

Parameter Action

This method is by far the most complex, but if done correctly, it provides a really smooth user experience. The thing that I like most about this approach is that you can set it up so that you can “toggle” values in and out of the filter. There are a few extra steps in this approach, and somewhat complex calculations. We need to create our parameter, create a calculated field that will be passed to that parameter, add that field to Detail on our bar chart, create a calculated field for our filter, add that filter to our line chart, and add the dashboard action that will update the parameter.

  • Create a string parameter called [SubCat Select] and set the “Current value” to the pipe character “|”
  • Create a calculated field called [SubCat String]
    • IF CONTAINS([SubCat Select], '|' + [Sub-Category] + '|') THEN REPLACE([SubCat Select], '|' + [Sub-Category] + '|', '|')
    • ELSE [SubCat Select] + [Sub-Category] + '|'
    • END
  • Go to the sheet that will drive the parameter action and drag the [SubCat String] field to Detail. In this example, that is the “Sales by SubCat” sheet
  • Create a calculated field called [SubCat String Filter]
    • [SubCat Select] = '|' OR CONTAINS([SubCat Select], '|' + [Sub-Category] + '|')
  • Add the filter to the sheet that will be affected by the action (Sales by Quarter).
    • Drag the [SubCat String Filter] field to the filter shelf on the “Sales by Quarter” worksheet
    • Filter on “True”
  • Add the dashboard action to update the [SubCat Select] parameter
    • Navigate back to the dashboard
    • Go to Dashboard > Actions and click “Add Action”
    • Select “Change Parameter” from the list of options
    • Give your Action a descriptive Name (ex. SubCat Parameter Filter)
    • Under “Source Sheets” select the worksheet that will drive the action. In this example it is our bar chart “Sales by SubCat”.
    • Under “Target Parameter” select the parameter that was set up in the previous steps. In this example it is [SubCat Select]
    • Under “Source Field” select the [SubCat String] Field
    • Under “Aggregation”, leave set to “None”
    • Under “Run action on” choose “Select”
    • Under “Clearing the selection will” choose “Keep Current Value”
    • Click “OK”

When you’re finished, your “Add Parameter Action” box should look like this

Alright, let’s talk through some of those calculations, starting with the [SubCat String] Field. Basically, what this calculation is doing is building and modifying a concatenated string of values. Here’s that calculation again.

IF CONTAINS([SubCat Select], '|' + [Sub-Category] + '|') THEN REPLACE([SubCat Select], '|' + [Sub-Category] + '|', '|')
ELSE [SubCat Select] + [Sub-Category] + '|'
END

We set up our parameter to have a default value of just the pipe character (“|”). I’ll explain the use of the pipes a little later on. The first line of the calculation looks to see if the selected value is already contained in our string. So it’s searching our entire concatenated string for the pipe + selected value + pipe (ex. |phones|). If it finds that value in our string, it will replace it with just a pipe. So for our example, if our parameter value was |phones|binders|storage| and we click on phones, it will replace “|phones|” with a pipe, leaving “|binders|storage|”

The second line of this calculation will add the selected value. The calc has already tested to see if it’s there in the previous step, and if it’s not, this line will add it along with a pipe. Now let’s look at our parameter action…in action.

Take a look at the parameter at the top. As we click on each Sub-Category, it’s added to our string. Now look what happens when we click on those selected marks again.

As we click on each previously selected mark, that mark is removed from our string, until we’re left with just our starting value, the pipe.

The reason I use pipes on both sides, instead of just a comma-separated list, is to avoid situations where one potential value could be equal to part of another value (ex. Springfield and West Springfield). The pipes ensure we are matching exact values in the next step, our filter calculation. Here’s that filter calculation again.

[SubCat Select] = '|' OR CONTAINS([SubCat Select], '|' + [Sub-Category] + '|')

This is pretty similar to the filter that we created in our Set Controls example. The first part is checking to see if our concatenated string is “empty”, meaning nothing has been selected yet and it’s just a pipe. If that’s the case, the result of this calculated field will be true for every record. The second part of the calculated field is checking the [Sub-Category] field in each record and seeing if that value is contained within our parameter string (along with the starting and ending pipe). If it is, the result of this calculation will be True for those records and False for all of the other records.
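Both calculations translate almost directly into Python. This sketch (the function names are my own) shows the toggle and the filter working together:

```python
def toggle(subcat_select, sub_category):
    """Sketch of [SubCat String]: add a value to the pipe-delimited
    string if it's absent, remove it if it's already present."""
    token = "|" + sub_category + "|"
    if token in subcat_select:
        # collapse "|value|" down to a single pipe, removing the value
        return subcat_select.replace(token, "|")
    return subcat_select + sub_category + "|"

def passes_filter(subcat_select, sub_category):
    """Sketch of [SubCat String Filter]: True when nothing is selected
    (the string is just a pipe) or this record's value is in the string."""
    return subcat_select == "|" or ("|" + sub_category + "|") in subcat_select

s = "|"                      # the parameter's starting value
s = toggle(s, "Phones")      # "|Phones|"
s = toggle(s, "Binders")     # "|Phones|Binders|"
s = toggle(s, "Phones")      # "|Binders|"  (clicking again removes it)
print(s, passes_filter(s, "Binders"), passes_filter(s, "Storage"))
```

The usage lines at the bottom mirror what the parameter action does on each click: select, select, then deselect by clicking the same mark again.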

When using this method, I would highly, highly, highly recommend using the “Filter Action” technique for disabling the default highlighting. You can find it here. This technique not only masks the selection, but it also automatically de-selects a mark after you click on it. That is really important when using this method. Once you add that Filter Action, just add the [SubCat String Filter] field to color, and you should be good to go. Here’s what it looks like in action.

And there you have it. Four different methods for filtering your dashboards on multiple selections. As I mentioned before, some of these are much more complex than others, but they provide a much better user experience. In my opinion, it’s worth it to put in the extra time and effort to save your users frustration later on. I hope you enjoyed this post, and keep an eye out for more installments of “It Depends”.

Categories
How-To's Tableau Techniques Totally Useless Charts

Totally Useless Charts & How to Build Them – Lotus Flowers

Welcome to our new series, Totally Useless Charts & How to Build Them. In each installment of this series we’ll look at one very custom chart, something with almost no real use cases, and we’ll walk through, step by step, how to build it. The purpose of this series isn’t necessarily to teach you how to build these specific useless charts, it’s more about talking through the techniques, the approach, and the thought process behind each chart. Our hope is that seeing how we went about building these will help you with your own custom charts. But if you do somehow find a great use case for one of these charts, by all means, please download the workbook and use it as your own.

In this first installment we’re going to learn how to build Lotus Flowers in Tableau. It’s not a requirement, but it may be a good idea to review Part 1 and Part 2 of the Fun With Curves Series before proceeding. To follow along, you can download the workbook here, and the data here.

Lotus Flower

First, let’s take a look at what we’re trying to build. Below is a lotus flower with 10 petals, which means we have a total of 11 polygons: 1 circle and 10 petals. The circle is fairly easy to build using the techniques in Part 1 mentioned above. The petals are a little more complicated. But first things first…we need some data.

Building Your Data Source

Let’s start with our data. For this example we’re going to build 12 lotus flowers and we’re going to use the value from our data source to size the flowers appropriately. We’ll start with the tab titled ‘Source Data’.

Next, we’re going to do some densification to get the number of polygons needed for each of the flowers. Below I have 1 record for the Circle and 24 records for the petals. We’re going to build this in a way that will let you choose how many petals you want to display (up to 24). This data can be found in the ‘Polygons’ tab in the sample data.

Now we’re going to join these two sources together using a join calculation (value of 1 on each side). The result will be 25 records for each of our 12 ‘Base’ records.

Next, we need to do a little more densification, but this time it’s a little trickier. For our circle, we want at least 50 points for a relatively smooth curve. For our petals, we actually need to draw 2 lines for each petal, one for the left side of the petal (Min) and one for the right side of the petal (Max), and then join those together. Pretty confusing, right? We’ll talk about this in a lot more detail. This table is a little too large to include a screenshot, but take a look at the ‘Densification’ tab in the sample data.

For our circles, we have 50 records. We have two numerical fields, [Points] and [Order], that both run from 1 to 50 and a [Type] field to identify that these points are for our circles. For our petals, we have 100 records. We still have the same two numerical fields, but the values are a little different. We have an [Order] field that runs from 1 to 100, and a [Points] field that runs from 1 to 50 and then back down from 50 to 1. We also have a [Side] field with values of Min or Max. The Min records will be used to draw the left side of our petals. The Max records will be used to draw the right side of our petals. And then we have a [Type] field to identify that these records are for our petals. Now we just need to join this table to our data source on the [Type] Field.
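If it helps to see the densification table generated rather than described, here is a minimal Python sketch of the same structure. The function names are mine, not from the workbook; the field names ([Type], [Order], [Points], [Side]) match the description above.

```python
def circle_rows(n=50):
    # One row per point on the circle: [Order] and [Points] both run 1..n.
    return [{"Type": "Circle", "Order": i, "Points": i} for i in range(1, n + 1)]

def petal_rows(n=50):
    # 2n rows per petal outline: [Order] runs 1..2n continuously, while
    # [Points] runs 1..n up the Min (left) side, then n..1 down the Max side.
    rows = []
    for i in range(1, n + 1):
        rows.append({"Type": "Petal", "Order": i, "Points": i, "Side": "Min"})
    for i in range(1, n + 1):
        rows.append({"Type": "Petal", "Order": n + i, "Points": n + 1 - i, "Side": "Max"})
    return rows
```

Note how the Min side ends at [Points] = 50 and the Max side starts there too: that shared top-middle point is what lets the two sides meet cleanly later on.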

Building Your Circles

If you have read through Part 1 of the Fun With Curves series, then you may remember that in order to draw a circle in Tableau, we only need 2 inputs: the distance of each point from the center of the circle (the radius), and the position of each point around the circle (represented as a percentage).

Let’s start with the first input, the radius. We are going to size our circles based on the [Value] field in the Source Data, and we want the area of each circle to represent that value. Since the value gives us the area, we just need to work backwards to the radius, which we can do with the simple calculation below.

Radius = SQRT([Value]/PI())

Next, we need to calculate the position of each point around the circle. I’m not going to go into too much detail on this, but you can read more about it in the post mentioned above. To calculate this, we need the maximum number of points for our Circles (50), and we need the [Points] field (values 1 thru 50). For the max point calculation I am going to use an LOD because the max number of points for our circles may not always align with the max number of points in our data source (but in this case it does).

Max_Point = {FIXED [PolygonType] : MAX([Points])}

Circle_Position = ([Points]-1)/([Max_Point]-1)

Next, we just need to plug the [Radius] and the [Circle_Position] values into our X and Y formulas for plotting points around a circle.

Circle_X = [Radius]* SIN(2*PI() * [Circle_Position])

Circle_Y = [Radius]* COS(2*PI() * [Circle_Position])
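The two calcs above can be sanity-checked outside Tableau. Here is a minimal Python sketch of the same math, assuming a circle with 50 points; the function name is mine, not from the workbook.

```python
import math

def circle_points(value, n=50):
    # Radius chosen so the circle's area equals the data value: r = sqrt(A / pi).
    radius = math.sqrt(value / math.pi)
    pts = []
    for p in range(1, n + 1):
        position = (p - 1) / (n - 1)            # 0.0 .. 1.0 around the circle
        x = radius * math.sin(2 * math.pi * position)
        y = radius * math.cos(2 * math.pi * position)
        pts.append((x, y))
    return pts
```

Because [Circle_Position] runs all the way from 0 to 1, the first and last points land on the same spot (top of the circle), which is what closes the polygon.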

Now, let’s draw our circles

  • Right click on [Circle_X] and drag it to columns. When prompted, choose [Circle_X] without any aggregation
  • Right click on [Circle_Y] and drag it to rows. When prompted, choose [Circle_Y] without any aggregation
  • Right click on [Base_ID], change it to a Dimension, and drag it to Detail
  • Right click on [Order], change it to a Dimension, and drag it to Path
  • Drag [Type] to Filter Shelf and filter to ‘Circle’
  • Change the Mark Type to Polygon

Now you should have something that looks like this.

Although it looks like one big circle, we actually have all 12 circles in this view. They’re just stacked on top of each other. So next we need to space these out a little bit. There are a lot of different techniques to do this, but here’s one I like to use to create Trellis Charts. This technique works great when you have a sequential ID field, which we do (Base_ID).

First, we’re going to create a numeric parameter that will allow us to choose the number of columns we want to create. We’ll call the parameter [Column Count] and set the value to 3. Next, we’re going to use the [Base_ID] field to break our circles into columns and rows, starting with row.

Row = CEILING([Base ID]/[Column Count])

Column = [Base ID]-(([Row]-1)*[Column Count])
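If you want to convince yourself the trellis math works, here is a quick Python sketch of the two calcs (the function name is mine; the logic mirrors [Row] and [Column] above).

```python
import math

def trellis(base_id, column_count=3):
    # Row = CEILING([Base ID] / [Column Count]); Column = offset within that row.
    row = math.ceil(base_id / column_count)
    col = base_id - (row - 1) * column_count
    return row, col
```

With 3 columns, IDs 1–3 land in row 1 (columns 1–3), IDs 4–6 in row 2, and so on; this is what spreads the 12 flowers into a 4×3 grid.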

Now right click on both of these fields, change them to Dimensions, and then drag them to the appropriate shelf (Row to Rows, Column to Columns). The result should look something like this.

Building Your Petals

Alright, so this part is a little more complicated. I’m going to start by reviewing the basics of how you build these shapes, but I’m going to skim over the calculations since those will change significantly once we try to build these petals around our circle. No need to follow along with the workbook during this section.

So here are the basics. Let’s start by drawing a Bezier Curve with 4 control points. Our line is going to start at 0,0 and end at 5,10. Wow, this is easy, we already have the coordinates for 2 of the points!

Let’s take a look at our inputs. Our line will have a height of 10 and a width of 5. I’ve also built 2 parameters that we’ll use to calculate the 2nd and 3rd set of points. You can experiment with different values here, but these seem to work pretty well. We need a total of 8 values (4 sets of X and Y).

  • Point 1 will be the start of the line. In this case, it’s 0,0
  • Point 2 will appear on the same X axis as Point 1, but will be somewhere between the start and end of the line on the Y axis. I like to place it two thirds of the way, or .67 (the value in the P2 Parameter Input above). So the coordinates for Point 2 will be 0 and 6.7 (P2 Parameter x Height of the line)
  • Point 3 will appear on the same X axis as Point 4, and will appear somewhere between the start and end of the line on the Y axis (similar to P2). I like to place it halfway, or .5 (the value in the P3 Parameter Input above). So the coordinates for Point 3 will be 5 and 5 (P3 Parameter x Height of the line).
  • Point 4 will be the end of the line. In this case, it’s 5,10

If you were to plot these 4 points, you would have a jagged line like you see in the image above. But look what happens when we plug those values into our Bezier calculations

X = (1-[T])^3*[P1_X] + 3*(1-[T])^2*[T]*[P2_X] + 3*(1-[T])*[T]^2*[P3_X] + [T]^3*[P4_X]

Y = (1-[T])^3*[P1_Y] + 3*(1-[T])^2*[T]*[P2_Y] + 3*(1-[T])*[T]^2*[P3_Y] + [T]^3*[P4_Y]
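The Bezier formulas above translate directly. Here is a minimal Python version (function name is mine), using the four control points from the walkthrough: (0,0), (0,6.7), (5,5), and (5,10).

```python
def cubic_bezier(t, p1, p2, p3, p4):
    # Standard cubic Bezier: t runs 0..1 from the start (p1) to the end (p4).
    x = (1-t)**3*p1[0] + 3*(1-t)**2*t*p2[0] + 3*(1-t)*t**2*p3[0] + t**3*p4[0]
    y = (1-t)**3*p1[1] + 3*(1-t)**2*t*p2[1] + 3*(1-t)*t**2*p3[1] + t**3*p4[1]
    return x, y
```

At t = 0 the curve sits exactly on P1, at t = 1 exactly on P4, and P2/P3 only pull the curve toward themselves, which is why the jagged 4-point line turns into a smooth arc.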

Alright, we are halfway there! Kind of. So now we have a line that will create 1/2 of one of our petals. But in order to turn this into a petal shaped polygon, we need another line that’s a mirror image of this one.

This is where the Min and Max records come in. We need to calculate our 4 sets of coordinates for both sides. Luckily, most of the values are actually the same. P3 and P4 are going to be identical for both lines. And the Y values for P1 and P2 are the same. The only differences are the X values for P1 and P2. And to calculate those we just add the width of the whole petal (width x2) to our starting point. And if we were to plug these coordinates into the same calculations, we have this.

Now this is where the [Order] field comes into play. To make this one single polygon instead of two separate lines, we can use the [Order] field on Path and change the Mark Type to polygon

On the Left (Min) side, we have points 1 thru 50, running from the bottom left up to the top middle. On the right (Max) side, we have points 1 thru 50 running from the bottom right up to the top middle. But then we have the [Order] field (on label in the image above). This field runs from the bottom left to the top middle to the bottom right, in one continuous flow, from value 1 to 100. This is what makes it a single continuous polygon.

Ok, so that’s how we would build 1 single petal shaped polygon, perfectly positioned facing directly upward. But that’s not what we’re trying to do. We’re trying to build a dynamic number of petals, evenly spread around a circle and facing outward in the appropriate directions. So let’s do that.

Building Your Petals (for real this time)

We’re going to use a similar approach to what was described above, but everything needs to be plotted around a circle. So any calculations we use to determine our 4 points are going to have to run through those X and Y calculations for plotting points around a circle. That means, for all of our points, P1 thru P4 for both sides, we need to calculate two things; the distance from the center of the circle, and the position around the circle. But before we calculate those specific points there are a few other calculations that we’ll need. We also need a parameter, called [Petal Count] that will allow us to select how many petals we want. This should be a numeric parameter, and let’s set the value to 10 (I recommend using a range, allowing values from 4 to 24)

T = ([Points]-1)/([Max_Point]-1) – this is the same as the [Circle_Position] calc used earlier. It’s used to evenly space points between 0% and 100%

Petal_Count_Filter = [Polygon ID]<=[Petal Count] – this will be used as a filter to limit the number of petals displayed to what is selected in the [Petal Count] parameter

Petal_Width = 1/[Petal Count] – this calculates the total position around the circle that will be occupied by each petal. For example, if there were 10 petals, each one would occupy 1/10 of the space around the circle, or .10

Petal_Side_Position = IF [Side]=’Min’ THEN ([Polygon ID]-1)*[Petal_Width] ELSE [Polygon ID]*[Petal_Width] END – this calculates the position of the start of each Min line and each Max line. If there were 10 petals, the Min side of the 2nd petal would be at position .1, or (2-1)*.1, and the Max side would be at position .2, or 2*.1. The Min value of each petal shares its position with the Max value of the previous petal, and the Max value shares its position with the Min value of the next petal

Petal_Middle_Position = ([Polygon ID]/[Petal Count]) – ([Petal_Width]/2) – this calculates the position of the center of each petal. If there were 10 petals, the center of petal 3 would be at .25, or (3/10) – (.1/2). This is also halfway between the position of the Min line and the Max line.
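The three position calcs are easy to verify by hand. Here is a small Python sketch (function name is mine) that returns the Min-side, middle, and Max-side positions for a given petal:

```python
def petal_positions(polygon_id, petal_count=10):
    # Each petal occupies 1/[Petal Count] of the way around the circle.
    width = 1 / petal_count
    min_pos = (polygon_id - 1) * width            # start of the Min (left) line
    max_pos = polygon_id * width                  # start of the Max (right) line
    middle = polygon_id / petal_count - width / 2 # center line of the petal
    return min_pos, middle, max_pos
```

For 10 petals, petal 2 runs from position .1 to .2 with its center at .15, matching the worked examples above.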

Alright, now we can calculate all of our coordinates. Let’s start with P1. For the first input, we want this point to start right at the edge of our circle. So the distance from the center is going to be equal to the radius of the inner circle. So the first input is just [Radius]. For the second input, we’ll use the [Petal_Side_Position] we calculated above.

P1_X = [Radius]* SIN(2*PI() * [Petal_Side_Position])

P1_Y = [Radius]* COS(2*PI() * [Petal_Side_Position])

If we were to plot these points for 12 petals, we would end up with 24 points, but it would appear that we only have 12 because each point overlaps with another. But this gives us the outside edges of each of our petals.

Now onto P2. This one is a little more complicated. We’re going to use P1 as a starting point for this calculation, instead of the center of the inner circle, so we need to calculate the distance from P1 where we want our next point to appear.

First, we need to determine the length of the entire petal. I like to use a parameter for this so I can dynamically adjust the look of the flowers, so let’s create a parameter called [Petal_Length_Ratio]. This is a number relative to the radius: a ratio of 1 sets the length of the petal equal to the radius of the circle, a value of .8 sets the length of the petal equal to 80% of the radius, and so on. I usually go with a value somewhere between .5 and 1. We’ll use this along with the radius, so that the petals of each flower are sized appropriately based on the size of their inner circle. Next, we need to position this point somewhere between the start of the line and the end of the line. As I mentioned earlier, I like to place it two thirds of the way ([P2_Parameter] from the previous section). So the first input, the distance from P1, is going to be the radius x the length ratio x the P2 parameter.

For the second input, we’re going to use the [Petal_Middle_Position], because we want this side of the line to follow the same path as the line with P3 and P4. If we were to use the [Petal_Side_Position] field, we would end up with really wide, strange looking petals. This will probably make more sense a little further along. For now, let’s plug those values into our X and Y calcs.

P2_X = [P1_X] + (([Radius] * [Petal_Length_Ratio] * [P2_Parameter]))* SIN(2*PI() * [Petal_Middle_Position])

P2_Y = [P1_Y] + (([Radius] * [Petal_Length_Ratio] * [P2_Parameter]))* COS(2*PI() * [Petal_Middle_Position])

P3 is a little more straightforward. For the first input, we’re going to calculate the distance from the center of the inner circle. And then we’ll use a similar approach to what we did for P2. The first input will be the radius + (the radius x the length ratio x the P3 parameter). As I mentioned in the earlier section, I like to set this parameter to .5. And once again, we’re going to use the [Petal_Middle_Position] field for the second input.

P3_X = ([Radius]+([Radius] *[Petal_Length_Ratio] * [P3_Parameter]))* SIN(2*PI() * [Petal_Middle_Position])

P3_Y = ([Radius]+([Radius] *[Petal_Length_Ratio] * [P3_Parameter]))* COS(2*PI() * [Petal_Middle_Position])

P4 is almost identical to P3, except we don’t need the length ratio. We want this point to appear at the end of the line. So we can just remove that from the calc.

P4_X = ([Radius]+([Radius] * [Petal_Length_Ratio])) * SIN(2*PI() * [Petal_Middle_Position])

P4_Y = ([Radius]+([Radius] * [Petal_Length_Ratio])) * COS(2*PI() * [Petal_Middle_Position])
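Putting P1 through P4 together, here is a Python sketch of the control-point math for one side of one petal. The function name and defaults are mine; the logic mirrors the calcs above.

```python
import math

def petal_control_points(polygon_id, petal_count, radius,
                         length_ratio=0.8, p2_param=0.67, p3_param=0.5,
                         side="Min"):
    # Positions around the circle, as fractions of a full turn.
    width = 1 / petal_count
    side_pos = (polygon_id - 1) * width if side == "Min" else polygon_id * width
    middle = polygon_id / petal_count - width / 2

    def on_circle(dist, pos):
        return (dist * math.sin(2 * math.pi * pos),
                dist * math.cos(2 * math.pi * pos))

    p1 = on_circle(radius, side_pos)                          # on the circle edge
    dx, dy = on_circle(radius * length_ratio * p2_param, middle)
    p2 = (p1[0] + dx, p1[1] + dy)                             # offset from P1
    p3 = on_circle(radius + radius * length_ratio * p3_param, middle)
    p4 = on_circle(radius + radius * length_ratio, middle)    # petal tip
    return p1, p2, p3, p4
```

Notice that P3 and P4 don’t depend on [Side] at all, which is why the Min and Max curves of a petal always meet at the same tip.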

We’re almost there! If we were to plot these points for the first petal in our lotus flower, it would look like this. It looks very similar to what we reviewed in the previous section, but with one very important difference…everything is at an angle…which is what we wanted.

All that’s left to do is to plug all of these points in our Bezier calcs and then build our polygons!

Petal_X = (1-[T])^3*[P1_X] + 3*(1-[T])^2*[T]*[P2_X] + 3*(1-[T])*[T]^2*[P3_X] + [T]^3*[P4_X]

Petal_Y = (1-[T])^3*[P1_Y] + 3*(1-[T])^2*[T]*[P2_Y] + 3*(1-[T])*[T]^2*[P3_Y] + [T]^3*[P4_Y]

Now the polygons. Let’s build this as a Trellis chart, just like we did with the Circles. So drag [Row] onto Rows and [Column] onto Columns. And then:

  • Right click on [Petal_X] and drag to Columns. When prompted, select [Petal_X] without aggregation
  • Right click on [Petal_Y] and drag to Rows. When prompted, select [Petal_Y] without aggregation
  • Drag [Type] to Filter Shelf and filter to ‘Petal’
  • Drag [Petal_Count_Filter] to Filter Shelf and filter to TRUE. Right click and ‘Add to Context’
  • Drag [Polygon ID] to Detail
  • Drag [Order] to Path
  • Change Mark Type to Polygon

We’re so close! Your sheet should look like this

The only thing left to do is to combine the Circle polygons with the Petal Polygons. We have separate data for them, all we need to do is get them on the same sheet. So we’ll create two more simple calcs to bring it all together.

Final_X = IF [Type]=’Circle’ THEN [Circle_X] ELSE [Petal_X] END

Final_Y = IF [Type]=’Circle’ THEN [Circle_Y] ELSE [Petal_Y] END

Now just replace [Petal_X] and [Petal_Y] with [Final_X] and [Final_Y] and drag [Type] from the filter shelf on to Color and you should have your lotus flowers!

The Final Touches

The hard part is done, now to make it look pretty. Play around with some of the parameters until you get the look that you like. Adjust the [Petal Count], the [Column_Count], the [Petal_Length_Ratio], and even the [P2_Parameter] and [P3_Parameter] if you wanna get crazy.

Next, throw some color on there. You could make the color meaningful to encode some data, or you could do what I just did and color it randomly. I used the calc below and then just assigned one of the color palettes I have saved.

Color = [Type] + STR([Base ID])

And that’s it! If you made it this far, please reach out and let me know what you thought, and what you came up with. Thank you so much for reading, and keep an eye on the blog for more ‘Totally Useless Charts & How to Build Them’.