I know you’re probably still kicking yourself for not choosing the perfect bracket this year and for missing out on the big bucks. Want to dominate your March Madness office pool next year?! This post walks step by step through the process I used to merge college basketball data sets in Oracle Data Visualization (DV) Desktop and build intuitive visualizations that help predict the results of NCAA tournament basketball games.
After finding basic team-level college basketball data online and uploading it to DV Desktop, I created a new Data Flow to combine the offensive and defensive performance data sets for each team. Merging this data into one data set made it easier to report on in a DV Project.
The first step in creating this Data Flow was to import the two data sets. The next step was to add a filter that keeps only the teams that made the NCAA tournament. After that, I selected the following columns: team name, FG %, 3PT %, Total Points, Total Points Against, FG % Against, and 3PT % Against. Limiting the columns this way keeps the data clean and simple for building new calculations.
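For readers who want to see the same logic outside of DV Desktop, here is a minimal Python sketch of those three steps (import, filter, select columns). It is not what the Data Flow does internally. The team names, numbers, column names, and the `merge_team_stats` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical sample rows; the real data sets and their column names
# are not shown in this post, so everything here is made up.
offense = [
    {"team": "Team A", "fg_pct": 50.0, "three_pct": 38.0, "total_points": 2900, "games": 35},
    {"team": "Team B", "fg_pct": 45.0, "three_pct": 35.0, "total_points": 2500, "games": 33},
    {"team": "Team C", "fg_pct": 43.0, "three_pct": 33.0, "total_points": 2300, "games": 31},
]
defense = [
    {"team": "Team A", "fg_pct_against": 41.0, "three_pct_against": 31.0, "total_points_against": 2300},
    {"team": "Team B", "fg_pct_against": 43.0, "three_pct_against": 34.0, "total_points_against": 2400},
    {"team": "Team C", "fg_pct_against": 44.0, "three_pct_against": 36.0, "total_points_against": 2350},
]

# Hypothetical list of teams that made the tournament.
tournament_teams = {"Team A", "Team B"}

# The seven columns kept for later calculations.
KEEP = [
    "team", "fg_pct", "three_pct", "total_points",
    "total_points_against", "fg_pct_against", "three_pct_against",
]

def merge_team_stats(offense, defense, tournament_teams):
    """Join offense and defense rows by team, keep tournament teams only."""
    defense_by_team = {row["team"]: row for row in defense}
    merged = []
    for off_row in offense:
        team = off_row["team"]
        # Filter step: skip teams outside the tournament or without defensive data.
        if team not in tournament_teams or team not in defense_by_team:
            continue
        combined = {**off_row, **defense_by_team[team]}
        # Select step: drop every column that is not in KEEP.
        merged.append({col: combined[col] for col in KEEP})
    return merged

merged = merge_team_stats(offense, defense, tournament_teams)
for row in merged:
    print(row)
```

Joining on the team name assumes both data sets spell each team the same way, which is worth checking before merging data pulled from different sites.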
The next step of the Data Flow creation was to build three new columns:
- Total Points Diff = Team Total Points – Total Points Against
- FG % Diff = Team FG % – FG % Against
- 3PT % Diff = Team 3PT % – 3PT % Against
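The three calculations above are simple subtractions, and can be sketched in Python as follows. The `add_diff_columns` helper, the column names, and the sample numbers are hypothetical; in DV Desktop these would be calculated columns added in the Data Flow.

```python
def add_diff_columns(row):
    """Return a copy of a team row with the three difference columns added."""
    out = dict(row)
    out["total_points_diff"] = row["total_points"] - row["total_points_against"]
    # Round percentage differences to one decimal to avoid float noise.
    out["fg_pct_diff"] = round(row["fg_pct"] - row["fg_pct_against"], 1)
    out["three_pct_diff"] = round(row["three_pct"] - row["three_pct_against"], 1)
    return out

# Hypothetical team row; these numbers are made up for illustration.
team = {
    "team": "Team A",
    "fg_pct": 50.0, "fg_pct_against": 41.0,
    "three_pct": 38.0, "three_pct_against": 31.0,
    "total_points": 2900, "total_points_against": 2300,
}

result = add_diff_columns(team)
print(result["total_points_diff"], result["fg_pct_diff"], result["three_pct_diff"])
```

A positive value means the team does better in that statistic than its opponents do against it.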
This information now provides a quick and easy way to see where teams rank on a given statistic. A large positive difference means a team does far better in that statistic than its opponents do against it, which could come from a strong offense, a strong defense, or strength on both sides of the ball.
After building these new metrics, I saved the Data Flow and the data set it created. The next step was to bring this data set into a new Project and evaluate the data for outliers that could help predict upsets in the matchups.
For example, one easy way to compare all of the teams by FG % and 3PT % is to create a scatter plot of each team’s offensive performance against its defensive performance for these statistics. The visualization below compares each team’s offensive and defensive FG % and 3PT %:
At first look, I saw a handful of outliers in both scatter plots. The far-right circle in the FG % scatter plot (top left) is Gonzaga University. This shows that Gonzaga not only has the highest FG %, but also seems to be one of the better teams defensively, given that its circle sits lower than several other circles (teams).
The other outliers I saw are the far-right circles in the 3PT % scatter plot (the pink circles), which compares a team’s 3PT % with the 3PT % it gives up defensively. The circle in the bottom right of the scatter plot is UVA. This shows that UVA has a very high 3PT %. Even more impressively, UVA also sits much lower in the scatter plot, demonstrating that it is also one of the best teams at defending the three-point shot.
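Reading a scatter plot this way (far right is strong offensively, low is strong defensively) can also be approximated in code. The sketch below flags teams that sit above the field median offensively and below it defensively. The median cutoff, the `standout_teams` helper, and the sample numbers are my own assumptions for illustration. I picked the outliers above by eye in DV Desktop, not with this rule.

```python
from statistics import median

def standout_teams(rows, offense_col, defense_col):
    """Teams above the median on offense and below the median on defense."""
    off_median = median(r[offense_col] for r in rows)
    def_median = median(r[defense_col] for r in rows)
    picks = [
        r for r in rows
        if r[offense_col] > off_median and r[defense_col] < def_median
    ]
    # Largest gap between offensive and defensive percentage first.
    return sorted(picks, key=lambda r: r[offense_col] - r[defense_col], reverse=True)

# Hypothetical teams and percentages, made up for illustration.
teams = [
    {"team": "Team A", "fg_pct": 50.0, "fg_pct_against": 40.0},
    {"team": "Team B", "fg_pct": 46.0, "fg_pct_against": 44.0},
    {"team": "Team C", "fg_pct": 44.0, "fg_pct_against": 41.0},
    {"team": "Team D", "fg_pct": 47.0, "fg_pct_against": 42.0},
    {"team": "Team E", "fg_pct": 43.0, "fg_pct_against": 45.0},
]

standouts = standout_teams(teams, "fg_pct", "fg_pct_against")
print([r["team"] for r in standouts])
```

The same function works for the 3PT % plot by passing the three-point columns instead.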
The ease of use and power of the solution make DV Desktop a no-brainer for simple data mashups and analytics. After roughly 30 minutes of effort, I had a working data set that supported very powerful visualizations for analyzing basketball data.
Who knows?! By using these steps and analyzing this data, you might even be able to predict a perfect bracket next year!
Need more help with Oracle DV or with your fantasy sports brackets? Email the Performance Architects team at firstname.lastname@example.org or leave a note below and we’ll get in touch!