How to Use Your Own Data Sets: Step Five in Learning R Programming for Free
Posted on October 31, 2018
Author: Linda Stewart, Performance Architects

I hope you are enjoying the “Learning R Programming for Free” series; here are links to the previous segments to provide some helpful background.

Step One
Step Two
Step Three
Step Four

In the previous installment, we learned about data frames and how to access elements and columns in a pre-built data set. In this installment, we will bring in our own data set and then use some of R’s arithmetic operators to perform calculations on its values.

Some say climate change is the biggest threat of our age, while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

R has a function for reading character-delimited files, “read.csv”:

mydata <- read.csv(file="c:/mycsvfile.csv", header=TRUE, sep=",")

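Here, “header=TRUE” tells R that the first row of the file holds the column names, and “sep” names the character that separates the values. Both are already the defaults for “read.csv” (a header row and a comma separator), so you only need to set them when your file differs.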
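To see the defaults in action without needing a file of your own, here is a minimal sketch. The file contents, the column names “city” and “temp_c”, and the temporary paths are made up for illustration; only “read.csv” and its “header” and “sep” arguments come from the text above.

```r
# Write a tiny sample file to a temporary location (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("city,temp_c", "Boston,10.5", "Denver,8.2"), csv_path)

# header = TRUE and sep = "," are the defaults for read.csv,
# so these two calls read the file the same way.
explicit <- read.csv(file = csv_path, header = TRUE, sep = ",")
implicit <- read.csv(csv_path)
identical(explicit, implicit)

# A semicolon-delimited file only needs a different sep value.
semi_path <- tempfile(fileext = ".csv")
writeLines(c("city;temp_c", "Boston;10.5", "Denver;8.2"), semi_path)
semi <- read.csv(semi_path, sep = ";")
semi
```

If your file has no header row, passing header=FALSE makes R treat the first line as data and generate column names for you.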

On my computer, I have a CSV file of climate data from Data.World, from which I extracted the temperatures for the USA since the 1700s. Let’s load it:

> globallandtemp <- read.csv("C:/GlobalLandTemperaturesByCountryUSA.csv")

Print out a sample to examine our data:

> head(globallandtemp)

          dt AverageTemperature AverageTemperatureUncertainty       Country
1 1768-09-01             15.420                         2.880 United States
2 1768-10-01              8.162                         3.386 United States
3 1768-11-01              1.591                         3.783 United States
4 1768-12-01             -2.882                         4.979 United States
5 1769-01-01             -3.952                         4.856 United States
6 1769-02-01             -2.684                         3.311 United States
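Beyond “head”, a few other base R functions are handy for a first look at a newly loaded data set. The sketch below is only an illustration: it retypes the six rows printed above into a small data frame named “sample_temp” (a made-up name) so that it runs without the original file.

```r
# The six rows shown by head(), typed in by hand for illustration.
sample_csv <- "dt,AverageTemperature,AverageTemperatureUncertainty,Country
1768-09-01,15.420,2.880,United States
1768-10-01,8.162,3.386,United States
1768-11-01,1.591,3.783,United States
1768-12-01,-2.882,4.979,United States
1769-01-01,-3.952,4.856,United States
1769-02-01,-2.684,3.311,United States"

# read.csv can parse a string directly through its text argument.
sample_temp <- read.csv(text = sample_csv)

dim(sample_temp)     # number of rows and columns
names(sample_temp)   # column names
str(sample_temp)     # type of each column plus a preview of its values
summary(sample_temp$AverageTemperature)  # min, quartiles, mean, max
```

On the full file you would call the same functions on “globallandtemp” instead.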

Our temperatures are in Celsius.  Let’s convert them to Fahrenheit.

Our formula to make the conversion is:

f = (9/5) * celsius + 32

So, if we want to see “AverageTemperature” for September 1768 in Fahrenheit degrees, we can write:

> f <- (9/5) * globallandtemp$AverageTemperature[1] + 32
> f
[1] 59.756

A bit chilly! So, now we know the basics of adding, multiplying and dividing. Just remember your order of operations and place parentheses so you get the expected results.
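To make the point about parentheses concrete, here is a short sketch using the September 1768 value from above. The variable names and the helper “c_to_f” are made up for this example.

```r
celsius <- 15.42

# Multiplication and division are evaluated before addition,
# so these two lines give the same result.
with_parens    <- (9/5) * celsius + 32
without_parens <- 9/5 * celsius + 32

# Moving the parentheses changes the calculation:
# here 32 is added to the Celsius value before scaling.
misplaced <- (9/5) * (celsius + 32)

with_parens
without_parens
misplaced

# Wrapping the formula in a small function (hypothetical helper)
# keeps the parentheses in one place.
c_to_f <- function(celsius) (9/5) * celsius + 32
c_to_f(15.42)
```

The first two results match the 59.756 we computed earlier, while the third comes out noticeably higher, which is exactly the kind of silent error that misplaced parentheses cause.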

If we want to apply the formula to the entire data set:

> all_f <- (9/5) * globallandtemp$AverageTemperature + 32
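This works because arithmetic in R is vectorized: the formula is applied to every element of the column in one step, with no loop required. Here is a minimal sketch using the first six temperatures from the sample output above; the names “temps_c”, “temps_f” and “sample_df” are made up for illustration.

```r
# First six Celsius values from the head() output.
temps_c <- c(15.420, 8.162, 1.591, -2.882, -3.952, -2.684)

# One expression converts every element.
temps_f <- (9/5) * temps_c + 32
temps_f
length(temps_f)  # same length as the input vector

# In base R, the result can also be stored as a new column
# of a data frame with the $ operator.
sample_df <- data.frame(AverageTemperature = temps_c)
sample_df$AverageTemperatureF <- (9/5) * sample_df$AverageTemperature + 32
sample_df
```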

If you print “all_f” to the console, you will see many “NA” values. “NA” (“not available”) is how R marks a missing value, so each one means the temperature for that month is missing from the data set; note that this is not the same thing as R’s NULL. Any arithmetic on “NA” also yields “NA”, which is why the gaps carry over into “all_f”. When we get to table functions in a future blog, I will explain how to handle this (for instance, if we wanted to apply a function to get the annual mean temperature by year). In a future installment, I will also introduce the “mutate” and “select” functions, which extend a data set with new columns and trim it down to the columns we need.
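As a small preview, the sketch below shows how “NA” values behave in a calculation and one common way to skip them. The vector is made up for illustration and is not taken from the real file; “is.na” and the “na.rm” argument are standard base R.

```r
# A short Celsius vector with one missing reading (hypothetical data).
temps_c <- c(15.420, NA, 1.591)

# Arithmetic on NA gives NA, so the gap carries through the conversion.
temps_f <- (9/5) * temps_c + 32
temps_f

# is.na() flags the missing entries; summing the flags counts them.
is.na(temps_f)
missing_count <- sum(is.na(temps_f))
missing_count

# By default a single NA makes the mean NA as well.
mean(temps_f)

# na.rm = TRUE drops the missing values before averaging.
mean_f <- mean(temps_f, na.rm = TRUE)
mean_f
```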

In our next installment, we will discuss data types in R.

Additional blog posts on more complex R concepts to follow; please contact communications@performancearchitects.com if you have any questions or need further help!

This post was posted in Business and tagged BI, Business Intelligence, Data Science, Installing R, Learning R, R Programming.
© Performance Architects, Inc. and Performance Architects Blog, 2006 - present. Unauthorized use and/or duplication of this material without express and written permission from this blog's author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Performance Architects, Inc. and Performance Architects Blog with appropriate and specific direction to the original content.