How to Use Sample Data Sets: Step Four in Learning R Programming for Free
Posted on October 10, 2018
Author: Linda Stewart, Performance Architects

I hope you are enjoying the “Learning R Programming for Free” series; the previous segments provide some helpful background.

In this segment, we will introduce sample data sets we can use in our learning.

There are a couple of ways to do this. One is to build a data set at the console; another is to explore the data sets that come with R. To see those, type “data()” at the console prompt.

R responds with a rather long list of data set names and short descriptions.
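
As a rough sketch of both approaches, the snippet below first builds a tiny data frame by hand and then captures the catalog that “data()” prints so we can count and search it. The name my_plants and its values are invented for illustration; the “datasets” package is the one that ships with base R.

```r
# Approach 1: build a small data set by hand (made-up values)
my_plants <- data.frame(
  plant  = c("A1", "A2", "A3"),
  height = c(10.5, 12.1, 9.8)
)
my_plants

# Approach 2: capture the catalog that data() would print
catalog <- data(package = "datasets")$results   # character matrix, one row per entry
nrow(catalog)                                   # how many entries are listed
catalog[catalog[, "Item"] == "CO2", "Title"]    # description of the CO2 entry
```

Keeping the catalog in a variable makes it easy to search for a data set by name instead of scrolling through the full listing.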

To see the contents of one of the data sets, type its name at the R prompt.

Let’s look at the “CO2” data set (data frame):

> CO2

   Plant        Type  Treatment conc uptake
1    Qn1      Quebec nonchilled   95   16.0
2    Qn1      Quebec nonchilled  175   30.4
3    Qn1      Quebec nonchilled  250   34.8
4    Qn1      Quebec nonchilled  350   37.2
5    Qn1      Quebec nonchilled  500   35.3
6    Qn1      Quebec nonchilled  675   39.2
7    Qn1      Quebec nonchilled 1000   39.7
8    Qn2      Quebec nonchilled   95   13.6
9    Qn2      Quebec nonchilled  175   27.3
10   Qn2      Quebec nonchilled  250   37.1
11   Qn2      Quebec nonchilled  350   41.8
12   Qn2      Quebec nonchilled  500   40.6
13   Qn2      Quebec nonchilled  675   41.4
14   Qn2      Quebec nonchilled 1000   44.3
15   Qn3      Quebec nonchilled   95   16.2
16   Qn3      Quebec nonchilled  175   32.4
17   Qn3      Quebec nonchilled  250   40.3
18   Qn3      Quebec nonchilled  350   42.1
19   Qn3      Quebec nonchilled  500   42.9

Above are the first 19 of the data set’s 84 rows.
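
Rather than counting printed rows, we can ask R for the size and layout of the data set directly. This is a minimal sketch using base R functions only:

```r
dim(CO2)      # number of rows and columns: 84 and 5
nrow(CO2)     # 84
names(CO2)    # the column names
str(CO2)      # type of each column plus a preview of its values
```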

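This is a good time to introduce the “head()” function, which returns the first part of a vector, matrix, table, data frame or function; its companion, “tail()”, returns the last part. For a data frame, “head()” shows the first six rows by default. “CO2” is a data frame, which can be thought of as a table or two-dimensional array whose columns can hold different types of data.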

> head(CO2)

  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
6   Qn1 Quebec nonchilled  675   39.2

If we have a large data set, we can use “head()” or “tail()” to have a smaller data set to work with while we are setting up our analysis.
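
Here is a short sketch of how that might look. Both functions accept an “n” argument for the number of rows to keep; the variable names first_ten and last_three are just illustrative.

```r
first_ten  <- head(CO2, n = 10)   # a 10-row working copy from the top
last_three <- tail(CO2, n = 3)    # the final 3 rows (rows 82 to 84)

nrow(first_ten)    # 10
last_three
```

Once the analysis works on the small copy, we can swap the full data set back in.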

Here is how to access the conc value on our first row of data:

> a <- CO2$conc[1]
> a
[1] 95

Here is how this reference works: the $ sign tells the interpreter which column of the data frame we want to access, and the number in square brackets picks an element of that column.
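
The $ sign is not the only way to reach a value. As a sketch, these base R expressions all return the “conc” value from the first row, and the last line pulls a small block of rows and columns:

```r
CO2$conc[1]          # column by name, then its first element
CO2[["conc"]][1]     # same column, selected with double brackets
CO2[1, "conc"]       # row 1, column "conc"

CO2[1:3, c("conc", "uptake")]   # first three rows of two columns
```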

To see all values in the “conc” column:

> a <- CO2$conc
> a
[1]   95  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95  175  250  350
[19]  500  675 1000   95  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95
[37]  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95  175  250  350  500
[55]  675 1000   95  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95  175
[73]  250  350  500  675 1000   95  175  250  350  500  675 1000
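
A column with 84 values is hard to read at a glance. As a minimal sketch, a few base R functions condense it (the variable name conc is our own choice):

```r
conc <- CO2$conc

length(conc)    # 84 values, one per row
unique(conc)    # the 7 distinct concentrations
table(conc)     # how many rows use each concentration
mean(conc)      # 435
```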

In addition to pre-built data sets, there are pre-built mathematical constants.  In a previous segment, we assigned 3.14 to a variable named pi, but pi is actually built in.  If we start a fresh session or clear the workspace and type “pi”, we see a more precise value that is already available for us to use:

> pi
[1] 3.141593
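
The console shows seven significant digits by default, but the stored value is more precise. The sketch below prints more digits and then shows what happens when our own variable hides the built-in one:

```r
print(pi, digits = 10)   # 3.141592654

pi <- 3.14               # our own variable now masks the built-in constant
pi                       # 3.14
rm(pi)                   # remove our copy
pi                       # the built-in value is visible again
```

Removing our copy with “rm()” has the same effect for this one variable as clearing the whole workspace.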

We’ll wrap up this blog post with a few basic rules for naming variables in R:

  • Variable names start with a letter (or a period that is not followed by a number) and cannot contain spaces; use underscores in place of spaces for readability
  • We can use letters, numbers, the period and the underscore when building variable names
  • We also want to avoid reusing the names of objects that are already defined in R (like “pi” or the “data” function), because that could cause confusion. R’s reserved words, such as “if,” “else,” “for,” “function,” “TRUE” and “NULL,” cannot be used as variable names at all.
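
One way to check a candidate name, sketched here with base R’s “make.names()” function: it rewrites any string into a syntactically valid name, so a name that comes back unchanged was already valid. The candidate names themselves are made up for illustration.

```r
candidates <- c("total_sales", "total sales", "2nd_try", "if")

# A name is syntactically valid when make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates

data.frame(candidates, is_valid, suggestion = make.names(candidates))
```

Note that this only checks syntax. A name such as “pi” passes the check even though reusing it would hide the built-in value.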

In our next installment, I will discuss how to bring your own data sets into R.

Additional blog posts on more complex R concepts to follow; please contact communications@performancearchitects.com if you have any questions or need further help!

This post was posted in Technical and tagged Business Intelligence , Data Science , Installing R , Learning R , R Programming .
© Performance Architects, Inc. and Performance Architects Blog, 2006 - present. Unauthorized use and/or duplication of this material without express and written permission from this blog's author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Performance Architects, Inc. and Performance Architects Blog with appropriate and specific direction to the original content.