How to Upload a CSV File in R

How to Work With Data Frames and CSV Files in R — A Detailed Introduction with Examples

Welcome! If you want to start diving into data science and statistics, then data frames, CSV files, and R will be essential tools for you. Let's see how you can use their amazing capabilities.

In this article, you will learn:

  • What CSV files are and what they are used for.
  • How to create CSV files using Google Sheets.
  • How to read CSV files in R.
  • What Data Frames are and what they are used for.
  • How to access the elements of a data frame.
  • How to change a data frame.
  • How to add and delete rows and columns.

We will use RStudio, an open-source IDE (Integrated Development Environment), to run the examples.

Let's begin! ✨

🔹 Introduction to CSV Files

CSV (Comma-separated Values) files can be considered one of the building blocks of data analysis because they are used to store data represented in the form of a table.

In this file, values are separated by commas to represent the different columns of the table, like in this example:

image-153
CSV File
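Because a CSV file is just plain text, you can see how R interprets it by passing the text straight to read.csv() through its text argument. This is a sketch that assumes the file holds the same five students used throughout this article. The csv_text variable is hypothetical and only stands in for the file on disk.

```r
# Raw CSV text: a header line with the column titles,
# then one comma-separated line per student.
csv_text <- "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE"

# read.csv() forwards `text` to read.table(), so it can parse a string
# instead of a file. Each comma-separated field becomes one column.
students <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(students)
```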

We will generate this file using Google Sheets.

🔸 How to Create a CSV File Using Google Sheets

Let's create your first CSV file using Google Sheets.

Step 1: Go to the Google Sheets website and click on "Go to Google Sheets":

image-227

💡 Tip: You can access Google Sheets by clicking on the button located at the top-right corner of Google's Home Page:

image-228

If we zoom in, we see the "Sheets" button:

image-156

💡 Tip: To use Google Sheets, you need to have a Google account (such as a Gmail account). Alternatively, you can create a CSV file using MS Excel or another spreadsheet editor.

You will see this panel:

image-157

Step 2: Create a blank spreadsheet by clicking on the "+" button.

image-158

Now you have a new empty spreadsheet:

image-159

Step 3: Change the name of the spreadsheet to students_data. We will need to use the name of the file to work with data frames. Write the new name and press Enter to confirm the change.

image-162

Step 4: In the first row of the spreadsheet, write the titles of the columns.

image-160

When you import a CSV file in R, the titles of the columns are called variables. We will define six variables: first_name, last_name, age, num_siblings, num_pets, and eye_color, as you can see right here below:

image-163

💡 Tip: Notice that the names are written in lowercase and words are separated with an underscore. This is not mandatory, but since you will need to access these names in R, it's very common to use this format.

Step 5: Enter the data for each one of the columns.

When you read the file in R, each row is called an observation, and it corresponds to data taken from an individual, animal, object, or entity that we collected data from.

In this case, each row corresponds to the data of a student:

image-164

Step 6: Download the CSV file by clicking on File -> Download -> Comma-separated values, as you can see below:

image-165

Step 7: Rename the CSV file. You will need to remove "Sheet1" from the default name because Google Sheets automatically adds it to the name of the file.

image-169

Great work! Now you have your CSV file and it's time to start working with it in R.
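If you prefer to stay inside R, or you don't have a spreadsheet editor at hand, write.csv() can produce a similar file. This sketch makes a few assumptions. The values are typed in by hand to match the spreadsheet. The file goes to R's temporary directory (tempdir()) so the example doesn't touch your project folder. To write into your working directory instead, replace path with "students_data.csv".

```r
# Build the students table in R with the same columns as the spreadsheet.
# The L suffix stores the numbers as integers, as read.csv() would.
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Write it out. row.names = FALSE stops R from adding an extra
# unnamed first column holding the row numbers 1 to 5.
path <- file.path(tempdir(), "students_data.csv")
write.csv(students, path, row.names = FALSE)

# Reading the file back gives the same table.
students_copy <- read.csv(path, stringsAsFactors = FALSE)
print(students_copy)
```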

🔹 How to Read a CSV file in R

In RStudio, the first step before reading a CSV file is making sure that your current working directory is the directory where the CSV file is located.

💡 Tip: If this is not the case, you will need to use the full path to the file.

Change Current Working Directory

You can change your current working directory in this panel:

image-172

If we zoom in, you can see the current path (1) and select the new one by clicking on the ellipsis (...) button to the right (2):

image-171

💡 Tip: You can also check your current working directory with getwd() in the interactive console.

Then, click "More" and "Set As Working Directory".

image-175
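You can do the same thing in code with getwd() and setwd(), which is handy in scripts. In this sketch, tempdir() is only a placeholder for the folder that holds your CSV file. file.exists() serves as a quick check before reading.

```r
# Remember where we started.
old_wd <- getwd()
print(old_wd)

# setwd() is the code equivalent of "More > Set As Working Directory".
# tempdir() is a placeholder: use the folder containing students_data.csv.
setwd(tempdir())
print(getwd())

# TRUE only if students_data.csv is in the new working directory.
print(file.exists("students_data.csv"))

# The alternative from the tip: skip setwd() and build the full path.
full_path <- file.path(tempdir(), "students_data.csv")
print(full_path)

# Go back to the original working directory.
setwd(old_wd)
```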

Read the CSV File

Once you have your current working directory set, you can read the CSV file with this command:

image-176

In R code, we have this:

                > students_data <- read.csv("students_data.csv")              

💡 Tip: We assign it to the variable students_data to access the data of the CSV file with this variable. In R, we can separate words using dots ., underscores _, UpperCamelCase, or lowerCamelCase.

After running this command, you will see this in the top right panel:

image-177

Now you have a variable defined in the environment! Let's see what data frames are and how they are closely related to CSV files.

🔸 Introduction to Data Frames

Data frames are the standard format used in R to store statistical data in the form of a table. When you read a CSV file in R, a data frame is generated.

We can confirm this by checking the type of the variable with the class function:

                > class(students_data)
                [1] "data.frame"

It makes sense, right? CSV files contain data represented in the form of a table, and data frames represent that tabular data in your code, so they are deeply connected.

If you enter this variable in the interactive console, you will see the content of the CSV file:

                > students_data
                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE

More Information About the Data Frame

You have several different ways to see the number of variables and observations of the data frame:

  • Your first option is to look at the top right panel, which shows the variables that are currently defined in the environment. This data frame has 5 observations (rows) and 6 variables (columns):
image-178
  • Another alternative is to use the functions nrow and ncol in the interactive console or in your program, passing the data frame as an argument. We get the same results: 5 rows and 6 columns.
                > nrow(students_data)
                [1] 5
                > ncol(students_data)
                [1] 6
  • You can also see more information about the data frame using the str function:
                > str(students_data)
                'data.frame':	5 obs. of  6 variables:
                 $ first_name  : Factor w/ 5 levels "Alexander","Emily",..: 2 5 1 4 3
                 $ last_name   : Factor w/ 5 levels "Dawson","Navona",..: 1 3 5 2 4
                 $ age         : int  15 14 16 16 17
                 $ num_siblings: int  2 5 0 4 3
                 $ num_pets    : int  5 0 2 10 8
                 $ eye_color   : Factor w/ 3 levels "BLUE","BROWN",..: 1 3 2 3 1
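dim() returns both numbers in one call. sapply() lists each variable's type in a form you can use in code rather than just read. This sketch builds a small copy of the data with data.frame() so it runs without the CSV file. Setting stringsAsFactors = FALSE keeps the text columns as character vectors.

```r
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Rows first, then columns.
print(dim(students))

# The same numbers that nrow() and ncol() return.
print(nrow(students) == dim(students)[1])
print(ncol(students) == dim(students)[2])

# Variable names and the type of each column.
print(names(students))
print(sapply(students, class))
```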

This function (applied to a data frame) tells you:

  • The number of observations (rows).
  • The number of variables (columns).
  • The names of the variables.
  • The information types of the variables.
  • More information about the variables.

As you can see, this function is really useful when you want to know more about the data that you are working with.

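💡 Tip: In R, a "factor" is a qualitative variable, which is a variable whose values represent categories. For example, eye_color has the values "BLUE", "BROWN", and "GREEN", which are categories. As you can see in the output of str above, this variable is automatically defined as a "factor" when the CSV file is read in R. This automatic conversion is the default only in R versions before 4.0.0. Since R 4.0.0, read.csv keeps text columns as character vectors unless you pass stringsAsFactors = TRUE.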
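Here is a short sketch of how a factor stores its categories, using the eye colors from the table. The levels are sorted alphabetically, which is why str() printed the codes 1 3 2 3 1. The last lines request factors explicitly with stringsAsFactors = TRUE, because newer R versions no longer convert text by default. The two-row inline CSV is only a stand-in for the real file.

```r
eye_color <- factor(c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"))

# The distinct categories, sorted alphabetically: "BLUE" "BROWN" "GREEN"
print(levels(eye_color))

# The integer code stored for each value (its position in levels()): 1 3 2 3 1
print(as.integer(eye_color))

# How many observations fall into each category.
print(table(eye_color))

# Ask read.csv() for factors explicitly (inline text stands in for the file).
csv_text <- "first_name,eye_color
Emily,BLUE
Rose,GREEN"
students <- read.csv(text = csv_text, stringsAsFactors = TRUE)
print(is.factor(students$eye_color))
```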

🔹 Data Frames: Key Operations and Functions

Now you know how to see more information about the data frame. But the magic of data frames lies in the capabilities and functionality that they offer, so let's look at this in more detail.

How to Access a Value of a Data Frame

Data frames are like matrices, so you can access individual values using two indices. The indices go inside square brackets, separated by a comma, and say which rows and which columns you would like to include in the result, like this:

image-181

For example, if we want to access the value of eye_color (column 6) of the fourth student in the data (row 4):

image-182

We need to use this command:

                > students_data[4, 6]

💡 Tip: In R, indices start at 1, and the first row with the names of the variables is not counted.

This is the output:

                [1] GREEN
                Levels: BLUE BROWN GREEN

You can see that the value is "GREEN". Variables of type "factor" have "levels" that represent the different categories or values that they can take. This output tells us the levels of the variable eye_color.
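Because the value comes back as a factor, it prints with its Levels line. This sketch shows two common follow-ups, assuming the data was read with factors as in the article. The first is as.character(), which gives the plain text value. The second uses the column name instead of the number 6.

```r
# stringsAsFactors = TRUE reproduces the factor columns shown in this article.
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = TRUE
)

# Row 4, column 6: a factor value, printed together with its levels.
value <- students[4, 6]
print(value)

# Just the text of the category, without the Levels line.
print(as.character(value))

# The same cell, selected by column name instead of position.
print(students[4, "eye_color"])
```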

How to Access Rows and Columns of a Data Frame

We can also use this syntax to access a range of rows and columns to get a portion of the original data frame, like this:

image-179

For example, if we want to get the age and number of siblings of the third, fourth, and fifth students in the list, we would use:

                > students_data[3:5, 3:4]
                  age num_siblings
                3  16            0
                4  16            4
                5  17            3

💡 Tip: The basic syntax to define an interval in R is <start>:<end>. Note that these indices are inclusive, so the third and fifth elements are included in the example above when we write 3:5.
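Here are a few index vectors side by side. 3:5 expands to the integers 3, 4, and 5. c() lets you pick rows or columns that are not next to each other. Column names can also be used in place of column numbers. The data frame is rebuilt inline so the sketch runs without the CSV file.

```r
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Both ends of the interval are included: 3 4 5
print(3:5)

# Non-contiguous rows (1, 3, 5) and columns chosen by name.
print(students[c(1, 3, 5), c("first_name", "age")])

# Columns by name give the same result as columns 3:4.
print(students[3:5, c("age", "num_siblings")])
```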

If we want to get all the rows or columns, we simply omit the interval and include the comma, like this:

                > students_data[3:5,]
                  first_name last_name age num_siblings num_pets eye_color
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE

We did not include an interval for the columns after the comma in students_data[3:5,], so we get all the columns of the data frame for the three rows that we specified.

Similarly, we can get all the rows for a specific range of columns if we omit the rows:

                > students_data[, 1:3]
                  first_name last_name age
                1      Emily    Dawson  15
                2       Rose Patterson  14
                3  Alexander     Smith  16
                4       Nora    Navona  16
                5       Gino      Sand  17

💡 Tip: Notice that you still need to include the comma in both cases.

How to Access a Column

There are three ways to access an entire column:

  • Option #1: to access a column and return it as a data frame, you can use this syntax:
image-184

For example:

                > students_data["first_name"]
                  first_name
                1      Emily
                2       Rose
                3  Alexander
                4       Nora
                5       Gino
  • Option #2: to get a column as a vector (sequence), you can use this syntax:
image-185

💡 Tip: Notice the use of the $ symbol.

For example:

                > students_data$first_name
                [1] Emily     Rose      Alexander Nora      Gino
                Levels: Alexander Emily Gino Nora Rose
  • Option #3: You can also use this syntax to get the column as a vector. It is equivalent to the previous syntax:
                > students_data[["first_name"]]
                [1] Emily     Rose      Alexander Nora      Gino
                Levels: Alexander Emily Gino Nora Rose
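To see how the three options differ, check what each one returns. This sketch uses a small two-column data frame built with data.frame() and stringsAsFactors = FALSE, so the $ and [[ ]] results are character vectors rather than factors. It also shows two more details. [[ ]] accepts a column name stored in a variable, while $ does not. And drop = FALSE keeps a single column as a data frame when you use the [rows, columns] form.

```r
students <- data.frame(
  first_name = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  age        = c(15L, 14L, 16L, 16L, 17L),
  stringsAsFactors = FALSE
)

as_df   <- students["first_name"]      # Option #1: one-column data frame
as_vec  <- students$first_name         # Option #2: character vector
as_vec2 <- students[["first_name"]]    # Option #3: the same vector

print(class(as_df))                # "data.frame"
print(class(as_vec))               # "character"
print(identical(as_vec, as_vec2))  # TRUE

# [[ ]] works with a name stored in a variable...
col <- "age"
print(students[[col]])
# ...but $ looks for a column literally named "col", so this is NULL.
print(students$col)

# With [rows, columns], one column becomes a vector unless drop = FALSE.
print(students[, 1])
print(students[, 1, drop = FALSE])
```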

How to Filter Rows of a Data Frame

You can filter the rows of a data frame to get the portion of it that meets certain conditions.

For this, we use this syntax: pass the condition as the first element within square brackets, then add a comma, and leave the second element empty.

image-190

For example, to get all rows for which students_data$age > 16, we would use:

                > students_data[students_data$age > 16,]
                  first_name last_name age num_siblings num_pets eye_color
                5       Gino      Sand  17            3        8      BLUE

We get a data frame with the rows that meet this condition.

Filter Rows and Choose Columns

You can combine this condition with a range of columns:

                > students_data[students_data$age > 16, 3:6]
                  age num_siblings num_pets eye_color
                5  17            3        8      BLUE

We get the rows that meet the condition and the columns in the range 3:6.
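You can combine conditions with & (and) and | (or), and %in% checks against several values at once. This sketch works on the same student data, rebuilt inline. subset() is a base R alternative that lets you write the column names without the students$ prefix.

```r
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Both conditions must hold (Nora and Gino).
older_pet_owners <- students[students$age >= 16 & students$num_pets > 5, ]
print(older_pet_owners)

# Either condition may hold (Rose and Alexander).
print(students[students$age < 15 | students$num_siblings == 0,
               c("first_name", "age", "num_siblings")])

# Match any of several values; one column comes back as a vector.
print(students[students$eye_color %in% c("BLUE", "BROWN"), "first_name"])

# The age > 16 filter written with subset(), keeping columns first_name to age.
print(subset(students, age > 16, select = first_name:age))
```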

🔸 How to Change Data Frames

You can modify individual values of a data frame, add columns, add rows, and remove them. Let's see how you can do this!

How to Change a Value

To modify an individual value of the data frame, you need to use this syntax:

image-191

For example, if we want to change the value that is currently at row 4 and column 6, denoted in blue right here:

image-182

We need to use this line of code:

                students_data[4, 6] <- "BROWN"

💡 Tip: You can also use = as the assignment operator.

This is the output. The value was inverse successfully.

image-193

💡 Tip: Remember that the first row of the CSV file is not counted as the first row because it has the names of the variables.
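Here are a few equivalent ways to change values, sketched with a character eye_color column (stringsAsFactors = FALSE). With a factor column, the new value must already be one of the levels, or R stores NA instead. The last assignment updates every row that meets a condition in one step. The "one more pet" rule is purely illustrative.

```r
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Row 4, column 6, as in the example.
students[4, 6] <- "BROWN"

# The same cell, addressed by column name or through $.
students[4, "eye_color"] <- "BROWN"
students$eye_color[4] <- "BROWN"

# Conditional update: one more pet for every 17-year-old (only Gino here).
is_17 <- students$age == 17
students$num_pets[is_17] <- students$num_pets[is_17] + 1L

print(students)
```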

How to Add Rows to a Data Frame

To add a row to a data frame, you need to use the rbind function:

image-194

This function takes two arguments:

  • The data frame that you want to modify.
  • A list with the data of the new row. To create the list, you can use the list() function with each value separated by a comma.

This is an example:

                > rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))

The output is:

                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     BROWN
                5       Gino      Sand  17            3        8      BLUE
                6       <NA>     Smith  14            7        3     BROWN

But wait! A warning message was displayed:

                Warning message:
                In `[<-.factor`(`*tmp*`, ri, value = "William") :
                  invalid factor level, NA generated

And notice that the first value of the sixth row is <NA>:

                6       <NA>     Smith  14            7        3     BROWN

This occurred because the variable first_name was defined automatically as a factor when we read the CSV file, and factors have fixed "categories" (levels).

You cannot add a new level (the value "William") to this variable this way. To avoid the problem, read the CSV file with the value FALSE for the parameter stringsAsFactors, as shown below:

                > students_data <- read.csv("students_data.csv", stringsAsFactors = FALSE)
image-196

Now, if we try to add this row, the data frame is modified successfully.

                > students_data <- rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))
                > students_data
                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE
                6    William     Smith  14            7        3     BROWN

💡 Tip: Note that if you read the CSV file again and assign it to the same variable, all the changes made previously will be removed and you will see the original data frame. You need to add this argument to the first line of code that reads the CSV file, and then make your changes to it.
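Here are two alternatives, sketched under the article's setup. The first repairs a data frame that was already read with factor columns by converting them to character in place, so earlier changes are kept instead of being lost by re-reading the file. The second appends the new row as a one-row data frame. rbind() matches its columns to the existing ones by name.

```r
# Factor columns, as produced by read.csv() before R 4.0.0.
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = TRUE
)

# Convert every factor column to character without re-reading the file.
is_fac <- vapply(students, is.factor, logical(1))
students[is_fac] <- lapply(students[is_fac], as.character)

# The new row as a one-row data frame; columns are matched by name.
new_row <- data.frame(
  first_name = "William", last_name = "Smith", age = 14L,
  num_siblings = 7L, num_pets = 3L, eye_color = "BROWN",
  stringsAsFactors = FALSE
)
students <- rbind(students, new_row)
print(students)
```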

How to Add Columns to a Data Frame

Adding columns to a data frame is much simpler. You need to use this syntax:

image-197

For example:

                > students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9, 3.0)

💡 Tip: The number of elements has to be equal to the number of rows of the data frame.
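This sketch shows what happens with other lengths, using a trimmed two-column version of the data with William already added. A vector with one value per row is stored as is, and a single value is recycled to every row. A length that doesn't divide the number of rows evenly stops with an error, caught here with tryCatch() so the script keeps running. The school column, its value, and the age_next_year column are hypothetical.

```r
students <- data.frame(
  first_name = c("Emily", "Rose", "Alexander", "Nora", "Gino", "William"),
  age        = c(15L, 14L, 16L, 16L, 17L, 14L),
  stringsAsFactors = FALSE
)

# One value per row (the GPA values from the example).
students$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9, 3.0)

# A single value is recycled to every row.
students$school <- "Hypothetical High"

# Forgetting William's GPA: 5 values for 6 rows raises an error.
result <- tryCatch(
  {
    students$GPA2 <- c(4.0, 3.5, 3.2, 3.15, 2.9)
    "added"
  },
  error = function(e) conditionMessage(e)
)
print(result)

# A new column can also be computed from an existing one.
students$age_next_year <- students$age + 1L
print(students)
```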

The output shows the data frame with the new GPA column:

                > students_data
                  first_name last_name age num_siblings num_pets eye_color  GPA
                1      Emily    Dawson  15            2        5      BLUE 4.00
                2       Rose Patterson  14            5        0     GREEN 3.50
                3  Alexander     Smith  16            0        2     BROWN 3.20
                4       Nora    Navona  16            4       10     GREEN 3.15
                5       Gino      Sand  17            3        8      BLUE 2.90
                6    William     Smith  14            7        3     BROWN 3.00

How to Remove Columns

To remove columns from a data frame, you need to use this syntax:

image-198

When you assign the value NULL to a column, that column is removed from the data frame automatically.

For example, to remove the age column, we use:

                > students_data$age <- NULL              
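Assigning NULL removes one column at a time. One way to drop several at once is to keep every column except the unwanted ones. This sketch does that with setdiff() on the column names. Indexing with a single [ ] and a vector of names always returns a data frame, even when only one column is left.

```r
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# One column at a time, as in the example.
students$age <- NULL
print(names(students))

# Several columns at once: keep everything except the names in to_drop.
to_drop <- c("num_pets", "eye_color")
students <- students[setdiff(names(students), to_drop)]
print(names(students))
```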

The output is:

                > students_data
                  first_name last_name num_siblings num_pets eye_color  GPA
                1      Emily    Dawson            2        5      BLUE 4.00
                2       Rose Patterson            5        0     GREEN 3.50
                3  Alexander     Smith            0        2     BROWN 3.20
                4       Nora    Navona            4       10     GREEN 3.15
                5       Gino      Sand            3        8      BLUE 2.90
                6    William     Smith            7        3     BROWN 3.00

How to Remove Rows

To remove rows from a data frame, you can use indices and ranges. For example, to remove the first row of a data frame:

image-200

The [-1,] takes the portion of the data frame that doesn't include the first row. Then, this portion is assigned to the same variable.

If we have this data frame and we want to delete the first row:

image-230

The output is a data frame that doesn't include the first row:

image-231

In general, to remove a specific row, you need to use this syntax, where <row_num> is the row that you want to remove:

image-229

💡 Tip: Notice the - sign before the row number.

For example, if we want to remove row 4 from this data frame:

image-232

The output is:

image-233

As you can see, row 4 was successfully removed.
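Negative indices can also remove several rows at once, or you can keep only the rows that meet a condition. This sketch on the rebuilt student data shows two details in its comments. A negative range needs parentheses. And the remaining rows keep their original row names until you reset them.

```r
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Remove rows 1 and 3.
print(students[-c(1, 3), ])

# Remove rows 2 to 4. Parentheses matter: -2:4 means (-2):4,
# which mixes negative and positive indices, and R rejects that.
print(students[-(2:4), ])

# Remove rows by condition: keep only the students who are not 16.
not_16 <- students[students$age != 16, ]
print(rownames(not_16))   # "1" "2" "5": the original row names are kept

# Renumber the rows as 1, 2, 3.
rownames(not_16) <- NULL
print(not_16)
```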

🔹 In Summary

  • CSV files are Comma-Separated Values files used to represent data in the form of a table. These files can be read using R and RStudio.
  • Data frames are used in R to represent tabular data. When you read a CSV file, a data frame is created to store the data.
  • You can access and modify the values, rows, and columns of a data frame.

I really hope that you liked my article and found it helpful. Now you can work with data frames and CSV files in R.

If you liked this article, consider enrolling in my new online course "Introduction to Statistics in R - A Practical Approach".




Source: https://www.freecodecamp.org/news/how-to-work-with-data-frames-and-csv-files-in-r/
