Reading files and basic syntax
Contents
Reading files
There are several options to read a file containing your data not only numerical as R can read text, book pages, among others for different analysis. Similarly, your data does not need to be in an excel file, you are able to import a lot of formats as long as you are using the right package. Similarly, it can be directly obtained from other online data sources.
Most of the time, we will use the classic csv
,txt
, and xlxs
files. Let’s begin by selecting a file that we want. Several files for online sources or the output of programs are in comma separated format (CSV), to ensure that they can be open in different places, therefore it is common and advised that you save your files in that format so others can easily read them. Run the next line of code and select any csv
file that you got on the computer.
df<-read.csv(file.choose(),skip = 0,header = T)
df #look at the data file
df<-read.csv("file.csv"/"file.txt") #read a file by using the name of the file
We added the argument skip
to indicate two points: first, each function in R has several arguments, some are by default like skip=0
or header=T
meaning that you do not need to add them to obtain the results. Second, we can add indicators that will do the action that we want, for example, files may contain 4 lines of text at the beginning therefore, skip=4
will not read the file until the 5th line, similarly, header=F
is used for files with no column names.
R-Studio requires you to have something well structured because you will have some documents that you may need to open them after a long time. If you select file.choose it is easy to forget which file you were working with. Therefore, when you got some data and you are ready to work with something have one folder for all the information and create File -> New Project. You will save, store, share to anyone and they will be able to run the same things you did.
We may want to read files from excel, here we may find different pages. If you got a large file and you don’t want to create one csv
file for each page then you can use the package readxl
and call the read_excel
function to do something like this.
#install.packages("readxl")
library(readxl)
data <- read_excel("datafile.xls", sheet = 2) #page number
data <- read_excel("datafile.xls", sheet = "Revenues") #name of the page
We can also copy to our clipboard some data and save it as a data frame. The package clipr
has a function to directly convert the selection into a tibble.
- Open an excel file so we can copy the data for this exercise and copy the data that you one to use
#only available for windows
data.selected<-read.csv("clipboard",header = F,sep = "\t")
#install.packages("clipr")
library(clipr)
data<- read_clip_tbl() #read from the clipboard directly
As you can see there are many ways to import data. The manual option will be to use the Import Dataset
button in the environmnet window which will take your file in the same way as file.choose()
. Now, we got the file ready for analysis and interpretation.
R and its packages have a lot of built-in data for our use and practice. Use the command data() to get the names of the data sets that you can use.
General commands
There are a lot of things that you can do in R with your code, you may encounter a few during this tutorail but let’s cover some of the most used syntaxt:
new_variable<-function(object)
- new_variable will save all the info under that new name. Next time you can use that in a different location.
- function can take any function inside a package or created by you.
- object is the data, file, values, names. Different functions required some structure.
- <- or = these symbles means store funtion/results into our new variable.
Strings or text should go inside " "
or ''
you will use the later one when there is some quotes in your text.
x=c(1:10)
x %>%
sum() #sum all x
## [1] 55
sum(x)
## [1] 55
Pipes or %>% have the function to take the information created before and pass it to the next action. you can read it as: using this do this then that for as many pipes
as you got. This way you dont need to save all the information in one new variable so your environment has less names.
a=1:10 #numbers 1 to 10
str(a) #structure of a says numbers
## int [1:10] 1 2 3 4 5 6 7 8 9 10
a<-factor(1:10) # treat numbers from 1:10 as factors
str(a) #structure of a now says Factor with 10 levels
## Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10
All numbers in R are treated as continous, your raster file will have that and if you want to treat them as one different group make sure you use factor().