R - data analysis - descriptive
Packages
- data manipulation:
dplyr
- data visualization:
ggplot2
Getting started
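- load package (install it once first with install.packages("package_name")):
library(package_name)
- load a dataset that ships with R or with a loaded package:
data(dataframe_name)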
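A minimal sketch of the two loading steps. It assumes dplyr is already installed and uses the built-in mtcars dataset purely as an example stand-in for dataframe_name:

```r
# Load dplyr (install it once beforehand with install.packages("dplyr"))
library(dplyr)

# Load the built-in mtcars dataset (an example stand-in for dataframe_name)
data(mtcars)

# Peek at the first six rows
head(mtcars)
```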
Overview
- data dimensions (number of rows and columns):
dim(dataframe_name)
- no. of rows:
nrow(dataframe_name)
- column names:
names(dataframe_name)
- data structure (dimensions, column types, and the first few values of each column):
str(dataframe_name)
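The overview functions can be tried on the built-in mtcars dataset, used here only as an example; the values in the comments are mtcars' actual dimensions and first column name. ncol() is a base R companion to nrow() that is not listed above:

```r
# mtcars ships with base R (example stand-in for dataframe_name)
data(mtcars)

dim(mtcars)    # rows and columns: 32 11
nrow(mtcars)   # number of rows: 32
ncol(mtcars)   # number of columns: 11
names(mtcars)  # column names, starting with "mpg" "cyl" "disp"
str(mtcars)    # 32 obs. of 11 variables, each column's type and first values
```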
dplyr - Data manipulation
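- filter(condition): keep only the rows that meet the condition(s); comma-separated conditions are combined with AND. General syntax (%>% is the pipe re-exported by dplyr; it passes the data on its left as the first argument of the function on its right):
dataframe_name %>% filter(col_1 == value1, col_2 != value2)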
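A small filter() sketch on mtcars (an example dataset, not part of the notes above), combining an equality and an inequality condition as in the general syntax. In mtcars, am is 0 for automatic and 1 for manual transmissions:

```r
library(dplyr)

# Keep 4-cylinder cars whose transmission is not automatic (am != 0);
# the two comma-separated conditions must both hold
four_cyl_manual <- mtcars %>% filter(cyl == 4, am != 0)

four_cyl_manual
```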
- summarise(): numerical summaries of data, collapsing many rows into one row (or one row per group after group_by()). General syntax:
dataframe_name %>% summarise(new_var_name = stats_function(col_name))
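- summary stats functions that can be used inside summarise():
- mean()
- median()
- sd() (standard deviation)
- var()
- range() (returns two values: the minimum and the maximum)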
- IQR()
- min()
- max()
- sum()
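A sketch of summarise() with several of the listed statistics at once, again using mtcars and its mpg column only as an example. Each named argument becomes one column of the single-row result:

```r
library(dplyr)

# Summary statistics of fuel economy (mpg) in one call;
# the result is a one-row data frame with one column per statistic
mpg_stats <- mtcars %>%
  summarise(
    mean_mpg   = mean(mpg),
    median_mpg = median(mpg),
    sd_mpg     = sd(mpg),
    iqr_mpg    = IQR(mpg),
    min_mpg    = min(mpg),
    max_mpg    = max(mpg)
  )

mpg_stats
```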
- mutate(): create a new column (or overwrite an existing one); assign the result to keep it. General syntax, here using ifelse() to derive a two-valued column:
dataframe_name <- dataframe_name %>% mutate(new_col_name = ifelse(old_col == value, "value_if_true", "value_if_false"))
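A sketch of mutate() with ifelse() on mtcars (an example dataset), turning the 0/1 am column into readable labels. The new column name transmission is a hypothetical choice for illustration:

```r
library(dplyr)

# Derive a text label from the 0/1 column am
# (in mtcars, am == 1 means manual and am == 0 means automatic)
cars <- mtcars %>%
  mutate(transmission = ifelse(am == 1, "manual", "automatic"))

head(cars[, c("am", "transmission")])
```

dplyr also provides if_else(), a stricter variant that requires both outcomes to have the same type.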
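- group_by(): group rows by one or more columns so that a following summarise() computes one result per group:
dataframe_name %>% group_by(col_name) %>% summarise(new_var_name = stats_function(col_name_2))
- arrange(): order rows by a column (ascending by default); wrap the column in desc() for descending order:
dataframe_name %>% arrange(desc(col_name))
- select(): like SELECT in SQL, keep only the columns of interest:
dataframe_name %>% select(col_1, col_2)
- distinct(): keep only unique rows (or unique combinations of the given columns)
- sample_n(): draw a random sample of n rows (superseded by slice_sample() in newer dplyr versions)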
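The remaining verbs can be sketched together on mtcars, used only as an example dataset; the column choices and the seed value 42 are arbitrary:

```r
library(dplyr)

# group_by() + summarise(): mean mpg for each number of cylinders (4, 6, 8)
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# arrange(): sort rows from highest to lowest mpg
by_mpg_desc <- mtcars %>% arrange(desc(mpg))

# select(): keep only the mpg and cyl columns
mpg_cyl <- mtcars %>% select(mpg, cyl)

# distinct(): the unique values of cyl
cyl_values <- mtcars %>% distinct(cyl)

# sample_n(): 5 random rows; set.seed() makes the draw reproducible
set.seed(42)
five_cars <- mtcars %>% sample_n(5)
```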
ggplot2 - Data visualization
- general syntax:
ggplot(data = dataframe_name, aes(x = col_name)) + geom_histogram()
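- histogram (distribution of one numeric variable; binwidth sets the width of each bin):
geom_histogram(binwidth = bin_width)
- bar chart (count of rows per category of one variable):
geom_bar()
- scatterplot (needs both x and y in aes()):
geom_point()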
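A minimal sketch of the three geoms, assuming ggplot2 is installed; mtcars, the columns mpg, cyl and wt, and binwidth = 2 are illustrative choices, not part of the notes above:

```r
library(ggplot2)

# Histogram: distribution of mpg, grouped into 2-mpg-wide bins
p_hist <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Bar chart: number of cars per cylinder count
# (factor() treats cyl as categories rather than a continuous number)
p_bar <- ggplot(data = mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Scatterplot: weight vs. mpg, which needs both x and y aesthetics
p_point <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Printing a plot object draws it on the current graphics device
print(p_point)
```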