R - data analysis - descriptive
Packages
- data manipulation: dplyr
- data visualization: ggplot2
Getting started
- load package: library(package_name)
- load data: data(dataframe_name)
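A minimal sketch of these two steps. It assumes dplyr is already installed and uses R's built-in mtcars dataset as a stand-in for dataframe_name; the choice of dataset is an assumption made for illustration only.

```r
# Load the dplyr package (assumes install.packages("dplyr") was run earlier)
library(dplyr)

# Load a built-in example dataset into the workspace;
# mtcars is only an illustrative stand-in for dataframe_name
data(mtcars)

# Show the first rows to confirm the data is available
head(mtcars)
```

Note that data() only works for datasets shipped with R or with an installed package; data stored in files is read with functions such as read.csv().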
Overview
- data dimension: dim(dataframe_name)
- no. of rows: nrow(dataframe_name)
- column names: names(dataframe_name)
- data structure (dimension + data types): str(dataframe_name)
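The overview functions above could be tried out as follows. This is a small sketch that again assumes the built-in mtcars dataset as the example data frame.

```r
# mtcars is an illustrative built-in dataset, not one named in these notes
data(mtcars)

dim(mtcars)    # number of rows, then number of columns
nrow(mtcars)   # number of rows only
names(mtcars)  # column names as a character vector
str(mtcars)    # dimensions plus the type and first values of each column
```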
dplyr - Data manipulation
- filter(condition): filter rows based on conditions
  - general syntax: dataframe_name %>% filter(col_1 == value1, col_2 != value2)
- summarise(): numerical summaries of data
  - general syntax: dataframe_name %>% summarise(new_var_name = stats_function(col_name))
  - summary stats functions:
    - mean()
    - median()
    - sd()
    - var()
    - range() (returns both the min and the max)
    - IQR()
    - min()
    - max()
    - sum()
- mutate(): create a new column
  - general syntax (assign the result to keep the added column): dataframe_name <- dataframe_name %>% mutate(new_col_name = ifelse(old_col == value, "value_if_true", "value_if_false"))
- group_by(): useful for summarising data by group
- arrange(): order rows; use arrange(desc(col_name)) for descending order
- select(): like in SQL, select only the columns of interest
- distinct(): keep only unique rows
- sample_n(): draw a random sample of n rows
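The verbs listed above can be sketched together on one dataset. The example assumes dplyr is installed and uses the built-in mtcars data. The result names (four_cyl, mpg_by_cyl, and so on) are hypothetical and chosen only for illustration.

```r
library(dplyr)
data(mtcars)  # illustrative built-in dataset

# filter(): keep 4-cylinder cars that do not have 5 gears
four_cyl <- mtcars %>% filter(cyl == 4, gear != 5)

# summarise(): collapse the data to one row of summary statistics
mpg_stats <- mtcars %>%
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg), iqr_mpg = IQR(mpg))

# mutate(): add a column derived from an existing one
# (in mtcars, am is coded 0 = automatic, 1 = manual)
cars_labeled <- mtcars %>%
  mutate(transmission = ifelse(am == 1, "manual", "automatic"))

# group_by() + summarise(): one summary row per group,
# then arrange() to sort the groups by descending mean
mpg_by_cyl <- cars_labeled %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n()) %>%
  arrange(desc(mean_mpg))

# select(): keep only the columns of interest
mpg_wt <- mtcars %>% select(mpg, wt)

# distinct(): unique combinations of the given columns
cyl_gear <- mtcars %>% distinct(cyl, gear)

# sample_n(): draw 5 rows at random (the seed makes the draw repeatable)
set.seed(1)
five_cars <- mtcars %>% sample_n(5)
```

Each verb returns a new data frame and leaves mtcars unchanged, which is why every result is assigned to its own name.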
ggplot2 - Data visualization
- general syntax: ggplot(data = dataframe_name, aes(x = col_name)) + geom_histogram()
- histogram: geom_histogram(binwidth = )
- bar: geom_bar()
- scatterplot: geom_point()
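The three plot types above might look like this in practice. The sketch assumes ggplot2 is installed and uses the built-in mtcars data. The binwidth of 2 and the plot object names are arbitrary illustrative choices.

```r
library(ggplot2)
data(mtcars)  # illustrative built-in dataset

# Histogram of one numeric column; binwidth is in the units of x (here mpg)
p_hist <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Bar chart: counts the rows in each category;
# factor() makes the numeric cyl column categorical
p_bar <- ggplot(data = mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Scatterplot: needs both an x and a y aesthetic
p_scatter <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# A plot object is drawn when it is printed
print(p_hist)
```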