R - data analysis - descriptive

Packages

  • data manipulation: dplyr
  • data visualization: ggplot2

Getting started

  • load package: library(package_name)
  • load data (a dataset bundled with R or with a loaded package): data(dataframe_name)
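
A minimal sketch of the two steps above. It assumes dplyr is already installed, and it uses mtcars, a dataset that ships with base R, as a stand-in for dataframe_name.

```r
# Assumption: dplyr has been installed beforehand, e.g. with install.packages("dplyr").
library(dplyr)

# mtcars comes with base R (datasets package); data() loads it into the workspace.
data(mtcars)

# Show the first rows to confirm the data is available.
head(mtcars)
```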

Overview

  • data dimension: dim(dataframe_name)
  • no. of rows: nrow(dataframe_name)
  • column names: names(dataframe_name)
  • data structure (dimension + data types): str(dataframe_name)
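
The overview functions can be tried on any data frame. The sketch below uses the built-in mtcars dataset as an example; the choice of dataset is an assumption made for illustration.

```r
data(mtcars)

dim(mtcars)    # number of rows, then number of columns
nrow(mtcars)   # number of rows only
names(mtcars)  # column names as a character vector
str(mtcars)    # dimensions plus the type and first values of each column
```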

dplyr - Data manipulation

  • filter(condition): keep only the rows that meet the conditions
  • general syntax: dataframe_name %>% filter(col_1 == value1, col_2 != value2)
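
As an illustration of the filter() syntax, the sketch below keeps 4-cylinder cars with a manual gearbox. The mtcars dataset and the name four_cyl_manual are illustrative choices; conditions separated by commas are combined with AND.

```r
library(dplyr)

# In mtcars, am == 0 means automatic and am == 1 means manual.
# Comma-separated conditions must all be true for a row to be kept.
four_cyl_manual <- mtcars %>% filter(cyl == 4, am != 0)

four_cyl_manual
```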

  • summarise(): compute numerical summaries of the data

  • general syntax: dataframe_name %>% summarise(new_var_name = stats_function(col_name))
  • summary stats functions:

    • mean()
    • median()
    • sd()
    • var()
    • range() (returns two values: the minimum and the maximum)
    • IQR()
    • min()
    • max()
    • sum()
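
A possible summarise() call combining several of these functions, again using mtcars as an assumed example dataset. The standard deviation function in base R is sd(). min() and max() are used in place of range() so that every summary is a single value.

```r
library(dplyr)

# Each argument creates one column in the one-row result.
mpg_summary <- mtcars %>%
  summarise(
    mean_mpg   = mean(mpg),
    median_mpg = median(mpg),
    sd_mpg     = sd(mpg),
    iqr_mpg    = IQR(mpg),
    min_mpg    = min(mpg),
    max_mpg    = max(mpg)
  )

mpg_summary
```
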
  • mutate(): create a new column

  • general syntax (assign the result to keep the added column): dataframe_name <- dataframe_name %>% mutate(new_col_name = ifelse(old_col == value, "value_if_true", "value_if_false"))
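
A short sketch of mutate() with ifelse(), which takes three arguments: the test, the value if true, and the value if false. The dataset (mtcars) and the names mtcars_labeled and transmission are illustrative assumptions.

```r
library(dplyr)

# Add a readable label column based on the numeric am column
# (am == 1 is manual, am == 0 is automatic in mtcars).
mtcars_labeled <- mtcars %>%
  mutate(transmission = ifelse(am == 1, "manual", "automatic"))

# Count how many cars fall into each label.
table(mtcars_labeled$transmission)
```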

  • group_by(): group rows so that summarise() returns one result per group

  • arrange(): order rows; for descending order use arrange(desc(col_name))
  • select(): keep only the columns of interest, like SELECT in SQL
  • distinct(): keep only unique rows
  • sample_n(): draw a random sample of n rows
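
The remaining verbs can be sketched together as follows. The dataset (mtcars) and all result names are assumptions made for illustration; n() is dplyr's helper for counting rows in the current group, and set.seed() is only there so the random sample is repeatable.

```r
library(dplyr)

# group_by() + summarise(): one row per cylinder count,
# then arrange() sorts by mean mpg, highest first.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n()) %>%
  arrange(desc(mean_mpg))

# select() keeps two columns; distinct() drops repeated combinations.
cyl_gear_combos <- mtcars %>%
  select(cyl, gear) %>%
  distinct()

# sample_n() draws 5 random rows.
set.seed(1)
five_cars <- mtcars %>% sample_n(5)

mpg_by_cyl
```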

ggplot2 - Data visualization

  • general syntax: ggplot(data = dataframe_name, aes(x = col_name)) + geom_histogram()
  • histogram: geom_histogram(binwidth = )
  • bar: geom_bar()
  • scatterplot: geom_point(), which needs both x and y in aes()
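
One way the three plot types might look in practice. The package is loaded under the name ggplot2 and is assumed to be installed; mtcars, the chosen columns, and the binwidth of 2 are illustrative assumptions.

```r
library(ggplot2)

# Histogram of one numeric column; binwidth is in the units of x (here mpg).
hist_plot <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Bar chart of counts per category; factor() treats cyl as categorical.
bar_plot <- ggplot(data = mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Scatterplot needs both an x and a y column.
scatter_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Printing a plot object draws it.
print(scatter_plot)
```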