README

Visualize and tabulate single-choice, multiple-choice, matrix-style questions from survey data. Includes ability to group cross-tabulations, frequency distributions, and plots by categorical variables and to integrate survey weights. Ideal for quickly uncovering descriptive patterns in survey data.

Installation

install.packages("surveyexplorer")
# or devtools::install_github("liamhaller/surveyexplorer") for the devlopment version

Examples

library(surveyexplorer)

The data used in the following examples is from the berlinbears dataset, a fictional survey of bears in Berlin, that is included in the surveyexplorer package.

Single-choice questions

#Basic table
single_table(berlinbears, 
             question = income)

	n	freq
Question: income
<1000	82	0.164
1000-2000	50	0.100
2000-3000	177	0.354
3000-4000	109	0.218
5000+	57	0.114
No answer	22	0.044
NA	3	0.006
Column Total	500	—

single_table(berlinbears,
             question = income,
             group_by = gender)

	female		male		NA		Rowwise Total
Question: income
grouped by: gender
	Frequency	Count	Frequency	Count	Frequency	Count	Frequency	Count
<1000	16.74%	39	15.73%	39	21.05%	4	16.40%	82
1000-2000	9.87%	23	9.68%	24	15.79%	3	10.00%	50
2000-3000	35.62%	83	35.89%	89	26.32%	5	35.40%	177
3000-4000	21.89%	51	22.18%	55	15.79%	3	21.80%	109
5000+	11.59%	27	10.89%	27	15.79%	3	11.40%	57
No answer	3.86%	9	4.84%	12	5.26%	1	4.40%	22
NA	0.43%	1	0.81%	2	0.00%	0	0.60%	3
Columnwise Total	46.60%	233	49.60%	248	3.80%	19	100.00%	500

single_table(berlinbears,
             question = income, 
             group_by = gender, 
             subgroups_to_exclude = NA)

	female		male		Rowwise Total
Question: income
grouped by: gender
	Frequency	Count	Frequency	Count	Frequency	Count
<1000	16.74%	39	15.73%	39	16.22%	78
1000-2000	9.87%	23	9.68%	24	9.77%	47
2000-3000	35.62%	83	35.89%	89	35.76%	172
3000-4000	21.89%	51	22.18%	55	22.04%	106
5000+	11.59%	27	10.89%	27	11.23%	54
No answer	3.86%	9	4.84%	12	4.37%	21
NA	0.43%	1	0.81%	2	0.62%	3
Columnwise Total	48.44%	233	51.56%	248	100.00%	481

single_table(berlinbears,
             question = income, 
             group_by = gender, 
             subgroups_to_exclude = NA,
             na.rm = TRUE)

	female		male		Rowwise Total
Question: income
grouped by: gender
	Frequency	Count	Frequency	Count	Frequency	Count
<1000	16.81%	39	15.85%	39	16.32%	78
1000-2000	9.91%	23	9.76%	24	9.83%	47
2000-3000	35.78%	83	36.18%	89	35.98%	172
3000-4000	21.98%	51	22.36%	55	22.18%	106
5000+	11.64%	27	10.98%	27	11.30%	54
No answer	3.88%	9	4.88%	12	4.39%	21
Columnwise Total	48.54%	232	51.46%	246	100.00%	478

single_table(berlinbears,
             question = income, 
             group_by = gender, 
             subgroups_to_exclude = NA,
             na.rm = TRUE,
             weights = weights)

	female		male		Rowwise Total
Question: income
grouped by: gender
	Frequency	Count	Frequency	Count	Frequency	Count
<1000	15.96%	59.6	17.21%	75.2	16.63%	134.8
1000-2000	10.46%	39.1	10.19%	44.5	10.31%	83.6
2000-3000	33.79%	126.3	33.88%	148.0	33.84%	274.3
3000-4000	25.08%	93.7	25.34%	110.7	25.22%	204.4
5000+	9.82%	36.7	8.68%	37.9	9.21%	74.6
No answer	4.90%	18.3	4.70%	20.5	4.79%	38.8
Columnwise Total	46.10%	373.6	53.90%	436.9	100.00%	810.5
Frequencies and counts are weighted

The same syntax can be applied to the single_freq function to plot frequencies of the question optionally partitioned by subgroups.

single_freq(berlinbears,
             question = income, 
             group_by = gender, 
             subgroups_to_exclude = NA,
             na.rm = TRUE,
             weights = weights)

Multiple-choice questions

The options and syntax for multiple-choice tables multi_table and graphs multi_graphs are the same. The only difference is the question input also accommodates tidyselect syntax to select several columns for each answer option. For example, the question “will_eat” has five answer options each prefixed by “will_eat”

berlinbears |> 
  dplyr::select(starts_with('will_eat')) |> 
  head()
#>   will_eat.SQ001 will_eat.SQ002 will_eat.SQ003 will_eat.SQ004 will_eat.SQ005
#> 1              0              1              0              1              1
#> 2              0              1              1              1              1
#> 3              1              1              0              1              1
#> 4              0              0              0              1              0
#> 5              0              0              0              1              1
#> 6              0              0              0              1              0

The same syntax can be used to select the question for the multiple choice tables and graphs

multi_table(berlinbears, 
            question = dplyr::starts_with('will_eat'), 
            group_by = genus, 
            subgroups_to_exclude = NA,
            na.rm = TRUE)

	Ailuropoda		Ursus		Rowwise Total
Question: dplyr::starts_with(“will_eat”)
grouped by: genus
	Frequency	Count	Frequency	Count	Frequency	Count
will_eat.SQ004	97.54%	278	91.62%	175	40.12%	453
will_eat.SQ002	59.30%	169	66.49%	127	26.22%	296
will_eat.SQ005	43.86%	125	46.60%	89	18.95%	214
will_eat.SQ001	24.91%	71	27.23%	52	10.89%	123
will_eat.SQ003	8.42%	24	9.95%	19	3.81%	43
Columnwise Total	59.08%	667	40.92%	462	100.00%	1129

For graphing, the multi_freq function creates an UpSet plot to visualize the frequencies of the intersecting sets for each answer combination and also includes the ability to specify weights.

multi_freq(berlinbears, 
            question = dplyr::starts_with('will_eat'), 
            na.rm = TRUE,
            weights = weights)
#> Estimes are only preciese to one significant digit, weights may have been rounded

multi_freq(berlinbears, 
            question = dplyr::starts_with('will_eat'), 
            group_by = genus,
            subgroups_to_exclude = NA,
            na.rm = FALSE,
            weights = weights)
#> Estimes are only preciese to one significant digit, weights may have been rounded

Matrix Questions

matrix_table has the same syntax as above and works with array or categorical questions

matrix_table(berlinbears, 
             dplyr::starts_with('c_'),
             group_by = is_parent)

	high	low	medium	NA
Question: dplyr::starts_with(“c_”)
grouped by: is_parent
0
c_diet	6.02% (20)	71.99% (239)	16.57% (55)	5.42% (18)
c_exercise	25% (83)	27.71% (92)	24.1% (80)	23.19% (77)
1
c_diet	3.57% (6)	75% (126)	17.26% (29)	4.17% (7)
c_exercise	19.05% (32)	27.38% (46)	23.81% (40)	29.76% (50)

matrix_freq(berlinbears, 
             dplyr::starts_with('p_'), 
             na.rm = TRUE)

For array/matrix style questions that are numeric matrix_mean plots the mean values and confidence intervals

matrix_mean(berlinbears, 
             question = dplyr::starts_with('p_'),
             na.rm = TRUE)

#Can also apply grouping + survey weights
matrix_mean(berlinbears, 
            question = dplyr::starts_with('p_'),
            na.rm = TRUE,
            group_by = species, 
            subgroups_to_exclude = NA)

Finally, for Likert questions (scales of 3,5,7,9…) matrix_likert provides a custom plot

#you can specify custom labels with the `label` argument
matrix_likert(berlinbears,
              question = dplyr::starts_with('p_'),
              labels = c('Strongly disagree', 'Disagree','Neutral','Agree','Strongly agree'))


#can also apply pass custom colors and specify weights weights 
matrix_likert(berlinbears, 
              question = dplyr::starts_with('p_'),
              labels = c('Strongly disagree', 'Disagree','Neutral','Agree','Strongly agree'), 
              colors = c("#E1AA28", "#1E5F46", "#7E8F75", "#EFCD83", "#E17832"),
              weights = weights)

Overview

Functions

*_table functions return a gt table of the cross tabulations and frequencies for each question while *_freq returns the same data but as a plot.

For matrix-style questions with numerical input, matrix_mean plots the mean value value and ± two standard deviations. matrix_likert visualizes questions that accept Likert responses (strongly agree-strongly disagree) or questions with 3,5,7,9… categories.

Surveyexplorer