This document is the
internal technical reference for the tabxplor package.
It is intended for developers and AI assistants modifying the codebase.
For user-facing documentation, see
vignette("tabxplor").
Note: Some details may become out of date as the code evolves. Always verify against the current source. When in doubt, the code is the source of truth.
tabxplor creates, manipulates, and formats color-coded cross-tabulation tables for exploratory data analysis. Two core design principles underpin the entire package:
Every cell carries all statistical data. Each
numeric cell is a vctrs record (tabxplor_fmt)
storing count, weighted count, percentage, mean, difference,
contribution to variance, confidence interval, odds ratio, and
display/formatting metadata. This enables lossless display
switching: users can change what is displayed (e.g., from
percentages to differences to CI) without recalculating or losing
data.
Tables are tibbles with full dplyr
compatibility. Results inherit from tibble (via
tabxplor_tab and tabxplor_grouped_tab S3
classes), so all dplyr verbs (filter, mutate,
select, arrange, etc.) work out of the box
while preserving table metadata and formatting.
Performance strategy: Aggregation is done with
data.table internally for speed on large data frames. The
user-facing API returns tibbles with fmt columns. Users
never interact with data.table directly.
CRAN stability: This is a public CRAN package with external users. All public function signatures (argument names, defaults, return types) are part of the stable API. Internals (helper functions, class fields, color logic) can be changed freely, but public-facing arguments must not be removed or renamed without proper deprecation.
tabxplor_fmt is a vctrs::new_rcrd() record
class defined in R/fmt_class.R. It is the
foundation of the entire package — every numeric column
in a tabxplor table is an fmt vector.
Fields (per-cell, accessed via
vctrs::field()):
| Field | Type | Description |
|---|---|---|
| n | integer | Unweighted count |
| display | character | Which field to show: "n", "wn", "pct", "mean", "diff", "ctr", "ci", "pct_ci", "mean_ci", "var", "pvalue", "or", "or_pct", "OR", "OR_pct", "rr" |
| digits | integer | Decimal places for display |
| wn | double | Weighted count |
| pct | double | Percentage (stored as 0–1, multiplied by 100 only in format()) |
| mean | double | Cell mean (for numeric column variables) |
| diff | double | Difference from reference. For type = "mean": stores a RATIO, not a difference |
| ctr | double | Contribution to chi-squared variance |
| var | double | Variance (used for CI calculation) |
| ci | double | Confidence interval half-width (margin of error), not full interval |
| rr | double | Relative risk |
| or | double | Odds ratio or relative risk ratio |
| in_totrow | logical | Cell belongs to a total row |
| in_tottab | logical | Cell belongs to the total table |
| in_refrow | logical | Cell belongs to the reference row |
Attributes (per-column, accessed via
attr()):
| Attribute | Type | Description |
|---|---|---|
| type | character | Column type: "n", "mean", "row", "col", "all", "all_tabs" |
| comp_all | logical | Compare against total table (TRUE) or subtable (FALSE) |
| ref | character | Reference type: "tot" or "first" |
| ci_type | character | CI type: "", "no", "cell", "diff", "auto" |
| col_var | character | Name of the column variable this belongs to |
| totcol | logical | This column is a total column |
| refcol | logical | This column is a reference column |
| color | character | Color scheme: "no", "diff", "diff_ci", "after_ci", "contrib", "or", "OR" |
Critical distinction: Fields are per-cell vectors
(every cell can have a different n, pct,
etc.). Attributes are scalar values describing the entire column (all
cells in the column share the same type,
color, etc.). Do not confuse the two when modifying the
class.
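
The field/attribute split can be illustrated with a deliberately tiny record class. This is a sketch only: `new_mini_fmt()` and the `mini_fmt` class are hypothetical, keep just three of the fields and one attribute, and do not reproduce the real `new_fmt()` signature. It assumes the vctrs package is installed.

```r
library(vctrs)

# Hypothetical miniature of an fmt-like record: three per-cell fields plus
# one per-column attribute. This is NOT the real new_fmt() signature.
new_mini_fmt <- function(n = integer(), pct = double(),
                         display = character(), type = "row") {
  new_rcrd(
    fields = list(n = n, pct = pct, display = display),
    type = type,                 # stored as an attribute of the whole vector
    class = "mini_fmt"
  )
}

# Show whichever field `display` points to; pct is stored as 0-1.
format.mini_fmt <- function(x, ...) {
  ifelse(field(x, "display") == "pct",
         paste0(round(field(x, "pct") * 100), "%"),
         as.character(field(x, "n")))
}

x <- new_mini_fmt(n = c(10L, 30L), pct = c(0.25, 0.75),
                  display = c("pct", "pct"))
field(x, "n")        # per-cell: one value for each cell
attr(x, "type")      # per-column: a single value for the whole vector

# Switching the display changes only what is shown; n and pct are untouched.
field(x, "display") <- rep("n", length(x))
format(x)
```

Because every statistic stays in its own field, changing `display` never requires recomputation, which is the "lossless display switching" property described at the top of this document.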
Constructor chain: fmt() (public,
validates and coerces arguments) → new_fmt() (internal,
calls vctrs::new_rcrd()).
Adding a new field requires updating:
new_fmt(), fmt(),
format.tabxplor_fmt(),
pillar_shaft.tabxplor_fmt(), the relevant
vec_arith methods, and possibly
tab_pct()/tab_ci()/tab_chi2().
Expect ~8 functions across 3 files.
tabxplor_tab is a tibble subclass created via
tibble::new_tibble() in R/tab_classes.R. It
adds two attributes beyond what a regular tibble carries:
- subtext (character vector): Legend lines printed below the table.
- chi2 (tibble): Chi-squared test results with columns: tables, pvalue, df, cells, variance, count.

Constructor: new_tab(tabs, subtext, chi2).
When tab_vars are provided, the result is a
tabxplor_grouped_tab — a grouped_df subclass.
It carries the same subtext and chi2
attributes, plus groups data from dplyr.
Constructor:
new_grouped_tab(tabs, groups, subtext, chi2).
This class requires a separate S3 method for every dplyr verb to preserve class and attributes through operations. See the dplyr Integration section below.
The main pipeline flows through these functions in
R/tab.R:

    tab(data, row_var, col_var, ...)
      └── tab_many(data, row_vars, col_vars, ...)
            └── per row_var:
                  tab_prepare()  ──► Cleans data, drops NA, collapses rare levels
                  tab_plain()    ──► data.table aggregation (dcast), wraps in fmt, adds totals
                    or tab_num() (for numeric col_vars: calculates means/variances)
                  tab_pct()      ──► Calculates percentages and differences from reference
                  tab_ci()       ──► Calculates confidence intervals (Wilson/Wald/AC methods)
                  tab_chi2()     ──► Chi-squared test, cell contributions to variance
                  tab_totaltab() ──► Adds total table (overall cross-tab when subtables exist)
                  tab_spread()   ──► Pivots wider (spread_vars from rows to columns)
                  tab_compact()  ──► Binds multiple row_var tables into one

tab() is a simplified wrapper around
tab_many(). Key differences:
- tab() takes a single row_var and col_var; tab_many() takes multiple row_vars and col_vars.
- tab() has a sup_cols argument (supplementary columns showing only the first level with row percentages); tab_many() achieves this via levels = "first".
- tab() translates its simpler argument interface into tab_many() arguments.

tab_many() processes multiple variables with a key asymmetry:

- Arguments vectorised over row_vars: totaltab, totrow, ref, ref2, OR, comp, color, ci, chi2.
- Arguments vectorised over col_vars: levels, digits, totcol, pct.
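
What "vectorised over" means in practice can be sketched with a small recycling helper. The helper `recycle_arg()` and the variable names below are hypothetical; how tab_many() actually recycles its arguments is not shown in this document and should be checked in the source.

```r
# Hypothetical helper (not part of tabxplor): recycle a per-variable
# argument to the number of variables it is vectorised over.
recycle_arg <- function(arg, vars, arg_name = "arg") {
  if (length(arg) == 1L) return(rep(arg, length(vars)))
  if (length(arg) != length(vars)) {
    stop(sprintf("`%s` must have length 1 or %d, not %d.",
                 arg_name, length(vars), length(arg)))
  }
  arg
}

# Illustrative variable names only.
row_vars <- c("race", "sex")
col_vars <- c("marital", "age", "income")

totrow <- recycle_arg(TRUE, row_vars, "totrow")           # one value per row_var
digits <- recycle_arg(c(0L, 1L, 0L), col_vars, "digits")  # one value per col_var
```

The point of the asymmetry is that a length-2 `totrow` pairs with the two row variables, while a length-3 `digits` pairs with the three column variables; mixing them up triggers the length error.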
tab_plain() is where raw cross-tabulation happens:

- Aggregation uses data.table::dcast(DT, row_var ~ col_var, fun.aggregate = sum) for weighted counts. Column names are temporarily prefixed to avoid data.table reserved name conflicts.
- Results are wrapped in fmt vectors via new_fmt().
- Totals are added according to the tot argument.
- The result is passed on to tab_pct(), tab_ci(), and tab_chi2() if requested.

When a col_var is numeric (not a factor), tab_num() is used instead of tab_plain(). It calculates means and variances per group using data.table aggregation. The resulting fmt vectors have type = "mean" and display = "mean".
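
The weighted aggregation step of tab_plain() can be approximated in base R. This sketch uses `xtabs()` as a stand-in for the `data.table::dcast()` call named above, on toy data with illustrative column names; it is not the package's implementation.

```r
# Toy data: two factors and a weight column (all names are illustrative).
d <- data.frame(
  row_var = factor(c("a", "a", "b", "b", "b")),
  col_var = factor(c("x", "y", "x", "x", "y")),
  w       = c(1.5, 0.5, 2, 1, 1)
)

# Weighted counts: sum of weights per (row, col) cell.
wn <- xtabs(w ~ row_var + col_var, data = d)
# Unweighted counts: number of observations per cell.
n  <- xtabs(~ row_var + col_var, data = d)

# Add a total row and a total column (named "Sum" by addmargins()).
wn_tot <- addmargins(wn)
```

Both `n` and `wn` are needed per cell, since each becomes a field of the resulting fmt vector.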
The ref argument controls which row serves as the comparison baseline for differences and colors:

- "auto": defaults to "first" when OR is requested, "tot" otherwise
- "tot": the total row is the reference (differences = cell - total)
- "first": the first non-total row is the reference
- "no": skip difference calculation entirely

The comp argument adds another dimension:

- comp = "tab" (default): compare within each subtable's own total
- comp = "all": compare against the total table's total (across all subtables)

Critical non-obvious design choice: For type = "mean" columns, the diff field stores a ratio (cell_mean / ref_mean), not an additive difference. This means:

- Breaks for means (c(1.15, 1.5, 2, 4)) are ratio thresholds: 1.15 = "15% above reference".
- The color test compares diff >= break directly (no subtraction).
- For percentage columns, diff is an additive difference (cell_pct - ref_pct), and breaks like 0.05 mean "+5 percentage points".

This asymmetry propagates through tab_pct(), color_formula(), and format.tabxplor_fmt().
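
The ratio-versus-difference asymmetry is easy to get wrong, so here is a minimal numeric sketch. `compute_diff()` is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: compute `diff` the way the text describes it.
# For percentage types it is additive; for type = "mean" it is a ratio.
compute_diff <- function(cell, ref, type) {
  if (type == "mean") cell / ref else cell - ref
}

# Percentages (stored as 0-1): 0.32 against a reference of 0.25
d_pct  <- compute_diff(0.32, 0.25, type = "row")   # about 0.07, i.e. +7 points
# Means: 24 against a reference mean of 20
d_mean <- compute_diff(24, 20, type = "mean")      # 1.2, i.e. 20% above

# Each kind of diff is compared directly with its own kind of break.
d_pct  >= 0.05   # additive break: +5 percentage points
d_mean >= 1.15   # ratio break: 15% above the reference
```

Any code that touches `diff` therefore needs to know the column `type` before interpreting the number.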
tab_ci() stores the CI as a half-width (margin of error), not a full interval:

- ci = z * sqrt(variance)
- Lower and upper bounds are derived only at display time (in format())
- method_cell for absolute proportions (default: Wilson), method_diff for differences (default: Agresti-Caffo)
- The half-width also feeds color decisions (in color_formula() for diff_ci/after_ci modes)

Two display modes controlled by options("tabxplor.ci_print"):

- "moe": show as value ± margin (e.g., 45% ±3)
- "ci": show as [lower; upper] (e.g., [42%; 48%])

The color system has three layers, all working together to determine which cells get which colors at which intensity.
Six predefined color palettes are defined as named character vectors in R/tab_classes.R (around line 2892). Each palette has 11 hex color codes:

- pos1 through pos5: Increasing intensity for over-represented values (green/blue spectrum)
- neg1 through neg5: Increasing intensity for under-represented values (yellow/orange/red spectrum)
- ratio: Special color for the "*2 rule" ratio comparison (purple/blue)

The palettes are:
| Palette | Use case |
|---|---|
| color_style_text_dark | Console text on dark background |
| color_style_text_light | Console text on light background |
| color_style_text_light_24_blue_red | HTML 24-bit (green→blue→red) |
| color_style_text_light_24_green_red | HTML 24-bit (green→red, traditional) |
| color_style_bg_light | Cell background on light theme |
| color_style_bg_dark | Cell background on dark theme |
Selection is done by
set_color_style(type, theme, html_24_bit), which sets
options("tabxplor.color_style").
get_color_style() returns either crayon functions (for
console) or hex codes (for HTML/Excel), depending on the
mode parameter.
Breaks are thresholds stored in options("tabxplor.color_breaks"), set by set_color_breaks():

- Percentage breaks (c(0.05, 0.1, 0.2, 2, 0.3)): For percentage differences.
  - 0.05 = "+5 percentage points above reference" → pos1 color
  - 0.1 = "+10 pp" → pos2
  - 0.2 = "+20 pp" → pos3
  - 2 = "twice the reference percentage" → ratio color (the "*2 rule")
  - 0.3 = "+30 pp" → pos5
  - Negative values mirror this: -0.05 → neg1, etc.
- Mean breaks (c(1.15, 1.5, 2, 4)): Always ratios for mean comparisons.
- Contribution breaks (c(1, 2, 5, 10)): Multiples of mean contribution to variance.

The "*2 rule": Any pct_breaks value > 1 switches from additive difference comparison to multiplicative ratio comparison. Only one such value is allowed. When a cell's percentage is ≥ 2× the reference, it gets the ratio color (typically purple).
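
The interplay between additive breaks and the single ratio break can be sketched as follows. `pick_style()` is a hypothetical helper: it covers over-representation only, and it assumes that when several breaks match, the one listed last in the vector wins. The real precedence between the ratio break and the additive breaks should be verified in the source.

```r
# Illustrative positive breaks, in the order given in the text.
# Values > 1 are treated as ratio thresholds (the "*2 rule").
pct_breaks <- c(pos1 = 0.05, pos2 = 0.1, pos3 = 0.2, ratio = 2, pos5 = 0.3)

# Hypothetical helper: return the style name of the last matching break,
# or NA when no break is reached. Only over-representation is handled.
pick_style <- function(pct, ref_pct, breaks = pct_breaks) {
  hit <- vapply(breaks, function(b) {
    if (b > 1) pct >= b * ref_pct else (pct - ref_pct) >= b
  }, logical(1))
  if (!any(hit)) return(NA_character_)
  names(breaks)[max(which(hit))]
}

pick_style(0.32, 0.25)  # +7 points: reaches the 0.05 break only
pick_style(0.10, 0.04)  # +6 points, and 2.5 times the reference
pick_style(0.26, 0.25)  # +1 point: no break reached
```

The second call shows why the ratio rule exists: a small category can be far above its reference in relative terms while the additive difference stays modest.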
fmt_color_selection() in R/fmt_class.R (line ~1869) orchestrates the selection:

1. Breaks are read from the options (or from the force_breaks parameter).
2. Each break is passed to color_formula() to get a boolean mask of cells exceeding that threshold.
3. keep_last_break() resolves ties: each cell gets the strongest (highest) matching threshold.
4. The result is mapped to style names (pos1–pos5, neg1–neg5, ratio).

color_formula() (line ~2134) applies different boolean logic per color mode:
| Color mode | Formula |
|---|---|
| "diff" | diff >= break (additive) or ratio >= break (when break > 1) |
| "diff_ci" | Difference must exceed CI to be significant |
| "after_ci" | Subtracts CI from difference before comparing to break |
| "contrib" | ctr >= break * mean_ctr (cell contribution vs. mean contribution) |
| "or" / "OR" | Odds ratio comparison; negative uses 1/break for under-represented |
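
The table can be restated as a single function returning a logical mask for a positive break. This is a sketch under stated assumptions: `reaches_break()` is hypothetical and does not match the real color_formula() signature, the "diff_ci" arm is one plausible reading of "must exceed CI", and the negative side (including the 1/break rule for odds ratios) is left out.

```r
# Hypothetical re-statement of the table; positive breaks only.
reaches_break <- function(mode, brk, diff = NULL, ci = NULL,
                          ctr = NULL, mean_ctr = NULL, or = NULL) {
  switch(mode,
    diff     = diff >= brk,
    diff_ci  = diff >= brk & abs(diff) > ci,   # assumed reading of "exceeds CI"
    after_ci = (diff - ci) >= brk,
    contrib  = ctr >= brk * mean_ctr,
    or       = or >= brk,
    stop("unknown color mode: ", mode)
  )
}

diff <- c(0.02, 0.08, 0.12)   # additive differences (proportions)
ci   <- c(0.03, 0.05, 0.04)   # half-widths, as stored in the `ci` field

reaches_break("diff",     0.05, diff = diff)
reaches_break("diff_ci",  0.05, diff = diff, ci = ci)
reaches_break("after_ci", 0.05, diff = diff, ci = ci)
```

Because `ci` is a half-width, "after_ci" is a plain subtraction; the second cell (0.08 with a margin of 0.05) is colored under "diff" but not under "after_ci".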
The pillar_shaft.tabxplor_fmt() method then applies the
selected colors using crayon::make_style() functions for
console display, or hex codes for HTML/Excel.
Four export formats, spread over three files:

- Excel (R/tab_xl.R): Exports to .xlsx via openxlsx (Suggests-only dependency). Features:
  - hide_near_zero: cells displaying as 0 are grayed out
  - n_min: columns/rows with too few observations are grayed out
- HTML (R/tab_classes.R): Uses kableExtra for HTML table output. Supported features include custom CSS from inst/tab.css.
- Markdown (R/tab_md.R): Lightweight standalone export (new in v1.3.1).
- Plots (R/tab_classes.R): Creates ggplot2 bar charts from tabxplor tables, using ggpubr and cowplot for layout.

tabxplor provides 30+ S3 methods to ensure tables survive dplyr operations. This is the most maintenance-intensive part of the package.
Three methods form the backbone of class preservation for tabxplor_grouped_tab:

- dplyr_row_slice(): Called when rows are filtered/sliced. Calls NextMethod(), then re-wraps with new_tab() or new_grouped_tab().
- dplyr_col_modify(): Called when columns are added/modified. Same re-wrapping logic.
- dplyr_reconstruct(): Called to reconstruct the object after operations. Same pattern.

Each checks lv1_group_vars(): if only one grouping level remains, downgrades to plain tabxplor_tab (no longer grouped). Otherwise, preserves tabxplor_grouped_tab.
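
The "call NextMethod(), then re-wrap" pattern can be shown without dplyr on a plain data.frame subclass. Everything here is hypothetical (`new_mini_tab()`, the `mini_tab` class); it is a base R analogy for the mechanism, not the package's methods.

```r
# Hypothetical miniature: a data.frame subclass carrying a `subtext`
# attribute, loosely analogous to tabxplor_tab.
new_mini_tab <- function(df, subtext = character()) {
  structure(df, subtext = subtext,
            class = union("mini_tab", class(df)))
}

# Subsetting: delegate to the parent method, then re-wrap so that the
# class and the attribute survive.
`[.mini_tab` <- function(x, ...) {
  out <- NextMethod()
  if (!is.data.frame(out)) return(out)   # e.g. a single column was extracted
  new_mini_tab(out, subtext = attr(x, "subtext"))
}

tab <- new_mini_tab(data.frame(a = 1:3, b = c("x", "y", "z")),
                    subtext = "Legend line")
sub <- tab[tab$a > 1, ]

class(sub)
attr(sub, "subtext")
```

Without the re-wrapping step the parent method is free to drop custom attributes, which is the failure mode the three dplyr hooks guard against.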
Every dplyr verb that a user might call needs an S3 method:

- group_by, ungroup, rowwise
- select, relocate, rename, rename_with
- arrange (note: for tabxplor_tab, not grouped)
- mutate, summarise
- dplyr_row_slice, dplyr_col_modify, dplyr_reconstruct

If a method is missing, the operation silently drops the tabxplor_* class, reverting to a plain tbl_df. This loses the subtext and chi2 attributes and breaks colored printing. Always check NAMESPACE for the current method list.
All options are set in .onLoad() in
R/utils.R. Users can override via
options().
| Option | Default | Description |
|---|---|---|
| tabxplor.color_style_type | "text" | Color type: "text" or "bg" |
| tabxplor.color_style_theme | auto-detect | "light" or "dark" (detects RStudio theme) |
| tabxplor.color_html_24_bit | "no" | "green_red", "blue_red", or "no" |
| tabxplor.color_breaks | (see Layer 2) | List of break vectors |
| tabxplor.print | "console" | "console" or "kable" |
| tabxplor.ci_print | "ci" | "ci" (brackets) or "moe" (±margin) |
| tabxplor.compact | FALSE | Compact table output by default |
| tabxplor.cleannames | FALSE | Clean factor names by default |
| tabxplor.export_dir | NULL | Default directory for tab_xl() export |
| tabxplor.output_kable | FALSE | Auto-output as kable |
| tabxplor.kable_html_font | DejaVu Sans | Font for HTML kable output |
| tabxplor.kable_popover | FALSE | Show CI as HTML tooltip |
| tabxplor.always_add_css_in_tab_kable | TRUE | Inject custom CSS in kable |
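
A common way for an `.onLoad()` hook to set defaults without overriding user choices is sketched below. The option names and defaults are taken from the table above; the mechanism itself is a widespread idiom and an assumption here, not a description of what R/utils.R actually does. `set_default_options()` is a hypothetical name.

```r
# Start from a clean slate for the demonstration.
options(tabxplor.print = NULL, tabxplor.ci_print = NULL,
        tabxplor.compact = NULL)

# Hypothetical stand-in for the body of an .onLoad() hook.
set_default_options <- function() {
  defaults <- list(
    tabxplor.print    = "console",
    tabxplor.ci_print = "ci",
    tabxplor.compact  = FALSE
  )
  # Only set options the user has not already defined.
  unset <- !(names(defaults) %in% names(options()))
  if (any(unset)) options(defaults[unset])
  invisible(defaults[unset])
}

options(tabxplor.ci_print = "moe")   # user choice made before loading
set_default_options()

getOption("tabxplor.ci_print")       # the user's value is kept
getOption("tabxplor.print")          # the default is filled in
```

With this idiom, a value set in a user's `.Rprofile` takes precedence over the package default.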
R/fmt_class.R: The foundation file. Contains:

- fmt: constructor fmt(), getters (get_num, get_type, get_color, is_totrow, is_refrow, etc.), setters (set_num, set_type, set_display, as_totrow, etc.).
- Internal constructor new_fmt() and helper fmt0().
- fmt_field_factory(), reference detection (get_reference()).
- format.tabxplor_fmt(): the central display method handling 20+ display modes.
- pillar_shaft.tabxplor_fmt(): console color rendering, mutate.tabxplor_fmt().
- fmt_color_selection(): the color selection pipeline.
- color_formula(), keep_last_break(), helper functions.
- get_reference(): identifies reference cells (totals, first row, or regex match).
- vctrs methods: arithmetic (vec_arith), casting (vec_cast), type compatibility (vec_ptype2), comparison/equality proxies.

R/tab.R: The main API file. Contains:
- tab() roxygen documentation.
- tab() function body: argument processing, delegation to tab_many().
- tab_many(): the full-featured engine with vectorisation, per-row_var loop, pipeline chaining.
- tab_spread(), tab_get_vars(), tab_get_wrapped_dimensions().
- tab_prepare(): data cleaning, NA handling, rare level collapsing.
- tab_plain(): data.table aggregation core, total rows/cols, fmt wrapping.
- tab_num(): numeric variable means/variances, similar structure to tab_plain().
- tab_pct(): percentage calculation, difference computation.
- tab_ci(): confidence interval calculation (Wilson/Wald/AC methods).
- tab_chi2(): chi-squared test, contributions to variance.
- tab_tot(), tab_totaltab(), internal helpers (diff_index, quo_miss_na_null_empty_no, etc.).

R/tab_classes.R: Classes, dplyr methods, and colors. Contains:
- new_tab(), new_grouped_tab() constructors, is_tab(), validators.
- Printing methods (print.tabxplor_tab, tbl_sum, tbl_format_body, tbl_format_footer), tab_kable().
- tab_compact(): merges multiple row_var tables.
- tab_plot(): ggplot visualization.
- lv1_group_vars() helper.
- Coercion methods (vec_ptype2, vec_cast).
- set_color_style(), get_color_style().
- set_color_breaks(), get_color_breaks(), color legend generation.

R/tab_xl.R: Excel export. Main function tab_xl() handles:
- Color selection via fmt_color_selection() with mode = "color_code"

R/tab_md.R: Markdown export. Standalone file (does not modify existing code).
R/utils.R: Utilities and initialization:

- Re-exports (%>% from magrittr)
- .onLoad(): sets all default options
- quo_miss_na_null_empty_no(): helper to check for missing/empty quosures
- Other helpers (fct_recode_helper, etc.)
- score_from_lv1(): scoring helper for survey data

A further source file is entirely commented out. It holds a future logistic regression integration using parsnip/tidymodels, with draft code for multi_logit(), readable_OR(), or_plot(). Do not try to use or integrate these: they are a work in progress.
Jamovi module integration. jmvtab.h.R is auto-generated
by Jamovi (do not edit). jmvtab.b.R is the R6 backend class
that bridges Jamovi’s UI to tabxplor’s tab() function.