Introduction to pixieweb
New to pixieweb? Start with vignette("a-quickstart") for a hands-on walkthrough. This vignette covers the design and advanced features.
Background
PX-Web is the statistical database platform used by national statistics agencies across the Nordic countries and beyond. Each agency runs its own instance (Statistics Sweden at scb.se, Statistics Norway at ssb.no, etc.), but they all share the same underlying API.
pixieweb provides a consistent, pipe-friendly R interface to all these APIs. It follows the same design principles as rKolada: tibbles everywhere, search-then-fetch, and progressive disclosure.
Design principles
- Tibbles everywhere. Every function returns a tibble (or a vector extracted from one).
- Pipe-friendly. First argument is always a tibble or API object; output is always pipeable.
- Search-then-fetch. Users discover metadata before downloading data.
- NULL on failure. API errors return NULL with a warning, never stop().
- Progressive disclosure. Simple things are simple; complex things are possible.
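The NULL-on-failure convention can be sketched in a few lines of base R. The wrapper safe_call() below is hypothetical and not part of pixieweb; it only illustrates the pattern of turning an error into a warning plus a NULL return value.

```r
# Hypothetical helper illustrating the "NULL on failure" convention:
# evaluate an expression and, if it errors, emit a warning and return
# NULL instead of propagating the error with stop().
safe_call <- function(expr, context = "API request") {
  tryCatch(
    expr,  # the promise is forced here, so its errors are caught
    error = function(e) {
      warning(sprintf("%s failed: %s", context, conditionMessage(e)),
              call. = FALSE)
      NULL
    }
  )
}

# A failing call yields NULL (plus a warning) rather than stopping.
res <- suppressWarnings(safe_call(stop("HTTP 404"), context = "get_tables"))
is.null(res)  # TRUE

# A successful call passes its value through unchanged.
safe_call(1 + 1)  # 2
```

Callers can then test with is.null() instead of wrapping every request in their own tryCatch().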
The data model
PX-Web tables are multi-dimensional data cubes. Unlike Kolada — where the dimensions are always KPI, municipality, and period — each PX-Web table defines its own set of dimensions. pixieweb calls these variables.
| pixieweb entity | What it represents | rKolada analog |
|---|---|---|
| api | A PX-Web instance (SCB, SSB…) | (implicit — single) |
| table | A statistical table | kpi |
| variable | A dimension within a table | (municipality/year) |
| codelist | An aggregation/value set | kpi_groups |
| data | Downloaded values | values |
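To make the cube idea concrete, here is a toy cube in base R. Every combination of selected variable values is one cell (one row in long format), so the number of cells is the product of the number of selected values per variable. The variable codes mirror those used later in this vignette (Region, Kon, Tid), but the values are made up for illustration.

```r
# A toy PX-Web-style data cube (made-up values, for illustration only).
# Each variable (dimension) has its own set of value codes.
selection <- list(
  Region = c("0180", "1480"),           # two municipalities
  Kon    = c("1", "2"),                 # two sexes
  Tid    = c("2021", "2022", "2023")    # three periods
)

# One row per combination of values: the "long" shape of the cube.
cube <- expand.grid(selection, stringsAsFactors = FALSE)
cube$value <- seq_len(nrow(cube))  # placeholder numbers, not statistics

# The number of cells is the product of the value counts per variable.
n_cells <- prod(lengths(selection))
n_cells              # 12 (2 * 2 * 3)
nrow(cube) == n_cells
```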
Discovering tables
Tables are the central entity. get_tables() sends a search query to the server, and table_search() then filters the result client-side by regex. The result is a tibble with rich metadata:

```r
tables <- get_tables(scb, query = "income") |>
  table_search("taxable")
tables |> table_describe(max_n = 3)
```

The table tibble includes the subject path, time period range, time unit, and data source, all of which are searchable with table_search().
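Client-side regex filtering can be sketched in base R: keep every row where the pattern matches in any of the searched columns. The helper search_rows() and the column names (id, title, subject) are hypothetical stand-ins, not pixieweb's implementation or its actual table schema.

```r
# Hypothetical stand-in for a table-metadata tibble (not real SCB output).
tables <- data.frame(
  id      = c("TAB1", "TAB2", "TAB3"),
  title   = c("Taxable income by region", "Population by sex",
              "Disposable income"),
  subject = c("Household finances", "Population", "Household finances"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where `pattern` matches in any of `cols`.
search_rows <- function(df, pattern, cols = names(df)) {
  hits <- vapply(
    cols,
    function(col) grepl(pattern, df[[col]], ignore.case = TRUE),
    logical(nrow(df))
  )
  # Force a rows x cols matrix, then keep rows with at least one match.
  hits <- matrix(hits, nrow = nrow(df))
  df[rowSums(hits) > 0, , drop = FALSE]
}

search_rows(tables, "taxable")$id               # "TAB1"
search_rows(tables, "household", "subject")$id  # "TAB1" "TAB3"
```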
Table helper functions
| Function | Purpose |
|---|---|
| table_search() | Filter by regex (client-side) |
| table_describe() | Print human-readable summaries |
| table_minimize() | Remove constant columns |
| table_extract_ids() | Extract ID vector for piping |
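Two of these helpers are simple enough to sketch in base R. minimize_cols() and extract_ids() below are hypothetical re-creations of the ideas behind table_minimize() and table_extract_ids(), not pixieweb's code; the assumption that the ID column is called id is ours.

```r
# Hypothetical sketch: drop columns whose value is the same in every row.
minimize_cols <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
  df[, keep, drop = FALSE]
}

# Hypothetical sketch: pull the ID column out as a plain character vector,
# ready to pipe into a function that takes table IDs.
extract_ids <- function(df, id_col = "id") {
  as.character(df[[id_col]])
}

tables <- data.frame(
  id        = c("TAB1", "TAB2"),
  title     = c("Taxable income", "Disposable income"),
  time_unit = c("Annual", "Annual"),  # constant, so it is dropped
  stringsAsFactors = FALSE
)

names(minimize_cols(tables))  # "id" "title"
extract_ids(tables)           # "TAB1" "TAB2"
```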
Exploring variables
Each table has its own set of variables (dimensions). The key discovery step is get_variables():

```r
vars <- get_variables(scb, "TAB638")
vars |> variable_describe()
```

Important variable properties:

- elimination: can this variable be left out of your get_data() call? If TRUE, omitting it means the API returns a pre-computed total (e.g., omitting “Sex” gives the total for all sexes). If FALSE, the variable is mandatory and you must include it.
- time: is this the time dimension?
- values: the available codes and their human-readable labels.
- codelists: alternative groupings (e.g. municipalities → counties).
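Elimination can be pictured as collapsing the omitted dimension into a total. The sketch below sums made-up numbers with base R's aggregate() purely to show what "the total for all sexes" means; with PX-Web the total is pre-computed by the API, and for non-additive measures (averages, rates) a plain sum would not be the right total.

```r
# Made-up population counts by region and sex (not real statistics).
pop <- data.frame(
  Region = c("0180", "0180", "1480", "1480"),
  Kon    = c("1", "2", "1", "2"),
  value  = c(100, 110, 200, 190),
  stringsAsFactors = FALSE
)

# "Eliminating" Kon corresponds to one total per region over all sexes.
by_region <- aggregate(value ~ Region, data = pop, FUN = sum)
by_region  # Region 0180 -> 210, Region 1480 -> 390
```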
```r
# See what values a variable has
vars |> variable_values("Kon")

# Look up variable codes by name
variable_name_to_code(vars, "sex")
```

Fetching data
Direct approach
If you know exactly what you want:
```r
pop <- get_data(scb, "TAB638",
  Region = c("0180", "1480"),
  Kon = c("1", "2"),
  ContentsCode = "*",
  Tid = px_top(5)
)
```

Variables you omit are eliminated (aggregated) if the API allows it. If a variable is mandatory, you must include it.
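The rule that mandatory variables must be included can be checked before a request is sent. A minimal sketch, assuming variable metadata with code and elimination columns; the elimination flags below are hypothetical, not SCB's actual metadata for TAB638, and missing_mandatory() is an illustrative helper rather than a pixieweb function.

```r
# Hypothetical variable metadata: which variables may be eliminated?
vars_meta <- data.frame(
  code        = c("Region", "Kon", "ContentsCode", "Tid"),
  elimination = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mandatory variables absent from the selection.
missing_mandatory <- function(meta, selections) {
  mandatory <- meta$code[!meta$elimination]
  setdiff(mandatory, names(selections))
}

sel <- list(Region = c("0180", "1480"), Tid = "2023")
missing_mandatory(vars_meta, sel)  # "ContentsCode"
```

Region and Kon may be omitted here because they are (hypothetically) eliminable; ContentsCode may not.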
Selection helpers
| Helper | Meaning | Note |
|---|---|---|
| c("0180") | Specific values | Item selection |
| "*" | All values | Wildcard |
| px_top(5) | First N values | Most recent |
| px_bottom(3) | Last N values (v2 only) | |
| px_from("2020") | From value onward (v2 only) | |
| px_to("2023") | Up to value (v2 only) | |
| px_range(a, b) | Inclusive range (v2 only) | |
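The position-based selections can be illustrated by applying them locally to a vector of value codes. The functions below are hypothetical local equivalents of px_from(), px_to() and px_range(), written only to pin down the "inclusive" semantics; in pixieweb these selections are sent to the API rather than evaluated on the client.

```r
# Value codes of a time variable, in the order listed (made-up example).
values <- c("2019", "2020", "2021", "2022", "2023", "2024")

# Position of a value code, failing loudly if it is not present.
pos <- function(x, v) {
  i <- match(v, x)
  if (is.na(i)) stop("value not found: ", v)
  i
}

# Hypothetical local equivalents; all bounds are inclusive.
sel_from  <- function(x, from) x[pos(x, from):length(x)]
sel_to    <- function(x, to) x[seq_len(pos(x, to))]
sel_range <- function(x, a, b) x[pos(x, a):pos(x, b)]

sel_from(values, "2022")           # "2022" "2023" "2024"
sel_to(values, "2020")             # "2019" "2020"
sel_range(values, "2020", "2022")  # "2020" "2021" "2022"
```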
The prepare_query() shortcut
For interactive exploration, prepare_query() inspects the table metadata and builds a query with sensible defaults:

```r
q <- prepare_query(scb, "TAB638")
```

Default strategy:

- ContentsCode: all values ("*")
- Time variable: latest 10 periods (px_top(10))
- Eliminable variables: omitted (the API aggregates them)
- Small mandatory variables (≤ 22 values): all ("*")
- Large mandatory variables: first value (px_top(1))
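The default strategy reads naturally as a chain of rules applied to each variable in turn. This is a hypothetical restatement in base R, not pixieweb's code: the metadata columns (code, time, elimination, n_values) and the example variables are assumptions, and selections are shown as descriptive strings rather than real px_*() objects.

```r
# Hypothetical variable metadata for a table (not real SCB metadata).
meta <- data.frame(
  code        = c("ContentsCode", "Tid", "Region", "Kon", "Alder"),
  time        = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  elimination = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  n_values    = c(2, 50, 312, 2, 101),
  stringsAsFactors = FALSE
)

# Apply the default rules in order; NULL means "omit this variable".
default_selection <- function(meta, small = 22) {
  out <- lapply(seq_len(nrow(meta)), function(i) {
    v <- meta[i, ]
    if (v$code == "ContentsCode") {
      "*"                # all contents
    } else if (v$time) {
      "top(10)"          # latest 10 periods
    } else if (v$elimination) {
      NULL               # let the API aggregate
    } else if (v$n_values <= small) {
      "*"                # small mandatory: everything
    } else {
      "top(1)"           # large mandatory: a single value
    }
  })
  names(out) <- meta$code
  out
}

str(default_selection(meta))
```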
Override specific variables while letting defaults handle the rest:
```r
q <- prepare_query(scb, "TAB638",
  Region = c("0180", "1480"),
  maximize_selection = TRUE
)
```

With maximize_selection = TRUE, the function expands unspecified variables to include as many values as possible while staying under the API’s cell limit.
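The cell count of a query is the product of the number of selected values per variable, which is what a cell limit constrains. A rough sketch of one way to maximize under such a limit: grow each unspecified variable one value at a time, round-robin, while the count stays within the limit. The greedy order, the helper names and the limit of 100 cells are assumptions for illustration; real limits vary by PX-Web instance and pixieweb's actual algorithm may differ.

```r
# Number of cells a query returns: product of selected value counts.
count_cells <- function(n_selected) prod(n_selected)

# Hypothetical greedy expansion. `available`: values per variable;
# `selected`: current counts; `fixed`: variables the user specified.
maximize <- function(available, selected, fixed, limit) {
  free <- setdiff(names(available), fixed)
  repeat {
    grown <- FALSE
    for (v in free) {
      if (selected[[v]] < available[[v]]) {
        trial <- selected
        trial[[v]] <- trial[[v]] + 1
        if (count_cells(trial) <= limit) {
          selected <- trial
          grown <- TRUE
        }
      }
    }
    if (!grown) break  # no variable can grow without exceeding the limit
  }
  selected
}

available <- c(Region = 290, ContentsCode = 2, Tid = 30)
selected  <- c(Region = 2,   ContentsCode = 1, Tid = 1)
res <- maximize(available, selected, fixed = "Region", limit = 100)
res               # Region 2, ContentsCode 2, Tid 25
count_cells(res)  # 100
```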
Then fetch:
```r
pop <- get_data(scb, query = q)
```

Codelists
Codelists provide alternative groupings of variable values. They are useful when you want data at a different aggregation level than the table’s default. For example, a “Region” variable with 290 municipalities might have a codelist that groups them into 21 counties:
```r
cls <- get_codelists(scb, "TAB638", "Region")
cls |> codelist_describe(max_n = 5)

# Use a codelist in a query
get_data(scb, "TAB638",
  Region = "*",
  Tid = px_top(5),
  ContentsCode = "*",
  .codelist = list(Region = "vs_RegionLän07")
)
```

Wide output and multiple contents
When a table has several contents (values of ContentsCode, e.g. both Population and Deaths), use .output = "wide" to pivot them into separate columns. This is useful when you want to compute with multiple measures, such as a death rate of Deaths / Population.
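The reshaping behind wide output can be sketched with base R's reshape(). The long data below is made up, and the column names (Region, Tid, contents, value) are assumptions about a long layout, not pixieweb's documented output.

```r
# Made-up long data: one row per region, period and content.
long <- data.frame(
  Region   = c("0180", "0180", "1480", "1480"),
  Tid      = c("2023", "2023", "2023", "2023"),
  contents = c("Population", "Deaths", "Population", "Deaths"),
  value    = c(1000, 8, 2000, 20),
  stringsAsFactors = FALSE
)

# Pivot each content into its own column: value.Population, value.Deaths.
wide <- reshape(long, idvar = c("Region", "Tid"), timevar = "contents",
                v.names = "value", direction = "wide")

# With one column per measure, a rate is plain column arithmetic.
wide$death_rate <- wide$value.Deaths / wide$value.Population
wide$death_rate  # 0.008 0.010
```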
Advanced: query composition
For full control over the HTTP request — useful for debugging or when you need to inspect/modify the exact query before sending it — use the low-level query composers:
```r
q <- compose_data_query(scb, "TAB638",
  Region = c("0180"),
  ContentsCode = "*",
  Tid = px_top(3)
)

# Inspect the query
q$url
q$body

# Execute (optionally after modifying q$url or q$body)
raw <- execute_query(scb, q$url, q$body)
```

Saved queries (v2 only)
PX-Web v2 supports server-side stored queries. Useful for recurring reports — save a query once, then retrieve it by ID later:
```r
# Save a query
id <- save_query(scb, "TAB638", Region = "0180", Tid = px_top(5), ContentsCode = "*")

# Retrieve later
get_saved_query(scb, id)
```