pixieweb makes it easy to download open statistical data from PX-Web APIs — the platform used by Statistics Sweden (SCB), Statistics Norway (SSB), Statistics Finland, and many others. This vignette walks you from zero to a tidy tibble in five steps.
Step 1: Connect to an API
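px_api() accepts a short alias ("scb", "ssb", "statfi") or a full URL and returns an API object that every later call takes as its first argument. Use px_api_catalogue() to list known instances.
library(pixieweb)
scb <- px_api("scb")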
Step 2: Find a table
PX-Web organises data into tables. Each table holds a data cube with one or more dimensions (called variables). Use get_tables() to search:
tables <- get_tables(scb, query = "population")
tables
The result is a tibble. You can narrow it further on the client side with table_search(), and inspect tables with table_describe():
tables |>
table_search("municipal") |>
table_describe(max_n = 3, format = "md")
table_describe() shows the subject path, time-period range, and data source alongside each title, which makes it much easier to pick the right table.
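Because every table is a data cube, you can reason about the size of a request before sending it. The number of cells is the product of the number of values selected on each variable. In long (tidy) form, the cube has one row per combination of selected values. The base-R sketch below uses a hypothetical selection. The region codes come from this vignette, but the Kon codes are placeholders, not real SCB codes. This product is presumably what the API's cell limit (mentioned in the prepare_query() section) is checked against. It also explains why each extra variable makes a request grow so quickly.

```r
# Hypothetical selection on a population-style table.
# Region codes are from this vignette; the Kon codes are placeholders.
selection <- list(
  Region = c("0180", "1480"),
  Kon    = c("men", "women"),
  Tid    = as.character(2019:2023)
)

# Cells requested = product of the number of selected values per variable
n_cells <- prod(lengths(selection))
n_cells  # 2 * 2 * 5 = 20

# Long (tidy) form of the cube: one row per combination of values
cube_long <- expand.grid(selection, stringsAsFactors = FALSE)
nrow(cube_long) == n_cells

# Leaving a variable out removes its factor from the product
prod(lengths(selection[c("Region", "Tid")]))  # 10
```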
Step 3: Explore variables
Once you have a table ID, inspect what variables (dimensions) it has:
vars <- get_variables(scb, "TAB638")
vars |> variable_describe()
Each variable has a set of available values (codes). Look at a specific variable’s values:
vars |> variable_values("Region")
Step 4: Fetch data
Now you know which variables the table has and which values are available. Pass your selections to get_data():
- ContentsCode tells the API what to measure (population, deaths, etc.). "*" means “all measures in this table”.
- Variables you omit are eliminated: the API returns a pre-computed aggregate (e.g., omitting Kon gives totals for both sexes). Not all variables allow this; see vignette("introduction-to-pixieweb") for which variables are mandatory and which can be eliminated.
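Elimination is easiest to picture on a small local table. For an additive count such as population, the aggregate the API returns when you omit Kon should match summing over sex yourself. The data frame below is made up for illustration, and its Kon codes are placeholders. The real aggregation happens on the server; this only shows the idea.

```r
# Made-up counts for two regions split by sex (NOT real statistics)
by_sex <- data.frame(
  Region = c("0180", "0180", "1480", "1480"),
  Kon    = c("men", "women", "men", "women"),  # placeholder codes
  value  = c(100, 110, 50, 55)
)

# Eliminating Kon = collapsing over it, leaving one total per region
totals <- aggregate(value ~ Region, data = by_sex, FUN = sum)
totals  # Region 0180 -> 210, Region 1480 -> 105
```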
pop <- get_data(scb, "TAB638",
Region = c("0180", "1480"),
ContentsCode = "*",
Tid = px_top(5)
)
pop
Selection helpers like px_top(), px_from(), and px_range() let you select values without knowing exact codes; above, Tid = px_top(5) asks for the five most recent periods. Use them when you want “the latest N periods” or “everything from 2020 onward” rather than typing out specific year codes.
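To make the difference between these selections concrete, here is a base-R sketch of the three behaviours on a local vector of time codes. The helpers latest_n(), from_code() and range_codes() are hypothetical. They only mimic the semantics described above and are not how px_top(), px_from() or px_range() are implemented. The sketch assumes that time codes sort chronologically as strings (true for codes like "2020" or "2020M01") and that range bounds are inclusive.

```r
# Hypothetical helpers mimicking the selection semantics described above
time_codes <- c("2018", "2019", "2020", "2021", "2022", "2023")

# "The latest N periods": the last n codes in sorted order
latest_n <- function(codes, n) tail(sort(codes), n)

# "Everything from a given period onward" (inclusive)
from_code <- function(codes, start) codes[codes >= start]

# "Everything between two periods" (assumed inclusive at both ends)
range_codes <- function(codes, from, to) codes[codes >= from & codes <= to]

latest_n(time_codes, 3)                  # "2021" "2022" "2023"
from_code(time_codes, "2020")            # "2020" "2021" "2022" "2023"
range_codes(time_codes, "2019", "2021")  # "2019" "2020" "2021"
```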
Optional shortcut: prepare_query()
You can skip this section if you prefer the direct approach above.
prepare_query() inspects the table and fills in sensible defaults, which is handy when you don’t want to specify every variable:
q <- prepare_query(scb, "TAB638", Region = c("0180", "1480"))
It prints a summary of what was chosen and why. When you’re happy, pass the query to get_data():
pop <- get_data(scb, query = q)
Set maximize_selection = TRUE to automatically include as many variables as the API’s cell limit allows:
q <- prepare_query(scb, "TAB638",
Region = c("0180"),
maximize_selection = TRUE
)
Step 5: Work with the result
The result is a standard tibble. Use your favourite tidyverse tools:
library(ggplot2)
pop |>
ggplot(aes(x = Tid, y = value, colour = Region_text)) +
# One line per region
geom_line(aes(group = Region_text)) +
# Separate panel for each measure (Population, Deaths, etc.)
facet_wrap(~ ContentsCode_text, scales = "free_y") +
# Rotate x-axis labels to avoid overlap
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(
title = "Population over time",
caption = px_cite(pop) # Auto-generated data citation
)
Notice the _text suffix: get_data() returns both raw code columns (Region = "0180") and human-readable label columns (Region_text = "Stockholm"). Use _text columns for display and plotting; use the raw codes for filtering and joining.
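The code-versus-label split can be shown on a small data frame that mimics a subset of the columns described above (Region, Region_text, Tid, value), with invented values. Filtering and joining use the code column, while the label column is kept for display. The lookup table and its county_code column are hypothetical. The codes are real, though: the first two digits of a Swedish municipality code are its county code.

```r
# Made-up values in the code + _text shape described above
pop_demo <- data.frame(
  Region      = c("0180", "0180", "1480", "1480"),
  Region_text = c("Stockholm", "Stockholm", "Göteborg", "Göteborg"),
  Tid         = c("2022", "2023", "2022", "2023"),
  value       = c(1, 2, 3, 4)
)

# Filter on the stable code, not on the human-readable label
stockholm <- pop_demo[pop_demo$Region == "0180", ]
unique(stockholm$Region_text)  # "Stockholm"

# Join on the code column as well (hypothetical lookup table)
lookup <- data.frame(Region = c("0180", "1480"), county_code = c("01", "14"))
joined <- merge(pop_demo, lookup, by = "Region")
joined[, c("Region_text", "Tid", "county_code")]
```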
Other useful helpers:
- data_minimize(): remove columns where all values are identical.
- data_legend(): generate a caption string from variable metadata.
- px_cite(): create a citation for the downloaded data.
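The idea behind data_minimize() can be sketched in a few lines of base R. drop_constant_columns() below is a hypothetical stand-in written for this illustration, not pixieweb's implementation, and the data frame is made up. Columns holding a single repeated value (here the one region and one measure) add nothing to a plot or table, so the sketch drops them.

```r
# Hypothetical stand-in for the idea behind data_minimize()
drop_constant_columns <- function(df) {
  # TRUE for columns with more than one distinct value (NA counts as a value)
  varies <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
  df[, varies, drop = FALSE]
}

# Made-up result: one region, one measure, three years
demo <- data.frame(
  Region       = c("0180", "0180", "0180"),
  ContentsCode = c("pop", "pop", "pop"),  # placeholder measure code
  Tid          = c("2021", "2022", "2023"),
  value        = c(1, 2, 3)
)

names(drop_constant_columns(demo))  # "Tid" "value"
```

Note that under this simple rule, a one-row data frame would lose every column.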
Next steps
- Concepts & advanced features: vignette("introduction-to-pixieweb") covers the data model, codelists, saved queries, and query composition.
- Multiple countries: vignette("multi-api") shows how to compare data across national statistics agencies.
- ggplot2 reference: https://ggplot2-book.org/ for more on visualisation.