Skip to contents

One of pixieweb’s strengths is its ability to connect to any PX-Web instance with the same interface. This vignette shows how to compare data across national statistics agencies.

The honest truth about cross-country comparison: The pixieweb functions work identically across APIs, but the data is not harmonised. Table IDs, variable names, and code systems differ between countries. The workflow is: find a comparable table in each country (the hard part), then use identical pixieweb code to fetch and combine the results.

Prerequisite: This vignette assumes you are comfortable with the basics from vignette("a-quickstart").

Available APIs

pixieweb ships with a catalogue of known PX-Web instances. You can also connect to any PX-Web API by providing a full URL.

Connecting to multiple agencies

scb <- px_api("scb", lang = "en")      # Sweden (v2)
ssb <- px_api("ssb", lang = "en")      # Norway (v2)
statfi <- px_api("statfi", lang = "en") # Finland (v1)

Each API object stores the base URL, language, API version, and configuration (cell limits, rate limits):

scb
ssb

API version differences

PX-Web has two API versions: - v1: Legacy, POST-only data queries, no search endpoint. Table discovery requires walking a folder hierarchy. - v2: Modern, GET+POST data queries, full-text search, codelists endpoint, saved queries.

pixieweb handles both versions transparently. The user-facing functions have the same signatures — only the internal request building differs.

Some selection helpers are v2-only: px_bottom(), px_from(), px_to(), and px_range() will raise an informative error if used against a v1 API.

Cross-country comparison example

Suppose you want to compare population data across Sweden and Norway. The table IDs and variable codes will differ, but the workflow is identical:

library(dplyr)
library(purrr)

# Find population tables in each country
scb_tables <- get_tables(scb, query = "population")
ssb_tables <- get_tables(ssb, query = "population")

# Explore a table from each
scb_tables |> table_describe(max_n = 3)
ssb_tables |> table_describe(max_n = 3)

Note that table IDs are completely different between countries, and variable names may also differ (“Region” in SCB vs other names elsewhere). Always run variable_describe() on each table before building your query:

# Fetch data using prepare_query() for quick exploration
scb_q <- prepare_query(scb, "TAB638",
  Region = "00",       # "Riket" (whole country)
  Tid = px_top(5),
  ContentsCode = "BE0101N1" # Population
)

# Norwegian table IDs are different — explore to find the right one
ssb_vars <- get_variables(ssb, "05803")
ssb_vars |> variable_describe()

Combining results

Since get_data() returns standard tibbles with a table_id column, you can bind results from different APIs:

results <- list(
  sweden = get_data(scb, query = scb_q),
  norway = get_data(ssb, "05803",
    ContentsCode = "Personer",
    Tid = px_top(5)
  )
)

# .id = "country" adds a column tracking which list element each row
# came from — essential for traceability after binding
bind_rows(results, .id = "country")
# NOTE: column names may differ between countries. If so, you may need
# to rename() before bind_rows() to align them.

Tips for cross-agency work

  • Language matters. Codes are often language-dependent. lang = "en" gives the most consistent labels across countries, but codes and table IDs are language-independent.
  • Table structure varies. Swedish tables may have “Region” while Finnish tables have “Alue”. Run get_variables() |> variable_describe() on each table before writing queries.
  • API limits differ. SCB allows ~100 000 cells per request; other agencies may allow less. Use api$config$max_cells to check. prepare_query() respects the limit automatically.
  • v1 vs v2. Not all agencies have migrated to v2. Selection helpers px_from(), px_range() etc. raise an informative error if used against a v1 API. Check api$version and the catalogue’s versions column.

Next steps