Pakistan Taxpayer Data

Between 2013 and 2018, the Pakistan Federal Board of Revenue (FBR) published directories of all taxpayers. Naturally, these were shared as PDFs that look something like:

2018 taxpayer directory: a 35k-page PDF (Preview takes a few minutes to open and close it)

To save anyone who wants to work with this data in the future the trouble, I’ve shared the extracted data as compressed parquet files in the GitHub repo.

Some cursed knowledge I have learnt:

  • NTNs sometimes have an 8th digit (in the 2013-2014 data), but it is just a check digit. The all.parquet file contains a column, ntn7, to help with grouping
    • Hyderabad Development Authority’s 8th digit was 0 in 2015 but 1 in 2016
  • shell scripts are very fast
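
To make the ntn7 idea concrete, here is a minimal sketch of grouping records by the first seven digits of an NTN. This is not the extraction code from the repo: it assumes NTNs are stored as text, that the check digit is always the trailing 8th digit, and that any hyphen or space can be ignored. The function name, the records, and the NTN values are all made up for illustration.

```python
import re
from collections import defaultdict


def ntn7(ntn: str) -> str:
    """Return the first seven digits of an NTN, dropping any 8th check digit."""
    digits = re.sub(r"\D", "", ntn)  # assumption: separators like "-" carry no meaning
    return digits[:7]


# Hypothetical records as (year, ntn, name); none of these are real NTNs.
records = [
    (2015, "12345670", "Example Authority"),
    (2016, "12345671", "Example Authority"),  # same taxpayer, different 8th digit
    (2017, "1234567", "Example Authority"),   # no 8th digit at all
    (2016, "7654321-9", "Another Taxpayer"),
]

# Group the years in which each taxpayer appears, keyed by the 7-digit NTN.
by_ntn7 = defaultdict(list)
for year, ntn, name in records:
    by_ntn7[ntn7(ntn)].append(year)

for key, years in sorted(by_ntn7.items()):
    print(key, years)
```

Grouping on the full NTN would split the first taxpayer into three separate entities, which is the problem the ntn7 column avoids.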

Explore the top 1000 taxpayers across all years and within each category at the GitHub pages link.

Query the data (or look up an NTN/CNIC) directly in your browser with DuckDB-WASM at this link.