How We Collect and Analyse UK Business Data
Yolist research draws on public open data, regulatory registers, and data collected on our own platform. This page explains our sources, cleaning process, and the limitations of our findings.
Data sources
Companies House
4.5M+ active companies · Updated: Daily delta feed
The primary source for business registration, SIC codes, incorporation dates, director information, and legal entity type. We ingest the daily delta feed to keep our records current.
License: Open Government Licence v3.0 · Official source
Food Standards Agency (FSA)
500k+ hygiene ratings · Updated: Weekly
Used for food-business hygiene scores for restaurants, cafés, takeaways, and food retailers. We display FSA ratings on eligible profiles and use them in our hospitality benchmarks.
License: Open Government Licence v3.0 · Official source
ONS Postcode Data (ONSPD)
2.6M+ postcodes · Updated: Quarterly
Provides geographic coordinates, parliamentary constituency, local authority, and Census Output Area codes for every UK postcode. Used for all map and density calculations.
License: Open Government Licence v3.0 · Official source
Yolist Platform Data
User reviews, photos & claimed profiles · Updated: Real-time
Verified reviews, business photos, claimed-profile data, inquiry logs, and user behaviour signals collected directly on our platform. All personal data is anonymised before research use.
License: Proprietary — Yolist Research use only · Official source
Quality standards
Deduplication
Companies House and FSA records are merged using a deterministic matching algorithm keyed on registered address postcode, business name (Levenshtein ≤ 2), and SIC code. Duplicate entries with dissolved status are removed.
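As a rough illustration of the matching rule described above, the sketch below treats two records as the same business when their postcodes and SIC codes agree exactly and their names are within an edit distance of 2. The record field names (`name`, `postcode`, `sic_code`), the case folding, and the postcode normalisation are assumptions made for this example; it also assumes both records already carry a SIC code. It is not our production matcher.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalise_postcode(postcode: str) -> str:
    """Assumed normalisation: strip all whitespace and upper-case."""
    return "".join(postcode.split()).upper()


def is_same_business(ch_record: dict, fsa_record: dict, max_distance: int = 2) -> bool:
    """Exact match on postcode and SIC code, fuzzy match on name.

    The dict keys used here are hypothetical, chosen for illustration.
    """
    return (
        normalise_postcode(ch_record["postcode"]) == normalise_postcode(fsa_record["postcode"])
        and ch_record["sic_code"] == fsa_record["sic_code"]
        and levenshtein(ch_record["name"].casefold(), fsa_record["name"].casefold()) <= max_distance
    )
```

Because every step is an exact comparison or a fixed threshold, the same pair of records always produces the same decision, which is what makes the rule deterministic.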
Geocoding
Every business address is geocoded against the ONS Postcode Directory. Addresses that do not resolve to a current postcode are flagged for manual review before inclusion in geographic analyses.
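The geocoding step above can be pictured as a lookup that either attaches coordinates or routes the record to a review queue. In this sketch the postcode directory is a plain dictionary, and the field names (`lat`, `long`, `terminated`) are hypothetical stand-ins rather than the actual ONSPD column names.

```python
def normalise_postcode(postcode: str) -> str:
    """Assumed normalisation: strip all whitespace and upper-case."""
    return "".join(postcode.split()).upper()


def geocode_businesses(businesses: list, postcode_directory: dict) -> tuple:
    """Split businesses into geocoded records and records needing manual review.

    `postcode_directory` maps a normalised postcode to a dict with the
    hypothetical keys "lat", "long" and "terminated".
    """
    resolved, needs_review = [], []
    for business in businesses:
        entry = postcode_directory.get(normalise_postcode(business["postcode"]))
        if entry is None or entry["terminated"]:
            # Unknown or no-longer-current postcode: hold back for manual review.
            needs_review.append(business)
        else:
            resolved.append({**business, "lat": entry["lat"], "long": entry["long"]})
    return resolved, needs_review
```

Only the `resolved` list would feed map and density calculations; the `needs_review` list stays out of geographic analyses until someone has checked it.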
Category normalisation
SIC-2007 codes are mapped to our internal category taxonomy (850+ categories). Where SIC codes are too broad (e.g. 47190 "other retail"), we apply an NLP classifier trained on business names to assign granular categories.
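The two-stage approach above (direct mapping first, classifier for broad codes) might look like the following sketch. The mapping entries and category labels are invented for illustration, and `keyword_classifier` is a deliberately trivial keyword stand-in, not the trained NLP classifier. Sending unmapped codes to the classifier is also an assumption made for this example.

```python
from typing import Callable

# Hypothetical extract of a SIC-to-category mapping; labels are illustrative.
SIC_TO_CATEGORY = {
    "56101": "Restaurants",
    "96020": "Hairdressing and beauty",
}

# SIC codes treated as too broad to map directly.
BROAD_SIC_CODES = {"47190"}


def keyword_classifier(business_name: str) -> str:
    """Trivial keyword stand-in for a trained name classifier."""
    lowered = business_name.casefold()
    if "vape" in lowered:
        return "Vape shops"
    if "garden" in lowered:
        return "Garden centres"
    return "General retail"


def assign_category(sic_code: str, business_name: str,
                    classify: Callable[[str], str] = keyword_classifier) -> str:
    """Use the direct mapping where possible, else fall back to the classifier."""
    if sic_code in BROAD_SIC_CODES or sic_code not in SIC_TO_CATEGORY:
        return classify(business_name)
    return SIC_TO_CATEGORY[sic_code]
```

Passing the classifier in as a parameter keeps the mapping logic testable on its own, whatever model sits behind it.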
Survey weighting
Yolist consumer and business surveys are weighted to match ONS population estimates by age, gender, region, and business size band to reduce sample bias.
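To show the idea behind the weighting, here is a minimal post-stratification sketch on a single variable: each group's weight is its population share divided by its sample share. This is a simplification, since the text above describes weighting on several variables at once and does not say whether that is done by cell weighting or by raking. The function name and the example shares are assumptions.

```python
from collections import Counter


def post_stratification_weights(sample_groups: list, population_shares: dict) -> dict:
    """Return a weight per group: population share / sample share.

    `sample_groups` holds one group label per respondent;
    `population_shares` maps each label to its share of the target population.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: population_shares[group] / (count / total)
        for group, count in counts.items()
    }
```

With three respondents from one region and one from another, against a population split evenly between the two, the over-sampled region receives a weight below 1 and the under-sampled region a weight above 1, and the weights sum back to the original sample size.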
Outlier removal
Rate cards and pricing data are trimmed at the 2nd and 98th percentile before median calculation to remove data-entry errors and premium outliers.
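A minimal sketch of percentile trimming before a median, using Python's standard `statistics` module. The choice of the "inclusive" interpolation method and the decision to keep values that sit exactly on a cut point are assumptions; the text above does not specify either.

```python
import statistics


def trim_percentiles(values: list, lower: int = 2, upper: int = 98) -> list:
    """Keep only values between the lower and upper percentile cut points."""
    # quantiles(n=100) returns 99 cut points; index k - 1 is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower - 1], cuts[upper - 1]
    return [v for v in values if low <= v <= high]


def trimmed_median(values: list, lower: int = 2, upper: int = 98) -> float:
    """Median of the values that survive percentile trimming."""
    return statistics.median(trim_percentiles(values, lower, upper))
```

A median is already fairly robust to extreme values, so trimming mainly matters for the other statistics computed on the same cleaned set, and for small samples where a handful of bad entries can shift the middle of the distribution.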
Known limitations
- Our research primarily covers England. Scotland, Wales, and Northern Ireland may have different regulatory environments; we note these differences where they apply.
- Sole traders and partnerships that have not filed with Companies House are under-represented in registration counts. HMRC Self Assessment tables are used to estimate these populations.
- Consumer survey data reflects the views of Yolist users, who skew toward people who actively research businesses online. Non-digital consumers are under-represented.
- Pricing data is derived from publicly disclosed rate cards and may not represent the full market, particularly for bespoke or project-based work.
- FSA hygiene ratings reflect inspections at a point in time and may not reflect current hygiene standards.
- Historical comparisons pre-2019 rely on ONS and Companies House published statistics rather than our own data and should be treated as indicative.
Press enquiries
Journalists and researchers are welcome to use our data with attribution. For bespoke data cuts, embargo requests, or interview requests, please contact our research team.
press@yolist.uk