How the data is sourced, cleaned, and scored
No black box. SkyMind unifies fragmented government statistics for 845 regions and builds one transparent, re-weightable composite index. Every source is cited and every component is inspectable. If something doesn't add up, email us; we welcome the scrutiny.
1. What we measure, and what we don't
SkyMind takes government statistics that are normally scattered across dozens of national portals, in several languages, with inconsistent schemas and sometimes broken APIs, and unifies them into one clean, comparable dataset for 845 regions across five countries. On top of that data, we compute a transparent composite index that summarises each region's economic, demographic and social profile into comparable scores.
This is a descriptive product. It shows the measured state of a region from official data. It is not a forecast, not a probability of any event, and not a rating of the future. A score that moves over time reflects a change in the underlying published statistics, nothing more and nothing less.
2. The data & sources
Every figure traces back to an official, public source. We ingest no personal data: Zero Personal Data Architecture, GDPR public-interest basis (Article 6).
| Country | Regions | Coverage | Sources |
|---|---|---|---|
| 🇩🇪 Germany | 401 Kreise | 2015–2026 | Eurostat, Destatis, INKAR/BBSR |
| 🇮🇱 Israel | 255 municipalities | 2002–2018 core | data.gov.il, CBS, audited financial reports |
| 🇦🇪 UAE | 109 districts | 2015–2026 | Dubai Land Department, Bayanat.ae, Bayut, World Bank |
| 🇸🇦 Saudi Arabia | 51 governorates | 2015–2025 | GASTAT, KAPSARC, RCRC, World Bank |
| 🇶🇦 Qatar | 29 municipalities | 2015–2025 | data.gov.qa, World Bank |
Total: ~1,100 metrics, ~2 million observations. Coverage and depth are not uniform, and we say so explicitly. Israel, for example, has a deep, fully-populated core for 2002–2018; later years are partial because the underlying government datasets thin out. Where the source data is thin, the data is thin; we do not paper over it.
3. How the composite index is built
The index is a standard, transparent composite, the same family of method as the UN Human Development Index or city liveability rankings. Three steps:
- Normalise each metric to a 0–100 scale (min-max across all regions and years for that metric).
- Group metrics into three axes and average the normalised metrics within each: Economic, Demographic, Social / Infrastructure.
- Combine into a composite:
40% Economic + 30% Demographic + 30% Social/Infrastructure.
Which metric belongs to which axis, and the 40/30/30 weighting, are deliberate choices, not laws of nature. Change the weights and the ranking changes. That is why the index is re-weightable: the axis scores and every underlying metric are exposed in the API, so you can apply your own weights and judge the result yourself.
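The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SkyMind's production code: the function names, the axis keys (`economic`, `demographic`, `social`) and all numbers are hypothetical, and the sketch ignores details the text does not specify, such as how metrics where a lower value is better are handled.

```python
from statistics import mean

# Default weights as described in the text (axis keys are hypothetical names).
DEFAULT_WEIGHTS = {"economic": 0.40, "demographic": 0.30, "social": 0.30}

def min_max_normalise(value, lo, hi):
    """Scale a raw value to 0-100, where lo and hi are the metric's
    minimum and maximum across all regions and years."""
    if hi == lo:
        raise ValueError("metric has no spread; min-max scaling is undefined")
    return 100.0 * (value - lo) / (hi - lo)

def axis_score(normalised_values):
    """Average the normalised metrics assigned to one axis."""
    return mean(normalised_values)

def composite(axis_scores, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the axis scores."""
    return sum(weights[axis] * axis_scores[axis] for axis in weights)

# Illustrative numbers only: two made-up economic metrics for one region-year.
economic = axis_score([
    min_max_normalise(30_000, 10_000, 60_000),  # 20,000 / 50,000 of the range
    min_max_normalise(5.0, 2.0, 12.0),          # 3 / 10 of the range
])
score = composite({"economic": economic, "demographic": 50.0, "social": 60.0})
print(round(economic, 1), round(score, 1))
```

Passing a different `weights` dictionary to `composite` is all that re-weighting amounts to in this sketch.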
4. Data integrity: no fabricated values
Scores are re-derived directly from the raw fact tables. Two rules make the dataset honest:
- No neutral defaults. If a region-year genuinely has no underlying data for an axis, that axis is left empty; we do not fill it with a placeholder "50".
- No carry-forward. We never copy a prior year's value into a year that has no real data. A region-year receives a composite score only when all three axes have genuine underlying data; otherwise the available axes are shown and the composite is left empty.
This means our coverage looks smaller than a "fill every cell" approach would, by design. An empty cell is more useful than a fabricated one.
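As a rough sketch of how the two rules play out, the snippet below returns a composite only when all three axes are present and otherwise leaves it empty. The data structure, the axis keys and the sample values are assumptions made for illustration; `None` stands in for an empty cell.

```python
from typing import Dict, Optional

AXES = ("economic", "demographic", "social")
WEIGHTS = {"economic": 0.40, "demographic": 0.30, "social": 0.30}

def composite_or_none(axis_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Return the weighted composite, or None if any axis has no data."""
    if any(axis_scores.get(axis) is None for axis in AXES):
        # No placeholder "50" and no value borrowed from another year.
        return None
    return sum(WEIGHTS[axis] * axis_scores[axis] for axis in AXES)

# Hypothetical region: 2017 has no social-axis data.
series = {
    2016: {"economic": 35.0, "demographic": 50.0, "social": 60.0},
    2017: {"economic": 36.0, "demographic": 51.0, "social": None},
    2018: {"economic": 38.0, "demographic": 52.0, "social": 62.0},
}

# Each year is scored independently, so 2017 stays empty rather than
# inheriting the 2016 value.
composites = {year: composite_or_none(axes) for year, axes in series.items()}
print(composites)
```

The available 2017 axis scores remain visible in `series`; only the composite for that year is withheld.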
5. Honest limitations
- It does not predict. No event timing, no crash calls, no election outcomes. It describes what the published statistics currently show.
- It does not explain causes. The index tells you a region scores low on the economic axis, not why.
- The weighting is a choice. Treat the default composite as one transparent lens, not a verdict. Re-weight it for your use case.
- Coverage is uneven. Compare within a country and period with confidence; be careful comparing absolute levels across countries with different source datasets.
- The data is periodic. Most sources are annual. A short, sharp shock between releases will not show up until the next data point.
- It does not replace domain expertise. A real-estate analyst or municipal economist reads these numbers in context; without that context a score is just a number.
6. FAQ
Isn't this just aggregating CSV files?
The aggregation is the hard part, and we don't pretend otherwise. Government data for these five countries is fragmented across dozens of portals, in multiple languages, with inconsistent identifiers and frequently broken APIs. Cleaning it, geocoding it, translating metric names, reconciling schemas and keeping it current is months of unglamorous work, and it is genuinely hard to reproduce. That labour, and the coverage breadth, is the product. We are not selling a model on top of it.
Can I reproduce a single region's score?
Yes. The API exposes every normalised metric, every axis score and the composite. Pick any region from /map/data, apply the normalisation and the 40/30/30 weights by hand, and you will match within rounding. No login, no API key (rate-limited).
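A by-hand check might look like the sketch below. The response format of /map/data is not described here, so the `record` dictionary and its field names (`axes`, `composite`) are purely hypothetical, as are the numbers and the rounding tolerance; substitute whatever fields the API actually returns.

```python
WEIGHTS = {"economic": 0.40, "demographic": 0.30, "social": 0.30}

def recompute(axes):
    """Apply the default 40/30/30 weights to a set of axis scores."""
    return sum(WEIGHTS[axis] * axes[axis] for axis in WEIGHTS)

def matches_published(record, tolerance=0.05):
    """True if the recomputed composite agrees with the published one
    within the given rounding tolerance (an assumed value)."""
    return abs(recompute(record["axes"]) - record["composite"]) <= tolerance

# Hypothetical record, typed in by hand for illustration.
record = {
    "region": "example-region",
    "year": 2018,
    "axes": {"economic": 35.2, "demographic": 50.1, "social": 60.0},
    "composite": 47.1,
}

print(matches_published(record))
```

If the check fails for a real region, that is exactly the kind of discrepancy worth reporting.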
Why these particular weights?
40/30/30 is a transparent default in line with how comparable composite indices balance economic and non-economic factors. It is not sacred. The API returns the axis scores separately precisely so you can re-weight, drop an axis, or build your own composite.
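Re-weighting or dropping an axis could be done along these lines. This is a sketch under assumptions: the `reweight` helper and the axis keys are hypothetical, weights are rescaled to sum to 1 so they can be given in any units, and an axis is dropped by giving it a weight of zero.

```python
def reweight(axis_scores, weights):
    """Composite under custom weights; zero-weight axes are dropped.

    Returns None if an axis that is actually used has no score,
    mirroring the "no fabricated values" rule."""
    used = {axis: w for axis, w in weights.items() if w > 0}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("at least one axis needs a positive weight")
    if any(axis_scores.get(axis) is None for axis in used):
        return None
    return sum(w / total * axis_scores[axis] for axis, w in used.items())

# Illustrative axis scores for one region-year.
scores = {"economic": 35.0, "demographic": 50.0, "social": 60.0}

equal = reweight(scores, {"economic": 1, "demographic": 1, "social": 1})
no_social = reweight(scores, {"economic": 0.5, "demographic": 0.5, "social": 0})
print(round(equal, 2), round(no_social, 2))
```

Rescaling the weights keeps the result on the same 0–100 scale as the default composite, which makes custom and default scores easier to compare.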
Do you predict prices, crises or migration?
No. We made a deliberate decision not to be in the prediction business. SkyMind shows what the official data currently says about a region. Any forecasting on top of that is the user's call, with their own assumptions.
If you find an error in the data, the sourcing or the index construction (including small ones), email us at info@sky-mind.com. We post methodological corrections publicly with credit to whoever found them.