UK grant (£351,190): An Atlas of Economic Activities in the UK: Using Web Archives for Social Science Research. UKRI, 1 June 2024. UK Research and Innovation, United Kingdom
An Atlas of Economic Activities in the UK: Using Web Archives for Social Science Research
| Summary | This project will use the web, one of the largest sources of digital footprints data, to create a granular and dynamic typology of economic activities in the UK. In doing so, we will exemplify the value of the web as an untapped source of digital footprints data and create tools for the broader social science community to utilise web data. Websites are archetypal digital footprints data: they are born-digital data positioned at the core of what we understand as the internet; they are geospatial, as 70% of all websites contain some place reference; they are commercial and transactional, since they capture information - often self-reported - about various entities, from individuals to firms and third sector organisations; and they are unstructured, containing textual and visual information, among other things. Despite the utility of web data for social science research, the use of such rich and big textual data is hindered by a lack of easy-to-access data and relevant tools. This project will develop the computational tools needed to utilise web data at scale from web archives, specifically the Common Crawl (CC), to answer social science research questions that traditional data sources do not allow us to answer (RQ1). It will address calls for the development of tools beyond the traditional social science toolkit to allow social scientists to access and analyse digital footprints web data. It will then create a dynamic and flexible typology of economic activities in the UK (RQ2). By analysing self-descriptions of economic activities on business websites, this project will produce typologies of economic activities that are rich in content and whose reach extends beyond small case studies. Moreover, we will map and model the spatial footprints and the dynamics of economic activities in the UK (RQ3). By geolocating and observing commercial websites over time, we will expose the dynamics of economic activities: from stable industrial clusters to emerging economic activities and their geographies. The project will also assess potential biases associated with archived web data (RQ4). Just like non-digital archives, web archives do not archive everything - be it all public websites (archival extent) or all webpages within a website (archival depth). The project will deliver a data product, a dynamic inventory of commercial websites, including their URLs, timestamps, associated geolocations and typologies of economic activities - the Atlas of Economic Activities in the UK. We will design our code so that it can incorporate past and future versions of the CC data. Due to UK legislation, we cannot openly provide the web data and text we will mine from the CC. Instead, the data product will include a workflow to the CC for other researchers to mine the content of archived websites of interest. We will collaborate with the Consumer Data Research Centre (CDRC) to produce an interactive visualisation (web map). The Atlas does not aim to replace SIC codes, but instead to complement them by providing a dynamic and flexible typology of economic activities. Researchers and policy makers interested in the distribution and evolution of economic activities will directly benefit, as they will obtain a detailed understanding of the (co)location of economic activities, even at the building level and over time. We will openly disseminate the code we develop in a small library and reproducible notebooks. We expect our tools to be used by other researchers (1) interested in business-related questions, who will directly use our code to mine commercial websites, e.g. for tracing R&D and innovation activities; and (2) who want to analyse other subsets of the web, such as UK governmental websites (.gov.uk) or other country code top-level domains (ccTLDs, e.g. .de), to answer substantive questions within their research domains. |
| Category | Research Grant |
| Reference | ES/Y01054X/1 |
| Status | Active |
| Start date | 1 June 2024 |
| End date | 28 February 2026 |
| Funding amount | £351,190.00 |
| Source | https://gtr.ukri.org/projects?ref=ES%2FY01054X%2F1 |
Participating organisations
| University of Bristol | |
| The Alan Turing Institute | |
| British Library | |
| Common Crawl Foundation | |
This announcement refers to a past point in time and does not necessarily reflect the current situation. The current status is shown on the following page: University of Bristol, Bristol, United Kingdom.
The visualisations for "University of Bristol - UK grant (£351,190): An Atlas of Economic Activities in the UK: Using Web Archives for Social Science Research" are made available by North Data for reuse under a Creative Commons licence.