Characterising Dataset Search – An Analysis of Search Logs and Data Requests

27 Pages. Posted: 19 Nov 2018. First Look: Accepted.

Emilia Kacprzak

University of Southampton

Laura Koesten

University of Southampton

Luis-Daniel Ibanez

University of Southampton

Tom Blount

University of Southampton

Jeni Tennison

Open Data Institute

Elena Simperl

University of Southampton - School of Electronics and Computer Science (ECS)

Abstract

Large amounts of data are becoming increasingly available online. To benefit from them, we need tools that retrieve the datasets most relevant to one's data needs. Several vocabularies have been developed to describe datasets and so increase their discoverability, but it is costly and cumbersome for data publishers to annotate datasets using all of them, which raises the question of which properties matter most. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data, and of how dataset search compares with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings we hypothesise that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource. In our study of data requests we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are more important in dataset retrieval than in general web search, and that dataset publishers should therefore focus their efforts on generating dataset descriptions that include them.

Keywords: Dataset Search, Vertical Search, Search Logs

Suggested Citation

Kacprzak, Emilia and Koesten, Laura and Ibanez, Luis-Daniel and Blount, Tom and Tennison, Jeni and Simperl, Elena, Characterising Dataset Search – An Analysis of Search Logs and Data Requests (November 19, 2018). Available at SSRN: https://ssrn.com/abstract=3287149 or http://dx.doi.org/10.2139/ssrn.3287149

Emilia Kacprzak (Contact Author)

University of Southampton

University Rd.
Southampton SO17 1BJ, Hampshire SO17 1LP
United Kingdom

Laura Koesten

University of Southampton

University Rd.
Southampton SO17 1BJ, Hampshire SO17 1LP
United Kingdom

Luis-Daniel Ibanez

University of Southampton

University Rd.
Southampton SO17 1BJ, Hampshire SO17 1LP
United Kingdom

Tom Blount

University of Southampton

University Rd.
Southampton SO17 1BJ, Hampshire SO17 1LP
United Kingdom

Jeni Tennison

Open Data Institute

65 Clifton St
Floor 3,
London, EC2A 4JE
United Kingdom

Elena Simperl

University of Southampton - School of Electronics and Computer Science (ECS)

University Road
Southampton
United Kingdom
