The Digital Layer: How Innovative Firms Relate on the Web

13 Pages. Posted: 4 Feb 2020. Last revised: 7 Feb 2020

Miriam Krüger

Technische Universität Berlin (TU Berlin)

Jan Kinne

Centre for European Economic Research (ZEW)

David Lenz

Justus Liebig University Giessen

Bernd Resch

University of Salzburg

Date Written: 2020


In this paper, we introduce the concept of a Digital Layer to empirically investigate inter-firm relations at any geographical scale of analysis. The Digital Layer is created by large-scale, structured web scraping of firm websites, capturing their textual content and the hyperlinks among them. Using text-based machine learning models, we show that this Digital Layer can be used to derive meaningful characteristics for the more than seven million firm-to-firm relations that we analyze in this case study of 500,000 firms based in Germany. Among other things, we explore three dimensions of relational proximity:

(1) Cognitive proximity is measured by the similarity between firms’ website texts.

(2) Organizational proximity is measured by classifying the nature of the firms’ relationships (business vs. non-business) using a text-based machine learning classification model.

(3) Geographical proximity is calculated using the exact geographic location of the firms.
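
As a rough illustration of dimensions (1) and (3), the following minimal sketch computes a text similarity and a geographic distance for a single firm pair. It rests on assumptions that are not stated in this abstract: cosine similarity over simple word-count vectors stands in for the similarity measure, the haversine great-circle formula stands in for the distance calculation, and the function names, example texts, and coordinates are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between word-count vectors of two texts.

    Assumption: whitespace tokenization and raw term counts; the paper's
    actual text representation may differ.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Dot product over the shared vocabulary only.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no defined direction.
    return dot / (norm_a * norm_b)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0  # Mean Earth radius.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical firm pair: website snippets and approximate city coordinates.
firm_a = {"text": "industrial sensors and automation software",
          "lat": 52.52, "lon": 13.405}   # roughly Berlin
firm_b = {"text": "automation software for industrial robots",
          "lat": 49.4875, "lon": 8.466}  # roughly Mannheim

cognitive = cosine_similarity(firm_a["text"], firm_b["text"])
distance = haversine_km(firm_a["lat"], firm_a["lon"],
                        firm_b["lat"], firm_b["lon"])
print(f"text similarity: {cognitive:.2f}, distance: {distance:.0f} km")
```

In this sketch a higher similarity score indicates closer cognitive proximity, while a smaller distance indicates closer geographical proximity. Dimension (2) is omitted because it requires a trained classification model.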

Finally, we use these variables to explore the differences between innovative and non-innovative firms with regard to their location and relations within the Digital Layer. The firm-level innovation indicators in this study come from traditional sources (survey and patent data) and from a novel deep learning-based approach that harnesses firm website texts. We find that, after controlling for a range of firm-level characteristics, innovative firms maintain more relationships than non-innovative firms and that their partners are more innovative than the partners of non-innovative firms. Innovative firms are located in dense areas and still maintain relationships that span greater geographical distances. Their partners share a common knowledge base and their relationships are business-focused. We conclude that the Digital Layer is a suitable and highly cost-efficient method for conducting large-scale analyses of firm networks that are not constrained to specific sectors, regions, or a particular geographical level of analysis. As such, our approach complements other relational datasets such as patent or survey data.

Keywords: Web Mining, Innovation, Proximity, Network, Natural Language Processing

JEL Classification: O30, R10, C80

Suggested Citation

Krüger, Miriam and Kinne, Jan and Lenz, David and Resch, Bernd, The Digital Layer: How Innovative Firms Relate on the Web (2020). ZEW - Centre for European Economic Research Discussion Paper No. 20-003, Available at SSRN:

Miriam Krüger

Technische Universität Berlin (TU Berlin)

Straße des 17. Juni 135
Berlin, 10623

Jan Kinne (Contact Author)

Centre for European Economic Research (ZEW)

P.O. Box 10 34 43
L 7,1
D-68034 Mannheim, 68034

David Lenz

Justus Liebig University Giessen

Licher Str. 64
Giessen, 35394

Bernd Resch

University of Salzburg

Akademiestraße 26
Salzburg, Salzburg 5020
