DataLake and the DataIndex: Catalysts in a Search Evolution

Posted: 28 Jan 2021

Date Written: November 4, 2020

Abstract

Weighing in at over 150 terabytes and 6 billion documents, the LexisNexis DataLake is a highly reliable serverless data storage system that operates at massive scale and at a reasonable cost. The LexisNexis DataIndex is a sub-asset of the DataLake that indexes every object the DataLake stores. Built on Elasticsearch, DataIndex provides a way to find any of DataLake’s objects and reports key metrics on the DataLake’s size, collection ingestion rate, and response latency in DataLake’s eventual-consistency architecture. The DataLake and DataIndex allow LBUs to concentrate on content and product development rather than storage, but what is even more valuable is what you, the user, could do with them.

In this talk, we will discuss the strategic applications of these two assets and their untapped potential to improve search and knowledge retrieval, including the content component-driven product development that DataLake and DataIndex enable.

Keywords: Search Content Optimization, Data Lake, Cloud Storage, Elasticsearch

Suggested Citation

Heitkamp, Doug and Rosenoff, Doug, DataLake and the DataIndex: Catalysts in a Search Evolution (November 4, 2020). Proceedings of the 4th Annual RELX Search Summit, Available at SSRN: https://ssrn.com/abstract=3774375

Doug Heitkamp (Contact Author)

LexisNexis

P. O. Box 933
Dayton, OH 45401
United States

Doug Rosenoff

LexisNexis

P. O. Box 933
Dayton, OH 45401
United States
