VIP Hashing-Adapting to Skew in Popularity of Data on the Fly

Aarati Kakaraparthy, Jignesh M. Patel, Brian P. Kroth, Kwanghyun Park

Research output: Contribution to journalConference articlepeer-review

1 Citation (Scopus)

Abstract

All data is not equally popular. Often, some portion of data is more frequently accessed than the rest, which causes a skew in popularity of the data items. Adapting to this skew can improve performance, and this topic has been studied extensively in the past for disk-based settings. In this work, we consider an in-memory data structure, namely hash table, and show how one can leverage the skew in popularity for higher performance. Hashing is a low-latency operation, sensitive to the effects of caching and code complexity, among other factors. These factors make learning in-the-loop challenging as the overhead of performing additional operations can have significant impact on performance. In this paper, we propose VIP hashing, a hash table method that uses lightweight mechanisms for learning the skew in popularity and adapting the hash table layout on the fly. These mechanisms are non-blocking, i.e, the hash table is operational at all times. The overhead is controlled by sensing changes in the popularity distribution to dynamically switch-on/off the mechanisms as needed. We ran extensive tests against a host of workloads generated by Wiscer, a homegrown benchmarking tool, and we find that VIP hashing improves performance in the presence of skew (22% increase in fetch operation throughput for a hash table with 1M keys under low skew) while adapting to insert and delete operations, and changing popularity distribution of keys on the fly. Our experiments on DuckDB show that VIP hashing reduces the end-to-end execution time of TPC-H query 9 by 20% under low skew.

Original languageEnglish
Pages (from-to)1978-1990
Number of pages13
JournalProceedings of the VLDB Endowment
Volume15
Issue number10
DOIs
Publication statusPublished - 2022
Event48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 2022 Sept 52022 Sept 9

Bibliographical note

Publisher Copyright:
© 2022, VLDB Endowment., All rights reserved.

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'VIP Hashing-Adapting to Skew in Popularity of Data on the Fly'. Together they form a unique fingerprint.

Cite this