The Statistical Properties of Random Bitstreams and the Sampling Distribution of Cosine Similarity

5 Pages. Posted: 26 Oct 2012. Last revised: 12 Nov 2012

Date Written: October 25, 2012

Abstract

We summarize the statistical properties of statistics computed from independent random bitstreams, including the commonly discussed support and cosine similarity. We derive the moments of the asymptotically normal approximation to the sampling distribution of the cosine similarity of independent random bitstreams and compare those computed moments to those measured by Monte-Carlo simulation. We find agreement for bitstreams of internet scale in length (i.e. of order 10,000 bits) and for much shorter ones (100 and 10 bits), and demonstrate that the expected value of the cosine similarity of independent bitstreams may be very significantly distant from zero. To compensate for this bias we propose a new statistic, Support Adjusted Cosine Similarity (SACS).
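
The bias described above can be illustrated with a small Monte-Carlo sketch. This is not the paper's code: the function names, the Bernoulli rates p and q, and the trial count are assumptions chosen for illustration. For two independent streams whose bits are set with probabilities p and q, the law of large numbers suggests the cosine similarity should settle near sqrt(p*q) for long streams, which is a first-order limit only and not the moments derived in the paper.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of 0/1 bits."""
    dot = sum(x & y for x, y in zip(a, b))
    ones_a = sum(a)  # for bits, the squared norm equals the count of ones
    ones_b = sum(b)
    if ones_a == 0 or ones_b == 0:
        return None  # undefined when either stream has no set bits
    return dot / math.sqrt(ones_a * ones_b)


def simulate_mean_cosine(n_bits, p, q, trials, seed=0):
    """Average cosine similarity over independent Bernoulli(p), Bernoulli(q) streams.

    Hypothetical helper: trials where the statistic is undefined are skipped.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        a = [1 if rng.random() < p else 0 for _ in range(n_bits)]
        b = [1 if rng.random() < q else 0 for _ in range(n_bits)]
        c = cosine_similarity(a, b)
        if c is not None:
            values.append(c)
    return sum(values) / len(values)


if __name__ == "__main__":
    p, q = 0.5, 0.5  # assumed bit rates, not taken from the paper
    for n_bits in (10, 100, 10_000):
        mean = simulate_mean_cosine(n_bits, p, q, trials=100)
        print(n_bits, round(mean, 3), "large-n limit:", round(math.sqrt(p * q), 3))
```

With p = q = 0.5 the simulated mean should sit near 0.5 rather than 0 for long streams, even though the streams are independent by construction. That gap is the kind of bias a support-adjusted statistic is meant to correct.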

Keywords: collaborative filtering, cosine similarity, random bitstreams, sampling distribution, support, nested binomial distribution, Monte-Carlo simulation, delta method

Suggested Citation

Giller, Graham L., The Statistical Properties of Random Bitstreams and the Sampling Distribution of Cosine Similarity (October 25, 2012). Available at SSRN: https://ssrn.com/abstract=2167044 or http://dx.doi.org/10.2139/ssrn.2167044

Graham L. Giller (Contact Author)

Giller Investments

121 Red Hill Road
Holmdel, NJ 07733
United States
