The Statistical Properties of Random Bitstreams and the Sampling Distribution of Cosine Similarity

Graham L. Giller

Bloomberg LP; Giller Investments

October 25, 2012

We summarize the statistical properties of statistics computed from independent random bitstreams, including the commonly discussed support and cosine similarity. We derive the moments of the asymptotically normal approximation to the sampling distribution of the cosine similarity of independent random bitstreams and compare those computed moments to those measured by Monte-Carlo simulation. We find agreement for bitstreams of internet scale in length (i.e., of order 10,000 bits) and for much shorter ones (100 and 10 bits), and we demonstrate that the expected value of the cosine similarity of independent bitstreams might be very significantly distant from zero. To compensate for this bias we propose a new statistic, Support Adjusted Cosine Similarity (SACS).
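The bias described in the abstract can be illustrated with a small Monte-Carlo experiment. The sketch below is not the paper's code. It assumes two independent bitstreams whose bits are Bernoulli draws with probabilities `p` and `q` (illustrative values chosen here), and it compares the simulated mean cosine similarity with the first-order approximation sqrt(p*q), which follows from replacing the numerator and the two supports by their expected values. The function names, the seed, and the convention of returning 0.0 when a stream has no set bits are assumptions made for this example.

```python
import math
import random


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length 0/1 sequences.

    Assumption for this sketch: return 0.0 when either sequence has
    no set bits, since the ratio is undefined in that case.
    """
    dot = sum(a & b for a, b in zip(x, y))  # number of positions set in both
    support_x, support_y = sum(x), sum(y)   # number of set bits in each stream
    if support_x == 0 or support_y == 0:
        return 0.0
    return dot / math.sqrt(support_x * support_y)


def simulate_mean_cosine(n_bits, p, q, n_trials, seed=0):
    """Average cosine similarity over n_trials pairs of independent bitstreams."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    total = 0.0
    for _ in range(n_trials):
        x = [1 if rng.random() < p else 0 for _ in range(n_bits)]
        y = [1 if rng.random() < q else 0 for _ in range(n_bits)]
        total += cosine_similarity(x, y)
    return total / n_trials


if __name__ == "__main__":
    p, q = 0.3, 0.2  # illustrative bit probabilities, not taken from the paper
    approx = math.sqrt(p * q)  # first-order approximation to the expected value
    for n_bits in (10, 100, 10_000):
        mean = simulate_mean_cosine(n_bits, p, q, n_trials=100)
        print(f"n_bits={n_bits:>6}  simulated mean={mean:.4f}  sqrt(p*q)={approx:.4f}")
```

Even though the two streams are generated independently, the simulated mean stays well away from zero for long streams, because cosine similarity of non-negative vectors cannot be negative and the expected overlap grows with the supports.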

Number of Pages in PDF File: 5

Keywords: collaborative filtering, cosine similarity, random bitstreams, sampling distribution, support, nested binomial distribution, Monte-Carlo simulation, delta method

Date posted: October 26, 2012 ; Last revised: November 12, 2012

Suggested Citation

Giller, Graham L., The Statistical Properties of Random Bitstreams and the Sampling Distribution of Cosine Similarity (October 25, 2012). Available at SSRN: http://ssrn.com/abstract=2167044 or http://dx.doi.org/10.2139/ssrn.2167044

Contact Information

Graham L. Giller (Contact Author)
Bloomberg LP
731 Lexington Avenue
New York, NY 10022
United States
HOME PAGE: http://www.bloomberg.com
Giller Investments
121 Red Hill Road
Holmdel, NJ 07733
United States

Paper statistics
Abstract Views: 398
Downloads: 101
Download Rank: 169,418
References:  3
Footnotes:  4
