Adrien Hitz
University of Oxford
01.12.2016
Modeling Website Visits
In this talk, I will analyze a data set consisting of the number of hits to the 99 most visited websites in the United States. Modeling the random vector of visits seems challenging because its marginal distributions are very heavy-tailed and exhibit peaks at zero, and its components are strongly dependent. It turns out that a simple model, based on a censored multivariate normal distribution whose marginals are transformed to be discrete Pareto IV, accurately describes the observations. Following the ideas of Gaussian graphical models, we will see how to reduce dimensionality and visualize the dependence structure as a graph.
Preprint link: https://arxiv.org/abs/1611.01024
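The construction named in the abstract can be illustrated with a rough two-site simulation. This is only a sketch under stated assumptions, not the parameterization used in the preprint: the latent vector is bivariate normal with correlation rho, a latent value whose normal CDF falls below a hypothetical probability p_zero is censored to a count of zero, and the remaining mass is mapped through a Pareto IV quantile function (location fixed at 0) and rounded down to give a positive integer count. The function names, the parameters p_zero, sigma, gamma and alpha, and the numeric values are all chosen here for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pareto4_quantile(u, sigma, gamma, alpha):
    """Quantile of a Pareto IV law with location 0 (assumed form):
    F(x) = 1 - (1 + (x / sigma) ** (1 / gamma)) ** (-alpha), x >= 0."""
    return sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


def latent_to_count(z, p_zero, sigma, gamma, alpha):
    """Map one latent normal value to a count.

    Assumption: latent values with CDF <= p_zero are censored to 0,
    which produces the peak at zero; the rest become counts >= 1.
    """
    u = normal_cdf(z)
    if u <= p_zero:
        return 0
    # Rescale the uncensored part to (0, 1); clamp to avoid dividing by zero.
    v = min((u - p_zero) / (1.0 - p_zero), 1.0 - 1e-12)
    return 1 + math.floor(pareto4_quantile(v, sigma, gamma, alpha))


def simulate_pair(rho, rng, p_zero, sigma, gamma, alpha):
    """Draw one pair of dependent counts from the sketched model."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1 = e1
    z2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2  # corr(z1, z2) = rho
    margin = (p_zero, sigma, gamma, alpha)
    return latent_to_count(z1, *margin), latent_to_count(z2, *margin)


rng = random.Random(0)
sample = [simulate_pair(0.8, rng, p_zero=0.3, sigma=5.0, gamma=1.0, alpha=1.2)
          for _ in range(1000)]
```

Because the dependence lives entirely in the latent normal vector, the marginal shape (zero peak and heavy tail) and the correlation structure can be adjusted separately in this sketch, which is what makes a Gaussian graphical model on the latent scale a natural way to describe the dependence.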