So I just added a new analysis feature to the festival information pages: Similarity Measures. This allows you to understand at a glance how strongly a given festival resembles another, based on the films they share. Similarity is shown in two ways: as a percentage and as a log-likelihood score.

The percentages are straightforward. They measure the number of overlapping films as a share of the total number of films shown at the other festival. This measure is useful, but it is prone to distortion when festivals differ in size. An overlap of 5 films with a small festival may be significant, but the same overlap with a larger festival may not be. By random chance alone, you should expect larger festivals to have more overlap with any other festival, since they have more slots in which to showcase overlapping films. Likewise, a larger festival should have a greater chance of sharing films with other festivals than a smaller festival occurring at the same time.

The log-likelihood score addresses this problem by comparing the observed overlap with the overlap you would expect by chance given the sizes of the two festivals, and its significance can be judged against a chi-squared distribution. This allows the significance of any similarity to be quantified, so we can more readily understand what is a meaningful difference and what is not.

Tomorrow, I will flesh out the use of this new tool. In the meantime, you should give it a go and tell me what you think.
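For the curious, here is a rough Python sketch of how the two measures could be computed. It is a simplified illustration, not the site's actual code. It assumes the log-likelihood score is the standard G statistic on a 2x2 table (films at both festivals, at only one, at neither), and that `total_films` is the number of films in the whole database. The function names and the example numbers are made up for illustration.

```python
import math


def overlap_percentage(shared, other_total):
    """Shared films as a percentage of the other festival's programme."""
    return 100.0 * shared / other_total


def log_likelihood(shared, size_a, size_b, total_films):
    """G statistic for a 2x2 table of festival membership (assumed setup).

    Rows: film is / is not at festival A.
    Columns: film is / is not at festival B.
    """
    observed = [
        [shared, size_a - shared],
        [size_b - shared, total_films - size_a - size_b + shared],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # an empty cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total_films
                g += o * math.log(o / expected)
    return 2.0 * g


def p_value(g):
    """Chi-squared tail probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


if __name__ == "__main__":
    # Hypothetical numbers: a 50-film festival sharing 5 films with
    # a 20-film festival and with a 400-film festival.
    for other_size in (20, 400):
        g = log_likelihood(5, 50, other_size, 5000)
        pct = overlap_percentage(5, other_size)
        print(other_size, round(pct, 2), round(g, 2), p_value(g))
```

With these made-up numbers, the same 5 shared films produce a much larger score against the 20-film festival than against the 400-film one, which is exactly the size effect described above. One caveat: G only says how far the overlap is from chance, not in which direction, so an unusually low overlap also produces a high score.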

