Sophie ([personal profile] not_unwise) wrote 2010-12-25 04:13 pm

Omnomnom MOAR stats

Yuletide 2010 Statistical Analysis of Fics' Lengths


Since I had fun making stats for Yuletide 2009 not that long ago, I decided to do the same for Yuletide 2010. I know it's very early after the reveal, and people might keep editing their fics for at least another week, but I am going to assume those edits won't be big enough to change the average length of fics much.

Also watch me copy-paste my 2009 post for all the blahblah. Yes, yes, see my delicious copypasta.

There were 2623 fics submitted for Yuletide 2010. Although 2623 is a smaller number than 2732 (the number of fics in 2009), working out statistics from 2623 values would still take a long time, and I am a lazy person. My solution to this problem was to analyse two sets of data: first, the lengths of the fics grouped into one-thousand-word intervals, and second, the lengths of a random sample of the 2010 fics.


1) Intervals

Intervals are mostly useful in this situation because you can make pretty graphs out of the data collected. The distribution is strongly asymmetric, with several outliers: only one person wrote over 30,000 words this time, and seven wrote over 20,000.

The detailed data of the number of fics in each thousand-word interval can be found here, in the first spreadsheet.

And since intervals are useful for the sake of pretty graphs, I made pretty graphs (except that they're not pretty, because, once again, I am a lazy person).



Every bar represents the number of fics in a given thousand-word interval. The first bar on the left represents fics between 0k and 1k; the last bar on the right represents fics between 32k and 33k. The highest bar is, predictably, the "between 1k and 2k" one, as nearly half the fanfictions fall in that interval.
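Counting fics per interval like this can be sketched in a few lines of Python. This is a minimal sketch, not how the spreadsheet was actually built: the function name is hypothetical, the lengths in the example are made up, and it assumes a fic of exactly 2000 words goes in the "2k-3k" interval, since the post doesn't say how boundaries were handled.

```python
from collections import Counter

def thousand_word_bins(word_counts):
    """Count fics per thousand-word interval.

    Bin k holds fics with k*1000 <= length < (k+1)*1000,
    so bin 0 is "0k-1k", bin 1 is "1k-2k", and so on.
    """
    return Counter(length // 1000 for length in word_counts)

# Hypothetical example lengths, NOT the real Yuletide data.
bins = thousand_word_bins([212, 1500, 1999, 2000, 32933])
for k in sorted(bins):
    print(f"{k}k-{k + 1}k: {bins[k]}")
```

Each bar of the graph is then just one of these counts.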

Random fact: once again, only 3% of the fanfictions were over 10,000 words.

The average calculated from the midpoint of each interval is 2.99k. This average is being pulled up by the outliers, and this becomes even more obvious in a box plot of the data.
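When only the interval counts are available, one way to estimate the average is to treat every fic in an interval as if it sat exactly at that interval's midpoint. The sketch below assumes that's what "the average of each interval" means here; the function name and the dictionary-of-counts input are illustrative, and the counts in the example are made up.

```python
def grouped_mean(bin_counts, width=1000):
    """Estimate the mean length from interval counts alone.

    bin_counts maps bin index k to the number of fics in
    [k*width, (k+1)*width); each of those fics is treated as
    if it had the midpoint length k*width + width/2.
    """
    total = sum(bin_counts.values())
    weighted = sum(count * (k * width + width / 2) for k, count in bin_counts.items())
    return weighted / total

# Hypothetical counts: 1 fic in 0k-1k and 3 fics in 1k-2k.
print(grouped_mean({0: 1, 1: 3}))  # 1250.0
```

A few very long fics sitting in high intervals add a lot to the weighted sum, which is why this estimate lands above the median.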

So, now, have a box plot (can you tell it was drawn in a minute?):


(Yes, it includes the outliers. Have I mentioned I am lazy?)

The information about this box plot (in number of words):
Min: 212 (some fics are listed as shorter than this, but those listings are mistakes and the fics are actually longer...)
1st quartile: 1336
2nd quartile (median): 2040
3rd quartile: 3407
Max: 32,933

This means that:
-Half the fics were under 2040 words.
-Half the fics were between 1336 and 3407 words.
-A fourth of the fics were under 1336 words and a fourth over 3407.
-The ridiculously far outliers are pulling the average with them.
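The numbers behind a box plot (the "five-number summary") can be computed with Python's standard `statistics` module. This is a minimal sketch with a hypothetical function name and toy data; note that there are several conventions for quartiles, and `statistics.quantiles` uses its "exclusive" method by default, so results may differ slightly from whatever a spreadsheet uses.

```python
import statistics

def five_number_summary(word_counts):
    """Return (min, Q1, median, Q3, max) for a list of fic lengths."""
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
    # The default "exclusive" method interpolates between data points.
    q1, q2, q3 = statistics.quantiles(word_counts, n=4)
    return min(word_counts), q1, q2, q3, max(word_counts)

# Toy data, not the Yuletide lengths.
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 2.5, 5.0, 7.5, 9)
```

With the quartiles above, the middle half of the fics spans an interquartile range of 3407 - 1336 = 2071 words.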


2) Random sample

I wanted a random sample of 100 fic lengths, so I took the first 100 fics in alphabetical order of their titles. It's not a perfect sampling method or anything, but since a fic's title has nothing to do with its length, it should still give an unbiased sample, whereas choosing 100 fics after ordering them by date wouldn't have, since the data wouldn't have been independent.
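Here is a rough sketch of the title-order method next to a plain simple random sample, for comparison. The fic records (dicts with `"title"` and `"words"` keys) and both function names are hypothetical, and the case-insensitive sort is an assumption: the post doesn't say how capitalisation or titles starting with punctuation were ordered.

```python
import random

def first_n_by_title(fics, n=100):
    """The post's method: sort by title (case-insensitively, assumed) and keep the first n."""
    return sorted(fics, key=lambda fic: fic["title"].casefold())[:n]

def simple_random_sample(fics, n=100, seed=None):
    """Alternative: every set of n fics is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(fics, n)

# Hypothetical records, not real Yuletide fics.
fics = [
    {"title": "Zebra", "words": 1200},
    {"title": "apple", "words": 3400},
    {"title": "Mango", "words": 800},
]
print([fic["title"] for fic in first_n_by_title(fics, n=2)])  # ['apple', 'Mango']
```

`random.sample` removes any worry that title order could be related to length, at the cost of needing the full list of fics up front.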

This data is also collected here, for anyone interested, in the second spreadsheet.

The average of the sample is x̄ = 2914.69.
The standard deviation, calculated treating this as a sample (s rather than σ, so dividing by n-1), is s = 2155.7960.
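In Python's `statistics` module, the sample-versus-population distinction maps onto two separate functions: `stdev` divides by n-1 (the sample s used here) and `pstdev` divides by n (the population σ). A minimal sketch with a hypothetical function name and toy lengths:

```python
import statistics

def sample_stats(word_counts):
    """Return the sample mean and the sample standard deviation s."""
    mean = statistics.mean(word_counts)
    # statistics.stdev divides by n - 1 (sample s);
    # statistics.pstdev would divide by n (population sigma).
    s = statistics.stdev(word_counts)
    return mean, s

# Toy lengths, not the real sample: mean = 2000, s = 1000 (dividing by n - 1 = 2).
mean, s = sample_stats([1000, 2000, 3000])
print(mean, s)
```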

Knowing these two values, it's now possible to calculate a confidence interval for the average length of the Yuletide fics, by approximating the distribution of the sample mean as a normal distribution N(2914.69; 215.58), with mean 2914.69 and standard deviation 215.58 = s/√100 (the standard error).

The average length of the 2010 Yuletide fics was 2914.69 ± 422.54 words (422.54 = 1.96 × 215.58), at a 95% confidence level.

This means that my sample allows me to say that I am 95% sure that the real average of the fics' length is between 2492 and 3337 words.
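The interval above can be reproduced with a short normal-approximation sketch: standard error s/√n, times z = 1.96 for 95% (z ≈ 1.645 would give 90%). The function name is hypothetical; the inputs are the sample figures from this post.

```python
import math

def normal_confidence_interval(mean, s, n, z=1.96):
    """Normal-approximation confidence interval for a mean.

    Returns (lower bound, upper bound, margin of error).
    """
    standard_error = s / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin, margin

low, high, margin = normal_confidence_interval(2914.69, 2155.7960, 100)
print(round(margin, 2), round(low), round(high))  # 422.54 2492 3337
```

With n = 100, a Student's t critical value (about 1.98) would widen the interval only slightly; the sketch sticks to the normal approximation used in the post.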

And there! I won't detail the calculations behind this because they're relatively long and mostly boring, but although the average length of fics seems higher than last year's, the data doesn't allow me to say that the difference is significant (at either the 90% or the 95% level).
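The post doesn't show how the comparison with 2009 was done. One common approach, assumed here, is a large-sample two-sample z-test on the difference between the two sample means. Both function names are hypothetical, and the 2009 figures aren't repeated in this post, so none are filled in.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean1, s1, n1, mean2, s2, n2):
    """Large-sample z-test for the difference between two means.

    Returns the z statistic and the two-sided p-value.
    """
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def is_significant(p_value, confidence):
    """True if the difference is significant at the given confidence level (e.g. 0.95)."""
    return p_value < 1 - confidence
```

Called as `two_sample_z_test(2914.69, 2155.7960, 100, mean_2009, s_2009, n_2009)` with last year's sample figures (placeholder names), the difference would count as significant at 95% if the p-value came out below 0.05, and at 90% if it came out below 0.10.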


Happy Holidays and Happy Yuletide everyone!

[personal profile] jae 2010-12-25 09:28 pm (UTC)
So fun! Thank you for posting this. :)

-J

[personal profile] keerawa 2010-12-25 09:45 pm (UTC)
*laughs* OK, this brings joy to my geeky heart, box plots and all!

(Anonymous) 2010-12-25 11:23 pm (UTC)
Woooo! Thanks again for posting the stats for this year! It's interesting to see the numbers! :D

[identity profile] literary-critic.livejournal.com 2010-12-26 01:03 am (UTC)
I love you for doing this! Statistics are wonderful.

(Also I feel good that the 15,000-word thing I saddled my poor recipient with wasn't the longest thing written. I kind of didn't realise that Yuletide fics are usually short until after I went ahead and did it.)

[identity profile] sheila-snow.livejournal.com 2010-12-26 01:40 am (UTC)
Mine last year was 43,000 (yeah, I was one of the ridiculously long outliers). I tend to enjoy reading the longer fics myself, and I doubt I'm the only one out there who does. The important thing is that your recipient enjoys it!

[identity profile] literary-critic.livejournal.com 2010-12-26 05:33 am (UTC)
I love long fic too, and apparently, am just wordy. Heh. Kudos on 43k! That's a solid effort at this time of year.

(And yeah, hope the recipient is a fan - for all the questions I bombarded him/her/them with, I totally forgot to ask about length preference... oops.)

[personal profile] robespierre 2010-12-26 05:05 am (UTC)
I didn't even have to ask this time!

[personal profile] robespierre 2010-12-26 05:09 am (UTC)
...The rest of that comment was going to say, "It's like you read my mind."

SOPHIE, STOPPIT.