Making comparisons between data from multiple studies is a delicate business. When I saw destntoast's post about the Wattpad stats, my experimentalist alarm bells started ringing, because the numbers for each archive come from very different subsets of their users.
Sampling is a really important topic in empirical work. In most research, we can't measure all the things, so we do our best to measure some of the things (a sample or subset) in such a way that these fewer measurements reflect the true distribution across all the things.
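To make that concrete, here is a toy simulation in Python. Everything in it is invented for illustration: the population size, the 40% share of under-18s, and the assumption that younger users are three times as likely to volunteer their age. None of these numbers come from any of the archives. It compares a random sample with a self-selected one, using the usual textbook approximation for the 95% margin of error on a proportion.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated
    from a simple random sample of size n (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)


def simulate(population_size=200_000, sample_size=4_000, seed=0):
    """Compare a random sample with a self-selected one.

    All numbers here are made up for illustration only.
    """
    rng = random.Random(seed)

    # Hypothetical population: True means "under 18", about 40% of users.
    population = [rng.random() < 0.40 for _ in range(population_size)]
    true_share = sum(population) / population_size

    # Random sample: every user is equally likely to be measured.
    random_sample = rng.sample(population, sample_size)
    random_share = sum(random_sample) / sample_size

    # Self-selected sample: ASSUMED that under-18s volunteer their age
    # 30% of the time and everyone else only 10% of the time.
    volunteers = [
        is_minor
        for is_minor in population
        if rng.random() < (0.30 if is_minor else 0.10)
    ]
    volunteer_share = sum(volunteers) / len(volunteers)

    return true_share, random_share, volunteer_share, len(volunteers)


if __name__ == "__main__":
    true_share, random_share, volunteer_share, n_volunteers = simulate()
    print(f"true share under 18:        {true_share:.3f}")
    print(f"random sample (n=4000):     {random_share:.3f} "
          f"+/- {margin_of_error(4000):.3f}")
    print(f"self-selected (n={n_volunteers}): {volunteer_share:.3f}")
```

In this sketch the self-selected group is much larger than the random sample, yet its estimate lands far from the true share while the random sample stays close. A bigger sample does not fix a biased one, and a margin of error only means something if the sample was (close to) random in the first place.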
I could talk about sampling for ages, so to keep the focus, this post specifically discusses the information about user ages. However, the same or similar factors matter for other kinds of demographics too.
Above the cut, I've tried to explain the sources of user ages from each archive, to the best of my knowledge. Below the cut, I'll get into their expected consequences on the relationship between these numbers and the true distribution of user ages.
AO3 census
The information about the users of AO3 comes from an anonymous survey conducted by Centrum Lumina in 2013. It was posted on Tumblr, got widely reblogged, and a remarkable 10,000 people submitted information on their identities and their uses of the archive, whether or not they had accounts. Of those who participated, 99.9% reported their ages.
To put this number in perspective, nearly 4000 of those who answered reported archiving works on AO3. From my posting rates data, a quick and dirty estimate puts that at around 1% of the people who had, by that time, posted works to the archive. If this were a random sample of archiving users of AO3, we could expect a margin of error of +/- ~1.5% (at the usual 95% confidence level) on a lot of the resulting stats for that subset of respondents. And if the other types of users are represented in the same proportion, their numbers would have a similar degree of accuracy. But both of those ifs are pretty big, as carefully acknowledged by the diligent Centrum Lumina.
FFNet User Accounts
The demographic data about FFNet comes from a (well sampled) subset of new user accounts over the year 2010. Unlike the AO3 census, this is NOT a sample of the people who use the site: you don't need an account to read fic on FFNet or submit reviews, and we don't know how well this sample of new users in 2010 represents those who were active at that time but joined before 2010.
The data on user ages comes from the profiles of this sample of new user accounts. Specifically, the numbers are taken from the 9% of these new users who chose to declare their age in the text of their public profiles. Yeah. The stats reported in the FFNet analysis (the margin of error) treat this source of numbers as a random sampling of new users' ages but, as I'll get into later, there are so many reasons for that not to be the case.
Wattpad numbers
To be honest, I don't know where the numbers on Wattpad user ages come from, but here is my best guess. I don't think they are from an independent anonymous survey of people who use the archive (like the AO3 census). I googled for such a thing and found nothing: if these numbers were from a survey, it can't have been very big.
However, these numbers come straight from Wattpad itself, so it's more likely that they are from user accounts, like the FFNet data. Note: Whether or not they post content, the users of Wattpad are strongly encouraged to have accounts by both the website interface and its functionality, much more so than on FFNet or AO3. It is possible to read stories on Wattpad without an account, but you have to click past pop-up windows and start from a direct link to a work, rather than the splash page of www.wattpad.com.
Anyway, users with accounts on Wattpad have the option of reporting their date of birth on their user profiles in a dedicated date field. They can further opt to make that information visible on their public profile. If they are from user accounts, the numbers passed on by Emily might reflect the ages across all accounts reporting a date of birth or some selected subset, though what kind of subset I couldn't say.
So what?
The comparisons between the archives' user ages data in destntoast's Wattpad post sent me into a bit of a panic because each set of numbers is likely to be biased for different reasons and in different ways. Given their origins, I'd bet that:
1. the AO3 ages are either pretty accurate or skew a little older than the true population
2. the FFNet ages skew substantially younger than the true population
3. the Wattpad data could skew younger, older, or towards the late teens (specifically over seventeen) depending on how account holders expect Wattpad to use their profile details, and how/whether the Wattpad user accounts were sampled.
Below the cut, I get into the hypothesized factors behind these expectations.
Edits warning: I will undoubtedly make edits, but they will probably be trivial orthographic corrections. If some sentence is unparseable, my dyslexic self will probably never notice, so please point it out.