One day not too long ago I decided to read the entire list of common misconceptions on Wikipedia. Since then it has unfortunately been split up into multiple separate articles, but at the time it was one exceedingly long page that topped out at around 25,000 words and took me at least two hours to work my way through.
How did I get that 25,000 number? I certainly didn't count them myself! After trying a tool I found hosted on Wikipedia that didn't work, I decided to write my own ⸺ as a userscript, of course, because that's what I do. How hard could it be, right?
Well, it wasn't that hard in the end, but it did require a lot of back and forth to figure out what all should and shouldn't be included in the count. For instance, including all the references and external links is a sizable inflation of the end number. Some tables within articles are fine but others, like those that point out issues with an article or a section, need to go.
I'm expecting this script to see minor updates here and there as I find more things within the article body section that should be skipped. Here's the core of the operation:
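for (const element of content.querySelectorAll(':scope > *')) {
  // Skip anything that isn't actually rendered on the page.
  if (!element.checkVisibility()) continue;
  // Skip tables that flag issues with the article or a section.
  if (element.matches('table.metadata')) continue;
  if (element.matches('table.nomobile')) continue;
  // Stop at the first trailing section that isn't article prose.
  if (element.querySelector('#External_links')) break;
  if (element.querySelector('#Further_reading')) break;
  if (element.querySelector('#References')) break;
  if (element.querySelector('#See_also')) break;
  // Append a space so words from adjacent elements don't run together.
  cleaned += element.textContent.trim() + ' ';
}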
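// Strip section edit links and numeric citation markers like [12].
cleaned = cleaned.replaceAll('[edit source]', ' ');
cleaned = cleaned.replaceAll(/\[\d+\]/g, ' ');
// Collapse each run of whitespace into one space (a lazy \s+? only ever matches one character).
cleaned = cleaned.replaceAll(/\s+/g, ' ');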
Everything that gets past the core cleanup operations is stuffed into a single giant string, which is then split on spaces, and the length of the resulting array is added to the top of the page as an estimated word count.
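For the curious, here's a rough sketch of what that last counting step could look like. The function name estimateWordCount and the sample string are made up for illustration and aren't lifted from the script itself; it also splits on any whitespace and throws away empty pieces, which is a touch more defensive than a plain split on spaces.

```javascript
// Hypothetical helper: the name and signature are assumptions for illustration,
// not the userscript's actual code.
function estimateWordCount(cleaned) {
  // Split on runs of whitespace and drop empty strings, so leading,
  // trailing, or doubled spaces don't inflate the count.
  return cleaned.trim().split(/\s+/).filter(Boolean).length;
}

// Made-up sample text with two citation markers and some stray spacing.
const sample = 'The quick brown fox[1]  jumps over the lazy dog.[22]';

// Same style of cleanup as the core loop: strip citations, collapse whitespace.
const cleanedSample = sample
  .replaceAll(/\[\d+\]/g, ' ')
  .replaceAll(/\s+/g, ' ');

console.log(`~${estimateWordCount(cleanedSample)} words`);
```

The filter(Boolean) call is there because splitting an empty string still hands back an array with one empty entry, which would otherwise give a blank page a count of one.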
On the page for viridian, which I pulled up because the bass solo of the song stuck in my head is titled as such, the top bar under the title will say Article Talk ~868 words while this script is active. The actual number may be different any time after I wrote that sentence, of course.
If you'd like to run this userscript yourself, you can check out its code at GitHub using the comically large button below.