Lies, damned lies, and statistics

At first glance this looks like another puff piece about how Wikipedians are always improving. But look again. According to this chart, there are just as many bot edits as human edits, if not more.


But see what they did there?  The bot edits are artificially divided into “group bots” and bots with “bot” in the name. Don’t all bots have “bot” in the name? (Ah, yes, they do except for “CommonsDelinker”, last edit Mar 31, 2018, and “D6”, last edit Mar 30, 2014. Hmm, no raw numbers.)

The result is to make it look like there are more human edits than bot edits, with only occasional bot bursts.  But what if you add all the bots together in the chart?

Looks more like the bots are taking over.

But this is all freehand; it is not even based on data. And the original chart is not sourced either. Chances are it came from somewhere here on these analytics pages, but how would you reproduce it, much less chart your own data? (Oh look over there: SPARQL!)

In that case, what about the ACTRIAL trial, which started September 14, 2017, and ended March 14, 2018? Here is “new articles per day” for English Wikipedia, with some colors added for comparing ACTRIAL months with each other.

Mar 2018 1209986 6102 32946 3698 5.6 M 723 97.9 3.7 M 8.2 M
Feb 2018 1203884 4988 29491 3350 5.6 M 726 97.7 3.4 M 8.2 M
Jan 2018 1198896 5483 31584 3713 5.6 M 722 97.4 3.6 M 8.1 M
Dec 2017 1193413 5322 30072 3434 5.5 M 644 97.2 3.4 M 8.1 M
Nov 2017 1188091 6043 31678 3484 5.5 M 743 96.9 3.3 M 8.1 M
Oct 2017 1182048 5824 31080 3512 5.5 M 615 96.7 3.3 M 8.0 M
Sep 2017 1176224 5029 29243 3336 5.5 M 639 96.4 3.3 M 8.0 M
Aug 2017 1171195 5656 30112 3493 5.5 M 640 96.2 3.4 M 7.9 M
Jul 2017 1165539 5504 29983 3518 5.4 M 679 95.9 3.4 M 7.9 M
Jun 2017 1160035 5345 29406 3430 5.4 M 658 95.6 3.7 M 7.8 M
May 2017 1154690 6187 31820 3540 5.4 M 664 95.3 3.5 M 7.8 M
Apr 2017 1148503 6421 32060 3525 5.4 M 737 95 3.4 M 7.7 M
Mar 2017 1142082 7138 34010 3582 5.4 M 873 94.8 3.7 M 7.7 M
Feb 2017 1134944 5922 30539 3376 5.3 M 754 94.6 3.3 M 7.7 M
Jan 2017 1129022 6104 32043 3728 5.3 M 796 94.3 3.6 M 7.6 M
Dec 2016 1122918 5904 30456 3390 5.3 M 811 94.1 3.5 M 7.6 M
Nov 2016 1117014 6067 30440 3272 5.3 M 717 93.9 3.5 M 7.6 M
Oct 2016 1110947 5723 29721 3395 5.2 M 727 93.6 3.6 M 7.5 M
Sep 2016 1105224 5423 29195 3338 5.2 M 833 93.3 3.4 M 7.5 M
Aug 2016 1099801 5600 29548 3449 5.2 M 760 93.1 3.3 M 7.4 M
Jul 2016 1094201 5436 29162 3324 5.2 M 733 92.9 3.2 M 7.4 M
Jun 2016 1088765 5472 29063 3238 5.1 M 697 92.7 3.3 M 7.4 M
May 2016 1083293 6161 30777 3418 5.1 M 841 92.4 3.6 M 7.3 M
Apr 2016 1077132 6157 30820 3365 5.1 M 767 92.2 3.4 M 7.3 M
Mar 2016 1070975 6347 31736 3510 5.1 M 785 91.9 3.6 M 7.2 M
Feb 2016 1064628 5796 29838 3278 5.0 M 813 91.7 3.4 M 7.1 M
Jan 2016 1058832 5864 31051 3545 5.0 M 818 91.4 3.4 M 7.1 M
Dec 2015 1052968 5573 29397 3335 5.0 M 746 91.2 3.0 M 7.0 M
Nov 2015 1047395 6176 30581 3376 5.0 M 776 91 3.1 M 7.0 M
Oct 2015 1041219 6036 30579 3421 4.9 M 762 90.8 3.2 M 7.0 M
Sep 2015 1035183 5745 29526 3399 4.9 M 709 90.6 3.1 M 6.9 M

Kind of hard to see.

Looks like fewer articles, right? So that’s good, right? Not so much work for patrollers? Maybe they can’t make a patroller bot. So how many of these articles were deleted? Were they useful articles? Can’t tell.
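Since the table is hard to eyeball, here is a rough sketch of one way to compare the ACTRIAL months with the same months a year earlier. It assumes the number right after the “5.x M” column is the new-articles-per-day figure, and it only uses the five full months inside the trial (October through February), since September and March were only partly covered. The variable names are mine, and this says nothing about deletions or article quality.

```python
# Rows copied from the table above; only the months needed for the comparison.
ROWS = """\
Feb 2018 1203884 4988 29491 3350 5.6 M 726 97.7 3.4 M 8.2 M
Jan 2018 1198896 5483 31584 3713 5.6 M 722 97.4 3.6 M 8.1 M
Dec 2017 1193413 5322 30072 3434 5.5 M 644 97.2 3.4 M 8.1 M
Nov 2017 1188091 6043 31678 3484 5.5 M 743 96.9 3.3 M 8.1 M
Oct 2017 1182048 5824 31080 3512 5.5 M 615 96.7 3.3 M 8.0 M
Feb 2017 1134944 5922 30539 3376 5.3 M 754 94.6 3.3 M 7.7 M
Jan 2017 1129022 6104 32043 3728 5.3 M 796 94.3 3.6 M 7.6 M
Dec 2016 1122918 5904 30456 3390 5.3 M 811 94.1 3.5 M 7.6 M
Nov 2016 1117014 6067 30440 3272 5.3 M 717 93.9 3.5 M 7.6 M
Oct 2016 1110947 5723 29721 3395 5.2 M 727 93.6 3.6 M 7.5 M
"""

def new_per_day(rows):
    """Map (month, year) to the assumed new-articles-per-day column."""
    result = {}
    for line in rows.strip().splitlines():
        fields = line.split()
        # fields[6:8] are e.g. "5.6", "M"; the next field is the per-day count.
        result[(fields[0], int(fields[1]))] = int(fields[8])
    return result

per_day = new_per_day(ROWS)

# Full months inside ACTRIAL, and the same months one year earlier.
actrial = [("Oct", 2017), ("Nov", 2017), ("Dec", 2017), ("Jan", 2018), ("Feb", 2018)]
before = [(month, year - 1) for month, year in actrial]

mean_actrial = sum(per_day[k] for k in actrial) / len(actrial)
mean_before = sum(per_day[k] for k in before) / len(before)
change_pct = 100 * (mean_actrial - mean_before) / mean_before

print(f"ACTRIAL months:     {mean_actrial:.0f} new articles per day")
print(f"Same months before: {mean_before:.0f} new articles per day")
print(f"Change:             {change_pct:.1f}%")
```

A five-month average against a single earlier year is a crude comparison; it shows the direction of the change, not its cause.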

This page might make it easier: New articles per day by language. I have made the English wiki in red so you can see it.

Σ en ru de es
Mar 2018 6506 723 236 294 265
Mar 2017 9918 873 235 317 253
Mar 2016 9468 785 250 331 290
Mar 2015 10132 829 315 322 215
Mar 2014 7314 807 272 316 214
Mar 2013 25589 841 404 347 234
Mar 2012 8884 855 397 462 262
Mar 2011 7563 906 531 417 551
Mar 2010 7365 1015 400 466 40
Mar 2009 7433 1643 309 440 295
Mar 2008 7945 1472 540 523 327

So there were only 723 articles per day this March, but ten years ago there were twice as many. In fact, there have not been this few new articles since 2004.
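The “twice as many” claim is easy to check against the en column above. A minimal sketch, using only the March figures from the table (the variable names are made up for illustration):

```python
# "en" column (new articles per day, each March) from the table above.
en_march = {
    2018: 723, 2017: 873, 2016: 785, 2015: 829, 2014: 807, 2013: 841,
    2012: 855, 2011: 906, 2010: 1015, 2009: 1643, 2008: 1472,
}

latest = en_march[2018]

# How many times more new articles per day in March 2008 than in March 2018?
ratio_2008 = en_march[2008] / latest

# Drop in new articles per day compared to the previous March.
drop_vs_2017 = en_march[2017] - latest

# Year with the lowest March figure among the years listed.
lowest_year = min(en_march, key=en_march.get)

print(f"Mar 2008 / Mar 2018: {ratio_2008:.2f}x")
print(f"Mar 2017 - Mar 2018: {drop_vs_2017} per day")
print(f"Lowest March in the table: {lowest_year}")
```

This only covers the eleven Marches in the table, so it cannot confirm or refute the “since 2004” part.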

So Wikipedia is dying? And they are trying to hide it by counting bot edits as user edits?

You know, I can’t tell, and there isn’t enough data here to find out. But something here just doesn’t feel right. And you know what they say about that.

I for one welcome our new bot overlords.
https://www.albinoblacksheep.com/flash/vikingkittens
https://coub.com/view/3he4j

UPDATE:

As it turns out, the bot uprising theory is premature. Here is the chart for enwiki and meta.

[Charts: enwiki edits by user type; meta edits by user type]

Also the new edits per month.  You can see the yearly dip in June.  But at this scale you can’t see that there are at least 100 fewer new articles per day compared to the previous year.
[Chart: new articles]

And my favorite, content vs. non-content edits. How much is dramah?

[Chart: enwiki by page type]

2 thoughts on “Lies, damned lies, and statistics”

  1. Chart IS sourced, and clicking on its source brings you here, where you can replicate it or create your own: https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/editors

    As for definitions, https://meta.wikimedia.org/wiki/Research:Wikistats_metrics#Common_Terms says:

    “Group Bot: logged in users that are in the “bot” user_group

    Name Bot: logged in users whose name contains `bot`. These users have a high probability of being a bot, even if counter examples exist”

    It doesn’t sound 100% clear to me whether one category is included within the other (users with “bot” in their username probably all have a bot flag, etc.)

  2. Okay so if you click the tiny “Wikimedia” in the lower left corner, it takes you to the page where you can experiment with the chart.

    “New Pages” does have the deletions subtracted. https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/New_pages So you can’t deduce anything about ACTRIAL from this, but whatever, we knew it was a done deal from the beginning.

    And the chart shown in the article is for all wikis, including the tiny language ones that are mainly populated by bots.

    Enwiki is not so bot-driven: IP edits and bot edits are roughly equal in number, and the great majority of edits come from users. For example, a mouseover for April 2018 shows user 3,412,357; anonymous 804,097; group bot 718,477; and name-bot 3,362. https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/edits

    The “name-bot” thing seems more legacy, maybe they were individually developed bots that were taken over by someone else after the developer left the project? Or maybe it reflects a change in the bot approval process?

    My favorite metric is now the dramah chart, “content” vs. “non-content”. https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/editors

    Kinda miss the idea of the bot takeover though, a bot utopia might have been interesting.
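For what it’s worth, the “add all the bots together” question can be answered for enwiki with the April 2018 mouseover numbers quoted in the comment above. A rough sketch, assuming the group-bot and name-bot categories do not overlap (the comments above note this is not entirely clear):

```python
# Edit counts for enwiki, April 2018, as quoted in the comment above.
edits = {
    "user": 3_412_357,
    "anonymous": 804_097,
    "group bot": 718_477,
    "name bot": 3_362,
}

# Fold the two bot categories into one.
# Assumption: the categories are disjoint, so nothing is counted twice.
bots = edits["group bot"] + edits["name bot"]
total = sum(edits.values())

combined = {"user": edits["user"], "anonymous": edits["anonymous"], "bot": bots}
shares = {kind: round(100 * count / total, 1) for kind, count in combined.items()}

for kind, count in combined.items():
    print(f"{kind:10s} {count:>10,d} {shares[kind]:5.1f}%")
```

With these numbers the combined bots come to roughly 14.6% of enwiki edits for that month, a little behind anonymous edits and far behind logged-in users.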
