FANDOM


External link

Of approximately 18,000 English wikis created between 22 July and 31 August, 3,468 (19.2%) contained at least one external link in mainspace (namespace 0) or user blog space (namespace 500). All 1,161 spam wikis contained an external link (100%), compared with 2,307 of the roughly 17,000 presumed "good" wikis (13.6%). The rest of the report considers only wikis with external links, since the number of spam wikis without one is negligible.

Feature | Spam % | Good % | Factor
External link in namespace 0/500 | 100% | 13.57% | 7.37
No external link in namespace 0/500 | 0% | 85.25% | 0
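As a minimal sketch of how a wiki qualifies and how the Spam %, Good % and Factor columns relate (Factor = Spam % / Good %), the code below assumes a hypothetical input layout: each wiki is a list of (namespace, has_external_link) pairs. This is an illustration, not the tooling used for this report.

```python
# Hypothetical data layout: each wiki is a list of (namespace, has_external_link) pairs.
LINK_NAMESPACES = {0, 500}  # mainspace and user blog space


def has_external_link(pages: list[tuple[int, bool]]) -> bool:
    """True if any page in namespace 0 or 500 contains an external link."""
    return any(ns in LINK_NAMESPACES and linked for ns, linked in pages)


def feature_row(spam_hits: int, spam_total: int, good_hits: int, good_total: int):
    """Return (Spam %, Good %, Factor) for one feature, where Factor = Spam % / Good %."""
    spam_pct = 100 * spam_hits / spam_total
    good_pct = 100 * good_hits / good_total
    return spam_pct, good_pct, spam_pct / good_pct


# Counts from the paragraph above: 1161 of 1161 spam wikis, 2307 of ~17,000 good wikis.
spam_pct, good_pct, factor = feature_row(1161, 1161, 2307, 17000)
print(f"{spam_pct:.2f}% {good_pct:.2f}% {factor:.2f}")  # 100.00% 13.57% 7.37
```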

For the 3,468 wikis with an external link in namespace 0 or 500, the following table breaks down how many external links each wiki has in those namespaces.

Count | Spam % | Good % | Factor
1 | 55.47% | 33.16% | 1.673
2 | 20.24% | 13.26% | 1.526
3 | 8.79% | 7.20% | 1.221
4 | 1.81% | 5.68% | 0.319
5 | 1.89% | 4.20% | 0.451
6 | 1.38% | 3.16% | 0.436
7+ | 10.42% | 32.90% | 0.317
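A small sketch of how such a count breakdown can be produced: bucket each wiki's link count into the rows 1 to 6 and 7+, take percentages per class, then divide. The helper names and the toy input are hypothetical and only illustrate the bucketing.

```python
from collections import Counter
from typing import Optional

# Hypothetical helper, not the report's actual tooling.
BUCKETS = ["1", "2", "3", "4", "5", "6", "7+"]


def bucket(count: int) -> str:
    """Map a link count to its table row; only wikis with a link are counted."""
    if count < 1:
        raise ValueError("wiki has no external link in namespace 0/500")
    return "7+" if count >= 7 else str(count)


def distribution(counts: list[int]) -> dict[str, float]:
    """Percentage of wikis falling into each bucket."""
    tally = Counter(bucket(c) for c in counts)
    return {b: 100 * tally[b] / len(counts) for b in BUCKETS}


def bucket_factors(spam_counts: list[int], good_counts: list[int]) -> dict[str, Optional[float]]:
    """Factor = Spam % / Good % per bucket; None where no good wiki falls in it."""
    spam = distribution(spam_counts)
    good = distribution(good_counts)
    return {b: spam[b] / good[b] if good[b] else None for b in BUCKETS}


# Toy input, purely illustrative: link counts for four spam and four good wikis.
print(bucket_factors([1, 1, 2, 9], [1, 3, 12, 40]))
```

Returning None rather than infinity keeps empty buckets distinguishable from buckets that really are dominated by spam.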

File:Spamwikis external link count.png

Time after creation

Defined as the time, in minutes, from User:Default's last edit to the last edit on the first page with an external link.
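A minimal sketch of this measure, assuming both edit timestamps are already available as ISO 8601 strings in the same time zone (a hypothetical input format; how the timestamps were fetched is not described here):

```python
from datetime import datetime


def minutes_after_creation(default_last_edit: str, link_page_last_edit: str) -> float:
    """Minutes from User:Default's last edit to the last edit on the
    first page with an external link."""
    start = datetime.fromisoformat(default_last_edit)
    end = datetime.fromisoformat(link_page_last_edit)
    return (end - start).total_seconds() / 60


# Illustrative timestamps only.
print(minutes_after_creation("2000-01-01T10:00:00", "2000-01-01T10:12:30"))  # 12.5
```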

Cumulative
Time < (min) | Spam % | Good % | Factor
1 | 31.35% | 2.47% | 12.689
2 | 44.01% | 3.99% | 11.037
3 | 53.83% | 5.94% | 9.065
4 | 62.88% | 7.76% | 8.104
5 | 68.30% | 9.32% | 7.329
6 | 71.49% | 10.36% | 6.901
7 | 75.62% | 11.05% | 6.842
8 | 78.73% | 12.31% | 6.395
9 | 80.79% | 13.05% | 6.192
10 | 82.52% | 14.09% | 5.857
11 | 83.63% | 15.30% | 5.466
12 | 84.58% | 16.08% | 5.260
13 | 85.70% | 17.08% | 5.018
14 | 86.56% | 17.86% | 4.847
15 | 87.94% | 18.86% | 4.664
16 | 88.54% | 19.94% | 4.441
17 | 89.66% | 20.37% | 4.401
18 | 90.09% | 20.85% | 4.321
19 | 90.70% | 21.24% | 4.270
20 | 90.96% | 21.89% | 4.155
21 | 91.39% | 22.45% | 4.070
22 | 91.90% | 23.06% | 3.985
23 | 91.90% | 23.32% | 3.941
24 | 92.25% | 23.67% | 3.898
25 | 92.51% | 24.40% | 3.791
26 | 92.76% | 24.92% | 3.722
27 | 93.11% | 25.14% | 3.704
28 | 93.54% | 25.66% | 3.645
29 | 93.71% | 26.09% | 3.591
30 | 93.97% | 26.44% | 3.554
31 | 94.06% | 26.92% | 3.494
32 | 94.14% | 27.22% | 3.458
33 | 94.32% | 27.65% | 3.410
34 | 94.40% | 27.92% | 3.382
35 | 94.49% | 28.26% | 3.343
36 | 94.75% | 28.78% | 3.292
37 | 94.75% | 29.13% | 3.253
38 | 94.92% | 29.65% | 3.201
39 | 95.18% | 30.04% | 3.168
40 | 95.26% | 30.56% | 3.117
41 | 95.26% | 30.86% | 3.087
42 | 95.26% | 31.17% | 3.057
43 | 95.43% | 31.60% | 3.020
44 | 95.61% | 31.90% | 2.997
45 | 95.61% | 32.29% | 2.961
46 | 95.69% | 32.77% | 2.920
47 | 95.87% | 33.07% | 2.899
48 | 95.87% | 33.46% | 2.865
49 | 95.87% | 33.72% | 2.843
50 | 95.87% | 33.98% | 2.821
51 | 95.87% | 34.37% | 2.789
52 | 96.04% | 34.50% | 2.783
53 | 96.12% | 34.72% | 2.769
54 | 96.12% | 35.11% | 2.738
55 | 96.12% | 35.28% | 2.724
56 | 96.21% | 35.41% | 2.717
57 | 96.30% | 35.67% | 2.699
58 | 96.30% | 35.98% | 2.677
59 | 96.38% | 36.19% | 2.663
60 | 96.38% | 36.45% | 2.644
61 | 3.62% | 63.55% | 0.057

Range
Range (min) | Spam % | Good % | Factor
1 | 31.35% | 2.47% | 12.689
2 | 12.66% | 1.52% | 8.346
3 | 9.82% | 1.95% | 5.034
4 | 9.04% | 1.82% | 4.968
5-6 | 8.61% | 2.60% | 3.312
7-8 | 7.24% | 1.95% | 3.709
9-10 | 3.79% | 1.78% | 2.132
11-12 | 2.07% | 1.99% | 1.037
13-14 | 1.98% | 1.78% | 1.115
15-19 | 4.13% | 3.38% | 1.223
20-24 | 1.55% | 2.43% | 0.639
25-29 | 1.46% | 2.43% | 0.603
30-39 | 1.46% | 3.94% | 0.371
40-49 | 0.69% | 3.68% | 0.187
50-59 | 0.52% | 2.47% | 0.209
60+ | 3.62% | 63.81% | 0.057
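The Range table is essentially the first difference of the Cumulative table: each range row covers the interval between the previous upper bound and its own. A sketch of both directions, with hypothetical helper names, is below; the example reuses the spam column of the Cumulative table.

```python
def cumulative_share(times: list[float], bounds: list[float]) -> list[float]:
    """Percentage of wikis whose time (minutes) is strictly below each bound."""
    return [100 * sum(t < b for t in times) / len(times) for b in bounds]


def ranges_from_cumulative(cumulative: list[float]) -> list[float]:
    """First difference of a cumulative percentage column, rounded to 2 places."""
    result, previous = [], 0.0
    for value in cumulative:
        result.append(round(value - previous, 2))
        previous = value
    return result


# Spam column of the Cumulative table for bounds of 1, 2 and 3 minutes.
print(ranges_from_cumulative([31.35, 44.01, 53.83]))  # [31.35, 12.66, 9.82]
```

Wider rows such as 5-6 skip a bound: for spam, 71.49 - 62.88 = 8.61, which matches the Range table.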

File:Spamwikis time after creation.png

Wiki category

Category | Spam % | Good % | Factor
Finance | 3.62% | 0.09% | 41.729
Food and Drink | 3.36% | 0.30% | 11.071
Lifestyle | 14.13% | 1.43% | 9.875
Philosophy | 1.12% | 0.22% | 5.166
Travel | 2.67% | 0.61% | 4.400
Green | 1.81% | 0.48% | 3.794
Technology | 9.47% | 3.25% | 2.914
Auto | 2.76% | 1.00% | 2.765
Science | 2.67% | 1.26% | 2.124
Sports | 4.13% | 2.12% | 1.947
Politics | 1.81% | 1.00% | 1.814
Toys | 0.95% | 0.78% | 1.214
Entertainment | 38.50% | 39.97% | 0.963
Education | 4.74% | 5.72% | 0.828
Humor | 0.95% | 1.39% | 0.683
Creative | 3.53% | 7.63% | 0.463
Music | 0.86% | 3.64% | 0.237
Gaming | 2.93% | 29.09% | 0.101
Wikia | 0.00% | 0.04% | 0.000

File:Spamwikis category.png

Wiki creation hour

The hour (UTC) in which the wiki was created. Hour 0 is midnight to 1:00, hour 1 is 1:00 to 2:00, et cetera.

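Hour (UTC) | Spam % | Good % | Factor
0 | 1.25% | 3.67% | 0.341
1 | 2.19% | 4.62% | 0.474
2 | 2.40% | 4.74% | 0.506
3 | 2.92% | 3.43% | 0.852
4 | 6.26% | 2.79% | 2.243
5 | 9.18% | 3.55% | 2.587
6 | 10.84% | 3.71% | 2.926
7 | 9.18% | 3.63% | 2.530
8 | 8.03% | 3.39% | 2.370
9 | 8.34% | 3.55% | 2.352
10 | 10.11% | 4.58% | 2.207
11 | 5.63% | 3.67% | 1.536
12 | 5.53% | 4.11% | 1.346
13 | 2.82% | 4.18% | 0.673
14 | 2.19% | 4.74% | 0.462
15 | 1.77% | 4.62% | 0.383
16 | 1.36% | 5.02% | 0.270
17 | 1.77% | 5.74% | 0.309
18 | 2.09% | 4.62% | 0.451
19 | 0.94% | 4.70% | 0.200
20 | 1.67% | 4.90% | 0.340
21 | 1.15% | 4.66% | 0.246
22 | 1.36% | 4.11% | 0.330
23 | 1.04% | 3.27% | 0.319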
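A minimal sketch of the hour bucketing described above (hour 0 = 00:00 to 00:59 UTC, and so on), assuming ISO 8601 timestamps; the function name is hypothetical, and timestamps without an offset are treated as UTC.

```python
from datetime import datetime, timezone


def creation_hour_utc(timestamp: str) -> int:
    """Return the UTC hour (0-23) in which a wiki was created."""
    dt = datetime.fromisoformat(timestamp)
    if dt.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).hour


print(creation_hour_utc("2000-01-01T05:30:00+05:30"))  # 0 (05:30 at UTC+05:30 is 00:00 UTC)
print(creation_hour_utc("2000-01-01T23:59:59"))        # 23
```

Converting to UTC before reading `.hour` matters here, because the same local clock time falls into different buckets depending on where the wiki was created.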

File:Spamwikis creation hour.png

Day of week

Day | Spam % | Good % | Factor
Sun | 6.78% | 11.72% | 0.578
Mon | 14.60% | 14.11% | 1.035
Tue | 15.85% | 16.58% | 0.956
Wed | 15.75% | 15.30% | 1.029
Thu | 18.46% | 14.51% | 1.272
Fri | 17.31% | 15.58% | 1.111
Sat | 11.26% | 12.20% | 0.923

File:Spamwikis creation date.png

Country of origin

This compares the country in which each wiki was created (found via CheckUser and ip-api.com). Obviously I can't CheckUser a few thousand good wikis just to get a baseline for where wikis are created, so I used the Quantcast stats for wikia.com instead, averaging the unique-visitor and pageview percentages because I have no real idea which is the better indicator of wiki creation. The Quantcast stats may or may not be reliable, but they match my initial assumptions fairly well. Surprisingly or not, four South and Southeast Asian countries (Bangladesh, India, Pakistan and the Philippines) account for about 60% of the spam wikis, even though together they make up only 3-5% of total Wikia traffic. Based solely on a Bayesian interpretation of this data, with a 7% prior probability of spam, a wiki that has an external link and was created in India has a

P(\text{spam} \mid \text{link}, \text{India}) = \frac{7\% \times 100\% \times 46.02\%}{7\% \times 100\% \times 46.02\% + 93\% \times 13.57\% \times 0.94\%} \approx 96.45\%

chance of being spam. My country data would have to be off by a factor of about 6.8 for that chance to fall below 80% in reality.
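The estimate above can be reproduced with a short sketch. Like the formula, it assumes that "has an external link" and "created from country X" are independent given the class (a naive Bayes simplification); the function names are hypothetical. The last function solves spam / (spam + m * good) = t for m, giving the factor by which the good-class term would have to grow before the posterior drops to the threshold t.

```python
def class_scores(prior_spam, p_link_spam, p_country_spam, p_link_good, p_country_good):
    """Unnormalized spam and good terms of Bayes' rule."""
    spam = prior_spam * p_link_spam * p_country_spam
    good = (1 - prior_spam) * p_link_good * p_country_good
    return spam, good


def posterior_spam(spam: float, good: float) -> float:
    return spam / (spam + good)


def error_factor_for(spam: float, good: float, threshold: float) -> float:
    """m such that spam / (spam + m * good) == threshold."""
    return spam * (1 - threshold) / (threshold * good)


# India: 7% prior, 100% / 13.57% link rates, 46.02% / 0.94% country shares.
spam, good = class_scores(0.07, 1.00, 0.4602, 0.1357, 0.0094)
print(f"{posterior_spam(spam, good):.2%}")          # 96.45%
print(f"{error_factor_for(spam, good, 0.80):.1f}")  # 6.8
```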

Country | Spam % | Good % | Factor
BANGLADESH | 3.93% | 0.05% | 78.636
INDIA | 46.02% | 0.94% | 48.952
PAKISTAN | 4.57% | 0.21% | 21.757
MACEDONIA | 0.45% | 0.05% | 8.921
CHINA | 2.08% | 0.49% | 4.248
PHILIPPINES | 7.89% | 2.13% | 3.703
VIETNAM | 1.73% | 0.49% | 3.521
SWITZERLAND | 0.42% | 0.28% | 1.532
ROMANIA | 0.68% | 0.49% | 1.396
JAPAN | 0.60% | 0.72% | 0.838
INDONESIA | 0.94% | 1.23% | 0.771
UNITED KINGDOM | 3.79% | 5.75% | 0.660
SINGAPORE | 0.47% | 0.74% | 0.644
TURKEY | 0.42% | 0.68% | 0.622
UNITED STATES | 16.14% | 30.77% | 0.524
NETHERLANDS | 0.55% | 1.14% | 0.483
MALAYSIA | 0.39% | 0.90% | 0.437
AUSTRALIA | 1.08% | 2.62% | 0.412
FRANCE | 0.55% | 2.27% | 0.243
CANADA | 1.02% | 4.51% | 0.227
GERMANY | 0.76% | 3.80% | 0.200
SPAIN | 0.37% | 1.91% | 0.192
OTHER | 4.78% | 34.35% | 0.139
MEXICO | 0.37% | 3.51% | 0.105

File:Spamwikis creation country.png

Template presence

Whether or not there is a template of some kind on the suspect page.

Feature | Spam % | Good % | Factor
Template | 0.071% | 15.72% | 0.0045
No template | 99.93% | 84.28% | 1.186

Keywords

Note: keywords are not part of the filter yet.

This looks at the number of times a keyword shows up in spam wikis versus non-spam wikis. I looked at 5,624 spam wikis and 2,206 non-spam wikis, then ignored 338 specific Romanian spam wikis that were throwing off the top-250 list. To "normalize" the factors, I added one to each word count before dividing, so there would be no division by zero. Clearly, many of the top non-spam words have to do with formatting and ordinary wiki markup; there isn't going to be a spam wiki that uses an infobox.
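The Factor columns in the two keyword tables are consistent with the following reading. Each count is add-one smoothed and divided by the number of wikis in its group, and the 338 excluded Romanian wikis are also removed from the spam total (5,624 - 338 = 5,286). This is inferred from the published numbers rather than stated in the text, and the names below are hypothetical.

```python
# Assumption inferred from the tables: per-wiki normalization, excluded wikis removed.
SPAM_WIKIS = 5624 - 338  # 5286
GOOD_WIKIS = 2206


def spam_factor(spam_count: int, good_count: int) -> float:
    """How much more often a word shows up in spam wikis than in good ones."""
    return ((spam_count + 1) / SPAM_WIKIS) / ((good_count + 1) / GOOD_WIKIS)


def good_factor(spam_count: int, good_count: int) -> float:
    """Inverse ratio, as used for the 'Highest non-spam words' table."""
    return 1 / spam_factor(spam_count, good_count)


print(round(spam_factor(294, 2), 3))  # 41.037  ("affordable")
print(round(good_factor(0, 183), 3))  # 440.899 ("infobox")
```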

Highest spam words
Word | # Spam | # Good | Factor
affordable | 294 | 2 | 41.037
investing | 97 | 0 | 40.898
investors | 85 | 0 | 35.890
reputable | 84 | 0 | 35.473
buyers | 158 | 1 | 33.178
loans | 76 | 0 | 32.134
seo | 144 | 1 | 30.256
invest | 130 | 1 | 27.335
readily | 122 | 1 | 25.666
sellers | 60 | 0 | 25.457
excellence | 58 | 0 | 24.622
advisable | 57 | 0 | 24.205
vitamins | 54 | 0 | 22.953
flooring | 53 | 0 | 22.536
homeowners | 52 | 0 | 22.118
expenses | 104 | 1 | 21.910
firms | 155 | 2 | 21.701
coupon | 51 | 0 | 21.701
conditioning | 50 | 0 | 21.284
convenience | 99 | 1 | 20.866
fanduel | 49 | 0 | 20.866
investment | 194 | 3 | 20.345
customers | 533 | 10 | 20.259
optimization | 96 | 1 | 20.240
clients | 381 | 7 | 19.927
providers | 190 | 3 | 19.927
ought | 278 | 5 | 19.406
vitamin | 45 | 0 | 19.197
mubuy | 45 | 0 | 19.197
pros | 45 | 0 | 19.197
retailer | 45 | 0 | 19.197
feasible | 90 | 1 | 18.988
supplements | 90 | 1 | 18.988
financing | 44 | 0 | 18.780
convenient | 88 | 1 | 18.571
deposit | 43 | 0 | 18.362
medications | 43 | 0 | 18.362
hassle | 85 | 1 | 17.945
accessibility | 42 | 0 | 17.945
guarantees | 41 | 0 | 17.528
testimonials | 41 | 0 | 17.528
uncomplicated | 41 | 0 | 17.528
upkeep | 41 | 0 | 17.528
alternatives | 81 | 1 | 17.110
discounts | 80 | 1 | 16.902
dietary | 39 | 0 | 16.693
lessen | 39 | 0 | 16.693
portfolio | 39 | 0 | 16.693
savings | 78 | 1 | 16.484
garcinia | 38 | 0 | 16.276
obligation | 38 | 0 | 16.276
fats | 37 | 0 | 15.858
marketers | 37 | 0 | 15.858
satisfaction | 110 | 2 | 15.441
intake | 36 | 0 | 15.441
irrespective | 36 | 0 | 15.441
lawyers | 71 | 1 | 15.024
undoubtedly | 70 | 1 | 14.815
discount | 104 | 2 | 14.607
knowledgeable | 69 | 1 | 14.607
contemplate | 34 | 0 | 14.607
fiber | 34 | 0 | 14.607
finances | 34 | 0 | 14.607
resorts | 34 | 0 | 14.607
anticipate | 33 | 0 | 14.189
dwelling | 33 | 0 | 14.189
homeowner | 33 | 0 | 14.189
lenders | 33 | 0 | 14.189
stressful | 33 | 0 | 14.189
precisely | 99 | 2 | 13.911
residential | 132 | 3 | 13.876
moisture | 32 | 0 | 13.772
provider | 194 | 5 | 13.563
dependable | 64 | 1 | 13.563
tissue | 64 | 1 | 13.563
advantages | 193 | 5 | 13.494
buyer | 63 | 1 | 13.355
profitable | 63 | 1 | 13.355
expertise | 186 | 5 | 13.007
solutions | 372 | 11 | 12.972
bridal | 30 | 0 | 12.937
cambogia | 30 | 0 | 12.937
carpets | 30 | 0 | 12.937
jakarta | 30 | 0 | 12.937
lender | 30 | 0 | 12.937
mattress | 30 | 0 | 12.937
sensible | 60 | 1 | 12.729
contractor | 59 | 1 | 12.520
prospects | 59 | 1 | 12.520
cleanse | 29 | 0 | 12.520
conveniences | 29 | 0 | 12.520
mindful | 29 | 0 | 12.520
neglect | 29 | 0 | 12.520
certainly | 627 | 20 | 12.480
essential | 383 | 12 | 12.327
treatments | 117 | 3 | 12.311
suppliers | 86 | 2 | 12.103
pricing | 57 | 1 | 12.103
clientele | 28 | 0 | 12.103
credibility | 28 | 0 | 12.103
ecommerce | 28 | 0 | 12.103
metacafe | 28 | 0 | 12.103
servicing | 28 | 0 | 12.103
workouts | 28 | 0 | 12.103
consumers | 229 | 7 | 11.998
acquiring | 113 | 3 | 11.894
utilized | 197 | 6 | 11.804
prices | 279 | 9 | 11.685
satisfied | 111 | 3 | 11.685
bargain | 27 | 0 | 11.685
entrepreneurs | 27 | 0 | 11.685
gadget | 27 | 0 | 11.685
louboutin | 27 | 0 | 11.685
referrals | 27 | 0 | 11.685
uld | 27 | 0 | 11.685
opt | 110 | 3 | 11.581
factors | 248 | 8 | 11.546
supplier | 82 | 2 | 11.546
consultation | 53 | 1 | 11.268
prospective | 53 | 1 | 11.268
calorie | 26 | 0 | 11.268
clinically | 26 | 0 | 11.268
employers | 26 | 0 | 11.268
facets | 26 | 0 | 11.268
fibers | 26 | 0 | 11.268
gh | 26 | 0 | 11.268
premiums | 26 | 0 | 11.268
therapies | 26 | 0 | 11.268
wrinkles | 26 | 0 | 11.268
payments | 78 | 2 | 10.990
bargains | 25 | 0 | 10.851
doable | 25 | 0 | 10.851
flats | 25 | 0 | 10.851
fret | 25 | 0 | 10.851
inspection | 25 | 0 | 10.851
particulars | 25 | 0 | 10.851
securities | 25 | 0 | 10.851
uygun | 25 | 0 | 10.851
conserve | 50 | 1 | 10.642
hiring | 99 | 3 | 10.433
wholesale | 49 | 1 | 10.433
affordability | 24 | 0 | 10.433
aluminum | 24 | 0 | 10.433
banyak | 24 | 0 | 10.433
repayment | 24 | 0 | 10.433
travellers | 24 | 0 | 10.433
superb | 73 | 2 | 10.294
aesthetic | 48 | 1 | 10.225
assured | 48 | 1 | 10.225
utilizing | 170 | 6 | 10.195
customer | 337 | 13 | 10.076
benefits | 336 | 13 | 10.046
wellness | 71 | 2 | 10.016
ache | 23 | 0 | 10.016
clinics | 23 | 0 | 10.016
formulated | 23 | 0 | 10.016
gourmandia | 23 | 0 | 10.016
ppc | 23 | 0 | 10.016
refund | 23 | 0 | 10.016
washroom | 23 | 0 | 10.016
employ | 70 | 2 | 9.877
crucial | 234 | 9 | 9.807
shoppers | 46 | 1 | 9.807
effective | 631 | 26 | 9.769
trends | 91 | 3 | 9.599
circumstance | 45 | 1 | 9.599
conveniently | 45 | 1 | 9.599
pricey | 45 | 1 | 9.599
accomplishing | 22 | 0 | 9.599
accountable | 22 | 0 | 9.599
blogging | 22 | 0 | 9.599
evaluating | 22 | 0 | 9.599
movers | 22 | 0 | 9.599
pointers | 22 | 0 | 9.599
purchaser | 22 | 0 | 9.599
safest | 22 | 0 | 9.599
simplicity | 22 | 0 | 9.599
surgeries | 22 | 0 | 9.599
adequate | 67 | 2 | 9.459
profits | 67 | 2 | 9.459
luxury | 112 | 4 | 9.432
lawyer | 89 | 3 | 9.390
delhi | 44 | 1 | 9.390
customized | 87 | 3 | 9.181
compromise | 21 | 0 | 9.181
credible | 21 | 0 | 9.181
dealership | 21 | 0 | 9.181
dubai | 21 | 0 | 9.181
evaluations | 21 | 0 | 9.181
ght | 21 | 0 | 9.181
greens | 21 | 0 | 9.181
mereka | 21 | 0 | 9.181
reservation | 21 | 0 | 9.181
employing | 86 | 3 | 9.077
optimum | 42 | 1 | 8.973
appliances | 42 | 1 | 8.973
locating | 42 | 1 | 8.973
mortgage | 42 | 1 | 8.973
undertaking | 42 | 1 | 8.973
examine | 127 | 5 | 8.903
solution | 339 | 15 | 8.868
inexpensive | 84 | 3 | 8.868
rates | 211 | 9 | 8.847
achievable | 41 | 1 | 8.764
bathrooms | 41 | 1 | 8.764
equipments | 41 | 1 | 8.764
dosage | 20 | 0 | 8.764
angeljackets | 20 | 0 | 8.764
cravings | 20 | 0 | 8.764
hardwood | 20 | 0 | 8.764
liable | 20 | 0 | 8.764
roi | 20 | 0 | 8.764
services | 958 | 45 | 8.700
price | 519 | 24 | 8.680
nowadays | 144 | 6 | 8.645
moreover | 123 | 5 | 8.625
costly | 102 | 4 | 8.597
marketing | 349 | 16 | 8.592
bucks | 40 | 1 | 8.555
wh | 40 | 1 | 8.555
investments | 60 | 2 | 8.486
manufacturers | 100 | 4 | 8.430
therapy | 100 | 4 | 8.430
budget | 221 | 10 | 8.422
offer | 859 | 42 | 8.347
beneficial | 159 | 7 | 8.347
calories | 59 | 2 | 8.347
workout | 59 | 2 | 8.347
coupons | 39 | 1 | 8.347
filing | 39 | 1 | 8.347
purchasers | 39 | 1 | 8.347
antioxidants | 19 | 0 | 8.347
appliance | 19 | 0 | 8.347
avenues | 19 | 0 | 8.347
executing | 19 | 0 | 8.347
facet | 19 | 0 | 8.347
improper | 19 | 0 | 8.347
likelihood | 19 | 0 | 8.347
misplaced | 19 | 0 | 8.347
permitting | 19 | 0 | 8.347
pests | 19 | 0 | 8.347
qualification | 19 | 0 | 8.347
relocating | 19 | 0 | 8.347
secara | 19 | 0 | 8.347
stimulate | 19 | 0 | 8.347
undesirable | 19 | 0 | 8.347
unt | 19 | 0 | 8.347
viscose | 19 | 0 | 8.347
vuitton | 19 | 0 | 8.347
wagering | 19 | 0 | 8.347
Highest non-spam words
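Word | # Spam | # Good | Factor
infobox | 0 | 183 | 440.899
wikitable | 0 | 80 | 194.092
redlink | 0 | 51 | 124.602
colspan | 2 | 131 | 105.432
pokemon | 0 | 41 | 100.640
imagewidth | 0 | 39 | 95.848
reflist | 0 | 38 | 93.451
roblox | 0 | 36 | 88.659
roleplay | 0 | 35 | 86.263
trivia | 3 | 136 | 82.070
hates | 0 | 29 | 71.886
minecraft | 2 | 86 | 69.490
nocookie | 1 | 57 | 69.490
playable | 0 | 27 | 67.093
rpg | 1 | 53 | 64.697
buttonlabel | 0 | 26 | 64.697
sortable | 0 | 26 | 64.697
admins | 1 | 52 | 63.499
fanfiction | 0 | 25 | 62.301
fanon | 0 | 25 | 62.301
pok | 0 | 25 | 62.301
protagonist | 1 | 49 | 59.905
cameo | 0 | 24 | 59.905
rowspan | 0 | 24 | 59.905
alice | 0 | 23 | 57.509
battlefield | 0 | 22 | 55.112
contestants | 0 | 22 | 55.112
trolling | 0 | 20 | 50.320
tba | 1 | 39 | 47.924
imagecaption | 0 | 19 | 47.924
songwriter | 0 | 19 | 47.924
youtuber | 0 | 19 | 47.924
wikis | 1 | 37 | 45.528
accessdate | 0 | 18 | 45.528
antagonist | 0 | 18 | 45.528
parody | 0 | 18 | 45.528
pxsolidrgb | 0 | 18 | 45.528
sings | 0 | 18 | 45.528
superhuman | 0 | 18 | 45.528
woke | 0 | 18 | 45.528
murdered | 0 | 17 | 43.131
troll | 0 | 17 | 43.131
vocals | 0 | 17 | 43.131
px | 19 | 349 | 41.933
beast | 1 | 34 | 41.933
aliases | 0 | 16 | 40.735
aunt | 0 | 16 | 40.735
discography | 0 | 16 | 40.735
emma | 0 | 16 | 40.735
insulting | 0 | 16 | 40.735
jquery | 0 | 16 | 40.735
katie | 0 | 16 | 40.735
reunited | 0 | 16 | 40.735
xd | 0 | 16 | 40.735
clan | 2 | 49 | 39.937
cite | 6 | 114 | 39.366
notoc | 3 | 63 | 38.339
nickname | 2 | 47 | 38.339
escaped | 1 | 31 | 38.339
radius | 1 | 31 | 38.339
ashley | 0 | 15 | 38.339
hedgehog | 0 | 15 | 38.339
mars | 0 | 15 | 38.339
moz | 0 | 15 | 38.339
nickelodeon | 0 | 15 | 38.339
profanity | 0 | 15 | 38.339
sage | 0 | 15 | 38.339
skeleton | 0 | 15 | 38.339
tablesorter | 0 | 15 | 38.339
noeditsection | 1 | 30 | 37.141
pronounced | 0 | 14 | 35.943
airdate | 0 | 14 | 35.943
arrows | 0 | 14 | 35.943
ascending | 0 | 14 | 35.943
birthplace | 0 | 14 | 35.943
confronts | 0 | 14 | 35.943
coppa | 0 | 14 | 35.943
demonic | 0 | 14 | 35.943
dennis | 0 | 14 | 35.943
eats | 0 | 14 | 35.943
embargo | 0 | 14 | 35.943
galactic | 0 | 14 | 35.943
headersort | 0 | 14 | 35.943
horas | 0 | 14 | 35.943
hunted | 0 | 14 | 35.943
lia | 0 | 14 | 35.943
mlp | 0 | 14 | 35.943
reborn | 0 | 14 | 35.943
unnamed | 0 | 14 | 35.943
yells | 0 | 14 | 35.943
aired | 2 | 43 | 35.144
rgb | 6 | 101 | 34.916
bgcolor | 1 | 28 | 34.745
premiered | 1 | 28 | 34.745
episodes | 7 | 114 | 34.445
allies | 2 | 41 | 33.547
encyclopedia | 1 | 27 | 33.547
cutie | 0 | 13 | 33.547
tina | 0 | 13 | 33.547
anthem | 0 | 13 | 33.547
elves | 0 | 13 | 33.547
keith | 0 | 13 | 33.547
meme | 0 | 13 | 33.547
minecraftforum | 0 | 13 | 33.547
moderator | 0 | 13 | 33.547
recieved | 0 | 13 | 33.547
sid | 0 | 13 | 33.547
stabbed | 0 | 13 | 33.547
successor | 0 | 13 | 33.547
underworld | 0 | 13 | 33.547
squad | 1 | 26 | 32.349
oficial | 0 | 12 | 31.150
verde | 0 | 12 | 31.150
angered | 0 | 12 | 31.150
animator | 0 | 12 | 31.150
avatars | 0 | 12 | 31.150
bruno | 0 | 12 | 31.150
cannons | 0 | 12 | 31.150
collapsible | 0 | 12 | 31.150
cries | 0 | 12 | 31.150
defaultsort | 0 | 12 | 31.150
didnt | 0 | 12 | 31.150
eddie | 0 | 12 | 31.150
flashback | 0 | 12 | 31.150
kenny | 0 | 12 | 31.150
kirby | 0 | 12 | 31.150
kisses | 0 | 12 | 31.150
luigi | 0 | 12 | 31.150
personaje | 0 | 12 | 31.150
robinson | 0 | 12 | 31.150
roleplaying | 0 | 12 | 31.150
rollbacks | 0 | 12 | 31.150
sniper | 0 | 12 | 31.150
spaceship | 0 | 12 | 31.150
usuarios | 0 | 12 | 31.150
varios | 0 | 12 | 31.150
yuki | 0 | 12 | 31.150
manga | 2 | 37 | 30.352
grandmother | 1 | 24 | 29.952
col | 10 | 135 | 29.626
nationality | 2 | 36 | 29.553
png | 18 | 232 | 29.385
sword | 5 | 72 | 29.154
enemies | 6 | 84 | 29.097
spawn | 1 | 23 | 28.754
ancestor | 0 | 11 | 28.754
apperance | 0 | 11 | 28.754
bennett | 0 | 11 | 28.754
bulbapedia | 0 | 11 | 28.754
diana | 0 | 11 | 28.754
dq | 0 | 11 | 28.754
drake | 0 | 11 | 28.754
estilo | 0 | 11 | 28.754
flirting | 0 | 11 | 28.754
hacia | 0 | 11 | 28.754
hides | 0 | 11 | 28.754
hinted | 0 | 11 | 28.754
homeworld | 0 | 11 | 28.754
jessie | 0 | 11 | 28.754
launcher | 0 | 11 | 28.754
moderators | 0 | 11 | 28.754
naive | 0 | 11 | 28.754
numbered | 0 | 11 | 28.754
offical | 0 | 11 | 28.754
ooc | 0 | 11 | 28.754
paperback | 0 | 11 | 28.754
pikachu | 0 | 11 | 28.754
preceded | 0 | 11 | 28.754
rachel | 0 | 11 | 28.754
reaper | 0 | 11 | 28.754
rebellious | 0 | 11 | 28.754
reverting | 0 | 11 | 28.754
sega | 0 | 11 | 28.754
shinobi | 0 | 11 | 28.754
sitcom | 0 | 11 | 28.754
sourse | 0 | 11 | 28.754
spongebob | 0 | 11 | 28.754
stealth | 0 | 11 | 28.754
steamcommunity | 0 | 11 | 28.754
svg | 0 | 11 | 28.754
toccolours | 0 | 11 | 28.754
unrightful | 0 | 11 | 28.754
werewolf | 0 | 11 | 28.754
debut | 5 | 70 | 28.355
cb | 4 | 58 | 28.275
serif | 3 | 45 | 27.556
amy | 1 | 22 | 27.556
axe | 1 | 22 | 27.556
backstory | 1 | 22 | 27.556
clans | 1 | 22 | 27.556
fighters | 1 | 22 | 27.556
grupo | 1 | 22 | 27.556
gameplay | 5 | 67 | 27.157
defeated | 2 | 33 | 27.157
mods | 2 | 32 | 26.358
df | 0 | 10 | 26.358
tras | 0 | 10 | 26.358
acronyms | 0 | 10 | 26.358
amigos | 0 | 10 | 26.358
andreas | 0 | 10 | 26.358
aparece | 0 | 10 | 26.358
apologize | 0 | 10 | 26.358
apologizes | 0 | 10 | 26.358
awakening | 0 | 10 | 26.358
awakens | 0 | 10 | 26.358
bvm | 0 | 10 | 26.358
chapman | 0 | 10 | 26.358
contestant | 0 | 10 | 26.358
debe | 0 | 10 | 26.358
entonces | 0 | 10 | 26.358
espacio | 0 | 10 | 26.358
filmography | 0 | 10 | 26.358
foes | 0 | 10 | 26.358
harrassing | 0 | 10 | 26.358
hes | 0 | 10 | 26.358
immortality | 0 | 10 | 26.358
infantry | 0 | 10 | 26.358
irc | 0 | 10 | 26.358
jefferson | 0 | 10 | 26.358
kai | 0 | 10 | 26.358
knocks | 0 | 10 | 26.358
ltima | 0 | 10 | 26.358
mediawiki | 0 | 10 | 26.358
nether | 0 | 10 | 26.358
nowysiwyg | 0 | 10 | 26.358
obsidian | 0 | 10 | 26.358
personajes | 0 | 10 | 26.358
prompting | 0 | 10 | 26.358
raids | 0 | 10 | 26.358
realms | 0 | 10 | 26.358
relocated | 0 | 10 | 26.358
reunite | 0 | 10 | 26.358
shyp | 0 | 10 | 26.358
sociales | 0 | 10 | 26.358
starship | 0 | 10 | 26.358
starwars | 0 | 10 | 26.358
vevent | 0 | 10 | 26.358
webcomic | 0 | 10 | 26.358
zh | 0 | 10 | 26.358
anime | 6 | 73 | 25.331
exceptions | 1 | 20 | 25.160
ranged | 1 | 20 | 25.160
scream | 1 | 20 | 25.160
wanna | 1 | 20 | 25.160
died | 6 | 72 | 24.989
ep | 2 | 30 | 24.761
dragon | 8 | 91 | 24.494
brother | 13 | 140 | 24.133
armor | 5 | 59 | 23.962
soldiers | 2 | 29 | 23.962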