Forum:Directives for a QVFD bot
Okay, now I'm thinking about the next step in JavaScript bot engineering (did you notice the bump on my cheek?). A QVFD bot seems relatively easy to make. But what should its criteria be for adding stuff to QVFD?
- Size: one-liners will always be reported. Let's establish a minimum size in bytes.
- Words: we can make it look for a list of typical 12-year-old dumbarsery: Lol, Rox, Poop, OMFG, etc. This could be combined with another condition: articles above some size or from registered users could be ignored.
- Repetition: a little harder to implement; it could look for repeated sentences, typical IP stupidity.
- Google whacking: to find vanity pages, it could search Google for the article's title to see whether it returns at least one entry. Again, this test is only for IP entries.
- Google whacking 2: to find random keyboard dumping, it could also search for the article's first word to check whether it's a real word.
Notice that all of the above are just initial suggestions. Let's discuss. -- herr doktor needsAshuttle [scream!] 16:26, 24 May 2007 (UTC)
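The repetition criterion is the least obvious of these to implement. A minimal sketch, assuming the bot already has the page's wikitext as a string; the function name and the default threshold of three repeats are illustrative choices, not part of the proposal:

```javascript
// Hypothetical helper for the "repetition" criterion: returns true when
// any sentence occurs at least `minRepeats` times in the text.
function hasRepeatedSentences(text, minRepeats = 3) {
  // Naive sentence split on runs of ".", "!", "?" or newlines.
  const sentences = text
    .split(/[.!?\n]+/)
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);

  // Count each normalized sentence and stop as soon as one hits the threshold.
  const counts = new Map();
  for (const s of sentences) {
    const n = (counts.get(s) || 0) + 1;
    if (n >= minRepeats) return true;
    counts.set(s, n);
  }
  return false;
}

console.log(hasRepeatedSentences("I like poop. I like poop. I like poop.")); // true
```

Splitting on punctuation is crude (abbreviations like "e.g." will split), but for catching copy-pasted IP spam it is probably enough. A threshold of 2 would also flag a single deliberate repeat, which is why the sketch defaults to 3.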
- Cool idea. As usual I am going to spend a while devil-advocating to throw up problems for you to solve.
- I assume it'll work from New Pages and only check things once, so we don't need to worry about existing articles, redirects, or things that have been vandalised? There are quite a few things already out there fitting most of those descriptions. (Okay, not so much a problem, that one.)
- Dunno about the size thing. As a Mistress of Short Pages (self-titled) I've observed that we get a fair spread of rubbish across all sizes. Maybe the admins and New Pages watchers could tell if there's a size limit on the stuff they come across before it gets passed to Short Pages, I dunno. But the other problem with this is: what about pages with Template:construction? And will the bot wait a decent amount of time to see if the page is still being edited before reporting it? (Because people never seem to work out how to use the preview button...)
- I've not seen repeating sentences much, but I'll take your word for it. I've not seen repeating sentences much, but I'll take your word for it.
- I don't know how much use title-Googling would be. Some great articles are titled by whole phrases that won't come up in Google, or single-word titles can get misspelled. And Google searches so many pages that most things are gonna get at least a result or two - even some complete gibberish. This would also be a problem for part 2, along with flagging good gibberish like PHNURR!
- (I do like the idea, honest!) On a semi-related note, something that searched around generally for fairly unaltered Wikipedia sporks would be good. --Whhhy?Whut?How? *Back from the dead* 19:18, 24 May 2007 (UTC)
- One problem you'll have to find a way around is that when admins create CVPs, they'll be added to QVFD. CVPs are only 7 bytes. Also disambigs shouldn't be added. That might be a problem. Sir Cs1987 UOTM. t. c 01:01, 25 May 2007 (UTC)
- Maybe the better option is simply restricting this software to 'non-trusted' users, that is, IPs and non-welcomed n00bs. As an option, I also thought about making it totally separate from QVFD: it could create a log page where admins could view its list of suspicious articles with checkboxes to auto-delete them. They would do this with their own accounts and their own rights. -- herr doktor needsAshuttle [scream!] 01:59, 25 May 2007 (UTC)
- I think that's a good idea, although I'm sure the page nazis would not like us having another extra page. I reckon pages that are less than 1500 bytes or more than about 40 000 bytes should be added. Let's just see if the admins like it. Sir Cs1987 UOTM. t. c 02:06, 25 May 2007 (UTC)
- I think we should include welcomed n00bs. They can be evil too. Sir Cs1987 UOTM. t. c 03:36, 25 May 2007 (UTC)
- It should also check for double redirects. Marshal Uncyclopedian! Talk to me! 03:37, 25 May 2007 (UTC)
- It should not check userspace. Marshal Uncyclopedian! Talk to me! 03:52, 25 May 2007 (UTC)
- No article should ever be marked by the bot twice. If an article gets marked but survives QVFD, it should be safe henceforth. --Sir gwax (talk) 18:14, 25 May 2007 (UTC)
- I like the idea. I think an easy way to get past construction and CVP would be to just make it avoid articles with "{{construction}}" or "{{CVP}}". As for the problem with people who don't use preview, it's no big deal, as these pages would be reviewed by a person before they were deleted.
- One thing you do need to think about is the people who get their kicks adding stuff to QVFD, I'd get their opinion on this before you go any further. t o m p k i n s blah. ﺞوﻦ וףה ՃՄ ண்ஸ ފއހ วอฏม +տ trade websites 00:33, 26 May 2007 (UTC)
- I thought about this subject. If it works really smoothly, people may feel discouraged from adding stuff to QVFD. However, it's worth noting that it's not (and is not intended to be) a perfect tool: it can report shorties, interlingua, and other obvious forms of dumbarsery, but not all the stupidity that only the human mind can come up with. -- herr doktor needsAshuttle [scream!] 00:40, 26 May 2007 (UTC)
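The template exemption suggested above (skip anything carrying {{construction}}, {{CVP}} or a disambig tag) can be sketched as a regex test on the wikitext. This assumes the wiki uses MediaWiki's default behaviour of treating only the first letter of a template name case-insensitively; the template names are taken from the thread and may differ on the wiki, and the helper names are hypothetical:

```javascript
// Template names to skip, taken from the thread (construction, CVP,
// disambig). The exact names used on the wiki are assumptions.
const SKIP_TEMPLATES = ["construction", "CVP", "disambig"];

// Escape characters that are special inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if the wikitext transcludes one of the templates, e.g.
// {{construction}}, {{ Construction }} or {{CVP|some reason}}.
function hasSkipTemplate(wikitext, names = SKIP_TEMPLATES) {
  return names.some((name) => {
    // With default settings MediaWiki ignores case only in the first letter.
    const first =
      escapeRegExp(name[0].toLowerCase()) + "|" + escapeRegExp(name[0].toUpperCase());
    const pattern = new RegExp(
      "\\{\\{\\s*(?:" + first + ")" + escapeRegExp(name.slice(1)) + "\\s*(?:\\||\\}\\})"
    );
    return pattern.test(wikitext);
  });
}

console.log(hasSkipTemplate("{{construction}}\nWork in progress")); // true
```

It does not treat spaces and underscores in template names as equivalent, nor does it follow template redirects, so a tag reached through a redirect would be missed.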
it'd fuck up way too often.
have it check for anything under about 300-500 chars.
ONX 00:42, 26 May 2007 (UTC)
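Note that this suggestion counts characters while the summary below counts bytes. Special:NewPages reports sizes in bytes of UTF-8 wikitext, so for plain English the two match, but for non-Latin interlingua they diverge: 300 characters of CJK text are 900 bytes. A small sketch to compare the two (function names are hypothetical):

```javascript
// Page size as MediaWiki reports it: bytes of UTF-8 encoded wikitext.
function sizeInBytes(text) {
  return new TextEncoder().encode(text).length;
}

// Size as a human would count it: characters (Unicode code points,
// not UTF-16 code units).
function sizeInChars(text) {
  return Array.from(text).length;
}

console.log(sizeInBytes("日本"), sizeInChars("日本")); // 6 2
```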
Summarizing
So, that's it:
- Check Special:NewPages for the last XX entries (number set by the operator) (although ignore the first ones per delay instruction below... --SbU);
- Repeat the check every XX seconds (set by the operator);
- Make a log subpage to avoid repeating checks (by page ID, not name, since vandals tend to recreate their shit);
- Ignore:
- User space/page/talk entries;
- Veteran entries: exempt any registered user with more than 50 contributions;
- Articles with Construction tag;
- Articles with ICU tag;
- Articles with disambig tag; --SbU
- Articles that have been edited in the last 3(?) hours. --SbU
- Report to QVFD:
- Any article shorter than 800 bytes (was 1k; changed to 350 --SbU; changed to 800 --Rataube);
- Any article between 1k and 6k without links; (struck --SbU)
- Any article in all caps (What about AAAAAA and others like it? Vogons? Just a thought. -ONX);
- Any article with curse expressions listed in a special editable subpage of its user space. Suggestion: interlingua, admins' names, etc.
- Place a small comment after the reported article, e.g.: "(bot:<350b)"
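A rough sketch of how the ignore and report rules above could fit together in the bot. Everything here is an assumption for illustration: the shape of the page object (namespace number, wikitext, creator's registration and edit count, last-edit time), the case-insensitive tag regex, and single-word matching against the editable word list. It returns the comment to append, or null to leave the page alone:

```javascript
// Sketch of the summarized rules. The page object shape, tag names and
// thresholds are assumptions for illustration, not an existing API.
const USER_NAMESPACES = new Set([2, 3]); // MediaWiki: 2 = User, 3 = User talk
// Simplification: the whole tag name is matched case-insensitively.
const SKIP_TAGS = /\{\{\s*(?:construction|icu|disambig)\s*(?:\||\}\})/i;
const MAX_BYTES = 800;               // current figure in the summary
const VETERAN_EDITS = 50;            // "more than 50 contributions"
const QUIET_MS = 3 * 60 * 60 * 1000; // "edited in the last 3(?) hours"

// Returns a QVFD comment such as "(bot:<800b)", or null.
function classify(page, badWords, now = Date.now()) {
  // Ignore rules
  if (USER_NAMESPACES.has(page.namespace)) return null;
  if (page.creator.registered && page.creator.editCount > VETERAN_EDITS) return null;
  if (SKIP_TAGS.test(page.text)) return null;
  if (now - page.lastEditedAt < QUIET_MS) return null;

  // Report rules; the first one that matches supplies the comment
  const bytes = new TextEncoder().encode(page.text).length;
  if (bytes < MAX_BYTES) return `(bot:<${MAX_BYTES}b)`;

  const letters = page.text.replace(/[^A-Za-z]/g, "");
  if (letters.length > 0 && letters === letters.toUpperCase()) return "(bot:caps)";

  // Single-word matching against the editable word list
  const words = new Set(page.text.toLowerCase().split(/[^a-z0-9]+/));
  const hit = badWords.find((w) => words.has(w.toLowerCase()));
  if (hit) return `(bot:word ${hit})`;

  return null;
}

const page = {
  namespace: 0,
  title: "Poop",
  text: "lol poop OMFG",
  creator: { registered: false, editCount: 0 },
  lastEditedAt: Date.now() - 4 * 60 * 60 * 1000,
};
const comment = classify(page, ["lol", "omfg"]);
if (comment) console.log(`* [[${page.title}]] ${comment}`); // * [[Poop]] (bot:<800b)
```

Printing the entry assumes QVFD items are plain bulleted links, which may not match the page's actual layout. As ONX points out, the all-caps rule will also catch AAAAAA-style articles.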
Option:
- Instead of or in addition to QVFD, it could have a log page with conditional auto-deleting links for admins. This would be for version 2.0, though.
-- herr doktor needsAshuttle [scream!] 00:56, 26 May 2007 (UTC)
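The first three steps of the summary (poll the last XX entries of Special:NewPages, repeat every XX seconds, log by page ID) could look roughly like this. The entry shape {pageid, title, lastEdit} and the fetchEntries/report callbacks are hypothetical stand-ins for however the bot actually reads Special:NewPages and writes to QVFD; an in-memory Set plays the role of the log subpage:

```javascript
// Sketch of one polling pass. `entries` stands for the last XX rows of
// Special:NewPages as {pageid, title, lastEdit} (an assumed shape), and
// `seenIds` stands for the log subpage, keyed by page ID.
function selectForChecking(entries, seenIds, delayMs, now = Date.now()) {
  const ready = [];
  for (const e of entries) {
    if (seenIds.has(e.pageid)) continue;      // never check a page twice
    if (now - e.lastEdit < delayMs) continue; // still being edited; retry next pass
    seenIds.add(e.pageid);
    ready.push(e);
  }
  return ready;
}

// Repeat the pass every `intervalSec` seconds (set by the operator).
// `fetchEntries` and `report` are hypothetical callbacks.
function startPolling(fetchEntries, report, intervalSec, delayMs) {
  const seenIds = new Set();
  return setInterval(async () => {
    try {
      const batch = await fetchEntries();
      for (const e of selectForChecking(batch, seenIds, delayMs)) report(e);
    } catch (err) {
      console.error("poll failed:", err);
    }
  }, intervalSec * 1000);
}
```

Logging by ID rather than title matters because a deleted and recreated page gets a new page ID in MediaWiki, so a vandal's recreation is checked afresh. One catch with the delay rule: a page that is still too fresh when it drops out of the last-XX window is never seen again, which argues for checking back to the last run instead of a fixed window.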
Comments on the proposal
- I wouldn't want any qvfd'ing based on size alone... Some of those are actually all right. And it would be best to only act on articles that haven't been edited in a certain amount of time (say, a few hours) in order to not qvfd articles someone has only just created. • Spang • ☃ • talk • 01:58, 26 May 2007
- 1k is much too large. t o m p k i n s blah. ﺞوﻦ וףה ՃՄ ண்ஸ ފއހ วอฏม +տ trade websites 02:38, 26 May 2007 (UTC)
- Ignore articles with disambig template. And scrap the "without links" one - these pages probably just need a deadend tag not QVFD (n00bs are useless at this). Thinking about it some more, I'd suggest that the limit where QVFD ends and ICU begins is about 350 bytes, give or take, although of course there are plenty of counter-examples either side. I do like the registered user with 50 edits idea - that's not a bad guide - I think, Spang, that this should help ignore most actual good tiny articles. --Whhhy?Whut?How? *Back from the dead* 06:57, 26 May 2007 (UTC)
- You all may edit the guidelines above. I'll start just when we have an agreement. -- herr doktor needsAshuttle [scream!] 07:50, 26 May 2007 (UTC)
- (Edited in line with what's been said so far, and tagged what I've changed.) I'm not very well up on how bots work, so forgive me if I'm a bit n00bish. Will this have to be specifically run by users? And will it report to a single page anyone can view and change? If so, could it add dates to entries and, instead of checking the last XX entries and checking for duplicates, check back to the last time it was run? --Whhhy?Whut?How? *Back from the dead* 12:44, 26 May 2007 (UTC)
- The size issue is a tough one. A recent IRC discussion made me realise that there is a large amount of variation in what people consider to be QVFD worthy articles. I'd say that articles under about 800 bytes are very unlikely to be good, and I would be happy to see them all go into QVFD via a bot, and let the admins decide from there. My earlier suggestion of 1500b was more to do with "suspicious" articles, rather than instaQVFD worthy articles. I do like Spang's last suggestion too.
- Also, don't forget about CVPs. Very important, they are. Sir Cs1987 UOTM. t. c 12:51, 26 May 2007 (UTC)
- Testing CVP/Redirect/Disambig is easy, but these are normally not created by IPs/users with fewer than 50 edits and, when they are, it's almost always vandalism. -- herr doktor needsAshuttle [scream!] 16:15, 26 May 2007 (UTC)
- CVP, true. I'd disagree on Redirect because n00bs spell titles wrong and then move them (and we don't tend to delete redirs for obvious misspellings). Disambig, well, sometimes, and I've never seen a vandal one - the n00bs don't know that we're more lax on these. --Whhhy?Whut?How? *Back from the dead* 17:06, 26 May 2007 (UTC)
- (Rataube changed 350 to 800.) I'm going to disagree, at least if it is a pure QVFD bot. Perhaps it should be an ICU and QVFD bot - then 800 seems reasonable. I just think we shouldn't encourage admins to huff everything under 800 bytes when at the moment the limit is a lot lower. We're hard enough on the n00bs as it is. Well anyway, I think we agree about most things apart from this point, which will presumably be easy enough to change when the bot exists... --Whhhy?Whut?How? *Back from the dead* 23:12, 27 May 2007 (UTC)
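The disagreement over 350 versus 800 bytes is easier to handle if the bot treats them as two tiers, as suggested above: below the lower figure goes to QVFD, between the two goes to ICU. A minimal sketch with both thresholds as named constants (names and values are illustrative, taken from the figures floated in this discussion):

```javascript
// Hypothetical two-tier size rule: under ~350 bytes goes to QVFD,
// 350 up to 800 bytes goes to ICU, anything larger is left alone.
// Both numbers are the ones floated in this thread, not settled values.
const QVFD_MAX_BYTES = 350;
const ICU_MAX_BYTES = 800;

function sizeVerdict(bytes) {
  if (bytes < QVFD_MAX_BYTES) return "QVFD";
  if (bytes < ICU_MAX_BYTES) return "ICU";
  return "none";
}

console.log(sizeVerdict(120), sizeVerdict(500), sizeVerdict(2000)); // QVFD ICU none
```

Keeping the thresholds as constants means changing them later, once the bot exists, is a one-line edit.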
I guess this belongs under a separate heading...
While we're thinking about technology (sorry, I do love it, even if in this case I don't know nearly enough about it) - would it be at all feasible to create a bot that makes a list of pages from which construction and ICU tags have been recently removed? I keep a watch on the ones that start off on short pages, and from that I know that there's loads of people removing the things without having made the article any better. I wouldn't like to change the rules to forbid new users from removing them, but it would be nice to have a list to have a glance through and check they've been taken off appropriately. Anyway, just a thought in case anyone needs a future project... --Whhhy?Whut?How? *Back from the dead* 12:44, 26 May 2007 (UTC)
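A minimal sketch of the check such a bot would run on each edit, assuming it can obtain the wikitext before and after the edit. The function names and tag list are hypothetical, and tag names are inserted into the regex unescaped, so they should stay plain letters:

```javascript
// Hypothetical check for the proposed tag-removal list: given the wikitext
// before and after an edit, return the watched tags that were dropped.
// Matching is a simple case-insensitive regex on {{name}} or {{name|...}}.
const WATCHED_TAGS = ["construction", "ICU"];

function hasTag(text, tag) {
  return new RegExp("\\{\\{\\s*" + tag + "\\s*(?:\\||\\}\\})", "i").test(text);
}

function removedTags(oldText, newText, tags = WATCHED_TAGS) {
  return tags.filter((t) => hasTag(oldText, t) && !hasTag(newText, t));
}

console.log(JSON.stringify(removedTags("{{ICU}}\nMy article", "My article, now slightly longer")));
// ["ICU"]
```

The bot would then add any page with a non-empty result to a review list rather than acting on it, which keeps new users free to remove the tags.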