American Hippodice judging model

32 replies [Last post]
jwarrend
Joined: 08/03/2008

I mentioned in the "Hippodice" thread that a while back, Darkehorse expressed interest in wanting to develop an American equivalent of the Hippodice competition. I'd like to begin some discussion here about how such a contest might be judged, both conceptually and, practically speaking, how exactly we'd pull the thing off.

First, I propose that we not reinvent the wheel, and use Hippodice's structure. Loosely, it works like this:

Round 1: Rulebooks are read, with X selected for playtesting.
Round 2: The X games are playtested, with Y selected for the finals.
Round 3: The Y finalists are playtested by representatives from established publishers, and a winner is chosen.

I'll elaborate a bit: (throughout, I'm assuming we'll get about 75 entries or so to start; this may be way too high, or way too low)

Round 1: I envision that 5 or 6 people could be responsible for reading the rulebooks and selecting, say, 30 or so games for playtesting. We'd have to set up a structure where each rulebook was read by at least 2 people, and the game given a rating, with the best-rated games going on. (We'd have to define whether the "winner" of the overall competition was the game judged to be the "best", or the "most sellable". There is a difference.)

Round 3: This one is also (theoretically) easy to implement. Here in the US, we have Rio Grande, Mayfair, Out of the Box, Days of Wonder, Uberplay, Dancing Eggplant, etc. If each company could provide a rep to help judge, we'd have a very respectable competition right off the bat.

Round 2: This is the tricky bit. It's my understanding that Hippodice does nothing but playtest for 3 months. I don't honestly believe there's a group in the US willing to do that. My thinking, then, is that we need a model whereby games can be judged by a "consortium" of many groups. My initial idea was something like the Boston area's "Unity Games" group, which is a group of 10+ different local game groups. But it wouldn't necessarily have to consist of groups that were geographically linked.

We would need to derive a judging model for such an arrangement. Here's my idea: each group would be responsible for playtesting X games. They would play each game Y times and give it a score. To calibrate, each group would also give scores to several "benchmark" games (Puerto Rico, Acquire, El Grande, etc), so that the groups' scores can be normalized. Probably, we'd also want each game to be playtested by at least 2 different groups, if possible, although that would increase the total number of "playtesting events" that would need to happen.
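To make the benchmark idea concrete, here is a minimal sketch of one way the calibration could work. It assumes each group rates the same benchmark games on the same numeric scale, and it rescales that group's entry scores against its own benchmark ratings. The function name, the scores, and the choice of a z-score style rescaling are all illustrative assumptions, not a settled judging rule.

```python
from statistics import mean, pstdev

def normalize(entry_scores, benchmark_scores):
    """Rescale one group's entry scores relative to how that group rated the benchmarks."""
    ratings = list(benchmark_scores.values())
    center = mean(ratings)
    spread = pstdev(ratings) or 1.0  # guard against a group that rates every benchmark the same
    return {game: (score - center) / spread for game, score in entry_scores.items()}

# Hypothetical ratings: one harsh group, one generous group.
harsh_benchmarks = {"Puerto Rico": 7, "Acquire": 5, "El Grande": 6}
generous_benchmarks = {"Puerto Rico": 10, "Acquire": 8, "El Grande": 9}

harsh = normalize({"Entry A": 6}, harsh_benchmarks)
generous = normalize({"Entry B": 9}, generous_benchmarks)
```

In this made-up example the raw scores differ by three points, but each entry sits exactly at its own group's benchmark average, so both come out level after rescaling.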

Finally, I want to propose an additional refinement that Hippodice doesn't include, namely, the categorization of games. Perhaps it could be as simple as "card games/board games", or perhaps it could be grouped by the scope of the games (e.g., "strategy", "family", "party", etc). The point is that since this contest, like Hippodice, is supposed to bridge the gap between publishers and designers, it might be helpful to have the games broken down by type so that publishers can get a sense of which are the best games of the kind they want to publish. It would also streamline judging, since it's easier to ask a group to compare 5 card games than to compare a card game, 2 tile games, and 2 board games.

These are just my initial thoughts. Please chime in with a judging model that you think would work. Offers to participate in judging are welcome, but remember, we're talking about playtesting 30-40 games here. This is a HUUUUGE commitment, and I really do think it's beyond what one group could handle. We absolutely MUST avoid the judging bailout we had in our last contest, and so I think we want to minimize the work that any one group has to do. That's why I favor a "consortium" model for Round 2, but other models might work.

Let's make this happen!

-Jeff

zaiga
American Hippodice judging model

Jeff,

I think the general outline of what you are proposing here is good. What you want to avoid is getting bogged down in the details of the judging process before you even know for sure if you can get this thing off the ground. I suggest you approach this whole project top-down. Start with the most crucial stuff, then fill in the details.

An American version of Hippodice will stand or fall with the availability of a dedicated group of playtesters. If you cannot find enough people who are willing to invest the time and energy to do the judging, then forget it. This is definitely the most crucial factor. If you can find enough judges, then I think it can be done.

Another point of interest is the final playtesting round by industry people. I'm pretty sure you can find a bunch of industry people who are interested in playtesting a few games, but the trick is getting them all together in the same place at the same time. Perhaps this final round could be arranged in such a way that it takes place during a big gaming convention which those industry people would have attended anyway.

Personally, I'm not too fond of categorizing games. You say that it is not entirely fair to compare card games to board games, but even within those categories there will be discrepancies between the scope of different games. Suppose one card game is on the level of "6 Nimmt", how are you going to compare it to a card game that is more like "Settlers of Catan: the Card Game"? I just don't think categorizing games will be very useful. You will have to judge every game on its own merits. However, now I'm already going into the details of the judging process, which is something I said you should avoid for now, so I'll shut up about it.

Good luck with the whole project. You'll need it!

- René Wiersma

Anonymous
American Hippodice judging model

Jeff, I'm more than willing to marshall what folks I can in HG to assist. We have bi-monthly playtest sessions where members can float their own designs...I don't see any reason why we couldn't also incorporate (or substitute for the period of the competition) playtesting contestants' designs instead.

For what it's worth, I had envisioned the first screening round to be something like this (and maybe this is what you had in mind as well, sometimes communication in pure-text media isn't 100% clear): each of the "selection committee" members would read all of the entries (certainly not a small task unto itself), and, say, select the best 10 from the group. If you want to get elaborate, they could even be ranked top to bottom, but it would probably just be easier to select the best 10. Then, everyone's Top-10 list would be compared, and the games which received the most commonality by appearing on more lists would tend to be the ones to advance to the prototype phase. Of course, the weakness here would be if everyone picked the same 10 (or close to it), and only 10-15 entries actually got votes.
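As a rough illustration of the list-comparison step, the sketch below counts how many committee members' lists each game appears on and advances the most frequently named ones. The ballots, the game names, and the alphabetical tie-break are hypothetical; a real committee would have to decide how ties are handled.

```python
from collections import Counter

def tally_ballots(ballots, slots):
    """Count list appearances per game and return the `slots` most-named games."""
    counts = Counter(game for ballot in ballots for game in set(ballot))
    # Most appearances first; ties broken alphabetically (an arbitrary assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:slots]

# Hypothetical ballots from three committee members (shortened to three picks each).
ballots = [
    ["Game A", "Game B", "Game C"],
    ["Game B", "Game C", "Game D"],
    ["Game C", "Game B", "Game E"],
]
advancing = tally_ballots(ballots, slots=2)
```

Counting the distinct games that received any vote at all would also show quickly whether the weakness mentioned above (everyone picking nearly the same list) has actually occurred.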

This type of structure (or something similar) I think would facilitate having judges in less-than-geographically-similar locations, as it would rely less heavily on a conference-style discussion of which designs to select. Having a geographically diverse selection committee (and playtest group, if it can be organized) would help to eliminate any potential geographical bias in the judging.

I am a little concerned about your suggestion to categorize games. Would you propose to have a "best card game", "best abstract strategy game", "best bidding game", etc., or were you more intending it to be something like "the 10 finalists include 3 card games, 2 bidding games, ...."? I'm not sure that we'd get enough entrants, at least in the early years, to be able to meaningfully segregate the different categories of games. I could, of course, be completely wrong.

Another item we should consider is timing: when should we have it? I would think that summer would probably be best, as time demands from school (for those in school) would be less and that might translate into a larger judging/playtesting pool. On the other hand, it might conflict with family vacations. Further, since Hippodice has theirs in the winter, it would make sense for us to stagger "Rhinopawn" (or whatever we choose to call it...I'm just going to use my pun-inspired name until an alternative is derived) to the summer. Thoughts?

I will definitely lend my support (and recruit heavily in HG) in getting this ball rolling!

jwarrend
Joined: 08/03/2008
American Hippodice judging model

Guys,

Thanks for your replies. A couple of clarifications:

First, I agree with Rene that fine-tuning the exact rules of the competition is indeed putting the cart before the horse. What I'm hoping to get at with this discussion is more a conceptual idea of how we could fairly judge a bunch of games when the judging is being done by several small groups and no one group will look at all of the games (or even a majority of the games). And also, to try to identify groups that could possibly handle some (or all) of the judging.

Mike, I definitely appreciate the offer to use the Houston Area gamers group, and I hope to talk to you off-line about how we might pull it off. Again, we are talking about a LOT of games here. How big is the group?

Finally, with respect to categorization: this is not something I insist on, it just seems like it might be helpful to the publishers to have some level of distinction between the games so they can "filter out" the games that would be most of interest to them. Maybe it would be just as valid to categorize by "price point"? E.g., games that would probably sell for under $20, vs those that would sell for more than $20? Again, just trying to think of a possible scheme for making the results more useful to publishers who might want to pick up these games. My other thought with this would be that it might enable separation of the games with respect to judging, so one group could do "Category X" and another could do "Category Y", and you eliminate the problem of calibrating the two groups but still get all the games evaluated fairly.

With that in mind, it's also important to be clear on what the contest would reward: the games that are "best", or those that are "most marketable"? Or some amalgamation of the two? (E.g., "Dallas-opoly" shouldn't be able to win the contest, but neither should "The Authoritative Game of Soap Manufacturing", which may be an absolute gem of a game but wouldn't sell a single copy...)

Thanks for the input. Others, keep it coming!

-Jeff

Anonymous
American Hippodice judging model

jwarrend wrote:

Mike, I definitely appreciate the offer to use the Houston Area gamers group, and I hope to talk to you off-line about how we might pull it off. Again, we are talking about a LOT of games here. How big is the group?

On our regular gaming days we usually get anywhere between 12 to 24 or so, which is a subset of the roughly 30 or so active members (I'm shooting from the hip after a long night of work and a short night of sleep, but I think that's a good ball-park). The playtest groups are typically much smaller, say 4-6.
jwarrend wrote:

Finally, with respect to categorization: this is not something I insist on, it just seems like it might be helpful to the publishers to have some level of distinction between the games so they can "filter out" the games that would be most of interest to them. Maybe it would be just as valid to categorize by "price point"?

Here's a wild idea: let's ask the publishers what they might want to help filter out the games that would be of most interest? I think after we approach the companies you listed earlier to see if there is any interest (and if so, who would be stepping forward), we can then ask them what they would find valuable coming out of the competition. Even if a company isn't willing/able to participate in the judging, getting their input regarding what deliverable would be most valuable would help lend credibility to the contest.

SVan
Joined: 10/02/2008
American Hippodice judging model

This is a great idea and I hope we can get it to work. I've not been able to contribute as much as I would like to the site, but I believe I may have another group to playtest games for this. We have 8 to 10 people coming each week, and play for several hours, so we could probably get a bunch of games in on a given Saturday.

If this gets going forward, I'll bring it up in front of my group. Even though I haven't played any of my own games with them (mostly for lack of a finished proto) I think my group would love to be the group that helped playtest games that may be the next big hit.

Any idea of how this would be presented to the American companies?

-Steve

Edit: Also, how about the Gamefest crew? They don't make games specifically, but they sure have a big presence here in the U.S. with their reviews, etc.

jwarrend
Joined: 08/03/2008
American Hippodice judging model

Again, the offers to help with the playtesting are very generous and tremendously helpful, but they are reinforcing my conjecture that no one group can handle the full load. So, what we really need at this point is, I think, a model by which we can judge the games fairly given that not every group will play every game. It doesn't have to be fully fleshed out, but just a rough idea of how we're going to judge this contest fairly. Having that, we can push ahead recruiting groups (or still hold out hope that one big group might emerge) and publishers for the "final round", then start talking about dates. I'd love to have this in 2005 if possible, but it might be too soon. It might be nice to have submissions due by, say, April or May to run opposite the Hippodice's late fall entry deadline. But that might be too soon for this time around.

I have a note off to Alan Moon inquiring as to whether SAZ (the German game designer's organization) would be interested in getting involved. I'll let you know what he says!

-Jeff

Anonymous
American Hippodice judging model

This is a great idea, and I like the name Rhinopawn. I also like timing it opposite Hippodice. However, I would have the entries for Round 1 due March 31, to avoid playtesting in August.

I would be willing to spearhead a group from Portland Oregon. I belong to a couple of play groups. One has 12-16 people on average (meeting every couple of weeks) and the other has 6-8 people (meeting once a week). I would want to get group buy-in, but they are fairly open to new games.

I like the breakdown of rounds.

How would I recommend doing round 1? Require that the rules/description/whatever be submitted in digital form. Post them to a private forum. Have the committee members vote on their favorite 10 or 25.

How would I recommend doing round 2? Let's say that we had 75 games go from round 1 to round 2. Then I would get 6 groups to playtest 25 games each. This would allow for each game to be playtested by 2 groups. I would suggest that each group play the game at least 2 or 3 times. I don't think there is a need to have the scores normalized across play groups. After all, within a given playgroup not everyone will necessarily play a given game. We could also look at the standard deviation for scores on a given game. If the S.D. is high for a given game, we could have another group also evaluate it.
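A small sketch of both points in the paragraph above: the slot arithmetic, and the standard-deviation check. It assumes each group reports one overall score per game and that a cutoff has been agreed in advance; the scores and the threshold of 2.0 are made up for illustration.

```python
from statistics import pstdev

# Slot arithmetic: 6 groups x 25 games each gives 150 judging slots,
# which covers 75 games twice.
GROUPS, GAMES_PER_GROUP, ENTRIES = 6, 25, 75
groups_per_game = (GROUPS * GAMES_PER_GROUP) // ENTRIES

def needs_another_group(scores_by_game, threshold):
    """Return the games whose group scores disagree by more than `threshold` (population S.D.)."""
    return sorted(
        game for game, scores in scores_by_game.items()
        if pstdev(scores) > threshold
    )

# Hypothetical scores from the two groups that judged each game.
scores = {"Entry A": [7, 8], "Entry B": [3, 9]}
flagged = needs_another_group(scores, threshold=2.0)
```

Here "Entry B" would be sent to a third group, since its two scores are six points apart, while the two groups roughly agree on "Entry A".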

Another detail not to forget about is the logistics of getting the game to each play group. We would probably have to charge more than 5 Euros if we are shipping the game two or three times.

Jonathan

jwarrend
Joined: 08/03/2008
American Hippodice judging model

jhager wrote:

How would I recommend doing round 2? Let's say that we had 75 games go from round 1 to round 2.

I think we can probably only realistically pass 25-35 games into playtesting for this trial, but ok, for the sake of argument...

Quote:

Then I would get 6 groups to play test 25 games each.

We should probably standardize our use of terms here. Let's define a "group" to be 5 people; e.g., a group that can playtest one game of 3-5 players, or two two-player games in one "unit of time". Let's define one "unit of time" to be 75-90 minutes, long enough to play one "full-length" game or two "short" games.

So, to judge a "standard game" requires 2-3 units of time for one group. In that time, the group could instead judge two "short" games, or two "standard-length, two-player" games, or 4 "short, two player" games. That's about what can be judged in a 3 hour block of time.

So the question becomes, what seems like a manageable load to ask a given "group" to commit to? How many hours, e.g.? That will give us a way to estimate how many games a group could judge, and thus, once we have committed groups, we'd know how many games could pass to the 2nd round.
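As a back-of-the-envelope version of that estimate, the sketch below treats one three-hour block as enough for one group to judge one standard game, as described above, and assumes each game is judged by two groups. The hours committed per group are invented numbers; the point is only how the commitment translates into a Round 2 field size.

```python
SESSION_HOURS = 3  # one standard game judged per three-hour block, per the definitions above

def round_two_capacity(hours_per_group, groups_per_game=2):
    """Estimate how many standard games can enter Round 2 given each group's committed hours."""
    judging_slots = sum(hours // SESSION_HOURS for hours in hours_per_group)
    return judging_slots // groups_per_game

# Hypothetical: six groups, each committing 30 hours over the judging period.
capacity = round_two_capacity([30, 30, 30, 30, 30, 30])
```

With those assumed commitments the field works out to 30 games, which is in the 25-35 range mentioned earlier; halving the hours halves the field.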

Quote:

Another detail not to forget about is the logistics of getting the game to each play group. We would probably have to charge more than 5 Euros if we are shipping the game two or three times.

Absolutely, this will cost considerably more than Hippodice -- maybe something like $25, with a $5-10 discount for SAZ members (assuming we can get SAZ to come on board with this...)

Thanks for the observations!
-Jeff


IngredientX
Joined: 07/26/2008
American Hippodice judging model

Hi Jeff...

American Hippodice (i.e. Rhinopawn) would be fantastic! A few cents worth of ideas...

1) The fact that the Stage 2 playtesting would happen amongst a large amount of play groups may not work very well, unless the scores they give us are normalized really well. If we have one group that gives every game they get 10s, and another that gives every game they get 1s, are we giving entrants a fair shot? It's an extreme example, I know, but I hope you get my point.

The only way around that is for each group to test each game. If that's logistically possible, then it's probably the way to go. I don't know if we can pull it off, though.

2) If we get, say, 10 playtest groups to play 5 times, that's 50 games. If we want each game to be played, say, 5 times, we're back down to allowing 10 games in Stage 2. That's a lot of pruning that stage 1 will have to do!

3) How much playing can we reasonably expect to occur in Stage 3? The industry folk we get will be so busy, can we expect them to stage more than 5 playings? Can we expect them to play a game more than once?

4) We're a pretty close-knit bunch of gamers. What happens when one of us gets a game that we've already heard about on the BGDF, or even helped playtest? That game will already have an advantage, even if the playtesting juror doesn't score, simply because he/she can provide rules interpretations.

5) Here's an idea about categorization. Have three categories: Games playable in 30 minutes (6 Nimmt, Mamma Mia, Gipf), games playable between 30-60 minutes (Bohnanza, San Juan), and games that take over an hour (Settlers of Catan, Puerto Rico). Instead of categorizing a game by content, we categorize by playing time. We still might get strange matches (Gipf vs. 6 Nimmt, for example), but the application makes sense to me. If you had 30 minutes before your gaming group starts the next game, what would you play?

Gotta get back to work. Sorry for the devil's advocacy... I think this is a great idea, and I'd love to see it fly!

sedjtroll
Joined: 07/21/2008
American Hippodice judging model

Regarding timing:
If we got this organized by the end of the year and people had 4 or 5 months to submit their game this time around, I think that's probably ok. In the future they'll have more time. In any case it's not like they didn't just get their game done for Hippodice anyway ;)

Regarding categorization:
I think it's a terrible idea. For one thing, I think the publishers are perfectly capable of reading about the top 10 finalists' games and deciding if it's their bag. Especially considering they probably had a rep test them out. Secondly, I think it might actually be BETTER to have a single group play different types of games than 5 different bidding games- I think it will help out the normalization. Say the group that got the bidding games isn't good at or doesn't like bidding, or are just more critical than average to begin with. This means Bidding games are at a disadvantage from the start (especially if the 'benchmark' games- a great idea btw- are not bidding games).

Regarding selecting finalists for round 2:
I liked the idea of several people reading all the entries and selecting the top 10 or so. This could get to be a big job when there are 200 entrants though.

Regarding playtesting/judging:
I think the games could be divided among a number of test groups. The Benchmark thing was a great idea, and if it's done ahead of time and all the results are distributed to all the groups then the groups could sort of self-normalize. Maybe the best way to go though is to have a core group- maybe the ones that select the top X rulesets to request protos for, who coordinate the test groups. They would look at all the benchmark reviews and keep them on hand so that when contest entry reviews come in they can be "normalized" somehow (I think perhaps a subjective way might actually be more fair than some mathematical algorithm). It should be up to this core group to use their judgement to select the finalists out of all the groups, based on but not strictly limited to the scores from the test groups (this is what I mean by subjective normalizing).

So each test group would be responsible for a certain number of games based on size of the group. Bigger groups can handle more games, so let them. When the results from that testing come in, X finalists are chosen by the core group to be sent to the judges.

Possible second round:
Since the initial testing would be done by various groups around the country, and no one group would see all the games, perhaps a second round of testing is in order. The core group would simply take the best 1/3 or so of the games, mix them up, and send them back out to different groups. This is another layer of 'normalization' which would take the games that are "obviously good" and see which of those remain at the top of the list when played by another group.

Regarding final judging:
Not much to this one, the reps from the game companies would playtest and rank the top X games. The idea of judging at a game convention is a good one, if there's not one convenient maybe we could start our own Game Design Conference at the end of summer :)

- Seth

SVan
Joined: 10/02/2008
American Hippodice judging model

jwarrend wrote:
Again, the offers to help with the playtesting are very generous and tremendously helpful, but they are reinforcing my conjecture that no one group can handle the full load.

This is what I understood as well. I think the idea of several groups playtesting the games is a great idea, even the idea of the games moving around to different groups. This would definitely cost quite a bit of money for the entries, but would make the judging a whole lot better (plus, it isn't the cost of sending it to Germany.)

In regard to Gil's comments, if I remember right, Hippodice picks 50 entries for the first playtesting. Then from there they pick 10 entries. So here's my suggestions in weeding out the original 50 to the 10 (if we follow their way, which I think is good):

1. Each group plays so many games, if there's 5 groups, then 10 games each. Those groups choose their top 2 games and they move on to the final round. So in this way, the 50 games only get played once.

2. Each of the 50 games is played a certain amount of times by the different groups, as again if there are 5 groups, maybe a game will be played by 2 or 3 groups and then the top ten is moved to the final playtest.
I think 2 groups is enough to play a game, although 3 would give more variety to the scores.

3. Each group plays all of the 50 games. Not only is this time consuming, but it's also going to be expensive. Then the top ten scores move to the next round.

4. Same as number one, except each group chooses 5 games. Those 5 games are then sent to another group to play. Those 5 are played and then 3 are sent to another group. Those 3 are played and then the best 2 are chosen. Each time games are chosen, the ones that are not chosen are not played anymore and have no consideration for the top ten games.

In each of these cases, the top ten are then playtested again by all of the groups, with each submitting a score. The game with the best score wins.
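To compare the workload of the four options, here is a quick count of "judging events", where one event means one group evaluating one game (however many plays that takes). It uses the 50 games, 5 groups, and 10 finalists from the options above; treating each evaluation as a single event is a simplifying assumption.

```python
GAMES, GROUPS, FINALISTS = 50, 5, 10

# Option 1: every game is judged by exactly one group.
option_1 = GAMES
# Option 2: every game is judged by 2 (or 3) groups.
option_2 = {n: GAMES * n for n in (2, 3)}
# Option 3: every group judges every game.
option_3 = GAMES * GROUPS
# Option 4: each batch of 10 is judged, cut to 5, judged again, cut to 3, judged again.
option_4 = GROUPS * (10 + 5 + 3)
# Final step shared by all options: every group judges each of the top ten.
final_round = FINALISTS * GROUPS
```

On these numbers option 4 costs 90 events, slightly less than option 2 with two groups per game (100), and far less than option 3 (250); every option then adds another 50 events for the final step.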

Hope this makes sense. Hope to help in any way possible.

Steve

jwarrend
Joined: 08/03/2008
American Hippodice judging model

sedjtroll wrote:
Regarding timing:
If we got this organized by the end of the year and people had 4 or 5 months to submit their game this time around, I think that's probably ok. In the future they'll have more time. In any case it's not like they didn't just get their game done for Hippodice anyway ;)

Regarding categorization:
I think it's a terrible idea. For one thing, I think the publishers are perfectly capable of reading about the top 10 finalists' games and deciding if it's their bag.

I agree. In part, I'm considering the situation where, perhaps 7 of the top 10 are "heavy" games, whereas perhaps if we said here are the top 6 games in category X, the top 6 in category Y, and the top 6 in category Z, it's a little better at comparing apples to apples and gives companies a little easier time of viewing the results as legitimate (since the criteria for comparing games are more obvious because the games in each category are more obviously alike than in a free-for-all model).

Again, not something I insist on, just a thought. I do think Gil's idea of dividing it up by playing time is sensible. I agree with Mike that asking the publishers if they'd even want this would be a good thing.

The key point is this: Hippodice is a great competition, but we don't have to do things the way they do it; we can reinvent their model, if we want. It's a legitimate question as to how you'd compare, say, PR with Fluxx. Breaking things up may make things easier...

Quote:

Especially considering they probably had a rep test them out. Secondly, I think it might actually be BETTER to have a single group play different types of games than 5 different bidding games- I think it will help out the normalization. Say the group that got the bidding games isn't good at or doesn't like bidding, or are just more critical than average to begin with.

I wasn't advocating this level of differentiation; rather, I was saying, perhaps, that "all the 'light' games go to Boston groups, all the 'heavy' games go to midwest groups", almost as if there are two separate competitions with two separate judging pools. It would be a way to accommodate the reality of the judging situation we're likely to face, but also to keep the comparison of "like" games as fair and consistent as possible.

I'm glad you (and others) provided a dissenting opinion, though; it's just an idea, and my defense of it isn't meant to imply my commitment to it.

Quote:

Regarding selecting finalists for round 2:
I liked the idea of several people reading all the entries and selecting the top 10 or so. This could get to be a big job when there are 200 entrants though.

Yeah, I hope we don't get 200 entries! Or, at least not 200 good ones! I'd like it to be a little easier to narrow down than that...

Your other ideas are very good, as are those of the others (even though I haven't responded to all of them directly). Thanks!

Just to let you all know, I've already heard back from Alan Moon that he's interested in the idea and will discuss the idea with the SAZ leadership and get back to us. A good start!

-Jeff

sedjtroll
Joined: 07/21/2008
American Hippodice judging model

jwarrend wrote:
In part, I'm considering the situation where, perhaps 7 of the top 10 are "heavy" games, whereas perhaps if we said here are the top 6 games in category X, the top 6 in category Y, and the top 6 in category Z, it's a little better at comparing apples to apples and gives companies a little easier time of viewing the results as legitimate (since the criteria for comparing games are more obvious because the games in each category are more obviously alike than in a free-for-all model).

Well, now we're talking about separate competitions. You can't really start divvying up the games by category without following through...

Quote:
I wasn't advocating this level of differentiation; rather, I was saying, perhaps, that "all the 'light' games go to Boston groups, all the 'heavy' games go to midwest groups", almost as if there are two separate competitions with two separate judging pools. It would be a way to accommodate the reality of the judging situation we're likely to face, but also to keep the comparison of "like" games as fair and consistent as possible.

So then Boston sends up the 'best of the 'light' games' and Texas sends up the 'best of the 'heavy' games' and then you have to compare the best light game to the best heavy game in the final round... that's the opposite of the apples to apples comparison you were looking for.

I'd also like to point out that from the perspective of a playtester, it's nice to have some variety. Maybe half the people in Boston really like light games, but the other half prefer a deeper experience.

Quote:
I've already heard back from Alan Moon that he's interested in the idea and will discuss the idea with the SAZ leadership and get back to us. A good start!

Very good indeed. I'm going to contact my potential playtesters and see if they'd even be interested in such a commitment. I'd love to be a part of a playtest committee, or possibly even the core group in charge of the contest (helping interpret the playtest results and doing subjective normalizing). As a side note, I don't see why a member of the core group couldn't also be a playtester... though it's possible they'll be biased toward a game they played and liked.

- Seth

phpbbadmin
Joined: 04/23/2013
Hmm

With regards to comparing heavy games to light games... Where do you draw the line? By that I mean how do you group those in-between games? I.e., those medium games, or those heavier light games or those lighter heavy games? Do you see my point? I think you get into dangerous territory when you try to put games into neat little groups. My personal opinion is that you judge them on the enjoyable experience they provide. If a light game that takes 15 minutes is just as fun as a heavy game that takes three hours, then it should score the same. The same goes for card games vs board games, strategy games vs beer & pretzel, abstract vs themed. It's very easy to try and classify games, but it's not always going to be black and white when trying to classify them.

My thoughts for judging are much like the others. Prescreen the entrants based upon the rule book or game preview, then divide the games evenly among the different groups. Then I think each group should recommend the games that they feel are the best among those that they judged. But, and here's the kicker, there's no minimum or maximum that they have to recommend. If each group gets 10 games, and group A feels that 8 of the games deserve further review, then all 8 get further review. If group B gets 10 games and they only feel 2 deserve further review, then only 2 get further review. After that, the recommendations can get shifted to another group and they can further narrow down the recommendations, they get shifted again, etc. This method could be used to whittle the field down to a manageable number (say 10 or so), and then these final entries could get passed to the game execs.

Why do I think it should be done this way? Because I think it's the fairest and most thorough method. It may take a little longer but I think it's for the best. I also think it would be great if a review team could look through the games that didn't make the cut and perhaps request a re-review by their team. If a game is a niche market game, then it may only appeal to a certain segment of the gamers. If a game doesn't hit its target audience initially, it can be looked over for further review. However, if a group of gamers likes the theme of a game, perhaps they could request the game to re-review it to confirm its non-recommended status. For that matter, I think it might be a great idea for groups to be able to request a particular game for review in advance. At least that way, a niche market game could be sought out by its intended audience rather than being looked over by a group who didn't care for the theme.

Just my dos pence,
-Darke

SVan
Offline
Joined: 10/02/2008
American Hippodice judging model

sedjtroll wrote:
Very good indeed. I'm going to contact my potential playtesters and see if they'd even be interested in such a commitment. I'd love to be a part of a playtest committee, or possibly even the core group in charge of the contest (helping interpret the playtest results and doing subjective normalizing). As a side note, I don't see why a member of the core group couldn't also be a playtester... though it's possible they'll be biased toward a game they played and liked.

I don't think it's a big deal that a core member would be a playtester also. I think it may be a good idea that games are given to groups somewhat randomly, to prevent anything bad from happening.

I think the big difference between this and Hippodice's competition is that we will probably need to have multiple places to playtest all of the games, where I believe Hippodice has a huge group for that.

On the other hand, there is an advantage to having multiple groups playing the different games, since it will give a better overall feel for the games, since they may be playtested in different regions of the country.

It's good to hear that we might have the support of the SAZ. It would probably be fruitless for us to talk about this without any support from the major and/or minor American companies/designers.

-Steve

sedjtroll
Offline
Joined: 07/21/2008
American Hippodice judging model

SVan wrote:
there is an advantage to have multiple groups playing the different games, since it will get a better overall feel for the games, since they may be playtested in different regions of the country.

I think it's a really good idea to have 2 different groups look at some of the games, but more than that will take too long and stop being constructive. I proposed a first and second cut, where all the games get played once, but only the better half or 1/3 of those get played again by another group.

Also, people keep mentioning dividing the games equally among groups to playtest. In my proposal I suggested that the size of the group determines the number of games they will test. I think this is important so that small groups won't be held responsible for the same number of games a large group could test in half the time. Using a weighted average would allow each game the same amount of attention because it's not fair to the testers or the contestants if one group has to burn through games lest they not get to all of them while another can take their time and really look into each game.
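
A minimal sketch of dividing games in proportion to group size rather than evenly. The function name `allocate_games`, the group names, and the largest-remainder rounding rule are illustrative assumptions; the proposal itself does not say how fractional shares should be rounded.

```python
def allocate_games(group_sizes, total_games):
    """Split total_games across groups in proportion to their size.

    group_sizes maps a (hypothetical) group name to its number of playtesters.
    Fractions are resolved with largest-remainder rounding, which is an
    assumption made for this sketch.
    """
    total_people = sum(group_sizes.values())
    exact = {g: total_games * n / total_people for g, n in group_sizes.items()}
    shares = {g: int(x) for g, x in exact.items()}  # round every share down first
    leftover = total_games - sum(shares.values())
    # Hand the remaining games to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - shares[g], reverse=True)
    for g in by_remainder[:leftover]:
        shares[g] += 1
    return shares


print(allocate_games({"Boston": 4, "Tucson": 8, "Seattle": 12}, 30))
```

With 30 games and groups of 4, 8 and 12 players, the shares come out to 5, 10 and 15 games, so each playtester faces roughly the same load.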

- Seth

jwarrend
Offline
Joined: 08/03/2008
American Hippodice judging model

One issue I'd like to raise; I also suspect it's ok for a "core member" to be a playtester, but it probably won't be ok for an entrant to be involved in the judging process.

I could probably come up with a rationalization as to why it's ok to enter and still playtest others' games, but I just don't think it would look good to the outside. So, to all who are volunteering your groups to playtest, bear in mind that doing so may preclude you from entering!

-Jeff

setarcos
Offline
Joined: 07/31/2008
American Hippodice judging model

IMHO adopting the general format that Hippodice uses for their design contest (as outlined by jwarrend) is definitely the way to go.

FWIW, I’ll submit for your consideration the way I would probably flesh out the contest procedures. (At least these are my preliminary thoughts. Feel free to spindle, bend, and mutilate them at your whim, of course; this is just offered as a basis for further discussion.)

Round 1: Rulebooks are read, with X selected for playtesting.

Entrants are required to submit rules electronically to facilitate distribution to the dedicated volunteers who’ll be doing the initial screening.

Each member of the Selection Committee (maybe members of BGDF?) is encouraged to at least skim all of the rules submitted before deciding which ones are worthy of thorough study.

Each committee member is allowed to nominate as many games as he likes, but is required to rank his list of nominees.

Initially the games ranked number one by each committee member are selected to go to Round 2. (There will probably be some overlap at each level of ranking.) If those are enough games for the next round, then stop there. If not, then the games ranked number two are also chosen. Continue to select each additional level of ranking in turn until you have enough games for Round 2.
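
A minimal sketch of the tier-by-tier selection just described, assuming each committee member's nominations arrive as a ranked list with the favourite first. The function name `select_by_rank_tiers` and the game names are made up for illustration.

```python
def select_by_rank_tiers(ballots, target):
    """Pick games one ranking level at a time.

    ballots is a list of ranked nomination lists (best first). Every judge's
    number-one pick is taken, then every number-two pick, and so on, stopping
    after the first level that brings the total to at least `target`.
    """
    selected = []
    deepest = max((len(b) for b in ballots), default=0)
    for rank in range(deepest):
        for ballot in ballots:
            # Overlap between judges is expected, so skip games already chosen.
            if rank < len(ballot) and ballot[rank] not in selected:
                selected.append(ballot[rank])
        if len(selected) >= target:
            break
    return selected


ballots = [
    ["Alpha", "Bravo", "Delta"],
    ["Bravo", "Charlie"],
    ["Alpha", "Echo", "Foxtrot"],
]
print(select_by_rank_tiers(ballots, target=4))
```

Because a whole ranking level is added at once, the result can overshoot the target; a tie-break rule would be needed if the number of slots in Round 2 is strict.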

Designers are notified as to whether their game has been chosen to go to the next round or not. Those who are going to Round 2 are told how many prototypes must be submitted for playtesting by the predetermined deadline. (One copy for each game club participating in Round 2.)

Round 2: The X games are playtested, with Y selected for the finals.

Prototypes of the games selected in Round 1 are sent to the game clubs throughout the U.S. that are participating in this round.

Each game club (or other Playtest Committee) is free to choose which games to playtest, how much to playtest them, and by what criteria and process they will choose their nominations. They may nominate as many finalists as they like, but are required to rank their nominations.

As in the first round, the games ranked number one are chosen to go on to the next round. Continue to select each additional level of ranking in turn until you have enough games for Round 3.

Prototypes used in Round 2 are sent back to the award sponsor. Those that won’t be needed for Round 3 are returned to the designers.

Round 3: The Y finalists are playtested by representatives from established publishers, and a winner is chosen.

Prototypes of the games selected in Round 2 are presented for playtesting to the panel of industry representatives at the venue of the award.

The panelists decide as a group which games to playtest, how much to playtest them, and by what criteria and process they will choose the winner(s). The panel is required to select a single grand champion, but may make as many honorable mentions as they like.

The culmination of the award venue is the coronation of the year’s most promising new game design(s), and a fierce bidding competition ensues between all the world’s game publishers...

SVan
Offline
Joined: 10/02/2008
American Hippodice judging model

sedjtroll wrote:
I think it's a really good idea to have 2 different groups look at some of the games, but more than that will take too long and stop being constructive. I proposed a first and second cut, where all the games get played once, but only the better half or 1/3 of those get played again by another group.

Quote:
Also, people keep mentioning dividing the games equally among groups to playtest. In my proposal I suggested that the size of the group determines the number of games they will test. I think this is important so that small groups won't be held responsible for the same number of games a large group could test in half the time. Using a weighted average would allow each game the same amount of attention because it's not fair to the testers or the contestants if one group has to burn through games lest they not get to all of them while another can take their time and really look into each game.

In the first quote, I referred to this already, as in:

Quote:
4. Same as number one, except each group chooses 5 games. Those 5 games are then sent to another group to play. Those 5 are played and then 3 are sent to another group. Those 3 are played and then the best 2 are chosen. Each time games are chosen, the ones that are not chosen are not played anymore and have no consideration for the top ten games.

Since there might be more groups than 5 and a different amount of games for each group a similar system could be used, in this way:

The first 50 games are divided out evenly. The groups play them and score them. The best 15 to 30 games are then chosen, divided again and sent to random groups, except no group may play the same game twice. Then it could be brought down to the top ten games. Then, probably like Hippodice, we would let the publishers' reps at them and they will judge the rest. (I think this is how it goes, I'm not entirely sure if the top ten of Hippodice is judged this way...)
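
A minimal sketch of the second-round reshuffle, assuming we know which group played each surviving game in the first round. The function name `assign_second_round` and the group labels are hypothetical, and the sketch makes no attempt to balance the load between groups.

```python
import random


def assign_second_round(first_round, groups, seed=None):
    """Send each surviving game to a random group that has not played it yet.

    first_round maps a game to the group that tested it in round one.
    At least two groups are assumed, otherwise no valid choice exists.
    """
    rng = random.Random(seed)
    assignment = {}
    for game, played_by in first_round.items():
        candidates = [g for g in groups if g != played_by]
        assignment[game] = rng.choice(candidates)
    return assignment


first_round = {"Game 1": "A", "Game 2": "B", "Game 3": "C"}
print(assign_second_round(first_round, ["A", "B", "C"], seed=1))
```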

This means a game will be played twice by the playtest groups, but only more than once if it has scored high enough to move on to the second round of judging.

If each game has an average of 2 hrs of playing time, and there are 5 groups who average 10 games each, it will require 20 hrs of playtesting per group for the first round, and 6 hrs (15 games) to 12 hrs (30 games) for the 2nd round, so maybe 32 hrs total. Now my group meets together for 6 hrs on a Saturday, but I doubt they would playtest the whole time, maybe half, so that's looking to be a max of about 11 weeks. I don't know how long it takes for Hippodice, but that's about 2 1/2 months, without a break in between, at a game or two a week. (Of course, this would change from group to group depending upon the load, and some groups may be able to do more than one game at a time, also reducing the time, but I know there will be some 4 to 6 hour doozies out there as well.)
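
The figures above can be checked with a few lines of arithmetic. This is only a sketch under the same assumptions (50 first-round games, 15 to 30 second-round games, 5 groups, 2 hrs per game, about 3 hrs of actual playtesting per weekly session); the function name `playtest_load` is made up for illustration.

```python
import math


def playtest_load(round1_games, round2_games, groups, hours_per_game, hours_per_week):
    """Return (hours per group, whole weeks per group) for both rounds combined."""
    games_per_group = (round1_games + round2_games) / groups
    hours = games_per_group * hours_per_game
    weeks = math.ceil(hours / hours_per_week)  # round up to whole weekly sessions
    return hours, weeks


for round2_games in (15, 30):
    print(round2_games, playtest_load(50, round2_games, groups=5,
                                      hours_per_game=2, hours_per_week=3))
```

Under these assumptions each group is looking at 26 to 32 hours, which is roughly 9 to 11 weekly sessions when rounded up.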

Just rambling off, but for myself it's nice to see the figures involved in this. Hope my rambling and comments are helping.

-Steve

SVan
Offline
Joined: 10/02/2008
American Hippodice judging model

jwarrend wrote:
One issue I'd like to raise; I also suspect it's ok for a "core member" to be a playtester, but it probably won't be ok for an entrant to be involved in the judging process.

I figured on this and fully support it. Even if you wouldn't play your game in this, you shouldn't be a participant.

sedjtroll
Offline
Joined: 07/21/2008
American Hippodice judging model

SVan wrote:
sedjtroll wrote:
I think it's a really good idea to have 2 different groups look at some of the games, but more than that will take too long and stop being constructive. I proposed a first and second cut, where all the games get played once, but only the better half or 1/3 of those get played again by another group.

Quote:
Also, people keep mentioning dividing the games equally among groups to playtest. In my proposal I suggested that the size of the group determines the number of games they will test. I think this is important so that small groups won't be held responsible for the same number of games a large group could test in half the time. Using a weighted average would allow each game the same amount of attention because it's not fair to the testers or the contestants if one group has to burn through games lest they not get to all of them while another can take their time and really look into each game.

In the first quote, I referred to this already, as in:

Quote:
4. Same as number one, except each group chooses 5 games. Those 5 games are then sent to another group to play. Those 5 are played and then 3 are sent to another group. Those 3 are played and then the best 2 are chosen. Each time games are chosen, the ones that are not chosen are not played anymore and have no consideration for the top ten games.

My first quote above addresses this: it's too much to have 3 rounds of playtesting, even if the number of games reduces every time. I think 2 rounds is "pushing it but necessary". 3 rounds would be nice but not feasible.

Quote:
Since there might be more groups than 5 and a different amount of games for each group a similar system could be used, in this way:

The first 50 games are divided out evenly.

There it is again, by the way. My second quote above addresses this. Based on your last sentence, though, I think you just meant "divided out" and not "evenly".

SVan
Offline
Joined: 10/02/2008
American Hippodice judging model

Quote:

Since there might be more groups than 5 and a different amount of games for each group a similar system could be used, in this way:

The first 50 games are divided out evenly.

sedjtroll wrote:
There it is again, by the way. My second quote above addresses this. Based on your last sentence, though, I think you just meant "divided out" and not "evenly".

Yeah, I was thinking a little too fast. I mentioned above that the games divided up would be different for each group. I think that's the best way to go.

On my most recent post, I said 2 rounds would be enough as well. Before, I was thinking aloud some of the ways we could playtest the games. I think the best way is 2 rounds, maybe cut it in half (or close to half) and then get it down to 10 games for the industry reps to take over from there.

Anyways, I think we have a good basis, but we really need to know if anyone from the business is going to care that we're doing this. Otherwise, this planning could be all for nothing.

Steve

FastLearner
Offline
Joined: 12/31/1969
American Hippodice judging model

I proposed that each game be reviewed by two completely different groups a while back, to help reduce favoritism. I still think it's doable.

As far as picking the number of games to move on to playtesting, I suggest we do it the way Hippodice does: each ruleset/picture/intro package is graded, with points for various things. You know in advance that you'll only want to judge x games, so you simply set the cutoff score at the point where, after points have been assigned, x games make the cut. In my Hippodice rejection last year the email specifically said "this year's cutoff of 6.8 points" (paraphrased).
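
A minimal sketch of setting the cutoff after the fact, assuming every entry already has a single numeric grade. The function name `apply_cutoff`, the sample scores, and the rule that ties at the cutoff all advance are assumptions for illustration, not a description of how Hippodice actually does it.

```python
def apply_cutoff(scores, slots):
    """Find the score of the last game that fits into `slots` places.

    scores maps a game to its grade; at least one game and slots >= 1 are
    assumed. Every game at or above the cutoff advances, so ties at the
    cutoff can push the count slightly above `slots`.
    """
    ranked = sorted(scores.values(), reverse=True)
    cutoff = ranked[min(slots, len(ranked)) - 1]
    advancing = [game for game, s in scores.items() if s >= cutoff]
    return cutoff, advancing


scores = {"A": 8.1, "B": 6.8, "C": 7.4, "D": 5.9, "E": 6.8}
print(apply_cutoff(scores, slots=3))
```

Here the cutoff lands at 6.8 and four games advance, because two entries are tied on the last place.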

The final judging -- and 6 is a better final number than 10, I think, at least early on until it becomes popular -- is best done at either Origins, GenCon (less good), the GAMA Trade Show (time is tight there, though), or the Gathering of Friends, as those are the places where you'll actually have a fair number of publishers together at once. If Alan is up for it and the Gathering is the choice, I'll be happy to bring the games with me and ensure that they're judged.

Sorry, time is still tight (though it will be MUCH better after the end of the year), but those are a few quick thoughts.

-- Matthew

Anonymous
American Hippodice judging model

Wow, you get on a plane and fly home after a business trip and the thread just explodes! But, that's a good thing!

Some thoughts:

I don't think we should underestimate the logistical impact of moving protos around the country. Serious consideration to the structure of the competition should be given to minimize this expense, as inevitably the expense will have to be borne by the entrants.

Regarding categorization: Why not let the entrants self-categorize? Or, more specifically (if we're going to have categorization at all), we can say "these are the categories" and let the entrant tell us in which category his/her design falls. This removes the burden from the judging committee (and the possible accusation that a game was mis-categorized) and puts the onus on the entrant.

Regarding normalization: Well, isn't this a grander problem than just game competitions? How do you evenly grade worker performance? Movies? Here's an idea I've been toying around with (ever since hearing the term "firewall 9s" in the Air Force) and have been wondering how effective it would be. Each group that judges gets a pool of points based on the number of games they are asked to judge (e.g. 5 points per game). Games are then awarded between 1 and 10 points, but the total points awarded cannot exceed the pool (and actually should equal the pool total to be completely fair). If you get a batch of equally good or equally bad games, scores should be fairly equal around the midpoint of the range. Very good games should score higher (6-8), exceptional games should score very highly (9, maybe 10), and all scores from different groups should be directly comparable. Granted, it would be possible for one group to get all bad games that, being equally bad, score in the middle of the range, while another group gets all good games which also score in the middle of the range, thus making good and bad games score equally, but I think that is unlikely and could probably be avoided by careful dissemination of the protos by the selection committee.
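
A minimal sketch of checking one group's score sheet against the point pool, assuming 5 points per game and a 1 to 10 range as in the example above. The function name `check_pool` and the sample sheets are hypothetical.

```python
def check_pool(scores, points_per_game=5, low=1, high=10):
    """Return a list of problems with one group's score sheet (empty if valid).

    scores maps a game to the points this group awarded it. The sheet is
    valid when every score is within range and the total equals the pool.
    """
    pool = points_per_game * len(scores)
    problems = []
    for game, s in scores.items():
        if not low <= s <= high:
            problems.append(f"{game}: score {s} outside {low}-{high}")
    total = sum(scores.values())
    if total != pool:
        problems.append(f"total {total} != pool {pool}")
    return problems


print(check_pool({"A": 9, "B": 5, "C": 3, "D": 3}))  # one standout, two weak games
print(check_pool({"A": 10, "B": 9, "C": 8}))         # "firewall" scoring
```

The second sheet is rejected because the group tried to rate everything highly, which is exactly the inflation the pool is meant to prevent.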

I really like Setarcos' suggestion of all reviewers listing their favorites, then selecting all the #1's, then #2's, etc. until the target number have been selected. Excellent idea!

Regarding judging: (Idea thrown out for discussion) What if we had a series of standard questions to be answered by the playtest groups? Something along the lines of this:
5 = Strongly Agree 1 = Strongly Disagree

1. The rules were clearly written
2. The game was balanced and fair
3. The game was fun to play
4. I understood what I was supposed to do
5. I was able to visualize a winning strategy
etc. etc.

Regarding protos: I think we should keep the number of protos requested from designers to a minimum (1 per entry). We're judging game design, not how industrious a person is. Also, we should probably weight any judging of the proto components themselves very lightly, if at all. Again, the emphasis should be on the design, as the eye-candy factor can be addressed more thoroughly during production. Besides, a pretty game is already going to have a subconscious advantage in the judging; there's no need to give it any more of one.

Now, here are some topics I don't recall being addressed (but, I may have just missed them):

1. Advertising - How are we going to let the gaming community outside of BGDF know about the competition?
2. Final round judge selection criteria - Who do we invite and who do we not invite and why?
3. Organizational schedule - We've been bandying about LOTS of fantastic ideas, but at some point decisions are going to have to be made, for better or for worse, and run with... most likely starting with who the game organizers are going to be (and by when does that need to be decided?). When will the rules be finalized? Etc.

RobBartel
Offline
Joined: 12/31/1969
Resurrecting this conversation

Jeff Warrend pointed me to this long-dormant thread after I made a recent post suggesting a N. American Hippodice equivalent so I thought now might be a good time to resurrect it.

It appears that this original idea died out for a number of reasons:

1) As a designer-centric forum, BGDF members were hesitant to get too involved as it would actually disqualify them from participating in the competition.
- Solution: While initiated by the BGDF, we need a separate organization to actually host and operate the competition. I've proposed the BoardGameGeek crew and have posted the idea to their forums to gauge their response.

2) The logistics of running multi-locational playtests seemed daunting. The cost of sending prototypes back and forth was one factor. The means of ensuring consistent ratings between multiple playtest groups was another factor.
- Solution: On both counts, I think BoardGameGeek might hold the answer. Prototypes could be hand-distributed to playtest group leaders at their annual BGGCon (November), brought home to be playtested with their existing groups, and then hand-returned at the subsequent BGGCon where winners are jointly chosen through shared playtests and where the next year's submissions can be distributed. As for ensuring consistent ratings, BGG is already a statistics- and database-driven website - it tracks the average review scores of every reviewer (some have reviewed over 1,000 games!), has Bayesian ratings by title, and so forth. If anyone's set up to do this, it's them. They could even open the initial rating of the rulebooks to the general BGG public prior to the Con.

3) There was debate about the notion of categorizing the submissions.
- Solution: I think France's 'Contest of Creators' handles this very well. Designers self-categorize their titles and the categorization isn't so much for the publisher's benefit but rather for the playtest groups. A group of mixed-age family-oriented casual gamers simply isn't going to be interested in playtesting a hardcore 7-hour historical battle sim. Categorization becomes a means of ensuring that each game is assessed by a group that understands and enjoys that genre.

4) The discussion hinged on the involvement of publishers in the final judging process.
- Solution: I don't think this is actually necessary. The players and playtesters are sufficiently equipped to make those decisions. Publishers are welcome to join the final BGGCon playtests, participate in the final voting, hand out specific sub-awards (Carta Mundi, for instance, hands out a separate prize for best card game entered in the Premio Archimede competition), meet with the designers in attendance, and even take individual prototypes home with them. Their main role, however, is as recipients of the competition's final report, where detailed assessments of all of the finalists are provided.

So, with that in mind, is there sufficient interest to warrant reopening this discussion?

Thanks,
Rob

Scurra
Offline
Joined: 09/11/2008
American Hippodice judging model

Well it died out here because a Yahoo group was started to try and widen the discussion beyond BGDF folk. But that group has been dormant for a while too; I think Real Life got in the way again for various people.

FastLearner
Offline
Joined: 12/31/1969
American Hippodice judging model

Real life did indeed interfere for me. That said, I'm still paying for a placeholder website every month and certainly haven't forgotten about it.

In April I'm going to nail down final judging commitments with publishers. While I'd gotten the agreement of several previously, I want to finalize it, with judging of an actual contest next year. I plan to talk to representatives of Abacus, Alea, Asmodee, Days Of Wonder, Face2Face, Hans im Glueck, Hasbro, Mattel, Ravensburger, Rio Grande, Schmidt, Wizards Of The Coast, and Z-Man Games, with the intent of getting a commitment from at least six of them. I'm putting together a little paper presentation and spiel (in the classic English language sense) next month to that end.

I'll comment more later, but I think there's something to be said for BGG involvement. I can take some of Derk and Aldie's time then, too.

-- Matthew

jwarrend
Offline
Joined: 08/03/2008
Re: Resurrecting this conversation

RobBartel wrote:

1) As a designer-centric forum, BGDF members were hesitant to get too involved as it would actually disqualify them from participating in the competition.

Actually, as this thread shows, designers were offering their groups left and right for playtesting. But I claim that this was the case because the idea was still in the abstract. Once the hammer dropped and those who would be playtesting were excluded from entering, I think we would have lost some support.

Quote:
I've proposed the BoardGameGeek crew and have posted the idea to their forums to gauge their response.

Could you post a link to this discussion? The new 'geek is impossible for me to navigate, but I'd be interested in following this and seeing what the folks say.

Quote:
- Solution: On both counts, I think BoardGameGeek might hold the answer. Prototypes could be hand-distributed to playtest group leaders at their annual BGGCon (November), brought home to be playtested with their existing groups, and then hand-returned at the subsequent BGGCon where winners are jointly chosen through shared playtests and where the next year's submissions can be distributed.

I don't think it will be quite this easy, but this may be a route to a solution. There's no question that each game will need to be evaluated by more than one group, and that was one of the difficult points in our process; we were pretty sure that each game would probably be tested only by two groups, for logistical reasons, but that just doesn't seem like enough (to me anyway...)

Quote:

As for ensuring consistent ratings, BGG is already a statistics- and database-driven website - it tracks the average review scores of every reviewer (some have reviewed over 1,000 games!), has Bayesian ratings by title, and so forth. If anyone's set up to do this, it's them.

True, but keep in mind that it's not just a statistics exercise. The tricky bit is comparing a game rated by two groups to a game rated by two completely different groups. How do you ensure that the games are being measured by the same yardstick? There are ways to do it (I think we came up with a pretty good method), but it's not an easy problem.

Quote:
They could even open the initial rating of the rulebooks to the general BGG public prior to the Con.

I think just the opposite in this case. I think that for the initial downselect, this should be done only by those that will read every single rulebook. That's daunting and will probably only be a group of 5 or 6 folks max. I think that's how Hippodice handles the first cut, and I think it's the fairest way to do it.

BGG is, for better or worse, a community entirely sold out to the concept of "buzz". I can't imagine a less fair method to do a downselect than to open it up to all comers. In a nearby forum, you'll get posts like "check out Game X, it looks really good!", and then Game X gets an inordinate amount of attention.

Now, on the one hand, that's actually a good thing, since the whole mission of the project is to expose these games to the world. So I like the potential approach of running the contest out in the open. But from a fairness standpoint, decisions of consequence should probably be restricted to only those who are agreed upon at the inception of the contest.

Quote:
4) The discussion hinged on the involvement of publishers in the final judging process.
- Solution: I don't think this is actually necessary.

This actually wasn't the hard part, and it's quite essential to the process that publishers care about the outcome. If we're to hold this contest and they aren't interested in it either way, then it's a waste of time. Including them as judges is one way to increase the "buy-in" of the publishers, and improve the overall image of the contest, so it's worth pursuing if possible. And as Matthew indicates, he made some pretty good inroads on this front.

-Jeff

RobBartel
Offline
Joined: 12/31/1969
American Hippodice judging model

Thanks, Jeff and Matthew. It's clear the two of you have put a lot of thought and effort into this project and I'm glad to hear that the idea isn't as dead as this thread had first led me to believe. As you can probably tell, I come from the 'naive optimist' school of thought and am of the opinion that, if we can just get the ball rolling, the kinks will get worked out and the necessary pieces will fall into place. That said, it's always good to have some sober second thought.

The Geekthread can be found here: http://www.boardgamegeek.com/article/821011#821011. I posted it in the game design forum there, which is possibly a redundant location for it - perhaps the BGG General, BGG Suggestions, or BGG.CON forums would be a better place for it. So far, there's only been one post, essentially scurrilous.

As for the issue of statistics, my point is simply that the BGG stats provide us with a good means of achieving that consistent yardstick. Their ratings of the submissions can be compared against their existing ratings history at the Geek and weighted accordingly.
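
One way to picture "weighted accordingly" is a standard score: express each contest rating relative to that rater's own past ratings. This is only a sketch of a common statistical technique, not BGG's actual method or any BGG API; the function name `normalize` and the sample histories are invented for illustration.

```python
from statistics import mean, pstdev


def normalize(rating, history):
    """Express a rating in standard deviations from the rater's own average.

    history is the rater's list of past ratings (at least one is assumed).
    A rater who always gives the same score carries no spread, so their
    ratings are reported as 0.0.
    """
    mu = mean(history)
    sigma = pstdev(history)
    return 0.0 if sigma == 0 else (rating - mu) / sigma


# The same raw 8 means more from a harsh rater than from a generous one.
print(normalize(8, [4, 6, 8]))   # harsh rater
print(normalize(8, [6, 8, 10]))  # generous rater
```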

For the public rulebook rating, it depends on the point and purpose of that initial phase. My assumption has been that the initial phase is simply for weeding out the obvious duds as opposed to actually deciding between otherwise viable candidates. Also, because the rulebooks can be handled electronically, it's the one stage where the process can be opened up to a larger public. That said, your concerns are valid ones. Perhaps the 'buzz' factor of the BGG public is best steered towards a special 'Player's Choice' award chosen from the pool of privately selected finalists.

I agree that publisher apathy would be a death knell to the whole idea and I'm glad that it isn't proving to be the hard part. Buy-in is important, as long as we provide the publishers with a range of commitment levels. Not every publisher has to be a sponsor, not every publisher has to be a judge, not every publisher has to give out a separate sub-prize as part of the competition. The way I see it, at any rate, is that those are opportunities for expanded involvement in the competition. There are plenty of publishers not on Matthew's list, for instance, that would still value receiving the competition's final report in the mail. As an example, if Steve Jackson Games read about a humorous game like "Who Stole Ed's Pants" in the report, they may very well approach the inventor to request a prototype because it fits into their overarching brand.

One other point regarding the date - Hippodice is in fall while Premio Archimede and Contest of Creators are both in spring. With that in mind, it may make sense to time the North American version to coincide with Hippodice and work to foster relationships between the four competitions. The 'two in the spring, two in the fall' approach has a nice symmetry to it, essentially.

Matthew, if you'd like to have someone read over your presentation as a final sanity check before sending it off, I'm happy to do so. If there are any other ways you feel I can help, let me know.

Warm regards,
Rob

jwalduck
Offline
Joined: 09/06/2011
American Hippodice judging model

If the judging would work best with two different groups testing each game, why not have each entrant enter two playable prototypes?

I would see this as working this way:
1. The entrant sends two playable prototypes to a central administrator with an entry fee. Sending them both to the same address saves on postage, and also means the entrant will not know who will be judging their game.
2. The administrator, on receiving all the entries, divides them up and sends each testing group its bundle. Again, this saves on postage and gives the organisers more control over the process.

The question is: is sending two prototypes that much more difficult than one?

I presume with the distributed model of judging that the central organisers will make no warranty on games surviving the judging process, let alone being returned.

All entries would therefore be on the basis of no returns and it would be in entrants' self-interest to enter prototype copies, not prototype originals. If you are making one copy, then making two should not be twice the work.
