Saturday, May 12, 2012

A call for a meta-H index

A sad aspect of modern academics is that rankings matter. For computer science departments the main ranking is from the NRC, and as the CRA has pointed out, it has severe methodological problems. A few years ago John Regehr and I did an informal pilot of ranking departments by citations, and our impression was that this worked at least as well as any current ranking method and was almost fully automatable. Google Scholar has recently added some new features that will make such a ranking EASY. For example, here is Pat Hanrahan's page. Note that Google computes his h-index, which for Pat is 61. This means Pat has 61 papers with 61 or more citations, but does not have 62 papers with 62 or more citations. This measure is more stable against outliers than raw citation counts, and more robust to error. The problem, of course, is that being an author on a paper doesn't mean you contributed much (or in some cases anything!). For example, perhaps my most cited paper is "Color transfer between images." I am pretty sure I read it, and may even have made some suggestions, but all the creative ideas and coding were by the other guys. I know of some CVs where all of the papers are of this form. I also know many authors that have no papers like that on their CVs. Trying to sort that out would be a lovely PhD data mining dissertation topic.
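The h-index rule above ("61 papers with 61 or more citations, but not 62 with 62 or more") is simple enough to sketch in a few lines. Here is a minimal Python version; the citation counts in the example are made up for illustration:

```python
def h_index(citations):
    # The h-index is the largest h such that h papers each have
    # at least h citations. Sort descending and find the last rank
    # where the citation count still meets or exceeds the rank.
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author with five papers:
print(h_index([10, 8, 5, 4, 3]))  # prints 4: four papers have >= 4 citations
```

Note how a single wildly cited paper can raise the index by at most one, which is the stability against outliers mentioned above.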

But from that highly imperfect individual data, the related "meta-h index" for an organization does measure something I think is likely more meaningful. The meta-h index for a department is 30 if 30 group members have an h-index of 30 or more, but 31 members do not have an h-index of 31 or more. One key advantage of this index is that emerging universities outside the traditional big-name universities can quickly get recognition. I imagine we'd see some real players in South America and Asia under this measure that might not show up for decades under the NRC measures.
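The meta-h index is the same "largest m such that m items are at least m" rule, applied to members' h-indices instead of papers' citation counts. A minimal sketch, with a made-up six-person department as the example:

```python
def meta_h_index(member_h_indices):
    # Largest m such that at least m members have an h-index of m or more.
    # Same computation as the h-index itself, just over people, not papers.
    hs = sorted(member_h_indices, reverse=True)
    return max((rank for rank, h in enumerate(hs, start=1) if h >= rank),
               default=0)

# Hypothetical department of six members with these h-indices:
print(meta_h_index([61, 40, 35, 30, 28, 12]))  # prints 6: all six have h-index >= 6
```

Because the department's score is capped by its headcount of strong researchers, no single superstar can carry a group, which is arguably the point.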

Further, one could rank research groups, colleges, and whole universities or companies this way. It wouldn't be perfect, but it would encourage papers that ultimately attract lots of citations, rather than papers that follow today's fashions. That would be a good side effect, and it would help bring about the much-needed demise of paywall publications.

I sent this suggestion to Google. Please join in and send them more. It would be easy for them to implement, and it would encourage authors to enter their information, so it would be good for Google too.


Peter Shirley said...

James O'Brien raised some sensible objections to the h-index and meta-h index measures. As a disclaimer, I agree with all criticisms of all objective measures. The question is whether they are more useful than subjective measures. My personal opinion is that they are. And the measurement shapes the system, of course, so good side effects are what I seek from an engineering perspective.

Eric Haines said...

My favorite bit: "For example, perhaps my most cited paper is "Color transfer between images." I am pretty sure I read it" - nice, and I'm not poking fun. This does happen, where you contribute something but don't see the final paper. It's those 21-author papers that make me wonder. Perhaps weight by number of authors?

Anyway, it's an interesting topic. A tricky bit is judging how any of these measures actually relate to how influential the person or group is in the research community.

Peter Shirley said...

@Eric: I think the key is that we already have a system in place that is garbage, so I am emphasizing side effects. I think all the side effects are positive relative to the current system, and it is less work than the current system.