-
Observation-based vs. active harvesting of human intelligence
I tripped over a reference to "artificial intelligence" the other day. I guess I tripped because it's not a term I hear very much any more. Maybe it's because I hang around with a lot of geeky people, but it seems quaint and maybe a little pretentious.
Instead, I hear about a lot of very specific techniques: Bayesian networks, collaborative filtering and the slope-one algorithm. I guess those fall under the "artificial intelligence" umbrella, but often it's really a matter of harvesting human intelligence and then acting on the results.
Google has just turned on a change that reportedly has been in a fairly wide test, giving logged-in Google users a chance to vote up/down specific items returned by search queries.
This is a huge change for Google, which made its mark by observation-based harvesting of human intelligence. Both Google Search and Google News observe the results of human decisions and use those observations to recommend items.
Google Search places a high value on inbound links, which are considered to reflect whether a page is "authoritative." If a lot of people link to a page, it must be good. This is why blog spammers prowl the net, posting comments that sneakily embed a link to their websites.
Google News looks at the relative prominence that has been given to a story by editors at thousands of news-related websites, then uses that information to help design its top-level presentation. Rather than reflecting the news judgment of an editor at Google (there isn't one), Google reflects a sort of broad consensus among human editors.
The new Google feature -- which it calls SearchWiki -- switches gears and asks people to take an overt action to provide it with information about human judgment.
Let me bring this home to the world of news sites. Google's move is a good thing because its scale and impact will significantly broaden the pool of people who are in the habit of explicitly evaluating items on the net. This is a habit we can use to our advantage.
Many news sites are adding "rate this item" features, then using that to display lists of actively "top rated" stories, often paired with observationally ranked "most emailed" and "most viewed." That's one way to use the information, but it's a fairly naive way.
I'm far more interested in how we might use this information to generate personalized recommendations using collaborative-filtering principles and that mysterious slope-one algorithm that I mentioned.
As is so often the case, there's already a Drupal module for that, one that originated as a 2006 Google Summer of Code project. As we collect rankings, ratings and other overt evaluations on our websites, I'm looking forward to pointing the recommendation module at that data and seeing what comes out of it.
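For the curious, here is a rough sketch of the weighted slope-one idea in Python. It is not the Drupal module's code, and the readers, stories and ratings are made up for illustration. The gist: to guess how a reader would rate a story they haven't rated, look at each story they did rate, find the average gap between the two stories among people who rated both, and weight each guess by how many people that was.

```python
def slope_one_predict(ratings, user, target):
    """Guess `user`'s rating for `target` with weighted slope one.

    `ratings` maps user -> {item: rating}. Returns None when no other
    user has rated both `target` and an item this user has rated.
    """
    numerator = 0.0
    denominator = 0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        # Rating gaps (target minus item) among users who rated both.
        diffs = [r[target] - r[item]
                 for r in ratings.values()
                 if target in r and item in r]
        if not diffs:
            continue
        count = len(diffs)
        avg_diff = sum(diffs) / count
        # Weight each item's guess by the number of co-raters behind it.
        numerator += (rating + avg_diff) * count
        denominator += count
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical ratings on a 1-5 scale; the names are invented.
ratings = {
    "ann": {"story_a": 5, "story_b": 3, "story_c": 2},
    "bob": {"story_a": 3, "story_b": 4},
    "cat": {"story_b": 2, "story_c": 5},
}

prediction = slope_one_predict(ratings, "bob", "story_c")
print(round(prediction, 2))
```

With this toy data, the guess for bob leans on story_b more than story_a, because two people rated both story_b and story_c and only one rated both story_a and story_c.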
-
It's everybody's game now
Yesterday I began listing assumptions and assertions that are part of our thinking about our evolving website management system. Here's another: It's everybody's game now.
Like it or not, the organizational model that says "you guys work for the newspaper, and you other guys work for the website" is becoming unsustainable. Newsrooms -- or, if you prefer, "news and information centers" -- must become multifunctional, multimedia, multiproduct-focused. Call it what you want: convergence, integration, the end of the world as we know it. It just has to happen.
Tools shape the user. Newspaper journalists carry a terrible burden in the shape of a toolkit focused only on print. That newsroom CMS you all love to hate is only part of that picture, but it's certainly a big part. Other parts of the toolkit may be harder to recognize. Writing styles (inverted pyramid, anecdotal lede, various approaches to headlines) are tools, too. The concept of a "story" as a linear, written "article" is a tool, too -- one that is obsolescent.
We can't replace all these tools at once, or even understand how they all should be replaced, but we can go through the kit and look for barriers to performance. If the system requires a knowledge of HTML, that's a barrier. If the system requires that you be in the office to get anything done, that's a barrier. If the system doesn't support the granting of access in appropriate ways, that's a barrier. If the system requires that everything be built around a piece of text and doesn't support video, audio, Flash interactive components, and -- very importantly -- the topic-driven integrated approach that Jeff Jarvis described recently, those are unacceptable barriers.
Craft specialties persist, and not every journalist should be expected to perform every task. But the tools should allow any journalist to play an appropriate role in any medium at any time. Because it's everybody's game now.
-
No editions, please
While we're treading water pending the rollout of our Drupal-based site management project, I thought it might be worth mentioning some of the principles and assumptions behind it. Here's one: No editions, please.
I've seen developers put a great deal of effort into creating Web content management systems that are intended to reflect the edition structure -- daily, weekly, monthly, whatever -- of a legacy (print) product.
Don't do it. We live in a 24x7 world. The Internet is always on. Information should be available when it makes sense.
I'm not denying the existence of circadian rhythm, nor am I suggesting that news organizations forget all about daily cycles. Most people get up in the morning to engage with a new day, and most newspapers will still be printing daily editions for a while yet. We still need to plan for days, and publish some information with respect to daily cycles.
But the daily cycle need not, and should not, rule our lives, or dictate the organizational metaphors we use to display information online.
-
One more reason why feds won't bail out newspapers
Newsosaur Alan Mutter lists a series of reasons why newspapers won't see any of the bailout money that's being passed around by the Treasury.
Here's one more: Diversity in media ownership is one of the incoming Obama administration's agenda items, and consolidation of ownership -- generally funded by heavy borrowing -- is very much a part of the newspaper industry's problem.
New-media folk don't like to admit it, but it's not all about the Internet. We have an ownership crisis.
-
Beware those derivative numbers
Editor and Publisher reports "'Time Spent' at Top Sites Still Declining." This is another case where numbers can fool you.
The Nielsen Online data "tracks the average time spent per person at a site during October." As the story notes, unique users "soared."
"Average time spent" is calculated by dividing the total online time by the number of unique visitors. When the uniques go up, the derivative "average" will drop unless the new users collectively match the behavior of the old/regular users.
Often they don't. There were a lot of election-related stories in October that may have attracted new, casual users who may have had no natural interest in anything else on the website. They may have been referred by the Drudge Report, Daily Kos or some other specialty site. Or they may have come from search engines. If you "get lucky" with a story that scores huge out-of-market traffic, you're going to "get unlucky" with your averages. No way around it.
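A quick back-of-the-envelope sketch in Python shows the effect. Every number here is invented for illustration and has nothing to do with the Nielsen data: a site with 1,000 regulars who each spend 20 minutes a month picks up 1,000 drive-by visitors who each spend one minute.

```python
def average_time_spent(total_minutes, unique_visitors):
    """Average time per person: total online time divided by uniques."""
    return total_minutes / unique_visitors


# Hypothetical month with regular readers only.
regular_visitors = 1000
regular_minutes = regular_visitors * 20   # 20 minutes apiece

# Hypothetical casual visitors brought in by one big story.
casual_visitors = 1000
casual_minutes = casual_visitors * 1      # one minute apiece

before = average_time_spent(regular_minutes, regular_visitors)
after = average_time_spent(regular_minutes + casual_minutes,
                           regular_visitors + casual_visitors)

print(before)  # 20.0
print(after)   # 10.5
```

Total time on the site went up and the regulars didn't change their behavior at all, yet the average was nearly cut in half.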
Search-engine optimization is a change many sites made between 2007 and 2008. On a news site, SEO may drive up unique users by bringing in out-of-market visitors with specific interest in only one story. These folks will generate only one pageview. The result will be depressed time-on-site averages.
If you're creating an incentive program for a site manager, you have to be very careful of issues like this. The time-on-site average is a useful metric in site management, but as always, it has to be considered in context with other measures.