Steve Gillmor's attention.xml sounds complicated - can't wait to see some running code

As an ex-software developer, I think this sounds complicated but doable, and it is the best explanation of attention.xml yet (maybe it should be called attention.RSS :-) ?). I hope to see some running code and/or discuss this with Steve at Gnomedex (assuming I can scrape together the $400, which I believe is a good deal for a two-day conference with breakfast and lunch included!).

From Waiting for Attention… Or Something like It:

QUOTE

To Dare's point, here's how I would describe an appropriate information triage algorithm:

1. Separate duplicate links into two bins, those with unique citations, and those with a delicious/furl-like tagging format. Next, correlate duplicates in both groups with the author's attention rank (where on the OPML or other list the feed resides, based on my priority of reading them). The goal here is to separate the attention ranking dynamic from the presentation of feed data. I don't want to see the same link over and over again, especially if the link is disguised by different text in multiple posts, but I do want to know if 6 of my top influentials or reputational filters have cited the same post and use that metadata to push the item and/or feed up my priority list.

2. Throw away or lower the priority of duplicate links, reorder the unique citations around their relative ranking in my attention list, and add other posts by the same authors. If my reading characteristics of a certain feed show I typically read all of the author's posts, include them all. If I typically only read one or two, batch the rest together at a lower priority. Apply vanity and topic feeds next, pulling in keyword hits at a weighting corresponding to how many duplicates there are with the previous scan. Throw away or lower the types of vanity or blogroll hits that typically populate vanity feeds (I want to see hits for "Gillmor" but not those of articles I wrote, hits on my brother but not before hits about me, except those about the Long Tail video that cite my brother instead of me, and so on.)

3. Now assign unique ids to this sort and track my readership patterns (and those of any who subscribe to my attention feed that make that data public to me) not just for what I read but what I don't read. Then apply that weighting data to incoming data on an updating basis. This should cull repeated hits that somehow escape my first layer of filters, and also provide valuable data for marketers should I deliver that back to the cloud or to some private cloud under contract.

4. Now we get into discovery: items that have escaped both my subs and search feeds. First I go to my reputational thought leaders, the subs and recurring items that bubble to the top of my attention list. It's a second-degree-of-separation effect, where the feeds and items that a Jon Udell and a Doc Searls and a Dave Winer prioritize are gleaned for hits and duplicates, and returned as a weighted stream. In turn, each of those hits can be measured for that author's patterns and added in to provide a descending algorithm of influence. All the while, what is not bubbling up is driven further down the stack, making more room for more valuable content.

It's important to remember that this is an open pool of data, massaged, sliced, and diced by not just a Technorati but a PubSub or Feedster or Google or Yahoo or any other service, and my inforouter will gladly consume any return feed and weight it according to the success (or lack of it) that the service provides in improving my results. Proprietary services can play here as well, by providing their unique (and contractually personalized) streams as both a value proposition for advertisers and marketers and as an attractor for more users of their service.

UNQUOTE
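
While waiting for running code, here is a minimal Python sketch of how steps 1 and 2 might look. It is not attention.xml and not Steve's implementation. The names Item and triage, and the linear weighting (top feed on the list gets the most points), are assumptions made up for illustration. It collapses posts that cite the same link into one entry and scores that entry by where the citing feeds sit on an ordered attention list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one post; not part of any attention.xml spec."""
    feed: str   # feed (author) that published the post
    link: str   # URL the post cites
    title: str  # link text, which may differ from post to post


def triage(items, attention_list):
    """Collapse duplicate links and order them by who cited them.

    attention_list is an ordered list of feed names, most important first
    (a stand-in for the OPML reading order). Feeds that are not on the
    list add nothing to a link's score.
    """
    n = len(attention_list)
    # Assumed weighting: the top feed is worth n points, the last feed 1.
    weight = {feed: n - i for i, feed in enumerate(attention_list)}

    # Group by link so the same URL shows up once, whatever its link text.
    citing_feeds = defaultdict(set)
    for item in items:
        citing_feeds[item.link].add(item.feed)

    ranked = []
    for link, feeds in citing_feeds.items():
        score = sum(weight.get(feed, 0) for feed in feeds)
        ranked.append((link, score, sorted(feeds)))

    # Highest score first; ties are broken by link so the order is stable.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked
```

Called with an attention list of three feeds and a handful of items, triage returns one row per distinct link, with links cited by several highly ranked feeds at the top and links cited only by unlisted feeds at the bottom. Tracking what I do and don't read (step 3) and second-degree discovery (step 4) would need reading history that this sketch does not model.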