Some assembly required

This blog is about unimplemented ideas. At least until they get ticked off; I suppose a few will eventually have implementations too, but fresh posts never will. That's the primary purpose of this blog: keeping track of ideas I'd like to dive into, or problems I'd like to see solved. Feel free to join me in implementing or further developing these ideas. I don't mind working solo, but it's a whole lot more fun working in concert!

Saturday, February 23, 2008

Greasemonkey @require libraries

Problem:

There is a shortage of well-abstracted micro-libraries to @require in Greasemonkey scripts.

Solution:

Define useful, minimal feature sets, and implement and/or advertise their specs to other prospective implementors.

Interfaces:


$x(xpath, root) / $X(xpath, root)

$x evaluates an XPath expression, returning an Array of nodes, or a string, number or boolean, depending on the type the expression evaluates to. $X returns only the first match (in document order). Both take an optional context node to resolve the expression from.
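
A minimal sketch of what an implementation could look like, assuming it wraps DOM Level 3 XPath's document.evaluate (the spec above names no particular implementation):

function $x(xpath, root) {
  root = root || document;
  var doc = root.ownerDocument || root;
  var got = doc.evaluate(xpath, root, null, XPathResult.ANY_TYPE, null);
  switch (got.resultType) {
    case XPathResult.STRING_TYPE:  return got.stringValue;
    case XPathResult.NUMBER_TYPE:  return got.numberValue;
    case XPathResult.BOOLEAN_TYPE: return got.booleanValue;
    default: // a node set; collect it into an Array
      var result = [], next;
      while ((next = got.iterateNext()))
        result.push(next);
      return result;
  }
}

function $X(xpath, root) { // first match only, or null
  var got = $x(xpath, root);
  return got instanceof Array ? (got.length ? got[0] : null) : got;
}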

wget(url, cb) / wget$x(url, cb, xpath) / wget$X(url, cb, xpath)

wget fetches an entity via GM_xmlhttpRequest, renders it into a DOM that can be queried by XPath, and passes it on to the callback (second parameter: url). The wget$x methods also do the slicing, passing the $x result, the document, the url and the xhr object to cb.
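
A sketch to match that description, again hedged: GM_xmlhttpRequest is the real Greasemonkey API, but the innerHTML trick for rendering the response into a queryable DOM, and handing the xhr object to wget's callback as a third argument, are assumptions:

function wget(url, cb) {
  GM_xmlhttpRequest({
    method: 'GET',
    url: url,
    onload: function (xhr) {
      var doc = document.createElement('div');
      doc.innerHTML = xhr.responseText; // render the entity into a DOM
      cb(doc, url, xhr);                // url as second parameter, per above
    }
  });
}

function wget$x(url, cb, xpath) {
  wget(url, function (doc, url, xhr) {
    cb($x(xpath, doc), doc, url, xhr); // the slicing, then everything else
  });
}

function wget$X(url, cb, xpath) {
  wget(url, function (doc, url, xhr) {
    cb($X(xpath, doc), doc, url, xhr);
  });
}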

MD5 singleton

MD5.string(data) yields the binary MD5 hash; MD5.hex(data) hex-encodes it prior to returning the result, and MD5.base64(data) does likewise with Base64 encoding.

SHA1 singleton

Like MD5, but for SHA1, and the string method is called "hash". It also takes an optional second argument, 8 or 16, for string width. MD5 ought to be updated to mimic this API.
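
A usage sketch of the two digest singletons as specified above; the hex outputs are the standard "abc" test vectors for each algorithm:

var m = MD5.hex('abc');        // "900150983cd24fb0d6963f7d28e17f72"
var s = SHA1.hex('abc');       // "a9993e364706816aba3e25717850c26c9cd0d89d"
var raw = SHA1.hash('abc', 8); // binary hash; 8-bit string width, per above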

Diff singleton

Renders a text/html diff of two text/plain inputs (plain text only, unfortunately). Usage:
var diff = Diff.diff_main(text1, text2); // raw diff of the two inputs
Diff.diff_cleanupSemantic(diff);         // clean it up for human eyes
var html = Diff.diff_prettyHtml(diff);   // render the result as text/html


Notes:

  • Incomplete.

  • Update incrementally.

Saturday, January 27, 2007

Track changed URLs via Subversion

Problem:

Online resources don't provide a standardized way of tracking changes, much less one well integrated with feed readers and other technology suited for news coverage.

Solution:

Set up an agent that polls tracked URLs for changes, committing any changes to a Subversion repository. Provide feeds, linking to diffs and noting their sizes.

Elevator pitch:

Ever tried to track changes to some online API, TOS or similar by hand? Of course not; it's way too messy and far too laborious. And there are just too many of them anyway; staying on top of each is impossible. It's not a task for humans.

Ever thought of how neat it would be if you could just point some automated agent at your URL of choice, whenever you find the need, and have it magically track that URL, committing new changes, once they appear, to a Subversion repository?

And get a changes timeline you can tune in to instead, perusing the diffs at your leisure? Well, that is the basic idea of this service. Coupled with feeds for your feed reader, of course, so you can forget about them all while nothing happens, and be alerted those times when there is some action.

All without having to opt in on lots of blogs and the like, which may or may not announce the news, and which will most certainly not announce it in any standard format that is easy to discover and digest. See, that's what you need Swwwoon for.

Method:

Per host, or perhaps per some otherwise intelligently grouped cluster of URLs, iterate:

New URL:

  1. Fetch URL and Content-type header

  2. Store as ${repository}/${URL hostname}/${URL path}

  3. svn propset svn:mime-type ${Content-type} ${path}

Old URL:

  1. Fetch URL and Content-type header

  2. Update file contents and mime-type in working copy.

  3. svn diff ${path} | format_commit_message

Finally, commit the batch: svn ci -m ${commit message} ${paths}
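
A rough sketch of one such iteration, per URL. Node.js is an assumption here (the post names no implementation language), as are the working copy path and helper names; it leans on the svn command line client and Node 18+ for the global fetch:

const { execFileSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const WC = '/var/swwwoon/wc'; // Subversion working copy root (illustrative)

async function track(url) {
  const res = await fetch(url);                // 1. fetch URL + Content-type
  const body = Buffer.from(await res.arrayBuffer());
  const mime = res.headers.get('content-type') || 'application/octet-stream';

  const u = new URL(url);                      // 2. store as hostname/path
  const file = path.join(WC, u.hostname, u.pathname);
  const isNew = !fs.existsSync(file);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, body);

  if (isNew) execFileSync('svn', ['add', '--parents', file]);
  execFileSync('svn', ['propset', 'svn:mime-type', mime, file]); // 3.

  const diff = execFileSync('svn', ['diff', file]).toString();
  if (!isNew && !diff) return;                 // nothing changed; no commit
  execFileSync('svn', ['ci', '-m', commitMessage(url, mime, diff), file]);
}

function commitMessage(url, mime, diff) {
  // Machine-readable, as the notes below suggest: content type + diff size.
  const plus = (diff.match(/^\+[^+]/gm) || []).length;
  const minus = (diff.match(/^-[^-]/gm) || []).length;
  return url + ' ' + mime + ' +' + plus + '/-' + minus + ' lines';
}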

Requirements:

  • Interface to add a URL to track (web page, bookmarklet)

  • Change tracker agent run at regular intervals

  • Feed generator: per user, per repository (optional), and an aggregate of everything

  • A database of:
    • urls

    • users

    • which users track which urls

    • optional: urls & credentials to remote repositories

Interface:

Add URL:

http://swwwoon/add?url=[URL encoded URL]

Feeds:

  • http://swwwoon/feeds/user/${user public id}
    - all the user's tracked URLs

  • http://swwwoon/feeds/url/${repository path}
    - changes to a single URL

  • http://swwwoon/feeds/subtree/${partial repository path}
    - changes to all URLs in the subtree rooted at the given prefix

  • http://swwwoon/feeds/domain/google.com
    - anything covered at *.google.com

  • Perhaps combinations of the above

Notes:

  • Multiple users tracking the same resource don't waste resources in proportion to their numbers. This is a very good property.

  • Some normalization scheme may prove necessary to cover URL query strings too, when present. Or, alternatively, disallowing them outright.

  • Making use of HTTP/1.1 If-Modified-Since, when possible, might prove a worthwhile optimization. It likely won't catch changes of Content-Type due to bugs in web servers or their configuration, though, so it is probably worth verifying with a HEAD request even when given a 304 Not Modified.

  • Good commit messages aren't in ready supply, but listing content type (for new files; from/to when changed), file lengths (in lines too, for text/* and some XML formats) and, for text variants, diff sizes (+20/-3 lines) makes a good start. A machine-readable format is good, so the feed generator can present a set of nicely annotated links to online diff browsers (via Code Librarian or Trac, for instance).

  • Pass a private user id cookie to associate the URL with your account.

  • Good user ids are chunks of opaque, random ASCII; for instance, a cryptographic hash of a private key and an auto_increment user number. Forcing logins and passwords on users is annoying; invent good private keys instead (see the sketch after these notes).

  • A public id can be shared with others without issues. Given the open nature suggested here, where anyone can see anyone else's tracked URLs, plain integers would work.
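
A sketch of that id scheme, with Node.js crypto assumed and the exact hash construction purely illustrative:

const crypto = require('crypto');
const SECRET = process.env.SWWWOON_SECRET; // the site's private key

function privateId(userNumber) {           // opaque random-looking ASCII
  return crypto.createHash('sha1')
               .update(SECRET + ':' + userNumber)
               .digest('hex');
}

// Public ids, by contrast, can be the plain auto_increment integers.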

Friday, June 30, 2006

Roll your own full post feed

Problem:

Bloggers and other feed providers that don't treat visitors to a full feed.

Solution:

Convert a partial feed to a full feed.

Method:

  • Scrape a partial feed for URLs to full posts.

  • Cut out the portion of the page that contains the post.

  • Encode it into another feed, based in all other aspects on the partial feed.

Requirements:

  • (partial) Feed URL.

  • (per-feed) XPath selector that slices out the post content node.

  • Web server to run the feed-to-feed translator.

Interface:

http://[base url of f2f translator]?feed=[URL encoded feed URL]&post=[URL encoded XPath expression that slices out the full post content node from a full post page]

...producing a new feed to the specs of the two parameters.
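
A minimal sketch of the fetch-and-compose engine behind that interface, under a few assumptions the post leaves open: Node.js (18+, for the global fetch) as the runtime, Atom as the one feed format (per the notes below), and the xpath and @xmldom/xmldom npm modules as the XML toolchain:

const xpath = require('xpath');
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');
const ATOM = 'http://www.w3.org/2005/Atom';

async function fullFeed(feedUrl, postXPath) {
  const feed = new DOMParser().parseFromString(
    await (await fetch(feedUrl)).text(), 'text/xml');

  for (const entry of xpath.select("//*[local-name()='entry']", feed)) {
    // Scrape the partial feed for the URL of the full post:
    const href = xpath.select1(
      "*[local-name()='link'][@rel='alternate']/@href", entry);
    if (!href) continue;

    // Cut out the portion of the page that contains the post:
    const page = new DOMParser().parseFromString(
      await (await fetch(href.value)).text(), 'text/html');
    const post = xpath.select1(postXPath, page);
    if (!post) continue;

    // Encode it into the new feed, replacing the partial content:
    let content = xpath.select1("*[local-name()='content']", entry);
    if (!content) {
      content = feed.createElementNS(ATOM, 'content');
      entry.appendChild(content);
    }
    while (content.firstChild) content.removeChild(content.firstChild);
    content.setAttribute('type', 'html');
    content.appendChild(
      feed.createTextNode(new XMLSerializer().serializeToString(post)));
  }
  return new XMLSerializer().serializeToString(feed);
}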

Notes:

  • It's probably a good idea to limit the feed handling to one feed format.

  • I would adopt Atom, and put the Google Reader backend to use for the conversion step.

  • It might prove useful to adopt some caching strategy, performing each URL + XPath extraction only once per new post, rather than every time a subscriber fetches the feed.

  • OTOH, that would probably also create a need for cacheability checks. Cacheability metadata from the feed, and the HTTP headers of the post URL itself, should provide enough guidelines.

  • It's likely that Google Reader (or other online aggregators) could be put to good use for the republication and caching layer, granted that feed subscribers are routed through that service (via HTTP redirects), whereas the Google Reader spider invokes the fetch-and-compose engine.

This isn't an advocacy post; the web is simply mine to consume however I please. I pick my browser, my feed reader and, true to my habits, my way of browsing, including when and why I opt to visit a site to read a post, and when I stay in my feed reader.

Command line tools for XPath extraction of data from a given HTML file or URL might prove useful for quick prototyping; any available?

Monday, March 27, 2006

Generic image browser Greasemonkey script

I used to keep two bookmarklets around to do quick image browsing from directory lists, and the like:


The first one even had some keyboard commands that once upon a time enabled zooming to the next and previous image, and centering the present image vertically, by pressing shift, ctrl and return, but I believe the code rotted at some point, and either way, bookmarklets are painful to maintain if you don't keep a copy of the code around from prior to minimization. I tend to misplace such data.

Anyway, I'd like to revisit the idea and make this into a handy Greasemonkey script I can keep around, and perhaps auto-invoke on some pages too. The only problem I want to address first is finding some really good photo album viewer / image browser to raid for good UI ideas. On invocation, the script should proceed to:

  • Pick out all linked images in the page, as do the above bookmarklets (see the sketch after this list).

  • Rewrite the page to drop all prior content, replacing it with the album viewer.

  • Add convenient browsing hotkeys for operations like "drop image from album view", "focus previous/next image", "zoom image to full screen"; any others?
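
A sketch of those first two steps, reusing the $x helper from the @require post above; the Greasemonkey environment and all names are assumptions, and hotkeys are left out:

function albumize() {
  var images = $x('//a[@href]').filter(function (a) {
    return /\.(jpe?g|png|gif)([?#].*)?$/i.test(a.href); // linked images only
  }).map(function (a) { return a.href; });

  document.body.innerHTML = '';            // drop all prior content...
  images.forEach(function (url) {          // ...showing the album instead
    var img = document.createElement('img');
    img.src = url;
    img.style.maxWidth = '100%';
    document.body.appendChild(img);
  });
}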

I'm also considering something similar to the configuration screen of Mark my links for adding sites or URLs matching some given regexps to always invoke the album view for, right after the page has loaded. Similarly for pages whose referrers match given regexps.

But first I'll have to find some good album viewers. Suggestions? While a bit off target, LightBox (and jQuery greybox redux, for that matter) has some nice visual properties that might be worth borrowing too, though I have yet to come across a neat and tidy image browser without the clutter soup. I'm thinking of something about as tidy as the Google (Web) Search front page of a few years ago (more clutter has found its way in since, though it's still good).

Tips and feedback very welcome.

Friday, February 03, 2006

Link verifier bookmarklet / Greasemonkey script

Create a bookmarklet that, perhaps in concert with a Greasemonkey script (to spider off-site links), processes all a[@href] elements in the page it is invoked on, checks whether they work, and marks those that do not, for instance with style.textDecoration = 'line-through'.

Hm. Or maybe even just a Greasemonkey script which does all of that itself, invoked via the menu commands Greasemonkey lets scripts register.
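
A sketch of that Greasemonkey variant; GM_xmlhttpRequest and GM_registerMenuCommand are the real APIs, while the function name and the $x helper (from the @require post above) are borrowed assumptions:

function verifyLinks() {
  $x('//a[@href]').forEach(function (a) {
    GM_xmlhttpRequest({
      method: 'HEAD',                      // cheap existence check
      url: a.href,
      onload: function (xhr) {
        if (xhr.status >= 400)             // broken link: strike it through
          a.style.textDecoration = 'line-through';
      },
      onerror: function () {
        a.style.textDecoration = 'line-through';
      }
    });
  });
}

GM_registerMenuCommand('Verify links', verifyLinks);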

Thursday, December 22, 2005

Graphic Emote Greasemonkey Script

Doing graphic emoticons on Blogger is possible. I've done it on my personal blog since I started. What I do is simple -- I write the post normally, as if using text emotes, and then go back through the code when I'm done and manually change them all to something like:
<img src="http://www.xanga.com/images/winky.gif" class="emote" alt=";)" />
which works just fine. Why do I use Xanga emoticons? Well, despite how ugly they are, I prefer them to text emotes. If I found another service with easily accessible emote images, I might switch to using them.

The idea, then, is to make a Greasemonkey script that runs on clicking 'Publish Post' and automatically goes through and replaces certain emote strings ( ;), :), :D, :P, etc ) with the appropriate image-tag code.
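
A sketch of the replacement step; only the winky image URL appears in the post itself, so the rest of the emote map, and the function name, are illustrative:

var emotes = {
  ';)': 'http://www.xanga.com/images/winky.gif'
  // ':)', ':D', ':P', etc would map to their image URLs similarly
};

function emotify(html) {
  for (var emote in emotes) {
    var img = '<img src="' + emotes[emote] + '" class="emote" alt="'
            + emote + '" />';
    // split/join does a global replace without having to escape regexp
    // metacharacters like the ) in the emote strings:
    html = html.split(emote).join(img);
  }
  return html;
}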

Friday, December 16, 2005

Tagged OPML to navbar topic links

An OPML list of feeds can have a set of tags added to each feed, for instance to make topic navigation easier in your feed reader -- Google Reader, for instance. Given a well-kept OPML feed, with quality tags for good feeds you would want to recommend on your site, it should be rather simple to autogenerate a navigation bar of feed links, broken down by topic.
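
A sketch of that grouping step, assuming tags show up as nested <outline> folders (as in Google Reader's OPML export) and reusing the $x helper from the @require post above:

function feedsByTopic(opmlDoc) {
  var topics = {};
  $x('//outline[@xmlUrl]', opmlDoc).forEach(function (feed) {
    var topic = feed.parentNode.getAttribute('title') || 'untagged';
    (topics[topic] = topics[topic] || []).push({
      title: feed.getAttribute('title'),
      xmlUrl: feed.getAttribute('xmlUrl')
    });
  });
  return topics;
}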

Unfortunately, the OPML feed does not provide both feed and blog URLs, at least not as exported from Google Reader, so it is just halfway to a site navigation panel of related sites. But it is a start, and the missing bits would probably be easy enough to pick out of each linked feed; Atom feeds, for instance, have a nice /link[@rel="alternate"]/@href attribute that should be a good pick for such links.