Some assembly required

This blog is about unimplemented ideas. At least until they get ticked off; I suppose a few eventually will have implementations too, but fresh posts never will. Because that's the primary purpose of this blog: keeping track of ideas I'd like to dive into, or problems I'd like to see solved. Feel free to join me in implementing, or further developing these ideas. I don't mind working solo, but it's a whole lot more fun working in concert!

Tuesday, November 29, 2005

MochiKit regexp visualizer

This RegExp visualizer was rather well executed; in some ways better than the one that has been sitting unimplemented in the back of my head ever since I did my first hack using the RegExp match object's index and length properties to do I-don't-remember-what (count word lengths and embolden all words longer than N characters, perhaps).

It lacks only one thing: clearly marking which parts of the input text belong to which paren pair. My own plan was to make the entire input text visible somewhere in the page, then style match 0 (all text matched by the regexp) bold, and matches 1 (the first paren) through N with differently colored border-bottom:s at growing padding-bottom distances (since a particular character can be part of many matched parens at the same time, and hence have multiple underlines), with a title attribute on each such span stating which parens it is part of.
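A minimal sketch of that styling plan, assuming the start and end offsets of every group are already known. The `GroupRange` shape, the `renderGroups` name, the palette and the 3px padding step are all made up for illustration; nothing here comes from MochiKit.

```typescript
// Hypothetical input: group 0 is the whole match, 1..N are the parens.
// `end` is exclusive.
interface GroupRange { group: number; start: number; end: number }

const COLORS = ["red", "green", "blue", "orange", "purple"]; // arbitrary palette

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderGroups(text: string, ranges: GroupRange[]): string {
  // Cut the text at every range boundary, so that each piece is covered
  // by a fixed set of groups.
  const cuts: number[] = [0, text.length];
  ranges.forEach(r => cuts.push(r.start, r.end));
  const points = cuts
    .sort((a, b) => a - b)
    .filter((p, i, arr) => i === 0 || p !== arr[i - 1]);

  let html = "";
  for (let i = 0; i + 1 < points.length; i++) {
    const from = points[i];
    const to = points[i + 1];
    let piece = escapeHtml(text.slice(from, to));
    // Wrap innermost-first: higher paren numbers end up as outer spans
    // with a larger padding-bottom, so the underlines stack downwards.
    const covering = ranges
      .filter(r => r.start <= from && to <= r.end)
      .sort((a, b) => a.group - b.group);
    for (const r of covering) {
      if (r.group === 0) {
        piece = `<b>${piece}</b>`;
      } else {
        const color = COLORS[(r.group - 1) % COLORS.length];
        piece = `<span title="paren ${r.group}" style="border-bottom:2px solid ${color};` +
          `padding-bottom:${r.group * 3}px">${piece}</span>`;
      }
    }
    html += piece;
  }
  return html;
}
```

Each piece of text gets one nested span per paren that covers it, so a character inside three parens carries three underlines and three title attributes (the innermost title wins on hover, which a real version would probably want to merge).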

MochiKit, again, seems like a really pleasant framework to work with; I've got to play with this. Really. This just might be the ideal entrance door to doing it, too.

The half-second delay until things happen is just silly, though; it ought to be more like a tenth or even a fiftieth of a second. Computers are fast these days, and this feels like an AJAX application running home to a mother server for answers, rather than something done fully client side. It's sluggish, and for no good reason, either.

2 Comments:

Blogger Bob Ippolito said...

Well, unless you write a RegExp parser yourself, you're not going to be able to correctly extract the start and end indices of the groups from the match. Of course, you can search for the result in the string, but it's definitely possible to construct regular expressions that are not correctly displayed with that method (think: nested parens). I'd definitely have implemented that feature if I had thought of a way to do it correctly without writing too much code.
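A small sketch of the failure mode described above, assuming the "search for the result in the string" approach is done with `indexOf` on the overall match. The function name `naiveGroupStart` is hypothetical.

```typescript
// Naive lookup: find the captured text inside the whole match and assume
// the first occurrence is where the group actually matched.
function naiveGroupStart(m: RegExpExecArray, group: number): number {
  const captured = m[group];
  if (captured === undefined) return -1; // group did not take part in the match
  return m.index + m[0].indexOf(captured);
}
```

With nested parens such as `/(a(a))/` run against `"xaa"`, group 2 really matched the second "a" at index 2, but `naiveGroupStart` reports index 1, because the first "a" inside the match has the same text.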

The reason I wrote it was to debug a very big, ugly, and slow regex. With a small delay, it was bad UI because JavaScript execution blocks all user interaction. I cranked it up to half a second so that it'd only try and parse/update when I was clearly ready to see something. I left it that way for the example to emphasize the pattern it was implementing.

That said, if you'd like to improve upon it I'll gladly accept changes. You can shoot them to the mailing list, trac, and/or me directly via email.

 
Blogger Johan Sundström said...

Ah, I didn't remember the provisions for paren match locations being as bad as they are; for solving the generic case with some guaranteed upper bound on execution time, I might be prepared to agree.

I might try a solution that will often work in practice, without solving the generic problem (it will fail to find the location under some conditions). Once you have a match, to determine where a paren matched, iterate through all the places where that paren's matched substring occurs. When there is just one, we know its location. When there are more, try executing the regexp again with the n:th occurrence replaced with something else. If the regexp no longer matches, we can be reasonably sure that was the spot. (It could be worth iterating through all occurrences and verifying that this was the only spot that yielded a non-match; if we get several, we only really know that we don't know much.)
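Roughly, that heuristic could look like the sketch below. It is an untested idea put into code, not a solution: the name `locateGroup` and the NUL filler character are assumptions, only the `i` and `m` flags are carried over, and candidates are only looked for inside the overall match.

```typescript
// Returns the guessed start index of `group`, or null when the heuristic
// cannot tell (no match, empty/missing group, or an ambiguous result).
function locateGroup(re: RegExp, input: string, group: number,
                     filler = "\u0000"): number | null {
  // Rebuild without the global flag so lastIndex state cannot interfere.
  const flags = (re.ignoreCase ? "i" : "") + (re.multiline ? "m" : "");
  const plain = new RegExp(re.source, flags);
  const m = plain.exec(input);
  if (m === null || m[group] === undefined || m[group] === "") return null;

  const captured = m[group];
  const matchEnd = m.index + m[0].length;

  // Every occurrence of the captured text inside the overall match.
  const candidates: number[] = [];
  for (let i = input.indexOf(captured, m.index);
       i !== -1 && i + captured.length <= matchEnd;
       i = input.indexOf(captured, i + 1)) {
    candidates.push(i);
  }
  if (candidates.length === 1) return candidates[0];

  // Knock out one occurrence at a time and see whether the regexp
  // still matches anywhere in the mutated input.
  const breaking = candidates.filter(i => {
    const blank = new Array(captured.length + 1).join(filler);
    const mutated = input.slice(0, i) + blank + input.slice(i + captured.length);
    return !plain.test(mutated);
  });

  // Exactly one breaking spot: reasonably sure. Otherwise: we don't know.
  return breaking.length === 1 ? breaking[0] : null;
}
```

For `/a.*(a)b/` against `"aaab"` only the knockout at index 2 breaks the match, so that index is returned. For the nested `/(a(a))/` against `"xaa"` both knockouts break the match, so the function gives up and returns null, which is the "we don't know much" case.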

Regarding delays, I think I'll try some in-page gauging of how long (wall) time compiling and executing the regexp takes on average on the machine that runs the script, and adapt the delays accordingly (by some margin), to do as well as each client would allow without ruining the user experience.
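As a sketch of what that gauging might look like: time a handful of compile-and-execute rounds, then scale the average by a safety margin and clamp it. The function names, the margin of 3, and the 20 ms and 500 ms bounds (a fiftieth of a second and the current half second) are all assumptions picked for illustration.

```typescript
// Average wall time, in milliseconds, of compiling and executing the
// regexp once. Date.now() is coarse, hence the repeated runs.
function measureRegexpMs(source: string, input: string, runs = 20): number {
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    new RegExp(source).exec(input);
  }
  return (Date.now() - start) / runs;
}

// Turn a measured average into an update delay: scale by a margin,
// then clamp between a floor and a ceiling.
function adaptiveDelayMs(avgMs: number, margin = 3,
                         minMs = 20, maxMs = 500): number {
  return Math.min(maxMs, Math.max(minMs, Math.ceil(avgMs * margin)));
}
```

A fast regexp on a fast machine would then settle at the 20 ms floor, while a big, ugly, slow one averaging 50 ms per run would get a 150 ms delay.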

"That said, if you'd like to improve upon it I'll gladly accept changes. You can shoot them to the mailing list, trac, and/or me directly via email."

I'm not sure where any of those are in URL space (I'm still not into the mochikit community and its usage), though I presume the mailing list would refer to mochikit at googlegroups. Anyway, I'll try to get in touch if and when I do something about this that improves on the status quo.

 
