Perl: substituting without loading a big file

slaniel | Uncategorized | Saturday, April 29th, 2006

This is perhaps related to a question I asked a while ago, but it’s clearer. In MySQL dumps, one has lots of chunks that look like so:

CREATE TABLE tablename (
  id int(11) NOT NULL auto_increment,
  type tinytext NOT NULL,
  value text NOT NULL,
  score int(11) NOT NULL default '0',
  trust int(11) NOT NULL default '0',
  comments text NOT NULL,
  added datetime NOT NULL default '0000-00-00 00:00:00',
  addedby tinytext NOT NULL,
  lastused datetime NOT NULL default '0000-00-00 00:00:00',
  usedcount int(11) NOT NULL default '0',
  userreviewed enum('yes','no') NOT NULL default 'no',
  PRIMARY KEY (id)
) TYPE=MyISAM;

To convert a table to InnoDB (the newer, sleeker, sexier database engine), one just replaces TYPE=MyISAM with TYPE=INNODB (modulo capitalization). So then if one were to write a Perl script for this, one would do something like

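$infile =~ s{(CREATE TABLE.*?)TYPE=MyISAM}{$1TYPE=INNODB}gsi;

where the s bit at the end is necessary because the input is a multiline string: it lets . match newlines, so a single match can run from CREATE TABLE on one line to TYPE=MyISAM on a later one. (An m flag isn't needed; it only changes what ^ and $ match.) And that’s where the trouble comes in. I can’t see any way around reading the whole file in first, via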

$infile = join '', <STDIN>; 

Now $infile contains a potentially huge object; I’m dealing now with a 250-megabyte database. Presumably Perl has some built-in intelligence to deal with such cases, but I’m not sure how it would solve the problem. What one wants is to do this line-by-line —

while(<>) { doSomeStuff(); } 

— but that’s impossible here, because CREATE TABLE starts on one line and TYPE=MyISAM appears on another.

So is there any way to do this substitution without reading in the whole thing?

I’ll go do some tests now to see if Perl actually does suck up as much memory on this as I suspect it does.

P.S.: Indeed it does. In fact after slowing my machine to a halt, Linux killed the script.

This must be a well-known problem with a well-known solution. And of course I really want to be using a context-free grammar here rather than a regular expression. But the problem of loading in the whole file before I can parse it will still be with me even if I use such a grammar.
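One way around slurping is to stream the dump and keep a little state. This is a minimal sketch in Python, not a tested tool. It assumes that each CREATE TABLE statement ends on the first line ending in a semicolon, as in the example above. It reads line by line, remembers whether it is inside a CREATE TABLE, and rewrites TYPE=MyISAM only while it is, so INSERT data that happens to contain that text is left alone. The names convert, CREATE_TABLE and TYPE_MYISAM are made up for illustration. In Perl itself, setting the input record separator $/ to ";\n" should have a similar effect, since <> would then return one statement at a time rather than one line.

```python
import re
from typing import Iterable, Iterator

# Start of a CREATE TABLE statement (case-insensitive).
CREATE_TABLE = re.compile(r"^\s*CREATE\s+TABLE\b", re.IGNORECASE)
# The engine clause to rewrite, matched modulo capitalization.
TYPE_MYISAM = re.compile(r"\bTYPE\s*=\s*MyISAM\b", re.IGNORECASE)


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield the dump line by line, rewriting TYPE=MyISAM only inside CREATE TABLE."""
    in_create = False
    for line in lines:
        if CREATE_TABLE.match(line):
            in_create = True
        if in_create:
            line = TYPE_MYISAM.sub("TYPE=INNODB", line)
            # Assumption: the statement ends on the first line ending in ';'.
            if line.rstrip().endswith(";"):
                in_create = False
        yield line
```

You can wire it to standard input and output with something like sys.stdout.writelines(convert(sys.stdin)). Only the current line is held in memory, so a 250-megabyte dump should not need 250 megabytes of RAM.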

Libertarian sci-fi

slaniel | Uncategorized | Friday, April 28th, 2006

L. Neil Smith’s “works include the novels Pallas, The Forge of the Elders, and The Probability Broach, each of which won the Libertarian Futurist Society’s annual Prometheus Award for best libertarian novel.”

Sign me up! Libertarian sci-fi! Yes! Award-winning libertarian sci-fi, no less.

The great failing of all the sci-fi that I’ve read is that it takes one idea that would make a nice pamphlet, and expands it into a novel with terrible characterization and ridiculous dialogue. Now imagine the libertarians — known throughout the world for their subtle policy proposals and rich understanding of their fellow-man — giving out an award for the sci-fi novel that best exemplifies libertarianism.

I’ll be honest with you: I shed a single, proud tear when I thought of that.

Munging MySQL logs

slaniel | Uncategorized | Friday, April 28th, 2006

I have a bunch of logs from MySQL, all of whose entries look like so:

060424  9:25:38   41145 Connect     dbuser@localhost on
                  41145 Init DB     db
060424  9:25:39   41145 Query       SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
                  41145 Query       SELECT * FROM wp_users WHERE user_login = 'admin'
                  41145 Query       SELECT meta_key, meta_value FROM wp_usermeta WHERE user_id = '1'

and I needed to extract just the queries — i.e., get the above into this format:

SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
SELECT * FROM wp_users WHERE user_login = 'admin'
SELECT meta_key, meta_value FROM wp_usermeta WHERE user_id = '1'

It’s a trivial text-munging task, but text munging is what Unix does best. The following does the trick:

for i in $(find . -type f -not -iname '*_queries*'); do
    grep Query "$i" | sed -E 's/^.*Query[[:space:]]+//' > "${i//.log/_queries.log}"
done

There’s a better way to do this with xargs, probably, but the above did the trick. A large collection of these quick hacks to manipulate text makes me love Unix.

P.S.: It was actually a little harder than I thought. The trouble is that sometimes queries span multiple lines, and only the first line contains the word ‘Query’. So one has to be a little more creative to parse MySQL log files. Count one “log entry” as any number of lines whose first line begins with some amount of whitespace followed by a number, and whose subsequent lines do not. So then break up the log file into log entries, munge each of the entries in the way that I did above, basically, and spit out the results. What you’ll end up with is something like a Perl script I wrote.
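Here is a minimal Python sketch of the approach in that P.S. It is not the Perl script mentioned above. It assumes the general-query-log layout in the excerpt at the top of this post: an optional yymmdd date and time, then a connection id, then the command word. Lines are grouped into entries using the rule described above, and the SQL is kept from entries whose first line is a Query. The names log_entries, queries, ENTRY_START and QUERY_PREFIX are hypothetical. As with the rule itself, a continuation line of SQL that happens to begin with whitespace and a digit would be mistaken for the start of a new entry.

```python
import re
from typing import Iterable, Iterator, List

# Per the rule above: an entry starts with optional whitespace and then a digit
# (the yymmdd date stamp or the connection id).
ENTRY_START = re.compile(r"^\s*\d")

# Optional "yymmdd h:mm:ss", then the connection id, then the word Query.
QUERY_PREFIX = re.compile(r"^\s*(?:\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+)?\d+\s+Query\s+")


def log_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group raw log lines into entries: a start line plus its continuation lines."""
    entry: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) and entry:
            yield entry
            entry = []
        entry.append(line)
    if entry:
        yield entry


def queries(lines: Iterable[str]) -> Iterator[str]:
    """Yield the SQL text of each Query entry, multi-line queries included."""
    for entry in log_entries(lines):
        match = QUERY_PREFIX.match(entry[0])
        if match is None:
            continue  # Connect, Init DB, Quit and so on
        yield "\n".join([entry[0][match.end():]] + entry[1:])
```

Because both functions are generators, a large log is processed one entry at a time rather than read in whole.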

Humping

slaniel | Uncategorized | Friday, April 28th, 2006

So I’m awake at 2 a.m., which is rather rare, and I happened to sign into Friendster and look for a guy with whom I used to be friends in Somerville, whom I’ve not seen in years. I found him, which was a strange enough trip down memory lane. At the same time, there was a banner ad for some kind of movie or TV show called “Hostel,” which will apparently be out on DVD soon. Which got me riffing on hostels, and the ones I stayed in while in Europe. Which led me to remember one of my last nights in Paris, at a very terrible hostel where I was thoroughly bitten by bedbugs, and where I basically didn’t sleep at all because it was so hot that we had to keep the window open onto the incredibly busy Parisian intersection below.

More to the point, though, I was remembering my roommate while there. I had arrived at the hostel, quite broke and not sure whether I’d actually be able to pay for my time there (my parents wired me some cash, because they are awesome), and looked forward to getting a few moments of relaxation in my hostel room — a little time when I didn’t have to worry about money. So I was lying on my bunk bed, reading something or other, when in walks my roommate, briefly introduces himself, and tells me that he’s got a girl coming up, and yadda yadda . . . so could I piss off for a while? Which I did: I wandered around the street below, because I didn’t have enough money to afford the Métro fare. By the time I got back to the hostel, my parents had wired the money, and I had had a little to eat. I went to bed around 11.

. . . And was awoken at maybe 1:00 in the morning by furtive intercourse on the bunk below. There was a large mirror on the wall opposite the bed, so I had a very clear view of what was going on. It was the first time — and, up to now, the only time — that I have been in the same room while other people have been having sex. And it was incredibly weird.

My roommate was everything that I associate with students at the University of Vermont. As were most of the people I met while in hostels. Remind me never to stay in hostels ever again. But do remind me to stay in that quaint little inn in Bayeux. I just spent 15 minutes hunting for its name; no luck.

P.S.: The hostel in Paris was the Peace and Love Hostel, apparently. Don’t ever go there. It is terrible.

P.P.S.: Should you want to know all about my Europe travels, in mostly unexpurgated diaries, they are on the web. You can read about the excitement that befalls a low-income guy traveling around Europe on his own while receiving unemployment checks.

Cormen, Leiserson, Rivest

slaniel | Uncategorized | Thursday, April 27th, 2006

The first edition of CLR’s Intro to Algorithms is available used on Amazon for $10 or so, whereas used copies of the second edition start at $40. (New copies of the second edition are $80; no new copies of the first edition are available.) Should I buy the first edition? Is there anything wrong with it? I doubt that the second edition has much to say about the Ford-Fulkerson algorithm that the first edition failed to say.

I love buying slightly older editions of books. I’m free-riding on all the poor college students whose classes rapidly make their older editions obsolete.

So brutal

slaniel | Uncategorized | Thursday, April 27th, 2006

For sheer comedic brutality, it would be hard to top “Amazon 1-Click Bankrupts Area Parkinson’s Sufferer”. Though I do think The Onion did top it years ago with “Special Olympics T-Ball Stand Pitches Perfect Game.”

P.S.: “Scholars Discover 23 Blank Pages That May As Well Be Lost Samuel Beckett Play” is quite brilliant. The Onion used to be funny only in the headlines, but they’ve learned to make the articles live up to their promise.

Stacks and functional languages

slaniel | Uncategorized | Thursday, April 27th, 2006

If you have a programming language that doesn’t allow iteration and only allows recursion — as I understand ML, Haskell, and maybe Lisp do — how do you avoid running very rapidly out of memory? I assume they do just fine, but I’m curious how they do it.
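As far as I know, the usual answer is tail-call elimination. When a function's last act is to call another function, the compiler can reuse the current stack frame instead of pushing a new one. A tail-recursive function then runs in constant stack space, just like a loop. The Scheme standard requires this, and ML compilers generally do it as well. Python does not, so this sketch imitates it with a trampoline. Rather than making the tail call, the function returns it as a zero-argument "thunk", and a plain loop keeps invoking thunks until an answer comes back. The names fact_step and trampoline are made up for illustration.

```python
from typing import Callable, Union

# A step is either a finished answer (an int) or a zero-argument function
# that performs the next tail call when invoked.
Step = Union[int, Callable[[], "Step"]]


def fact_step(n: int, acc: int = 1) -> Step:
    """Tail-recursive factorial with an accumulator.

    The recursive call is the last thing this function would do, so instead
    of making it (and growing Python's stack) we hand it back as a thunk.
    """
    if n == 0:
        return acc
    return lambda: fact_step(n - 1, acc * n)


def trampoline(step: Step) -> int:
    """Keep invoking thunks until a plain value comes back.

    Each thunk returns before the next one is called, so the stack depth stays
    constant no matter how many "recursive" steps there are.
    """
    while callable(step):
        step = step()
    return step
```

For example, trampoline(fact_step(5000)) finishes without error. A plainly recursive factorial would exceed Python's default recursion limit of about a thousand frames.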

Lisp: if/then/else

slaniel | Uncategorized | Thursday, April 27th, 2006

If I’m not mistaken, John McCarthy invented the if/then/else construct in “Recursive Functions Of Symbolic Expressions And Their Computation By Machine”. He notates the factorial function as follows:

n! = (n = 0 → 1, T → n · (n-1)!)

where

(expression) → (expression)

means that if the expression on the left side evaluates to true, we use the expression on the right side, and the symbol “T” stands for “true”. McCarthy uses “T → expression” where we would use “else”, because when T is on the left-hand side, the right-hand side will always be evaluated.

So then a sequence of “expression → expression” clauses is evaluated in order; as soon as we find a clause whose left side is true, we use its right side and stop. In Lisp, then, I guess the factorial function would be defined like so:

(defun fact (n)
  (cond ((= n 0) 1)
        (t (* n (fact (- n 1))))))

(t in the code plays the role of T above: a test that always succeeds, so its clause acts as the “else.”)
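McCarthy's conditional expression can be imitated in Python to show how it evaluates. This is an illustrative sketch, not how Lisp implements cond. Each clause is a pair of zero-argument functions (a test and a result), and the clauses are tried in order. The first test that returns true decides the value, and a test of True plays the role of T. As I read McCarthy's paper, the value is undefined when no test is true, and the sketch treats that case as an error. The names cond and fact are my own.

```python
from typing import Any, Callable, Tuple

# A clause pairs a test with a result; both are zero-argument functions so
# that nothing is evaluated until cond decides it is needed.
Clause = Tuple[Callable[[], bool], Callable[[], Any]]


def cond(*clauses: Clause) -> Any:
    """Return the result of the first clause whose test is true,
    in the spirit of McCarthy's (p1 → e1, p2 → e2, ...)."""
    for test, result in clauses:
        if test():
            return result()
    # Treated as undefined (per my reading of McCarthy), so raise an error here.
    raise ValueError("no clause matched")


def fact(n: int) -> int:
    # n! = (n = 0 → 1, T → n · (n-1)!)
    return cond(
        (lambda: n == 0, lambda: 1),
        (lambda: True, lambda: n * fact(n - 1)),  # True plays the role of T
    )
```

The results are wrapped in lambdas for a reason. If n * fact(n - 1) were evaluated eagerly, it would be computed even when n is 0, and the recursion would never stop. This is why cond has to be a special form in Lisp rather than an ordinary function.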

What did people use before if/then/else? I assume there had to be an if in early languages; that seems elemental. You can replace for-loops with if and goto, à la (pseudocode)

function fact(n) {
    product = 1;
    k = 1;
  STARTLOOP:
    if (k > n) { goto REACHEDN; }
    product = product * k;
    k = k + 1;
    goto STARTLOOP;
  REACHEDN:
    return product;
}

but did earlier languages do without if altogether?

New editions of His Dark Materials

slaniel | Uncategorized | Thursday, April 27th, 2006

Damn you, world-that-wants-me-to-spend-money-on-books! The fine folk over at Current Config inform me that there are now trade-paperback editions of the His Dark Materials series: The Golden Compass, The Subtle Knife, and The Amber Spyglass — all with beautiful cover designs. And they’re not badly bound like every single mass-market paperback ever. I may need these on my shelves.

Speaking of which, a friend emailed me a few days ago to ask for my address, so that she could mail a book she had borrowed back to me. I told her that it’s much better for a book to be in someone’s hand than on my shelf, so I suggested that she pass it along to someone in her town who would enjoy it (the book was Paul Auster’s Oracle Night). I take it as a good sign that I’ve not received it yet.

Lisp

slaniel | Uncategorized | Tuesday, April 25th, 2006

I’m reading about Lisp right now, and I’m hitting stuff like this that initially makes my head hurt badly:

(defun pair. (x y)
  (cond ((and. (null. x) (null. y)) '())
        ((and. (not. (atom x)) (not. (atom y)))
         (cons (list (car x) (car y))
               (pair. (cdr x) (cdr y))))))

But it’s actually easier than it looks. The goal of this function is to take two lists, x and y, and construct a new list that pairs off their elements. So if x is (a b c) and y is (d e f), (pair. x y) should return ((a d) (b e) (c f)). The logic of pair. is like so:

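  1. If both x and y are empty lists, then return the empty list ().
  2. If neither x nor y is an atom (that is, both are non-empty lists), then build a new list whose first element is (list (car x) (car y)), the first elements of x and y paired up, and whose remaining elements are (pair. (cdr x) (cdr y)), the pairing of whatever is left.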

This is a recursive definition, so it works like so (in sort-of pseudocode):

(pair. (a b c) (d e f))
  = (a d) (pair. (b c) (e f))
  = (a d) (b e) (pair. (c) (f))
  = (a d) (b e) (c f) (pair. () ())
  = (a d) (b e) (c f) ()
  = ((a d) (b e) (c f))
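For comparison, here is the same two-clause recursion as a rough Python sketch, using Python lists in place of Lisp lists. The name pair is my own. Common Lisp's cond returns nil when no clause applies. That happens when the lists have different lengths, so the sketch returns the empty list in that case, which silently drops the leftover elements.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pair(x: Sequence[T], y: Sequence[T]) -> List[List[T]]:
    """Python analogue of pair.: pair off elements of x and y into two-element lists."""
    # Clause 1: both lists are empty, so return the empty list.
    if len(x) == 0 and len(y) == 0:
        return []
    # Clause 2: both are non-empty; pair the heads, then recurse on the tails.
    if len(x) > 0 and len(y) > 0:
        return [[x[0], y[0]]] + pair(x[1:], y[1:])
    # Neither clause applies (lengths differ): like cond returning nil.
    return []
```

Slicing and recursion make this quadratic and limit it to lists of roughly a thousand elements. That is fine for seeing the structure, though Python's built-in zip is what you'd actually use.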

Lisp looks like it should be scary, but so far it’s not.

(Hat tip and curses directed to Adam Rosi-Kessel, for pointing me to Paul Graham’s essay on software patents, which led me to his essay on Lisp.)

P.S.: The part just a short while after the definition of pair., where he writes a Lisp interpreter in something like 32 lines, is kind of a mind-blower. Friend Seth introduced this stuff to me a year or so ago, but I’m only just internalizing it.

P.P.S.: Over lunch I started reading John McCarthy’s paper in which he (it would seem) laid out the theory and practice of Lisp. The paper is called “Recursive Functions Of Symbolic Expressions And Their Computation By Machine”, and it’s quite good — very readable. And as it happens, the first 15 pages (as far as I got) are basically identical to Paul Graham’s paper — close enough that it’s borderline plagiarism.

Microsoft’s security UI

slaniel | Uncategorized | Tuesday, April 25th, 2006

Microsoft will apparently be building something like sudo into the next version of Windows. Apparently they also blew it horribly.

How much more does the OS have to suck before people bail on it? I realize that corporate America has an enormous installed base, but enough is enough.

Turing test

slaniel | Uncategorized | Sunday, April 23rd, 2006

Using Adam Rosi-Kessel’s code, I’ve now set up a little Turing test in the comment-submission code here. It should reduce the amount of comment spam I get. And I get a hell of a lot of it, some of which I can use a little script to delete, but much of which I need to delete by hand. Reducing the supply altogether is a better idea.

I ought to switch over to WordPress, and I may in fact do that today. It’s a much more sophisticated blog package than Blosxom, which is what I’ve been using ever since I switched from my little hand-rolled system. Among other things, WordPress has lots of anti-spam plugins, particularly Bad Behavior and Spam Karma. They seem really smart, and seem to do exactly what I need.

P.S.: I’m hacking around on the code to clean up accumulated hackery from perhaps two years of little attempts to defeat spam. Pardon our appearance in the meantime.

Honesty

slaniel | Caro, Robert | Sunday, April 23rd, 2006

Related to what Jason said, I’d like to propose a small axiom: honesty is highly prized in theory, but never in practice. People would really rather you not be honest with them. So don’t be. Don’t lie, but learn the art of the dodge. Say just enough that they think you’ve answered their question, but in fact you’ve said nothing. The two big lessons from Caro’s three-volume bio of Lyndon Johnson are these:

  1. Johnson did his best never to stand for anything — so that he could then never be held to account for his views when running for president. And he eventually won.
  2. It is never possible to kiss someone’s ass too much. Lyndon Johnson may be the greatest kiss-ass that the world has ever known.

I have yet to internalize these rules. But I am trying to learn.

Wealth of Networks

slaniel | Uncategorized | Sunday, April 23rd, 2006

I’m sad to report that Yochai Benkler’s new book, The Wealth of Networks, is nigh on unreadable, and would be wholly so if I weren’t already rather steeped in the tradition that he’s addressing. Among its failings:

  1. Benkler writes unclear sentences, and consistently uses fifteen-syllable words where one- or zero-syllable ones will do. A couple of examples among dozens that I’ve encountered in the first 150 pages:

    The most advanced economies in the world today have made two parallel shifts that, paradoxically, make possible a significant attenuation of the limitations that market-based production places on the pursuit of the political values central to liberal societies.

    And later:

    The content and context of an exaction will have a large effect on its efficacy as a device for affecting the choices of the person subject to its influence, and these could change from communication to communication for the same person, let alone for different individuals.

    He needs an editor very badly. Actually, among other things he needs a copyeditor to handle things like putting dashes between compound modifiers; at least one such mistake per page is slowly driving me nuts.

  2. Benkler wants to sound like a theorist at all times, even when he should be grabbing issues by the marrow. The quotes above are symptoms: rather than saying something like, “Suppose Joe loses some autonomy from thus-and-such . . . ” and then creating a second sentence that deals with Joe, Benkler slathers on the thinnest lacquer of theory to make it sound as though there were something deeper going on. Which leads to point 3.

  3. Benkler is unaware of his audience. If he’s aiming for a general audience, which it seems to me that Larry Lessig is aiming for (and hitting), he needs some flesh and blood. He needs to tell us, “Peer-to-peer technologies, Linux, the Wikipedia and so forth could one day lead to a rebirth of democracy.” Instead we get (I’m making this up, but only because I’m too lazy to hunt and find an identical sentence), “The new modalities of distributed peer-to-peer communications could effect increased dynamism in the political landscape, thereby creating significant attenuation in traditional market- or firm-based economic distribution paradigms.” If I see the word “attenuation” or “modality” again, I will scream. I anticipate much screaming in the next 350 pages. Anyway, all of the academic wording would be worthwhile if Benkler were aiming at people steeped in the Lessig/Litman/earlier-Benkler tradition and had much new stuff to offer us. But in the first 1/3 of the book, he’s not. Which leads to point 4.

  4. Benkler clearly wants to build a very careful argument from the ground up about the new possibilities that distributed peer production (like Linux or the Wikipedia) brings to democracy. Unfortunately (see point 3), this effort is either bad PR or a waste. If Benkler’s reader is new to the canon, then first of all his academic style will turn him away immediately. But secondly, a new reader doesn’t need an axiomatic treatment; he needs broad brushstrokes that clear the ground and form the intuitions. Whereas someone steeped in the tradition already knows what Benkler’s saying; I’ve gotten through 150 pages and have discovered very little new, interesting work. He’s building a work that’s virtually free of citations (cf. Lessig’s Code), because he is committed — in a fetishistic way — to building a purely self-contained argument. This book wants to be the Bible of the new world order. But sad as this might make Benkler, we already have such a Bible; it is Lessig’s Code, which is actually readable.

I really hope that this is all just buildup for the real show, which will come in the last 350 pages and will outshine Benkler’s earlier — brilliant — work. I hope. But if an author can’t get a reader — one who’s biased in the author’s favor from the start — interested within the first 1/3 of the book, what hope is there that he’ll do better in the final 2/3?

An open letter to the women of the world

slaniel | Uncategorized | Saturday, April 22nd, 2006

Dear women of the world,

You smell just fine without perfume. In fact you smell more than just fine; you smell addictively good. Millions of years of evolution have turned you into finely honed tools of sensory attraction. So why do you wear perfume? It would be one thing if you knew how to wear it, but many among your number slather it on as though it were water and you were taking a bath. We’re out of the 13th century; most of us in the industrialized countries take showers every day. You don’t need perfume to cover up some underlying dirtiness. As I sit here, I smell a lingering cloud of perfume left here five minutes ago by a woman who didn’t know how to apply perfume; I am actively repelled. Like salt on food or sugar in coffee, the point of perfume — when done right — is to bring out an underlying note that we might have missed. The point is not to smell the perfume itself. If you can’t wear it right, don’t wear it. Indeed, just don’t wear it at all. If you smell good, you don’t need it; if you don’t smell good, then maybe consider it. But instead of perfume, consider doing something about your body odor. Or consider doing something about the self-confidence that makes you think you need to mask your own beauty.

Likewise when you dress up for formal events. Why do you put on enough makeup that we see the makeup itself? Are we supposed to think, “Wow, what a beautiful woman, who can wear artificial skin”? No. The point is to cover up blemishes and maybe highlight some feature of your beauty. But in literally every single instance, the beautiful women I have known looked beautiful without makeup — or else they were doing an exceedingly subtle job of applying it. And plenty of beautiful women have mangled their looks through inept makeup application.

So just stop with the perfume and the makeup. You look great. Convince yourself that you’re beautiful, and you will no longer need any disguise.

—Steve

“Classified”

slaniel | Uncategorized | Saturday, April 22nd, 2006

I’d be interested in studying the propaganda effects of various keywords. For instance, how about this headline?

CIA Fires Employee Over Leak: Officer disclosed classified data to news media, including intelligence details, CIA spokesman says

It’s on the front page of the Washington Post right now. I wonder how many people will read the article, and how many have already judged the employee guilty. Does the presence of the word “classified” immediately make people think that she’s a traitor? I wonder whether documents are classified in part to encourage this sort of prejudgment. It’s fairly obvious that over the last thirty years (at least since the Pentagon Papers), document classification has often been used to cover up embarrassing information rather than to protect national security. But I wouldn’t be surprised if the word itself makes people think “national security” instead of “coverup.”

The Language Log on Dan Brown

slaniel | Uncategorized | Saturday, April 22nd, 2006

For some reason I was recently reminded of the Language Log (actually, I can be more specific: I was reminded of it because Crooked Timber mentioned that the Language Log will soon publish a book), so I went back and found all their posts on Dan Brown. They really despise his writing, and their glee at pointing out his grammatical blunders is infectious. (Now I’m self-conscious: will they poke fun at me for suggesting that one can catch glee?) Sometimes they go overboard in their criticism — for instance, their assertion that “lecture” is not the droid that Dan is looking for — but basically it’s just a great place to go for some snark. I love what Geoff Pullum’s son wrote about it:

The Da Vinci Code, page 30: "Five months ago, the kaleidoscope of power had been shaken, and Aringarosa was still reeling from the blow." What the fuck does that even mean? Perhaps he meant something like: "The kaleidoscope of power had been shaken and the orange-green pattern of courage had been consumed by the yellow-red jumble of fear"? 

P.S.: No, I’ve not read any of Dan Brown’s books, and I’m quite sure that I don’t want to. But I’ll gladly read the snark about him.

The fate of the MLK Library

slaniel | Uncategorized | Saturday, April 22nd, 2006

I wouldn’t feel particularly obligated to the MLK Library, despite its being built by Mies van der Rohe. In fact, possibly because it was built by van der Rohe, it is an eyesore. Only rare bits of Modernist architecture don’t scream “Fascist.” MLK is safely within the Modernist tradition, in that way.

(800) 234-1357

slaniel | Uncategorized | Friday, April 21st, 2006

I periodically — e.g., just now — get phone calls from (800) 234-1357, which tell me

  1. That they’ve tried several times to get in touch with me.
  2. That it’s not a sales or marketing call.
  3. That I can press 9 to call them right then, or call them back at (800) 234-1357.

I don’t know who this caller is, but I’ve not called them back. You’d think that a valid caller would actually tell me . . . oh, I dunno . . . who they are.

I googled for the number, to no avail. Does anyone know who this is?

Learning Gentoo

slaniel | Uncategorized | Thursday, April 20th, 2006

I am getting more comfortable with Gentoo as I do more with it at work, but I still think this is the best representation of the experience:

Gentoo Linux: Build Your Own Excitement. Piece. By. Piece. Image of a Lego man wearing a hard hat inserting a jackhammer into another Lego person's bum.

(From Inspirational Linux Posters, via Adam Rosi-Kessel.)

