Perl: substituting without loading a big file
This is perhaps related to a question I asked a while ago, but it’s clearer. In MySQL dumps, one has lots of chunks that look like so:
    CREATE TABLE tablename (
      id int(11) NOT NULL auto_increment,
      type tinytext NOT NULL,
      value text NOT NULL,
      score int(11) NOT NULL default '0',
      trust int(11) NOT NULL default '0',
      comments text NOT NULL,
      added datetime NOT NULL default '0000-00-00 00:00:00',
      addedby tinytext NOT NULL,
      lastused datetime NOT NULL default '0000-00-00 00:00:00',
      usedcount int(11) NOT NULL default '0',
      userreviewed enum('yes','no') NOT NULL default 'no',
      PRIMARY KEY (id)
    ) TYPE=MyISAM;
To convert a table to InnoDB (the newer, sleeker, sexier database engine), one just replaces TYPE=MyISAM with TYPE=INNODB (modulo capitalization). So then if one were to write a Perl script for this, one would do something like
    $infile =~ s{(CREATE TABLE.*?)TYPE=MyISAM}{${1}TYPE=INNODB}gsmi;
where the s flag at the end is necessary because the input is a multiline string and . has to match newlines (the m flag only changes what ^ and $ match, so it is harmless here but not actually needed). And that’s where the trouble comes in. I can’t see any way around reading the whole file in first, via
    my $infile = join '', <STDIN>;
Now $infile contains a potentially huge string; I’m dealing right now with a 250-megabyte database. Presumably Perl has some built-in intelligence to deal with such cases, but I’m not sure how it would solve the problem. What one wants is to do this line by line:
    while (<>) { doSomeStuff(); }
But that’s impossible here, because the CREATE TABLE bit starts on one line, and the TYPE=MyISAM ends on another.
So is there any way to do this substitution without reading in the whole thing?
I’ll go do some tests now to see if Perl actually does suck up as much memory on this as I suspect it does.
P.S.: Indeed it does. In fact, after the script slowed my machine to a halt, Linux killed it.
This must be a well-known problem with a well-known solution. And of course I really want to be using a context-free grammar here rather than a regular expression. But the problem of loading in the whole file before I can parse it will still be with me even if I use such a grammar.
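One possible way to stay line-by-line, offered only as a rough sketch: remember, between lines, whether we are currently inside a CREATE TABLE statement, and only substitute while that is true. Everything here is an assumption rather than a tested solution for real dumps: make_converter is a made-up helper name, the CREATE TABLE keywords are assumed to start a line, and the statement is assumed to end on a line whose last non-blank character is a semicolon. The sample is an in-memory string standing in for the dump.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a closure that
# rewrites one line at a time and remembers, between calls, whether it is
# currently inside a CREATE TABLE statement.
sub make_converter {
    my $in_create = 0;
    return sub {
        my ($line) = @_;
        # Enter the "inside CREATE TABLE" state when the statement starts.
        $in_create = 1 if $line =~ /^\s*CREATE\s+TABLE\b/i;
        if ($in_create) {
            # Only touch the engine clause while inside the statement.
            $line =~ s/\bTYPE=MyISAM\b/TYPE=INNODB/i;
            # Assumption: the statement ends on a line whose last
            # non-blank character is a semicolon.
            $in_create = 0 if $line =~ /;\s*$/;
        }
        return $line;
    };
}

# Demonstration on a small in-memory sample. For a real dump, read from
# STDIN instead, so that only one line is held in memory at a time.
my $sample = <<'SQL';
CREATE TABLE tablename (
  id int(11) NOT NULL auto_increment,
  PRIMARY KEY (id)
) TYPE=MyISAM;
INSERT INTO tablename VALUES (1,'TYPE=MyISAM');
SQL

my $convert = make_converter();
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
while (my $line = <$fh>) {
    print $convert->($line);
}
close $fh;
```

The INSERT line is left alone even though it contains the text TYPE=MyISAM, because by then the flag has been cleared. Memory use is bounded by the longest single line, which in a dump with very long INSERT lines can still be large, so this is a mitigation rather than a guarantee.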

