Short of implementing some more error checking and making things a bit prettier, the "RPM Query Suite" is pretty much done. I've finished rqp, the counterpart to rqs. rqp imports the requires, provides, and file lists of binary rpm packages into a postgresql database (or mysql or sqlite, although I've only tested with postgresql so far) and queries that same data. rqp resembles urpmf/urpmq in functionality, although it's not quite as feature-rich. But if all you do is query for which packages contain a given file, or which packages provide or require something, rqp works great.
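To make the import-then-query idea concrete, here is a minimal sketch in Python using sqlite. This is not rqp's actual code or schema (rqp is written in PHP and I've only run it against postgresql); the table names, column names, and the helpers `create_schema`, `add_package`, and `find_file` are all made up for illustration, and only the file list is modelled. Requires and provides would presumably follow the same pattern.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical layout: one row per tag (distribution), package, and file.
    conn.executescript("""
        CREATE TABLE tags (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE packages (
            id     INTEGER PRIMARY KEY,
            tag_id INTEGER NOT NULL REFERENCES tags(id),
            name   TEXT NOT NULL
        );
        CREATE TABLE files (
            package_id INTEGER NOT NULL REFERENCES packages(id),
            path       TEXT NOT NULL
        );
    """)


def add_package(conn, tag, name, files):
    # Create the tag on first use, then record the package and its file list.
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
    tag_id = conn.execute(
        "SELECT id FROM tags WHERE name = ?", (tag,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO packages (tag_id, name) VALUES (?, ?)", (tag_id, name)
    )
    conn.executemany(
        "INSERT INTO files (package_id, path) VALUES (?, ?)",
        [(cur.lastrowid, path) for path in files],
    )


def find_file(conn, substring):
    # Case-sensitive substring match on the file path, across every tag.
    rows = conn.execute(
        """
        SELECT t.name, p.name, f.path
        FROM files f
        JOIN packages p ON p.id = f.package_id
        JOIN tags t ON t.id = p.tag_id
        WHERE instr(f.path, ?) > 0
        ORDER BY t.name, p.name, f.path
        """,
        (substring,),
    )
    return rows.fetchall()
```

With a connection from `sqlite3.connect(":memory:")`, calling `create_schema`, adding a few packages under different tags, and then running `find_file(conn, "whois")` returns one (tag, package, path) row per matching file. That is the same kind of "which package contains this file" lookup across several distributions at once.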
Some interesting tidbits: it took rqp 22 minutes to import the rpms of 2007.1/x86_64, which is 5490 packages, or 5.6GB worth. The database contains:
Tag Records      : 11
Package Records  : 32828
File Records     : 4245034
Requires Records : 434693
Provides Records : 154859
Essentially it has the data for the 11 currently supported distributions (MNF2, CD3, CD3/x86_64, CS3, CS3/x86_64, CS4, CS4/x86_64, 2007, 2007/x86_64, 2007.1, and 2007.1/x86_64). That's a lot of data. In contrast, the database for rqs contains:
Tag Records     : 4
Package Records : 7791
File Records    : 4226875
Source Records  : 40233
The difference in tag and package counts is largely because rqp needs to distinguish between architectures and, for Corp3, between products (its 4 "binary products" all come from one pool of source rpms). Anyways, I benchmarked rqp versus urpmf, and this is where it gets really interesting. urpmf was run on a 2007.1/x86 system (and keep in mind, urpmf only knows about 2007.1/x86).
[qateam@mercury ~]$ time urpmf whois | wc -l
5.04user 0.33system 0:03.99elapsed 134%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+17476minor)pagefaults 0swaps
34
So urpmf found 34 matches in 3.99s. The same with rqp:
vdanen@artemis:~/ >% time ./rqp -q whois -C
rqp $Id: rqp 183 2007-05-05 18:35:24Z vdanen $
Searching database records for substring match for file (whois)
187 matche(s) in database for substring (whois)
./rqp -q whois -C 0.04s user 0.00s system 0% cpu 9.796 total
Whoa! rqp found 187 matches across 11 distributions in pretty much the exact same time. urpmf is written in perl and accesses a single file (although, to be fair, I believe urpmf has to uncompress the hdlist.cz first before reading it). rqp is written in php and accesses a postgresql database over TCP.
Suffice it to say, I'm impressed with rqp's speed; I thought urpmf would beat it hands down. I haven't done any urpmq runs to look up provides/requires, although I imagine the time difference between the two would be about the same. Of course, urpmf/urpmq are handier because you don't need to import anything or set up a database, etc. rqp is vastly superior for what I need though, as this way I no longer have to jump into a chroot (or various chroots) to run urpmf.
Also, rqp (and rqs) have to do things like establish database connections, read and parse a small configuration file, and other things that urpmf never needs to do. They still need some more work to clean things up, but so far I'm pleased with the basic functionality. 2+ days well spent in coding this stuff.
And people look at me funny when I tell them I wrote CLI programs in PHP. =)