Today, while surfing the web, I found this YouTube video of Eric Gilson and John Joergensen of Rutgers Camden Law Library presenting at the 2005 CALI (Center for Computer-Assisted Legal Instruction) conference. In it, Gilson and Joergensen discuss how to digitize congressional documents and build a digital library at low cost using open-source software. Unfortunately, you cannot really see the screen output from the projector, but the concepts are still relevant.
For more specifics about full-text indexing, here is Joergensen at CALI 2010 explaining the Swish-e search engine, which Rutgers uses to index the congressional documents:
If you are interested in legal informatics, I highly recommend CALI’s YouTube channel.