Sources of Texts and Corpora

A flood of text is available online, some of it free (where indicated below). Besides the sources listed here, check the part of this web project dealing with CALL literature, which is also a source of text.

Spotted in Papyrus News, 3 Jan 2002: searchable text of thousands of magazines

The On-Line Books Page is a website that facilitates access to books that are freely readable over the Internet. Parts of the site include an index of thousands of on-line books freely readable on the Internet, pointers to significant directories and archives of on-line texts, and special exhibits of particularly interesting classes of on-line books. The On-Line Books Page was created and is maintained by John Mark Ockerbloom, digital library planner and researcher at the University of Pennsylvania.

Project Gutenberg is a project whose goal is to render in digital format as much uncopyrighted literature as possible. Its site lists all the works completed so far. Examples: Defoe's Journal of the Plague Year, Aesop's Fables, the 1991 CIA World Factbook, Hitchhiker's Guide to the Internet (and lots of similar texts), just about everything by Shakespeare (and Twain, Jack London, etc.), War of the Worlds, Tarzan of the Apes (lots of sci-fi), The Arabian Nights by Andrew Lang ... the list is endless. You download the texts as zip files; Defoe's Journal was about 500K. It's free.

NetLibrary has reference, scholarly, and professional books which can be viewed online or offline using netLibrary's free Knowledge station software, "which allows you to search, highlight, bookmark, or annotate the text." - Newsweek, Apr. 5 1999, p. 17.

The Internet Classics Archive:

Poetry Audio Links:

See Mike Barlow's comprehensive listing of sites that serve as sources of text:

"Quanta is the electronically produced and distributed magazine of science fiction and fantasy. As such, each issue is packed with fiction from amateur and professional authors from around the world and across the net. It is distributed for FREE across computer networks (mainly the Internet, BITnet and UUCP). It is published in two formats (PostScript for printing to PostScript compatible laser printers, and straight ASCII text)." - from a Neteach posting. The Quanta URL gives you a list of files; check the readme file.

The Dartmouth College Anonymous FTP Server: "This archive [of public domain texts] includes contributions from Dartmouth College students, faculty, and staff. For details, please refer to the README files located in most directories." (All files are packaged in BinHex.)

Tzilla Kratter suggests the following URLs (annotations mine, GVS)

Deborah Healey likes a mysteries site where "they post new ones every week and keep the old ones around to make for lots of short story reading." Find these at

Over 1000 classic books are archived on a CD-ROM sold for around US$35. Details are at

Movie and TV scripts:

