Integrating 19th-century Documents and a Geographic Information System

Erich K. Schroeder
Illinois State Museum
Springfield, Illinois

Paper presented at the Mid-America GIS Symposium, Overland Park, Kansas, May 4-7, 1992

ABSTRACT: The Illinois State Museum has made several efforts to integrate 19th-century documents and a GIS. These efforts have focused on two groups of documents: land entry records produced at federal land offices, and commercial county atlases. By relating the legal location recorded on the land office documents to information digitized from modern topographic maps, it is possible to produce various maps reflecting, for example, the earliest or last date of entry per section, the average date of entry per section, or the distribution of a particular individual's purchases. This process has been completed for the entire state of Illinois. Most 19th-century commercial county land ownership atlases show the locations of farmsteads and other cultural features in rural areas. Several methods for transferring this information to the GIS have been tried, including recording structure counts per section and digitizing from photocopies. The current method is to hand-transfer the feature locations to township-and-section outline plots produced at the scale of the original documents. This method minimizes distortion due to inaccuracies in the atlases. Transfer errors are controlled by producing plots in a format that can be overlaid on the atlases. This process is presently ongoing at the Illinois State Museum.

Introduction

There are always technical difficulties in transferring information from paper records to an electronic medium. These difficulties are often increased when dealing with older records. Problems differ depending on whether the documents are texts or maps (Figure 1). In the case of texts, the general problem stems from the difficulty of translating text into spatial information.
In particular, the information may be expressed in unknown or unusual coordinate systems, the writing may be illegible, and there may be a low degree of standardization in how the information was originally coded. Maps, on the other hand, may have been produced in unknown projections, the printing may be muddy or illegible (particularly when dealing with a copy of the original), the map may have been produced with an unknown level of accuracy and precision, and it is often difficult to identify the registration points needed by a GIS. For both texts and maps, the original documents may be in fragile condition, forcing the use of reproductions.

My goal in this paper is to present methods of integrating two forms of documents and a Geographic Information System (GIS). One set of documents consists of tabular data recorded from 1814 to the 1890s at federal land offices located in Illinois. The other set consists of 19th-century commercial county atlases.

Land Office Records Mapping Project

The land office records mapping project is an example of going from text-based information to a GIS. I will begin with a brief history of the public domain documents. When the Treaty of Paris ended the War of Independence in 1783, the western boundary of the United States was suddenly extended to the Mississippi River. The new government was faced with the problem of how to dispose of and settle the new lands, known as the public domain. In general, the government decided to make land available to private individuals in the hope of settling the new land with privately owned farms, while at the same time raising funds to help support the new government. To facilitate this, the familiar rectangular survey system was developed, based on townships 6 miles on a side. The government provided sets of instructions to surveyors, and most of the United States has been mapped using these methods in some form.
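Because each township is divided into 36 numbered sections, a section number alone fixes a parcel's position within its township, which is what later makes it possible to relate text records to section polygons. The sketch below is an illustration rather than part of the original project. It assumes the standard Public Land Survey System numbering, in which section 1 lies in the northeast corner and numbering runs back and forth across the tiers (westward along the top tier, eastward along the next) to section 36 in the southeast corner. It also treats every section as an ideal 1-mile square; real townships contain correction offsets and fractional sections along their north and west edges, so the offsets are nominal only. The function names are hypothetical.

```python
def section_row_col(section: int) -> tuple[int, int]:
    """Return (row, col) of a section within its township.

    Row 0 is the northern tier and col 0 the western column. Assumes the
    standard serpentine numbering: sections 1-6 run east to west along the
    top tier, 7-12 run west to east along the next tier, and so on.
    """
    if not 1 <= section <= 36:
        raise ValueError(f"section must be 1-36, got {section}")
    row, offset = divmod(section - 1, 6)
    # Even tiers (0, 2, 4) are numbered east to west; odd tiers west to east.
    col = 5 - offset if row % 2 == 0 else offset
    return row, col


def nominal_center_miles(section: int) -> tuple[float, float]:
    """Nominal (miles east, miles south) of a section's center, measured from
    the township's northwest corner, treating each section as a 1-mile square."""
    row, col = section_row_col(section)
    return col + 0.5, row + 0.5


if __name__ == "__main__":
    # Print the 6 x 6 numbering grid with north at the top.
    grid = [[0] * 6 for _ in range(6)]
    for s in range(1, 37):
        r, c = section_row_col(s)
        grid[r][c] = s
    for tier in grid:
        print(" ".join(f"{s:2d}" for s in tier))
```

Run as a script, this prints the familiar layout, with 6 5 4 3 2 1 across the top tier and 31 32 33 34 35 36 across the bottom.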
The laws that set up surveying methods also stipulated rules for the transfer of public domain lands into private ownership. Land could not be sold until it had been surveyed, and it could only be sold in parcels that could be described in terms of township, range, section, and fraction of a section (McEntyre 1978:38-40). The minimum allowed size of a land parcel was also set by law and changed through time (Hart 1974:77). When a land transaction took place, it was recorded as an entry in a log book at the land office. Since a transaction might have been a purchase or a grant in which no money was involved, it is best to use the generic term "entries" for these transactions.

Surveys began in Illinois in 1804, and the first property was sold at the Shawneetown land office in southern Illinois in 1814 (Matousek 1971; Gaynon 1975; Schroeder 1991). By 1834 there were 10 active land offices in Illinois. As the amount of land remaining in the public domain dwindled, the land offices were closed and consolidated at the Springfield office. When the Springfield office closed in 1876, the records were transferred to Washington, D.C. Later, the federal government returned many of the records to a state agency, in Illinois' case the Illinois State Archives (Gaynon 1975; Smith 1975). Notably, for states settled after 1863, Homestead Act records are retained by the Bureau of Land Management.

In 1975 the Illinois State Archives began a 9-year project to computerize the land records in its possession (Kremm 1975). The goal of this project was to make the information more accessible to the public while protecting the original documents from overuse. The work was carried out with a combination of a grant from the National Endowment for the Humanities, the Illinois State Archives' summer internship program, and a dedicated group of volunteers.
Early in the project some nonvoluntary labor was also used, in the form of prisoners under the direction of the Illinois Department of Corrections. This was discontinued because of supervisory problems (Robert Bailey, personal communication).

At a minimum, each coded record consists of: the date of the transaction; the name of the grantee or purchaser; the number of acres; the price per acre; the total price; the legal description of the land parcel (principal meridian, township, range, section, and quarter-quarter description or lot number); the type of transaction; and the volume and page of the entry. The county or state of the grantee was also coded, although this was not consistently recorded by the land office personnel. Three additional variables were added: a one-character variable indicating the sex of the grantee, based on the coder's interpretation of the name; a unique record number; and a numeric code indicating the present-day county within which the land parcel is located. The last two were added automatically after the records had been read into the computer.

Since the project was originally designed to be punched onto cards, the records were limited to 80 columns. As a result of this limitation, over 200 codes and abbreviations were devised during the project. Duplication of codes was controlled by keeping a master set of definitions on index cards. Although a few duplications did make it through the process, a record of them exists. The project was finished in 1984, and the resulting file contains more than 500,000 records of land entries. The Archives has produced three microfiche copies of the file (Figure 2). One is sorted by legal location, and the other two are sorted alphabetically by the last name of the grantee, one by county and the other for the entire state. The file has been backed up on tape and is not maintained online by the Archives.
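The record structure described above maps naturally onto a fixed-width parser. The sketch below is hypothetical throughout: the actual 80-column layout and the Archives' 200-plus codes are not given here, so the column positions, the field names, the acreage stored in hundredths, the two-letter quarter notation (for example "NESW" for the NE quarter of the SW quarter), and the sample card are all assumptions made for illustration. It also shows why a quarter-quarter corresponds to about 40 acres: each quarter divides the nominal 640-acre section by four.

```python
from dataclasses import dataclass

# Hypothetical column layout: (start, end) slice positions in an 80-column
# card image. The Archives' real layout is not reproduced here.
LAYOUT = {
    "date": (0, 8),            # YYYYMMDD
    "name": (8, 38),           # grantee, surname first
    "acres": (38, 45),         # hundredths of an acre, zero-padded
    "meridian": (45, 46),      # principal meridian code
    "township": (46, 49),      # zero-padded township number
    "township_dir": (49, 50),  # N or S
    "range": (50, 53),         # zero-padded range number
    "range_dir": (53, 54),     # E or W
    "section": (54, 56),       # zero-padded section number
    "aliquot": (56, 60),       # e.g. "NESW" = NE quarter of the SW quarter
    "county": (60, 63),        # present-day county code
}

SECTION_ACRES = 640.0  # nominal; many real sections are irregular


@dataclass
class LandEntry:
    date: str
    name: str
    acres: float
    meridian: int
    township: int
    township_dir: str
    range_: int
    range_dir: str
    section: int
    aliquot: str
    county: int


def _field(card: str, key: str) -> str:
    start, end = LAYOUT[key]
    return card[start:end].strip()


def parse_card(card: str) -> LandEntry:
    """Parse one 80-column card image into a LandEntry."""
    card = card.ljust(80)  # tolerate trailing blanks lost in transfer
    return LandEntry(
        date=_field(card, "date"),
        name=_field(card, "name"),
        acres=int(_field(card, "acres")) / 100,
        meridian=int(_field(card, "meridian")),
        township=int(_field(card, "township")),  # int("012") == 12
        township_dir=_field(card, "township_dir"),
        range_=int(_field(card, "range")),
        range_dir=_field(card, "range_dir"),
        section=int(_field(card, "section")),
        aliquot=_field(card, "aliquot"),
        county=int(_field(card, "county")),
    )


def aliquot_acres(aliquot: str) -> float:
    """Nominal acreage of a quarter description such as "SW" or "NESW".

    Each two-letter quarter divides the area by four, so a quarter is 160
    acres and a quarter-quarter 40. Halves and lot numbers are not handled.
    """
    quarters = [aliquot[i:i + 2] for i in range(0, len(aliquot), 2)]
    if not quarters or any(q not in {"NE", "NW", "SE", "SW"} for q in quarters):
        raise ValueError(f"unsupported aliquot description: {aliquot!r}")
    return SECTION_ACRES / 4 ** len(quarters)


# An invented sample card in the hypothetical layout above.
SAMPLE = ("18350612" + "SMITH JOHN".ljust(30) + "0004000" + "3"
          + "012" + "N" + "005" + "W" + "14" + "NESW" + "017")
```

Parsing `SAMPLE` yields township 12 N, range 5 W, section 14, and 40.0 acres, which agrees with `aliquot_acres("NESW")`. A cross-check of this kind could flag records whose coded acreage disagrees with their legal description.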
In the early 1980s the Illinois Department of Energy and Natural Resources contracted with Environmental Systems Research Institute, Inc. (ESRI) to produce the basic state-wide coverages for the Illinois Geographic Information System (ESRI 1984). One of these ARC/INFO coverages contains the township, range, and section boundaries, digitized from the 1:24,000-scale, 7.5-minute USGS topographic maps. Each section is a separate polygon and carries its legal location as attributes for principal meridian, township and range, and section.

In 1988 the Illinois State Museum obtained a tape of the public domain land records from the State Archives. The tape was produced on the Illinois Secretary of State's Honeywell computer to the requirements of the Illinois Department of Energy and Natural Resources' Prime 750, where the GIS resides. Combining the tabular and geographic information required several steps (Figure 3). The first, reading the tape, was done by GIS system programmers. After the position of the numeric variable signifying the county was determined, a FORTRAN program was written to separate the file into county subsets. The steps of translating the file into INFO and rearranging the legal location variables were done in the INFO programming language. Once the structure of the original text file is known, it can easily be imported into an existing INFO file structure. However, after our first attempt we realized that the variables had to be rearranged before the file could be used. Rearrangement was necessary both in the placement of the variables in the file structure and in how some of the variables had been coded. In particular, the original numeric data had been coded with leading zeros as placeholders for the townships, ranges, and sections. This was accounted for by importing the data in a two-stage procedure, using a temporary file that was then related to an empty file with the correct structure.
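The county split, the leading-zero problem, and the relate step can be illustrated in a few lines. This is a minimal sketch, not the project's FORTRAN or INFO code: raw records are assumed to be dictionaries of text fields as they might come off the tape, and the section coverage's attribute table is assumed to be a dictionary from polygon IDs to legal-location values. All field names, polygon IDs, and sample values are hypothetical. Converting every legal-location part to one normalized tuple plays the role of the two-stage import: once "012" and 12 both become 12, tabular records and section polygons can be related on the same key. The per-section summaries compute several of the derived variables named in the following paragraph (first, last, and average date of entry, and percent of a nominal 640-acre section entered by a cutoff year).

```python
from collections import defaultdict
from datetime import date
from statistics import mean

SECTION_ACRES = 640.0  # nominal section size; irregular sections differ


def split_by_county(records):
    """Group raw records by numeric county code (the FORTRAN step)."""
    subsets = defaultdict(list)
    for rec in records:
        subsets[int(rec["county"])].append(rec)
    return dict(subsets)


def legal_key(meridian, township, township_dir, range_, range_dir, section):
    """Normalize legal-location parts to one comparable key.

    int() drops zero-padding, so "012" and 12 give the same key.
    """
    return (int(meridian), int(township), township_dir.strip().upper(),
            int(range_), range_dir.strip().upper(), int(section))


def record_key(rec):
    return legal_key(rec["meridian"], rec["township"], rec["township_dir"],
                     rec["range"], rec["range_dir"], rec["section"])


def to_date(yyyymmdd):
    return date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), int(yyyymmdd[6:8]))


def section_summaries(records, cutoff_year):
    """Derived per-section values keyed by normalized legal location."""
    by_section = defaultdict(list)
    for rec in records:
        by_section[record_key(rec)].append(rec)
    summaries = {}
    for key, recs in by_section.items():
        dates = [to_date(r["date"]) for r in recs]
        acres_by_cutoff = sum(float(r["acres"]) for r, d in zip(recs, dates)
                              if d.year <= cutoff_year)
        mean_ordinal = round(mean(d.toordinal() for d in dates))
        summaries[key] = {
            "first": min(dates),
            "last": max(dates),
            "mean": date.fromordinal(mean_ordinal),
            # Capped at 100: irregular sections can exceed the nominal 640 acres.
            "pct_by_cutoff": 100 * min(acres_by_cutoff, SECTION_ACRES) / SECTION_ACRES,
        }
    return summaries


def relate(summaries, section_polygons):
    """Attach summaries to section polygons; sections with no entries get None."""
    return {poly_id: summaries.get(legal_key(*attrs))
            for poly_id, attrs in section_polygons.items()}
```

Sections that receive `None` from `relate` are those with no recorded entries in the subset, which is itself informative when mapping the progress of settlement.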
After the legal location variables are rearranged, the final steps of relating tabular and geographic information and producing maps of derived values are relatively straightforward. Examples of derived variables include the average date of entry per section, the percent of the section purchased by a certain date, the first or last date of entry per section, and the distribution of a particular individual's purchases. The resulting maps give a view of the settlement of Illinois that has never been seen before. The broad trends of land speculation, early state and national road systems, railroad and canal construction, and environmental preferences are easily visible. The land purchase data can also be overlaid on other geographical data existing within the GIS, leading to more in-depth modeling. The spatial patterns of land acquisition in an area are the result of an interplay of social, ecological, economic, and legal dimensions. Selective overlays of the land records on other variables should allow an analysis of the relative importance of each dimension, and of the change in importance through time.

There are, however, some weaknesses in this data set as it has been constructed. Chief among these is the limitation to the scale of the section. The original data, and the INFO file that has been produced from them, contain a legal description down to the quarter-quarter section, or about 40 acres. No way has yet been identified to automatically create polygons matching these sub-section legal descriptions for large numbers of records. More detailed information must be hand-digitized. This limitation sets the level of detail for models that can be produced with this information.

Commercial County Atlas Mapping Project

Commercial county atlases are among the most-used documents for cultural resource studies concerned with 19th-century Euroamerican archaeological sites.
The heyday of these atlases in the Midwest was approximately from the late 1860s to the second decade of the 20th century (Conzen 1984a). At their best, these county atlases were produced through careful field observation and are accurate representations of the rural landscape (Conzen 1984b). The problem of moving information from atlases into the GIS has been approached in three different ways by three different projects (Figure 4). In one of these projects (SSC), the goal was to produce a predictive model for 19th-century rural sites. In the other two projects, the goal was primarily to document potential site locations. However, the ongoing atlas conversion project will yield data that could be used in a modeling effort.

Superconducting Super Collider

The first of the projects was part of Illinois' bid for construction of the Superconducting Super Collider, or SSC. In this study predictive models were developed for prehistoric and historic period archaeological sites, and for paleobiological sites (McGimsey et al. 1986). The historic site predictive model used both environmental and cultural variables (Figure 5). The dependent cultural variable, which the other variables were used to predict, was expressed as structure density per section (Schroeder 1986). Structure density per section was produced simply by counting the number of structures shown in each section on the county atlas. This value was then introduced as an attribute in the modeling process by overlaying it on the other variables. The modeling process consisted of calculating the average section-structure density for polygons representing all occurring combinations of the other variables. These values were assigned low, medium, or high site probability status and mapped (Figure 6). Although some interesting results were derived by this process, discussion of the model is beyond the scope of this paper. The SSC model suffered from several limitations.
Chief among these was the observation that point digitization of the structure locations would have produced a more fine-tuned model. The generalization of the data due to their expression as counts per section probably masked a great deal of information, particularly information on where sites are not located. This tendency was further aggravated by the averaging necessary for the manipulation of the density data (Schroeder 1986:65). In spite of these limitations, this method for moving information from atlas to GIS had at least two strengths. First, coding the density of structures per section was very quick, taking perhaps 10 percent of the time that point digitization would have required. This made it possible to gather information on a 636-square-mile area in a matter of a couple of days. Second, fewer assumptions about the accuracy of the atlases were made than if structures had been digitized as points. For example, we can probably assume that most structures are mapped in the correct section, but not necessarily that they are placed correctly within the section. For some purposes, simple density coding may give adequate information.

The Knox County Project

In another project, the problem of gathering information on structure location was addressed by directly digitizing the structures from photocopies (Wiant 1990). The structure locations were to be treated as potential archaeological sites in the report, so point locations were required. The sequence of steps was as follows:

1. Photocopies were produced for all the atlas maps of interest. Some were made from the original documents, others from microfilm. In every case, the photocopy page was not large enough to contain the entire township, so multiple copies were made and then pieced together.
2. New tic marks were created in the Lambert coordinate system at the corners of the townships. These tic marks are necessary to register the digitized locations with the GIS.
3. The photocopies were oriented to the GIS coordinate system, and all structures shown on the map were digitized as points.
4. The resulting coverages were plotted, and the plots were examined for accuracy.

Again, this method has both strengths and weaknesses. Perhaps the greatest strength is the absence of a hand-transfer step, which should limit the number of human errors introduced into the process. The greatest weakness was a lack of concordance between the digitized locations and features already in the GIS coordinate system. In some cases this registration problem resulted in structures being digitized into the wrong section or on the wrong side of a road, requiring the location to be changed (Nick Klobuchar, personal communication 1992). Distortion could have been introduced at several steps: the original production of the atlas, the microfilming, the photocopying, the piecing together of the photocopies, the addition of registration points, or the digitizing.

State-Wide Atlas Project

Finally, a project is presently ongoing at the Illinois State Museum to digitize the locations of cultural features from all 19th-century county atlases for Illinois. This project is just beginning; the 1870s atlases from 15 counties have been completed since January of this year. If it is continued to completion, a state-wide coverage containing the locations of potential Euroamerican historical archaeological sites will provide planners with information at the same scale as the state-wide coverage of prehistoric archaeological sites. The information can also be used to improve the modeling methods used in the SSC project. The present method has been developed in response to the previous studies: the feature locations are hand-transferred to township-and-section outline plots produced at the scale of the original documents (usually 1:31,680, or 2 inches = 1 mile).
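The plot sizes quoted for this project follow from simple arithmetic. The sketch below is offered only as a check on those figures: it converts a representative fraction to map inches per ground mile, computes the paper size and ground area of a block of townships at that scale, and estimates the minimum number of 2-by-3-township plots needed to tile a rectangular block of townships. The function names are hypothetical, townships are treated as ideal 6-mile squares, and real counties, with irregular outlines, generally need more plots than this lower bound.

```python
import math

INCHES_PER_MILE = 5280 * 12  # 63,360 inches in a statute mile
TOWNSHIP_MILES = 6           # nominal township width and height


def map_inches_per_mile(scale_denominator: int) -> float:
    """Map inches representing one ground mile at scale 1:scale_denominator."""
    return INCHES_PER_MILE / scale_denominator


def plot_size_inches(tiers_high: int, columns_wide: int,
                     scale_denominator: int) -> tuple[float, float]:
    """(height, width) in inches of a block of ideal townships."""
    per_mile = map_inches_per_mile(scale_denominator)
    return (tiers_high * TOWNSHIP_MILES * per_mile,
            columns_wide * TOWNSHIP_MILES * per_mile)


def block_area_sq_miles(tiers_high: int, columns_wide: int) -> int:
    """Ground area of a block of ideal 36-square-mile townships."""
    return tiers_high * columns_wide * TOWNSHIP_MILES ** 2


def min_plots(county_tiers: int, county_columns: int,
              plot_high: int = 2, plot_wide: int = 3) -> int:
    """Lower bound on plots needed to tile a rectangular block of townships."""
    return (math.ceil(county_tiers / plot_high)
            * math.ceil(county_columns / plot_wide))


if __name__ == "__main__":
    print(map_inches_per_mile(31_680))     # 2.0
    print(plot_size_inches(2, 3, 31_680))  # (24.0, 36.0)
    print(block_area_sq_miles(2, 3))       # 216
    print(min_plots(4, 5))                 # 4, for a hypothetical 4 x 5 block
```

At 1:31,680, a 2-by-3-township plot is therefore 24 by 36 inches and covers 216 square miles.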
At the 1:31,680 scale, on the plotter that we use, we can produce a plot of at most 6 townships (2 high by 3 wide), an area of approximately 216 square miles. Each county requires 4 or more of these plots, depending on its size and shape. The plots are simple in appearance, consisting of the township outlines in heavy red lines, the section outlines in thin black lines, and tic marks corresponding to the corners of the 7.5-minute USGS topographic maps, the standard registration points used in the Illinois GIS. Since the plotter medium is translucent, the plots can be overlaid on the original documents and the feature locations traced. We are only recording features outside of urban areas. Most of the features are recorded as point locations. Most of these are farmhouses, but there have also been schools, churches, mills, blacksmith shops, cheese factories, and others. Municipal boundaries and the boundaries of cemeteries are also mapped and digitized as polygons.

The strengths of the hand-transfer method include:

1. All of the map interpretations are done before digitizing. When features are digitized directly from photocopies, considerable time can be spent at the digitizer making decisions about points. In an institution that makes extensive use of GIS, this can cause bottlenecks at the digitizer.
2. This method minimizes distortion due to inaccuracies in the atlases, because the overlay plot can be centered on each section, minimizing scale or projection discrepancies in the atlas.
3. If scale or other discrepancies exist, they are seen early in the process.

Weaknesses in this method include:

1. The method is only feasible for atlases that can be obtained as original documents. A large percentage of the 19th-century atlases for Illinois are easily obtainable only on microfilm. Another method will have to be used for these.
2. Although transfer errors are controlled by producing plots in a format that can be overlaid on the atlases, such errors are still possible.
3. Occasionally a plot is produced that does not contain the required 4 registration points. This occurs when the plotted townships do not coincide with the corners of the USGS maps. In these cases tic marks must be added to the coverage, possibly introducing distortion.
4. There are areas of Illinois that were not part of the rectangular survey, primarily French land grants. These areas are difficult to register to today's maps.

Summary

In this paper I have covered the methods used in four projects at the Illinois State Museum to integrate 19th-century documents and a GIS. Problems involved in these or similar projects include the general problems of transferring information from paper to electronic formats and from text to graphic formats. Additional problems may arise because of the age of the documents. In particular, the documents may be illegible, and maps may be drawn in unusual projections or contain elements that no longer exist.

The land records project is an example of translating text information to a GIS. The most time-consuming part of the project, moving information from paper to electronic format, was done by the Illinois State Archives. This was a long-term project, lasting 9 years. It required people who were familiar with the documents and who had practice in reading 19th-century handwriting. Integrating the text information and the GIS was made possible by the rational basis of the legal location method of land description. Areas of the United States settled before the development of the rectangular survey would be difficult to work with. A negative point is the present scale limitation to the section level.

Three different ways of moving information from 19th-century county atlases to the GIS have been tried at the Illinois State Museum.
These are: simple coding of counts of structures per section; directly digitizing from photocopies of the atlases; and hand-transfer of feature locations to outline plots. Although each of these methods has both strengths and weaknesses, the third method appears to be the most satisfactory at this time.

The choice of method for moving information from documents to a GIS depends partly on the nature of the documents, but also on the goal of the particular project. If the goal is to produce a graphic inventory of cultural resources, then perhaps little information beyond location need be coded. If the goal is to produce a detailed model of some cultural process, then a great deal of information beyond location may need to be coded. When working on a large scale, the data-gathering step will probably be the longest part of the project. Therefore, a great deal of thought should be put into how the information transfer is accomplished.

References Cited

Conzen, Michael
1984a The County Landownership Map in America, Its Commercial Development and Social Transformation 1814-1939. Imago Mundi 36:9-31.
1984b Maps for the Masses: Alfred T. Andreas and the Midwestern County Atlas Map Trade. Chicago History 13(1):47-63.

Gaynon, David
1975 Evolution and History of the Illinois Land Records. Newsletter of the Illinois State Archives 1(3):4-8.

Hart, John F.
1974 The Spread of the Frontier and the Growth of Population. Geoscience and Man, vol. V, pp. 73-81.

Illinois State Archives
1984 Public Domain Computer Conversion Project. From data in record group 952. Illinois State Archives, Springfield.

Kremm, Thomas
1975 Land Records Data Project. Newsletter of the Illinois State Archives 1(3):8-9.

Matousek, Ladislav
1971 The Beginning of Illinois Surveys. Illinois Libraries 53(1):23-44.

McEntyre, John G.
1978 Land Survey Systems. John Wiley & Sons, New York.

Schroeder, Erich K.
1986 A Model of Historic (ca. 1860) Site Location Probability. In Siting the Superconducting Super Collider in Illinois, by C. R. McGimsey, M. A. Graham, E. K. Schroeder, R. W. Graham, M. D. Wiant, and R. Druhot, pp. 49-65. Illinois State Museum, Springfield, Illinois.
1991 The Illinois Public Domain Land Sales and the Settlement of Central Illinois. In Landscape, Architecture, and Artifacts: Historical Archaeology of Nineteenth-Century Illinois, edited by Erich Schroeder, pp. 22-38.

Smith, Jane F.
1975 Settlement on the Public Domain as Reflected in Federal Records: Suggested Research Approaches. In Pattern and Process: Research in Historical Geography, edited by Ralph E. Ehrenberg, pp. 290-304. Howard University Press, Washington, D.C.

Wagner, Mark J.
1991 Squatters, Speculators, Improvers, and Settlers: An Examination of Frontier Settlement (1811-1830) in Marion County, Illinois. In Landscape, Architecture, and Artifacts: Historical Archaeology of Nineteenth-Century Illinois, edited by Erich Schroeder, pp. 39-74.

Wiant, Michael D.
1990 Cultural Resources. In Illinois Land Report: Salem Township of Knox County, Draft. Illinois Department of Energy and Natural Resources, Springfield.

Erich Schroeder (erich@museum.state.il.us)