Census data, collected from U.S. Federal Census schedules for 1850-1940, provide a detailed picture of family structure and socioeconomic conditions. For the first 30 years of the Early Indicators project, the census was one of three data sets making up the Union Army sample (the other two being Military and Disease). In the last 5 years, however, census data have become the principal data set for the Veteran’s Children’s Census sample.

The U.S. Constitution requires that a population census be taken every 10 years in order to apportion seats in the U.S. House of Representatives and determine the number of votes in the Electoral College and the apportionment of seats in state and local legislatures. Because of privacy concerns, Congress has stipulated a 72-year restriction on access to Federal Census schedules. Because of this restriction, the most recent census manuscript available is from 1940. The 1850 census was the first to enumerate all household members by name, not just the heads of household, along with their ages, occupations, birthplaces, and the value of their real estate. Since the majority of Civil War soldiers in our samples were born around 1840, the 1850 census provides a good picture of their early lives.
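The arithmetic behind the 72-year restriction can be sketched in a few lines of Python. This is only an illustration: the function names and the `as_of_year` parameter are assumptions introduced here, and the calculation works at year granularity, ignoring the specific release date within a year.

```python
RESTRICTION_YEARS = 72  # access restriction stated in the text above


def release_year(census_year: int) -> int:
    """Year in which a census schedule leaves the 72-year restriction."""
    return census_year + RESTRICTION_YEARS


def latest_available(census_years, as_of_year: int):
    """Most recent census whose restriction has expired by as_of_year.

    Returns None if no census in census_years is available yet.
    """
    available = [y for y in census_years if release_year(y) <= as_of_year]
    return max(available) if available else None


# Decennial censuses from 1850 through 1950 (illustrative range).
decades = range(1850, 1960, 10)
print(release_year(1940))              # 2012
print(latest_available(decades, 2017)) # 1940
```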

See sample census manuscript images here or blank census forms here.

Union Army, USCT, Andersonville, Urban, Oldest Old Samples

When Early Indicators began census linking in 1992, the most recent census publicly available was from 1910. The Union Army sample of 332 companies was searched for in, and linked to, the 1850, 1860, 1900, and 1910 censuses. Linking depended upon the technology available at that time, which included microfilm and a Soundex indexing system. In time, however, advances in technology, additional census releases, and the advent of Ancestry.com and FamilySearch.org enabled researchers to link soldiers more efficiently and with greater accuracy. Thus, they were able to link the last companies of the Union Army sample (27 companies from Indiana and Wisconsin) and all subsequent samples to every census decade available from 1850 to 1930 (all but the 1890 census, which was destroyed in a fire).
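For readers unfamiliar with Soundex, the following is a minimal Python sketch of the standard American Soundex coding, which groups similar-sounding surnames under a letter plus three digits. It illustrates the general technique only; it is not the project's own tooling, and the function name is an assumption made for this example.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    # Consonant groups and their digits; vowels, Y, H, and W have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    # Keep only the letters A-Z, uppercased.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = [letters[0]]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result.append(digit)
        prev = digit  # a vowel resets prev, so a repeated code counts again

    # Pad with zeros and truncate to one letter plus three digits.
    return ("".join(result) + "000")[:4]


print(soundex("Robert"))    # R163
print(soundex("Rupert"))    # R163
print(soundex("Ashcraft"))  # A261
```

Because "Robert" and "Rupert" share a code, a search of a Soundex index returns both spellings, which is why the system was useful for locating names that enumerators spelled inconsistently.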

Veteran’s Children’s Census (VCC) Sample

Census collection is the primary focus of the VCC sample, along with the collection of death information. A subset of Union Army soldiers who survived to 1900 and had at least one child was selected for this sample of intergenerational census linking. Research assistants, using Ancestry.com and FamilySearch.org, searched all available census decades from 1850 to 1940 for this sample. The census work already completed for the veteran was checked and updated, and any new census information found for the veteran was added before the census was searched for the veteran’s children and spouse(s). All of the veteran’s children, and every spouse with whom he had children, were collected in each decade in which they were living, even when they resided in households separate from the veteran himself.

Because advancements in technology paved the way for a major innovation in the way we collected census data, the census data in VCC differ slightly in shape from those in the UA samples. Under VCC, every individual found in the census and recorded in our data more than once is linked across the decades, whereas in prior samples, each census was collected as a point in time. This connectivity across decades, along with inferred relationships that remain constant throughout time, allows for a more cohesive study of family structure and intergenerational processes.
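The difference in shape can be pictured with a small Python sketch. Everything here is hypothetical: the record layout, the field names (`person_id`, `census_year`, `household_id`, `relation_to_veteran`), and the sample values are invented for illustration and are not the actual variable names in the released data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CensusRecord:
    person_id: str            # hypothetical identifier shared across decades
    census_year: int
    household_id: str         # hypothetical household identifier
    relation_to_veteran: str  # inferred relationship, constant over time


def link_across_decades(records):
    """Group point-in-time records into one chronological history per person."""
    histories = defaultdict(list)
    for rec in records:
        histories[rec.person_id].append(rec)
    for recs in histories.values():
        recs.sort(key=lambda r: r.census_year)
    return dict(histories)


# Made-up records: a child appears first in the veteran's household,
# then in a separate household a decade later.
records = [
    CensusRecord("child-01", 1910, "hh-B", "child"),
    CensusRecord("vet-01", 1900, "hh-A", "veteran"),
    CensusRecord("child-01", 1900, "hh-A", "child"),
]

histories = link_across_decades(records)
print([r.census_year for r in histories["child-01"]])  # [1900, 1910]
```

In a point-in-time layout, the 1900 and 1910 rows for the same child would carry no shared identifier; the shared key is what makes the household change between decades visible.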

Quality Codes

Quality codes indicating the strength of match were assigned to each census household, using a range of 1-4, with 1 being the strongest, verified match and 4 being the weakest.

Soldier Quality Codes: Possible soldier matches on the census manuscript were compared with information from military data previously collected to determine the strength of the match. Researchers assigned a quality code to indicate the reliability of the linkage or, in other words, the degree to which information from the pension (PEN) and compiled military service (CMSR) records aligns with information in the census. Although great effort has been made to ensure that the quality codes are specific and objective, some subjectivity is involved in each assignment, particularly for weaker matches. In all cases, an individual found in the census must match the soldier’s pension information on name and approximate age.

VCC Children and Spouse Quality Codes: Quality codes for the soldiers remained the same in all Union Army and VCC samples. However, new quality code rules for children and spouses were created in VCC because online tools allowed researchers access to numerous records containing vital information which guided the census search. [These records included birth, marriage and death records; draft registration cards and military enlistment records; cemetery records and obituaries; and miscellaneous records such as passport applications, passenger lists and newspaper clippings.] An additional change in quality codes for children and spouses allowed the quality code to be carried over from one decade to the next if the same people were living together in multiple decades.

Quality codes have been used with the census data in an attempt to indicate the accuracy of linkage. The codes were designed to be as concise and objective as possible. However, there are many subtleties of census research that cannot be codified. The codes should, nonetheless, prove to be valuable guides to data users. See the codebooks for specific quality code rules.
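As a rough illustration of how a data user might apply the codes, the Python sketch below keeps only households at or above a chosen match strength. The dictionary keys (`household_id`, `quality_code`), the function name, and the default cutoff of 2 are assumptions made for this example; the codebooks remain the authority on what each code means.

```python
def filter_by_quality(households, max_code=2):
    """Keep households whose quality code is max_code or stronger.

    Codes run from 1 (strongest, verified match) to 4 (weakest),
    so a lower number means a more reliable link.
    """
    if not 1 <= max_code <= 4:
        raise ValueError("max_code must be between 1 and 4")
    return [h for h in households if h["quality_code"] <= max_code]


# Made-up households for illustration only.
households = [
    {"household_id": "hh-A", "quality_code": 1},
    {"household_id": "hh-B", "quality_code": 3},
    {"household_id": "hh-C", "quality_code": 2},
]

strong = filter_by_quality(households)
print([h["household_id"] for h in strong])  # ['hh-A', 'hh-C']
```

Choosing a stricter cutoff trades sample size for linkage reliability, so it can be worth checking whether results hold under more than one threshold.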

Census Linkages by Sample and Years Searched

Sample          1850  1860  1870  1880  1900  1910  1920  1930  1940
UA               X     X                 X     X
Original USCT                X     X     X     X     X     X
Expanded USCT                X     X     X     X     X     X     X
Andersonville    X     X     X     X     X     X     X     X
Urban            X     X     X     X     X     X     X     X
Oldest Old       X     X     X     X     X     X     X     X     X
1850 Census