5.3.5. The Primary Key: ArchID

A stable primary key ID number was assigned to every phenomenon that was individually mapped. Every feature or artifact, including sites, loci, and individual artifacts that were mapped separately, was assigned its own "ArchID" number in a single number series regardless of the archaeological feature type. The ArchID primary key is a unique integer identifier that is value-free: no further feature information is embedded in the number. By contrast, some archaeologists classify sites by number range. In such a system, rock shelter sites might be numbered between 100 and 200, and administrative structures might fall between 400 and 500. However, that type of encoding of meaning into ID numbers is problematic for database design. The primary key approach adopted here is consistent with database normalization methods and the First Normal Form (Codd 1970), in which each table has a primary key that serves as a minimal set of attributes that can uniquely identify a record. The First Normal Form further specifies that repeated fields be eliminated, and that each attribute contain a single value rather than a set of values. This kind of tabular organization is intuitive for those who have worked with computer databases, but the database normalization literature makes these features explicit.
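
The value-free key described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names, ID values, and descriptions below are hypothetical and are not taken from the project database; the point is only that the feature type is stored as its own single-valued attribute, while the key carries no meaning beyond uniqueness.

```python
import sqlite3

# In-memory database for illustration only; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE arch_feature (
        arch_id      INTEGER PRIMARY KEY,  -- value-free key from a single number series
        feature_type TEXT NOT NULL,        -- the type lives in its own column, not in the ID
        description  TEXT
    )
    """
)

# Sites, loci, and artifacts draw their IDs from the same series.
rows = [
    (1001, "site", "rock shelter"),
    (1002, "locus", "lithic concentration"),
    (1003, "artifact", "projectile point"),
]
conn.executemany("INSERT INTO arch_feature VALUES (?, ?, ?)", rows)

# The primary key constraint rejects any attempt to reuse an ID.
try:
    conn.execute(
        "INSERT INTO arch_feature VALUES (?, ?, ?)",
        (1002, "site", "administrative structure"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Classification is recovered by querying the attribute, not by inspecting the number.
feature_types = [
    row[0]
    for row in conn.execute(
        "SELECT feature_type FROM arch_feature ORDER BY arch_id"
    )
]
conn.close()
```

Because the type is an ordinary attribute, reclassifying a record under this sketch is an update to one column and leaves the key, and every reference to it, unchanged.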

The ArchID approach to numbering sites, loci, and artifacts in a single series is consistent with the low-interpretation field documentation system. The approach to survey provenience used in the Upper Colca survey is low-interpretation because an interpreted hierarchy is not encoded in the proveniencing system. For example, in some systems the Site ID# is primary, and structures and artifacts encountered inside that site are numerically subsumed by the site numbering, unless they are isolates. In other words, sites receive the principal numbering system, and any artifacts and features found "inside" sites receive index numbers from a secondary range, which forces the site assignment into the proveniencing of every artifact in that area. The weakness and spatial dependence of this system become more evident when features from different temporal occupations are recorded in a single, multicomponent site. In contrast, the Upper Colca survey used a single number series so that the "site" assignment number did not intrude into the proveniencing of every feature inside the site, as features were mapped individually and thus were independent spatial entities.
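
The contrast between the two numbering schemes can be sketched briefly in Python. Everything in this example is hypothetical, including the ID values, the "site-index" label format, and the idea of holding site membership in a separate lookup; it is meant only to show why a site-subsumed identifier changes when an interpretation changes, whereas a flat key does not.

```python
# Hypothetical hierarchical scheme: the site number is embedded in every feature ID.
def hierarchical_id(site_no, index):
    """Build a site-subsumed identifier such as '12-3' (illustrative format only)."""
    return f"{site_no}-{index}"

# Moving a feature to a different site under this scheme produces a different ID,
# so every record that cited the old ID must be updated as well.
old_hierarchical = hierarchical_id(12, 3)
new_hierarchical = hierarchical_id(15, 1)

# Hypothetical flat scheme: one number series, with site membership held apart
# from the key as an editable interpretation (or derived later by GIS overlay).
features = {
    2001: {"kind": "structure"},
    2002: {"kind": "lithic scatter"},
}
site_membership = {2001: 2000, 2002: 2000}  # feature ArchID -> site ArchID

def reassign(membership, arch_id, new_site_id):
    """Change the interpreted site assignment; the feature's own key is untouched."""
    membership[arch_id] = new_site_id
    return arch_id

key_after = reassign(site_membership, 2002, 2050)
```

In the flat case only the membership lookup is edited, which is the sense in which the site assignment does not intrude into the provenience of each feature.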

The advantage of this approach, and of categorizing sites in later analysis rather than in the field, is that documentation and interpretation are distinct steps. Data can be reinterpreted, and individual loci can be reassigned to other time periods, independently of the site context and spatial provenience in which they originally belonged. In other words, the GIS does the work of spatial provenience, proximity, and overlay, while numbering systems are dedicated only to the task of serving as a key for referencing records and tables in the database. Categories and types were used in this document for analysis, data presentation, and summary; however, in the course of original data acquisition during fieldwork there was an explicit effort to document features based on simple artifact and feature characteristics rather than by a generalized typology or classification. There is no single file with all ArchID numbers represented, as they are distributed across the nine file types shown above in Figure 5-7a. However, in a post-fieldwork GIS processing step an "All_ArchID_Centroids" point file is created that serves as a single reference point for all ArchID numbers used throughout the season (see Section 5.10.1).
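
The idea of gathering ArchID numbers from several files into one point reference can be sketched as follows. This is not the procedure documented in Section 5.10.1; it is a simplified illustration in plain Python. The layer names, coordinates, and ID values are hypothetical, the sketch assumes that each ArchID occurs in only one source layer, and the "centroid" here is a simple mean of vertices rather than the area-weighted centroid a GIS would compute for polygons.

```python
from collections import Counter

# Hypothetical source layers: layer name -> list of (ArchID, vertex coordinates).
layers = {
    "site_polygons": [(3001, [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)])],
    "locus_polygons": [(3002, [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0)])],
    "artifact_points": [(3003, [(2.5, 0.5)])],
}

def vertex_mean(coords):
    """Mean of the vertices; a stand-in for a true GIS centroid calculation."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def build_centroid_index(source_layers):
    """Collect one representative point per ArchID across all layers."""
    # Assumption of this sketch: a key reused across layers is treated as an error.
    counts = Counter(
        arch_id for records in source_layers.values() for arch_id, _ in records
    )
    duplicates = sorted(a for a, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"ArchID used more than once: {duplicates}")

    index = {}
    for layer_name, records in source_layers.items():
        for arch_id, coords in records:
            index[arch_id] = (layer_name, vertex_mean(coords))
    return index

all_archid_centroids = build_centroid_index(layers)
```

A lookup such as `all_archid_centroids[3002]` then returns the source layer and a representative point for that ArchID, without requiring the caller to know in advance which file holds the record.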