Glossary of Terms



Related wiki pages: Supplemental Glossary from Evidence Explained, Pending Definitions

To make sure we are talking about the same thing, we need to define the terms we use. Keep in mind that a term may be defined in different ways, and it is important to record each of these definitions. A term may therefore have more than one definition here; please do not view this as something to be corrected.

If you define a term here, please also include the source of the definition (e.g., NGS definition) or whether it is your own or an otherwise organically developed definition. Please also add definitions from different sources even if they do not appear to be contradictory for the sake of reference.

See also Supplemental Glossary from _Evidence Explained_, 2007

AIIM - The Association for Information and Image Management, an American nonprofit organization, recognized by the ISO, that helps people and organizations with document/records/content management and business process issues. This is one of the organizations that handles ISO standards potentially related to advanced genealogical data interoperability.

ANSEL - ANSI/NISO Z39.47 - Extended Latin Alphabet Coded Character Set for Bibliographic Use (definition partially from NISO) - The formal name for the character set used in the GEDCOM 5.5 standard. It is also known as the American Library Association Character Set. In common practice it has been superseded by Unicode, usually encoded as UTF-8.

ANSI - American National Standards Institute - One of the main ISO-affiliated standards bodies, ANSI oversees a wide variety of standards for the United States. ANSI-governed standards include a plethora of diverse categories such as acoustical equipment, energy distribution, and dairy production.

APG - Association of Professional Genealogists; from the website (in part), "an independent organization whose principal purpose is to support professional genealogists in all phases of their work: from the amateur genealogist wishing to turn knowledge and skill into a vocation, to the experienced professional seeking to exchange ideas with colleagues and to upgrade the profession as a whole ..."

API - Application Programming Interface - (from Wikipedia) An interface implemented by a software program that enables it to interact with other software. It facilitates interaction between different software programs similar to the way the user interface facilitates interaction between humans and computers. An API is implemented by applications, libraries, and operating systems to determine their vocabularies and calling conventions, and is used to access their services. It may include specifications for routines, data structures, object classes, and protocols used to communicate between the consumer and the implementer of the API. APIs are most usually written to allow third-party software developers to communicate with the originating software developer's application.

AQ - Ancestral Quest - A Genealogy Program

ASCII - American Standard Code for Information Interchange - A basic English-language character set first published in 1963 and last updated in 1986. (You can read the Wikipedia definition here.)

Assertion -- An Assertion is a claim or a statement of fact. The fact might be the existence of a Person or the truth of a name Attribute; some claims may be justified with evidence. In some Models an Assertion is an Entity, and it is represented on a computer by Assertion Records. In other Models an Assertion is a Relationship between an Entity with the fact and possibly an Entity with the Evidence. Computer representations based on these models implement Assertions as References between Records.

Attribute -- An Attribute is a quality or a feature that is a characteristic or inherent part of someone or something. In a Model an Attribute is a property of an Entity, having a name or tag to identify it and a value to give it meaning. In a computer representation an Attribute is a field of a Record.

BCG - Board for Certification of Genealogists - An organization that rigorously tests candidates' knowledge and skills against the standards expressed in the BCG Genealogical Standards Manual. The process for becoming certified by the BCG is loosely defined at How to Become Certified. (See also BCG FAQs)

BG - BetterGEDCOM - This refers to the immediate project of this wiki, which is the replacement of the old GEDCOM Standard with an equivalent but enhanced portable data format that can serve as a basis for future development of genealogical technology collaboration standards. Please see the GOALS page of this wiki for details on the more current revisions to the objectives of this project.

BDM - Birth Death Marriage - Genealogical shorthand for basic core facts about a person as opposed to more detailed information such as a biographical narrative or information about property they owned.

Character Set - (derived from Wikipedia) Also called character encoding, charset, character map or code page, character sets are tables of information translating computer code into readable text, symbols or information. Over time, character sets have needed to expand to accommodate things such as multiple languages, special characters on computer keyboards and new symbols such as the Euro currency symbol. Strictly speaking, a character set is simply a set of characters. A coded character set or code page associates a number with each of those characters. A character encoding tells how to represent those numbers. For example, Unicode is a character set. UTF-8 and UTF-16 are both encodings of that character set.
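
The distinction between a coded character set and a character encoding can be seen in a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and the Euro sign is used because the definition above mentions it.

```python
# A single character from the Unicode character set: the Euro sign.
euro = "\u20ac"

# The coded character set assigns the character a number (its code point).
code_point = ord(euro)

# Two different encodings represent that same number as different bytes.
utf8_bytes = euro.encode("utf-8")        # three bytes
utf16_bytes = euro.encode("utf-16-be")   # two bytes, big-endian, no byte-order mark

print(f"U+{code_point:04X}", utf8_bytes.hex(), utf16_bytes.hex())

# Decoding with the matching encoding recovers the original character.
assert utf8_bytes.decode("utf-8") == euro
assert utf16_bytes.decode("utf-16-be") == euro
```

Reading a file with the wrong encoding applies the wrong byte-to-number rules, which is one common way that names with accented letters become garbled when data moves between programs.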

Citation - (from Evidence Explained, 2007; electronic version, p. 820) "statement in which one identifies the source of an assertion. Common forms of citations are source list entries (bibliographic entries), reference notes (endnotes or footnotes), and document labels." See the referenced text for numerous examples and discussions about citations, often therein referred to as "reference notes."

Conclusion (Dictionary) -- A decision reached by reasoning from given premises.
Conclusion (EE) -- A decision; to be reliable it must be based on well-reasoned and thoroughly documented evidence gleaned from sound research.
Conclusion (E&C) -- Information derived by making decisions based on available Information.
Conclusion (Model) -- Any Entity, Attribute or Relationship instance that is created by reasoning and making decisions from available Information.
Conclusion (Computer) -- Any Record or Field of a Record that contains data created by reasoning and making decisions from available Information.

Conclusion-only Model - (which should probably be better described as a Conclusion-only Data Model). During the Evidence and Conclusion Process, the researcher may (in best practice, should) document the individual steps. When using a Conclusion-only Model, the researcher will document their selected evidence, analyses and conclusions as text. The conclusions are entered into the application's database, superseding any prior inferior conclusion, so that a person's current data represents the latest, overall working hypothesis derived from all the available evidence.
Working to a Conclusion-only Model tends to minimise the number of conclusions for an item, e.g. just one Birth event is usually recorded for an individual. Working to an Evidence and Conclusion Model will result in as many Birth Events per real-life person as there have been analyses.

Data Model; Model -- A set of Entities, their Attributes and their Relationships, used to represent a restricted area of human knowledge. Models are used as specifications for the design of computer databases and file formats whose Records represent instances of the Entities. Models in the genealogical area include the key concepts of Genealogy, e.g., Sources, Evidence, Persons, Events, Names, Dates, Places, and others.

DBMS - DataBase Management System - (from Wikipedia) A Database Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and the use of a database. A DBMS is a system software package that supports the use of integrated collections of data records and files known as databases. DBMSs may use any of a variety of database models, such as the network model or relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way. Instead of having to write computer programs to extract information, users can ask simple questions in a query language.

Description/evaluation - term being used temporarily to refer to specific information that describes or evaluates a source or passage. Real world examples might be:
*Provenance.
*Author-specific items, for example, relationship of the author to the item being cited (someone's mother, etc.); age of author
*Condition or organization (for example, census so faint as to affect readability or census where names are organized alphabetically; condition of a photograph; records partially destroyed by fire)

DOI - Digital Object Identifier, managed by the International DOI Foundation. The DOI System is for identifying objects in the digital environment. DOI names are assigned to any entity for use on digital networks. They are used to provide current information, including where they (or information about them) can be found on the Internet. Information about an object may change over time, including where to find it, but its DOI name will not change. NB: The identifier is digital but the objects may not be. It closely follows the CIDOC Ontology and would be capable of identifying physical objects, Persons, Places, and Events.

E&C - Evidence and Conclusion

EE - Evidence Explained

Entity -- An Entity is a component of a Data Model. It represents and abstracts a set of objects from the real world. An Entity is composed of Attributes that define its structure; Attributes are abstractions of properties or characteristics of the real objects. An Entity may be in Relationships with other Entities in the Model. When a Model is represented on a computer, a Record is usually defined for each Entity, with Fields that correspond to the Attributes.
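
As a rough illustration of how an Entity, its Attributes and a Relationship might map onto Records and Fields, the sketch below uses a Python dataclass. The `PersonRecord` type, its field names and the sample data are hypothetical and are not taken from any particular Model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonRecord:
    """Record for a hypothetical Person Entity; each field holds one Attribute."""
    record_id: str
    name: str
    birth_year: Optional[int] = None
    # A Relationship implemented as a Field that refers to another Record by id.
    mother_id: Optional[str] = None


# A tiny in-memory "database": record id -> Record (sample data is invented).
records = {
    "P1": PersonRecord("P1", "Mary Smith", 1820),
    "P2": PersonRecord("P2", "John Smith", 1845, mother_id="P1"),
}


def mother_of(person_id):
    """Follow the mother_id reference; return None when it is absent."""
    mother_id = records[person_id].mother_id
    return records[mother_id] if mother_id is not None else None


print(mother_of("P2").name)
```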

Event -- An Event is something that happens in the real world at one or more places and at a time or over a period of time. Events of genealogical significance usually involve Persons who often play specific roles with respect to the Event. Some Events, such as birth or marriage, establish Relationships between Persons. Some genealogical Models include Events as an Entity, some include them as Attributes of the Person or other Entities, and some include them both ways. As in Models, the computer representations of Events can be separate Records or Fields within Records.

Evidence (dictionary) -- the available body of facts or information indicating whether a belief or proposition is true or valid; information given personally, drawn from a document, or in the form of material objects, tending or used to establish facts; signs; indications.

Evidence ("Evidence Explained") -- information that is relevant to a problem; forms used in genealogy include Best Evidence (original records of the highest quality that survive), Direct Evidence (information that answers or solves a specific research question by itself), Indirect Evidence (relevant information that does not answer a research question by itself), Negative Evidence (an inference based on the absence of information that should exist under given circumstances), and Conflicting Evidence (relevant pieces of information from different sources that contradict each other).

Evidence (E&C Process) -- Information upon which conclusions may be based.

Evidence (Model) -- Any Entity, Attribute or Relationship instance that is wholly created from the available actual Evidence.

Evidence (Computer) -- Any Record or Field of a Record that contains data wholly derived from the available actual Evidence.

Evidence and Conclusion Model - linked to the Evidence and Conclusion Process, but not the same thing. During the Evidence and Conclusion Process, the researcher may (in best practice, should) document the individual steps. Many people will document their selected evidence, analyses and conclusions as text. When using an Evidence and Conclusion Model (which should probably be better described as an Evidence and Conclusion Data Model), the evidence and conclusions (at least) are formally documented in machine readable form.
Specifically, a Source record normally contains details of the source's contents as free-format text or as an image. When working to an Evidence and Conclusion Model, someone's name and age (e.g.) are extracted from that text or image and entered into name and age data items unique to that piece of evidence. This is the Evidence part.
The output from the analysis stage is similarly documented in data items unique to that analysis. This output is identical in format to that from a Conclusion-only Model.
Working to a Conclusion-only Model tends to minimise the number of conclusions for an item, e.g. just one Birth event is usually recorded for an individual. Working to an Evidence and Conclusion Model will result in as many Birth Events per real-life person as there have been analyses. As a result, working to an Evidence and Conclusion Model will show more intermediate steps than otherwise.

Evidence and Conclusion Process - This process is intended to describe the steps genealogical researchers go through. It may be carried out formally, invoking the Genealogical Proof Standard (q.v.), for instance, or informally. In summary, it involves setting a research objective; looking for evidence to support or deny the objective; analysing the evidence and coming to a stated conclusion. Throughout the process, the researcher should review the current results and may loop back to re-start at a previous step - even altering the objective if it appears valueless.

Evidence Person - created when working to an Evidence and Conclusion Model. This represents the evidence going into an analysis stage. An Evidence Person is created in the same format as a real life person would be under the Conclusion-only Model but contains only the data known at that time and will not be updated later. The lowest level of Evidence Person contains data extracted from a single source record.
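
One possible way to picture the difference between Evidence Persons and a conclusion drawn from them is sketched below. The class names, fields, sample sources and the toy "analysis" are all assumptions made for illustration; they are not part of any published model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen: an Evidence Person is not updated later
class EvidencePerson:
    source: str                  # hypothetical label for the single source record
    name: str
    age: Optional[int] = None
    year: Optional[int] = None   # year in which the source was created


@dataclass
class ConclusionPerson:
    name: str
    birth_year: Optional[int]
    based_on: Tuple[EvidencePerson, ...]  # the evidence that went into the analysis


def conclude(evidence):
    """Toy analysis: estimate birth year from the first item giving age and year."""
    birth_year = next(
        (e.year - e.age for e in evidence if e.age is not None and e.year is not None),
        None,
    )
    return ConclusionPerson(evidence[0].name, birth_year, tuple(evidence))


census = EvidencePerson("1851 census", "Jn. Smith", age=6, year=1851)
baptism = EvidencePerson("baptism register", "John Smith")
person = conclude([baptism, census])
print(person.name, person.birth_year)
```

Each new analysis would produce a new `ConclusionPerson`, while the Evidence Persons it refers to stay unchanged.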

Fact -- An item of information. In a Model the Attribute Values of Entity instances are Facts, and the existence of each Entity instance is itself a Fact. In computer representations Records and Fields are Facts. Basically everything is a Fact, so the term is not useful in distinguishing any one thing from any other.

FamilySearch - the genealogy website of the LDS Church with resources for genealogists regardless of their religious orientation.

FamilySearch Certified - A term devised by FamilySearch to denote "Certified Products and Services are programs, services, and utilities that are compatible with FamilySearch and conform to FamilySearch standards and systems." (See the current list at the FamilySearch Developers Network - Certified Products and Services.)

FamilySearch Developers Network - A site to provide information and resources for software programmers who support the FamilySearch Platform. (See FamilySearch Developers Network.)

FamilySearch Wiki - A central location for information about archives, libraries, government organizations that house historical records; research methodology; self-described as "free family history research advice for the community by the community." (See FamilySearch Research Wiki.)

FGS - Federation of Genealogical Societies - A national society based in the United States offering support to member societies in the organization and operation of viable local societies. Each year a multi-day conference is held in a different city, providing instruction to individuals and societies on a broad variety of family history topics.

FTM - Family Tree Maker - A Genealogy Program

Field -- In computing, a named item that makes up a part of a Record; the content of the item is data that represents the value of a fact or item of information.

GEDCOM - (from FamilySearch Developer Network, the current body that originally developed GEDCOM): "GEDCOM stands for Genealogical Data Communications and is a file format specification that allows different genealogical software programs to share data with each other. It was developed by the Family and Church History Department of The Church of Jesus Christ of Latter-day Saints to provide a flexible, uniform format for exchanging computerized genealogical data. This standard is supported by FamilySearch, by the family history products that The Church of Jesus Christ of Latter-day Saints produces, as well as by the vendors of most of the major genealogical software products."

Genealogy - "The study of families in genetic and historical context. It is the study of communities in which kinship networks weave the fabric of economic, political and social life..." (See the Board for Certification of Genealogists® home page for more of this definition.)

GPS - Genealogical Proof Standard as described by the Board for Certification of Genealogists® in the //BCG Genealogical Standards Manual// and briefly stated on BCG's Genealogy's Standards page.

GRAMPS - Genealogical Research and Analysis Management Programming System - An open-source genealogy software application that uses an XML data store. GRAMPS is currently at version 3.2.4. For more information on the Gramps XML and data model, see GRAMPS Data Model.

Hierarchical Data Model - (from Wikipedia) A data model in which the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children but each child has only one parent (also known as a 1:many ratio). All attributes of a specific record are listed under an entity type. In a database, an entity type is the equivalent of a table; each individual record is represented as a row and an attribute as a column. Entity types are related to each other using 1:N mapping, also known as one-to-many relationships. This model is recognized as the first database model, created by IBM in the 1960s. Both the GEDCOM data structure and the XML data structure adhere more closely to a hierarchical data structure than to the RDBMS structure most commonly used in genealogical software implementations.
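
To show the parent/child structure in practice, the sketch below turns a few GEDCOM-style lines (a level number, an optional cross-reference id, a tag and an optional value) into a tree. It is a minimal illustration rather than a full GEDCOM reader: the function name and the dictionary layout are assumptions, the sample data is invented, and continuation lines and character-set handling are ignored.

```python
def parse_gedcom_lines(text):
    """Build a tree from lines of the form 'LEVEL [@XREF@] TAG [VALUE]'."""
    root = {"tag": "ROOT", "xref": None, "value": "", "children": []}
    stack = [root]  # stack[n + 1] is the most recent node seen at level n
    for line in text.strip().splitlines():
        level_str, _, rest = line.strip().partition(" ")
        tag, _, value = rest.partition(" ")
        xref = None
        if tag.startswith("@"):  # optional cross-reference id, e.g. @I1@
            xref = tag
            tag, _, value = value.partition(" ")
        node = {"tag": tag, "xref": xref, "value": value, "children": []}
        # Drop deeper levels so the new node attaches to its single parent.
        del stack[int(level_str) + 1:]
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


sample = """
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1845
2 PLAC Leeds, Yorkshire, England
"""
tree = parse_gedcom_lines(sample)
individual = tree["children"][0]
birth = individual["children"][1]
print([child["tag"] for child in birth["children"]])
```

Every node except the root has exactly one parent, which is the one-to-many property described above.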

IETF - Internet Engineering Task Force - An international standards body not affiliated with ISO which governs most technical standards for the internet. IETF takes a developmental approach to standards rather than the codified approach preferred by the ISO and thus remains a more flexible, nimble body more suited to the needs of internet-based technologies.

INCITS - InterNational Committee for Information Technology Standards - The primary U.S. focus of standardization in the field of Information and Communications Technologies (ICT), encompassing storage, processing, transfer, display, management, organization, and retrieval of information.

ISO - International Organization for Standardization - The worldwide governing body for standards for just about everything.

Item type or format - term being used temporarily to describe the source content or form:
*Digital Image
*Record Copy
*Duplicate original

LDS Church - or LDS - Abbreviation for the Church of Jesus Christ of Latter-day Saints, known in genealogy circles for its microfilming and digitization projects, the Family History Library in Salt Lake City, Utah, some 4,500+ Family History Centers throughout the world, and its website FamilySearch.org offering a multitude of resources including free indexes and scanned images of original documents from over a hundred countries.

LFT - Legacy Family Tree - A genealogy management software program for the Windows platform.

Lifelines - An open source genealogy management software program originally developed on Unix, but available on many operating systems, such as Linux, Mac OSX and Windows. It has a very powerful reporting language.

Metadata - (see http://en.wikipedia.org/wiki/Metadata ). Usually summarised as "data about data". For instance, (to take a non-genealogical example that I hope still applies), the metadata about a railway wagon number on American railroads tells us that it is up to ten characters long, consisting of a group of up to four letters followed by a group of up to six numbers. It also tells us something about the meaning of those letters (usually ownership). Knowing the metadata helps a programmer write basic validation on input and understand at least some of what the data means. Much of the GEDCOM standard consists of metadata describing how big data items can be, what they mean, what they relate to, etc.
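
The wagon-number example can be turned into basic input validation. The sketch below encodes only the rule as stated above (a group of up to four letters followed by a group of up to six digits); the pattern, the function name and the sample values are assumptions made for illustration and do not reproduce any official railroad specification.

```python
import re

# The metadata expressed as a pattern: 1-4 letters followed by 1-6 digits.
WAGON_NUMBER = re.compile(r"([A-Z]{1,4})([0-9]{1,6})")


def parse_wagon_number(text):
    """Return (letters, digits) if text fits the metadata, otherwise None."""
    match = WAGON_NUMBER.fullmatch(text.strip().upper())
    return match.groups() if match else None


print(parse_wagon_number("ATSF123456"))
print(parse_wagon_number("ABCDE1"))
```

The same idea applies to GEDCOM: knowing the permitted length and format of a data item lets a program reject or flag input that cannot be valid.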

Name (Personal) -- The words by which a Person is known or referred to. In some genealogical Models a Name is an Attribute of a Person Entity. In some computer representations a Name is a Field of a Person Record. The computer representation of a Name's value is typically restricted with rules about length, character set, and overall formatting.

NISO - National Information Standards Organization - An organization accredited by ANSI and ISO, NISO oversees standards in the United States for libraries, the media, information technology and publishing.

NGS - National Genealogical Society - A genealogical organization in the United States that offers support to individuals by publishing the prestigious National Genealogical Society Quarterly demonstrating excellence in research and reports and by hosting an annual multi-day conference featuring a variety of professional and technical experts.

PAF - Personal Ancestral File - A genealogy management software program produced by the LDS Church for the Windows platform currently at version 5.2.18.0, which is not compatible with the so-called "newFamilySearch" and is not listed as FamilySearch Certified.

Person -- A Person is a real human being who exists or existed. The term is used in Models for the Entity that represents human beings. In Models that contain both Evidence and Conclusion based Entities, the terms Evidence Person or Conclusion Person are sometimes used. Some Models use the term Individual for the Conclusion level Person Entity, or the term Persona for the Evidence level Person Entity. The term Person is also used as the name of the Record type that holds information about human beings in computer Databases and Files.

Persona (1) - A term for an entity in the GENTECH Data Model. The GENTECH definition is abstracted below:

<Definition starts>
PERSONA
...
Definition: Contains the core identification for each individual in genealogical data, and allows information about similarly named or identically named people to be brought together, after suitable analysis, in the same aggregate individual. Because real human beings leave data tracks through time as if they were disparate shadow personas, this entity allows the genealogical researcher to tie together data from different personas that he or she believes belong to the same real person. The mechanism for this, discussed in the text, is to make different PERSONAs part of the same GROUP.
...
Relationships: One PERSONA is based on one ASSERTION. However, note that an ASSERTION may link one PERSONA to a GROUP, and thus many separate PERSONAs can be brought together into a higher level constructed PERSONA.
...
From: GENTECH Genealogical Data Model, version 1.1, 29 May 2000, page 60
<Definition ends>

Commentary - Note there is NO Person entity in the GENTECH Data Model, and a higher level PERSONA may be constructed from several on a lower level - their data is combined to form the information about the higher level Persona. It is unclear to the author why the term Persona is used in the GENTECH Data Model as the entity appears to have all the obvious characteristics of a Person entity.

Persona (2) - A term for an entity in the new FamilySearch ("nFS") Data Model.

A Persona entity appears to be intended to represent the data extracted from one source about one human being. Their Person (not Persona) entity appears to be intended to represent the sum of the current conclusions about one human being. A Person takes its information from one or more Personas. newFamilySearch uses a two-level data model so Persons are only made up of Personas, which are derived only from sources.

Personal Commentary

See Discussion "Differences from FS Personas?" and "The Evidence Architecture of the New FamilySearch Tree"

PFACT - Acronym of "Property, Fact, Attribute, Characteristic, or Trait," coined by T. Wetmore of the Better GEDCOM effort, in an attempt to help avoid confusion caused by the use of these many synonyms for the same concept. Conveniently pronounceable as "fact."

Place - A place is a geographic area that may be larger than a country and in theory as small as a single point in space. Examples of place types are buildings, farms, cities, church parishes, military districts, postal areas, states, countries, continents, oceans etc. The geographic area representing a place may change somewhat over time, i.e. it may grow or shrink. Places, and information about places, may be represented in data models, databases and files.

Place Name – A name of a place, often found in sources. A place may have several names at the same time or at different periods of time, and there may be several places with the same name within a given context. A place may have different names in different languages. Place names may be represented in data models, databases and files.

Place Hierarchy - A place may be located within other higher level places covering larger areas, which may themselves be located in even higher level places, forming a hierarchy. The highest level in the hierarchy is often a country, and the lowest may be, for example, a farm. A place may be a member of several hierarchies, often defined by different organizations in public administration. Hierarchies may change over time, e.g. when a local place is included in another country, but the local place by definition stays the same. Place Hierarchies may be represented in data models, databases and files.

Place Name Hierarchy - A Place Name Hierarchy identifies a place hierarchy by the names (and optionally types) of the places in the hierarchy during a certain period of time. It is often found in sources and/or implied by the geographic area covered by the source itself. Place Name Hierarchies can be represented in the current GEDCOM by a comma-separated list of names. A hierarchy of names of higher level places provides an additional context for the identification of a place by name, but there could be cases where the hierarchy of names does not identify the higher level context uniquely. Place Name Hierarchies may be represented in data models, databases and files.
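
A comma-separated place value can be split into its levels with very little code. The sketch below assumes the lowest level is written first, as in the example values shown; the function names and sample places are hypothetical.

```python
def split_place(place_value):
    """Split a comma-separated place value into names, lowest level first.

    Empty parts are kept so that a missing level still occupies its position.
    """
    return [part.strip() for part in place_value.split(",")]


def same_context(place_a, place_b):
    """True when two place values share all names above the lowest level."""
    return split_place(place_a)[1:] == split_place(place_b)[1:]


a = "Springfield, Illinois, USA"
b = "Springfield, Massachusetts, USA"
print(split_place(a))
print(split_place(a)[0] == split_place(b)[0], same_context(a, b))
```

The two sample values share the lowest-level name but not the higher-level names, which is the additional context the definition describes. Matching names alone still cannot guarantee that two values refer to the same place.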

Place Types – A classification of a place. Examples of place types are buildings, farms, cities, church parishes, military districts, postal areas, states, countries, continents, oceans etc. The classification may change over time (e.g. a school building changing into a factory), but the type of the highest level places often stays the same. Different terms may be used for a type that essentially means the same classification, possibly using different languages. Place Types may be represented in data models, databases and files.


QUAY - Found in GEDCOM 5.5, "CERTAINTY_ASSESSMENT." From the specification (PDF), p. 38-39, in part, "The QUAY tag's value conveys the submitter's quantitative evaluation of the credibility of a piece of information, based upon its supporting evidence. Some systems use this feature to rank multiple conflicting opinions for display of most likely information first. It is not intended to eliminate the receiver's need to evaluate the evidence for themselves." The specification includes further comment, as below. (Note: These further comments are the subject of much controversy.)
0 = Unreliable evidence or estimated data
1 = Questionable reliability of evidence (interviews, census, oral genealogies, or potential for bias for example, an autobiography)
2 = Secondary evidence, data officially recorded sometime after event
3 = Direct and primary evidence used, or by dominance of the evidence
See also "Surety."
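
The ranking use mentioned in the specification excerpt can be sketched as follows. The descriptions in the table are shortened from the list above; the function name and the sample birth dates are invented for illustration, and nothing here is prescribed by the GEDCOM specification.

```python
# QUAY values with descriptions shortened from the list above.
QUAY_MEANINGS = {
    0: "Unreliable evidence or estimated data",
    1: "Questionable reliability of evidence",
    2: "Secondary evidence, data officially recorded sometime after event",
    3: "Direct and primary evidence used, or by dominance of the evidence",
}


def rank_by_quay(opinions):
    """Sort (value, quay) pairs so that the highest QUAY comes first."""
    return sorted(opinions, key=lambda opinion: opinion[1], reverse=True)


births = [("1846", 1), ("1845", 3), ("abt 1850", 0)]  # invented sample data
ranked = rank_by_quay(births)
for value, quay in ranked:
    print(quay, value, "-", QUAY_MEANINGS[quay])
```

As the specification excerpt notes, such a ranking only orders the display; the receiver still needs to evaluate the evidence.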

RDBMS - Relational Database Management System - (via Wikipedia) A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational database model. A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables. Almost all genealogical software programs store users' genealogical data using an RDBMS.
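
As a small illustration of data and relationships both being stored as tables, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names and sample rows are hypothetical and do not come from any genealogy program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table holds the data about each person.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- A second table holds the relationship between persons.
    CREATE TABLE parent_child (
        parent_id INTEGER NOT NULL REFERENCES person(id),
        child_id  INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Mary Smith'), (2, 'John Smith'), (3, 'Ann Smith');
    INSERT INTO parent_child VALUES (1, 2), (1, 3);
""")

# A question asked in the query language instead of a hand-written program.
children = [
    row[0]
    for row in conn.execute(
        "SELECT c.name FROM parent_child pc "
        "JOIN person c ON c.id = pc.child_id "
        "WHERE pc.parent_id = ? ORDER BY c.name",
        (1,),
    )
]
print(children)
```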

Record -- In computing a Record is a number of Fields of information that are handled as a whole. Records conform to restrictions that specify the sets of Fields a Record of a particular type may have, and the possible values those Fields may contain. Computer databases consist of potentially huge numbers of Records of potentially many types. Records can be written to and read from files.

Record Identification - term being used temporarily to refer to the specific information consulted in a source. Real world examples might be:
*Household identification (on a census)
*Individual's record name and record content (on a birth certificate or baptismal record)
*Parcel name or number (on a map)

Relationship -- A Relationship is a connection between two or more persons, objects or concepts. Models consist of Entities and Relationships, where the Relationships are viewed as labeled connections between Entities. In computer representations Relationships are often implemented as Fields of Records that refer or point to other Records.

Repository -- An institution such as an archive, government office or library, or any other site or location or service, that collects, manages, archives, curates or indexes, and makes available Source items for Research. Repositories are included in most Data Models as an Entity that represents physical Repositories. In Models that use Object Orientation the Repository Entity may have sub-types to represent different kinds of Repositories. Repositories are represented in computer Databases and Files as Repository Records that conform to the definition of a Repository Model Entity.

Reunion - Leister Productions - A genealogy management software programme produced by Leister Productions for the Macintosh, iPhone and iPad platforms. Current versions are Reunion for Macintosh 9.0c, Reunion for iPhone & iPod touch 1.02 and Reunion for iPad 1.01.

RFC - Request For Comments - (from Wikipedia) A memorandum published by the Internet Engineering Task Force (IETF) describing methods, behaviors, research, or innovations applicable to the working of the Internet and Internet-connected systems. Through the Internet Society, engineers and computer scientists may publish discourse in the form of an RFC, either for peer review or simply to convey new concepts, information, or (occasionally) engineering humor. The IETF adopts some of the proposals published as RFCs as Internet standards.

RM - RootsMagic - A genealogy management software program for the Windows 7, Vista, XP and 2000 platforms, created by Bruce Buzbee for RootsMagic, Inc. It is available as the free RootsMagic Essentials and as a full-functionality upgrade, currently at version 4.0.96 (13 Aug 2010).

Scholarly Genealogy - A term used by the BetterGEDCOM Project to refer to the practice of genealogy using processes, concepts, definitions and standards defined and advocated by various professional organizations, bodies or individuals. We do not attempt to define who is included in this definition.

SoG - Society of Genealogists - Long-established UK society. It was incorporated under Licence of the Board of Trade as the Society of Genealogists of London on May 8, 1911.

Source of the source - A credit line (EE 2007, p. 427) that refers back to the source author's citation, authorities or parenthetical references. For example, the items below:
*NARA micropublication name and roll (for digital images of certain NARA publications like census)
*Agency and Book and Page or Certificate Number; Repository (for vital record indexes)
*Author Title and FHL film Number (for records in the FS Historical Record Collections)

Source Type - A term being used temporarily to describe the source. Examples include terms like database, index, bound manuscript, typescript, photograph, letter, e-mail and listserv archive.

Surety - Terminology found in one or more genealogy programs. The Master Genealogist (TMG) uses this term. Definition from the TMG Glossary:
A numerical value assigned to indicate the quality of a source in documenting a given fact recorded in the data set. The surety values are recorded in the citation record. The values are:
3= an original source, close in time to the event
2= a reliable secondary source
1= a less reliable secondary source or an assumption based on other facts in a source
0= a guess
-= the source does not support the information cited or this information has been disproved
About Surety, from the TMG Help file topic "GEDCOM export":
Sureties
When this option is selected, surety values are exported to the extent supported by GEDCOM. Date, place, and memo sureties are exported with a QUAY tag at one level higher (usually 4) than the source citation from which they are referenced.
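
To make the surety scale and the QUAY export concrete, here is a minimal sketch. The meanings are copied from the TMG Glossary text above; the function name is hypothetical, and writing the numeric surety verbatim as the QUAY value and skipping the "-" surety are assumptions made for illustration, since the Help text quoted here does not spell out the exact mapping.

```python
# Surety values and meanings as quoted from the TMG Glossary above.
SURETY_MEANINGS = {
    "3": "an original source, close in time to the event",
    "2": "a reliable secondary source",
    "1": "a less reliable secondary source or an assumption based on other facts in a source",
    "0": "a guess",
    "-": "the source does not support the information cited or this information has been disproved",
}


def quay_line(surety, level=4):
    """Return a GEDCOM-style QUAY line for a surety, or None if none is written.

    Assumption: numeric sureties are written unchanged as the QUAY value.
    Assumption: the "-" surety has no QUAY equivalent and is not exported.
    The default level of 4 follows the "usually 4" remark in the Help text.
    """
    if surety not in SURETY_MEANINGS:
        raise ValueError(f"unknown surety value: {surety!r}")
    if surety == "-":
        return None
    return f"{level} QUAY {surety}"
```

For example, `quay_line("3")` produces the line `4 QUAY 3`, while `quay_line("-")` produces nothing under the assumption stated above.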

TMG - The Master Genealogist - A genealogy management software program for the Windows platform owned by WhollyGenes, Inc. TMG is currently at version 7.04.

TNG - The Next Generation of Genealogy Sitebuilding© ("TNG") is a powerful way to manage and display your genealogy data on your own web site, all without generating a single page of static HTML. Instead, your information is stored in MySQL database tables and dynamically displayed in attractive fashion with PHP (a scripting language). TNG is currently at version 9.01 (27 Feb 2012).

Unicode - The Unicode Consortium - The international effort, initially started in the United States, which publishes a unified, worldwide character set. Unicode corresponds with ISO standard 10646 level 3.

UTF-8 - This is the technical name for the Unicode character encoding that is now nearly universally used in computer systems. UTF-8 corresponds to ISO standard 10646-1:2000 Annex D as well as IETF RFC 3629. The terms Unicode and UTF-8 are often used interchangeably in casual discussion, but strictly speaking Unicode is the character set and UTF-8 is one encoding of it. (See this page for detailed information.)
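
The difference between a character set and an encoding can be seen in a few lines of Python. This is only an illustrative sketch; the sample name is made up.

```python
# A character set assigns each character a number (a code point).
# An encoding decides how those numbers are stored as bytes.
text = "Zo\u00eb"  # the name "Zoe" with a diaeresis on the final letter

code_points = [ord(ch) for ch in text]  # Unicode code points: [90, 111, 235]
utf8_bytes = text.encode("utf-8")       # UTF-8 bytes: b'Zo\xc3\xab'

# Three characters, but four bytes: the last character needs two bytes in UTF-8.
character_count = len(text)
byte_count = len(utf8_bytes)

# Decoding with the same encoding recovers the original text.
round_trip = utf8_bytes.decode("utf-8")
```

The code points belong to Unicode (the character set); the byte sequence belongs to UTF-8 (the encoding).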

URI - Uniform Resource Identifier - (from Wikipedia) A string of characters used to identify a name or a resource on the Internet. Such identification enables interaction with representations of the resource over a network (typically the World Wide Web) using specific protocols. Schemes specifying a concrete syntax and associated protocols define each URI. Subcategories of URIs include URLs and URNs.

URL - Uniform Resource Locator - (from Wikipedia) A subcategory of URI that specifies where an identified resource is available and the mechanism for retrieving it. In popular usage and in many technical documents and verbal discussions it is often incorrectly used as a synonym for URI.

URN - Uniform Resource Name - (from Wikipedia) A subcategory of URI that uses the urn scheme, and does not imply availability of the identified resource. Both URNs (names) and URLs (locators) are URIs, and a particular URI may be a name and a locator at the same time.
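
As a small illustration of the three definitions above, Python's standard urllib.parse module splits both kinds of URI into a scheme and the rest. The URL is one that appears elsewhere on this page; the URN is only an illustrative ISBN-style name.

```python
from urllib.parse import urlparse

# A URL: says where the resource is and how to retrieve it (here, over https).
url = urlparse("https://bettergedcom.wikispaces.com/BetterGEDCOM+Comparisons")

# A URN: names something without implying where it can be fetched.
urn = urlparse("urn:isbn:0451450523")

url_parts = (url.scheme, url.netloc, url.path)
urn_parts = (urn.scheme, urn.netloc, urn.path)
```

Both strings are URIs; the URL carries a network location, while the URN has none and consists only of the urn scheme followed by the name.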

UUID - (Universally Unique Identifier) - (from Wikipedia) - An identifier standard used in software construction, standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else.
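
A minimal sketch of creating UUIDs with Python's standard uuid module follows. Using a random (version 4) UUID as a record identifier is an assumption made for illustration, not a BetterGEDCOM decision.

```python
import uuid

# Two identifiers created independently, with no central coordination.
record_id = uuid.uuid4()
other_id = uuid.uuid4()

# The usual text form is 36 characters: 32 hexadecimal digits and 4 hyphens.
as_text = str(record_id)

# The text form can be parsed back into an equal UUID.
parsed = uuid.UUID(as_text)
```

Two programs on different computers could each run this and, with reasonable confidence, never produce the same identifier.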

Web 2.0 - (from Wikipedia) - The term Web 2.0 is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web. A Web 2.0 site gives its users the free choice to interact or collaborate with each other in a social media dialogue as creators (prosumer) of user-generated content in a virtual community, in contrast to websites where users (consumer) are limited to the passive viewing of content that was created for them. Examples of Web 2.0 include social-networking sites, blogs, wikis, video-sharing sites, hosted services, web applications, mashups and folksonomies.

XML - eXtensible Markup Language - (from Wikipedia) A set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards. XML's design goals emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.
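
For readers who have not seen XML handled in code, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element and attribute names (person, id, name) are hypothetical and are not part of any BetterGEDCOM proposal.

```python
import xml.etree.ElementTree as ET

# Build a tiny document: one person element with an id attribute and a name.
person = ET.Element("person", {"id": "P1"})
ET.SubElement(person, "name").text = "Zo\u00eb Smith"

# Serialize to bytes using the UTF-8 encoding.
xml_bytes = ET.tostring(person, encoding="utf-8")

# Parse the bytes back into an element tree and read the values again.
parsed = ET.fromstring(xml_bytes)
parsed_name = parsed.find("name").text
parsed_id = parsed.get("id")
```

The non-ASCII character in the name survives the round trip, which is the Unicode support the definition refers to.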

Comments

brantgurga 2010-11-13T12:35:39-08:00
Frequent mixups of character set and encoding
I've frequently seen character set and encoding get mixed up, so I tried fixing some of those issues in the glossary and clarifying the distinction.
greglamberson 2010-11-13T13:08:07-08:00
Excellent, thanks. I've mixed them up plenty myself but am getting better about using precise and correct terminology (thanks to many mistakes corrected by others).
hrworth 2011-01-09T07:49:46-08:00
Glossary of Terms
It would be very helpful, if those posting Technical Terms and Genealogical Terms would update the Glossary of Terms.

For example, down the left panel of this Wiki, under Gathering Information, there are two terms that have been talked about throughout this wiki, but are not listed in the Glossary of Terms.

That is: Evidence and Conclusion Process. Terms like "Evidence Person", "Conclusion Person" are not here.

What do these terms mean to an End User and what do they mean to a Technical Person?

These are just two terms, being used as examples.

I think that having some of this information in the Glossary of Terms will help us all understand what they mean and not have to read through 100's of pages on this Wiki to try to figure out what a term means.

Thank you,

Russ
hrworth 2011-01-09T18:31:19-08:00
From today's messages on the Wiki, these terms should be included:

Conclusion Object
Assertion
Surety

Thank you,

Russ
GeneJ 2011-01-09T18:44:44-08:00
I added a surety definition from the TMG Help/Glossary file.
AdrianB38 2011-01-10T08:39:04-08:00
Mad fool that I am, I have attempted to define:
- Conclusion-only Model,
- Evidence and Conclusion Model,
- Evidence and Conclusion Process,
- Evidence Person.

Comments gratefully accepted. (Oh and I did "metadata" a day or so ago).
hrworth 2011-01-10T11:34:22-08:00
Evidence and Conclusion Model - Question
Looking for some clarification on this topic:

Thank you to whoever posted this on the Wiki:

Evidence and Conclusion Model - linked to the Evidence and Conclusion Process, but not the same thing. During the Evidence and Conclusion Process, the researcher may (in best practice, should) document the individual steps. Many people will document their selected evidence, analyses and conclusions as text.

When using an Evidence and Conclusion Model (which should probably be better described as an Evidence and Conclusion Data Model), the evidence and conclusions (at least) are formally documented in machine readable form.

Specifically, a Source record normally contains details of the source's contents as free-format text or as an image. When working to an Evidence and Conclusion Model, someone's name and age (e.g.) are extracted from that text or image and entered into name and age data items unique to that piece of evidence. This is the Evidence part.

The output from the analysis stage is similarly documented in data items unique to that analysis. This output is identical in format to that from a Conclusion-only Model.
Working to a Conclusion-only Model tends to minimise the number of conclusions for an item, e.g. just one Birth event is usually recorded for an individual. Working to an Evidence and Conclusion Model will result in as many Birth Events per real-life person as there have been analyses. As a result, working to an Evidence and Conclusion Model will show more intermediate steps than otherwise.

Evidence and Conclusion Process - This process is intended to describe the steps genealogical researchers go through. It may be carried out formally, invoking the Genealogical Proof Standard (q.v.), for instance, or informally. In summary, it involves setting a research objective; looking for evidence to support or deny the objective; analysing the evidence and coming to a stated conclusion. Throughout the process, the researcher should review the current results and may loop back to re-start at a previous step - even altering the objective if it appears valueless.


This sounds like how our research is documented, using Sources, Citations, analysis of that information, and perhaps another document for our research.

My question is: do any genealogy programs handle this work now in a way that is different from our normal documentation?

Understanding how family researchers do and document their research is important to this project.

The second question is how these terms impact this project.

What are the data elements, specifically to these two models, need to be spelled out on the Wiki for the BetterGEDCOM project?
GeneJ 2011-01-10T15:18:30-08:00
Hi Russ:

As you know, I'm not the expert in what is called the "evidence - conclusion data model."

(1) Do any genealogy programs handle this work?
Short answer, no.
Longer answer, the concept is a little like the "merge as alternate" process using FTM-Ancestry's little green leaf (as I see it in FTM for Mac), except the user would be often developing their own indexed entry data as part of the process of entering the source.

How is this work different from our normal documentation?
If your process is to first enter FTM "fact" related data to your database, and then create a source for that fact, the evidence-conclusion data model would be different.
In the evidence-conclusion data model, you would first enter a source. As part of the process of documenting the source, you'd extract particular databits from the source into particular fields available at the source level. Those extractions would then appear in your database as a kind of FTM "fact"--we'll call it an FTM evidence-fact. The field entries in that evidence-fact would be specific to the noted source.

I have an appt right now. Will try to check back later and take a stab at your last question.
hrworth 2011-01-10T15:35:29-08:00
GeneJ,

First, I enter SOURCES first, when I pick it up and know or have seen information that I will be using in my file. So, I do NOT create a SOURCE for any fact.

So, I guess that means that I am using the Evidence-Conclusion data model.

NO, I would NOT enter details about any FACT in a Source Record. The Source Record is about the details of that source. Source = something physical, like a Book, an Online Record.

After the Source is entered, then Citations would be entered, linked to that Source, that DOES provide the details of what is in that Source. The Citation is associated to the Fact or Event.

Sorry but "merge as alternate" doesn't mean anything to me. An Ancestry leaf is a hint that there may be someone on Ancestry.com that the user might want to look at.

Russ
GeneJ 2011-01-10T16:52:27-08:00
Russ wrote, "What are the data elements, specifically to these two models, need to be spelled out on the Wiki for the BetterGEDCOM project?"

I believe that, as time permits, the data elements are being written onto the BetterGEDCOM comparisons page, which is accessible from the wiki index as "BetterGEDCOM Comparisons."

https://bettergedcom.wikispaces.com/BetterGEDCOM+Comparisons
mstransky 2011-01-10T19:06:16-08:00
Just wanted to throw my two cents in here about the evidence and conclusion model(s).

Knowing that some structures may or may not support it but might like to.

Can someone think of an (INDI) as an index card, or as any other segment of data, like a location or a source book?

We all know how people are structured like a single segment.
Then the (FAM) record relation performs the pedigree "LINKING" of people to people.

My question is: can we be creative and make the segment records hold a source "INDEX CARD OF INFO",
and create a "LINKING" structure like (FAM) to capture a researcher's step-by-step path to a conclusion?

WHY? Because on an exchange people can share the INDI records without needing a family pedigree, whether or not they wish to share one. The actual record, the John Smith index card, can get passed without embedded pedigree linkage.

If a researcher wishes to pass a group of citations from sources, they can pass the "stack of index segments" with or without the extra step-by-step conclusion, such as pedigree linkage.

If you don't get my drift, I can wait until we have a beta block of data. Or if someone supplies a chunk of such data, I can rewrite and structure the data and let the cards fall.

You might like it or not; it is just something I have been looking at, along with modifying the GEDCOM pedigree outside of the INDI records to support more compact linkages for step, legal, adopted, biological, etc. relations.

I guess the best way to say it is...
(INDI) (SOUR) (PLAC) are like complete index cards of information on their own.

While (FAM linkage) (EVIdence linkage) (others) are like strings which are outside of the INFO SEGMENTS but capture the relations and navigation from one to the other.

I do it this way and it seems much more flexible than stuffing many segment records inside other segment records, making them complex to handle from one structure to another.

Please do not take offense by any means; I can wait and rewrite a chunk of someone else's data and show you what I mean.

It's just something to take a look at and maybe consider later.
ttwetmore 2011-01-10T22:49:25-08:00

Let me throw in my own comments about the evidence and conclusion process and the evidence and conclusion model ...

The evidence and conclusion process was the name I proposed for the process a genealogist goes through when discovering and researching ancestors. The process is described in numerous places under different names. As I have described it the steps are:

Collect all the evidence you can find about persons of interest, which may include evidence about persons with similar names as those of interest before you can infer they are not persons of interest. For each piece of evidence record its source.

Organize the evidence by extracting event and person information from each piece of evidence. Each event mentioned and each person mentioned in each piece should be recorded as a separate item. Each of these items should reference where in the evidence it came from.

Evaluate the evidence by arranging the person items into sets that you believe represent the same real individual. Evaluate the events these persons participated in to make conclusions about the real events and real relationships that the real individuals had. For each final individual, record and justify the reasoning behind why you arranged the person items into those individuals.

This is the same process described as "The Inferential Genealogy Process," by Tom Jones. His steps are 1) Start with a Focused Goal; 2) Search Broadly; 3) Understand the Records; 4) Correlate the Evidence; and 5) Write Down Results. The same process can be found described slightly differently in many sources that teach genealogical research techniques.

The evidence and conclusion model is the set of data objects that a genealogist uses when carrying out the evidence and conclusion process. If that process is carried out MANUALLY then those data objects are items that are on paper and/or written down – the evidence might be copies of official records; the extracted information might be separate index cards or slips of paper for each person mentioned in each event found in the evidence; and the conclusions might be piles of those cards or slips that the researcher decides represent the same real individual. The researcher justifies the conclusions by describing the reasoning that led to the final makeup of those piles.

If the evidence and conclusion process is carried out on a computer then the application must provide its own representations of the entities needed to carry out the process. This boils down to a set of records in a computer database. Records to hold the evidence, and records to hold the event information, and records to hold the person information extracted from the evidence are required. Also required are the records used to hold the conclusions, basically the groups of person records the researcher decides apply to the same real individual.

Most of the difference of opinion in the structure of the evidence and conclusion model, at least in my opinion, revolves around the NATURE OF THE PERSON RECORD. In some models (e.g., GEDCOM, GRAMPS) the person records become COLLECTING POINTS for all the information that a researcher believes applies to the same real individual. That is, each person record represents a growing conclusion about a real individual that is added to as the researcher makes decisions about the discovered evidence. The original person records that only held the information about a person taken from a single event from the evidence LOSE THEIR INTEGRITY as they join with other person records to become the final person records representing the concluded real individuals.

The other approach, the one I advocate, is to always KEEP THE LOW LEVEL PERSON RECORDS INTACT, and create new person records to represent the concluded individuals by having a mechanism whereby the new person records refer to the original person records. The original or low level person records I call the evidence persons, and the final records for the assumed real individuals I call the conclusion persons. The major advantage of this alternative is that the original data never loses its integrity so that the PROCESS CAN BE REVERSED and modified, effectively rearranging the piles as it could have been done in the manual process.

Note that the process supported by GEDCOM and GRAMPS is akin to a manual process, that no researcher would condone, of erasing or destroying the first level of index cards or paper slips as the information from those cards and slips is rewritten onto newer and more complete index cards. This process is not reversible because the early evidence based records are destroyed. GRAMPS partially solves this problem by maintaining all the event records where the low level person records came from, but the person records themselves are subsumed into the conclusion persons. The DeadEnds model that I advocate is modeled on the manual process, as the lower level evidence persons are never destroyed, but are used to build person record structures. The DeadEnds model also supports the concept of multi-level conclusions, but I won't go there now. I have written about this in other places. The multi-level approach also has its analogues in the manual method, which is why I believe it is a valuable approach.
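
The two-level idea described above can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical class names, fields and sample data; it is not the DeadEnds model itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvidencePerson:
    """One mention of one person in one item of evidence; never modified."""
    record_id: str
    name: str
    source: str


@dataclass
class ConclusionPerson:
    """A researcher's pile: it refers to evidence persons, it does not absorb them."""
    record_id: str
    evidence_ids: list = field(default_factory=list)
    reasoning: str = ""


# Hypothetical evidence persons extracted from three sources.
evidence = {
    "E1": EvidencePerson("E1", "John Smith", "1900 city directory"),
    "E2": EvidencePerson("E2", "John Smith", "death record"),
    "E3": EvidencePerson("E3", "Jno. Smith", "land record"),
}

grandfather = ConclusionPerson("C1", ["E1", "E2", "E3"], "same name and town")


def split_off(conclusion, evidence_id, new_id):
    """Reverse an earlier decision by moving one evidence person to a new pile."""
    conclusion.evidence_ids.remove(evidence_id)
    return ConclusionPerson(new_id, [evidence_id])


# The land record turns out to belong to someone else: rearrange the piles.
other_man = split_off(grandfather, "E3", "C2")
```

Because the conclusion persons only hold references, rearranging the piles leaves every evidence person exactly as it was first recorded.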

Tom
hrworth 2011-01-10T23:18:38-08:00
Tom,

First, thank you for the details and I am trying to understand. But this dumb end user finds it very difficult to understand the difference between a Person Record and a Low Level Person Record.

Are you saying that within a Database, wherever that is, using whatever application, there will be TWO entries referring to ONE Person?

I must be really slow here. I enter Data into a database, and I am now using three, and could use a fourth, to try to understand this. I enter information about a Person, that person's relationships, and those events or facts about that person.

Each and every entry has a Citation, based on a Source, and with details in that citation about that person, relationship, fact, or event.

As I go along, I evaluate, look for conflicts, look for consistencies, to try to make sure that I have the right information about this "person of interest".

Maybe it would be helpful, if I knew what I am doing wrong so that I can understand your model or explanation.

Why would I want to destroy anything? Why wouldn't I want to keep, record, and evaluate the "good, the bad, and the ugly" that I find in my research.

I have an example. Early on, I found a location specific book about families in a specific area. The author had a number of books in this specific area. So, I recorded what I found, and was pretty certain, that the dates, places, and information was 'reasonable'. Much of the information that I took from that book, was not in my database. But, it was recorded and cited.

Finding more information, from other sources, that information was also recorded and cited. But, stepping back and looking at all of the information about a specific person of interest, things didn't add up. Something was wrong. Not everything was wrong, but the data was inconsistent.

Over time, some of that data, from that book, made me conclude not to rely on the data in that book. I made a note in the book, that the data was not reliable, BUT it did become a resource for bits and pieces to help additional places, dates, and people, to research.

And this is just me, but I don't want to toss that book, but don't rely on that information from that book.

That is the evaluation that I have done, for the people in that piece of my family file.

How does that fit your model? What am I doing wrong?

Again, I AM trying to understand what you have posted, here and other places in the Wiki.

Thank you,

Russ
ttwetmore 2011-01-10T23:59:04-08:00

Russ,

There may end up being MANY person records that refer to the same real individual.

I think it is best to be very careful with terminology to avoid confusion. I try to use the word person only to represent a record in a database, and I try to use the term individual to represent a real person who lives or once lived. ALL CONFUSION ABOUT THESE POINTS COMES FROM DIFFERENT USES OF THE WORD PERSON THAT MEAN COMPLETELY DIFFERENT THINGS.

A low level person record is a slip of paper in the manual process or a database record in a genealogical application that represents A PERSON EXACTLY AS MENTIONED IN EXACTLY ONE ITEM OF EVIDENCE THAT YOU HAVE FOUND IN EXACTLY ONE SOURCE. In many cases such a low level person record is NOTHING MORE THAN A NAME linked to some event information, which is why these records have been called NOMINAL RECORDS since analytical genealogy began decades ago. If you find your grandfather in two city directories you create TWO LOW LEVEL PERSON records for him. When you find him on an official death record you create another one. When you find a land record for him or a military record, or a census record, you create more.

In the case of your grandfather this might seem like the height of fancy and silliness, and you might choose not to do it at all (I don't do it by the way), but for more distant ancestors, those completely outside the realm of living memory, you have to do something like this.

So you may end up with MANY low level PERSON RECORDS representing a much smaller SET OF REAL INDIVIDUALS. For your grandfather it's obvious to you which low level person records refer to him, so obvious that you don't even think about the underlying genealogical process your brain is going through. But for a very distant ancestor you will end up with fewer of these records and it will be much more difficult for you to determine which of those records refer to the REAL INDIVIDUAL WHO IS YOUR ANCESTOR and which of those records simply refer to other individuals with similar names. At this point your job is to decide which of those low level person records refer to your ancestor and which don't. This is the "putting into piles" in the manual process or some analogous process in a computer application.

Low level person == evidence person == one mention of one person from one item of evidence

High level person == conclusion person == hypothesis person == represents assumed real individual == built up from one or more low level persons == conclusion reached by researcher

I assume the reason you find this confusing is because you are not thinking about the capability of recording all those low level person records, because modern applications don't give much support for it. This is the old paradigm problem. If your tools don't support an important concept, that keeps you from thinking about the concept. Because of the limitation of modern genealogical applications you have probably come to think of your database as only holding records about real individuals (in my careful sense of the word). That is the paradigm provided by your software. The paradigm shift required for software if it is to truly support the evidence and conclusion process is to also properly support the low level person records. Because your software doesn't support the concept you don't see the need for the concept. You keep all the low level person records "in your head" until you decide to add information to one of the conclusion persons in your database.

If applications are to make the paradigm shift into supporting the evidence and conclusion process, the segregation of the person concept into different levels must occur. Think of the Better GEDCOM model as providing a much needed impetus for driving this paradigm shift into genealogical applications.

Tom
mstransky 2011-01-11T05:49:28-08:00
In my layman's terms of a structure, I collect all sorts of documents, pictures and sources.

(sid) I give descriptions for each plain document, such as book, newspaper, pictures, court records. This is just all about the source DOC, not what is inside.

(pid) This is a generic placemarker which is like the conclusion person for the researcher, or HIGH level person. NOTE: my (fid) also separately performs pedigree on PIDs to keep them linked properly.

When the researcher finds a citation they want to create, this is done in the (eid) or evidence/event table. Here a person can input "John Smithe" found on page 5, age/birth if given, and father or stepfather to ?. From this INDEX CARD RECORD the researcher can link by ID which (pid) conclusion person it is, and also the (sid) the information came from.

Now I like that all SID source records stand separate on their own, PIDs also stand separate in their own area, and EIDs are a separate area.

If a researcher wishes to share a pedigree outline only, they only have to export the FID and PID index cards, so to speak.

OR if a researcher only wants the facts and not a half *ss outline thrown together by someone else, but just the sources and citations, they only import the SID and EID areas, giving them a searchable database with citations linked to such sources.

If the latter was done, with the EID having no person it points to, the researcher can go one by one verifying each linkage to their own PID table or create people for them. They can also disregard faulty collected data.

My (MID) (revamping) holds images of the sources. This way a researcher can visually verify the other person's so-called interpretation of the text or handwriting.

Even still, this MID can be shared as is, or not, with others.

LAST, if a person wants to share only a complete segment of each of the areas (PID, EID, SID, MID), they choose the person or surname and it will generate an output text similar to GEDCOM as a complete DB, but a smaller version of the whole. That smaller one lets you import what you want without having to handle a complete DB and all the extra collected data you don't want in the first place, which would merge into yours, cluttering it up.

So if you grasped my concept, each key function, collecting data sources (SID), interpreting the data (EID), scanning or linking docs (MID), and creating people and pedigrees (PID & FID), is a separate function. They are NOT EMBEDDED inside of each other, making flexible mini DBs which can be shared more robustly among others.
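
The separation described in this post can be sketched with three small tables in an in-memory SQLite database. The table names follow the post (sid, pid, eid), but the column names and sample rows are assumptions made for illustration; this is not the poster's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sid (id INTEGER PRIMARY KEY, description TEXT);  -- sources
CREATE TABLE pid (id INTEGER PRIMARY KEY, label TEXT);        -- conclusion persons
CREATE TABLE eid (                                            -- evidence index cards
    id INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES sid(id),
    person_id INTEGER REFERENCES pid(id),  -- may be NULL until the researcher links it
    name_as_written TEXT,
    page TEXT
);
""")

conn.execute("INSERT INTO sid VALUES (1, 'County history (book)')")
conn.execute("INSERT INTO pid VALUES (1, 'John Smith (concluded)')")
conn.execute("INSERT INTO eid VALUES (1, 1, 1, 'John Smithe', '5')")

# Sharing sources and citations only: read sid and eid, leaving pid out entirely.
shared = conn.execute(
    "SELECT eid.name_as_written, eid.page, sid.description "
    "FROM eid JOIN sid ON eid.source_id = sid.id"
).fetchall()
```

Because the evidence rows only point at sources and persons by ID, the sources and citations can be exported without carrying anyone's pedigree conclusions along with them.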
GeneJ 2011-01-13T16:35:02-08:00
Evidence and Conclusion process/model; new question
Evidence and Conclusion Model definition now includes the clause, "..the researcher may (in best practice, should) document the individual steps."

I am not aware of a best practice calling for "document the individual steps."

What is required is a well written, well organized proof.
AdrianB38 2011-01-14T04:06:20-08:00
Gene - re my (I think it was my) statement that "the researcher may (in best practice, should) document the individual steps."

I note with some interest that you don't know of a best practice calling for "document the individual steps" and you say that "what is required is a well written, well organized proof."

Interesting... This is where my background as a mathematician obviously subtly alters my world-view. I would suggest that no mathematician would ever be allowed to publish a proof that didn't show the individual steps. Not only that but it was always drummed into me as a kid at school "Show your working!" (And if you're trying to find a problem with someone's answer in a maths exam or a genealogy report, seeing the working is essential. I've been there in both maths and genealogy.)

So to me, while the phrase "document the individual steps" might well not appear in specific documents, the idea that a well organized proof might _not_ include the individual steps is utterly alien.

In case you're worried, I think we have to be sensible about what "document the individual steps" might actually mean. If I say that "there is only one person named XXX born about YYYY in ZZZZ in the census", I don't expect anyone to say "I logged on to Ancestry, entered the following search criteria, etc, etc..." This is an important step - I am losing faith with the number of supposedly well-organised proofs that I see on the internet (frequently supposed to be showing good practice in evidence management, ironically) where the logic that someone is actually the relative of the correct name, and not someone else with the same name, is just totally omitted.

Gene - we are in total agreement over the need for a well written, well organized proof. What I'm, I guess, suggesting, is that "well organized" should include showing the individual steps at some sensible level of detail.
DearMYRTLE 2011-03-01T16:47:26-08:00
Mathematicians and other scientists are careful to report every step of their process, and their work is subject to peer review.

For peer review in genealogy, a well-organized proof statement is required, according to the Genealogical Proof Standard (GPS), explained in brief at the Board for Certification of Genealogists website: http://www.bcgcertification.org/resources/standard.html

From here we see that the concept "Soundly reasoned, coherently written conclusion" includes these two contributions to credibility:

* Eliminates the possibility that the conclusion is based on bias, preconception, or inadequate appreciation of the evidence
* Explains how the evidence led to the conclusion

In several classes/courses I have taken from Thomas W. Jones, Ph. D., CG (SM), CGL (SM) FNGS, FUGA, he says that every step of the process does not need to appear in the written conclusion.

Other elements of the GPS include:
-- Reasonably exhaustive search
-- Complete and accurate citation of sources
-- Analysis and correlation of the collected information
-- Resolution of conflicting evidence.

GeneJ, since we're talking about BetterGEDCOM dealing with a "futuristic" genealogy database program that has elements not currently available, genealogists PRAY that it would include the ability to record the research process in detail, yet for publication, only the "Soundly reasoned, coherently written conclusion" would appear in a specific report.

Faithful end-user to end-user transfer of data would include all elements of the research process, including discounted or negative research, as we call it in genealogy.

Indeed, the first thing I'd ask GeneJ for, if I'd found her "Soundly reasoned, coherently written conclusion" on a website would be for a detailed write-up of her research process. Only by looking at that could I evaluate if she had done a reasonably exhaustive search of surviving record groups, etc.

So we are talking semantics here.

Both of you are in the same ball park, however.

:)
GeneJ 2011-03-02T14:17:27-08:00
The concept that we want to preserve the "steps" or the "process" has me buggered.

If I were doing work for hire, I'd certainly prepare a travelogue-style report--what I did and what I found. I don't see that value in a family genealogy database file.

Before I converted to the Mac and in the day when I could print from my software to a word processor, I entered research notes, sources and citations in real time. The most common forms in which I shared things were research memos (which were mostly written about sources) and family group sheets. As for the latter, I almost always included a source list with the FGS.

I expect that source list to tell the story of my "body of evidence."

Myrt, you and I discussed this separately a little bit. I use (or used) software that allows me to enter all of my sources. Even when sources become outdated by subsequent research, I don't have to lose the record of those. Likewise, I can create research notes and associates, allowing me to pull even more sources together for review.

Good discussion about this all last night in SL, btw. TY. --G





If you are not conducting research for hire, I don't know why you would need to separately report on your process.
AdrianB38 2011-03-03T06:41:24-08:00
Gene - it's a question partly of degree - what do you mean by "steps", and what do we mean by it? I certainly don't feel the urge to read a travelogue from you!

But... One of the major reasons that we cite our sources is to allow others to retrace our logic and find out where we've gone wrong. Now, skip from Family History to me, 30y ago, marking maths exam papers. (This is the top year at school, just before university). If the answer is correct - give them the full 5 marks. If the answer is wrong - what have they written?

If they only wrote the answer, they get zero marks.

If they wrote out all the steps in their argument, I have to go through their logic and find how many errors they have made. If they only made 1 error, they get 4 out of 5 marks. Two errors gets 3 out of 5. (This process is a pain, of course)

The point about the above analogy is that I had to assess the working and I had to see enough of that working to find the error. Similarly, we need to see enough of your working to be able to understand and follow your logic. No more than that - but no less. Now, if the raw data from the sources is sufficiently clear to tell the full story, then no more than the conclusion and the sources are needed. But all too often it isn't and the reader is left grappling with a host of nicely documented sources, wondering how on earth it fits together.

So - if I could read all your stuff and repeat it all myself, quite easily, then your stuff is soundly done. If there's an unanswered question "How did she know that?" - then that's the step that's missing.

Like I said, the classic error is the assembling of a series of records to discover someone's birth details where the logic that someone really is the relative of the correct name, and not someone else with the same name, is just totally omitted. That's the step that's missing there and what I'd want to see.
GeneJ 2011-03-03T08:28:56-08:00
Hiya Adrian ..

I have more to write, but just want to clarify.

When you write, "If they only wrote the answer, they get zero marks."

"Just the answers" has that conclusion only ring to it. Just want to confirm that we are both talking about a database that includes the sources and citations (reference notes and source lists).

We are talking here about the differences between the evidence-conclusion process and the evidence-conclusion model, right?

I see the evidence-conclusion model as favoring a form of compilation (which I think belongs in a research log), whereas the evidence-conclusion process is quite different and certainly not limited to compilation based materials.

P.S. And you wrote, "if I could read all your stuff and repeat it all myself, quite easily, then your stuff is soundly done" Except that, in RL genealogy, family historians often differ about what is direct or discernible from any given document.
AdrianB38 2011-03-03T14:59:20-08:00
Gene - I'm not talking particularly any database, process, model or anything.

I'm simply talking about my basic idea that a well argued bit of logic (a "Soundly reasoned, coherently written conclusion") needs to include the intermediate steps ("Explains how the evidence led to the conclusion"). (All this can be written in long-hand or a word-processor, without a genealogy app)

Otherwise, we're back to the mathematicians who didn't show any intermediate reasoning or working, so I couldn't give any credence to their answer when it seemed to be wrong.

And you are VERY right about "family historians often differ about what is direct or discernible from any given document". Isn't that why you should write out what you think it says, and why?

These are the only intermediate steps I want to see - and I'll bet you're actually doing them somewhere!
DearMYRTLE 2011-03-04T03:41:23-08:00
I recall taking an Association of Professional Genealogists Professional Management Course from Elizabeth Shown Mills a few years ago. We had several sessions, and worked through case study citations with EVIDENCE EXPLAINED in open book format.

Elizabeth said something to the effect that it's all well and good to leave that trail of source citations for OTHERS, but it is really for US. This way we can retrace our steps later, evaluate our own work, and see the holes in our work with fresh eyes. Through our detailed notes, we can pick up a research project that had been laid aside in favor of another line of research.

How about we create the possibility of keeping track of our steps, so people like me can use the option, but people who don't think it important won't have to worry?

It is a lot easier to provide for more, but use less options, than provide for less but end up needing more.
GeneJ 2011-03-04T09:25:10-08:00
To the extent these "steps," this "process," somehow involves each and every evolutionary advance in each and every conclusion or interpretation of evidence otherwise entered and sourced in a file, then I do not share your opinion.

User to user, specifically aside from client work, I fail to see why documenting that evolutionary process should be important.

Other than for proof comments needed to resolve conflicts in sources cited for the work you are sharing, I don't need to know that for three years in the 1980s you had incorrectly identified your 3rd great grandfather.

I just don't care if you found Joe's marriage record (with its incorrect birth date and location) BEFORE you found the timely entries for his birth and baptismal record. (I care that you found them, and that you explain the difference.)

I'm working right now to update a biography written in 2008; it's one I hope will be published. The head of family is a New Hampshire man who was twice married--14 children. The text has a few explanatory footnotes (these are six proof summaries) and inline references--the text is three pages in length. Because it's a submittal, full reference notes are provided as endnotes and there is a separate source list. Currently, excluding New Hampshire vital records, there are 126 endnotes (11 pages) and 75 unique sources (four pages).

For the most part, the endnotes and source list entries were extracted from the master source/citations in my TMG database. I am editing those notes for context (this presentation is about one family, as opposed to some four generation presentation).

One of the footnotes (as opposed to the endnotes) pertains to little Hannah Preston (see http://theycamebefore.blogspot.com/2010/12/conflict-resolved-another-one-bites.html )
That footnote is now based on my current evidence and analysis. It reads as below:

"Scott Andrew Bartley, ed., _Vermont Families in 1791_, vol. 2, pg. 165, reports death of Hannah 04 Mar. 1797, Rumney, citing VR; submitter Sprague attributes death to Hannah, dau. William Preston and Hannah Healey; however, William and Hannah’s dau. Hannah had married 1774 to Asahel Brainerd, resided Rumney at 1790, and lived to be about ae 88; the Hannah who died at Rumney in 1797 was the 10-mos. old dau. of William and Elizabeth, and twin of Joseph. The 1797 NHVR (Rumney) confirms the parents of Hannah then deceased to have been William and Elizabeth Preston."

More to write, perhaps later. --GJ
GeneJ 2011-03-04T09:30:44-08:00
err ... I meant, "Other than for proof or comments needed to resolve ..."
AdrianB38 2011-03-04T13:02:31-08:00
Gene - "I don't need to know that for three years in the 1980s you had incorrectly identified your 3rd great grandfather"

Yes, I suspect we are actually in violent agreement! It's the question of what "every step" means that's confusing us. I think.

As far as I'm concerned it means every step that leads to the correct conclusion. I'm not anxious to let everyone know I can't tell the difference between 1903 and 1905 in a baptism register, so I'd not write down that I misread Eleanor's baptism. What matters is that the baptism reads 1905 and that I identify it with my Eleanor because X, Y, Z... And I explain all this.

Your explanatory text is a perfect example of showing the steps to id your Hannah's death - in fact it goes beyond the line of duty by killing a false story.

So from where I'm sitting, as I suspected, you're actually doing pretty much what I'd hope for!
GeneJ 2011-03-04T13:04:11-08:00
:)
testuser42 2011-03-04T17:28:42-08:00
This is a good discussion!
I think Myrt has nailed it:
How about we create the possibility of keeping track of our steps, so people like me can use the option, but people who don't think it important won't have to worry?
It is a lot easier to provide for more, but use less options, than provide for less but end up needing more.


In order to be able to document ANY bit of the intermediary steps and the reasons for our decisions, we need the possibility to document EACH of them. Choosing whether to use this possibility is entirely up to the researcher.
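
As a rough illustration of that "optional" idea, here is a minimal Python sketch of a conclusion record whose reasoning steps may be filled in or left empty. The class and field names (`Conclusion`, `ReasoningStep`, `steps`) are assumptions made for this example only; nothing here is an agreed BetterGEDCOM structure. The sample values borrow Adrian's Eleanor baptism example from earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReasoningStep:
    """One optional note on how evidence led toward a conclusion (hypothetical)."""
    note: str
    source_citations: List[str] = field(default_factory=list)


@dataclass
class Conclusion:
    """A conclusion with its citations; recording steps is optional."""
    statement: str
    source_citations: List[str]
    steps: List[ReasoningStep] = field(default_factory=list)

    def has_recorded_steps(self) -> bool:
        """True if the researcher chose to record any intermediate steps."""
        return bool(self.steps)


# A researcher who records only the conclusion and its sources.
brief = Conclusion(
    statement="Eleanor was baptised in 1905",
    source_citations=["baptism register"],
)

# A researcher who also records why the record belongs to this person.
detailed = Conclusion(
    statement="Eleanor was baptised in 1905",
    source_citations=["baptism register"],
    steps=[ReasoningStep(note="Identified with my Eleanor because X, Y, Z")],
)
```

Both records are valid under this sketch; the format allows the steps without requiring them, which seems to be the point Myrt was making.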

I believe GeneJ is a much more thorough researcher than I've been up to now. But I started my family research on the computer, without much of a clue as to best practices. So now I have trouble retracing many of my early steps. Now, I've started adding notes wherever I feel it might be necessary for future understanding.
I think Gene has this way of working problems out on paper (or in the research log) and then only very solid PFACTs ever make it into her computer program. This is laudable, but I think many people won't have that discipline ;-)
We can help these people if we make it easier to record the things that Gene has in her research log in an ordered way on the computer.
ttwetmore 2011-03-01T09:35:18-08:00
Moderator for the Glossary
During the developer's meeting on February 28, I accepted the role of moderator for the Better GEDCOM glossary. Primarily this means to make sure work gets done on adding important or useful terms to the glossary. Since I am probably freer throwing around terms without adequate explanation than most other members of the Better GEDCOM community, I felt that taking on this role was justifiable punishment.

I've taken a quick look at the current glossary page and feel that excellent work has already been done; I would recommend that everyone take a look at that glossary because many concepts are already well described. I think the best thing for me will be to create a list of more terms that I believe require a good definition, propose each of them on a separate discussion thread attached to this page, and see how things go. Ultimately the new terms will be put into their proper place on the existing glossary page. For the interim I may create a page just for the terms that are currently being worked on so it will be easy to find and think about just those.

To avoid being too influenced by previous works I will write most of my initial proposals straight out of my own head without detailed consultation, though I'll probably use a dictionary or two. This may make some of my terms redundant with others, or may overlap others, but I hope these issues will be easily resolved later.

Tom W.
ttwetmore 2011-03-01T09:58:57-08:00
Repository -- Proposed Term for the Glossary
Here is my first contribution to the glossary. I chose something simple and obvious to get started, the "repository." In writing the definitions I addressed the issue that many terms we use in computer based applications are heavily "overloaded". Sometimes we use the term to refer to a real physical thing; sometimes to an abstract entity in a model; sometimes to a representation of the physical thing in a computer file or database. Maybe I am being pedantic about this, but I think the distinctions can be important and can lead to misunderstandings if not addressed.

So I'm defining three different uses of the term "repository" below, and I've put the context of each one into a parenthesized phrase. I will have to define physical, model, database and file and other terms eventually to be complete, but I hope this isn't too confusing or too much overkill.

Repository (Physical) -- An institution such as an archive or library, or any other site or location or web service, that collects, manages, archives, curates or indexes, and makes available source items for research.

Repository (Model) -- An abstract data model entity that represents physical repositories. In models that use object orientation the repository entity may have sub-types that represent various categories of repositories.

Repository (Database or File) -- A record in a computer file or a computer database that represents an actual repository from the real world.
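
To make the three senses a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `RepositoryKind` and `RepositoryRecord`, the fields, and the example archive are all invented for this sketch and are not part of any proposed model.

```python
from dataclasses import dataclass
from enum import Enum


class RepositoryKind(Enum):
    """Hypothetical sub-types a model might define for repositories."""
    ARCHIVE = "archive"
    LIBRARY = "library"
    WEB_SERVICE = "web service"


@dataclass(frozen=True)
class RepositoryRecord:
    """Repository (Database or File): one stored record that stands for
    one real-world repository. The class definition itself plays the
    part of the Repository (Model) entity."""
    record_id: str
    name: str
    kind: RepositoryKind


# Repository (Physical) is the institution itself, which exists outside
# the program. This record merely represents it.
record = RepositoryRecord(
    record_id="R1",
    name="Example County Archive",  # invented name, for illustration
    kind=RepositoryKind.ARCHIVE,
)
```

In this sketch the class is the model-level entity, each instance is a database or file record, and the physical repository never appears in the code at all, only its description does.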

Tom W
gthorud 2011-03-01T16:20:10-08:00
So, are we going to refer to Physical Repositories, Repository Model and Repository Records? And then define these "two word terms", possibly having a 4th for Repository referring to the 3 others? Or are there other ways to distinguish between them?

We should not use the parenthesis, e.g. (Physical), in text using the definitions.
ttwetmore 2011-03-01T16:46:20-08:00
I am open to suggestion. How about one definition that summarizes and gently covers all three without being so blatant? (See below for a proposed change.) For Repositories, making these distinctions is a bit over the top. For Individuals and Person, however, among the terms that cause all kinds of misunderstandings, there are six or seven or even more definitions that we all use or mean. Working on Repository first was my attempt to practice understanding how to distinguish the three main ways in which we use many terms: 1) as a physical thing in the real world; 2) as a Platonic entity in an ontological network of concepts (in other words, a model object); and 3) as a computer based representation in a database or file. Many arguments about these concepts stem from where we unthinkingly are on this spectrum, and not all of us take the time to understand what is really going on when we are arguing about these concepts.

How do you like this modification to the Repository definition, that more gently captures all three important interpretations of the term?

Repository -- An institution such as an archive or library, or any other site or location or web service, that collects, manages, archives, curates or indexes, and makes available Source items for Research. Repositories are included in most Data Models as an Entity that represents physical Repositories. In Models that use Object Orientation the Repository Entity may have sub-types to represent different kinds of Repositories. Repositories are represented in computer Databases and Files as Repository Records that conform to the definition of a Repository Model Entity.

Tom W.
hrworth 2011-03-02T03:19:03-08:00
Tom,

Might you consider this:

an archive, government office, library, or other facility where research materials are held

There might then need to be a little more detail on what an archive is, like online, national, state, local. But, as you suggest, some of the Data Models might contain more specific information.

Russ
GeneJ 2011-03-02T07:11:14-08:00
(1) I treat Internet sites (Google Books, RootsWeb, American Ancestors, etc.) as publishers, not repositories.

The old APG-L at RootsWeb has had more than one thread on this topic. To that list in 2004, Helen Leary wrote, "Does this help, If you have to open a door, it's a repository." (http://bit.ly/eN5CpK)

In the same thread, Mills writes, "...I would not consider Ancestry a repository ... In this analogy, the repository would be the Internet or the World Wide Web. But, just as a citation for books does not normally cite a repository, there's rarely a need to cite "Internet" or "World Wide Web." That is generally understood from the URL."

(2) Also, note separately the part of Mills' quote above that says, "a citation for books does not normally cite a repository." In other words, if we are defining "repository" as it should be applied to source lists and citations, we need to either look at it differently or more closely. In a "usage" context, "repositories" are generally not cited for things that were published and reasonably distributed.
hrworth 2011-03-02T07:54:28-08:00
GeneJ,

I understand how you handle those items and why. But, the definition I used was in Evidence Explained! on page 828.

Your treatment of Internet Sites is addressed in her book, as you know, but they fall under one of the definitions used for Repository.

Para 2.19, page 51 talks about Citing Repositories.

You are also correct that "Repository" may NOT be included in a Citation. But that isn't what is being addressed here.


Russ
GeneJ 2011-03-02T08:06:21-08:00
I don't think my comments were off topic.

If we are defining repository as it should apply to source lists and citations, we need to either look at it differently or more closely.
hrworth 2011-03-02T08:10:40-08:00
GeneJ -- Didn't say it was. But, I thought we are trying to define Repository and not its use in Source Lists and Citation. Just the Term.

Your comments certainly belong in the BetterGEDCOM wiki.

Russ
GeneJ 2011-03-02T09:14:04-08:00
What is the reason then for us to have the originally proposed first and third definitions?
Aren't we okay, for both the first and third, using "Repository" (capital R) and extracting the genealogical definition from Evidence Explained (we have it posted already on the wiki - http://bit.ly/eEB4G7 )? Participants could then further vet that definition.

As for the second proposed definition, "Repository (Model)," are you, Tom, getting at the field or fields used for Repository-Name, Repository-Short Name, Repository-Address, etc., to which you have added the concept of Repository-Type? -G
ttwetmore 2011-03-01T14:34:18-08:00
Model and Ontology -- Proposed Definitions
Here are two more proposed definitions ...

Ontology -- The organization of a domain of knowledge that is usually hierarchical into the Entities and Relationships required to represent information from that domain. In this restricted sense of the word an Ontology is nearly identical to a Model.

Model -- A Network or Structure of Entities and their Relationships, used to represent a restricted knowledge domain or Ontology. Models are used as specifications for the design of computer Databases whose Records generally represent Instances of the Model's Entities. Models are also used as specifications for the design of computer File Formats that hold Records conforming to the Structure of the Model.

Tom W.
gthorud 2011-03-01T16:23:51-08:00
My initial reaction is that Ontology is a bit too complicated. Where do we need to use this term?

Could we not say Data Model?
ttwetmore 2011-03-01T17:13:14-08:00
How about I get rid of the term Ontology and then lower case the word in the definition of model? And add Data Model as an alternative term?

Data Model; Model -- A Network or Structure of Entities and their Relationships, used to represent a restricted knowledge domain or ontology. Models provide a map for describing and understanding the domain. Models are used as specifications for the design of computer Databases whose Records represent Instances of the Model's Entities. Models are used as specifications for the design of computer File Formats that hold Records conforming to the Structure of the Model. Models are used as specifications for APIs that computer applications use to access services that are performed on representations of Model objects. Models in the genealogical domain include Entities that represent the key concepts of Genealogy, including Sources, Evidence, Persons, Events, Names, Dates, Places, and others.
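
A small Python sketch may help show what "Records conforming to the Structure of the Model" could look like in practice. Everything in it is an assumption for illustration: the `Model`, `EntityType` and `RelationshipType` classes, the field lists for Person and Event, and the `participates_in` relationship are invented here and are not a proposed BetterGEDCOM model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityType:
    """A Model Entity, e.g. Person or Event (field lists are invented)."""
    name: str
    fields: Tuple[str, ...]


@dataclass(frozen=True)
class RelationshipType:
    """A named Relationship between two Model Entities."""
    name: str
    source: str
    target: str


@dataclass
class Model:
    """A structure of Entities and Relationships used as a specification."""
    entities: Dict[str, EntityType]
    relationships: List[RelationshipType]

    def conforms(self, entity_name: str, record: Dict[str, str]) -> bool:
        """True if the record has exactly the fields the entity specifies."""
        entity = self.entities.get(entity_name)
        return entity is not None and set(record) == set(entity.fields)


model = Model(
    entities={
        "Person": EntityType("Person", ("name",)),
        "Event": EntityType("Event", ("type", "date", "place")),
    },
    relationships=[RelationshipType("participates_in", "Person", "Event")],
)

# One record that is an instance of the Event entity, and one that is not.
good = {"type": "baptism", "date": "1905", "place": "unknown"}
bad = {"type": "baptism"}
```

Here `model.conforms("Event", good)` holds and `model.conforms("Event", bad)` does not, which is the sense in which the model acts as a specification for records in a database or file.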

Tom W.
gthorud 2011-03-01T17:30:06-08:00
Not bad, maybe define "domain" in a parenthesis after the word, unless separately defined. If we do not need to define domain for some other purpose than to be used in this definition, there is no need for a separate definition.

Concepts of Genealogy could perhaps overlap with, or be related to, Genealogical entity. See the separate discussion topic on that one. But I don't really mind, since there are examples and "concepts of Genealogy" is not presented as a defined term.
gthorud 2011-03-01T17:05:16-08:00
PFACT (Property, Fact, Attribute, Characteristic, or Trait)
The current definition is

PFACT - Acronym of "Property, Fact, Attribute, Characteristic, or Trait," coined by T. Wetmore of the Better GEDCOM effort, in an attempt to help avoid confusion caused by the use of these many synonyms for the same concept. Conveniently pronounceable as "fact."


As I have stated before, this term should be deleted, and a more well known term should be used. We need to choose one of Property, Fact, Attribute, Characteristic, or Trait as the preferred one.

The term Characteristic, meaning a "PFACT" about "people, families, groups, places, ships etc.", has been used in many places in the Requirements Catalog. If another one is chosen as the preferred one, we can change it now, but it will be a LOT of work to change it later.

Since my native language is not English, I am on thin ice, but I feel that trait may not be the one - I had to look it up in order to understand what it is. And Attribute has a specific meaning in data modeling and is also used with a specific meaning in Gedcom.

If we can not choose one, there will be a lot of confusion in the future. A solution is urgently needed.

It would be helpful if some of those who have not participated in the discussion before could contribute.
ttwetmore 2011-03-01T19:52:39-08:00
Yes, I coined the term PFACT. It was during a time when there were many posts coming to the wiki, and the different posters were using the various different words, and the discussions made it clear to me that all were using these terms to mean the same thing, but it was not being recognized, and much confusion was thereby caused.

If we can all agree on one of the synonyms as the final choice, that's fine with me, but I'll be willing to bet as new people come on board, all the old confusions will reappear, with regular discussions recurring about what the term should be. Remember, this is a "democracy", and every one who joins one of these efforts will want to reopen EVERY decision that has been made before they joined.

It was my idea that the term PFACT could serve as our "internal code word" for the concept until we were able to make the final choice toward the end. If that end has come now, so be it, let's retire the term. Personally I don't think it matters when we make the decision. It only has to be made before we have something official to publish, and that might be a long way off.

I much prefer the term Attribute. I don't really care if it has a specific meaning in certain concepts. If it's the best term for the situation, it's the best term for the situation. I hope we don't use Fact or Trait. I would be happy with either Characteristic or Property. Of course Property also has a storied history in computer techy talk. I basically don't like Characteristic because 1) it is the longest term and 2) it was used by the GenTech model and I have an aversion to everything GenTech. But, as always, I'm comfortable (enough) with the majority view.

Tom W
gthorud 2011-03-02T07:01:38-08:00
If we could use search and replace on the whole wiki, incl links, to fix things later - and if I knew that every reader of this wiki would read definitions as the first thing, we could have PFACT - unfortunately that is not the case.
gthorud 2011-03-02T08:13:48-08:00
It might be useful to list some examples of what sort of info this term might cover, and in which context the term will be used.

And then try to write the definition before the term is chosen.
AdrianB38 2011-03-02T08:40:13-08:00
Personally I dislike Attribute because life can get rather difficult when talking about the attributes of an entity type. Especially if the entity type is called Attribute!

Trait is quite an unusual word.

Fact is something I'd like to reserve for the super-entity type that Event and Characteristic / Property / whatever are sub-types of. If we ever need to refer to it.

Property is slightly dodgy because of its potential confusion with "real estate" but I could live with it. Yes, in theory it's just as much an IT term as attribute but I associate it with Object Oriented terminology (classes and all that) and while we're likely to be talking Data Models, I doubt we'll be talking OO. And if we do, I'm sure the IT guys can easily do the standing on their heads necessary to separate the 2 uses of a word. After all, half of them will have written a Data Dictionary at college where all this needs to be done - entity types called "Entity Type", "Attribute" and "Relationship".

Characteristic is neutral - well, it is for me, not sure about Tom! <grin>

So I would prefer one of Property and Characteristic.

Examples - physical appearance, name, occupation, type of location, built-by, etc.
gthorud 2011-03-02T19:54:11-08:00
A link to a previous discussion about this http://bettergedcom.wikispaces.com/message/view/Individual+Data+Elements+Discussions/30806897

Are there other discussions where this is the main issue?
gthorud 2011-03-03T07:26:25-08:00
I am sure that everyone will not agree, but here is my attempt. I think we need to discuss the term in relation to other terms.
The definitions are very informal, and far from complete, but you should be able to understand what I mean.

Attribute – a “part” of a data model entity or a record. (I choose this since it has been an established term in these contexts for 40-50 years.) It is my understanding that it is unlikely that we will have an entity or record called attribute; we will have events. Let us handle the Attribute entity when the need occurs, if at all.

Event – A data model entity or a record type. The real world definition should not imply anything about the duration of an event. The distinction between Attributes and Events in Gedcom records is not carried forward into BG. An event entity or record can contain facts transferred in Gedcom Attributes.

Property - a fact or characteristic of a real world object that can be represented by an attribute, event or entity (record) in models or records. (Characteristics are a selected set of properties used to distinguish an object from other objects; they are a subset of the object’s properties.) It seems too techy to use the term Attribute in the real world.

Fact – In the real world a fact about anything, in models it can be represented by events or attributes (and maybe more). It can be true or false.
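
One possible reading of these informal definitions, sketched in Python. This is an assumption-laden illustration only: the class names, the `event_type` field and the sample values are invented, and the only outside fact relied on is that GEDCOM 5.5 carries occupation as an individual attribute. The sketch shows a real-world property being stored either as an attribute (a "part" of a record) or as an event record, with no separate Attribute record type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventRecord:
    """An Event record; nothing here implies a duration."""
    event_type: str
    date: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class PersonRecord:
    """A record whose attributes are simply its named parts."""
    attributes: Dict[str, str] = field(default_factory=dict)
    events: List[EventRecord] = field(default_factory=list)


# A property represented as an attribute of the record itself.
person = PersonRecord(attributes={"name": "Example Person"})

# A property that GEDCOM carries as an attribute (occupation),
# transferred here into an event record instead.
person.events.append(
    EventRecord(event_type="occupation", attributes={"value": "farmer"})
)
```

Whether this is the right split is exactly what the thread is debating; the sketch only shows that the Gedcom attribute/event distinction need not survive as two record types.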

My 12 øre worth (6 øre = 1 cent) ….
DearMYRTLE 2011-03-04T04:03:17-08:00
As an end user, I have the following understanding of these terms:

CHARACTERISTICS - distinguishing qualities, personality. (I tend to agree with Gier here.)

Merriam-Webster's Dictionary http://www.merriam-webster.com: Synonyms: affection, attribute, attribution, character, criterion, diagnostic, differentia, feature, fingerprint, hallmark, mark, marker, note, particularity, peculiarity, point, property, quality, specific, stamp, touch, trait


ATTRIBUTES - This doesn't do anything for me, as I also see this as too techy. But the dictionary states Synonyms: affection, characteristic, attribution, character, criterion, diagnostic, differentia, feature, fingerprint, hallmark, mark, marker, note, particularity, peculiarity, point, property, quality, specific, stamp, touch, trait

PROPERTIES - real estate comes to mind first, then my old chemistry class, so I agree with Adrian. This word probably isn't the best choice if a single word is chosen to describe the uniqueness items that describe/define a Person or Individual.

I wasn't upset with PFACT, but then I am accustomed to learning new acronyms by looking them up in the software's glossary.
testuser42 2011-03-04T17:33:15-08:00
One good thing about PFACT is, there is no chance of confusion. It means nothing but what we say it means.
gthorud 2011-03-01T17:21:24-08:00
Genealogical entity
The term "Genealogical entity" has been used in the Requirements Catalog to mean "people, families, groups, places, ships etc."

The word ship is AS I UNDERSTAND IT just an example of "vehicles" or other "things" (maybe something owned by a person) and maybe more, that someone (can't remember who) wants to identify in a BG file/model. It might be wise to keep the discussion about what the "ship" is out of the discussion about what a Genealogical entity is, for the moment, and focus on people, families, groups and places, although not excluding what "ship" might represent.

The term Genealogical Entity is as I understand it a term used to denote REAL WORLD objects and relations, as opposed to entities that might be in a data model or file.

Is the term acceptable?
gthorud 2011-03-02T07:10:51-08:00
Well, we will see. Let us get back to Genealogical entity.
AdrianB38 2011-03-02T08:22:10-08:00
I think I used "Genealogical Entity" as pure short hand for that list "people, families, groups, places, ships etc."

It was meant (when I said it) to refer to the entity types within the data model corresponding to people, families, groups, places, ships etc. Note that the list excludes source, citation, repository, for two reasons:
- the list was meant to refer to entity types matching types of things in the real world, i.e. not in the study of genealogy / family history.
- the entity types matching types of things in the study of genealogy / family history might well have somewhat different attributes and relationships, whereas the "real world" types are not too different at a high level of abstraction.

Re ships: Yes, think of a better name, please! It did seem to me though that ships are some of the most obvious examples as they carried so many families and one can get pictures of them on Ancestry, etc. So I could see them playing a big role in telling the rich story of how people crossed the Atlantic - link to maps, e.g.

I did consider "Artefact" as a type name but that sounds too much like what you dig out of the ground.
hrworth 2011-03-02T08:38:47-08:00
Adrian,

As I understand Artifact, it is something that may be held privately. The "stuff" that may have been handed down through a family. Like a Family Bible.

Evidence Explained! had a whole list of privately held Artifacts. There isn't a good definition in that book however.

Old Stuff that people or organizations might collect. I might gather information from an Artifact.

Russ
AdrianB38 2011-03-02T09:05:44-08:00
Russ - thanks: "Artefact" ("Artifact"?) is therefore definitely not the sort of name I want to use to cover Ships, Trains, Planes, Statues, Paintings, etc.
hrworth 2011-03-02T09:17:22-08:00
Adrian,

You might have a Statue or Painting as an Artifact.

Isn't Artefact a variant of Artifact?

I was looking at Evidence Explained! and that is how I understand the word to be spelled. But the "e" may be a variant of the "i".

Russ
GeneJ 2011-03-02T09:20:06-08:00
Humm... why not just Special Entity or Entity-Sp?
GeneJ 2011-03-02T09:25:35-08:00
Question: Are the following all examples of Genealogical Entities? If not, which are and which are not Genealogical Entities?

Ship
Cemetery
School
Church
1926 Firestone Family Reunion
Great Chicago Fire of 1871
Hurricane Katrina
Civil War
Diabetes
California Gold Rush
Catholic
ttwetmore 2011-03-02T10:11:15-08:00
I tend to have strong opinions about these things, and my opinions tend to be very practical and based on my own version of common sense, at least I think so.

For me the guideline for what we should allow to be a genealogical entity is whether it is an important enough "noun" concept that we could imagine having lots of records of that kind of thing in a database, and that we would want those records to be independent, separate records, that we would want to index them, be able to search for them, maybe generate reports about them, have windows that display lists of them, allow them to be referenced from many places, and so on.

And if a noun-type concept IS NOT a genealogical entity, what is it? Well the answer is very simple -- it would be the value of some attribute of some entity.

Take school for example. If a school were a genealogical entity we would have a separate record in our database for the school, and anyone who attended that school would have a reference to that record:

e.g.,

0 @i1@ INDI
1 NAME Fred /Snurfbucket/
1 EDU
2 SCHOOL @S1@
...

0 @S1@ SCHOOL
1 NAME Churchland High School
...

If a school were not a genealogical entity then the implementation would be like this, via the value of an attribute:

0 @i1@ INDI
1 NAME Fred /Snurfbucket/
1 EDU
2 SCHOOL
3 NAME Churchland High School
...

Another practical consideration is quite important. If the schools are going to be reused hundreds or thousands of times in a database, say because a researcher is working on all graduates from a number of institutions, then if the school concept is a genealogical entity there will be only one copy of its details, with many references to that copy. If the school is not an entity then the details about the school have to be duplicated all those many times. In some sense you can't know the answers to all these questions beforehand.

Note something quite interesting here. Most genealogical models now have place/location as a genealogical entity, whereas most early models did not. Why? To cut down on the duplication of information in databases.
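
The duplication argument above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, field names and the two helper functions are hypothetical and are not part of GEDCOM or of any Better GEDCOM proposal. It compares how many records must be touched to correct a school's name under each approach.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class School:
    """A school stored once as its own record (the 'entity' approach)."""
    record_id: str
    name: str


@dataclass
class Individual:
    """A person record; the school fields are hypothetical, for illustration only."""
    record_id: str
    name: str
    school_ref: Optional[str] = None   # reference to a School record id
    school_name: Optional[str] = None  # school details copied inline instead


def rename_school_entity(schools: Dict[str, School], school_id: str, new_name: str) -> int:
    """Correct the name when the school is an entity; returns records touched."""
    schools[school_id].name = new_name
    return 1


def rename_school_inline(people: List[Individual], old_name: str, new_name: str) -> int:
    """Correct the name when the school is an attribute value; returns records touched."""
    touched = 0
    for person in people:
        if person.school_name == old_name:
            person.school_name = new_name
            touched += 1
    return touched
```

With the entity approach one record changes regardless of how many individuals reference it; with the inline approach every individual that carries a copy has to be found and changed.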

The answer, IMHO, all boils down to whether the concept is "important enough" in the ontology/model of the knowledge domain (in our case genealogy) to deserve its own Entity type, with some of the practical considerations like the duplication one I've just outlined. I think there would be lots of disagreement about your specific list if we all tried to answer. But, I'll give it a try:

Ship -- I'd hope there would be a catchall Entity I could use for this
Cemetery -- I'd hope there would be a catchall Entity I could use for this
School -- I'd hope there would be a catchall Entity I could use for this
Church -- I'd hope there would be a catchall Entity I could use for this
1926 Firestone Family Reunion -- I would treat this as an Event Entity
Great Chicago Fire of 1871 -- I would treat this as an Event Entity
Hurricane Katrina -- I would treat this as an Event Entity
Civil War -- I would treat this as an Event Entity
Diabetes -- I would treat this as a value of some medical Attribute
California Gold Rush -- I would treat this as an Event Entity
Catholic -- I would treat this as a value of some religion Attribute

Tom W.
GeneJ 2011-03-02T10:44:52-08:00
Tom, Geir --- So, given the personalized nature of this entity type, if we just replace the word "ship" with "special" or "User Defined," have we not satisfied the original question?

I'm assuming that in practice the word "special" or "user defined" would be replaced by the word of the user's choosing.
gthorud 2011-03-02T12:34:15-08:00
A Genealogical entity as proposed is an entity in a Genealogical model that represents a real-world person, family, group, place or “ship/thing whatever”. Sources, Citations, Repositories are excluded. Since there will be Sources, Citations, Repositories in a Genealogical data model - there is likely to be confusion.

If no conclusion can be reached, for the purpose of the Requirements Catalog we could perhaps use the whole list, i.e. Entities representing persons..."ship" (whatever "ship" term we decide).

Geir
AdrianB38 2011-03-03T05:54:04-08:00
"Genealogical entity" should not go into the Glossary - I used it purely as short-hand to save writing the list out again and to make the resulting text easier to read, so it has relevance only to that document.

Re "ship" - I'd be happy with Miscellaneous-Entity - my Miscellaneous-Entities would include various ships, various types of steam locomotive, various aircraft types, various medal types, each with at least a name, notes, and a type... At least. But the "database" and application program would still refer to them as Miscellaneous-Entities.
gthorud 2011-03-05T11:22:20-08:00
The term Genealogical entity has been removed from the Req-Cat so there is for the moment no need for this term.
ttwetmore 2011-03-01T17:37:06-08:00
Geir,

I believe the opposite is true. The term Entity seems preferred by most to mean a component of a data model, synonymous with the term Class. This was talked about a little while ago on this wiki. I have always preferred the term Class for this concept, but it was pointed out to me at the time that Entity has the right modeling "ring" to it, and Class is too "computer science-y", too close to "implementation" for purist data model people. The use of the term Entity in modeling is time honored and goes way back to the original papers on ER (entity-relation) models and diagrams, from which sprang the marvelous world of relational databases. In other words model people claim that they have precedence on the word Entity and that they should be the only ones allowed to use it. This is my own interpretation of things of course, but you'll find general agreement I think.

I agree that Ship is probably not a front line type in our model. I would point out however, that we should include some generic entity type that can be used to represent all kinds of disparate things. Just like the generic EVENT that can have a TYPE, we need a generic THING that can have the TYPE ship.
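
As a rough illustration of the generic THING with a TYPE, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the class name `Thing`, its fields, and the helper `things_of_type` are hypothetical, not an agreed Better GEDCOM structure. The point it shows is that "ship" is data stored in a type field rather than a dedicated entity type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Thing:
    """Hypothetical catch-all entity: the type is a value, not a separate class."""
    record_id: str
    type: str                      # e.g. "ship", "locomotive", "medal"
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)  # free-form extras
    notes: List[str] = field(default_factory=list)


def things_of_type(things: List[Thing], wanted: str) -> List[Thing]:
    """Return all Thing records whose type matches, ignoring case."""
    return [t for t in things if t.type.lower() == wanted.lower()]
```

A new kind of thing then needs no change to the model, only a new value in the type field; the cost is that a program cannot know in advance which attributes a given type will carry.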

Tom W.
gthorud 2011-03-01T19:27:13-08:00
I think our requirements should be understandable by non-computer people, so class sounds a bit too techy for me.

Re the Ship-THING, or THING of type Ship, it might help if there were more concrete examples. Could there be a time when we might want to attach more data than a type, and probably a name, to this "object"? Do we want to be able to say that the ship is blue? Or does it have a registration number - a car - of subtype Trabant. In what data structure would we want to refer to it - only in events? Should it be level zero? Is it mentioned in a source? Can several individuals own it? Can it be located in places, at a certain time? Can it have a story (note)? Can there be a photo of it? Could it be a pet? Or a doll? --- Just trying to trigger some thoughts.

How would a program present information about it in various reports?

Again, I think we need more good examples in order to justify it, but in general there are lots of THINGS that we relate to during our lives.

Well, maybe the discussion should take place on the requirements catalog, since there is/should be a separate requirement, but I don't want to delete the above once I have written it.
ttwetmore 2011-03-01T19:39:52-08:00
Geir,

In my opinion a genealogical model must be able to cover every concept a genealogist or family historian might want it to cover. The ship that an ancestor sailed in when emigrating/immigrating from one location to another is one of those concepts that is very important to many genealogists. I picked this example more or less out of the blue during a discussion on the wiki a long time ago when trying to "stretch" for an idea on why I thought we should be open to ideas. I still feel this is very valid.

I find nothing unusual about allowing some general purpose objects with general purpose attributes into a model. I find nothing unusual about working toward a complete ontology of what genealogists and family historians want to have in their databases, and slowly working towards this more complete ontology. I don't think we can figure out these things from day one, but I think these are reasonable goals.

I can see the smoke coming out of Louis's ears right now as he starts imagining all the tags that my words imply will have to enter our lexicon.

Tom W
ttwetmore 2011-03-01T19:57:04-08:00
Geir asks "Re the Ship-THING, or THING of type Ship, it might help if there were more concrete examples. Could there be a time when we might want to attach more data than a type, and probably a name, to this "object"? Do we want to be able to say that the ship is blue? Or does it have a registration number - a car - of subtype Trabant. In what data structure would we want to refer to it - only in events? Should it be level zero? Is it mentioned in a source? Can several individuals own it? Can it be located in places, at a certain time? Can it have a story (note)? Can there be a photo of it? Could it be a pet? Or a doll? --- Just trying to trigger some thoughts."

yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes (actually there might be a no in there somewhere, but I hope my intent is clear).

Let "Structured Flexibility" be your guide.

Tom W.
gthorud 2011-03-01T17:52:09-08:00
Individual
Since Gedcom has a structure for Individuals, this should be the preferred term rather than people or persons.
hrworth 2011-03-04T04:52:18-08:00
Tom,

I must be doing everything wrong in my research. I am sorry, but I do NOT START with an "evidence person" nor a "conclusion person". I START with a Person, a name of a person.

I then collect source information that may or may not provide that the Name I have is a real person. I then evaluate the evidence I have to confirm or disprove that this person existed, or that the name I have for this person is correct, incorrect, or in fact may have different names for the same Person.

When that Name/Person becomes an Evidence or Conclusion Person, I don't know yet, because I still don't understand how that works. I don't understand, basically, because the various programs that I use neither lead me to draw those "conclusions" nor lead me to think in those terms.

Russ
ttwetmore 2011-03-04T12:36:07-08:00
Russ,

I have explained this in previous answers to your questions many times.

Since today's genealogical systems don't explicitly support both types of persons, OF COURSE TODAY YOU START WITH "JUST" A PERSON RECORD. That's all there is so you can't start with anything else. I hope you understand that it is a MAJOR GOAL of BETTER GEDCOM to expand the model of genealogical processes to include support for both EVIDENCE AND CONCLUSIONS and therefore evidence persons and conclusion persons.

BUT and HOWEVER, whether you know it or not, whether you understand it or not, when you THINK you are starting with JUST A PERSON, you are ALMOST ALWAYS starting with an EVIDENCE PERSON. I have defined EVIDENCE PERSON MANY TIMES, but it seems I must do it once again. An EVIDENCE PERSON is a Person record that contains THE INFORMATION ABOUT A SINGLE HUMAN BEING THAT COMES FROM A SINGLE (ONE, 1, UNO) ITEM OF EVIDENCE (to repeat, A SINGLE ITEM OF EVIDENCE). And isn't that almost ALWAYS WHERE YOU START TODAY? BUT, as soon as you ADD MORE INFORMATION TO THAT PERSON RECORD, IF THAT INFORMATION COMES FROM ANY OTHER SOURCE OF EVIDENCE, THAT PERSON RECORD IS INSTANTLY TRANSFORMED (RIGHT UNDER YOUR EYES) FROM AN EVIDENCE PERSON TO A CONCLUSION PERSON. No bell goes off, your program doesn't flash anything at you, there is no color change or symbol added. You still call the person JUST A PERSON, because that's all the support your genealogical system provides. But Russ, if you are ever to truly understand the genealogical research process, you must come to grips with what has just happened in front of your eyes. A CRITICAL TRANSFORMATION HAS HAPPENED WHERE SOMETHING THAT WAS ONCE JUST EVIDENCE HAS BEEN CONVERTED INTO SOMETHING THAT IS A CONCLUSION. AS SOON AS YOU ADD INFORMATION FROM TWO OR MORE SOURCES TO A PERSON RECORD YOU ARE MAKING THE VERY DISTINCT AND VERY OBVIOUS AND VERY IMPORTANT DECISION/CONCLUSION/HYPOTHESIS THAT ALL THAT INFORMATION REFERS TO THE SAME REAL HUMAN BEING. THAT IS THE DEFINITION OF A CONCLUSION PERSON. So you ARE WORKING with these TWO TYPES OF PERSON RECORDS ALL THE TIME. Some of the Persons in your database are EVIDENCE PERSONS and some of them are CONCLUSION PERSONS. It is what you do that decides what they are. AND THIS IS ONE OF THE PROBLEMS THAT BETTER GEDCOM HAS SET OUT TO SOLVE. TO SUPPORT A MODEL THAT MAKES THESE TWO TYPES OF PERSON RECORDS OBVIOUS, AND TO MAKE THE TRANSFORMATION FROM EVIDENCE TO CONCLUSION VERY VISIBLE.

Today's systems expect you to build up your person records from all your evidence, meaning that your EVIDENCE PERSONS disappear completely as soon as possible as you combine or add any new information. In a model that supports Evidence and Conclusion Persons YOU NEVER GET RID OF THE EVIDENCE PERSON -- THEY ARE PURE EVIDENCE AND THEY STAY FOREVER -- THEY ARE THE RAW MATERIAL OF YOUR GENEALOGY. The fact that today's systems force you to get rid of them is a travesty and the main reason for all today's problems with genealogical standards and software, and it is a (in my opinion THE) main goal of Better GEDCOM to do something about this.
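
The rule stated above (one item of evidence gives an evidence person; information from two or more sources gives a conclusion person) can be made visible with a small Python sketch. It is an assumption-laden illustration only: `Assertion`, `PersonRecord` and the `kind` method are hypothetical names, and treating a record with no information as "empty" is a choice made here, not something the discussion defines.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Assertion:
    """One piece of information about a person, tagged with where it came from."""
    source_id: str
    tag: str
    value: str


@dataclass
class PersonRecord:
    record_id: str
    assertions: List[Assertion] = field(default_factory=list)

    def sources(self) -> Set[str]:
        """The distinct sources that contributed information to this record."""
        return {a.source_id for a in self.assertions}

    def kind(self) -> str:
        """'evidence' while all information comes from one source, else 'conclusion'."""
        count = len(self.sources())
        if count == 0:
            return "empty"  # assumption: nothing recorded yet
        return "evidence" if count == 1 else "conclusion"
```

Because each assertion keeps its source, the moment of transformation is something a program could detect and show, instead of happening silently.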
hrworth 2011-03-04T13:05:10-08:00
Tom,

It's time for me to bow out of this project.

Good luck,

Russ
GeneJ 2011-03-04T13:17:24-08:00
Hi Tom ..

Only hoping to clarify.

What you are talking about above is a particular Evidence-Conclusion MODEL (as opposed to the evidence-conclusion process).

Separately, Tom, I don't think an understanding of the "genealogical research process" really has much to do with whether or not someone would appreciate or implement this particular Evidence-Conclusion Model.

At least as far as I have come to understand the Model, I think it will be attractive to those who favor a compilation style and those who do a lot of thin research.

(1) By the time I consider something "evidence," it is already material to a case at hand. It's not just information I'm adding to my database. I put information in research notes--some of those notes are database friendly, a lot are not. (I have the same formatting and citation linking problems with complex research notes that I have with proof arguments--there is only so far I can push formatting in today's databases. )

(2) You wrote, "Today's systems expect you to build up your person records from all your evidence, meaning that your EVIDENCE PERSONS disappear completely as soon as possible as you combine or add any new information."

That is right. My database reflects the changed kaleidoscope, or at least I hope it does. I don't have to lose my old sources, or even my old notes, at least not until I am ready to lose them. Some commentary about all this on another wiki page (http://bettergedcom.wikispaces.com/message/view/Glossary+Of+Terms/32669136).
GeneJ 2011-03-04T13:36:08-08:00
errr... and I think researchers with high process styles will appreciate the Model, too.
ttwetmore 2011-03-04T14:00:09-08:00
GeneJ,

I am sloppy about the model/process terminology. I type so fast that I'm thinking words ahead of the words that are coming out the ends of my fingers. The wrong words often get inserted unawares.

Let's see if we agree. The E & C Model "simply" provides the Entities and Relationships that are required for an application program to support the E & C Process through its user interface. If an application didn't support the process actively it could still let the user have access to the E & C records and let them build up their own conclusion structures. The Model has the things. The Process has the mechanisms to use the things.

Does that agree with your view of the difference between them? The Model provides the required structure; the process uses it. Do you see the "steps" of the process as somehow in the model, or are the steps part of the process?
testuser42 2011-03-04T16:55:20-08:00
I made a page to help myself and others understand how the work-flow of genealogy software looks today, and how it might change or stay the same if the software would support the Evidence-Conclusion Process (better).

Working with an Evidence-Conclusion-Model

Maybe if we work at it this could become something to explain this difficult concept with real-life implications.
I believe the whole "Evidence-Conclusion" idea is hard to grasp, there will be many people who will not see how it is different or why it should be implemented. It's kind of like those images of a vase or two faces -- once you get it, you get it. But it's nearly impossible to see one when you see the other.
GeneJ 2011-03-04T21:04:33-08:00
Hiya Tom,

Thanks for taking time to respond, and thank you for asking. We might differ on what the evidence-conclusion process is, I don't know.

  • There is information ABOUT a source, and information IN the source. The information IN the source (if you will, the "data") is not more important than the information ABOUT the source. To support the Evidence-Conclusion process, you have to get that balance right. I don't think the Model has that balance yet--the Model emphasizes data IN the source. (Is this concept a little US-centric, because of the array of jurisdictions, source types, etc.?)

  • The sourcing process (understanding and recording information ABOUT the source) is too misunderstood, too bulky, too frustrating and error prone, too user-to-user unfriendly, and it's traditionally been poorly supported by the mega-sites. Indeed, this part of the process is skipped or otherwise short changed by many users. If we were working to support the process, then before we worked with "data," I think we'd deal with the sourcing process.

I don't find the Evidence Explained source-citation system complicated--but making it work in today's genealogical software is painful!! To make it work in both software AND GEDCOM might alone be worth certification.

Getting sources (ala, master source and full reference note type entries) entered in good form to the database real time during the research process seems the NUMBER ONE stumbling block to the evidence-conclusion process. I look forward to the day when dear Myrt's class teaches us how to convert an Excel based research log to genealogical software so it can be shared via BetterGEDCOM!!

Separately then, there is a difference between information and "evidence." The latter is supposed to be relevant to the problem. It can be direct, indirect or negative. In the Model, is there a step before the data is entered in which the information is determined to qualify as evidence? Where is the step that identifies indirect evidence? Negative evidence?

Aside from qualifying as relevant, then where is the step when evidence is compared and contrasted with all the other known information about the person, the family, the town, the times, etc. in order to learn synergies and find possible conflicts?

Since we are talking about BetterGEDCOM, when sources and these Model steps get recorded and shared, won't other ppl be zapping them into their files? Does that mean somehow "your steps" become recorded as "their steps." Do your sources become recorded as their sources?

Perhaps more later. Food for thought.
ttwetmore 2011-03-04T22:19:49-08:00
GeneJ,

Most interesting. I call the information ABOUT the source the Source. And I call the genealogically relevant information IN the source the Evidence. I hope there is a direct translation there. Of course Sources also contain all kinds of information that are not genealogically significant, so that information doesn't become Evidence.

So when I find a new Source (book, census roll, birth certificate) I record a description of that as a Source Record. I then extract the genealogical information from that Source that I am interested in and call it Evidence. These things that are Evidence are things like Event records that extract information about an Event directly from the information in the Source. They include Person records that can also be extracted. There are a lot of interesting sub-issues about exactly what one can and can't extract and what form it should take (I find census records a great source of all kinds of examples of many issues of extracting Evidence). And I wonder whether, in genealogical research, there is any other kind of Evidence record that needs to be extracted from a Source.
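
The split between a Source record (information ABOUT the source) and Evidence records (information extracted FROM it) might look something like the following Python sketch. The names `SourceRecord`, `EvidenceRecord` and `evidence_from` are hypothetical and are not taken from the DeadEnds model or any other model mentioned here; the sketch only shows each evidence record pointing back to exactly one source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceRecord:
    """Information ABOUT a source: what it is, not what it says."""
    source_id: str
    description: str


@dataclass
class EvidenceRecord:
    """Genealogically relevant information found IN a single source."""
    evidence_id: str
    source_id: str            # points back to exactly one SourceRecord
    kind: str                 # "person" or "event" in this sketch
    details: Dict[str, str]


def evidence_from(source: SourceRecord, evidence: List[EvidenceRecord]) -> List[EvidenceRecord]:
    """Return every evidence record that was extracted from the given source."""
    return [e for e in evidence if e.source_id == source.source_id]
```

Information in the source that is not genealogically significant simply never gets an evidence record, which matches the distinction drawn above.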

I think the question of how research steps and "work in progress" gets stored in a genealogical database is a very good one and I have thought about it quite a bit. I have an answer that I think works well and I have put the hooks in the DeadEnds model to support it. I have been writing up a description of how that works and hope to get it done soon.
louiskessler 2011-03-04T22:55:31-08:00

If I might add at this point, I think of what Tom refers to as "Evidence" to be the same as what GEDCOM refers to as the "Source-Citation".

I agreed with Tom that Evidence needs to be a record unto its own during the big discussion we had on this at: http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138

There are probably a lot of good definitions that arose from that thread, and they shouldn't be lost.

Louis
gthorud 2011-03-05T05:32:26-08:00
I do not think this discussion should take place under the Subject Individual. A more appropriate topic should be created, possibly pointing back to this one.
ttwetmore 2011-03-05T05:49:51-08:00

Geir,

I agree, and have placed my next post on this thread (a very important post if I do say so myself) to the appropriate page about the Evidence and Conclusion Process.
gthorud 2011-03-01T17:58:02-08:00
It might seem unnecessary to define the term, but how do we then create a list of preferred terms in order to be a little consistent wrt terminology.
gthorud 2011-03-01T18:01:15-08:00
It could be defined simply as the preferred term for individual, person and people (although the latter is plural).
ttwetmore 2011-03-01T19:33:12-08:00
Geir,

I disagree with choosing Individual without some discussion. I'm working on definitions for these terms and would appreciate a chance to make my proposals before this decision is made.

In general I much prefer the term Person. A few points.

First, I don't see any reason to give precedence to GEDCOM terminology. There is nothing about GEDCOM that I feel deserves our automatic respect. Short, capitalized, non-words are, IMHO, horrible ways to express tags. The faster we can get away from a number of these GEDCOM mistakes the better. Of course, this doesn't make the word Individual wrong, but it does make INDI something I hope we can put in the trash can.

Second, most models use the term Person instead of Individual. GEDCOM is unusual in that way.

Third, when the Evidence and Conclusion Process is discussed, the term Individual has often been used for the "Conclusion" concept, and the term "Person" has been used for the "Evidence" concept. It would be good to get all these issues aired before making a decision. I don't necessarily agree that we need two different terms for the two concepts (I use the term Person for both of them in the DeadEnds model), but so many people have been uncomfortable about using the same term in both contexts in the past that this distinction between the two words has been established.

The terms Person and Individual are used in many different ways. Please wait for my proposal and then tear it apart. The term Persona is also used in the more limited "Evidence" concept. Also the term Nominal Record has been often used for the Evidence concept. I hope there is enough food for thought in here that you can see it is hard to simply decide on the term Individual before having a chance to think about all the different concepts that that term could apply to.

Tom W.
gthorud 2011-03-02T06:40:12-08:00
Since I entered this in a discussion, it was not my intention to prohibit any discussion.

I have read a huge number of standards and I have worked on a few, but I have never seen one that is not able to choose ONE term for a concept that is so important that Individual, Person or People is in this standard. If we are going to write a standard, we should perhaps try to do it as the others do. Using both Person and Individual would be scrapped in the first meeting of a proper standardization committee.

If we are going to define each term in two or three dimensions, one for the real world, one for data modeling and perhaps one for records, things get complex. See the current proposed definition of Individual, Person. The current Glossary of terms uses parentheses to specify the context. It seems that approach could also be used here. One context, perhaps the default one that perhaps needs no context, is “real world”, another one is data modeling and a third is evidence/conclusion. But, preferably there should be a qualified term (e.g. two words), or a different term, used in specific contexts, e.g. Conclusion person - that should have a separate definition. It may sound bureaucratic, but it is a way to create a non-ambiguous standard.

My personal preference in this case, maybe language dependent, is person – that is the every day word, Individual is a more advanced/academic term.

So, when there are several terms, where the choice does not cause any problem with “backwards compatibility” with specific terms in Gedcom, as I think is not the case here (no conflict), we can use a different term. This means that we should consider being backwards compatible where Gedcom has a specific term, but there is no rule saying we shall choose the Gedcom term. But unless we think otherwise, the default should be the Gedcom term in order to minimize confusion. When we redefine such a term that is important in Gedcom, our definition should state explicitly that there is a difference – preferably specifying the difference. If we call our work BetterGEDCOM, we can’t completely ignore Gedcom - it should be a continuation.
hrworth 2011-03-02T06:44:23-08:00
This End User thinks that PERSON is the term to use for the Better GEDCOM.

We shouldn't have two.

Russ
ttwetmore 2011-03-02T08:53:18-08:00
In my definitions I am trying to define the terms as they are being used, not necessarily as we wish they were. Once Better GEDCOM decides what terms it wants to "officially" sanction, then we should add that clarification to the definitions with any possible concomitant simplifications.

The fact is, the terms Individual and Person are currently NOT used as pure synonyms. When you read my definition I hope that some of the subtle differences were apparent.

But I agree that the term Person is much to be preferred over Individual and should be used exclusively if possible. However, if the Better GEDCOM model decides that there should be two DIFFERENT Entities, one for the evidence person and one for the conclusion person, then we may be stuck choosing two names. In the DeadEnds model I use the same Entity, named Person, for both types of person Entities, and all in between, but I have encountered resistance to my approach. The issue for Better GEDCOM is simple -- if BG encompasses the Evidence and Conclusion Process, it must have at least two types of Person entities in its model. What do you want to call them? I want to call them all Persons and make the distinctions on how they are used. Any way you do it, though, you still have to describe the subtleties of the different meanings.

Tom W.
hrworth 2011-03-02T09:12:23-08:00
Tom,

Doesn't the term "Person", in what you have said, have at least three attributes (may not be the correct term).

Person:

1 - Evidence
2 - Conclusion
3 - others

You get to the "attribute" of that person using the two processes you mentioned.

Russ
AdrianB38 2011-03-02T09:13:59-08:00
Tom - "at least two types of Person entities in its model"
I seem to remember going through this with you and Greg and my coming to the conclusion that there was no obvious difference between the two types of entity. In fact, if I built layer upon layer of persons (which wasn't necessarily what you were advocating), then the same person entity could be a conclusion person output from one line of reasoning and also an evidence person input to a later line of reasoning.

So it's a matter of _transient_ usage, and no difference (I _suspect_) between the attributes and relationships. On that basis, "If it waddles like a duck and quacks like a duck, then it is a duck". Or - there's only one entity type to me, but 2 NON-exclusive ways to use it.
GeneJ 2011-03-02T09:16:32-08:00
Tom wrote, "if BG encompasses the Evidence and Conclusion Process" ... don't you mean "if BG encompasses the Evidence and Conclusion MODEL."
ttwetmore 2011-03-03T15:25:28-08:00
Russ,

In what I have said, genealogical models that embrace the evidence and conclusion process/model must allow for person records that are limited to hold only the data extractable from a single item of evidence, and must allow for person records that are the gathering places for all information that we believe applies to a real human being. The first are evidence persons the second are conclusion persons. I don't think of these as attributes of a person but as different interpretations of what a person record holds. But one could have an attribute of the person record to indicate what kind of person record it is. And I have also said that I prefer not having just two extremes, that is evidence persons and conclusion persons, but to allow for a tree of person records, conclusion at the base, evidence at the leaves, and partial decisions in between.

I don't understand what you mean by "You get to the "attribute" of that person using the two processes you mentioned," because I don't know what you mean by the two processes.
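
The "tree of person records" idea, with the conclusion at the base, evidence at the leaves and partial decisions in between, can be sketched as a single node type whose role depends only on its position. This is a hedged illustration: `PersonNode`, `role` and `evidence_leaves` are hypothetical names, and the sketch is not the DeadEnds implementation. It also reflects Adrian's point that there is one entity type used in more than one way.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonNode:
    """One person record type; its role comes from where it sits in the tree."""
    record_id: str
    source_id: Optional[str] = None  # set only on leaf (evidence) records
    children: List["PersonNode"] = field(default_factory=list)

    def role(self, is_root: bool = False) -> str:
        """Classify this record by position rather than by a separate entity type."""
        if not self.children:
            return "evidence"
        return "conclusion" if is_root else "partial conclusion"

    def evidence_leaves(self) -> List["PersonNode"]:
        """Collect, left to right, the evidence records this record was built from."""
        if not self.children:
            return [self]
        leaves: List["PersonNode"] = []
        for child in self.children:
            leaves.extend(child.evidence_leaves())
        return leaves
```

Because the evidence records stay in the tree as leaves, a conclusion can always be traced back to the items of evidence it rests on, and a partial conclusion can later serve as input to a further line of reasoning.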
gthorud 2011-03-01T18:29:36-08:00
Place vs Location
The term Location has been proposed to be used instead of Place. I don't remember where this was proposed, it was one of the early discussions about place, but it may have had something to do with the fact that we also want to carry data about higher level "places" eg. Countries, States, Counties, Parishes etc. But I may be wrong about why Location was proposed.

Gedcom uses the term already even if what it carries info about in 5.5 is the names of their places.

We should probably NOT change Place terminology to Location, but - again - English is not my native language.
ttwetmore 2011-03-01T20:42:06-08:00
Person and Individual -- Proposed Terminology
Here is my proposed definition of the Person and Individual terms. I kept them in the same description because they are so synonymous you can't describe one without the other.

Okay, chop, chop, chop away:

Person; Individual -- Both terms are used to mean a real human being who exists or existed. One or both terms are also used in Models as the names of the Entity or Entities that represent human beings. In Models that contain only Conclusion level objects, only one Entity is needed so only one term is used. For example, Individual is used in the GEDCOM model, and Person is used in most other Conclusion only Models. In Models that contain Evidence and Conclusion and sometimes intermediate level information, two Entities are often used so both terms may be used to help clarify and stratify the model. The term Individual is then often used for the Entity that represents Conclusion level information, the Entity that represents all information about separate human beings that has been discovered from all the Evidence, giving the current, complete, summary view of a human being. The term Person is often used for the Entity that represents Evidence level information, the Entity that represents only the information about separate human beings that can be extracted from a single Source or single item of Evidence. One model uses the term Person for a single Entity that represents both Evidence and Conclusion level information, and any number of levels of information between them, where the intermediate level information represents partial conclusions. The term Persona is also used in some Models for the Entity that represents evidence level information about a human being; this is the term used in the GenTech Model. Early in the history of procedural Genealogy the term Nominal Record was also used for the evidence level Person Entity. Evidence and Conclusion information about human beings are represented in computer Databases and Files as Records. These Records are called Individual and/or Person Records and conform to the Individual and/or Person Entities defined in the Model.

Tom W.
louiskessler 2011-03-02T18:49:35-08:00
Surety
Tom,

I suggest that for "Surety", rather than using TMG's definition (which is based on GEDCOM's), you use GEDCOM's definition itself.

GEDCOM's definition is under the QUAY tag:

CERTAINTY_ASSESSMENT:= {Size=1:1}
[ 0 | 1 | 2 | 3 ]
The QUAY tag's value conveys the submitter's quantitative evaluation of the credibility of a piece of
information, based upon its supporting evidence. Some systems use this feature to rank multiple
conflicting opinions for display of most likely information first. It is not intended to eliminate the
receiver's need to evaluate the evidence for themselves.
0 = Unreliable evidence or estimated data
1 = Questionable reliability of evidence (interviews, census, oral genealogies, or potential for bias
for example, an autobiography)
2 = Secondary evidence, data officially recorded sometime after event
3 = Direct and primary evidence used, or by dominance of the evidence

Louis
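
For reference, the four QUAY values quoted above could be held in code roughly as follows. This is only a sketch: the member names and the helper functions `parse_quay` and `rank_opinions` are invented shorthand, and only the numeric values and their meanings come from the GEDCOM text.

```python
from enum import IntEnum
from typing import List, Tuple


class Quay(IntEnum):
    """The four QUAY values quoted above; the member names are invented shorthand."""
    UNRELIABLE = 0     # unreliable evidence or estimated data
    QUESTIONABLE = 1   # questionable reliability of evidence
    SECONDARY = 2      # secondary evidence, recorded some time after the event
    PRIMARY = 3        # direct and primary evidence, or dominance of the evidence


def parse_quay(value: str) -> Quay:
    # The value is a single character (Size=1:1) and must be one of 0-3.
    text = value.strip()
    if len(text) != 1 or text not in "0123":
        raise ValueError(f"not a valid QUAY value: {value!r}")
    return Quay(int(text))


def rank_opinions(opinions: List[Tuple[str, Quay]]) -> List[Tuple[str, Quay]]:
    # Show the most credible of several conflicting opinions first.
    return sorted(opinions, key=lambda pair: pair[1], reverse=True)
```

Ranking is the use the GEDCOM text itself mentions ("display of most likely information first"); the sorted list does not remove the lower-ranked opinions.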
gthorud 2011-03-02T19:09:15-08:00
Is the TMG value "- = the source does not support the information cited or this information has been disproved" or something similar defined by GPS? Or is that captured by a different concept? (not necessarily a minus sign)
gthorud 2011-03-02T19:24:13-08:00
We have a backwards compatibility requirement, so our values must at least cover the semantics of Gedcom, but could add values as long as you don't end up with one Gedcom value that could be represented by several BG values.
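
One way to read this constraint is as a check on a mapping table: every Gedcom value must be covered by exactly one BG value, while extra BG values with no Gedcom equivalent are allowed. The sketch below assumes that reading; the BG value names and the function `check_mapping` are hypothetical, since no BG value set has been defined.

```python
from collections import Counter
from typing import Dict, List, Optional

GEDCOM_QUAY = {"0", "1", "2", "3"}


def check_mapping(bg_to_gedcom: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the mapping is acceptable.

    bg_to_gedcom maps each (hypothetical) BG value to its Gedcom equivalent,
    or to None when the BG value is an addition with no Gedcom counterpart.
    """
    problems = []
    used = Counter(v for v in bg_to_gedcom.values() if v is not None)
    for gedcom_value in sorted(GEDCOM_QUAY):
        if used[gedcom_value] == 0:
            problems.append(f"GEDCOM {gedcom_value} is not covered")
        elif used[gedcom_value] > 1:
            problems.append(f"GEDCOM {gedcom_value} is ambiguous")
    return problems


# Invented BG value names, for illustration only.
good = {
    "unreliable": "0",
    "questionable": "1",
    "secondary": "2",
    "primary": "3",
    "disproved": None,  # an added value with no Gedcom equivalent
}
bad = dict(good, doubtful="1")  # two BG values now claim Gedcom value 1
```

Here `check_mapping(good)` reports no problems, while `check_mapping(bad)` flags Gedcom value 1 as ambiguous.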
GeneJ 2011-03-02T19:58:57-08:00
Posting link to a prior discussion on the wiki about QUAY specifically.

http://bettergedcom.wikispaces.com/message/view/Shortcomings+of+GEDCOM/32262084
AdrianB38 2011-03-03T06:18:14-08:00
I dislike the concept of using application-specific values to define something, as how do we handle the case when another application has a different set of values? The word has a meaning outside a specific application - surely we should be recording that meaning?

As for what that meaning is...

I'd be tempted by snipping this bit from the GEDCOM definition:
- "evaluation of the credibility of a piece of information, based upon its supporting evidence"
The corresponding snipped bit from TMG says:
- "the quality of a source in documenting a given fact recorded in the data set"

Neither quite hits the mark for me as neither is quite clear whether it's talking about the end result or one source's contribution to the end result. Which, based on GEDCOM, is what the concept is about. So how about:
- "evaluation of the credibility of a source of information, specifically when measured against a given hypothesis"
I have to include both source and hypothesis in there since a source can be primary for some possible-fact but secondary for others. It's the interaction of the 2 that's crucial.

The glossary is not the place to define requirements on values, etc.
hrworth 2011-03-03T07:30:13-08:00
Louis,

Evidence Explained! (page 829) says, in the Genealogical Context:

"a term adopted by developers of some relational database software to place a numerical value upon the level of confidence a researcher may have in a source"

So your table at the start of this discussion looks great.

Russ
GeneJ 2011-03-03T17:52:19-08:00
Since the passage describing QUAY is out of date, and since the numbering system used by the different programs now and in the future may vary (so that we'd be comparing apples and oranges), I propose we should use the definition Russ posted above from Evidence Explained, "a term adopted by developers of some relational database software to place a numerical value upon the level of confidence a researcher may have in a source."
AdrianB38 2011-03-04T07:02:44-08:00
By all means let's start with an EE definition, but we need to make sure it's fit for our purposes. The definitions in our Glossary need to start by meaning something in the real world of users. As it's written, it's a definition that applies only to "some relational database software". I'm presuming the whole reason we want it in the glossary is that this term has meaning in the real world - or at least, the real world of genealogy. So the reference to RDB software needs to be snipped to give us:
"a term to place a numerical value upon the level of confidence a researcher may have in a source."

Which is a bit clunky, so let me just twist it to...
"a term placing a numerical value upon the level of confidence a researcher may have in a source."

OK - next question - why does it have to be a numerical value? Is "0" a Surety and "Unreliable" not a Surety? As Gene points out, the numbering systems may vary so let's forget that word and just use:

"a term to place a value upon the level of confidence a researcher may have in a source."

And I can use fewer words to give us:
"an assessment of the level of confidence a researcher may have in a source."

Compare this to the GEDCOM definition:
- "evaluation of the credibility of a piece of information, based upon its supporting evidence"

Assessment = Evaluation, roughly
Level of confidence = credibility, roughly

But the EE derived definition says "source" whereas GEDCOM says "a piece of information, based upon its supporting evidence"

OK - that GEDCOM phrase, as I said above is a bit of a mouthful but think where it is in the GEDCOM spec'n - under the QUAY tag that links a (tentative) conclusion to a source - as part of what gets called, rightly or wrongly, the citation. Whereas the EE derived definition applies - apparently - only to a source on its own, and doesn't refer to any conclusions.

This difference is vital - it may NOT be what ESM intended and we might have taken her words out of context but let's take the example of a Monumental Inscription from a tombstone, which acts as a source.

Do we use Surety as "an assessment of the level of confidence a researcher may have in a source" and therefore apply a single Surety to the whole of the source - i.e. the whole of the stone's inscription? I suggest that we wouldn't do that at all - we'd all say how the date of death on the stone is probably right because the stone was erected not long after the date of death (let's assume it was, OK?). But we'd also say that the date of birth on the stone is a bit suspect because it was so long ago and the person best placed to know their date of birth isn't around any more. So we're surely going to apply a different Surety value to the stone in relation to the date of birth ("possibly right"?), compared to the value in relation to the date of death ("probably right"?)

Are we? DOES the surety value of the source change depending on the fact / event / attribute that we're using it to support?

If it does then I suggest we alter the EE derived definition to say:

"an assessment of the level of confidence a researcher may have in a source in relation to a given fact, event or attribute."
hrworth 2011-03-04T09:28:34-08:00
Adrian,

You said:

"evaluation of the credibility of a piece of information, based upon its supporting evidence"

Actually, I like that, stopping at "supporting evidence". Isn't Surety related to the Source and/or Source-Citation?

In your tombstone example, I would put a lower Surety Value, if only the Year were listed as a birth year.

So, if I record the Birth Year for the Person as information taken from the Tombstone, with the Tombstone as the Source of the Birth Year in my database, I would not put a very high value on it.

So, I like your shorter Definition.

Russ
gthorud 2011-03-04T14:19:38-08:00
Adrian,

Your proposal seems reasonable, except that "fact, event or attribute" does not cover all the places where SOURCE_CITATION can occur in Gedcom, e.g. Individual, Note, Media ... So event or attribute are just examples, for the moment.

I assume that work on Evidence-Conclusion will clarify this later.

((Off topic: The values, numeric or otherwise, will have to be discussed later, but I have a hard time understanding how BG will handle different sets of values used by different applications. It will be a source of incompatibility unless the various sets can be merged into a superset.))
testuser42 2011-03-04T17:08:59-08:00
Adrian suggested "an assessment of the level of confidence a researcher may have in a source in relation to a given fact, event or attribute."
and Geir added that "fact, event or attribute" does not cover all the places.

So, the "piece of information" from the GEDCOM definition would be all-encompassing?

Surety = "an assessment of the level of confidence a researcher may have in a piece of information." sounds nice to me.
AdrianB38 2011-03-05T09:40:34-08:00
So I'll alter the suggestion of
"an assessment of the level of confidence a researcher may have in a source in relation to a given fact, event or attribute"

to read:
"an assessment of the level of confidence a researcher may have in a source in relation to a given piece of information".
Yes, that's neater.

As far as I understand, I can't have just "an assessment of the level of confidence a researcher may have in a given piece of information" because it's not the _overall_ confidence in the "conclusion" we're looking for, but how _one_ particular source supports the "conclusion".

We could have a birth date supported by a "very probable" confidence-rated birth certificate and a "possible" confidence-rated death certificate. My understanding of where Surety comes in the GEDCOM suggests that Surety applies to the combination of information and source, not just to the source. Comments?
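
The "combination of information and source" reading can be sketched as a record that carries the Surety on the link between the two, not on either one alone. The class name `Citation`, the free-form surety labels, and the lookup helper are all assumptions for illustration; the first two rows follow the birth-date example above and the third is invented to show the same source rated differently for a different piece of information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical link between one piece of information and one source."""
    information: str
    source: str
    surety: str  # free-form label; no value set has been agreed


citations: List[Citation] = [
    Citation("birth date", "birth certificate", "very probable"),
    Citation("birth date", "death certificate", "possible"),
    # Invented row: the same death certificate rated for a different piece of information.
    Citation("death date", "death certificate", "very probable"),
]


def surety_for(information: str, source: str) -> Optional[str]:
    # The Surety is looked up by the pair, never by the source alone.
    for citation in citations:
        if citation.information == information and citation.source == source:
            return citation.surety
    return None
```

With this shape the death certificate has no single Surety of its own: it is "possible" for the birth date and "very probable" for the death date.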
testuser42 2011-03-05T12:10:30-08:00
I can think of three places a "Surety" could be used. Maybe there are more, maybe we don't need as many.

  • Confidence in a Source overall, e.g.: "I trust the information in this book very much." This would attach to the Source Record.
  • Confidence in my reading of a Source: "It was hard to decipher the handwriting here, I did my best but it may still be another number." This would attach to the Evidence Record.
  • Confidence in my conclusions: "I'm quite sure this belongs here." This would be in the Conclusion Record.
GeneJ 2011-03-02T18:55:00-08:00
Hiya Louis.

The GEDCOM definition at least looks like it predates the changes in the Genealogical Proof Standard.

For practices here in the US, perhaps we need to look at it further and in terms of the GPS?
AdrianB38 2011-04-21T11:55:40-07:00
Persona - proposed definition(s)
Persona (1) - Use in GENTECH Data Model

GENTECH definition is transcribed below.
<Definition starts>
PERSONA
Type: Dependent. Requires ASSERTIONs to support the data.

Definition: Contains the core identification for each individual in genealogical data, and allows information about similarly named or identically named people to be brought together, after suitable analysis, in the same aggregate individual. Because real human beings leave data tracks through time as if they were disparate shadow personas, this entity allows the genealogical researcher to tie together data from different personas that he or she believes belong to the same real person. The mechanism for this, discussed in the text, is to make different PERSONAs part of the same GROUP.

Primary Key: Persona-ID

Foreign Keys: None

Relationships: One PERSONA is based on one ASSERTION. However, note that an ASSERTION may link one PERSONA to a GROUP, and thus many separate PERSONAs can be brought together into a higher level constructed PERSONA.

One ASSERTION can describe zero or one PERSONAs.

From: GENTECH Genealogical Data Model, version 1.1, 29 May 2000, page 60
<Definition ends>

Commentary - Note there is NO Person entity in the GENTECH Data Model, and a higher level PERSONA may be constructed from several on a lower level - their data is combined to form the information about the higher level Persona.

It is unclear why the term Persona is used in the GENTECH Data Model as the entity appears to have all the obvious characteristics of a Person entity.

Persona (2) Use in newFamilySearch Data Model

A Persona appears to be intended to represent the data extracted from one source about one human being. A Person appears to be intended to represent the sum of the current conclusions about one human being. A Person takes its information from one or more Personas. newFamilySearch uses a two-level data model so Persons are only made up of Personas, which are derived only from sources.

NB 1: There appears to be nothing in nFS that mandates a source record exists in nFS for the Persona.

NB 2: As indicated above, this is a 2-level model only, so there is no opportunity to combine 2 Personas into a new entity and then combine that new entity with a 3rd one to create a Person. All "combinations" must be done at once.

NB 3: The nFS users never see the term "persona" on-screen - they only enter and combine people. Personas are therefore hidden from the user.

NB 4: I do not have access to nFS documentation about their data model. This text represents people's deductions about that model from the use of nFS.
See
http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37800364

http://ancestryinsider.blogspot.com/2010/06/evidence-architecture-of-new.html
AdrianB38 2011-04-21T12:04:34-07:00
Having now checked the nFS usage and the GENTECH usage, they use the terms in WHOLLY different ways. So it's not a good idea to use Persona as a term in BetterGEDCOM's work.
testuser42 2011-04-21T13:53:33-07:00
You're right, that term might be confusing. Darn. Nearly every term for anything is confusing someone or the other...
GeneJ 2011-04-21T14:24:37-07:00
I find persona a better term. I'm sure you'd rather I use that term than the alternative I have in mind. :)
testuser42 2011-04-21T15:15:21-07:00
...we could call it a "Level 1 Person" or L1P ;)
AdrianB38 2011-04-21T15:24:24-07:00
Gene - persona may be nicer in some sense - but which definition are you going to plump for? What's the term mean to you?

The GenTech where there aren't any persons at all?

The nFS term where it doesn't carry forward if you have more than one level?
GeneJ 2011-04-22T00:29:00-07:00
@Adrian,

I'm glad to see the pending definition for Persona. I vote "ours." )

Maybe we can wait to see how Mike describes things before we move it either way.

(I'm probably more in the GenTech camp.)
AdrianB38 2011-04-22T04:09:48-07:00
Gene - re "I vote "ours." "

Which means what? _We_ don't have a definition of it. We should only define it if it's of some use, and right now the confusion btw the GenTech view of the term and the LDS view is a good reason to avoid its use. If it does describe a useful concept, then we can create a 3rd definition but this carries its own dangers.
hrworth 2011-04-22T04:25:05-07:00
Adrian,

If I am reading this Wiki, see a term that I don't know or a term that I know but not in the BetterGEDCOM project, I want to go to the Definitions page. I want to know what it means in this context.

There are already a number of multiple choice definitions listed, so as we define this "persona" or any other term, I think it should be there and refined as we move to agree on the term and its use for this project.

Thank you,

Russ
AdrianB38 2011-04-22T05:12:50-07:00
That's true - we might not have use for the term ourselves, but it's here in text, so it needs to be defined.
AdrianB38 2011-05-02T02:40:15-07:00
In lieu of any further comments and a definitions moderator, I have made an "executive decision" and transferred the definitions of persona above to the main page.
NeilJohnParker 2011-11-15T19:39:41-08:00
Glossary of Terms: Attribute
In a {add Data}Model an attribute is a property of an entity and sometimes a relationship, having a name {delete or tag}
NeilJohnParker 2011-11-15T19:56:25-08:00
Glossary of Terms - Data Model
This definition is not a definition of a Data Model but rather of an Entity-Relationship Diagram, which is one type of data model, albeit the most popular and pervasive one.

Models in the genealogical area include the key concepts of entities such as {add repositories}, sources, evidence, persons, names, events, facts (PFACT), dates, places, addresses and notes, and the relationships between these entities.
NeilJohnParker 2011-11-15T20:13:53-08:00
Glossary of Terms - LFT
Surely it is sufficient to mention which languages Legacy supports, not the release date; that is overkill.
GeneJ 2011-11-16T16:09:55-08:00
I agree, Neil. It looks like the few vendor entries in the glossary were entered about the time the wiki opened.

Wouldn't it make sense to move the vendors names to a more prominent page to support developing a more universal list?

Tucking vendor identification into the glossary doesn't seem quite fitting. --GJ
NeilJohnParker 2011-11-16T16:32:14-08:00
I concur.
bamcphee 2012-02-28T13:53:13-08:00
Has enough time now passed since the last post for me to remove the version details from each genealogy program?
bamcphee 2012-02-28T13:56:04-08:00
May I add a new page to the Wiki called Vendors (or similar), where the vendors, their web address and programs can be recorded?
GeneJ 2012-02-28T13:57:22-08:00
Hi there,

Removing the version detail is fine with me.

Do you think that if new features are noted, we should put a version number or year in parenthesis?

Thanks. --GJ
bamcphee 2012-02-28T13:59:18-08:00
Does anyone consider a Wiki page listing genealogy programs useful? It could be linked from or incorporated into the Glossary page if considered necessary.
GeneJ 2012-02-28T14:19:41-08:00
I like the idea of the vendor pages, in part because we could cross reference research postings to such pages.

I just talked to Andy about the vendor pages. He's working on a template for this. Said to tell you that he would correspond with you directly on this, perhaps tomorrow. --GeneJ
NeilJohnParker 2011-11-15T20:15:49-08:00
Glossary of Terms - metadata
{Capitalize metadata} for consistency.
NeilJohnParker 2011-11-15T20:18:26-08:00
Glossary of terms - metadata
The following should be mentioned.

The most common metadata encountered in information technology is data type, i.e. the operations that can be performed on a data item and its permissible values.
GeneJ 2011-11-16T16:28:47-08:00
Perhaps Adrian might chime in on this ... unless y'all would like me to write some more about metadata. :)
GeneJ 2011-11-16T16:30:32-08:00
AdrianB38 2011-11-17T08:53:25-08:00
The metadata definition mentions railways, so I realise that was one of my definitions.
GeneJ 2011-11-17T08:57:43-08:00
Yes, and we like those railroads!

History tab > AdrianB38 Jan 9, 2011 3:03 pm
NeilJohnParker 2011-11-15T20:26:06-08:00
Glossary of Terms - Persona(2)
Personal Commentary
There appears to be nothing in nFS {replace with new Family Search} or define a new term for the glossary called nFS.
GeneJ 2011-11-16T16:25:04-08:00
I inserted a reference to "nFS" in the body of that term. See if that works.
NeilJohnParker 2011-11-15T20:36:09-08:00
Glossary of Terms - Place
I believe that BG needs a clear or at least a clearer definition of the difference between place and address, especially when it comes to placing data into fields labeled as such.

When we have a person being born in a house with the name of, say, Buckingham Palace, I believe that this information should be part of address, not place, and a Genealogical Data Standard needs to state this. Place should not be stated as Buckingham Palace, London, London, England. Otherwise you will have people logically including civic street addresses as part of place.
AdrianB38 2011-11-16T08:16:10-08:00
Conversely, I have been campaigning for the merging of the concept of place and address, partly because of the difficulty in separating the 2 concepts (especially when you're trying to fit reality into a hierarchy with a fixed number of nodes, so you fiddle the extra node into the address!) and partly because, to a logical mind, both represent areas on (or close to) the face of planet earth so what's the difference?

IF BG continues to separate the 2, then a clearer definition of the difference would indeed be welcome but beware its possible impact on place-name hierarchies with a fixed structure!
NeilJohnParker 2011-11-16T11:46:05-08:00
I believe that "professional" or shall I say quality Genealogy calls for recording places as they were known when the event occurred and that this is "generally" understood (however I suspect its implementation is much less than 100%). Nevertheless, I propose that Place generally consists of three parts (in increasing order of specificity or accuracy) and I will use the traditional names for these, i.e. place, address (within Place), and Location (i.e. Geographic Coordinates of latitude and longitude).
Furthermore, when entering places, the computer software should assume that the user has entered the data in free format and, if the user has one or more place hierarchies installed, it should validate the entered data (comma-separated values) against each of these place hierarchy tables and then indicate all hits to the user. The user would then have the option of selecting zero or one of these hits as the correct value. The user would also have the ability to record which place hierarchy was used, or whether the data is in fixed format, and whether the place represents the names used at the time the event occurred or today's place names, and of course the surety level. Address would include building names, as many systems currently allow.
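
A rough sketch of the validation step proposed above, assuming each installed place hierarchy is available as a simple table of known place paths. The table contents, the hierarchy names "modern" and "historic", and the functions `parse_place` and `find_hits` are all invented for illustration; a real hierarchy would be far larger and would probably need fuzzy matching.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hierarchy tables: each holds known place paths, most specific part first.
HIERARCHIES: Dict[str, Set[Tuple[str, ...]]] = {
    "modern": {("London", "England"), ("Westminster", "London", "England")},
    "historic": {("Westminster", "Middlesex", "England")},
}


def parse_place(text: str) -> Tuple[str, ...]:
    # Treat the free-format entry as comma-separated values and trim each part.
    return tuple(part.strip() for part in text.split(",") if part.strip())


def find_hits(text: str) -> List[str]:
    # Validate the entry against every installed hierarchy and report each match.
    parts = parse_place(text)
    return [name for name, table in HIERARCHIES.items() if parts in table]
```

The user would then pick zero or one of the returned hierarchy names; an empty list simply means the entry stays as free-format text.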
gthorud 2011-11-17T15:39:28-08:00
The problem is that we need one term that applies to what is called address and place in the last posting, i.e. it can be either an address or a place. So Place, used by Gedcom and most programs (as it seems to me), is probably a long standing compromise. I note that at least one program uses Location, which has been proposed earlier on the wiki.

The address that Adrian writes about is, according to Gedcom, a postal address which may be that of an organization.
NeilJohnParker 2011-11-15T20:49:41-08:00
Glossary of Terms - Place Hierarchy
{add} Child Places are (usually) wholly contained within their parent place area and do not overlap with their siblings' place areas. ... often defined by different authorities in public administration or private organizations. Authorities may include federal, state or municipal governments, religious organizations, military organizations or corporations.
NeilJohnParker 2011-11-15T21:02:01-08:00
Glossay of Terms - Relatioship
A relatioship is a connection between two or more entities (person, object or concept) and/or relatioship. For example a marriage event is a relationship between two persons but zero or more note entities or source entities may be connected to it.
GeneJ 2011-11-16T16:19:12-08:00
...Only adding the spelling relationship.
GeneJ 2011-11-16T16:21:12-08:00
Well, that didn't work either, Neil. We may want this discussion deleted and reset so the term will return correctly from "search posts."
NeilJohnParker 2011-11-15T21:04:17-08:00
Glossary of terms - Repository
The sentence that deals with Repository sub-type seems to be unnecessary in a definition but if it is useful, then please include RDBMS as capable of handling sub-types.
GeneJ 2011-11-16T16:15:24-08:00
We might want to have the discussion about this topic continue in the related thread:

http://bettergedcom.wikispaces.com/message/view/Glossary+Of+Terms/35068382
NeilJohnParker 2011-11-15T21:07:44-08:00
Glossary of Terms - Reunion
General comment: I do not believe that it is wise or useful to include current version numbers for ANY software product (unless it's highly germane to the text), as it becomes dated very quickly, it's superfluous, and it only adds to the volume of what is already a lengthy document.