by Lance J. Jacob
Welcome to GCU, Genealogical Computing University. I'm Lance Jacob, your instructor for
this course. If you haven't signed up for GEDCOM 101, you are in the wrong class and are
welcome to leave at this time.
This class is one of a three-part series to be taught over the next three quarters. We
at this non-accredited, fictitious institute hope that you will go on to finish the
complete series of classes. Let's begin.
GEDCOM
GEDCOM stands for GEnealogical Data COMmunications. It is a way of formatting
genealogical information so that different genealogy programs, which normally could not
understand each other's data, can share it. Some of you may be familiar with the comparison
to foreign languages that I have used in the past. If so, bear with me.
Suppose three individuals had just met and could not understand each other. The first
individual could only understand English, the second French, and the third German. In
order for our three friends to understand each other, they have three options.
The first option is for two of the individuals to learn the language of the third. For
example, the Englishman could learn German, and the Frenchman could also learn German.
This may seem like an easy solution--until the two individuals discover that there are no
equivalent words in the third language to translate some of their specific words. I am not
aware of any language that contains an equivalent word or phrase for every word and phrase
that all other languages have to offer.
The second option is for each individual to learn the languages of both of the other
two individuals. In other words, each of the three would learn two new languages. This
solves more of the communication problem by allowing them all to speak in an agreed-upon
language at any one time. But it can be cumbersome as well.
The third option is for all three to learn and communicate using a new secondary
language developed specifically to allow translation of any word or phrase to and from any
of the three languages. In this situation, the three could meet an additional friend who
only speaks Italian. The Italian would only need to learn the new language to communicate
with the other three as well. If he uses a common word or phrase in Italian for which
there is no equivalent in the new language, they could update the new language by making
up a new equivalent word or phrase. An example of this took place in the gold mines of
South Africa. Over the past twenty to thirty years, a language called Fanagalo (pronounced
fun-a-ga-lo) was developed at the mines to allow all of the workers from many language
backgrounds to communicate. The workers represented English, Afrikaans, various other
European languages, and many Black tribal languages (Zulu, Xhosa, etc.). As each new
worker came to the mines, he or she only had to learn Fanagalo to speak with everyone
else. This was much easier than learning several different languages.
GEDCOM is like the third communication option. There are many genealogical software
packages that have their own individual ways of storing information. One may store dates
and places for only births, marriages, and deaths. Another program may store the same
information, but may allow place names to include twenty more characters than the first
program. Still another program may allow the storage of christening, burial, occupation,
and emigration information, etc. It always seems that you and your newly discovered second
cousin are using different programs. In order for the two of you to share information by
computer, you need a way to convert the information created by one program into
information that the other program can understand.
Instead of writing a separate conversion utility for every other program, each genealogy
program developer needs only to write a utility program to convert its data to and from
one standard data format known as GEDCOM. In order for ten different programs to exchange
information, each needs only two utility programs: one to create GEDCOM files from its
normal information files, and one to import information from a GEDCOM file into it.
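The arithmetic behind this saving can be sketched in a few lines of Python. The sketch assumes that every one-directional converter counts as a separate program (a converter from program A to program B is different from one going from B to A); the function names are hypothetical.

```python
def direct_converters(programs: int) -> int:
    """Converters needed if every program translates directly to every other one."""
    # Each program needs its own converter to each of the other (programs - 1) programs.
    return programs * (programs - 1)


def gedcom_converters(programs: int) -> int:
    """Converters needed if every program goes through GEDCOM instead."""
    # Each program needs one exporter (to GEDCOM) and one importer (from GEDCOM).
    return 2 * programs


for n in (3, 10):
    print(f"{n} programs: {direct_converters(n)} direct converters, "
          f"{gedcom_converters(n)} GEDCOM utilities")
```

With three programs the two approaches come out even (six each), but with ten programs the GEDCOM approach needs 20 utilities instead of 90, and the gap keeps widening as more programs join in.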
A program can import from a GEDCOM file only the kinds of information it is programmed to use.
Most kinds of information used by all programs can be stored in a GEDCOM file. If your
program has no place to put a specific kind of information, your program will not be able
to use it, even if it's in the GEDCOM file. For example, if your uncle sends you a GEDCOM
file that includes occupations for the different individuals, and your program doesn't
store occupations, your program can't use the occupation information. In this case, you
would need to do without it or switch to another program that stores occupations.
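The occupation example can be sketched in Python as an import routine that keeps only the tags a program knows how to store. This is a simplified illustration, not how any particular genealogy program works: the set of supported tags is hypothetical, OCCU is GEDCOM's occupation tag, and an unsupported line is dropped together with every line nested beneath it (so a DATE under an occupation is lost along with the occupation).

```python
def import_supported(lines, supported):
    """Split GEDCOM lines into those a program can store and those it must drop.

    Level 0 record lines are always kept. Any other line whose tag is not in
    `supported` is dropped, along with every line nested beneath it.
    """
    kept, dropped = [], []
    skip_level = None  # level of the unsupported line whose children we are skipping
    for line in lines:
        parts = line.split()
        level = int(parts[0])
        if skip_level is not None and level > skip_level:
            dropped.append(line)  # nested under an unsupported line
            continue
        skip_level = None
        # On a level 0 line, a record number such as @I1@ may come before the tag.
        tag = parts[2] if parts[1].startswith("@") else parts[1]
        if level == 0 or tag in supported:
            kept.append(line)
        else:
            dropped.append(line)
            skip_level = level
    return kept, dropped


# Hypothetical program that stores names, births, and dates, but not occupations.
uncle_file = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 OCCU Blacksmith",
    "2 DATE 1880",
    "1 BIRT",
    "2 DATE 1 Jan 1850",
    "0 TRLR",
]
kept, dropped = import_supported(uncle_file, {"NAME", "BIRT", "DATE"})
print(dropped)  # ['1 OCCU Blacksmith', '2 DATE 1880']
```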
History
The GEDCOM standard has been developed over time by the Family History Department of
The Church of Jesus Christ of Latter-day Saints with input from developers of other
genealogy programs. An informal meeting of several developers was held during the 1985
annual National Genealogical Society (NGS) Conference of the States in Salt Lake City.
During this meeting the concept of GEDCOM was discussed. It was received enthusiastically
by those in attendance. Since then, similar meetings have been held at subsequent annual
NGS conferences. In addition to these annual in-person meetings, discussions have taken
place on public information services such as CompuServe and, more recently, the
Internet. Such communication has allowed developers to help each other understand GEDCOM.
GEDCOM has not been accepted by all genealogy program developers. The Family History
Department maintains a list of registered programs that are GEDCOM-compatible. This is the
best source for determining which programs are compatible with GEDCOM. A copy of the list
can be requested by contacting the FamilySearch Support Unit at (801) 240-2584. In addition
to this list, the annual Software Directory published in GC covers both programs that
have and programs that have not been adapted to use GEDCOM.
The first genealogy program to incorporate an experimental version of a GEDCOM utility
program was version 2.0 of Personal Ancestral File, which was released in April 1986.
Since that time, the standard has been greatly expanded to fit the needs of more genealogy
programs. Today's version (5.3) is compatible with the original version.
Basics
GEDCOM data is stored in what is referred to as an "ANSEL" text file. An
ANSEL text file is like an ASCII text file, but it can include international diacritical
marks and special characters. In other words, it can include non-English words containing
letters such as ä. A regular ASCII text file cannot include such characters. A GEDCOM file
can be viewed with a regular text editor; however, I don't recommend doing this until you
know what you are doing.
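To show what an ANSEL file does differently from ASCII, here is a deliberately partial Python decoder. It assumes the ANSEL convention that a diacritical mark is stored as a separate byte placed before the letter it modifies, and it knows only six marks from ANSEL's combining-mark range; the table is far from complete, and any other byte of 128 or above is rejected rather than guessed. The function name is hypothetical.

```python
import unicodedata

# Partial, illustrative table: ANSEL combining-mark bytes -> Unicode combining marks.
ANSEL_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # umlaut (diaeresis), as in ä
    0xF0: "\u0327",  # cedilla
}


def decode_ansel_subset(data: bytes) -> str:
    out = []
    pending = []  # marks read so far that are waiting for their base letter
    for byte in data:
        if byte in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[byte])
        elif byte < 0x80:
            # Plain ASCII letter; Unicode puts combining marks AFTER the base letter.
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is not handled by this sketch")
    out.extend(pending)
    # NFC folds "u" + combining diaeresis into the single character "ü".
    return unicodedata.normalize("NFC", "".join(out))


print(decode_ansel_subset(b"M\xe8uller"))  # Müller
```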
If you were to look at the contents of a GEDCOM file, you would notice that it is
nothing but a lot of lines of information listed in what I would call a
"hierarchical order."
The first record is the header record, which usually provides information on the name
of the file, the source of the file, and the destination of the file. The second record in
the illustration is an optional submitter record that includes information regarding who
created the GEDCOM file. Following the submitter record are three individual records
containing information about three separate individuals. Next is a family record that
gives details pertaining to which individuals are in the family, and the husband and
wife's marriage date and place. An essential component is the very last line, which is
known as the "trailer" record. It indicates the end of the file.
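The Python sketch below uses a small hypothetical GEDCOM file laid out like the one just described (a header, a submitter, three individuals, one family, and a trailer); every name, date, place, and program name in it is made up. The sketch lists the type of each level 0 record and checks that the file starts with a header and ends with a trailer, which is a quick way to spot a file that was cut off during copying. The helper names are hypothetical.

```python
SAMPLE = """\
0 HEAD
1 SOUR EXAMPLEPROG
1 DEST OTHERPROG
1 FILE SAMPLE.GED
0 @S1@ SUBM
1 NAME Example Submitter
0 @I1@ INDI
1 NAME John /Doe/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Roe/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Ann /Doe/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
1 MARR
2 DATE 4 Jun 1950
2 PLAC Ogden, Weber, Utah
0 TRLR
"""


def record_types(gedcom_text):
    """Return the type of each level 0 record, in file order."""
    types = []
    for line in gedcom_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "0":
            # "0 @I1@ INDI" -> "INDI" (skip the record number); "0 HEAD" -> "HEAD"
            has_number = len(parts) > 2 and parts[1].startswith("@")
            types.append(parts[2] if has_number else parts[1])
    return types


def has_header_and_trailer(gedcom_text) -> bool:
    """A complete file starts with a HEAD record and ends with a TRLR record."""
    types = record_types(gedcom_text)
    return bool(types) and types[0] == "HEAD" and types[-1] == "TRLR"


print(record_types(SAMPLE))
# ['HEAD', 'SUBM', 'INDI', 'INDI', 'INDI', 'FAM', 'TRLR']
```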
Each line in a GEDCOM file begins with a level number. Lines that begin with 0 are the
start of new records. Immediately following the level number on each line is a word known
as a GEDCOM tag. Level 0 lines are the only exception to this: if a record number is
associated with the record, it appears between the level number and the tag on that line.
Each tag indicates what kind of data follows.
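A single line can be split into its parts with a regular expression. The Python sketch below is a simplified reading of the layout just described (level number, optional record number, tag, optional value); it assumes single spaces between the parts and, unlike the description, does not insist that a record number appear only on level 0 lines. The names `GedcomLine` and `parse_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# level, optional @record number@, tag, optional value (single spaces assumed)
LINE_PATTERN = re.compile(r"^(\d+) (?:(@[^@ ]+@) )?(\S+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    record_number: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(line: str) -> GedcomLine:
    match = LINE_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, record_number, tag, value = match.groups()
    return GedcomLine(int(level), record_number, tag, value)


print(parse_line("0 @I1@ INDI"))
# GedcomLine(level=0, record_number='@I1@', tag='INDI', value=None)
print(parse_line("2 DATE 21 Mar 1953"))
# GedcomLine(level=2, record_number=None, tag='DATE', value='21 Mar 1953')
```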
A line of information corresponds to the closest line above it with the next smaller
level number. For example, if a line begins with a level number of 4, it pertains to the
closest line above it that begins with a 3. To illustrate this, suppose that you saw the
following in a GEDCOM file:
1 BIRT
2 DATE 21 Mar 1953
2 PLAC Salt Lake City, Salt Lake, Utah
3 SOUR Personal knowledge
Both the DATE and PLAC tags are prefaced with a level number of 2. They both refer to
the line that begins with the tag BIRT (GEDCOM's tag for a birth), because it is the
closest line above them with a level number of 1. You guessed it! The date and place refer
to the birth. The level 3 source tag refers to the place of birth.
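The rule for matching a line to the line it belongs to can be sketched with a stack that remembers the chain of lines still "open." This Python sketch runs on the birth fragment above, written with GEDCOM's standard birth tag, BIRT; `attach_to_parents` is a hypothetical helper name.

```python
def attach_to_parents(lines):
    """Pair each line with the closest line above it whose level is one smaller."""
    stack = []   # (level, line) for the chain of lines the next line could belong to
    pairs = []
    for line in lines:
        level = int(line.split(" ", 1)[0])
        # A line at the same or a deeper level cannot be the parent of this line
        # or of any line after it, so drop it from the chain.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent = stack[-1][1] if stack and stack[-1][0] == level - 1 else None
        pairs.append((line, parent))
        stack.append((level, line))
    return pairs


fragment = [
    "1 BIRT",
    "2 DATE 21 Mar 1953",
    "2 PLAC Salt Lake City, Salt Lake, Utah",
    "3 SOUR Personal knowledge",
]
for line, parent in attach_to_parents(fragment):
    print(f"{line!r} belongs to {parent!r}")
```

Both the DATE and PLAC lines come out attached to "1 BIRT", and the SOUR line comes out attached to the PLAC line, matching the reading given in the text.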
Record numbers are another critical part of GEDCOM files. A record number is always
enclosed within two "@" signs. If a record number identifies the current record, it
appears on the level 0 line immediately following the level number, as mentioned above.
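Because a record number appears once where a record is defined and again wherever another record points to it, a program can check that every pointer leads somewhere. The Python sketch below is one way to do that under simplified assumptions (single spaces, and a pointer is a line whose only value is an @...@ record number); `find_dangling_pointers` is a hypothetical name.

```python
def find_dangling_pointers(lines):
    """Return (tag, record number) for each pointer that names an undefined record."""
    defined = set()
    pointers = []
    for line in lines:
        parts = line.split()
        if parts[0] == "0" and len(parts) > 2 and parts[1].startswith("@"):
            defined.add(parts[1])  # "0 @I1@ INDI" defines record @I1@
        elif len(parts) == 3 and parts[2].startswith("@") and parts[2].endswith("@"):
            pointers.append((parts[1], parts[2]))  # "1 FAMS @F1@" points to @F1@
    # Checked after the whole file is read, so pointers may refer forward.
    return [(tag, target) for tag, target in pointers if target not in defined]


lines = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 FAMS @F1@",
    "0 TRLR",
]
print(find_dangling_pointers(lines))  # [('FAMS', '@F1@')]
```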
If a record number refers, or "points," to another record, it follows a GEDCOM tag.
The first individual record has a number of I1 and appears as "@I1@." The last line of
that record reads "1 FAMS @F1@." The FAMS tag means "Family as a spouse." The line is
interpreted to mean that the family that includes this individual as a spouse has a
number of F1. Look at the family record. Its number is F1. The next line reads
"1 HUSB @I1@." I'm sure you've caught on. It means that the husband (HUSB) of the family
is the individual with the record number of I1.
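These pointers run in both directions: the individual points to the family with FAMS, and the family points back with HUSB (or WIFE). The Python sketch below, a simplified check rather than a full validator, collects the links seen from each side and reports any that appear on only one side. It looks only at level 1 lines and assumes single spaces; `spouse_links` is a hypothetical name.

```python
def spouse_links(lines):
    """Return (individual, family) pairs as seen from each side of the link."""
    from_individuals, from_families = set(), set()
    current = None  # record number of the level 0 record we are inside
    for line in lines:
        parts = line.split()
        if parts[0] == "0":
            current = parts[1] if len(parts) > 2 and parts[1].startswith("@") else None
        elif current and parts[0] == "1" and len(parts) == 3:
            tag, target = parts[1], parts[2]
            if tag == "FAMS":                  # individual -> family
                from_individuals.add((current, target))
            elif tag in ("HUSB", "WIFE"):      # family -> individual
                from_families.add((target, current))
    return from_individuals, from_families


lines = [
    "0 @I1@ INDI",
    "1 FAMS @F1@",
    "0 @I2@ INDI",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "0 TRLR",
]
seen_by_individuals, seen_by_families = spouse_links(lines)
# Links recorded on only one side: @I2@ claims @F1@, but @F1@ names no wife.
print(seen_by_individuals ^ seen_by_families)  # {('@I2@', '@F1@')}
```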
Look at the third individual, who has a record number of I3. The last line of the
record reads "1 FAMC @F1@." The tag FAMC means "Family as a child."
Your job is to interpret the whole line.
Using the pointer record numbers, you can read a printout of a GEDCOM file and
reconstruct family names, dates, and places.
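Following the pointers is what a program does when it rebuilds families from a GEDCOM file. The Python sketch below, a simplified illustration that assumes single spaces and reads only level 1 lines, turns each family record into the names of its husband, wife, and children. It assumes the family points to its children with CHIL lines (the standard GEDCOM tag, not shown in the text above); the sample data and helper names are hypothetical, and dates and places are left out to keep it short.

```python
def summarize_families(lines):
    """Map each family record number to the names of its husband, wife, and children."""
    record_type = {}   # record number -> "INDI", "FAM", ...
    fields = {}        # record number -> {tag: [values of its level 1 lines]}
    current = None
    for line in lines:
        parts = line.split(" ", 2)
        if parts[0] == "0":
            current = parts[1] if len(parts) == 3 and parts[1].startswith("@") else None
            if current:
                record_type[current] = parts[2]
                fields[current] = {}
        elif current and parts[0] == "1" and len(parts) == 3:
            fields[current].setdefault(parts[1], []).append(parts[2])

    def name_of(pointer):
        names = fields.get(pointer, {}).get("NAME")
        # GEDCOM marks the surname with slashes: "John /Doe/" -> "John Doe"
        return names[0].replace("/", "").strip() if names else pointer

    summaries = {}
    for number, kind in record_type.items():
        if kind == "FAM":
            family = fields[number]
            summaries[number] = {
                "husband": [name_of(p) for p in family.get("HUSB", [])],
                "wife": [name_of(p) for p in family.get("WIFE", [])],
                "children": [name_of(p) for p in family.get("CHIL", [])],
            }
    return summaries


lines = [
    "0 @I1@ INDI", "1 NAME John /Doe/", "1 FAMS @F1@",
    "0 @I2@ INDI", "1 NAME Mary /Roe/", "1 FAMS @F1@",
    "0 @I3@ INDI", "1 NAME Ann /Doe/", "1 FAMC @F1@",
    "0 @F1@ FAM", "1 HUSB @I1@", "1 WIFE @I2@", "1 CHIL @I3@",
    "0 TRLR",
]
print(summarize_families(lines))
# {'@F1@': {'husband': ['John Doe'], 'wife': ['Mary Roe'], 'children': ['Ann Doe']}}
```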
Many people who are beginning to use their GEDCOM utility programs are curious as to
whether the information they put in the GEDCOM file is really there. There are a few ways
to tell. The safest way is to use your GEDCOM utility program to import the contents of
a GEDCOM file you created into a new, empty file. You can then use your main genealogy
program to inspect the contents of the new file(s) or database.
It is important to create a new file instead of copying the contents of the GEDCOM file
onto the end of your normal data file(s). If you append it to your existing data, you
will end up with two copies of every individual in your file(s).
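One quick check after importing into a new file is to compare record counts. The Python sketch below (a hypothetical helper, not a feature of any GEDCOM utility) counts the level 0 records of each type in a GEDCOM file; if your program then reports more individuals than there are INDI records, the data may have been added to an existing file rather than to a new one.

```python
from collections import Counter


def count_records(gedcom_text):
    """Count level 0 records by type (INDI, FAM, ...)."""
    counts = Counter()
    for line in gedcom_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "0":
            has_number = len(parts) > 2 and parts[1].startswith("@")
            counts[parts[2] if has_number else parts[1]] += 1
    return counts


example = "0 HEAD\n0 @I1@ INDI\n1 NAME John /Doe/\n0 @I2@ INDI\n0 @F1@ FAM\n0 TRLR\n"
print(count_records(example)["INDI"])  # 2

# Hypothetical usage: "FAMILY.GED" stands in for your own exported file.
# latin-1 decodes any byte, which is enough for counting even though
# ANSEL accented letters will not display correctly.
# with open("FAMILY.GED", encoding="latin-1") as f:
#     print(count_records(f.read()))
```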
Another way to inspect a GEDCOM file is to use a text editor. You need to be cautious,
however. If you want to use a word processor, you need to know the difference between a
word processing file and a text file. A word processing file includes special control
characters that tell the word processor to do things such as print in bold letters using a
specific font. Text files do not include such control characters. Control characters will
cause havoc with a GEDCOM file. Putting them in a GEDCOM file is like putting water in the
gas tank of your car.
If you use a word processor, you will want one that has a special process for
retrieving and saving text files. For example, WordPerfect has a "Text In" option
([Ctrl] + [F5]) that can retrieve a GEDCOM file to view on your monitor. When you are
finished, exit without saving it. If you do a normal retrieve ([Shift] + [F10]),
WordPerfect will insert control characters into the GEDCOM file as it retrieves it.
And if you accidentally save it when you exit, the GEDCOM file will "have water in
its gas tank."
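If you suspect an editor has put "water in the gas tank," a short program can look for stray control characters. The Python sketch below is a rough check under stated assumptions: it treats carriage return, line feed, and tab as acceptable and flags every other byte below 32, plus DEL (127). It deliberately ignores bytes of 128 and above, because ANSEL uses those for accented letters, so it will not catch every kind of word-processor damage. The function name and file name are hypothetical.

```python
def find_control_characters(data: bytes):
    """Return (offset, byte value) for each suspicious control byte in the file."""
    allowed = {0x09, 0x0A, 0x0D}  # assumption: tab, line feed, carriage return are fine
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if (byte < 0x20 or byte == 0x7F) and byte not in allowed
    ]


print(find_control_characters(b"0 HEAD\r\n0 TRLR\r\n"))      # []
print(find_control_characters(b"0 HEAD\x1b\r\n0 TRLR\r\n"))  # [(6, 27)]

# Hypothetical usage on a real file:
# with open("FAMILY.GED", "rb") as f:
#     print(find_control_characters(f.read()))
```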
In Conclusion
Use GEDCOM if you need to exchange information with someone who uses a different
program, or if you want to import information into your program from the popular
FamilySearch®.