Tool Kit for Structural Analysis of Genealogical Data and Kinship and Marriage Networks: An Introduction to P-Graphs using Ego2Cpl, Pajek, PGRAPH and GEDCOM programs, and Browser Plug-ins for Graphic Displays

SEE 2004: Tools for Marriage Network Analysis

The structural analysis of genealogical and marriage data has four parts, each an objective within the larger study of human social structure (see text below):

  • 1 assembly of genealogical and marriage data for a population;
  • 2 assembly of rich ethnographic, relational and contextual data;
  • 3 graphic analysis and display of patterns of social structure;
  • 4 network analysis of social relations.

    Software for Large Scale Kinship and Marriage Networks:

    Program: Pajek, Large Network Analysis Package
      Input format:  *.NET and *.GED (P-graph formats)
      Output format: Adobe SVG, KineMage Chime, VRML, PostScript (GSview), UCINET DL files
      Purpose:       analysis and resultant graphics: see analyzing P-graphs inside PAJEK

    Program: Family Origins, or equivalent commercial software
      Input format:  keyboard or import
      Output format: *.GED (GEDCOM) files
      Purpose:       Genealogical Electronic Data is the standard format

    Program: ged2WWW (free), and ged2HTML (free): GED to html pages
      Input format:  *.GED (GEDCOM) files
      Output format: *.HTML (biographies or family tree formats)
      Purpose:       converts to html files for web publication or private use

    Program: GIM: Genealogical Information Manager (free)
      Input format:  useful for very large scale data entry
      Output format: *.GED (Genealogical Electronic Data)
      Purpose:       enters genealogical data with automatic name searches

    Pajek, the large network analysis and graphics package, has adopted many of the routines needed for P-graph network analysis of kinship and marriage networks. Pajek can also read *.GED files directly into P-graph format (see below). My Ego2Cpl program converts *.txt files with individual ID numbers for ego, spouse, father, and mother into *.net files for Pajek and into GEDCOM files for use with commercial programs such as Kith & Kin, which also lays out P-graph diagrams from GEDCOM files. The original PGRAPH package, in addition to Ego2Cpl, also provides more specialized tools for structural and statistical analysis of these networks: see Using the PGRAPH package.

    Software for Par-Calc Kinship and Marriage Network Analysis:

    Program: Ego2Cpl, by D. White (see p-formats)
      Input format:  TXT: Ego#, Fa#, Mo#, Spouse#
      Output format: *.NET and *.GED files for Pajek; *.VED file for Par-Calc
      Purpose:       simplest possible data entry as text

    Program: Ged2Ego and fname, by M. Schnegg: GED to pgraph ego format
      Input format:  *.GED (GEDCOM) files
      Output format: *.TXT (egocentric format)
      Purpose:       converts to Pgraph for Par-Calc analysis of blood marriages

    Program: a new Par-Calc.exe for later DOS versions of Windows
      Input format:  *.VED
      Purpose:       statistical analysis of marriage structure and frequencies of different consanguineal marriages in the population (available along with sortf.com)

     The Four Parts of Structural Analysis

    Part one. assembling genealogical and marriage data

    The workhorse for genealogical data entry is either my ultra-simple Ego2Cpl data format and conversion program, or D. Blain Wasden and Brian C. Masden's shareware ($20) DOS program, GIM: Genealogical Information Manager. For very large data entry projects, GIM has several advantages: name lookup when entering new names, a database query capability, and a folder system for families in which each folder can hold up to two billion families. It can split or merge datasets, is GEDCOM compatible, and provides standard pedigree charts and family group record forms. Of crucial importance is its Soundex name matching, which retrieves, for every new name entered, all of the sound-alike or similar names already entered.
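
    To make the idea of Soundex name matching concrete, here is a minimal Python sketch of the standard American Soundex code together with a lookup of previously entered names. It is not GIM's code, and GIM may use a different Soundex variant; the function names soundex, build_index and similar_names are hypothetical.

```python
from collections import defaultdict

# Standard American Soundex digit groups; vowels, Y, H and W carry no digit.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue          # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code           # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def build_index(names):
    """Group already-entered names by their Soundex code."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return index


def similar_names(new_name, index):
    """Names already entered that sound like new_name (same Soundex code)."""
    return list(index.get(soundex(new_name), []))


index = build_index(["Smith", "Schmidt", "Jones"])
print(similar_names("Smyth", index))   # ['Smith', 'Schmidt']
```

    Showing the candidates before a new record is created lets the person entering data decide whether "Smyth" is a new individual or a spelling variant of someone already in the file.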

    Part two. rich ethnographic, relational and contextual data

    Ego2Cpl format allows you to keep a host of data on each individual in records of any length, simply as additional column-formatted text. Data in the name field can be extracted by the Ego2Cpl program for export in labeled files for Pajek. GIM also supports extensive notes and documentation, and is compatible with a host of other programs that use the GEDCOM standards for individual-cum-genealogical data entry. One of the best examples is the Brother's Keeper program, which may be downloaded freely from the net, or purchased with the manual ($45) from John Steele. The program keeps track of more than sixty explicit variables, any combination of which can be used to produce datafiles or reports. Brother's Keeper thus solves the problem of how to communicate between "P-graph" data formats and GEDCOM formats: P-graph outputs are easily constructed from the variables "computer number," "sex," "spouse's number," "father's number," and "mother's number," with the option to include all spouses, plus whatever variables you prefer. The ego2cpl.exe program (Fortran source: ego2cpl.for) can also convert from Brother's Keeper report files back to GED files or forward to P-graph formats.
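
    The construction of a P-graph from these five variables can be sketched in Python as follows. This is an illustration, not the Ego2Cpl or Brother's Keeper code: the Person record, the single-spouse simplification, the arc direction (from the child's vertex to the parents' vertex) and the line values (1 for sons, 2 for daughters) are assumptions made for the example. The output uses Pajek's *Vertices / *Arcs text layout.

```python
from collections import namedtuple

# Hypothetical record layout; 0 means "unknown" or "none".
Person = namedtuple("Person", "id sex spouse father mother")


def build_pgraph(people):
    """Return (labels, arcs) with couples or single individuals as vertices."""
    by_id = {p.id: p for p in people}
    number = {}   # vertex (frozenset of person ids) -> 1-based vertex number
    arcs = []     # (child vertex no., parents' vertex no., line value)

    def own_vertex(pid):
        # A married person shares one vertex with the spouse (one spouse only).
        person = by_id.get(pid)
        if person and person.spouse:
            return frozenset((pid, person.spouse))
        return frozenset((pid,))

    def parents_vertex(person):
        known = [x for x in (person.father, person.mother) if x]
        if len(known) == 2:
            return frozenset(known)
        return own_vertex(known[0]) if known else None

    def vertex_no(vertex):
        return number.setdefault(vertex, len(number) + 1)

    for person in people:
        child_no = vertex_no(own_vertex(person.id))
        parents = parents_vertex(person)
        if parents is not None:
            value = 1 if person.sex == "M" else 2   # assumed coding of son/daughter
            arcs.append((child_no, vertex_no(parents), value))

    labels = ["+".join(str(i) for i in sorted(v)) for v in number]
    return labels, arcs


def to_pajek_net(labels, arcs):
    """Format the graph as Pajek *.net text."""
    lines = ["*Vertices %d" % len(labels)]
    lines += ['%d "%s"' % (i, label) for i, label in enumerate(labels, start=1)]
    lines.append("*Arcs")
    lines += ["%d %d %d" % arc for arc in arcs]
    return "\n".join(lines) + "\n"


PEOPLE = [
    Person(1, "M", 2, 0, 0), Person(2, "F", 1, 0, 0),
    Person(3, "M", 4, 1, 2), Person(4, "F", 3, 5, 6),
    Person(5, "M", 6, 0, 0), Person(6, "F", 5, 0, 0),
    Person(7, "F", 0, 1, 2),
]
labels, arcs = build_pgraph(PEOPLE)
print(to_pajek_net(labels, arcs))
```

    In this toy dataset the couple 3+4 is linked to couple 1+2 through the son (person 3) and to couple 5+6 through the daughter (person 4), while the unmarried daughter 7 appears as a vertex of her own.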

    Part three. graphic analysis and display of patterns of social structure.

    NET (network) files made by Ego2Cpl, or GED files made by any of the many programs that use GED as the common language of Genealogical Electronic Data (including GIM, BK, or Ego2Cpl), can be read directly by the downloadable program Pajek, the Package for Large Network Analysis written by Vladimir Batagelj (Vlado), an Associate Professor of Discrete and Computational Mathematics at the University of Ljubljana, and his Computer Science colleague Andrej Mrvar. For very large networks, the program is particularly well suited to spring-embedding displays, the use of (color-coded) partitions to identify or select subsets, and a host of other possibilities. It will also find blocks of structurally endogamous marriages (as bi-components), provided that P-graph conventions are used, in which vertices are couples or families and the links between families run through sons or daughters.
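
    What "finding blocks as bi-components" means can be illustrated with a short Python sketch. It is a textbook depth-first search for biconnected components, not Pajek's implementation. The edge list of numbered couple vertices is hypothetical, parallel lines between the same two vertices are ignored, and the recursion is only suitable for small examples.

```python
from collections import defaultdict


def bicomponents(edges):
    """Return the vertex sets of all biconnected components with 3+ vertices."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    depth, low = {}, {}
    stack = []     # edges of the components still being explored
    blocks = []

    def visit(u, parent, d):
        depth[u] = low[u] = d
        for v in sorted(adj[u]):
            if v == parent:
                continue
            if v not in depth:                  # tree edge
                stack.append((u, v))
                visit(v, u, d + 1)
                low[u] = min(low[u], low[v])
                if low[v] >= depth[u]:          # u separates v's subtree
                    block = set()
                    while True:
                        a, b = stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, v):
                            break
                    blocks.append(block)
            elif depth[v] < depth[u]:           # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], depth[v])

    for root in sorted(adj):
        if root not in depth:
            visit(root, None, 0)
    # A single line between two vertices is a component of size 2, not a block.
    return [b for b in blocks if len(b) >= 3]


# Couples 3 and 4 each link to parent couples 1 and 2 (an exchange of
# children between two families); couple 5 marries out to family 6.
EDGES = [(3, 1), (3, 2), (4, 1), (4, 2), (5, 3), (5, 6)]
print(bicomponents(EDGES))   # [{1, 2, 3, 4}]
```

    Couples 1 to 4 form one structurally endogamous block because every pair of them is joined by two independent paths of kinship links; couples 5 and 6 hang off the block by single lines and are excluded.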

    When reading GED files, Pajek now uses the P-graph format as the default option for kinship graphs, as opposed to the "genetic graph" developed by Ore, in which each child is connected to both parents, so that siblings create cycles. In P-graphs, siblings do not create cycles: cycles are created only by the relinking of marriages within a population. Labels for lines and line colors now appear when Pajek graphics are exported to PostScript (see notes). Instructions for further enhancements of Pajek graphics are found in the draweps.htm documentation.
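
    The difference between the two representations can be checked on a toy family. The Python sketch below counts independent cycles (lines minus vertices plus connected components) with a small union-find; the vertex names and the three example edge lists are made up for illustration.

```python
def independent_cycles(edges):
    """Number of independent cycles in an undirected graph given as edge pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    cycles = 0
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            cycles += 1          # this line closes a cycle
        else:
            parent[root_u] = root_v
    return cycles


# One couple with two children, as a genetic (Ore) graph:
# each child is linked to both parents.
ORE = [("father", "child1"), ("mother", "child1"),
       ("father", "child2"), ("mother", "child2")]

# The same family as a P-graph: the parents are one couple vertex.
PGRAPH = [("child1", "father+mother"), ("child2", "father+mother")]

# A P-graph with relinking: couples C and D each join a child of
# couple A with a child of couple B.
RELINKED = [("C", "A"), ("C", "B"), ("D", "A"), ("D", "B")]

print(independent_cycles(ORE))       # 1
print(independent_cycles(PGRAPH))    # 0
print(independent_cycles(RELINKED))  # 1
```

    The sibling pair alone produces a cycle in the genetic graph but none in the P-graph, where a cycle appears only once two families are relinked by a second marriage.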

    For those who use the older P-graph programs and data formats, the pg2pajek Fortran utility (source: pg2pajek.for) converts P-graph files to Pajek files while keeping the P-graph convention of couples or families as vertices: see Instructions for analyzing p-graph datasets inside pajek. The pg2pajek program automatically assigns different colors to male and female lines, and writes the names of individuals so that they print parallel to the lines themselves.
    Now that more people are using Pajek, it is sometimes useful to return to the pgraph vector formats, using the pajek2pg executable (source also available), in order to run the Par-Calc programs for analysis of blood-kin marriages or for simulations.

    Part four. network analysis of social relations

    Beyond Pajek's network analysis capabilities, the starting point in a search for network analysis software, including programs such as UCINET, is INSNA, the International Network for Social Network Analysis.