Pipeline (AHGPP) Database Location Reference System
Hrvoje Lukatela
Paper presented at the 50th Annual Meeting of the American Society of Photogrammetry,
Washington, D.C. 11  16 March 1984
Abstract
A multiuser geobased information
system was implemented as a
surveying, geotechnical, biological, cadastral and pipeline design
data repository for the Alaska Highway Gas Pipeline Project.
Location reference software was designed as a generalpurpose,
high precision, global coverage system, and
integrated with a commercial
pseudorelational database package.
Locations were described by a set of ellipsoid
related planar cells, and their internal coordinates, derived
by a pseudostereographic ellipsoid to plane projection.
Input and output location data can be given or generated
in any commonly used plane surveying coordinate system.
Retrieval is based on the KP range,
alignment sheet, proximity to other items or any area polygon.
Database resolution surpasses the accuracy of the reference line
field survey. Processing and storage utilization are highly
efficient, an important consideration on a large mainframe
computer complex.
Introduction
A significant amount of work
has been done in recent years in the area of
the design and implementation of
geobased computerised systems combining
high data volumes and large area coverage.
While all such systems share many similarities, an important
distinction can be made based on the spatial analysis usually
performed on the data. If only the
general distribution and correlation of different
entities is of interest, as often is the case in e.g. computerised
natural resource inventories, the positional
precision requirements are
relatively low. Such systems are therefore usually
based on a large coverage plane projection
(e.g. UTM), sometimes using a small rectangular grid cell
identifier as the only location descriptor.
All spatial analysis is carried out by completely
(but safely) ignoring the fact that plane coordinate relationships
are at best only approximations of the actual location geometry.
On the other hand, systems in e.g. cadastral,
control network inventory, large civil engineering and defence
applications require the ability to derive geometric
relationships from location descriptions on the database with
the same level of precision as that normally obtained by field
measurement techniques and instruments employed by those disciplines.
If the area coverage is
continental or even global, differences in plane and true spatial
geometry can no longer be ignored. At this point several options are
available:

Implement the database using large coverage
plane projection coordinates, and selectively
calculate and apply corrections
to the derived geometric data.

Implement the database using only the ellipsoid coordinates,
and derive directly all geometric data required. No corrections
will be necessary, but the geometry processing,
data volume and indexing scheme
might require more computer resources than available
or justifiable for a given application.

Implement the database with a combination of ellipsoid
and plane location
descriptors, so that different tasks are performed using different
mechanisms, with the goal of overall optimization in the usage
of computer resources in data storage,
searching and geometronical processing.
It is interesting to note that the first of these options
describes quite well
the approach used for location description and processing in
manual (precomputer) geobased systems. The tradeoff between the
coverage and precision in most such systems (UTM 1958 is a good
example) is resolved by adopting (by current standards)
extremely high discrepancy tolerance. In consequence, corrections
have to be calculated and applied much more often than
anticipated when the projection was developed
and the system implemented.
Regardless of which of the above options are taken, some
data partitioning and indexing will be required to make location
based retrievals fast and efficient. How this is best done depends
to a large degree on the type of user interface functions,
volume, distribution and composition of the data,
as well as on the performance characteristics of the
computing hardware and software devices available.
Throughout this paper, the term "location reference system"
is used to identify a set of data items and software components
which make possible
location based retrieval of information, and support
required geometronical processing.
Location Reference System Architecture
The major functional requiremets which affected the design
of this location reference system were:

The application system will be implemented
on a mainframe computer, using as much as possible
already available communication network and database software.
Effective integration in the database package
must be possible.

Location data retrieval will be
based on: pipeline section; alignment sheet;
proximity to reference line or other data items;
or any other conveniently defined area polygon.
Location elements should be stored with at least the same precision
as that of the combined field traversing measurements and adjustment.

Since an online database access is required,
data storage volume and
access cost should be minimized. Optimization should be biased
towards the retrieval rather than the initial load of data.

UTM coordinates will be given as location description for
field surveyed items, and must be generated for some output
reports and medium scale plotting. It should be possible to
expand this requirement with other commonly used plane
survey coordinate systems.

Reference line distance accumulation (Kilometerposts, KP)
can be defined
either along the centreline or the ROW boundary line,
slope or horizontal.
KP/offset system should be kept as a simple,
low precision way to reference data. In particular, KP values
produced by a series of pipeline traverses and marked in the field
should be recognized by the system.

Programming of the location reference system routines
must be kept as simple as possible in order to meet the
pipeline design database implementation schedule.
The system is
based on a hierarchical set of location descriptors:
cell identifiers, ellipsoid coordinates,
pseudo stereographic ellipsoid to plane projection,
common plane survey mapping system,
and KP/offset pipeline references.
Different sets of
location elements are by definition only different numerical
representations of one positional geometry,
and can be dynamically transformed from one format into another.
Redundant location elements are checked for consistency upon entry to
the system, and then discarded, to be reconstituted when required.
The system uses spatial, ellipsoid related set of the
data partitioning cells.
Each cell record contains a list of its neighbours.
Two substantially different cell arrangement schemes are possible:

System assigned (and dynamically reassigned) pattern of
cells. Two sets of criteria are used by the system in this
assignment process: optimum projection distortion
tolerance for balance
between plane and ellipsoid geometry calculations, and optimum
cell population size for data retrieval efficiency.

User assigned, orderly cell pattern.
Optimization decisions mentioned above will in this case have to be
made a priori and will remain frozen in a given implementation.
The cell identifier and coordinates are
the only location descriptors
that need to be retained on the system.
They are used for all local spatial manipulations,
including data searching.
Global problems are solved using ellipsoid coordinates.
Location based data searching is performed by first finding the
cell and solving the proximity or similar conditions for
the population of that cell. If further data is required, a list
of its neighbours must be traversed. A cell is rejected if no part
of it can contribute to the search. Otherwise, the coordinates of its
population are dynamically
transformed into the local coordinate system of the
cell in which the exercise was initiated, and search criteria
are evaluated.
The system is therefore capable
of performing both
quick and simple manipulations of data
using plane coordinates, as well as highprecision
ellipsoid geometry
calculations required to match or use directly field measured values.
Ellipsoid Coordinates and Geometry
A discussion of discrepancies between a digital model
using an ellipsoid of rotation as a reference surface
and the true location of
the objects on the surface of the Earth is out of the
scope of this paper.
We will only make the assumption
(valid for any large and complex civil engineering system)
that the derived spatial values, based on rigid ellipsoid
geometry solutions, satisfy the precision
reqiurements of all of the
database geometronical application disciplines.
Ellipsoid geometry has been a subject
of intensive study with the purpose of finding ways to simplify
manual calculations
by accepting assumptions and derivation procedures
yielding only the minimum of required accuracy.
It is safe to say that practical considerations
driving this research have
been transformed considerably with the
introduction of automatic computing devices.
The benefit derived by reducing the number of elementary
arithmetic operations has decreased substantially,
while the cost of increase in calculational accuracy is
no longer proportional (or extraproportional) to the number
of significant digits carried  it is increasing in discrete,
computing device dependent, steps.
Since the loss of accuracy almost always increases with the
"size" of the problem (e.g. length of intersecting lines),
different classical solutions are indicated depending on some
external, individual problem (data!) dependent
criteria. Such a decision requires insight difficult to
replicate in a computerised procedure. For this, "selfadjusting"
algorithms are necessary, capable of producing
maximum accuracy that can be delivered by
the internal numerical resolution,
yet simple and efficient when only
short lines are involved. Iterative algorithms, (as opposed to
expansions to required accuracy) are usually best capable of achieving
this quality.
The methodology employed for construction of these iterative
ellipsoid geometry algorithms is
illustrated by example  a
solution of an ellipsoid intersection problem: given are two points
(P1, P2), with known positions; and their respective tangential plane
azimuths to a third point (Pn, point of intersection).
Ellipsoid coordinates of the Pn are to be determined.
A few terms will be defined to simplify the description of the
procedure:
 Line of sight
 is a given point tangential plane
projection of the vector from a given to
the unknown point.
 Intersection plane
 is a plane containing line of sight
and ellipsoid normal in a
given point.
 Intersection line
 is a straight line common to both intersection planes.
 Paracentre
 is a point (on any straight line), closest to the coordinate origin.
 Pararadius
 is a distance from the paracentre.
The procedure will first check for valid intersection data.
Two conditions must be satisfied: first, intersection planes must not
be parallel or coincident, and second,
first of the two intersection points reached by moving along the
intersection ellipse in the direction of the azimuth starting at
each given point must be the same.
Checking those conditions is simplified by the fact that an
azimuth is internally kept as two direction cosines in
its respective tangential
plane, which is in turn stated by the ellipsoid coordinates
(normal!) of a given point. Both intersection plane
normal equation coefficients can be found by a simple rotation of
the line of sight vectors and their free terms, by using given
point coordinates.
The equations of two intersecting planes and of
the ellipsoid of rotation form a system whose roots are
the cartesian coordinates of the two points of intersection.
The solution is possible by any number of numerical methods, and
transforming the result to the ellipsoid normal form presents no
problem. Even simpler, more direct solution is possible, by
finding an approximate normal through Pn,
and iterating its components using the meridian ellipse productions
until ellipsoid geometry is satisfied, observing in the process
intersection conditions as well as
the elevation of Pn.
The iteration steps are as follows:

Obtain the intersection line
in a normalized parametric equation form
and find the intersection line paracentre coordinates.

Assume the normal to be collinear with the intersection line, and
find the normal pararadius using meridian ellipse geometry
productions.

Set the initial intersection line parameter equal to the
pararadius obtained in the previous step.
(Intersection
point elevation can be taken into account in this and all subsequent
steps involving pararadius.)

Start next iteration step by
finding a new position of the intersection point using
the intersection line paracentre and parametric equations.

Recalculate normal components
using the provisional coordinates of the intersection point. Check
for change since the previous step and exit if none.

Use normal direction cosines to find new normal pararadius.
Compare it with previous step and update the
intersection line pararadius by a corresponding amount.
Repeat from step (4).
Note that the meridian ellipse productions employed are
closed, and the precision depends only
on the criterion used
in step (5). The iteration is so fast, even on lines only one
order of magnitude below the ellipsoid semiaxes, that
a "fuzziness threshold" of the floating point representation can be
used as this criterion.
The most common method of describing the location on
(or slightly above or below) the ellipsoid surface
is a numerical description of a direction of a normal through
the location point.
This can be done by specifying angular values
relative to the equator and reference meridian plane (latitude and
longitude), or using a vector component (direction cosine)
form.
The main advantage of the traditional form is its direct
correspondence to navigational and similar observations,
but it is extremly illsuited for computerised calculations.
With an angular value as a source item, most geometry
calculations will require expensive,
multiple evaluation of trigonometric functions.
If direction cosines are used, this will not be the case, and
the formulae of ellipsoid geometry will become symmetrical and much
easier for programming.
In addition, any numerical collapse will
be directly related to the definition of the problem,
and never to the method used.
With 64 bit floating point arithmetic and data variables,
a comfortable margin of several orders of magnitude
can be maintained over the precision of any field measurements.
While one more element must be kept
for direction cosines (three plus elevation, versus 2+1 for
latitude and longitude)
this provides a simple check for a valid normal element data:
if the vector is normalized after each manipulation, the sum of
the squares of the components will equal one. The most important
consideration remains the type of calculational algorithms
utilized: conventional ellipsoid geometry
derivations typically employ angular values and trigonometric
functions,
while direction cosines are used
by closed iterative procedures and ellipsoid to plane
geometric projection routines.
Pseudostereographic Ellipsoid to Plane Projection
As mentioned before, plane coordinates used by the system may
require frequent and high volume
transformations to the ellipsoid and back.
Problems associated with the transfer
of positions from ellipsoid to the plane and vice versa
can vary from trivial to fairly complex. This complexity depends
on a number of factors: the definition of the mapping function;
the format in which the location is given for
both ellipsoid and plane systems; the precision to be maintained
in a single transfer; and the acceptable numerical deterioration
in a case of multiple transfers.
The mapping selected for this global (ellipsoid) to cell
(plane) coordinate transformation
can best be described as an ellipsoid distortion of
a spherical stereographic perspective projection.
It provides
extremely fast and and precise (including multiple transfer)
mapping algorithms.
Given the population related restrictions on the cell size,
projection distortions are likely to be negligible for all
local geometry problems.
Ellipsoid coordinates of the cell centre point are at the
same time coefficients of a normal form of the projection plane
equation.
In order to avoid the necessity of the "sea level" reduction
of the field measured distances, the
average elevation of the terrain covered by
the cell can be incorporated into the free
term of the projection plane equation.
The projection centre is located on the centrepoint normal, at
twice the depth below the surface of the ellipsoid of
the geometric mean of the radii of curvature
along the meridian and prime vertical.
Ellipsoid coordinates of the centre point, in
addition to the elevation (if used as described above),
are the only values required as the
cell dependent projection parameters.
Depending on the relative efficiency
of data retrieval versus recalculation, and whether most
transformations are performed individually or in a series,
coordinates of the
projection centre, pararadius of the projection centre normal
and the free term of the centrepoint prime vertical equation
can be calculated when a cell is generated, and reused in
all ellipsoid to plane and reverse transformations involving the cell.
A scaling effect takes place in ellipsoid to plane transformations:
ceteris paribus,
fewer digits are required to describe the location than
if the ellipsoid coordinates are used.
Specifically, if cell diameters are in the order of up to ten
kilometers, a 32 bit floating point representation
will be in the order of millimeters. While somewhat below
the precision of the 64 bit ellipsoid coordinates, discrepancies
will not propagate across the cell boundaries.
This is an important consideration for
implementations on systems with a significant difference
in efficiency between
"short" and "long" floating point arithmetic.
Universal Transverse Mercator Projection (UTM 1958)
UTM coordinates are, because of their widespread use and regulatory
support, frequently used for interchange of location data.
Unfortunately, several properties make them particularly unsuitable
for computer system usage
(most of the following comments would apply to all
large coverage plane survey coordinate systems):

Because of the high maximum distortion, only the most
crude geometry processing can be performed using simple plane
coordinate manipulations.

Large systems will likely
extend over more than one UTM zone. If that happens,
the advantage of a single, continuous set of coordinates is lost.
Transformation function domain is not global.

"Long" floating point is normally required to store and
manipulate coordinates. This is a special disadvantage where
multiple coordinates are required to describe an entity.

The transfer of coordinates between the UTM plane and
ellipsoid is a rather involved and inefficient calculation
due to the numerical nature of mapping
algorithms (Taylor's expansion of CauchyRiemann
differential equations).
Most of the advantages of a conformal projection are
lost when field measurements consist of a balanced set of
directions and distances.
To provide the ability for external data interchange,
the system must include a set of
transformation routines, guarantying submillimiter
precision even on the edges of a 6 degree zone in midlatitudes.
UTM coordinates presented to the system are
immediately transformed into ellipsoid format. If an
output request is made for numerical UTM coordinates,
reverse transformation takes place.
Direct orthogonal transformation of cell coordinates
into UTM grid of the output page is used for plotting, as it is
well within the resolution of even large scale plots.
Linear (Kilometerpost) Reference
KP/offset is an often used method of describing the location
on any "linear" engineering system.
Cumulative distance and offset values are relative
to a convenient reference line, usually centre line or ROW boundary
line. Distances are sometimes accumulated and marked
in the field ("original" KP) as the line is surveyed.
There are ;/oe problems inherent with the KP/offset usage:
the most significant resulting
from the fact that whatever is selected as the reference line,
it can, and probably will, change its length in the
design and the construction phases of the project.
Updating these values presents no problem, but references to
"original" field values can become invalid or ambiguous.
All cells that cover the reference
line contain an ordered, doubly linked
list of points representing a
section of the reference line within the cell.
Each item in the list contains point identification, cell coordinates,
and original, field KP label. Each cell also contains true
KP values at its boundary. Mapping from cell coordinates
to KP/offset system is defined as an orthogonal projection
to the closest reference line segment.
List of local coordinates of the reference line segments can
therefore be considered as a set of "mapping parameters" for
cell to KP/offset and reverse
transformations in the same manner as the coordinates
of the centrepoint are for cell to global
transformation. Inquiry
references using "original" values only are resolved
by a bidirectionl search along the line, from a point with the same
"true" KP, until a segment with appropriate "original"
KP range is found.
AHGPP Implementation
The Alaska Highway Gas Pipeline Project is a natural gas
transportation system designed for the primary purpose of
transporting Alaskan natural gas from Prudhoe Bay on the North Slope
of Alaska, south across western Canada, to markets in California
and the midwestern United States.The Canadian portion of the project
is comprised of 3285 km of pipeline.
Due to the complexity of ambiance, primarily in the Yukon region of
the AHGPP, large quantities of engineering, geotechnical, regulatory,
biological, cadastral, pipeline design and other
sitespecific information had to be assembled and continuously
referred to in the process of designing the pipeline.
A computerised database system was implemented to provide a repository
for the most volatile and widely required subset of that information.
Implementation Environment  Hardware
The system was implemented
on a large corporate multiple mainframe
computer complex, IBM 360/370 architecture, with buffered character
CRT display terminals, remote (batch) printers, and an electrostatic
plotter at the central location.
Implementation of the system on a
dedicated CAD computer was considered, but two problems
made mainframe implementation more attractive:
capital cost associated with the acquisition of dedicated hardware,
and inability of existing CAD systems to cost effectively
serve large number of simultaneous users, including those at
remote locations. Online mainframe access
from remote locations was provided via CTR TTY terminals,
1200 baud dialup modems and an async/SNA hardware protocol converter.
The same lines could be used with portable hardcopy terminals to
retrieve batch generated reports.
Plot output was available only at the central site.
Further development, if and when initiated, will
be based on a remote microcomputer. All graphical
output would be generated on the workstation, and data maintenance
for items other than global geometry
would be done there. Mainframe functions would be
reduced to the database retention and block
update control processing,
probably on a cell/discipline/user oriented authorization
scheme.
Implementation Environment  Software
The system is implemented using ADABAS database
management system. It is a multiple inverted file system, intended
primarily for implementation of administrative and commercial systems.
Like most similar products, it suffers from
one serious impediment when used for comprehensive
financial or engineering models: inability to deal with the
floating point numeric data items.
The problem was solved either by storing the value in fixed decimal
format or by
storing a floating point value as a character (byte)
variable, and leave it's manipulation to an external programme.
There was one more reason for storing cell coordinates in the
decimal format. One of the ADABAS support software
components is a dedicated online interpreter and compiler.
While the language was particularly illsuited for
any kind of list processing or modelling programming
(poor data structure handling capabilities,
no workspace management functions, etc.) it
provided very effective data retrieval and index
maintenance functions, and was therefore used
for all online database interaction programmes.
All geometry required for data retrieval and proximity
solutions could therefore be handled without
incurring the overhead of runtime external module
invocation.
Since the system must operate in a
multiple project environment,
a project identifier is appended to each record on file
and all location processing is normally carried out only within
the project.
Alignment sheets are an already established way of
location partitioning of the
pipeline data, and their size (2x7 Km) provides an excellent
compromise between total number of cells on the
system and individual cell population size.
Data is further identified by
cell coordinates (multiple in case of line or area items),
and KP point or range. Some of these items
are incorporated in the database index structures. Programmes are
available to process location descriptors upon entry of an item
to the database, manipulate data in the process of pipeline design
where locations are changed, or
retrieve data through online interactions
or printed or plotted hardcopy output.
Each record representing a location specific entity on the
database contains the following data items:

Project identifier

Entity class (record type) identifier

Cell identifier

True cumulative distance along the reference line (True KP)

Cell coordinates (multiple for a linear or an area item)
In addition, some records include parameter values which
are used by programmes whenever location data is manipulated:
Project record:
 Type and parameters of external mapping projection (including
ellipsoid values) used on the project (e.g. UTM, Lambert, etc)
 Reference line entity and method of KP distance accumulation
Cell (Alignment sheet) record:
 Approximate average elevation
 True KP of upstream and downstream sheet matchline
 Coordinates of logical (non overlapping) cell boundaries
 Ellipsoid coordinates of the cell coordinate system origin
 Ellipsoid to plane transformation parameters
 List of pointers to neighbouring cells
Reference line point record:
 Elevation
 Internal pointers to upstream and downstream point record
 Field marked cumulative distance (Original KP)
Three inverted lists (indexes) comprising location description data
are maintained on the system. Arguments of these lists are concatenated
location description items, as follows:
Cell index:
 Project identifier
 Entity class identifier
 Cell identifier
Start KP index:
 Project identifier
 Entity class identifier
 True start KP
End KP index:
 Project identifier
 Entity class identifier
 True end KP
The last index structure contains only
linear and area items which occur as
contiguous segments along the line. It makes possible bidirectional
traversing of item records. Unlike the neighbouring cell pointer
list in the cell record, this index structure is maintained
automatically by the inverted list update mechanism.
(The conceptual design of
both the pipeline database and the location reference components
remained
independent of the particular package, and any multiple index
data access method or file management system could have been
used with similar effectiveness.)
The main programming component of the location reference system
is a library of PL1 subroutines, performing elementary tasks, usually
on single problem data (e.g. conversion of coordinates between
ellipsoid and UTM, ellipsoid and cell, and cell and KP/offset
systems; closed iterative solutions of common ellipsoid geometry
propositions; planar geometry problems and transformations and
data conversions).
Plotting is implemented by batch executing database traverse
programmes, which produce a high level, device independent
flat plot files. These are subsequently reprocessed
by a graphical postprocessor generating electrostatic plotter output.
At this point, one of the two types of
files containing cell transformation parameters
is used. The first contains a copy of cell centre points, and
provides the ability to transform cell coordinates on plot file
to a UTM defined output page. Second contains cell sets of reference
line points and their page coordinates digitized from
alignment sheets based (where these are used) on nonortho mosaics.
In this case, a separate orthogonal transformation
will be calculated for each reference line segment, and used
for all items in its KP range.
The first step in
establishing new project data is entry of skeleton cell records and
location of reference line points.
Cumulative distances along the reference line and cell projection
parameters are next calculated, and stored on the database. Any other
project data can now be entered, described by location either
using external mapping system, KP/offset references or definition of
proximity geometry relative to items already on the system.
As each item is entered, inverted lists are maintained automatically
by the database mechanism, making the item instantly part of inquiry
traverses of project data.
In case of a change of a reference line (pipeline rerouting)
one or more cells are locked out from the active system, and updated
positions of reference line stations are entered. Total population
of the cell is then traversed, it's KP references recalculated and
updated. The rest of the project is then adjusted by adding or
subtracting the required amount from all KP refernces downstream from
the revision area.
Conclusion
Major software components must be developed and integrated into
generally available database packages if they are to be used
with success for geobased applications. Once this is achieved,
significant benefits can be derived from implementation
on a mainframe complex: simultaneous online service
of an extensive user base; large data volume capability;
economical remote access and capacity expansion and
utilization of existing corporate capital investment and
software engineering and construction skill base.
References:
Bomford, G. 1975, ed. Geodesy, London, Oxford University Press
Broughton, C. 1980,
COSINE, a LandSurveying Database Application:
Paper presented to the 1980 S2K User Group meeting, Toronto
Maggio, R.C., Baker, R.D., Harris, M.K. 1983,
A Geographic Data Base for Texas Pecan:
Photogrammetric Engineering and Remote Sensing, Vol 49, pp.4752
Radlinski, W.A. 1977, Modern Land Data Systems  A National
Objective:
Photogrammetric Engineering and Remote Sensing, Vol 43, pp.887890
Tobey, W.M. 1928, Geodesy, Geodetic Survey of Canada,
Publication No. 11
Wehde, M. 1982, Grid Cell Size in Relation to Errors in Maps
and Inventories Produced by Computerized Map Processing:
Photogrammetric Engineering and Remote Sensing, Vol 48, pp.12891298
, 1982 rev., The Alaska Highway Gas Pipeline Project
 Project Overview, Foothills Pipe Lines (Yukon) Ltd., Calgary
...
