By Jackson Cothren and Bruce Gorham, University of Arkansas Center
for Advanced Spatial Technologies (www.cast.uark.edu),
Fayetteville, Ark.
Traditional image-content analysis methods in machine vision and
photogrammetry used gray-scale and shape characteristics to
extract roads, buildings, etc., while remote sensing focused on
spectral signatures to classify pixels in smaller scale images. Now
feature-extraction techniques—implemented in software such as Definiens Imaging’s eCognition package (www.definiens-imaging.com)—are
able to effectively combine these separate approaches to create a
more robust, powerful feature-extraction capability.
In particular, eCognition merges pixels into homogeneous regions,
or objects, within an image based on “color” as well as size and
shape. The resulting segments have dozens of “signatures” that can
be used to extract buildings, roads, agricultural fields, tree
stands and other features. To learn more about eCognition’s
object-oriented analysis capabilities, researchers at the
University of Arkansas Center for Advanced Spatial Technologies
(CAST) recently used the software to extract impervious surfaces
from DigitalGlobe QuickBird pan-sharpened satellite imagery of
Fayetteville, Ark.
Beyond Pixels
The computer vision research community has known for years that
meaningful information can’t be represented by individual image
pixels. In fact, as early as the mid-1970s computer vision
researchers were developing methods to intelligently group pixels
into objects that had meaning in the real world (Rosenfeld and Kak,
1982). Image segmentation, as this grouping approach came to be
known, is the first step in a process that automatically extracts
and identifies features in an imaged scene. The process has been
applied in many machine vision applications, ranging from
inspecting and measuring parts on a conveyor belt to recognizing
text and, more recently, human faces. The digital photogrammetry
research community has been actively involved in applying image
segmentation as a first step in automatic feature matching across
high-resolution stereo images (Schenk, 2001). With products like
eCognition, such technology also can be applied to remote sensing
classification problems.
Remote sensing, by contrast, has traditionally concentrated on classifying
individual pixels based on their reflectance in several spectral bands,
each represented by its own pixel value. Statistical and heuristic
methods are used for this and are at least partially implemented in virtually all
image processing software. However, with high-resolution imagery from
today’s digital airborne and satellite-based systems, these traditional
techniques are limited because of the relatively small spectral range of
the sensors.
This is where eCognition’s segmentation approach excels. Instead of
working with pixels as the most basic element of the image, eCognition
first segments the image into spectrally homogeneous objects. The user
can exert some control over object size and shape during segmentation
through various weighting parameters, making the resulting objects more
compact and smooth at the expense of spectral homogeneity. The resulting
objects become the most basic element of the image, and each has its own
“signature.” A partial list would include the mean value and standard
deviation of its constituent pixels, size, perimeter, primary
orientation, compactness and texture—or the degree to which a pattern is
present in each band. All of these measures are available to determine
what feature the image object represents.
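
To make the idea of an object signature concrete, here is a minimal Python sketch. It is not eCognition's implementation. It assumes one spectral band and a label image (one integer per object) held as NumPy arrays; the function name object_signature and the compactness measure (perimeter divided by the square root of area) are illustrative choices, not taken from the software.

```python
import numpy as np

def object_signature(band, labels, object_id):
    """Return a few illustrative signature values for one image object."""
    mask = labels == object_id
    if not mask.any():
        raise ValueError("object_id not present in labels")
    values = band[mask]
    area = int(mask.sum())

    # Perimeter: count pixel edges where the object touches another
    # object or the image border. Padding with False handles the border.
    padded = np.pad(mask, 1, constant_values=False)
    perimeter = 0
    for axis in (0, 1):
        for shift in (1, -1):
            neighbor = np.roll(padded, shift, axis=axis)
            perimeter += int((padded & ~neighbor).sum())

    return {
        "mean": float(values.mean()),
        "std": float(values.std()),
        "area": area,
        "perimeter": perimeter,
        # Assumed compactness measure: lower means more compact.
        "compactness": float(perimeter / np.sqrt(area)),
    }

# Toy 4 x 4 band with one 2 x 2 object (label 1) in the upper-left corner.
band = np.array([[10, 12, 50, 50],
                 [14, 16, 50, 50],
                 [50, 50, 50, 50],
                 [50, 50, 50, 50]], dtype=float)
labels = np.zeros((4, 4), dtype=int)
labels[:2, :2] = 1
signature = object_signature(band, labels, 1)
print(signature)
```

In practice the same measures would be computed for every band and every object, and texture and orientation would be added to the list.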
For example, Figure 1 shows a segmented
multispectral QuickBird image of a residential area. Notice the single
object highlighted in red. It corresponds to what an interpreter
would consider a feature (a cul-de-sac), and some components of its
signature are shown on the right. Based on these values, it would be
fairly easy to automatically identify most of the other “cul-de-sac
objects” in the scene.
In fact, these objects and their signatures may be used in a
more-or-less conventional supervised or unsupervised classification. The
extended signature enhances the power of both. But there’s more to it
than an extended signature. Traditional classifications take into
account only one feature (whether it’s a pixel or an image object) at a
time. There’s also information contained in an object’s relationship to
nearby objects. For example, one might reason that a spectrally dark
object to the northwest of a bright, rectangular object identified as a
building might be classified as shadow. The eCognition software provides
an interface for defining these kinds of rules to aid in object
classification—and, equivalently, feature extraction.
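
The shadow example above can be sketched as a simple neighborhood rule. This is only an illustration under stated assumptions: the image is north-up (so north means a smaller row and west a smaller column), each object is reduced to a centroid and a mean brightness, and the ImageObject class, the brightness threshold and the distance limit are all hypothetical. It does not reproduce eCognition's rule interface.

```python
import math
from dataclasses import dataclass

@dataclass
class ImageObject:
    name: str
    row: float          # centroid row (increases toward the south)
    col: float          # centroid column (increases toward the east)
    brightness: float   # mean pixel value of the object
    label: str = "unclassified"

def label_shadows(objects, dark_threshold=60.0, max_distance=30.0):
    """Label dark, unclassified objects lying northwest of a building as shadow."""
    buildings = [o for o in objects if o.label == "building"]
    for obj in objects:
        if obj.label != "unclassified" or obj.brightness >= dark_threshold:
            continue
        for b in buildings:
            is_northwest = obj.row < b.row and obj.col < b.col
            is_close = math.hypot(obj.row - b.row, obj.col - b.col) <= max_distance
            if is_northwest and is_close:
                obj.label = "shadow"
                break
    return objects

scene = [
    ImageObject("roof", 50, 50, 200, label="building"),
    ImageObject("dark_nw", 40, 40, 30),   # dark and northwest of the roof
    ImageObject("dark_se", 60, 60, 30),   # dark but southeast of the roof
    ImageObject("bright_nw", 40, 45, 150),
]
for obj in label_shadows(scene):
    print(obj.name, obj.label)
```

The direction test would of course depend on the sun angle at acquisition; northwest is used here only because it is the example given in the text.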
Impervious Surface Extraction
CAST researchers have had several opportunities to apply this
object-oriented approach to feature extraction. One opportunity came in
the spring and summer of 2003, when CAST staff worked with the city of
Fayetteville to generate an ortho-image update of the city’s
fast-growing utility service area. Six QuickBird Basic scenes were ortho-rectified
and pan-sharpened to produce 1:4,800-scale, 2-foot ground sample
distance (GSD) images that could be displayed as either 11-bit
true-color or color-infrared composites. In addition, the city’s
engineers were interested in extracting impervious surfaces from the
ortho-images. Because of limited funding and the large area involved,
the researchers ruled out manual extraction and chose to investigate
eCognition’s capabilities.
The researchers determined it’s possible to distinguish impervious and
permeable surfaces in high-resolution imagery using pixel-based
classification methods with about an 80 percent success rate. In fact,
the supervised classification of the QuickBird ortho-images using all
four bands yielded a 78 percent accuracy rate based on a large number of
independent ground truth points. Most of the failures were caused by
confusion between asphalt and shadow and between bare earth and
concrete.
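
An accuracy figure of this kind is typically computed by comparing the classified value at each ground truth point with the class observed in the field. The short Python sketch below shows the arithmetic on made-up labels; the function names and the five sample points are hypothetical and are not the project's data.

```python
from collections import Counter

def overall_accuracy(predicted, truth):
    """Fraction of ground truth points whose predicted class is correct."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)

def confusion_counts(predicted, truth):
    """Count (true class, predicted class) pairs to show where errors occur."""
    return Counter(zip(truth, predicted))

# Made-up example: five ground truth points, one misclassified.
truth = ["impervious", "impervious", "permeable", "permeable", "impervious"]
predicted = ["impervious", "permeable", "permeable", "permeable", "impervious"]

print(overall_accuracy(predicted, truth))
print(confusion_counts(predicted, truth))
```

The confusion counts are what reveal patterns such as asphalt being mistaken for shadow, which a single accuracy percentage hides.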
To address these concerns, the researchers segmented the image—a portion
of the result is shown in Figures 1 and 2—and from the resulting objects
developed a training set classified as impervious or permeable. A
supervised classification of the objects based only on this training set
probably would result in only slightly better accuracy. So, in addition
to the class membership rules generated by the training set, the
researchers identified two additional rules to help address the
difficulty of separating shadow from asphalt and classify the confused
objects.
Most shadowed portions of the scene can’t be reliably identified
as either impervious or permeable (although it might be possible
to suggest a class based on a surrounding object) and should
therefore be classified as shadow. To help separate shadow from
asphalt, the researchers developed one rule, or “membership
function,” in eCognition stating that an object is more likely to
be asphalt paving or shingles if it’s near other impervious
objects. This rule reflects the fact that asphalt is more likely
to be present in built-up areas.
Another rule considers the size of the object relative to its
permeable neighbors—the larger its relative size, the less likely
it is to be asphalt. This reflects the likelihood that areas in
shadow caused by tree stands are much smaller than the area of the
stand. In Figure 3, notice that the “membership function” is a
curve. This illustrates another eCognition tool: fuzzy
classification. An object may be assigned some membership
probability in several classes using different models.
As shown in Figure 4, this fuzzy classification also was used to
help distinguish permeable bare soil from impervious surface based
on the amount of an object’s infrared reflectance. As the pixel
value in the near-infrared band increased, so did the probability
of an object being bare soil.
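
A fuzzy membership function of this kind can be sketched in a few lines of Python. The linear ramp below is an assumption made for illustration (the curves in the figures may have a different shape), and the near-infrared breakpoints of 400 and 1,200 are hypothetical values chosen only to fit within an 11-bit range.

```python
def ramp_membership(value, low, high):
    """Linear ramp: 0 at or below low, 1 at or above high."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

def soil_vs_impervious(mean_nir, low=400.0, high=1200.0):
    """Assign an object a membership in both classes from its mean NIR value."""
    soil = ramp_membership(mean_nir, low, high)
    return {"bare_soil": soil, "impervious": 1.0 - soil}

for nir in (300, 800, 1500):
    print(nir, soil_vs_impervious(nir))
```

An object with a mean near-infrared value halfway between the two breakpoints receives equal membership in both classes, which is exactly the kind of ambiguity a hard threshold would conceal.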
These two rules increased the classification
accuracy from 79 percent to nearly 90 percent, primarily due to the
ability to more accurately classify shadow objects. Bare soil confusion
was still present, but the fuzzy classification at least quantified the
confusion and allowed the researchers to vary the threshold to identify
problem areas.
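
One way to picture this threshold exercise is to classify the objects at two different membership thresholds and list the ones whose class changes. The Python sketch below does that on made-up membership values; the function names, the object identifiers and the thresholds of 0.4 and 0.6 are hypothetical and do not describe the researchers' actual procedure.

```python
def classify(memberships, threshold):
    """Label each object bare_soil if its soil membership meets the threshold."""
    return {
        oid: ("bare_soil" if m >= threshold else "impervious")
        for oid, m in memberships.items()
    }

def unstable_objects(memberships, low_threshold, high_threshold):
    """Return objects whose class differs between the two thresholds."""
    at_low = classify(memberships, low_threshold)
    at_high = classify(memberships, high_threshold)
    return sorted(oid for oid in memberships if at_low[oid] != at_high[oid])

# Made-up bare-soil membership values for four objects.
soil_membership = {"a": 0.95, "b": 0.55, "c": 0.45, "d": 0.05}
print(unstable_objects(soil_membership, 0.4, 0.6))
```

Objects that change class as the threshold moves are the natural candidates for a visual check or for an additional rule.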
These aren’t the only possible rules, nor are they the best set of
rules. However, they do illustrate the power eCognition users have to
identify relationships between classes and use them to their advantage.
Future Directions
CAST researchers are still working with eCognition to better understand
its capabilities and applications. One obvious direction is to
incorporate increasingly available Light Detection and Ranging (LiDAR)
elevation data into the segmentation process. Researchers have shown how
elevation data may be integrated into the segmentation process to more
effectively extract buildings in aerial or high-resolution satellite
images.
Another related application is to incorporate terrestrial LiDAR
point-cloud information, along with color digital images, into the
segmentation process. Figure 5 shows a portion of a digital image of a
building that has been co-registered with a 3-D point cloud collected
with an Optech ILRIS-3D laser scanner and segmented with eCognition.
Notice how individual bricks and concrete features are easily extracted
and identified. This kind of data analysis offers capabilities that
neither photogrammetry nor point-cloud analysis alone could provide.
References
Rosenfeld, A. and Kak, A. C. 1982. Digital Picture Processing, Volume 2.
Academic Press Inc.
Schenk, T. 2001. Digital Photogrammetry, Volume 1. TerraScience.