[ic] Donation: Geocode and proximity usertags (GIS functions)

Christopher Wenham cwenham at synesmedia.com
Wed Dec 17 21:47:15 EST 2003


 I recently had to develop some GIS-ish functions for a couple of 
projects. The following usertags are the result; I've released them 
under the GPL.

 Geocode is a wrapper for a third-party service at 
http://www.geocode.com/, which takes any street address in the 
US and converts it to Earth coordinates (latitude and longitude, as 
on a GPS).

 Proximity uses a table of geocoded street addresses to find 
stations within a search radius centered on a zip code or a pair 
of coordinates. For example: "Find all stores within 10 miles of zip 
code 11756". Since it searches by Earth coordinates, it will find 
stations that are not in the same zip code or state but still fall 
within the given radius. 

 With the two you can create a classic "store finder" quickly. 

 The third-party geocoding service isn't expensive, running at about 
$90 for 3,000 geocoding credits. To make zip-code searches work with 
the proximity tag you'll need a table of zip codes and their 
corresponding Earth coordinates. The US Census Bureau provides such a 
table for free on their web site (vintage 1990), but I've packaged a 
version ready for use with Interchange on my web site:
 http://www.synesmedia.com/twiki/bin/view/Cyc/Proximity


UserTag geocode Order address city state zip
UserTag geocode addAttr
UserTag geocode Documentation <<EOD

By Chris Wenham of Synesmedia, Inc. - www.synesmedia.com
This software is distributed under the terms of the GNU General Public License.
Version 1.0, December 15, 2003.

Geocode a US street address, using a third-party online service at 
http://www.geocode.com/ (Tele Atlas). Returns Earth coordinates in 
latitude and longitude. Useful for building 'store finder' utilities, 
and for saving you the chore of hitting the streets with a GPS receiver.

Installation

1) Go to http://www.geocode.com/ and sign up for their Eagle Test Drive.
You will get a username and password and should also get a few free
geocoding credits for testing with.

2) Go to their Download area and download the "EZ-Locate Sample Source 
Code v1.xx". When you compile this on your Linux/Unix machine it should 
produce an executable called "rie".

3) Add the following variables to your Interchange catalog's variable.txt:

GEOCODE_PATH	/path/to/rie	Geocoding
GEOCODE_LOGIN	your_login	Geocoding
GEOCODE_PASS	your_pass	Geocoding

4) Put geocode.tag in your Interchange's code/UserTag directory. Restart.

5) Pay the nice people at geocode.com for some more geocoding credits.

Synopsis

	<p>The Empire State Building is at lat./lng. [geocode
		address="350 5th Ave"
		city="New York"
		state="NY"
		zip="10118"
	] on your GPS unit</p>

    Will print:

	The Empire State Building is at lat./lng. 40.748318,-073.985223 on
	your GPS unit.

The tag returns a comma-separated latitude and longitude. You can parse
this and store it in separate lat/lng fields in your addresses table for
later use with the [proximity] tag. You can also pass it untouched
into the origin="" parameter of the proximity tag when doing a
radius search, like this:

[proximity origin="[geocode address='...' city='...' state='..' zip='..']"]
	<p>{ADDRESS}, {CITY}, {STATE} {ZIP}</p>
[/proximity]
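
If you do want to parse the result yourself, for example in a [perl]
block or another usertag, something along these lines should do. This is
only a sketch: split_coords is a name made up for the example and is not
part of the tag.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "lat,lng" string such as the one
# returned by [geocode] into two numeric values.
sub split_coords {
    my ($pair) = @_;
    return unless defined $pair;
    my ($lat, $lng) =
        $pair =~ /^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$/
        or return;
    # Adding 0 drops leading zeros such as the one in "-073.985223".
    return ($lat + 0, $lng + 0);
}

my ($lat, $lng) = split_coords('40.748318,-073.985223');
print "lat=$lat lng=$lng\n";
```

An empty return means the string did not look like a coordinate pair,
which is also what you get when the lookup itself returned nothing.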

At the time of writing, Tele Atlas provides about 3,000 geocoding credits
for $90, and the price per lookup drops if you buy more. However, since 
buildings don't generally pick up and move anywhere, it might make sense 
to cache the results or even store them permanently so you don't waste 
credits.
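
One way to avoid paying twice for the same address is a small cache in
front of the lookup. The sketch below is an assumption about how that
could look, not something the tag does: geocode_cached, cache_key and the
%cache hash are hypothetical, and the $lookup code reference stands in
for whatever actually calls the geocoder. A real catalog would more
likely keep the cache in a database table than in memory.

```perl
use strict;
use warnings;

my %cache;    # hypothetical cache: normalized address => "lat,lng"

# Build a cache key that ignores case and stray whitespace.
sub cache_key {
    my @parts = @_;
    for (@parts) {
        $_ = '' unless defined $_;
        s/^\s+|\s+$//g;
        s/\s+/ /g;
        $_ = lc $_;
    }
    return join '|', @parts;
}

# $lookup is a code reference that performs the real (paid) lookup.
sub geocode_cached {
    my ($lookup, @address) = @_;
    my $key = cache_key(@address);
    unless (exists $cache{$key}) {
        my $coords = $lookup->(@address);
        return unless defined $coords;    # do not cache failures
        $cache{$key} = $coords;
    }
    return $cache{$key};
}

# Stand-in lookup that counts how often it is called.
my $calls = 0;
my $fake_lookup = sub { $calls++; return '40.748318,-073.985223' };

geocode_cached($fake_lookup, '350 5th Ave', 'New York', 'NY', '10118');
geocode_cached($fake_lookup, '350  5th ave ', 'new york', 'NY', '10118');
print "lookups performed: $calls\n";
```

Both calls normalize to the same key, so only the first one reaches the
stand-in lookup.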

Side effects

If you don't feel like parsing the result, the latitude and longitude are
also set in the following temporary scratch variables.

	gc_latitude
	gc_longitude
	The latitude and longitude, in 15-digit precision. 

	gc_popdens
	Tele Atlas gets a lot of its data from the US Census Bureau, and 
	they've seen fit to include some bits and pieces in the results for 
	each address. I have no use for most of this, but the population 
	density rating looked interesting, so I decided to set it as a 
	side-effect variable just in case it became useful.
	The possible values are:

			U = Urban
			R = Rural
			Blank = unspecified


Future improvements

This tag should probably also parse and store the Postal Standardized
version of the address, plus the match type.

EOD
UserTag geocode Routine <<EOR
sub {
	my ($address,$city,$state,$zip,$opt) = @_;
	#Debug("Geocoder inputs: $address, $city, $state, $zip, $opt");
	my $cmd = $Variable->{GEOCODE_PATH};
	my @args;
	push @args, '-l';
	push @args, '-u';
	push @args, $Variable->{GEOCODE_LOGIN};
	push @args, '-p';
	push @args, $Variable->{GEOCODE_PASS};
	push @args, "-g $address|$city|$state|$zip";

	#Debug("Args: $cmd ". join(' ', @args));

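	my $out = '';
	# Fork a child and read its standard output. The list form of exec
	# bypasses the shell, so the address is never shell-interpreted.
	my $pid = open(GEO, '-|');
	return unless defined $pid;	# fork failed
	if (!$pid) {
		# Child: exec only returns on failure, so leave at once
		# rather than carry on as a copy of the server process.
		require POSIX;
		exec ($cmd, @args) or POSIX::_exit(1);
	}

	while (<GEO>) {
		$out .= $_
	}
	close(GEO);	# also reaps the child process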

	my ($count,$header,$data) = split /\n/,$out;
  	my @results = split /\|/,$data;
	if (@results) {
  		my ($lat,$lng,$pop) = ($results[7],$results[8],$results[21]);

		$Tag->tmp('gc_popdens',$pop);
		$Tag->tmp('gc_latitude',$lat);
		$Tag->tmp('gc_longitude',$lng);
		#Debug("About to return");
		return "$lat,$lng";
	} else {
		return;
	}
}
EOR


UserTag proximity Order zip
UserTag proximity addAttr
UserTag proximity hasEndTag
UserTag proximity Documentation <<EOD

By Chris Wenham of Synesmedia, Inc. - www.synesmedia.com
This software is distributed under the terms of the GNU General Public License.
Version 1.0, December 15, 2003.

Find locations within a search radius of an origin, specified either as
a zip code or a latitude/longitude pair. Format the results with a user
supplied template.

This tag and others can be found at
http://www.synesmedia.com/twiki/bin/view/Cyc/Proximity

Requirements

* Zip code lookup table (may have come with this tag, otherwise, check
  the US Census Bureau's web site to get a free one:
  http://www.census.gov/geo/www/gazetteer/places.html )

* A table with geocoded locations. (Suggestion: add numeric "lat" and 
  "lng" fields to any table of street addresses, then use a geocoding 
  service such as http://www.geocode.com/, or a handy GPS unit and a
  lot of walking, to find their Earth coordinates.)

Store and pass coordinates expressed in degrees with double precision 
(15 digits), where zero is the Equator for latitude and the Prime 
Meridian for longitude.

Synopsis

  [proximity zip="90210"]
	<p>{NAME} <br>{ADDRESS}, {CITY}, {STATE} {ZIP} is only {DISTANCE} miles from you.</p>
  [/proximity]

Arguments

  zip="90210"
	Any zip or postal code that will match up against an entry in your
	Zip lookup table. This tag will expect to find GPS coordinates in
	that table's "lat" and "lng" columns.
	If you provide a US zip code, this tag will first attempt a direct
	match. If it can't find the zip in your lookup table, it will try
	the neighboring zip codes, alternating upward and downward, until it
	finds one or runs out of attempts (max_zip_tries, default 10).
	If you provide a zip code, you don't need the origin argument
	below.

  origin="-73.2412348,140.33458734"
	GPS coordinates to use instead of looking up a Zip code. Specified
	as lat/lng. Accepts the same format as the output of the [geocode] 
	tag.

  max_radius="10"
	Maximum radius, in miles, to search within. Defaults to 10.

  min_radius="0"
	Minimum radius, in miles. Defaults to 0. 

  max_results="10"
	Maximum number of results to return. Defaults to 10.

  offset_results="0"
	Skip the first n results. This can be used with max_results to do
	basic pagination. It's up to you whether you want to do pagination
	based on min/max radius, or offsets.

	See the Side Effects section for scratch variables set by this tag
	with the last/least distant station's distance.

  query="SELECT * FROM addresses WHERE (lat < {LATHIGH} AND lat > {LATLOW}) AND (lng < {LNGHIGH} AND lng > {LNGLOW})"
	A SQL query that returns the list of geocoded addresses to test
	for proximity to the origin. If your database has no GIS features
	of its own (MySQL, for example), this query should use 
	the {LATHIGH}, {LATLOW}, {LNGHIGH} and {LNGLOW} placeholders, which
	are replaced with the coordinates of a bounding rectangle. All
	these do is restrict the search space for the "proper" radius search
	that comes later. The tag still works if you pass 
	"SELECT * FROM addresses", but performance will suffer on large 
	tables. The {ORIGLAT}, {ORIGLNG} and {MAX_RADIUS} placeholders are
	filled in as well. If you omit this argument and PROX_QUERY is not
	set, the built-in default query reads from a table called
	"affiliate".

  origin_query="SELECT * FROM zips WHERE zip = '{ZIP}'"
	The proximity tag assumes you have a table called "zips" with three
	columns: zip, lat, lng. If you pass a zip code to search from, then
	the tag will look for it here. If, however, you have a different 
	table name and structure, you can override the whole SQL query here.
	NOTE: If it doesn't find a match on the zip code that's initially
	passed, the tag will search up and down numerically until it does.
	Therefore, you need to keep the '{ZIP}' placeholder so the tag can
	substitute whatever zip code it's trying at the time. DO NOT try to
	pass the zip code this way. Use the zip="" parameter instead.

  lat_field="latitude"
  lng_field="longitude"
  origin_lat_field="zip_lat"
  origin_lng_field="zip_long"
	The proximity tag also assumes the column names for latitude and
	longitude are called "lat" and "lng" in your tables, respectively.
	If this isn't the case, set their actual names with these parameters.

  header="<p>The following were found within 10 miles of {CITY}, {STATE}</p>"
  footer="<p>Distances calculated from approximate center of {ZIP}</p>"
	An optional header and footer displayed only if results were found. 
	Any field returned by the origin_query can be addressed with 
	"{FIELDNAME}" template placeholders.
	
  noresults="<p>Sorry, there were no results for {ZIP} in {CITY}, {STATE}</p>"
	Optional template to display a message when there are no results.
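
To make the bounding-rectangle placeholders of the query argument more
concrete, here is a rough standalone sketch of the substitution. It is an
illustration under assumptions, not the tag's own code: bounding_box and
fill_query are hypothetical names, and 69 miles per degree of latitude is
an approximation. The sketch widens the longitude range by the cosine of
the latitude, since a degree of longitude gets shorter away from the
equator.

```perl
use strict;
use warnings;
use Math::Trig qw(deg2rad);

# Hypothetical helper: rough rectangle that contains a circle of
# $radius miles around ($lat, $lng), all angles in degrees.
sub bounding_box {
    my ($lat, $lng, $radius) = @_;
    my $miles_per_degree = 69;    # approximate length of one degree of latitude
    my $dlat = $radius / $miles_per_degree;
    # A degree of longitude shrinks with cos(latitude); the floor
    # avoids dividing by zero near the poles.
    my $shrink = cos(deg2rad($lat));
    $shrink = 0.01 if $shrink < 0.01;
    my $dlng = $dlat / $shrink;
    return (
        LATLOW => $lat - $dlat, LATHIGH => $lat + $dlat,
        LNGLOW => $lng - $dlng, LNGHIGH => $lng + $dlng,
    );
}

# Replace {NAME} placeholders in a query template, leaving unknown
# placeholders untouched.
sub fill_query {
    my ($template, %value) = @_;
    $template =~ s/\{(\w+)\}/exists $value{$1} ? $value{$1} : '{' . $1 . '}'/ge;
    return $template;
}

my %box = bounding_box(40.75, -73.99, 10);
print fill_query(
    'SELECT * FROM addresses WHERE lat > {LATLOW} AND lat < {LATHIGH}'
      . ' AND lng > {LNGLOW} AND lng < {LNGHIGH}',
    %box,
), "\n";
```

The rectangle only narrows the candidate rows; anything in its corners
still has to be rejected by the real distance test.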

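The "search up and down" behaviour described for the zip argument can be
pictured as a list of candidates to try in order. The sketch below only
illustrates that order; zip_candidates is a hypothetical name, and the
database lookup for each candidate is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: list the zip codes to try, starting with the
# requested one and alternating upward and downward from it.
sub zip_candidates {
    my ($zip, $max_tries) = @_;
    my @candidates;
    for (my $step = 0; @candidates < $max_tries && $step <= 99999; $step++) {
        for my $try ($step ? ($zip + $step, $zip - $step) : ($zip)) {
            next if $try < 0 || $try > 99999;    # outside the 5-digit range
            push @candidates, sprintf('%05d', $try)
                if @candidates < $max_tries;
        }
    }
    return @candidates;
}

# prints: 11756, 11757, 11755, 11758, 11754
print join(', ', zip_candidates('11756', 5)), "\n";
```

Keep in mind that numerically adjacent zip codes are usually, but not
always, geographically close.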

Optional configuration settings

The following variables can be set in your catalog's variable.txt

  PROX_QUERY
  PROX_ORIGIN_QUERY
  PROX_LAT_FIELD
  PROX_LNG_FIELD
  PROX_ORIGIN_LAT_FIELD
  PROX_ORIGIN_LNG_FIELD
	These all correspond to the parameters of the same name, described
	above. 

Side effects

This tag will set the following temporary scratch variables before exit.

  prox_furthest_dist
	The distance, in miles, to the furthest "station" that was found and
	included in the results. This may not be the furthest station in the
	database, just the furthest that still came within max_radius.

  prox_nearest_dist
	Like the above, but the distance to the nearest found station.

	NOTE: Both prox_furthest_dist and prox_nearest_dist are saved with
	20-digit precision. This is so you can pass them as arguments to
	min_/max_radius as accurate starting/ending points. You might want
	to format these numbers before you display them as user info. 

  prox_display_count
	The number of stations displayed in the results. Use this to find
	out if there were fewer than max_results stations found.


EOD
UserTag proximity Routine <<EOR
use Math::Trig;

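# Great-circle distance in statute miles between two points given in
# degrees, using the spherical law of cosines.
sub distance {
    my ($lat11,$long1,$lat21,$long2) = @_;
    my $lat1 = deg2rad($lat11);
    my $lat2 = deg2rad($lat21);
    my $deltalong = deg2rad($long1)-deg2rad($long2);
    my $cosine = sin($lat1) * sin($lat2) + cos($lat1) * cos($lat2) * cos($deltalong);
    # Rounding can push the value just outside [-1,1], where acos
    # would return a complex number.
    $cosine = 1 if $cosine > 1;
    $cosine = -1 if $cosine < -1;
    # 60 nautical miles per degree of arc, taken here as 1.1515
    # statute miles each.
    return sprintf "%05.15f", 1.1515*60*rad2deg(acos($cosine));
}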

sub {
	# Options gathering ----------------------------------------------
	my ($zipcode,$opt,$tpl) = @_;
    my $ref = ::database_exists_ref('zips');

	$opt->{origin_query} ||= $Variable->{PROX_ORIGIN_QUERY} || "SELECT * FROM zips WHERE zip = '{ZIP}'";
	$opt->{query} ||= $Variable->{PROX_QUERY} || "SELECT * FROM affiliate WHERE (lat < {LATHIGH} AND lat > {LATLOW}) AND (lng < {LNGHIGH} AND lng > {LNGLOW})";

    $opt->{max_radius} ||= 10;
	$opt->{max_results} ||= 10;
	$opt->{max_zip_tries} ||= 10;

	$opt->{lat_field} ||= $Variable->{PROX_LAT_FIELD} || 'lat';
	$opt->{lng_field} ||= $Variable->{PROX_LNG_FIELD} || 'lng';
	$opt->{origin_lat_field} ||= $Variable->{PROX_ORIGIN_LAT_FIELD} || 'lat';
	$opt->{origin_lng_field} ||= $Variable->{PROX_ORIGIN_LNG_FIELD} || 'lng';
	
	my $origin = {};
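	if ($opt->{origin}) {
		($origin->{lat}, $origin->{lng}) = split /\s*,\s*/,$opt->{origin};
		# Also store under the configured field names, which is
		# where the distance calculation looks for them.
		$origin->{$opt->{origin_lat_field}} = $origin->{lat};
		$origin->{$opt->{origin_lng_field}} = $origin->{lng};
	}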
	# ------------------------------------------------------------- ##

	# Find nearest zip-code centroid ---------------------------------
	if (!$origin->{lat}) {
      $zipcode =~ s/^\s+//;
      $zipcode =~ s/\s+$//;
      $zipcode = substr($zipcode,0,5);

      my $higherzip = my $lowerzip = my $tryzip = $zipcode;
      $higherzip = sprintf '%05d',++$higherzip;

	  # Find nearest recognized zip code
      for (my $i = 0; $i < $opt->{max_zip_tries}; $i++) {
		my $query = $opt->{origin_query};
		$query =~ s/{ZIP}/$tryzip/;
		$origin = $ref->query({ sql => $query, hashref => 'origin' });
        last if $origin->[0]->{zip};
        if ($tryzip eq $lowerzip) {
            $tryzip = $higherzip;
            $higherzip = sprintf '%05d',++$higherzip;
        }
        else {
            $lowerzip = $tryzip = sprintf '%05d',--$lowerzip;
        }
      }
	  $origin = $origin->[0];
	}
	# ------------------------------------------------------------- ##

	# Get a rough bounding rectangle. This is only so we can work with a
	# smaller slice of the database, and deliberately errs on being too 
	# big. It is _NOT_ how we select the final results.

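	# Read the origin from the configured columns, falling back to the
	# lat/lng keys.
	my $olat = $origin->{$opt->{origin_lat_field}};
	my $olng = $origin->{$opt->{origin_lng_field}};
	$olat = $origin->{lat} unless defined $olat;
	$olng = $origin->{lng} unless defined $olng;

	# 47.5 miles per degree is deliberately generous for latitude
	# (about 69 miles per degree). A degree of longitude shrinks with
	# the cosine of the latitude, so widen that range to match.
	my $latpad = $opt->{max_radius} / 47.5;
	my $shrink = cos(deg2rad($olat));
	$shrink = 0.01 if $shrink < 0.01;
	my $lngpad = $latpad / $shrink;
	my ($latlow,$lathigh) = ($olat - $latpad, $olat + $latpad);
	my ($lnglow,$lnghigh) = ($olng - $lngpad, $olng + $lngpad);

	my $query = $opt->{query};
	$query =~ s/{LATLOW}/$latlow/g;
	$query =~ s/{LATHIGH}/$lathigh/g;
	$query =~ s/{LNGLOW}/$lnglow/g;
	$query =~ s/{LNGHIGH}/$lnghigh/g;
	$query =~ s/{ORIGLAT}/$olat/g;
	$query =~ s/{ORIGLNG}/$olng/g;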
	$query =~ s/{MAX_RADIUS}/$opt->{max_radius}/g;

    my $stations = $ref->query({ sql => $query, hashref => 'stations' });
	# ------------------------------------------------------------- ##

	# Fill a hash with all addresses, keyed by distance --------------
    my %closest = ();
    my $n;
    foreach my $station (@{$stations}) {
        $n++;
        my $dist = &distance($station->{$opt->{lat_field}},$station->{$opt->{lng_field}},$origin->{$opt->{origin_lat_field}},$origin->{$opt->{origin_lng_field}});
		$station->{distance} = sprintf "%5.1f",$dist;
        my $dkey = sprintf "%03d",$n;
        $closest{"$dist$dkey"} = $station;
    }
	# ------------------------------------------------------------- ##

	# Select results -------------------------------------------------
	$n = 0;		# stations inside the radius band, including skipped ones
	my $shown = 0;	# stations actually written to the output
	my $offset = $opt->{offset_results} || 0;
	my ($nearest,$furthest) = ($opt->{max_radius},0);
	my $out;
    foreach my $dist (sort {$a <=> $b} keys %closest) {
	 next if $dist < ($opt->{min_radius} || 0);
     last if $dist > $opt->{max_radius};
	 last if $shown >= $opt->{max_results};
	 # Skip the first $offset stations, then show up to max_results.
	 if ($n >= $offset) { 
		 $out .= $Tag->uc_attr_list($closest{$dist}, $tpl);
		 $nearest = $dist if $dist < $nearest;
		 $furthest = $dist;
		 $shown++;
	 }
     $n++;
    }
	if (!$shown) {
	    $Tag->tmp('prox_furthest_dist',0);
		$Tag->tmp('prox_nearest_dist',0);
		$Tag->tmp('prox_display_count',0);
		if ($opt->{noresults}) {
			return $Tag->uc_attr_list($origin, $opt->{noresults}) 
		} else { 
			return; 
		}
	}
	$out = $Tag->uc_attr_list($origin, $opt->{header}) . $out if $opt->{header};
    $out .= $Tag->uc_attr_list($origin, $opt->{footer}) if $opt->{footer};
	# ------------------------------------------------------------- ##

	# Set side-effects -----------------------------------------------
	$Tag->tmp('prox_furthest_dist',$furthest);
	$Tag->tmp('prox_nearest_dist',$nearest);
	$Tag->tmp('prox_display_count',$shown);
	# ------------------------------------------------------------- ##

    return $out;
}
EOR
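
For anyone who wants to sanity-check distances outside Interchange, here
is a minimal standalone sketch of a great-circle calculation. It uses the
haversine formula rather than the law-of-cosines form in the tag's
distance routine, and it assumes a mean Earth radius of 3,959 miles, so
expect small differences from the tag's figures. The name haversine_miles
and the second pair of coordinates (roughly the Statue of Liberty) are
illustrative only.

```perl
use strict;
use warnings;
use Math::Trig qw(deg2rad);

# Great-circle distance in miles between two points given in degrees,
# using the haversine formula on a sphere.
sub haversine_miles {
    my ($lat1, $lng1, $lat2, $lng2) = @_;
    my $radius = 3959;    # assumed mean Earth radius in miles
    my ($p1, $p2) = (deg2rad($lat1), deg2rad($lat2));
    my $dphi    = $p2 - $p1;
    my $dlambda = deg2rad($lng2) - deg2rad($lng1);
    my $h = sin($dphi / 2) ** 2
          + cos($p1) * cos($p2) * sin($dlambda / 2) ** 2;
    $h = 1 if $h > 1;     # guard against rounding error
    return 2 * $radius * atan2(sqrt($h), sqrt(1 - $h));
}

printf "%.1f miles\n",
    haversine_miles(40.748318, -73.985223, 40.689247, -74.044502);
```

The haversine form behaves better than the arc-cosine form for points
that are very close together, which is the common case in a store finder.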


-- 
Chris Wenham - Synesmedia, Inc.
http://www.synesmedia.com
516-620-4110 / 1-888-255-7573
Fax: 516-908-7824

