map_blind (10) [Avatar] Offline
#1
ERROR: invalid regular expression: parentheses () not balanced
CONTEXT: PL/pgSQL function "location_extract" line 65 at assignment
PL/pgSQL function "normalize_address" line 142 at assignment
PL/pgSQL function "geocode" line 10 at assignment


Been having trouble interpreting this. Any help appreciated.

Thanks,
Chris
regina.leo (265) [Avatar] Offline
#2
Re: Tiiger Geocoding error
Chris,

What is the script you are running? Sorry can only tell from your below its somethng in chapter 10 tiger geocoder, but not sure exactly whether its the install or a sample code snippet from the chapter.

Thanks,
Leo and Regina
map_blind (10) [Avatar] Offline
#3
Re: Tiiger Geocoding error
It was running the batch geocode example from Chapter 10 against some 38000
addresses:

UPDATE hudson_deed_geocoded_3
-- 3 multicolumn update
SET (rating, norm_address, pt)
= (g.rating,
COALESCE ((g.addy).address::text, '')
|| COALESCE(' ' || (g.addy).predirabbrev, '')
|| COALESCE(' ' || (g.addy).streetname,'') || ' ' || (g.addy).streettypeabbrev
|| COALESCE(' ' || (g.addy).location || ', ', '')
|| COALESCE(' ' || (g.addy).stateabbrev, '') || COALESCE(' ' || (g.addy).zip, '')

,
ST_SnapToGrid(g.geomout, 0.000001)
)
-- 4 just one rec per addid
FROM (SELECT DISTINCT ON (addid) addid, (geocode(address)).*
FROM hudson_deed_geocoded_3 As ag
WHERE ag.rating IS NULL
ORDER BY addid, rating) As g
WHERE g.addid = hudson_deed_geocoded_3.addid;

This code worked before when passed Street #, Street, City, State, Zip. This led me to think it might be the data I was sending in so removed a couple of hundred
rows without Street# & Street (Location). None of the addresses had zip codes but I concatenated in a space so it wouldn't be taken as null..

Perhaps this is a more generalized Postgresql question but which/where line should I be looking to balance parens when it says "location_extract" line 65 at assignment? I took these to be line #'s from functrions in the geocode functions list, though the last "geocode" line 10 at assignment is also the 10th line in the snippet.

As I said, this snippet worked prior, so I was wondering how to interpret the error message.

Thanks again,
Chris
regina.leo (265) [Avatar] Offline
#4
Re: Tiiger Geocoding error
Chirsi,
We'll check this out to se if we see anything not quite right about the function. You are right it does sound like the location_extract function at fault.

I'm guessing the problem is one of your addresses has a reserved regular expression term like a ( or { or [ or etc] perhaps in a very unusual location and the function isn't escaping it properly so its being treated as part of a regular expression phrase and thus making an invalid non-balanced regular expression.

Which version of PostgreSQL are you running by the way?
regina.leo (265) [Avatar] Offline
#5
Re: Tiiger Geocoding error
oops sorry for misspelling your name. Keyboard stuck.
map_blind (10) [Avatar] Offline
#6
Re: Tiiger Geocoding error - embedded parentheses found
Regina.Leo,

Thanks for the nudge in the right direction.

Using this simple Select:

Select addid, deeds_id from hudson_deed_geocoded_3
where address like '%(%';

returned 4 instances:

addid deeds_id -- from data sent in as address
6955 19312 -- 1112 88TH ST (REAR),North Bergen, NJ
19234 73079 --COLES & 14TH ST.(VAC.),Jersey City, NJ
19308 73774 --123 BRUNSWICK ST. (INSD),Jersey City, NJ
35778 7576 -- 95 BURMA ROAD (REAR),Jersey City, NJ

None were found for '%{%' or '%[%'. '%\%' returned every row (which I found
disheartening), but escaped '%\%' returned none.

These post-location instances (Rear, Vac, Insd) may not have any meaning
in the context of addy but I'm hoping that removing the parens will allow the batch
to run.

I'm going to try to figure out an offset strategy as these queries take quite a while- ie just give me the first 5000 rows at this time . Actually, this is another question:
I have tried to Limit, pretty much by stuffing Limit # anywhere I can imagine in the batch geocode and get a syntax error. For the moment I'll just create some smaller tables.

The magician's apprentice is now properly cautioned about inspecting the data
prior to popping it into the black box.

Thanks again.

Chris
regina.leo (265) [Avatar] Offline
#7
Re: Tiiger Geocoding error - embedded parentheses found
Actually we were going to rewrite that query in our next update. Turns out doing it the way we did it with the inner ().* is very slow (pre 9.0) as I think it actually calls the function once for each field (so you might be able to get 5 times better speed by getting rid of the .*). Thanks for reminding us about that. Didn't realize we still had that in there.

Below is a description of the issue:

http://www.postgresonline.com/journal/archives/160-Output-parameters,-custom-data-type-gotchas.html
map_blind (10) [Avatar] Offline
#8
Re: Tiiger Geocoding error - embedded parentheses found
So, I took the bait and upgraded to 9.0.1 and postgis 2.0.0. Purely binaries as GEOS only builds reliably with gcc 4.5.0 and GDAL, ah well, GDAL, no one seems quite sure how that might build ....

And now have a consistent error in tiger geocoder batch script (still the same old slow one but thought it might scoot along faster in 9.0) i.e.


-- 4 just one rec per addid
FROM (SELECT DISTINCT ON (addid) addid, (geocode(address)).*
FROM hudson_deeds_geocoded_2 As ag
WHERE ag.rating IS NULL
ORDER BY addid, rating) As g
WHERE g.addid = hudson_deed2_geocoded_2.addid;

Results in:

ERROR: column ag.rating does not exist
LINE 20: WHERE ag.rating IS NULL
^

********** Error **********

ERROR: column ag.rating does not exist
SQL state: 42703
Character: 761

I sort of took ag as a temp table constructed within this batch code
and don't know where to look for it or test whether it is being constructed-- when is the next update by the way?

Chris
regina.leo (265) [Avatar] Offline
#9
Re: Tiiger Geocoding error - embedded parentheses found
Well as far as book goes -- not for at least another 2 weeks. But we'll try to get a source code update sooner than that.

ag is just an alias for your hudson_deeds_geocoded_2. Does that table have a rating coulmn? Sounds like its complaining because you have no rating column in your hudson_deeds_geocoded_2.

Hmm haven't tried tiger on the 9.0/PostGIS 2.0.0 yet. Guess we should. I can't think of why it wouldn't work.

For PostGIS 2.0 you don't need GDAL just yet unless you build --with-raster support and then its only used in the rtpostgis.dll anyway so shouldn't effect postgis-2...dll functionality.
regina.leo (265) [Avatar] Offline
#10
Re: Tiiger Geocoding error - embedded parentheses found
Chris,

Tested tiger geocoder in 9.0/PostGIS 2.0 and it gave some errors in loading the scripts. I suspect this may be just because PostgreSQL 9.0 plpgsql is stricter and also actually tries to test the innards of the function, so may have been an issue all along. It didn't like the variable verbose since that is a keyword.

We have a patched version - at http://www.postgis.us/downloads/tiger_geocoder_2009.zip

Short of it in utility
ullable_levenshtein.sql and geocode ate_attributes.sql

Not sure if those were cause of issues you were experiencing using 9.0.

Now regarding the other issue you have with the sql below -- I'll post a revised snippet of the code you were copying from that should be much faster (well I think should be at least twice as fast)
regina.leo (265) [Avatar] Offline
#11
Re: Tiiger Geocoding error
We've updated the code10.sql file -- if you just want that file you can download from

http://www.postgis.us/downloads/ch10_code_data.zip (this include the geocoder too that is packaged as a separate file.).

Short answer -- so you don't need to redownload -- we changed the below snipped to:
-- note the addr_to_geocode I think is equivalent to your hudson_deed_geocoded_3 -- again I think you may have just forgotten to add a rating field to that table.

-- Listing 10.3 Create dummy records to batch geocode
-- 1 create test data
CREATE TABLE addr_to_geocode(addid serial NOT NULL PRIMARY KEY,
rating integer,
address text,
norm_address text, pt geometry);
INSERT INTO addr_to_geocode(address)
VALUES ('1000 Huntington Street, DC'),
('4758 Reno Road, DC 20017'),
('1021 New Hampshare Avenue, Washington, DC 20010');
-- 2 batch geocoding
UPDATE addr_to_geocode
-- 3 multicolumn update
SET (rating, norm_address, pt)
= (g.rating,
COALESCE ((g.addy).address::text, '')
|| COALESCE(' ' || (g.addy).predirabbrev, '')
|| COALESCE(' ' || (g.addy).streetname,'')
|| ' ' || COALESCE(' ' || (g.addy).streettypeabbrev, '')
|| COALESCE(' ' || (g.addy).location || ', ', '')
|| COALESCE(' ' || (g.addy).stateabbrev, '')
|| COALESCE(' ' || (g.addy).zip, '')
,
ST_SnapToGrid(g.geomout, 0.000001)
)
-- 4 just one rec per addid
FROM (SELECT DISTINCT ON (addid) addid, (g1.geo).*
FROM (SELECT addid, (geocode(address)) As geo
FROM addr_to_geocode As ag
WHERE ag.rating IS NULL ) As g1
-- 5 pick lowest rating
ORDER BY addid, rating) As g
WHERE g.addid = addr_to_geocode.addid;

Let us now if you still run into trobules.

Have' had a chance to track down the regular expression issue you brought up
regina.leo (265) [Avatar] Offline
#12
Re: Tiiger Geocoding error - embedded parentheses found
Chris,

Sorry missed your question about the offset strategy. You see this line you had


FROM (SELECT DISTINCT ON (addid) addid, (geocode(address)).*
FROM hudson_deed_geocoded_3 As ag
WHERE ag.rating IS NULL
ORDER BY addid, rating) As g


-- you would change it to


FROM (SELECT DISTINCT ON (addid) addid, (g1.geo).*
FROM (SELECT addid, (geocode(address)) As geo
FROM (SELECT * FROM hudson_deed_geocoded_3
WHERE rating IS NULL ORDER BY addid LIMIT 500) As ag
WHERE ag.rating IS NULL ) As g1


I know the subselects are kind of ugly.

Hope that helps,
Regina
map_blind (10) [Avatar] Offline
#13
Re: Tiiger Geocoding error - embedded parentheses found
it took me a while to understand that 'verbose' meant something in the error - then I went and commented out the verbose part of the loop and the query ran...

I understood that 'verbose' was going to tell one why something wasn't running, i.e. you have to have a location & etc as opposed to failing silently.

That being said, several thousand different addresses returned the same point, so maybe there's more to that loop than I thought. I'll take a spin with the snippets below.