vyurik (76)
#1
Dear Author,
I've been running the IngestionSchemaManipulationApp but got this exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "state" among (OBJECTID HSISID NAME ADDRESS1 ADDRESS2 CITY STATE POSTALCODE PHONENUMBER RESTAURANTOPENDATE FACILITYTYPE PERMITID X Y GEOCODESTATUS, county);
at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:224)
at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:224)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.Dataset.resolve(Dataset.scala:223)
at org.apache.spark.sql.Dataset.col(Dataset.scala:1269)
at net.jgp.books.sparkWithJava.ch03.lab100IngestionSchemaManipulation.IngestionSchemaManipulationApp.start(IngestionSchemaManipulationApp.java:67)
at net.jgp.books.sparkWithJava.ch03.lab100IngestionSchemaManipulation.IngestionSchemaManipulationApp.main(IngestionSchemaManipulationApp.java:26)

Your comment is appreciated.
Vitaly
Jean Georges Perrin (15)
#2
Hi Vitaly,

I just tried, and this is the output I get:

*** Right after ingestion
+--------+-----------+--------------------+--------------------+--------+-----------+-----+----------+--------------+--------------------+-----------------+--------+------------+-----------+-------------+
|OBJECTID|     HSISID|                NAME|            ADDRESS1|ADDRESS2|       CITY|STATE|POSTALCODE|   PHONENUMBER|  RESTAURANTOPENDATE|     FACILITYTYPE|PERMITID|           X|          Y|GEOCODESTATUS|
+--------+-----------+--------------------+--------------------+--------+-----------+-----+----------+--------------+--------------------+-----------------+--------+------------+-----------+-------------+
|    1001|04092016024|                WABA|2502 1/2 HILLSBOR...|    null|    RALEIGH|   NC|     27607|(919) 833-1710|2011-10-18T00:00:...|       Restaurant|    6952|-78.66818477|35.78783803|            M|
|    1002|04092021693|  WALMART DELI #2247|2010 KILDAIRE FAR...|    null|       CARY|   NC|     27518|(919) 852-6651|2011-11-08T00:00:...|       Food Stand|    6953|-78.78211173|35.73717591|            M|
|    1003|04092017012|CAROLINA SUSHI &a...|5951-107 POYNER V...|    null|    RALEIGH|   NC|     27616|(919) 981-5835|2015-08-28T00:00:...|       Restaurant|    6961|-78.57030208|35.86511564|            M|
|    1004|04092030288|THE CORNER VENEZU...|    7500 RAMBLE WAY |    null|    RALEIGH|   NC|     27616|          null|2015-09-04T00:00:...|Mobile Food Units|    6962|  -78.537511|35.87630712|            M|
|    1005|04092015530|        SUBWAY #3726| 12233 CAPITAL BLVD |    null|WAKE FOREST|   NC|27587-6200|(919) 556-8266|2009-12-11T00:00:...|       Restaurant|    6972|-78.54097555|35.98087357|            M|
+--------+-----------+--------------------+--------------------+--------+-----------+-----+----------+--------------+--------------------+-----------------+--------+------------+-----------+-------------+
only showing top 5 rows

root
 |-- OBJECTID: string (nullable = true)
 |-- HSISID: string (nullable = true)
 |-- NAME: string (nullable = true)
 |-- ADDRESS1: string (nullable = true)
 |-- ADDRESS2: string (nullable = true)
 |-- CITY: string (nullable = true)
 |-- STATE: string (nullable = true)
 |-- POSTALCODE: string (nullable = true)
 |-- PHONENUMBER: string (nullable = true)
 |-- RESTAURANTOPENDATE: string (nullable = true)
 |-- FACILITYTYPE: string (nullable = true)
 |-- PERMITID: string (nullable = true)
 |-- X: string (nullable = true)
 |-- Y: string (nullable = true)
 |-- GEOCODESTATUS: string (nullable = true)

We have 3440 records.
*** Dataframe transformed
+-----------+--------------------+--------------------+--------+-----------+-----+----------+--------------+--------------------+-----------------+------------+-----------+------+-------------------+
|  datasetId|                name|            address1|address2|       city|state|       zip|           tel|           dateStart|             type|        geoX|       geoY|county|                 id|
+-----------+--------------------+--------------------+--------+-----------+-----+----------+--------------+--------------------+-----------------+------------+-----------+------+-------------------+
|04092016024|                WABA|2502 1/2 HILLSBOR...|    null|    RALEIGH|   NC|     27607|(919) 833-1710|2011-10-18T00:00:...|       Restaurant|-78.66818477|35.78783803|  Wake|NC_Wake_04092016024|
|04092021693|  WALMART DELI #2247|2010 KILDAIRE FAR...|    null|       CARY|   NC|     27518|(919) 852-6651|2011-11-08T00:00:...|       Food Stand|-78.78211173|35.73717591|  Wake|NC_Wake_04092021693|
|04092017012|CAROLINA SUSHI &a...|5951-107 POYNER V...|    null|    RALEIGH|   NC|     27616|(919) 981-5835|2015-08-28T00:00:...|       Restaurant|-78.57030208|35.86511564|  Wake|NC_Wake_04092017012|
|04092030288|THE CORNER VENEZU...|    7500 RAMBLE WAY |    null|    RALEIGH|   NC|     27616|          null|2015-09-04T00:00:...|Mobile Food Units|  -78.537511|35.87630712|  Wake|NC_Wake_04092030288|
|04092015530|        SUBWAY #3726| 12233 CAPITAL BLVD |    null|WAKE FOREST|   NC|27587-6200|(919) 556-8266|2009-12-11T00:00:...|       Restaurant|-78.54097555|35.98087357|  Wake|NC_Wake_04092015530|
+-----------+--------------------+--------------------+--------+-----------+-----+----------+--------------+--------------------+-----------------+------------+-----------+------+-------------------+
only showing top 5 rows

+---------------+-----------+-----+---------------+------+---------------+
|           name|       city|state|           type|county|             id|
+---------------+-----------+-----+---------------+------+---------------+
|           WABA|    RALEIGH|   NC|     Restaurant|  Wake|NC_Wake_0409...|
|WALMART DELI...|       CARY|   NC|     Food Stand|  Wake|NC_Wake_0409...|
|CAROLINA SUS...|    RALEIGH|   NC|     Restaurant|  Wake|NC_Wake_0409...|
|THE CORNER V...|    RALEIGH|   NC|Mobile Food ...|  Wake|NC_Wake_0409...|
|   SUBWAY #3726|WAKE FOREST|   NC|     Restaurant|  Wake|NC_Wake_0409...|
+---------------+-----------+-----+---------------+------+---------------+
only showing top 5 rows

root
 |-- datasetId: string (nullable = true)
 |-- name: string (nullable = true)
 |-- address1: string (nullable = true)
 |-- address2: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- zip: string (nullable = true)
 |-- tel: string (nullable = true)
 |-- dateStart: string (nullable = true)
 |-- type: string (nullable = true)
 |-- geoX: string (nullable = true)
 |-- geoY: string (nullable = true)
 |-- county: string (nullable = false)
 |-- id: string (nullable = true)

*** Looking at partitions
Partition count before repartition: 1
Partition count after repartition: 4


Have you modified anything? If so, can you share your source? (It's OK if you did!)

From the exception, it looks like you only have two columns:
  • "OBJECTID HSISID NAME ADDRESS1 ADDRESS2 CITY STATE POSTALCODE PHONENUMBER RESTAURANTOPENDATE FACILITYTYPE PERMITID X Y GEOCODESTATUS"
  • county

This could explain the issue, as you do not have a column named "state": the whole header line was read as a single column name.
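To illustrate the diagnosis, here is a minimal plain-Java sketch (no Spark needed) of how a header line that no longer contains the expected delimiter collapses into one column name. The class name HeaderCheck, its helper methods, and the shortened header lines are hypothetical, and the space-separated "damaged" header is only an assumption about what the edited file might have looked like. The name lookup ignores case, which mirrors Spark's default behavior when resolving column names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the book's code.
class HeaderCheck {

  // Splits a raw header line on a literal delimiter and trims each name.
  static List<String> parseHeader(String headerLine, String delimiter) {
    return Arrays.stream(headerLine.split(Pattern.quote(delimiter)))
        .map(String::trim)
        .collect(Collectors.toList());
  }

  // Case-insensitive lookup of a column name.
  static boolean hasColumn(List<String> columns, String name) {
    return columns.stream().anyMatch(c -> c.equalsIgnoreCase(name));
  }

  public static void main(String[] args) {
    // Shortened header, as in an untouched CSV file.
    String good = "OBJECTID,HSISID,NAME,CITY,STATE";
    // Assumed shape of a header after the file was re-saved without commas.
    String damaged = "OBJECTID HSISID NAME CITY STATE";

    for (String header : new String[] {good, damaged}) {
      List<String> columns = parseHeader(header, ",");
      System.out.println(columns.size() + " column(s), has \"state\": "
          + hasColumn(columns, "state"));
    }
    // Prints:
    // 5 column(s), has "state": true
    // 1 column(s), has "state": false
  }
}
```

In the Spark application itself, a similar check can be done right after ingestion by looking at the array returned by df.columns() before any column is renamed.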

jg
vyurik (76)
#3
Hi Jean Georges,
My apologies. I recall that I opened your CSV files to increase the width of some columns and most probably misformatted the columns in the process.
I reloaded these files from Git and everything worked fine.
Best regards,
Vitaly
Jean Georges Perrin (15)
#4
You know what? First, no need to apologize! Second, it makes me happy to get feedback and to see people experiment, play with the data and the examples, and learn! Don't hesitate to come back here!