Today I needed to pull some address data from a source to create a Google map. The catch: the only format the data came in was a three-column PDF.
As I toiled over the best way to parse this data, I remembered the good old days of RTF files, when life was so much simpler!
RTF turns out to be a prime candidate for getting PDF data into a usable format. Here's how I proceeded:
- Copy all the PDF data and paste it into Windows WordPad
- Save the file as an RTF
- Create a script in your programming language of choice
- Read the RTF into a variable
- Split the contents on the carriage-return character to get an array or list of lines
- Iterate over the list, parsing RTF codes from each line, and using them to identify the pieces of information you need
- Output the data
- In my case, I needed to geocode a bunch of addresses, so I copied and pasted the entire output into http://www.findlatitudeandlongitude.com/batch-geocode/
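Here is a minimal Python sketch of the read, split, parse and output steps above, under some stated assumptions. The post doesn't say which RTF codes identified each piece of information, so the rule used here is hypothetical: a bold line (the `\b` control word) starts a new record, such as a location name, and the plain lines after it are that record's address lines. The function names (`strip_rtf`, `parse_records`, `read_rtf`, `to_address_lines`) and the sample data are illustrative only, and the regex covers just a small subset of RTF rather than the full format.

```python
import re

# One regex that tokenizes the RTF constructs this sketch understands:
#   1. control words such as \par, \b, \b0, \fs22 (plus one delimiting space)
#   2. hex-escaped characters such as \'e9
#   3. escaped literals \\, \{ and \}
#   4. bare { and } group delimiters
# This is a simplified subset of RTF, not a full parser.
TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\'([0-9a-fA-F]{2})|\\([\\{}])|[{}]")


def strip_rtf(line):
    """Return (plain_text, control_words) for one line of RTF source."""
    found = []

    def replace(match):
        word, num, hex_code, literal = match.groups()
        if word:
            found.append(word + (num or ""))  # remember e.g. "b" or "b0"
            return ""
        if hex_code:
            # Assumption: hex escapes use the Windows ANSI (cp1252) code page.
            return bytes([int(hex_code, 16)]).decode("cp1252", errors="replace")
        if literal:
            return literal
        return ""  # drop bare group braces

    return TOKEN.sub(replace, line).strip(), found


def parse_records(rtf_source):
    """Group lines into records using a hypothetical rule: a line that
    switches bold on starts a new record, and the plain lines that follow
    it are that record's address lines."""
    records = []
    current = None
    # Split on carriage returns (also tolerating \r\n and bare \n endings).
    for raw in re.split(r"\r\n|\r|\n", rtf_source):
        text, words = strip_rtf(raw)
        if not text:
            continue  # skip blank lines and pure-formatting lines
        if "b" in words:  # "b" turns bold on; "b0" turns it off
            current = {"name": text, "address": []}
            records.append(current)
        elif current is not None:
            current["address"].append(text)
        # Lines before the first bold line (e.g. the font table) are ignored.
    return records


def read_rtf(path):
    """Read an RTF file without translating its line endings."""
    # newline="" keeps the \r characters so the split above still sees them;
    # latin-1 can decode any byte, and RTF text is normally 7-bit anyway.
    with open(path, encoding="latin-1", newline="") as f:
        return f.read()


def to_address_lines(records):
    """One address per line, e.g. for pasting into a batch geocoder."""
    return "\n".join(", ".join(r["address"]) for r in records)


if __name__ == "__main__":
    # Hypothetical sample loosely modeled on WordPad output (not real data).
    sample = (
        "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}}\r\n"
        "\\pard\\f0\\fs22\\b Central Library\\b0\\par\r\n"
        "100 Main St\\par\r\n"
        "Springfield, IL 62701\\par\r\n"
        "\\b Caf\\'e9 Branch\\b0\\par\r\n"
        "12 Oak Ave\\par\r\n"
        "}\r\n"
    )
    print(to_address_lines(parse_records(sample)))
```

On a real file, something like `print(to_address_lines(parse_records(read_rtf("addresses.rtf"))))` would do, where `addresses.rtf` is a placeholder name. Printing one address per line assumes that is what the batch geocoder accepts; the bold-starts-a-record rule is the part you would adapt to whatever codes actually mark the fields in your own RTF.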