PDF to Spreadsheet Pro (PSP)

Users Guide

Created July 14, 2010

Steve Hannah

Web Lite Solutions Corp.

Updated for Version 1.3

September 20, 2010


Table of Contents

Introduction

Specifications:

Installation

Performing your first PDF to Spreadsheet Conversion

Improving Row and Column Detection

Using Static Column Guides

Adding and Removing Column Guides Manually

Converting to Spreadsheet

How to upgrade from the Demo version to the Full Version

How to get a License Code

Copying to a different Computer

Reactivating your License Code

FAQ


Introduction

PDF to Spreadsheet Pro (PSP) is a utility for Mac OS X that converts your scanned PDFs into editable spreadsheets with the click of a button.  It uses advanced OCR (optical character recognition) technology to analyze your documents and determine where the column and row dividers are.

This guide will show you how to get started using PSP and how to obtain the best results for conversion.

Specifications:

Supported input file types:

Output file formats:

Requirements:

Installation

  1. Download the PSP Demo from the web site (http://solutions.weblite.ca/pdf-to-spreadsheet).
  2. Unzip the PDF To Spreadsheet-x.y.z.dmg.zip file, and mount the resulting disk image (the DMG file on your desktop).  This step may occur automatically, depending on your computer’s preferences.
  3. In the PDF To Spreadsheet disk image window, drag the PDF to Spreadsheet application icon into your Applications folder, then eject the PDF to Spreadsheet disk image.
  4. Optionally you may make a shortcut to the PDF to Spreadsheet application on your dock.  You can do this by opening your Applications folder, locating the PDF to Spreadsheet application, and dragging it onto your dock where you wish it to appear.

Performing your first PDF to Spreadsheet Conversion

  1. Open PDF to Spreadsheet Pro.

  2. Once the application is open, select “Open” from the “File” menu, and select your PDF file in the open dialog.



  3. After a few moments you should see your document (a contacts list in this example) in the main viewer of PSP.  It will have red grid lines over the page to indicate where PSP thinks the row and column guides are located.


  4. If the column and row guides that PSP shows you (marked with red lines) do not accurately mark the cells of the spreadsheet, refer to the next section “Improving Row and Column Detection” for tips on improving the results.  In addition you may want to check out the section “Adding and Removing Column Guides Manually” to learn how to add your own column guides.
  5. Click the “Convert to Spreadsheet” button in the top right of the window.  This will open a dialog that allows you to enter the page range that you wish to convert.  Keep “All” selected so that we can just convert the entire document.  Then press the start conversion button.




  6. The conversion will take a few minutes, but when it is done you will see a table displayed with the contents of your document.

  7. At this point you can either save the document as a .CSV file that can be opened in Excel, or you can copy and paste the contents directly into Excel.  For this example, let’s save the file.
  8. Select “Save” from the “File” menu, then enter “my-contacts.csv” for the file name, and click “Save”.

  9. Open Excel (or your preferred spreadsheet application), and open the file that we saved.  At this point you are able to edit the data on the spreadsheet or do whatever you like with it.

Improving Row and Column Detection

PDF to Spreadsheet Pro will automatically attempt to figure out where your column and row dividers reside.  It does this by analyzing the page for what looks like row and column dividers.  In some cases the default settings may fail to yield accurate results.  You can try to improve the results by selecting a region of the page on which the row and column calculation should be based.  

You can do this by pressing your mouse on the page and dragging a rectangular region.  You should see a blue translucent box form as you drag.  This represents the horizontal stripe in which columns will be detected, and the vertical stripe in which rows will be detected.  When you release the mouse button you’ll see a progress bar appear momentarily as the rows and columns are recalculated.

You can experiment with different regions to see what yields the best results for your document.  Please note that your region doesn’t need to cover the entire contents you want to convert.  You may find that selecting only a very small part of the page yields the best results.
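
This guide does not document the detection algorithm that PSP actually uses.  As a rough illustration of why a small region can work better than the whole page, here is a minimal sketch of one common approach (a projection profile), assuming the page is a grid of dark and blank pixels.  The function name, the region format, and the `min_gap` parameter are hypothetical and are not part of PSP.

```python
from typing import List, Tuple

Page = List[List[int]]  # 1 = dark (ink) pixel, 0 = blank pixel


def find_column_guides(page: Page, region: Tuple[int, int], min_gap: int = 2) -> List[int]:
    """Return x positions of candidate column guides.

    `region` is the (top, bottom) row range of the selected horizontal
    stripe; only those rows are examined (bottom is exclusive).
    """
    top, bottom = region
    width = len(page[0])
    # Count the dark pixels at each x position inside the stripe.
    ink = [sum(page[y][x] for y in range(top, bottom)) for x in range(width)]

    guides = []
    seen_ink = False   # ignore the blank margin before the first column
    gap_start = None
    for x, count in enumerate(ink):
        if count == 0:
            if gap_start is None:
                gap_start = x
        else:
            # A blank gap just ended; keep it if it is wide enough and
            # lies between two inked runs.
            if gap_start is not None and seen_ink and x - gap_start >= min_gap:
                guides.append((gap_start + x - 1) // 2)  # middle of the gap
            gap_start = None
            seen_ink = True
    return guides


page = [
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # a full-width rule hides the gaps
]
print(find_column_guides(page, (0, 2)))  # stripe without the rule: [3, 8]
print(find_column_guides(page, (0, 3)))  # whole page: []
```

In this toy page, the stripe that excludes the full-width rule exposes both column gaps, while the whole page exposes none.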

Please see http://solutions.weblite.ca/pdf-to-spreadsheet/video.php for a video tutorial on using the select region feature to improve your row and column calculation results.

Using Static Column Guides

As of version 1.3.1, “Static Column Guides” is the default setting for PDF to Spreadsheet Pro.  This means that when you calculate column guides for a particular page, those guides are applied to the entire document.  The alternative is to perform a separate calculation on each page of the PDF; however, this may result in different numbers of columns on different pages.

You can disable Static Column Guides by deselecting “Use Static Column Guides” in the “Settings” menu.

Note that row calculation is always performed independently on each page regardless of the “Static Column Guides” setting.
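
The difference between the two modes can be sketched as follows.  This is only an illustration of the behavior described above, not PSP's code; `guides_for_document`, `detect`, and `reference_page` are hypothetical names.

```python
from typing import Callable, Dict, List


def guides_for_document(
    page_count: int,
    detect: Callable[[int], List[int]],
    reference_page: int,
    static: bool = True,
) -> Dict[int, List[int]]:
    """Map each page number (1-based) to its column guide positions."""
    pages = range(1, page_count + 1)
    if static:
        # Static guides: calculate once, then reuse on every page.
        shared = detect(reference_page)
        return {p: list(shared) for p in pages}
    # Otherwise each page gets its own calculation, so the number of
    # columns may differ from page to page.
    return {p: detect(p) for p in pages}


def fake_detect(page_number: int) -> List[int]:
    # Stand-in for a real detector: page 2 is missing one divider.
    return [100] if page_number == 2 else [100, 200]


print(guides_for_document(3, fake_detect, reference_page=1))
print(guides_for_document(3, fake_detect, reference_page=1, static=False))
```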

Adding and Removing Column Guides Manually

If automatic column detection still leaves you with incorrect results, you can manually add and remove column guides by clicking on the appropriate positions in the top ruler.  Clicking on the position of an existing column guide removes that guide.

Clicking in an empty section of the ruler creates a new column guide at that position.

Note that “Static Column Guides” must be enabled in order for manual creation and deletion of column guides to be allowed.
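
The click behavior described in this section amounts to a toggle.  A minimal sketch, assuming guides are stored as x positions on the ruler; the `tolerance` parameter (how close a click must be to count as hitting an existing guide) is an assumption and not a documented PSP setting.

```python
from typing import List


def toggle_guide(guides: List[int], click_x: int, tolerance: int = 3) -> List[int]:
    """Return a new, sorted guide list after a click at click_x on the ruler."""
    for guide in guides:
        if abs(guide - click_x) <= tolerance:
            # The click landed on an existing guide: remove it.
            return [g for g in guides if g != guide]
    # The click landed on an empty part of the ruler: add a guide there.
    return sorted(guides + [click_x])


print(toggle_guide([100, 200], 150))  # [100, 150, 200]
print(toggle_guide([100, 200], 201))  # [100]
```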

Converting to Spreadsheet

Once you have set up your column dividers the way you like them, you can convert the document into a spreadsheet.  

Click the “Convert to Spreadsheet” button in the top right of the window.

This opens a dialog that allows you to select a page range and the following options for conversion:

  1. Use OCR - (Optical Character Recognition). This option is necessary for PDFs that were produced from images or a scanner, because the actual text is not embedded in the PDF file itself.  Do not check this box if your PDF was produced from a “print-to-PDF” option or from an application.  In that case OCR makes the conversion run far more slowly and may result in some text being extracted inaccurately (because it works by deciphering text that is contained in images).  For more information about this option, see the “Optical Character Recognition” section later in this manual.
  2. Remove Grid Lines - This option is only relevant if you also check the “Use OCR” box.  It is helpful for spreadsheets that actually include black grid lines between cells.  These lines tend to confuse the OCR engine and produce less accurate results than spreadsheets that don’t explicitly include grid lines.  Checking this box will cause PDF to Spreadsheet Pro to attempt to erase the grid lines before passing the images to the OCR engine for processing, which can yield much better results.
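
How PSP erases grid lines is not described in this guide.  One simple way to picture the idea is to blank out any unbroken run of dark pixels that is much longer than a character stroke.  The sketch below assumes a page stored as a grid of 0s and 1s; the function names and the `min_len` threshold are hypothetical.

```python
from typing import List

Image = List[List[int]]  # 1 = dark pixel, 0 = blank pixel


def erase_long_runs(line: List[int], min_len: int) -> List[int]:
    """Blank out every run of consecutive dark pixels of at least min_len."""
    out = list(line)
    start = None
    for i, value in enumerate(line + [0]):  # the sentinel closes a trailing run
        if value and start is None:
            start = i
        elif not value and start is not None:
            if i - start >= min_len:
                out[start:i] = [0] * (i - start)
            start = None
    return out


def remove_grid_lines(image: Image, min_len: int) -> Image:
    # Horizontal lines: long runs along each row.
    rows_cleaned = [erase_long_runs(row, min_len) for row in image]
    # Vertical lines: long runs along each column of the original image.
    cols_cleaned = [erase_long_runs(list(col), min_len) for col in zip(*image)]
    # A pixel survives only if neither pass erased it.
    return [
        [rows_cleaned[y][x] & cols_cleaned[x][y] for x in range(len(image[0]))]
        for y in range(len(image))
    ]


image = [
    [1, 1, 1, 1, 1],  # horizontal grid line
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],  # column 0 is a vertical grid line
]
for row in remove_grid_lines(image, min_len=4):
    print(row)
```

Only the short strokes (standing in for text) remain after the call; both grid lines are erased.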

Once you have chosen your conversion settings, click the “Convert” button in the lower corner of the dialog.  A progress indicator should appear and keep you informed of the conversion’s progress.  Upon completion, the progress box will disappear, leaving you with a tabular/spreadsheet version of the document based on your conversion settings.

At this point the content is still inside PDF to Spreadsheet Pro, so you can review the conversion to make sure that it meets your expectations.  If everything looks good you can export it into Microsoft Excel or your preferred spreadsheet program using one of the following methods:

  1. Select the entire spreadsheet using Command-A on your keyboard (i.e. Select All).  Then select “Copy” from the “Edit” menu to copy the entire sheet to the clipboard.  Next, open your spreadsheet program (e.g. MS Excel).  Finally, open a blank spreadsheet document and paste the contents into it.
  2. Save the spreadsheet as a CSV file by selecting “Save” from the “File” menu.
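
If you prefer to check a saved CSV file with a script before opening it in a spreadsheet program, something like the following may help.  It is a minimal sketch that assumes the file is comma-delimited and UTF-8 encoded (this guide does not state the delimiter or encoding PSP writes); `read_rows` and `ragged_rows` are hypothetical helper names.

```python
import csv
from collections import Counter
from typing import List


def read_rows(path: str) -> List[List[str]]:
    """Read a CSV file into a list of rows, each a list of cell strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


def ragged_rows(rows: List[List[str]]) -> List[int]:
    """Return the 1-based numbers of rows whose cell count differs from
    the most common cell count, a hint that a column guide was missed."""
    if not rows:
        return []
    expected = Counter(len(row) for row in rows).most_common(1)[0][0]
    return [n for n, row in enumerate(rows, start=1) if len(row) != expected]
```

For example, `ragged_rows(read_rows("my-contacts.csv"))` lists the rows worth a second look in the saved file.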

How to upgrade from the Demo version to the Full Version

In order to upgrade to the full version, all you need to do is enter a valid license code into the registration dialog, and your copy will be unlocked.

Instructions if you already have a License Code

These instructions assume that you already have a valid license code.  If you don’t have a license code yet, please proceed to the next section, “How to get a License Code”.

Steps:

  1. Open your demo copy of PDF to Spreadsheet Pro.
  2. Select “Register” from the “File” menu.  This will open a registration dialog with a field to enter your license code.
  3. Paste your license code into the “License code” field and click the “Register” button.
  4. If all goes well you’ll receive a message that the registration was successful and your copy will be unlocked so you can proceed as normal.

How to get a License Code

In order to upgrade your demo version of PDF to Spreadsheet Pro to the full version you can visit the website (http://solutions.weblite.ca/pdf-to-spreadsheet) and purchase a license key.  Once the transaction is complete, you will receive your valid license code instantly via email along with instructions on how to register your copy.

Copying to a different Computer

When you enter the license code to register your copy of PDF to Spreadsheet Pro, it is activated only for that one computer.  If you copy it to a different computer, or attempt to use it as a different user on the same computer, it will revert to the demo version.

Not to worry: reactivating your license code for a different computer is quite easy.

Reactivating your License Code

License codes are valid for 48 hours or 5 activations.  This means that from the time that you purchase a license code you will have 48 hours to enter it into your copy of PDF to Spreadsheet Pro.  Once the license code is entered, your copy of PDF to Spreadsheet Pro is registered for life on that computer.  However, if you copy PSP to a different computer you will need to re-register it, and if your license code has expired by then you will need a new one.

You can reactivate expired license codes on the website at http://solutions.weblite.ca/pdf-to-spreadsheet/reactivate.php.  This will send you a new license code which you can then use to register your copy of PDF to Spreadsheet Pro on a different computer.

FAQ

Question: How do I convert only specific pages of my document?

Answer: When you click “Convert to Spreadsheet”, a dialog box opens that allows you to specify the page range that you wish to convert.  The default selection is “All”, but you can specify a page range by selecting “From” and then entering your page range in the text fields.  For example, to convert only the first 2 pages of the document, enter “1” and “2” in the respective text fields.

Question: What does the “Straighten Image” button do?

Answer: Most scanned documents are not perfectly straight.  This can make it difficult to find the row and column dividers of the content on each page.  PDF to Spreadsheet Pro provides the straighten image function to try to rotate each page so that it is straight.  This generally yields much more accurate results, so this option is selected by default.  In the odd case that better results can be obtained without trying to straighten the page, you can deselect this option and run the conversion on the document directly.