Convert Table from PDF to Excel Using Power Query

Convert Table from PDF to Excel Using Power Query

The Power Query has proved to be a life-saver feature for many excel users. This feature helps to extract and import data from various sources to excel. To name a few, the power query feature allows you to extract data from another excel workbook or files, from websites, text/CSV and JSON file format, and many more. On this imported data, you can perform data transformation on it. In this tutorial, would learn how to copy, convert, and export the table from PDF to Excel using the Power Query Excel feature.

This amazing excel feature is inbuilt in Excel 2016 and other higher versions of Excel. However, if you are using older excel versions, then you can download and install an external add-in and get the power query feature in Excel 2010 (Professional Plus Version) and Excel 2013. Once installed, a new tab named ‘Power Query’ would get added to the Excel ribbon.

Let us now unlock the technique of converting table from PDF file to excel.

Convert Table from PDF to Excel [Step by Step Guide]

Below steps would help you to extract and convert the tables from PDF document to Excel:

  • Go to the Data tab > Get & Transform Data Group > Get Data > From File > From PDF (see image below)
Get Data From PDF Navigation
  • In the ‘Import Data‘ dialog box, search for your PDF file and click on the ‘Import‘ button.
  • As a result, excel would open the Navigator window showing the list of all the tables available in the PDF. The list would be placed within a folder (where Folder Name = PDF Name), as shown below.
Navigator Window with PDF Tables
  • Simply select the table to see the preview of the table in the ‘Preview’ window. Click on the ‘Load’ option to directly extract the PDF table to Excel.
Load Button Power Query
  • As a result, you would notice that Excel creates a new worksheet and places the selected table there. It also creates a query named ‘Table010 (Page 16)’. This shows that the extracted PDF table is on the 16th page of the PDF and it is the tenth table over there. Also, it has loaded 22 rows of data to Excel.
PDF Table Loaded to Excel - Power Query

Now, let us suppose that there are some changes in the table in the original PDF file and you want to have these changes in Excel.

Fortunately, to get the updated table from PDF to excel, you need not perform all the above steps again. You simply have to refresh the query and that’s it. On refresh, the power query would automatically get the updated data from PDF File to the Excel table.

One of the most popular ways is to right-click on the table and click the Refresh option (see image below).

Right-Click and Refresh Power Query

There are many other methods of refreshing Excel power query.

Now, let us see what did power query do in background while you were performing above steps?

Power Query Editor – What Happens in Background

While fetching the table from the PDF file to Excel, the power query excel tool performs some transformation steps on the imported data. There are-

  1. Extraction of data from the source PDF.
  2. Selection of the table to import.
  3. Changing the data types of the table header.

You can find all these steps in the ‘Power Query Editor‘ window. To get the ‘Power Query Editor’ window, click anywhere on the data table, and navigate to Query tab > Edit group > Edit option.

Navigation to Power Query Editor Window

On the right side of this window, under the ‘Query Settings‘, you would find the name of the query as ‘Table010 (Page 16)‘ and the default applied steps.

Applied Steps on PDF Table

To change the power query name, simply type a nice name in the ‘Name’ section and press Enter.

Power Query Name Change

Using the numerous options available on this window, you can perform transformations to your data. As you start performing power query transformation, it records those steps in sequence and places them one below the other in the ‘Applied Steps’ section.

Now, when the source PDF table data gets changed, you can refresh your power query and it would perform all the Applied Steps in sequence, one after the other, to get updated data from PDF to Excel.

Let us take one example to illustrate how power query transformation works.

Convert Table from PDF to Excel Using Power Query

Example of Power Query Transformation

Let us make the following transformation to the imported table from PDF:

  1. Remove columns for 31st March 2020.
  2. Remove the first row having the text – ‘Quarter-End’ and ‘Year-End’.
  3. Make the first row as the header row.

Firstly, open the ‘Power Query Editor’ window using below path.

Navigation to Power Query Editor Window

Remove Columns in Power Query

To remove the columns for 31st March 2020, select the columns by clicking on the column headers (press Ctrl keyboard key to select multiple columns). Now, go to Home tab > Manage Column group > Remove Columns > Remove Columns option.

Remove Columns Navigation Power Query

Another way is to select all columns other than those that you want to delete and then, use the option ‘Remove Other Columns‘.

Remove Columns Navigation Power Query 2

As a result, excel would remove column 3 and 5 from the table.

Remove First Row in Power Query

To delete the first row, go to Home tab > Reduce Rows group > Remove Rows > Remove Top Rows.

Remove Top Row Excel Pivot Table Navigation

In the ‘Remove Top Rows‘ dialog box that appears, type the count of the rows that you want to remove from the top. In our case, as we want to remove only the first header row, type ‘1’, and click on the OK button. See the image below:

Remove Top Rows dialog box

This would instantly remove the first row (top 1 row) from the table.

Convert First Row of Data As Header Row

To make or convert the first row of data as header row in power query, simply follow this navigation path – Go to Transform Tab > Table group > Use First Row As Headers > Use First Row As Headers option.

Use First Row As Header Excel

This would take the first row of the data as a header of the data. The transformed data would look like this:

Power Query After Transformation Steps

Now, check the ‘Applied Steps’ section. You would notice that excel has added three new transformation steps to it. These steps are nothing but the transformations that have carried out.

New Applied Steps after Transformation

Now, let us re-import the transformed data back to excel. To load the data back from power query to excel, navigate to Home tab > Close & Load.

Power Query Editor - Close & Load Icon

As a result, your PDF table is back to Excel.

Next time when the source PDF table data gets changed, you simply have to refresh your power query and it would perform all the Applied Steps in sequence, one after the other, and import the updated data from PDF to Excel.

This is what I call as the beauty of power query in excel !! Don’t you also accept this point !!

With this, we have completed the topic on how to convert or export table/tables from PDF to Excel and also gone through some of the additional transformations that you can perform on the imported data. Your views, thoughts, and review in the comment section below are of paramount importance to us.

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.