The Power Query has proved to be a life-saver feature for many excel users. This feature helps to extract and import data from various sources to excel. To name a few, the power query feature allows you to extract data from another excel workbook or files, from websites, text/CSV and JSON file format, and many more. On this imported data, you can perform data transformation on it. In this tutorial, would learn how to copy, convert, and export the table from PDF to Excel using the Power Query Excel feature.
This amazing excel feature is inbuilt in Excel 2016 and other higher versions of Excel. However, if you are using older excel versions, then you can download and install an external add-in and get the power query feature in Excel 2010 (Professional Plus Version) and Excel 2013. Once installed, a new tab named ‘Power Query’ would get added to the Excel ribbon.
Let us now unlock the technique of converting table from PDF file to excel.
Convert Table from PDF to Excel [Step by Step Guide]
Below steps would help you to extract and convert the tables from PDF document to Excel:
- Go to the Data tab > Get & Transform Data Group > Get Data > From File > From PDF (see image below)
- In the ‘Import Data‘ dialog box, search for your PDF file and click on the ‘Import‘ button.
- As a result, excel would open the Navigator window showing the list of all the tables available in the PDF. The list would be placed within a folder (where Folder Name = PDF Name), as shown below.
- Simply select the table to see the preview of the table in the ‘Preview’ window. Click on the ‘Load’ option to directly extract the PDF table to Excel.
- As a result, you would notice that Excel creates a new worksheet and places the selected table there. It also creates a query named ‘Table010 (Page 16)’. This shows that the extracted PDF table is on the 16th page of the PDF and it is the tenth table over there. Also, it has loaded 22 rows of data to Excel.
Now, let us suppose that there are some changes in the table in the original PDF file and you want to have these changes in Excel.
Fortunately, to get the updated table from PDF to excel, you need not perform all the above steps again. You simply have to refresh the query and that’s it. On refresh, the power query would automatically get the updated data from PDF File to the Excel table.
One of the most popular ways is to right-click on the table and click the Refresh option (see image below).
There are many other methods of refreshing Excel power query.
Now, let us see what did power query do in background while you were performing above steps?
Power Query Editor – What Happens in Background
While fetching the table from the PDF file to Excel, the power query excel tool performs some transformation steps on the imported data. There are-
- Extraction of data from the source PDF.
- Selection of the table to import.
- Changing the data types of the table header.
You can find all these steps in the ‘Power Query Editor‘ window. To get the ‘Power Query Editor’ window, click anywhere on the data table, and navigate to Query tab > Edit group > Edit option.
On the right side of this window, under the ‘Query Settings‘, you would find the name of the query as ‘Table010 (Page 16)‘ and the default applied steps.
To change the power query name, simply type a nice name in the ‘Name’ section and press Enter.
Using the numerous options available on this window, you can perform transformations to your data. As you start performing power query transformation, it records those steps in sequence and places them one below the other in the ‘Applied Steps’ section.
Now, when the source PDF table data gets changed, you can refresh your power query and it would perform all the Applied Steps in sequence, one after the other, to get updated data from PDF to Excel.
Let us take one example to illustrate how power query transformation works.
Example of Power Query Transformation
Let us make the following transformation to the imported table from PDF:
- Remove columns for 31st March 2020.
- Remove the first row having the text – ‘Quarter-End’ and ‘Year-End’.
- Make the first row as the header row.
Firstly, open the ‘Power Query Editor’ window using below path.
Remove Columns in Power Query
To remove the columns for 31st March 2020, select the columns by clicking on the column headers (press Ctrl keyboard key to select multiple columns). Now, go to Home tab > Manage Column group > Remove Columns > Remove Columns option.
Another way is to select all columns other than those that you want to delete and then, use the option ‘Remove Other Columns‘.
As a result, excel would remove column 3 and 5 from the table.
Remove First Row in Power Query
To delete the first row, go to Home tab > Reduce Rows group > Remove Rows > Remove Top Rows.
In the ‘Remove Top Rows‘ dialog box that appears, type the count of the rows that you want to remove from the top. In our case, as we want to remove only the first header row, type ‘1’, and click on the OK button. See the image below:
This would instantly remove the first row (top 1 row) from the table.
Convert First Row of Data As Header Row
To make or convert the first row of data as header row in power query, simply follow this navigation path – Go to Transform Tab > Table group > Use First Row As Headers > Use First Row As Headers option.
This would take the first row of the data as a header of the data. The transformed data would look like this:
Now, check the ‘Applied Steps’ section. You would notice that excel has added three new transformation steps to it. These steps are nothing but the transformations that have carried out.
Now, let us re-import the transformed data back to excel. To load the data back from power query to excel, navigate to Home tab > Close & Load.
As a result, your PDF table is back to Excel.
Next time when the source PDF table data gets changed, you simply have to refresh your power query and it would perform all the Applied Steps in sequence, one after the other, and import the updated data from PDF to Excel.
This is what I call as the beauty of power query in excel !! Don’t you also accept this point !!
With this, we have completed the topic on how to convert or export table/tables from PDF to Excel and also gone through some of the additional transformations that you can perform on the imported data. Your views, thoughts, and review in the comment section below are of paramount importance to us.