VB script code to allow auto update of excel filter lists

June 20, 2020 Reading time: ~1 minute

Private Sub Worksheet_Change(ByVal Target As Range)

ActiveSheet.AutoFilter.ApplyFilter

End Sub


Catchup data processing procedure for students

June 14, 2020 Reading time: 5 minutes

  1. Obtain all students enrollment reports in print to pdf files (not printed and re-scanned, if this happens it won’t work)

  2. Place all pdf files in one directoru called Enrolment_Reports

  3. Copy file pdf_2_csv.pl to same directory

  4. run python pdf_2_csv.py

  5. All files should now be in csv format

  6. Check all files to make sure the commas are all in the right places. Sometimes the pdf table is not correct and this causes errors in the csv. It is better to fix any problems now than later.

  7. If needed run sed -i ‘s/matching_pattern/replacement_pattern/g’ *.csv for any issues with the files.

  8. Example of no result is:
    AURTTC001,Inspect and service cooling systems,,,,10

    Example of a result is:
    AURETR029,Diagnose and repair charging systems,12/09/2019,1/04/2020,Competent,30

  1. run ls *.csv | grep -Po '.*(?=\.)' > student_list.csv

  2. Open student_list.csv and do a search/replace for newline and replace it with a comma, save the file.

  3. Copy file csv_2_xls.pl to directory with csv files

  4. run perl csv_2_xls.pl catch_ups.xlsx *.csv

  5. Now there should be an excel file in the directory

  6. Open the excel file with libreoffice and add a sheet to the front and call it summary

  7. Go to the first student sheet and copy and paste all the units of competency in to column A, starting at A3

  8. Place Student Name in cell A1

  9. go to cell B1 and type = in the cell, then navigate to student_list tab and select the first name and then press enter.

  10. Select the square in the bottom right hand corner of cell B1 when it is highlighted and drag it to the number of students required. When you run out of students, the number 0 will be present in the cell.

  11. Go back to B3 and paste the following formula in to the cell

    =IF(VLOOKUP($A3,INDIRECT(B$1&".A7:F42"),5,0)="","YTC",VLOOKUP($A3,INDIRECT(B$1&".A7:F42"),5,0))

  12. Press enter and you should have a result in the cell. YTC will appear when there is no result recorded

  13. As done with the names, grab the square in the highlighted cell with the result and drag horizontal to the last name and then vertically to last row.

  14. If all the sheets are correct, you should have a full results table. If there are #N/A then you have a problem on the student sheet. You need to fix by cutting and pasting. Note: only column 1 and 5 are used from the student sheet. The other columns may be blank if not needed.


Excel spreadsheets for student catchups

June 14, 2020 Reading time: ~1 minute

Main summary sheet formula

Libreoffice

=IF(VLOOKUP($A3,INDIRECT(B$1&".A1:F42"),5,0)="","YTC",VLOOKUP($A3,INDIRECT(B$1&".A1:F42"),5,0))

Excel 365

=IF(VLOOKUP($A3,INDIRECT("'"&B$1&"'!"&"A7:F42"),5,0)="","YTC",VLOOKUP($A3,INDIRECT("'"&B$1&"'!"&"A7:F42"),5,0))


Combine CSV files to one spreadsheet

June 13, 2020 Reading time: 11 minutes

Link --> https://askubuntu.com/questions/1080394/how-to-combine-multiple-csv-files-to-one-ods-file-on-the-command-line-one-sheet

perl code:

With perl:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
# CPAN modules required:
use Spreadsheet::Write;
use Text::CSV;

my $xlsx_file = shift @ARGV;
$xlsx_file .= ".xlsx" unless $xlsx_file =~ /\.xlsx$/;
my $xlsx = Spreadsheet::Write->new(file => $xlsx_file);
my $csv = Text::CSV->new({binary => 1});

for my $csv_file (@ARGV) {
    my @rows = ();
    open my $fh, "<:encoding(utf8)", $csv_file;
    while (my $row = $csv->getline($fh)) {
        push @rows, $row;
    }
    $csv->eof or $csv->error_diag();
    close $fh;  

    (my $sheet_name = $csv_file) =~ s/\.[^.]+$//;   # strip extension
    $xlsx->addsheet($sheet_name);
    $xlsx->addrows(@rows);
}
$xlsx->close();

And use it like:

/path/to/create_xlsx.pl file.xlsx *.csv

If perl is not your thing, a bit of googling reveals:


Converting PDF to CSV using tabula-py

June 12, 2020 Reading time: 9 minutes

source: http://theautomatic.net/2019/05/24/3-ways-to-scrape-tables-from-pdfs-with-python/

tabula-py

tabula-py is a very nice package that allows you to both scrape PDFs, as well as convert PDFs directly into CSV files. tabula-py can be installed using pip:

1
pip install tabula-py

If you have issues with installation, check this. Once installed, tabula-py is straightforward to use. Below we use it scrape all the tables from a paper on classification regarding the Iris dataset (available here).

1
2
3
4
5
import tabula
 
 
tables = tabula.read_pdf(file, pages = "all", multiple_tables = True)

The result stored into tables is a list of data frames which correspond to all the tables found in the PDF file. To search for all the tables in a file you have to specify the parameters page = “all” and multiple_tables = True.

You can also use tabula-py to convert a PDF file directly into a CSV. The first line below will find the first table in the PDF and output it to a CSV. If we add the parameter all = True, we can write all of the PDF’s tables to the CSV.

1
2
3
4
5
# output just the first table in the PDF to a CSV
tabula.convert_into(file, "iris_first_table.csv")
 
# output all the tables in the PDF to a CSV
tabula.convert_into(file, "iris_all.csv", all = True)

tabula-py can also scrape all of the PDFs in a directory in just one line of code, and drop the tables from each into CSV files.

1
tabula.convert_into_by_batch("/path/to/files", output_format = "csv", pages = "all")

We can perform the same operation, except drop the files out to JSON instead, like below.

1
tabula.convert_into_by_batch("/path/to/files", output_format = "json", pages = "all")


LazyCoderOZ

I am a Linux guy, been around for 20+ years using Linux as my daily driver.
This is my blog on my discoveries and notes so I don't forget how I have done things :)