PowerShell: Split a large CSV and process in multiple Jobs

I had a large CSV of data to be loaded in through the FIM Service. Single threading this operation could have taken (literally) days, so I decided to have a go at multi-threading it, and here’s the skeleton script.

$csv = @(Import-Csv <my file>)  ## @() guarantees an array even when the file has a single row
$JobRows = 1000                 ## The number of CSV rows to process in each Job
$NumJobs = [math]::Ceiling($csv.Count / $JobRows)

for ($i = 0; $i -lt $NumJobs; $i++)
{
    [int]$StartRow = $i * $JobRows
    [int]$EndRow = ($i + 1) * $JobRows - 1  ## indexes past the end of the array are silently dropped, so the last chunk is safe
    Write-Host ("Rows {0} to {1}" -f $StartRow, $EndRow)

    Start-Job -ArgumentList (,$csv[$StartRow..$EndRow]) -ScriptBlock {
        PARAM ($CSVRows)

        foreach ($row in $CSVRows)
        {
            <... do something ...>
        }
    }
}

A couple of points when using this with the FIM or MIM service:

  • The FIMAutomation snapin and any function scripts you want to use have to be loaded inside the Start-Job script block, after the “PARAM” line and before the “foreach” (see the sketch after this list).
  • It’s hard to see what is going on inside each Job. Writing out a log file inside the script block does slow it down but helps with keeping track of progress. I used a timestamped file name to save having to pass extra parameters into the script block:
    • $LogFile = "C:\Temp\{0}.txt" -f (Get-Date -Format "s").Replace("-","").Replace(":","").Replace(" ","")
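
Putting both of those points together, the inside of each Job might look something like the sketch below. This is only an outline: Process-Row, the dot-sourced script path and the ID column are hypothetical stand-ins for your own functions and data.

Start-Job -ArgumentList (,$csv[$StartRow..$EndRow]) -ScriptBlock {
    PARAM ($CSVRows)

    ## Snapins and function scripts must be loaded here, inside the Job
    Add-PSSnapin FIMAutomation
    . C:\Scripts\MyFunctions.ps1   ## hypothetical helper script, dot-sourced

    ## Timestamped log file, one per Job
    $LogFile = "C:\Temp\{0}.txt" -f (Get-Date -Format "s").Replace("-","").Replace(":","")

    foreach ($row in $CSVRows)
    {
        Process-Row $row                                  ## hypothetical per-row work
        Add-Content $LogFile ("Processed {0}" -f $row.ID) ## assumes the CSV has an ID column
    }
}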

And some general notes:

  • It was necessary to pass the array as “(,$csv[$StartRow..$EndRow])”, with the leading “(,” and closing “)”. The -ArgumentList parameter is typed [object[]], so a bare array gets unrolled into separate arguments and only the first row reaches $CSVRows; the unary comma wraps the chunk in a one-element array, forcing the whole chunk through to the script block as a single array parameter.
  • You can check the status of the Jobs using Get-Job, as shown below.
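
For example, to check on progress and then gather everything up once the Jobs have finished:

Get-Job                             ## status of each Job (Running, Completed, Failed)
Get-Job | Wait-Job | Receive-Job    ## block until all Jobs finish, then collect their output
Get-Job | Remove-Job                ## clean up the finished Jobs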
