My work often deals with productionalizing machine learning, especially around Python and R. Integrating ML into SQL Server production environments is of particular interest, and in this post I’ll detail how one can run a Python or R script as a step in a SQL Server Agent job.
As data scientists work with both ETL and statistical tools, the interface between these is an important and (often neglected) area of study.
Let’s say you’ve run an SSIS package or stored procedure as part of an SQL Agent (ETL) job, and want to do some data mining or predictive work as a subsequent step. Here’s how:
First, set up the SQL Agent job to run a PowerShell script (that’s on the Desktop) as a step with something like the following as the command (this is for both R/Python):
cd "C:\Users\levi.thatcher\Desktop"
PowerShell ./rtest.ps1
For the R scenario, the PowerShell could contain this:
$R = "C:\Program Files\R\R‐3.2.2\bin\i386\Rscript.exe"
& $R 'test.R'
and the test.R script (also on the Desktop) could contain this:
setwd("C:/Users/levi.thatcher/Desktop")
fileConn<‐file("Routput.txt")
writeLines(c("Hello","World"), fileConn)
close(fileConn)
Alternatively, for the Python scenario, the PowerShell could contain this:
$python = "C:\Python27\python.exe"
& $python 'test.py'
and the test.py script (also on the Desktop) could contain this:
f = open('Pyoutput.txt','w')
f.write('hello world\n')
f.close()
While simple, this is an effective path that combines several powerful tools. Note that one has to give the NT Service\SQLSERVERAGENT user the appropriate execute permissions on the corresponding folders.
As always, enjoy your infra!