R In Scala with jvmr package
You haven’t got Spark MLlib set up and you happen to need to run a quick logistic regression in scala, then you should go to ScalaNLP for Breeze and Nak. But if you just need to run a really quick one and you happen to have R installed.
This is what you can do thanks to the jvmr package, a wrapup of the rJava package:
import org.ddahl.jvmr.RInScala
import breeze.linalg._
import breeze.stats.distributions._
import scala.util._
val R = RInScala()
val gau = Gaussian(0.0,1.0)
val x = new DenseMatrix(100,2,gau.sample(200).toArray) /**Generate two X predictors in Scala*/
val y = Seq.fill(100)(Random.nextFloat).map(_<0.5).toArray /**Generate vector Y in Scala*/
R.y = y map {case true => 1 case _ => 0} /**Send Y to R, has to be as integer rather than t/f*/
R.x1 = x(::,1).toArray /**Send X as Vector to R, denseMatrix from breeze lib not supported in jvmr*/
R.x2 = x(::,0).toArray /**same as above*/
R> """res = glm(y~x1+x2, family=binomial(link=probit))$coefficients""" /**Logistic regressin in R*/
R> """library("ggplot2", lib.loc="C:/R/R-2.15.2/library")"""
R> """qplot(glm(y~x1+x2, family=binomial(link=probit))$residuals)""" /**Plot the residue in R*/
val res = R.res /**Get the coefficients back to Scala*/
This is the plot, which is meaningless anyway:
And some good blogs about doing statistics with scala:
Brief introduction to Scala and Breeze for statistical computing
A frequentist approach to probability