If you haven't got Spark MLlib set up and need to run a quick logistic regression in Scala, ScalaNLP's Breeze and Nak are the place to go. But if you just need a really quick one and happen to have R installed, there is a shortcut.

Thanks to the jvmr package, a wrapper around the rJava package, this is all it takes:

```scala
import org.ddahl.jvmr.RInScala
import breeze.linalg._
import breeze.stats.distributions._
import scala.util.Random

val R = RInScala()  // start an embedded R session

val gau = Gaussian(0.0, 1.0)
val x = new DenseMatrix(100, 2, gau.sample(200).toArray)        // two standard-normal predictors, 100 rows, generated in Scala
val y = Seq.fill(100)(Random.nextFloat()).map(_ < 0.5).toArray  // random Boolean response, generated in Scala

R.y  = y.map(b => if (b) 1 else 0)  // send y to R; it has to be 0/1 integers rather than TRUE/FALSE
R.x1 = x(::, 0).toArray  // send each column as a plain array: Breeze's DenseMatrix is not supported by jvmr
R.x2 = x(::, 1).toArray

R> """fit = glm(y ~ x1 + x2, family = binomial(link = "logit"))"""  // logistic regression in R
R> """res = fit$coefficients"""
R> """library("ggplot2", lib.loc = "C:/R/R-2.15.2/library")"""  // lib.loc is the author's local R library; drop it if ggplot2 is on the default path
R> """print(qplot(fit$residuals))"""  // plot the residuals; print() renders the ggplot as an interactive prompt would

val res = R.res  // get the coefficients back into Scala
```
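If R is not at hand, or you are curious what `glm(..., family = binomial)` actually computes, a dependency-free sketch can help. R's `glm` fits by iteratively reweighted least squares (IRLS), which for the logit link is the same as Newton-Raphson on the log-likelihood. The object `LogisticIrls`, its `fit` signature, and the `maxIter`/`tol` defaults below are hypothetical illustrations, not part of jvmr, Breeze or Nak. It targets the logit link only (logistic regression proper); a probit link would need different weights. On the same data it should agree closely with R's logit-link coefficients, though that is an expectation rather than something verified here.

```scala
// Hypothetical, dependency-free logistic regression via IRLS / Newton-Raphson.
object LogisticIrls {
  // Logistic (inverse-logit) function.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Solve the small dense system a * x = b by Gaussian elimination with partial pivoting.
  private def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val m = a.map(_.clone())
    val v = b.clone()
    for (col <- 0 until n) {
      val piv = (col until n).maxBy(r => math.abs(m(r)(col)))
      val rowTmp = m(col); m(col) = m(piv); m(piv) = rowTmp
      val vTmp = v(col); v(col) = v(piv); v(piv) = vTmp
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col until n) m(r)(c) -= f * m(col)(c)
        v(r) -= f * v(col)
      }
    }
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      val s = (r + 1 until n).map(c => m(r)(c) * x(c)).sum
      x(r) = (v(r) - s) / m(r)(r)
    }
    x
  }

  /** xs: one row of predictors per observation (an intercept column is prepended).
    * ys: 0/1 responses. Returns (intercept, slope_1, ..., slope_k). */
  def fit(xs: Array[Array[Double]], ys: Array[Int],
          maxIter: Int = 25, tol: Double = 1e-8): Array[Double] = {
    val design = xs.map(row => 1.0 +: row)  // prepend the intercept column
    val n = design.length
    val p = design.head.length
    var beta = Array.fill(p)(0.0)
    var iter = 0
    var converged = false
    while (iter < maxIter && !converged) {
      // Fitted probabilities mu_i = sigmoid(x_i . beta)
      val mu = design.map(row => sigmoid(row.indices.map(j => row(j) * beta(j)).sum))
      // Score (gradient of the log-likelihood): X^T (y - mu)
      val grad = Array.tabulate(p)(j => (0 until n).map(i => design(i)(j) * (ys(i) - mu(i))).sum)
      // Fisher information: X^T W X with W = diag(mu * (1 - mu))
      val info = Array.tabulate(p, p)((j, k) =>
        (0 until n).map(i => design(i)(j) * design(i)(k) * mu(i) * (1.0 - mu(i))).sum)
      // Newton step: beta <- beta + info^{-1} * grad
      val step = solve(info, grad)
      beta = beta.indices.map(j => beta(j) + step(j)).toArray
      converged = step.map(s => math.abs(s)).max < tol
      iter += 1
    }
    beta
  }
}

// Usage on data simulated the same way as in the post (independent y, so slopes near 0).
object LogisticIrlsDemo {
  def main(args: Array[String]): Unit = {
    val rng = new scala.util.Random(42)
    val xs = Array.fill(100)(Array(rng.nextGaussian(), rng.nextGaussian()))
    val ys = Array.fill(100)(if (rng.nextFloat() < 0.5f) 1 else 0)
    println(LogisticIrls.fit(xs, ys).mkString("intercept, x1, x2 = ", ", ", ""))
  }
}
```

Newton steps converge in a handful of iterations on well-behaved data, which is why IRLS is the standard choice here. Like `glm`, this sketch does not produce meaningful estimates when the two classes are perfectly separable by the predictors.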

This is the plot, which is meaningless anyway since y was generated independently of x1 and x2:

[Screenshot: qplot of the glm residuals]
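For the logit link there is a quick way to see what "nothing to find" looks like. If x1 and x2 carried no information at all, the best model would be the intercept-only one, and setting its score to zero gives a closed form:

$$
\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n} \bigl(y_i - \sigma(\beta_0)\bigr) = 0
\;\Longrightarrow\; \sigma(\hat\beta_0) = \bar{y}
\;\Longrightarrow\; \hat\beta_0 = \log\frac{\bar{y}}{1 - \bar{y}},
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}.
$$

Since each y is 1 with probability 0.5, the sample mean should be close to 0.5, so the fitted intercept should be close to 0 and the slopes on x1 and x2 should be nothing more than sampling noise around 0.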

And some good blog posts about doing statistics with Scala:

Brief introduction to Scala and Breeze for statistical computing

A frequentist approach to probability


Published

09 April 2014
