sparkavro: Manipulate Apache Avro files with sparklyr
I created a simple sparklyr extension for handling Apache Avro files. It is a thin wrapper around Databricks' spark-avro, and it is listed in the official sparklyr extensions documentation.
chezou/sparkavro: Load Avro data into Spark with sparklyr (github.com)
Installation
Use {devtools} to install sparkavro:
devtools::install_github("chezou/sparkavro")
Simple usage
You can read and write Avro files as follows:
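library(sparklyr)
library(sparkavro)
# Connect to a Spark cluster (replace HOST:PORT with your master's address)
sc <- spark_connect(master = "spark://HOST:PORT")
# Read an Avro file into a Spark DataFrame registered as "test_table"
df <- spark_read_avro(sc, "test_table", "/user/foo/test.avro")
# Write the DataFrame back out in Avro format
spark_write_avro(df, "/tmp/output")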
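If you want to try this without a cluster, a local round trip might look like the following. This is only a minimal sketch: it assumes a local Spark installation is available to sparklyr, and that spark_read_avro and spark_write_avro accept the same arguments as in the usage above. The mtcars data, the table names, and the temporary output path are illustrative choices, not part of sparkavro.

```r
library(sparklyr)
library(sparkavro)

# Assumption: a local Spark installation is available to sparklyr
sc <- spark_connect(master = "local")

# Copy a small built-in data set into Spark (illustrative data and table name)
cars <- sdf_copy_to(sc, mtcars, "cars_tbl", overwrite = TRUE)

# Write it out as Avro to a fresh temporary path (illustrative location)
out_path <- tempfile("cars_avro_")
spark_write_avro(cars, out_path)

# Read the Avro output back into a new Spark DataFrame
cars_back <- spark_read_avro(sc, "cars_avro_tbl", out_path)

# Sanity check: the row count should match the original data set
sdf_nrow(cars_back) == nrow(mtcars)

spark_disconnect(sc)
```

Comparing row counts is a cheap way to confirm that the write and the read both went through before pointing the same calls at real data.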
This is the very first version, so there may be bugs, especially around options. If you find one, please open an issue on GitHub.