Where to look for API:
The following are the places to look for transformations and functions: the DataFrame (Dataset) methods, the Column methods, the org.apache.spark.sql.functions package, and the submodules hanging off DataFrame such as DataFrameStatFunctions and DataFrameNaFunctions.
Converting to Spark types:
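A minimal runnable sketch of this conversion: the lit function turns native Scala values into Spark Column literals. The session setup is an assumption added so the example runs standalone; later sketches reuse this spark value:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("ch6-demo").master("local[*]").getOrCreate()

// lit wraps a native Scala value in a Column of the corresponding Spark type
spark.range(1).select(lit(5), lit("five"), lit(5.0)).show()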
Working with Booleans:
The following are the methods available on the Column type for performing tests. Note that most of these methods have operator equivalents (===, =!=, >, and so on); the operator forms are preferred in Scala, while the verbose forms (equalTo, gt, and so on) are preferred in Java. Recall that === is a method we are invoking on the Column type.
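As a runnable sketch (not the post's original screenshot), here are the two forms side by side. The toy DataFrame is an illustrative stand-in for the book's retail data, reusing the spark session from above; its column names are assumptions:

import org.apache.spark.sql.functions.col
import spark.implicits._

// illustrative stand-in for the book's retail data
val df = Seq(
  ("536365", "DOT", 569.77, "DOTCOM POSTAGE"),
  ("536366", "21754", 11.95, "HOME BUILDING BLOCK WORD")
).toDF("InvoiceNo", "StockCode", "UnitPrice", "Description")

// operator form, preferred in Scala
df.where(col("InvoiceNo") === "536365").show()

// verbose form of the same test, preferred in Java
df.where(col("InvoiceNo").equalTo("536365")).show()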
In Spark, you should always chain and filters together as sequential filters. The reason is that even if the Boolean statements are expressed serially (one after the other), Spark will flatten all of these filters into one statement and perform the filter at the same time, creating the and statement for us. Multiple or conditions, however, need to be specified within the same statement, as in the sketch below.
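A sketch of both rules, reusing the toy df from above (the column names remain assumptions):

import org.apache.spark.sql.functions.col

// chained where clauses: Spark flattens these into a single AND filter
df.where(col("StockCode") === "DOT")
  .where(col("UnitPrice") > 500)
  .show()

// or conditions must be combined within one statement
val priceFilter   = col("UnitPrice") > 500
val descripFilter = col("Description").contains("POSTAGE")
df.where(col("StockCode").isin("DOT"))
  .where(priceFilter.or(descripFilter))
  .show()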
Below, note how we define the condition as a Column. We can create such Column instances wherever we want; however, they will be analyzed only in the context of a DataFrame.
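A sketch of that idea with the same toy df; the isExpensive name and the 250 threshold are illustrative:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// a Column expression can be built with no DataFrame in sight...
val isExpensive: Column = col("UnitPrice") > 250

// ...but it is resolved against a concrete schema only when used
// inside a DataFrame transformation
df.withColumn("isExpensive", isExpensive)
  .where("isExpensive")
  .select("Description", "UnitPrice")
  .show()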
As shown above, it is often easier to just express filters as SQL statements than to use the programmatic DataFrame interface, and Spark SQL allows us to do this without paying any performance penalty.
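The same filter written both ways, as a sketch; both compile to the same plan:

import org.apache.spark.sql.functions.{col, instr}

// programmatic Column API
df.where(instr(col("Description"), "POSTAGE") >= 1).show()

// equivalent SQL expression string
df.where("instr(Description, 'POSTAGE') >= 1").show()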
Working with Numbers:
We show multiple ways of performing the same computation: using col, using expr, and using selectExpr.
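A sketch of the three equivalent spellings, following the book's (Quantity * UnitPrice)^2 + 5 example; the one-row DataFrame and its column names are assumptions added so the example runs:

import org.apache.spark.sql.functions.{col, expr, pow}
import spark.implicits._

val retail = Seq((12345, 6, 2.5)).toDF("CustomerId", "Quantity", "UnitPrice")

// 1. using col with the typed pow function
retail.select(col("CustomerId"),
  (pow(col("Quantity") * col("UnitPrice"), 2.0) + 5).alias("realQuantity")).show()

// 2. using expr for the column references
retail.select(expr("CustomerId"),
  (pow(expr("Quantity") * expr("UnitPrice"), 2.0) + 5).alias("realQuantity")).show()

// 3. using selectExpr with a single SQL string
retail.selectExpr("CustomerId",
  "(POWER(Quantity * UnitPrice, 2.0) + 5) AS realQuantity").show()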
By default, the round function rounds up if you are exactly in between two numbers. You can round down in that case by using bround, which uses HALF_EVEN (banker's) rounding.
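A minimal sketch, reusing the spark session from above:

import org.apache.spark.sql.functions.{bround, lit, round}

// 2.5 sits exactly between 2 and 3: round (HALF_UP) gives 3.0,
// while bround (HALF_EVEN) gives 2.0
spark.range(1)
  .select(round(lit(2.5)).alias("round"), bround(lit(2.5)).alias("bround"))
  .show()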
The stat method on DataFrame returns a DataFrameStatFunctions object, which groups the statistical methods.
Documentation of DataFrameStatFunctions: https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.sql.DataFrameStatFunctions
Note how using the equivalent functions inside select is often simpler.
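A sketch contrasting the two routes; the toy data is an assumption:

import org.apache.spark.sql.functions.corr
import spark.implicits._

val sales = Seq((1, 1.0), (3, 2.9), (5, 5.2)).toDF("Quantity", "UnitPrice")

// df.stat exposes DataFrameStatFunctions; its methods return plain
// Scala values, e.g. corr returns a Double
val pearson: Double = sales.stat.corr("Quantity", "UnitPrice")

// the corr function from org.apache.spark.sql.functions does the same
// computation directly inside select
sales.select(corr("Quantity", "UnitPrice")).show()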
Saturday, 13 April 2019
CH6.1 Working with Booleans, Numbers