Converting to Spark Types (Literals):
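Here is a minimal sketch of lit. It is written for spark-shell or a Scala script with Spark on the classpath. The two-row df is a made-up stand-in for the flight data, and the column name One is only an example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, lit}

val spark = SparkSession.builder().appName("literals").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// lit turns a plain Scala value into a Spark literal Column
val withLiteral = df.select(expr("*"), lit(1).as("One"))
withLiteral.show(2)
```

expr("*") keeps every existing column, so the literal is appended as a fourth column.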
Adding Columns:
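Here is a minimal sketch of adding columns with withColumn. It uses the same kind of made-up two-row stand-in for the flight data. The counts are hypothetical, and so are the new column names numberOne and countPlusOne.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().appName("add-columns").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// withColumn(name, expression) returns a new DataFrame with the extra column appended
val withNumberOne = df.withColumn("numberOne", lit(1))

// The expression can also be computed from existing columns
val withCountPlusOne = df.withColumn("countPlusOne", col("count") + 1)
withCountPlusOne.show(2)
```

The original df is not modified. Each call returns a new DataFrame.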
Renaming Columns:
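Here is a minimal sketch of withColumnRenamed on a made-up two-row stand-in for the flight data. The new name dest is only an example.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rename-columns").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// withColumnRenamed(existingName, newName); the original DataFrame keeps its names
val renamed = df.withColumnRenamed("DEST_COUNTRY_NAME", "dest")
renamed.columns.foreach(println)
```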
Reserved Characters and Keywords:
We can refer to columns whose names contain reserved characters (such as spaces or dashes) without escaping them when we use an explicit string-to-column reference, because the string is interpreted as a literal column name rather than as an expression. We only need to escape names, with backtick (`) characters, inside expressions that use reserved characters or keywords.
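The sketch below contrasts the two cases on a made-up two-row stand-in for the flight data. The column name "This Long Column-Name" is hypothetical. It was chosen because it contains spaces and a dash.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder().appName("reserved-chars").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// The first argument of withColumn is a plain string name, so no escaping is needed here
val dfWithLongColName = df.withColumn("This Long Column-Name", expr("ORIGIN_COUNTRY_NAME"))

// Explicit string-to-column reference: the name is taken literally
val unescaped = dfWithLongColName.select(col("This Long Column-Name"))

// Inside an expression the name must be wrapped in backticks
val escaped = dfWithLongColName.selectExpr(
  "`This Long Column-Name`", "`This Long Column-Name` as `new col`")
escaped.show(2)
```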
Case Sensitivity:
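By default, Spark is case insensitive when resolving column names. To make it case sensitive, change the configuration:
-- in SQL
set spark.sql.caseSensitive=true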
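The same setting can also be changed from Scala with spark.conf.set. This is a minimal sketch. It assumes a local session where spark.sql.caseSensitive still has its default value (false), and it uses a made-up two-row df. It also relies on Spark resolving column names as soon as select is called, so a mismatched name fails right away.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

val spark = SparkSession.builder().appName("case-sensitivity").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// Default (case insensitive): the lower-case name resolves to DEST_COUNTRY_NAME
val insensitiveRows = df.select("dest_country_name").count()

// Scala equivalent of the SQL set command
spark.conf.set("spark.sql.caseSensitive", "true")

// Now the lower-case name matches no column, so analysis fails inside select
val sensitive = Try(df.select("dest_country_name"))

// Restore the default so later code in the same session is unaffected
spark.conf.set("spark.sql.caseSensitive", "false")
```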
Removing Columns:
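Here is a minimal sketch of drop on a made-up two-row stand-in for the flight data. drop accepts one or more column names.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("drop-columns").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// drop returns a new DataFrame without the named column
val withoutOrigin = df.drop("ORIGIN_COUNTRY_NAME")

// Several columns can be dropped in one call
val onlyCount = df.drop("ORIGIN_COUNTRY_NAME", "DEST_COUNTRY_NAME")
onlyCount.show(2)
```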
Changing Column Type (Casting):
The target data type can be specified either as a string (for example, "long") or as one of Spark's internal types (for example, LongType).
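The sketch below shows both ways to name the target type. It assumes a made-up df whose count column starts out as an integer, so the cast to long is visible. count2 is only an example name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

val spark = SparkSession.builder().appName("casting").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up data; count is an Int here, so Spark types the column as IntegerType
val df = Seq(("United States", "Romania", 15), ("United States", "Croatia", 1)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// Target type given as a string
val castWithString = df.withColumn("count2", col("count").cast("long"))

// Target type given as a Spark internal type
val castWithType = df.withColumn("count2", col("count").cast(LongType))
castWithType.printSchema()
```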
Thursday, 11 April 2019
CH5.5 Spark Literals, Adding Columns, Removing Columns, Renaming Columns, Casting
CH5/4 Creating DF, select and selectExpr
DataFrame Transformations (Manipulating DataFrames)
When working with individual DataFrames, there are some fundamental objectives. These break down into several core operations.
Creating DataFrames:
val df = spark.read.format("json").load("/data/flight-data/json/2015-summary.json")
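A DataFrame can also be built from local rows plus an explicit schema instead of a file. This is a minimal sketch. The two rows are made-up values shaped like the flight data. They are not the contents of the JSON file above.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val spark = SparkSession.builder().appName("create-df").master("local[*]").getOrCreate()

// Explicit schema mirroring the flight-data columns
val schema = StructType(Array(
  StructField("DEST_COUNTRY_NAME", StringType, nullable = true),
  StructField("ORIGIN_COUNTRY_NAME", StringType, nullable = true),
  StructField("count", LongType, nullable = false)))

// Hypothetical rows; values must match the schema's types (count is a Long)
val rows = Seq(Row("United States", "Romania", 15L), Row("United States", "Croatia", 1L))

val manualDf = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
manualDf.printSchema()
```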
Select and selectExpr:
scala> df.select("DEST_COUNTRY_NAME").show(2)
+-----------------+
|DEST_COUNTRY_NAME|
+-----------------+
|    United States|
|    United States|
+-----------------+
only showing top 2 rows
scala> df.select("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME").show(2)
+-----------------+-------------------+
|DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|
+-----------------+-------------------+
|    United States|            Romania|
|    United States|            Croatia|
+-----------------+-------------------+
only showing top 2 rows
selectExpr allows us to use SQL-like expressions in a select.
We can treat selectExpr as a simple way to build up
complex expressions that create new DataFrames. In fact, we can add any valid
non-aggregating SQL statement, and as long as the columns resolve, it will be
valid!
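Here is a minimal sketch of selectExpr that mixes a plain column, an alias, and a computed value. It runs on a made-up two-row stand-in for the flight data. newColumnName and doubled are only example names.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("select-expr").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// Each string is a SQL expression: an aliased column, a plain column, a computed value
val exprDf = df.selectExpr(
  "DEST_COUNTRY_NAME as newColumnName", "DEST_COUNTRY_NAME", "count * 2 as doubled")
exprDf.show(2)
```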
We can also use selectExpr to add a new column to our DataFrame. As in SQL, we use * to represent all the existing columns.
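Here is a minimal sketch of adding a column with selectExpr and *. It uses the same kind of made-up two-row df. The Boolean column withinCountry (true when origin and destination match) is only an example name.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("select-expr-star").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// "*" keeps every existing column; the second expression adds a Boolean column
val withinCountry = df.selectExpr(
  "*", "(DEST_COUNTRY_NAME = ORIGIN_COUNTRY_NAME) as withinCountry")
withinCountry.show(2)
```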
(Note that if we want to group by and then aggregate, the technique is different. With select and selectExpr we can only aggregate over the entire DataFrame.)
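The sketch below contrasts an aggregation over the whole DataFrame in selectExpr with a per-group aggregation through groupBy. It runs on a made-up two-row df whose counts (15 and 1) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("aggregations").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up stand-in for the flight data (hypothetical counts)
val df = Seq(("United States", "Romania", 15L), ("United States", "Croatia", 1L)).toDF(
  "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// Aggregations in selectExpr run over the whole DataFrame and return a single row
val overall = df.selectExpr("avg(count)", "count(distinct(DEST_COUNTRY_NAME))")
overall.show()

// Per-group aggregation needs groupBy instead
val perDestination = df.groupBy("DEST_COUNTRY_NAME").sum("count")
perDestination.show()
```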