DataFrame Transformations (Manipulating DataFrames)
When working with individual DataFrames there are some fundamental objectives.
These break down into several core operations.
Creating DataFrames:
val df = spark.read.format("json").load("/data/flight-data/json/2015-summary.json")
Select and selectExpr:
scala> df.select("DEST_COUNTRY_NAME").show(2)
+-----------------+
|DEST_COUNTRY_NAME|
+-----------------+
|    United States|
|    United States|
+-----------------+
only showing top 2 rows
scala> df.select("DEST_COUNTRY_NAME",
"ORIGIN_COUNTRY_NAME").show(2)
+-----------------+-------------------+
|DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|
+-----------------+-------------------+
|    United States|            Romania|
|    United States|            Croatia|
+-----------------+-------------------+
only showing top 2 rows
selectExpr lets us use SQL-like expressions directly in the select.
We can treat selectExpr as a simple way to build up
complex expressions that create new DataFrames. In fact, we can add any valid
non-aggregating SQL statement, and as long as the columns resolve, it will be
valid.
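As a rough illustration of the point above, the sketch below passes two SQL expressions to selectExpr. The `flights` DataFrame and its rows are made-up stand-ins for the flight data (only the column names come from the output shown earlier), and the local SparkSession is there only so the snippet can run outside spark-shell.

```scala
import org.apache.spark.sql.SparkSession

// Local session for this sketch; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("selectExpr-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows, not the contents of the real JSON file.
val flights = Seq(
  ("United States", "Romania"),
  ("United States", "Croatia")
).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME")

// Each string is parsed as a SQL expression: an alias and a function call.
val renamed = flights.selectExpr(
  "DEST_COUNTRY_NAME AS destination",
  "upper(ORIGIN_COUNTRY_NAME) AS origin_upper"
)
renamed.show()
```

The result is a new DataFrame with only the two expression columns; `flights` itself is unchanged.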
selectExpr can also add a new column to our DataFrame. As in SQL, * represents
all the existing columns.
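Adding a column while keeping the existing ones might look like the sketch below. The `flights` rows are invented for illustration, and the column name `withinCountry` is a hypothetical choice rather than anything taken from the post.

```scala
import org.apache.spark.sql.SparkSession

// Local session for this sketch; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("selectExpr-star-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows, not the contents of the real JSON file.
val flights = Seq(
  ("United States", "Romania"),
  ("United States", "United States")
).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME")

// "*" keeps every existing column; the second expression appends a boolean
// column that is true when origin and destination are the same country.
val withFlag = flights.selectExpr(
  "*",
  "(DEST_COUNTRY_NAME = ORIGIN_COUNTRY_NAME) AS withinCountry"
)
withFlag.show()
```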
(Note that if we want to group by a column and then aggregate, the technique
is different. With select and selectExpr, aggregations can only be performed
over the entire DataFrame.)
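The difference can be sketched as follows. The numeric `count` column and all row values are assumptions made for this example, since the output shown earlier only displays the two country-name columns.

```scala
import org.apache.spark.sql.SparkSession

// Local session for this sketch; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("aggregation-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows; the `count` column is assumed for illustration.
val flights = Seq(
  ("United States", "Romania", 15L),
  ("United States", "Croatia", 1L),
  ("Egypt", "United States", 14L)
).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// Aggregating inside selectExpr collapses the whole DataFrame into one row.
val overall = flights.selectExpr(
  "avg(count)",
  "count(distinct(DEST_COUNTRY_NAME))"
)
overall.show()

// Per-group aggregation goes through groupBy instead: one row per destination.
val perDest = flights.groupBy("DEST_COUNTRY_NAME").sum("count")
perDest.show()
```

Here `overall` always has exactly one row, while `perDest` has one row for each distinct destination.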
Thursday, 11 April 2019
CH5/4 Creating DF,select and selectExpr