def agg(expr: Column, exprs: Column*): DataFrame

Compute aggregates by specifying a series of aggregate columns. Note that this function by default retains the grouping columns in its output. To not retain grouping columns, set spark.sql.retainGroupColumns to false.

The available aggregate methods are defined in org.apache.spark.sql.functions.

// Selects the age of the oldest employee and the aggregate expense for each department

// Scala:
import org.apache.spark.sql.functions._
df.groupBy("department").agg(max("age"), sum("expense"))

// Java:
import static org.apache.spark.sql.functions.*;
df.groupBy("department").agg(max("age"), sum("expense"));
Note that before Spark 1.4, the default behavior was to NOT retain grouping columns. To switch back to that behavior, set the config variable spark.sql.retainGroupColumns to false.

// Scala, 1.3.x:
df.groupBy("department").agg($"department", max("age"), sum("expense"))

// Java, 1.3.x:
df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
def agg(exprs: Map[String, String]): DataFrame

(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.

The available aggregate methods are avg, max, min, sum, count.

// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(Map(
  "age" -> "max",
  "expense" -> "sum"
))
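One practical note (an observation about default naming, not part of the quoted doc): the map form names its output columns after the aggregate call, e.g. max(age) and sum(expense), and offers no way to alias them. A hedged sketch of explicit naming via the Column-based overload:

import org.apache.spark.sql.functions._
// alias() gives the aggregate columns readable names
df.groupBy("department").agg(
  max("age").alias("oldest_age"),
  sum("expense").alias("total_expense")
)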
def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame

(Scala-specific) Compute aggregates by specifying the column names and aggregate methods. The resulting DataFrame will also contain the grouping columns.

The available aggregate methods are avg, max, min, sum, count.

// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(
  "age" -> "max",
  "expense" -> "sum"
)
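To see the three overloads side by side, here is a self-contained sketch; the employee data and local SparkSession setup are illustrative assumptions, and all three calls produce the same grouped result:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("agg-demo").getOrCreate()
import spark.implicits._

// hypothetical employee data
val df = Seq(
  ("sales", 30, 100.0),
  ("sales", 45, 250.0),
  ("eng",   38, 400.0)
).toDF("department", "age", "expense")

// Column-based, map-based, and tuple-based forms of agg
df.groupBy("department").agg(max("age"), sum("expense")).show()
df.groupBy("department").agg(Map("age" -> "max", "expense" -> "sum")).show()
df.groupBy("department").agg("age" -> "max", "expense" -> "sum").show()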