def cume_dist(): Column
Window function: returns the cumulative distribution of values within a
window partition, i.e. the fraction of rows that are at or below the
current row.
N = total number of rows in the partition
cumeDist(x) = (number of values before and including x) / N
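The formula above can be checked by hand with a small plain-Scala model. This is only a sketch of the arithmetic, not Spark's implementation: the object name CumeDistSketch is hypothetical, and it assumes the window is ordered ascending, so "before and including x" means every value less than or equal to x.

```scala
object CumeDistSketch {
  // Plain-Scala model of cumeDist(x) = (number of values <= x) / N.
  // Assumption: the partition is ordered ascending, so tied values
  // all count as "before and including" each other.
  def cumeDist(values: Seq[Int]): Seq[Double] = {
    val n = values.size.toDouble
    values.map(x => values.count(_ <= x) / n)
  }
}
```

For the partition Seq(10, 20, 20, 30), the two tied rows both get 3 / 4 = 0.75 and the last row gets 1.0. In Spark itself the function is applied over a window specification, e.g. cume_dist().over(Window.partitionBy(...).orderBy(...)).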
def dense_rank(): Column
Window function: returns the rank of rows within a window partition,
without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no
gaps in the ranking sequence when there are ties. That is, if you were
ranking a competition using dense_rank and had three people tie for second
place, you would say that all three were in second place and that the next
person came in third. rank, by contrast, skips the positions taken up by
the ties, so the person who came next after the three-way tie would
register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
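The competition example can be reproduced with a plain-Scala model of both rankings. This is a sketch under stated assumptions rather than Spark code: the object RankSketch and its method names are hypothetical, and a higher score is assumed to be better.

```scala
object RankSketch {
  // rank: 1 + number of competitors with a strictly better score,
  // so ties share a rank and the following rank is skipped ahead.
  def rank(scores: Seq[Int]): Seq[Int] =
    scores.map(s => scores.count(_ > s) + 1)

  // denseRank: 1 + number of DISTINCT better scores, so no gaps appear.
  def denseRank(scores: Seq[Int]): Seq[Int] =
    scores.map(s => scores.filter(_ > s).distinct.size + 1)
}
```

With scores Seq(100, 90, 90, 90, 80), rank yields 1, 2, 2, 2, 5 while denseRank yields 1, 2, 2, 2, 3, matching the "fifth" versus "third" wording above.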
def lag(e: Column, offset: Int, defaultValue: Any): Column
Window function: returns the value that is offset rows before the current
row, and defaultValue if there are fewer than offset rows before the
current row. For example, an offset of one will return the previous row at
any given point in the window partition.
This is equivalent to the LAG function in SQL.
def lag(columnName: String, offset: Int, defaultValue: Any): Column
Window function: returns the value that is offset rows before the current
row, and defaultValue if there are fewer than offset rows before the
current row. For example, an offset of one will return the previous row at
any given point in the window partition.
This is equivalent to the LAG function in SQL.
def lag(columnName: String, offset: Int): Column
Window function: returns the value that is offset rows before the current
row, and null if there are fewer than offset rows before the current row.
For example, an offset of one will return the previous row at any given
point in the window partition.
This is equivalent to the LAG function in SQL.
def lag(e: Column, offset: Int): Column
Window function: returns the value that is offset rows before the current
row, and null if there are fewer than offset rows before the current row.
For example, an offset of one will return the previous row at any given
point in the window partition.
This is equivalent to the LAG function in SQL.
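The behaviour shared by the lag overloads can be modelled on an ordinary ordered sequence. This is a hedged sketch, not the Spark implementation: LagSketch is a hypothetical name, SQL null is modelled as None, the optional default stands in for defaultValue, and offset is assumed to be non-negative.

```scala
object LagSketch {
  // For each position i, look `offset` rows back in the ordered partition.
  // If fewer than `offset` rows precede i, fall back to `default`
  // (None models the null returned by the two-argument overloads).
  def lag[A](rows: Seq[A], offset: Int, default: Option[A] = None): Seq[Option[A]] =
    rows.indices.map { i =>
      if (i - offset >= 0) Some(rows(i - offset)) else default
    }
}
```

For example, LagSketch.lag(Seq(10, 20, 30), 1) gives None, Some(10), Some(20): the first row has no predecessor, and every later row sees the previous row's value.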
def lead(e: Column, offset: Int, defaultValue: Any): Column
Window function: returns the value that is offset rows after the current
row, and defaultValue if there are fewer than offset rows after the
current row. For example, an offset of one will return the next row at any
given point in the window partition.
This is equivalent to the LEAD function in SQL.
def lead(columnName: String, offset: Int, defaultValue: Any): Column
Window function: returns the value that is offset rows after the current
row, and defaultValue if there are fewer than offset rows after the
current row. For example, an offset of one will return the next row at any
given point in the window partition.
This is equivalent to the LEAD function in SQL.
def lead(e: Column, offset: Int): Column
Window function: returns the value that is offset rows after the current
row, and null if there are fewer than offset rows after the current row.
For example, an offset of one will return the next row at any given point
in the window partition.
This is equivalent to the LEAD function in SQL.
def lead(columnName: String, offset: Int): Column
Window function: returns the value that is offset rows after the current
row, and null if there are fewer than offset rows after the current row.
For example, an offset of one will return the next row at any given point
in the window partition.
This is equivalent to the LEAD function in SQL.
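The forward-looking lookup described for lead can be modelled in the same spirit. As before, this is an illustrative plain-Scala sketch: LeadSketch is a hypothetical name, None models SQL null, and offset is assumed to be non-negative.

```scala
object LeadSketch {
  // For each position i, look `offset` rows ahead in the ordered partition.
  // `lift` returns None when the index is past the end, in which case
  // the optional `default` (modelling defaultValue) is used instead.
  def lead[A](rows: Seq[A], offset: Int, default: Option[A] = None): Seq[Option[A]] =
    rows.indices.map(i => rows.lift(i + offset).orElse(default))
}
```

LeadSketch.lead(Seq("a", "b", "c"), 1) gives Some("b"), Some("c"), None, since the last row has nothing after it.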
def ntile(n: Int): Column
Window function: returns the ntile group id (from 1 to n inclusive) in an
ordered window partition. For example, if n is 4, the first quarter of the
rows will get value 1, the second quarter will get 2, the third quarter
will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
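The quartile example can be made concrete with a small model that assigns bucket ids to an ordered partition. This is a sketch, with NtileSketch as a hypothetical name. It assumes the usual SQL NTILE convention that, when the row count is not divisible by n, the earliest buckets each take one extra row.

```scala
object NtileSketch {
  // Bucket id (1 to n) for each of `rowCount` rows, in partition order.
  // Assumption: leftover rows go one each to the first `extra` buckets.
  def ntile(rowCount: Int, n: Int): Seq[Int] = {
    val base = rowCount / n   // minimum rows per bucket
    val extra = rowCount % n  // number of buckets that get one more row
    (1 to n).flatMap { b =>
      val size = if (b <= extra) base + 1 else base
      Seq.fill(size)(b)
    }
  }
}
```

With 8 rows and n = 4, each quarter holds two rows: 1, 1, 2, 2, 3, 3, 4, 4. With 10 rows the bucket sizes become 3, 3, 2, 2 under the stated assumption.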
def percent_rank(): Column
Window function: returns the relative rank (i.e. percentile) of rows
within a window partition.
This is computed by:
(rank of row in its partition - 1) / (number of rows in the partition - 1)
This is equivalent to the PERCENT_RANK function in SQL.
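The percent_rank formula can be worked through with a plain-Scala model. This is a sketch only: PercentRankSketch is a hypothetical name, the window is assumed to be ordered ascending, and a single-row partition is assumed to yield 0.0 so that the division by zero is avoided.

```scala
object PercentRankSketch {
  // (rank - 1) / (N - 1), where rank = 1 + number of strictly smaller values.
  // Assumption: a partition with one row maps to 0.0.
  def percentRank(values: Seq[Int]): Seq[Double] = {
    val n = values.size
    values.map { x =>
      val rank = values.count(_ < x) + 1
      if (n <= 1) 0.0 else (rank - 1).toDouble / (n - 1)
    }
  }
}
```

For Seq(10, 20, 20, 30, 40) the ranks are 1, 2, 2, 4, 5, so the results are 0.0, 0.25, 0.25, 0.75, 1.0. Unlike cume_dist, the first row always maps to 0.0.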
def rank(): Column
Window function: returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no
gaps in the ranking sequence when there are ties. That is, if you were
ranking a competition using dense_rank and had three people tie for second
place, you would say that all three were in second place and that the next
person came in third. rank, by contrast, skips the positions taken up by
the ties, so the person who came next after the three-way tie would
register as coming in fifth.
This is equivalent to the RANK function in SQL.
def row_number(): Column
Window function: returns a sequential number starting at 1 within a window
partition.
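To illustrate "within a window partition", the following plain-Scala sketch restarts the numbering at 1 for each partition key. The object RowNumberSketch and the (key, value) row shape are hypothetical. Rows are assumed to be ordered by value inside each partition, and the partitions are emitted in key order purely for readability.

```scala
object RowNumberSketch {
  // Input rows are (partitionKey, value); output adds the row number.
  // Numbering restarts at 1 in every partition and, unlike rank,
  // tied values still receive distinct consecutive numbers.
  def rowNumber(rows: Seq[(String, Int)]): Seq[(String, Int, Int)] =
    rows.groupBy(_._1).toSeq.sortBy(_._1).flatMap { case (_, part) =>
      part.sortBy(_._2).zipWithIndex.map { case ((k, v), i) => (k, v, i + 1) }
    }
}
```

For the rows ("a", 3), ("b", 1), ("a", 1), ("b", 2), partition "a" is numbered 1, 2 for values 1, 3 and partition "b" is numbered 1, 2 for values 1, 2.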