PySpark Series: Date Functions

2021-08-21 13:59:43

Date functions

from pyspark.sql.functions import current_date

# Assumes an existing SparkSession named spark
spark.range(3).withColumn('date', current_date()).show()
# +---+----------+
# | id|      date|
# +---+----------+
# |  0|2018-03-23|
# |  1|2018-03-23|
# |  2|2018-03-23|
# +---+----------+

from pyspark.sql.functions import current_timestamp

# Same idea, but the column holds the current date and time
spark.range(3).withColumn('date', current_timestamp()).show()
# +---+--------------------+
# | id|                date|
# +---+--------------------+
# |  0|2018-03-23 17:40:...|
# |  1|2018-03-23 17:40:...|
# |  2|2018-03-23 17:40:...|
# +---+--------------------+

from pyspark.sql.functions import date_format

df = spark.createDataFrame([('2015-04-08',)], ['a'])
# Pattern letters are case-sensitive: MM is the month, mm is the minute
df.select(date_format('a', 'MM/dd/yyyy').alias('date')).show()
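The case of the pattern letters matters in date_format: MM is the month, while mm is the minute. As a rough illustration in plain Python (no Spark required), the sketch below maps a small subset of pattern tokens onto strftime directives and shows what happens when mm is used by mistake. The helper name spark_pattern_to_strftime and its token table are hypothetical, invented for this example, and cover only the six tokens listed.

```python
import re
from datetime import datetime

# Hypothetical token table: only these six pattern tokens are handled.
_TOKENS = {
    "yyyy": "%Y",  # four-digit year
    "MM": "%m",    # month of year
    "dd": "%d",    # day of month
    "HH": "%H",    # hour of day, 0-23
    "mm": "%M",    # minute of hour
    "ss": "%S",    # second of minute
}
_TOKEN_RE = re.compile("|".join(_TOKENS))


def spark_pattern_to_strftime(pattern):
    """Translate the supported tokens in a single pass; other text is kept."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern)


d = datetime(2015, 4, 8, 13, 8, 15)
right = d.strftime(spark_pattern_to_strftime("MM/dd/yyyy"))  # month/day/year
wrong = d.strftime(spark_pattern_to_strftime("mm/dd/yyyy"))  # minute/day/year
print(right)  # 04/08/2015
print(wrong)  # 08/08/2015 (the minute, 08, lands where the month should be)
```

A lowercase pattern does not raise an error; it silently produces a different field, which is why this kind of typo is easy to miss.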

from pyspark.sql.functions import to_date, to_timestamp

# 1. Convert a string to a date
df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
df.select(to_date(df.t).alias('date')).show()
# collect() would return [Row(date=datetime.date(1997, 2, 28))]

# 2. Convert a string to a timestamp (date with time)
df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
df.select(to_timestamp(df.t).alias('dt')).show()
# collect() would return [Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]

# An explicit format can also be given (MM = month, HH = hour 0-23, mm = minute)
df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).show()
# collect() would return [Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]
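For comparison, here is roughly the same conversion in plain Python. It is only an analogy for what to_timestamp and to_date return for this input, not how Spark parses strings internally; the variable names are made up for the example.

```python
from datetime import datetime

raw = "1997-02-28 10:30:00"

# Comparable to to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss'): keeps date and time.
ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

# Comparable to to_date(df.t): the time-of-day part is dropped.
d = ts.date()

print(ts)  # 1997-02-28 10:30:00
print(d)   # 1997-02-28
```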

from pyspark.sql.functions import year, month, dayofmonth

df = spark.createDataFrame([('2015-04-08',)], ['a'])
# Extract the year, month and day of month as separate columns
df.select(year('a').alias('year'),
          month('a').alias('month'),
          dayofmonth('a').alias('day')
          ).show()

from pyspark.sql.functions import hour, minute, second

df = spark.createDataFrame([('2015-04-08 13:08:15',)], ['a'])
# Extract the hour, minute and second as separate columns
df.select(hour('a').alias('hour'),
          minute('a').alias('minute'),
          second('a').alias('second')
          ).show()

from pyspark.sql.functions import quarter

df = spark.createDataFrame([('2015-04-08',)], ['a'])
# Quarter of the year (1 to 4) that the date falls in
df.select(quarter('a').alias('quarter')).show()
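The quarter is just the month bucketed into groups of three. A one-line plain-Python sketch of that arithmetic (quarter_sketch is a hypothetical name, not a Spark function):

```python
from datetime import date


def quarter_sketch(d):
    """Months 1-3 map to 1, 4-6 to 2, 7-9 to 3, 10-12 to 4."""
    return (d.month - 1) // 3 + 1


print(quarter_sketch(date(2015, 4, 8)))  # 2
```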

from pyspark.sql.functions import date_add, date_sub

df = spark.createDataFrame([('2015-04-08',)], ['d'])
# Shift the date forward and backward by one day
df.select(date_add(df.d, 1).alias('d-add'),
          date_sub(df.d, 1).alias('d-sub')
          ).show()

from pyspark.sql.functions import add_months

df = spark.createDataFrame([('2015-04-08',)], ['d'])
# Shift the date forward by one month
df.select(add_months(df.d, 1).alias('d')).show()
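Month arithmetic gets interesting when the target month is shorter than the source day, for example January 31 plus one month. The plain-Python sketch below clamps the day to the end of the target month. That clamping rule is an assumption about the typical behavior; some Spark versions handle end-of-month inputs differently, so check the documentation for the version you run. add_months_sketch is a hypothetical name.

```python
import calendar
from datetime import date


def add_months_sketch(d, months):
    """Shift d by a number of months, clamping the day to the month's length."""
    # Work with a zero-based running month count so negative offsets also work.
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last = calendar.monthrange(year, month)[1]  # number of days in target month
    return date(year, month, min(d.day, last))


print(add_months_sketch(date(2015, 4, 8), 1))   # 2015-05-08
print(add_months_sketch(date(2015, 1, 31), 1))  # 2015-02-28
```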

from pyspark.sql.functions import datediff, months_between

# 1. Difference in days (end date first, start date second)
df = spark.createDataFrame([('2015-04-08','2015-05-10')], ['d1', 'd2'])
df.select(datediff(df.d2, df.d1).alias('diff')).show()

# 2. Difference in months (may be fractional)
df = spark.createDataFrame([('1997-02-28 10:30:00', '1996-10-30')], ['t', 'd'])
df.select(months_between(df.t, df.d).alias('months')).show()
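The day difference is a plain subtraction, but the month difference can be fractional. The sketch below reproduces both in plain Python. The months_between rule used here (a whole number when both values share the same day of month or are both month ends, otherwise a fraction based on a 31-day month, rounded to 8 decimals) is my understanding of the documented behavior and should be treated as an assumption; both function names are hypothetical.

```python
import calendar
from datetime import date, datetime


def datediff_sketch(end, start):
    """Number of days from start to end."""
    return (end - start).days


def months_between_sketch(t1, t2):
    """Months from t2 to t1, assuming 31-day months for the fractional part."""
    month_diff = (t1.year - t2.year) * 12 + (t1.month - t2.month)
    last1 = calendar.monthrange(t1.year, t1.month)[1]
    last2 = calendar.monthrange(t2.year, t2.month)[1]
    # Same day of month, or both on the last day of their month: whole months.
    if t1.day == t2.day or (t1.day == last1 and t2.day == last2):
        return float(month_diff)
    secs1 = t1.hour * 3600 + t1.minute * 60 + t1.second
    secs2 = t2.hour * 3600 + t2.minute * 60 + t2.second
    seconds_diff = (t1.day - t2.day) * 86400 + secs1 - secs2
    return round(month_diff + seconds_diff / (31 * 86400), 8)


days = datediff_sketch(date(2015, 5, 10), date(2015, 4, 8))
months = months_between_sketch(datetime(1997, 2, 28, 10, 30, 0),
                               datetime(1996, 10, 30))
print(days)    # 32
print(months)  # 3.94959677
```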

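next_day returns the date of the next given day of the week (Monday through Sunday) after the input date. It is a handy utility function.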

from pyspark.sql.functions import next_day

# Accepted day names: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
df = spark.createDataFrame([('2015-07-27',)], ['d'])
df.select(next_day(df.d, 'Sun').alias('date')).show()
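To make the "strictly after" rule concrete, here is a plain-Python sketch of the same calculation. next_day_sketch and its lookup table are hypothetical names for this example; the sketch assumes that a date already falling on the requested weekday moves a full week ahead.

```python
from datetime import date, timedelta

# Python's date.weekday() numbers Monday as 0 and Sunday as 6.
_DAYS = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}


def next_day_sketch(d, day_of_week):
    """First date strictly after d that falls on the given weekday."""
    target = _DAYS[day_of_week[:3].lower()]
    # Always between 1 and 7 days ahead, never 0.
    days_ahead = (target - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=days_ahead)


print(next_day_sketch(date(2015, 7, 27), "Sun"))  # 2015-08-02
print(next_day_sketch(date(2015, 7, 27), "Mon"))  # 2015-08-03 (the 27th is a Monday)
```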

from pyspark.sql.functions import last_day

df = spark.createDataFrame([('1997-02-10',)], ['d'])
# Last day of the month that the date belongs to
df.select(last_day(df.d).alias('date')).show()
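The same result can be sketched in plain Python with the calendar module, which also makes the leap-year case easy to check. last_day_sketch is a hypothetical name for this example.

```python
import calendar
from datetime import date


def last_day_sketch(d):
    """Last calendar day of the month containing d."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month)


print(last_day_sketch(date(1997, 2, 10)))  # 1997-02-28
print(last_day_sketch(date(1996, 2, 10)))  # 1996-02-29 (leap year)
```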
