Analyze the code

Analyze the UDF used in the PySpark code given below.

    import pandas as pd
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql import Window

    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
        ("id", "v"))

    @pandas_udf("double")
    def mean_udf(v: pd.Series) -> float:
        return v.mean()

    df.groupby("id").agg(mean_udf(df['v'])).show()

    w = Window \
        .partitionBy('id') \
        .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    df.withColumn('mean_v', mean_udf(df['v']).over(w)).show()
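As a rough cross-check that needs no Spark session, the values the two show() calls are expected to display can be sketched in plain Python. This is only an illustration of the arithmetic, not how Spark executes the UDF; the names rows, groups, group_means and with_mean are made up for this sketch.

```python
from collections import defaultdict
from statistics import mean

# Same rows as the DataFrame in the question: (id, v)
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]

# Collect the v values belonging to each id
groups = defaultdict(list)
for id_, v in rows:
    groups[id_].append(v)

# Analogue of df.groupby("id").agg(mean_udf(df["v"])): one mean per id
group_means = {id_: mean(vs) for id_, vs in groups.items()}

# Analogue of the window version: each row keeps its columns and
# gains the mean of its whole partition as a third value
with_mean = [(id_, v, group_means[id_]) for id_, v in rows]

print(group_means)
print(with_mean)
```

The first result has one entry per id, while the second keeps all five rows, which mirrors the difference between the groupby aggregation and the windowed withColumn call.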

Which of the following statements regarding this code are valid?


a. This type of UDF does not support partial aggregation

b. All data for a group or window will be loaded into memory by this code

c. Only unbounded windows are supported by this code

d. Both a and b

e. All of these

Skills Covered

  • IT-Programming Languages/Frameworks


  • Fundamentals

Question Type

  • MCQ
