Word Count Example in PySpark
The basic example below is simply the word count example given in the official PySpark documentation; the original can be found there.
# sc is the SparkContext; it is created automatically in the pyspark shell
# (in a standalone script, create it with: from pyspark import SparkContext; sc = SparkContext()).

# The first step reads the source text file from HDFS into an RDD.
text_file = sc.textFile("hdfs://...")

# The actual computation: split each line into words, emit a (word, 1) pair
# per word, then sum the counts for each word.
# flatMap, map and reduceByKey are all Spark RDD operations.
counts = text_file.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)

# The final step saves the result back to HDFS.
counts.saveAsTextFile("hdfs://...")
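
To try the same pipeline without an HDFS cluster, the job can also be run locally and the result collected back to the driver for inspection. The sketch below is only an illustration: it assumes a local Spark installation and a hypothetical input file named sample.txt in the working directory; the word count logic itself is identical to the example above.

# A minimal local-mode sketch, assuming pyspark is installed and a small
# text file named sample.txt (hypothetical path) exists in the working directory.
from pyspark import SparkContext

sc = SparkContext(master="local[*]", appName="WordCountLocal")

counts = (sc.textFile("sample.txt")
            .flatMap(lambda line: line.split(" "))
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

# collect() brings the (word, count) pairs back to the driver, which is fine
# for small inputs; for large data, prefer saveAsTextFile as shown above.
for word, count in counts.collect():
    print(word, count)

sc.stop()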