Simple aggregation from a CSV
Given the CSV file peoples.csv:
1,Reed,United States,Female
2,Bradley,United States,Female
3,Adams,United States,Male
4,Lane,United States,Male
5,Marshall,United States,Female
6,Garza,United States,Male
7,Gutierrez,United States,Male
8,Fox,Germany,Female
9,Medina,United States,Male
10,Nichols,United States,Male
11,Woods,United States,Male
12,Welch,United States,Female
13,Burke,United States,Female
14,Russell,United States,Female
15,Burton,United States,Male
16,Johnson,United States,Female
17,Flores,United States,Male
18,Boyd,United States,Male
19,Evans,Germany,Male
20,Stephens,United States,Male
We want to count the people by country and by country+gender:
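// Imports assume the Flink 1.3-era Table API package layout; adjust for other versions.
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.Types;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.sources.CsvTableSource;
import org.apache.flink.types.Row;

public class TableExample {
    public static void main( String[] args ) throws Exception {
        // create the batch execution environment and its table environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        final BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment( env );

        // get the path to the file in the resources folder
        String peoplesPath = TableExample.class.getClassLoader().getResource( "peoples.csv" ).getPath();

        // load the csv into a table source: column names first, then one type per column
        CsvTableSource tableSource = new CsvTableSource(
                peoplesPath,
                "id,last_name,country,gender".split( "," ),
                new TypeInformation[]{ Types.INT(), Types.STRING(), Types.STRING(), Types.STRING() } );

        // register the table source under the name "peoples" and scan it
        tableEnv.registerTableSource( "peoples", tableSource );
        Table peoples = tableEnv.scan( "peoples" );

        // aggregation using a chain of Table API methods: people per country
        Table countriesCount = peoples.groupBy( "country" ).select( "country, id.count" );
        DataSet<Row> result1 = tableEnv.toDataSet( countriesCount, Row.class );
        result1.print();

        // aggregation using SQL syntax: people per country and gender
        Table countriesAndGenderCount = tableEnv.sql(
                "select country, gender, count(id) from peoples group by country, gender" );
        DataSet<Row> result2 = tableEnv.toDataSet( countriesAndGenderCount, Row.class );
        result2.print();
    }
}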
The results are:
Germany,2
United States,18
Germany,Male,1
United States,Male,11
Germany,Female,1
United States,Female,7
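
To sanity-check these counts without a Flink runtime, the same two group-by aggregations can be sketched with plain Java streams. This is only an illustrative cross-check, not part of the Flink example: the class name CsvCountCheck, its SAMPLE constant, and both helper methods are hypothetical names introduced here, and the rows are copied from peoples.csv above. A TreeMap is used so the printed order is deterministic, which means it may differ from the order in which Flink prints its rows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper class: recomputes the counts from the example with java.util.stream only.
public class CsvCountCheck {

    // The rows of peoples.csv, copied from the listing above (id,last_name,country,gender).
    static final List<String> SAMPLE = Arrays.asList(
            "1,Reed,United States,Female", "2,Bradley,United States,Female",
            "3,Adams,United States,Male", "4,Lane,United States,Male",
            "5,Marshall,United States,Female", "6,Garza,United States,Male",
            "7,Gutierrez,United States,Male", "8,Fox,Germany,Female",
            "9,Medina,United States,Male", "10,Nichols,United States,Male",
            "11,Woods,United States,Male", "12,Welch,United States,Female",
            "13,Burke,United States,Female", "14,Russell,United States,Female",
            "15,Burton,United States,Male", "16,Johnson,United States,Female",
            "17,Flores,United States,Male", "18,Boyd,United States,Male",
            "19,Evans,Germany,Male", "20,Stephens,United States,Male");

    // Equivalent of "group by country": the key is column 2, the value is the row count.
    static Map<String, Long> countByCountry(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .collect(Collectors.groupingBy(f -> f[2], TreeMap::new, Collectors.counting()));
    }

    // Equivalent of "group by country, gender": the key is "country,gender".
    static Map<String, Long> countByCountryAndGender(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .collect(Collectors.groupingBy(f -> f[2] + "," + f[3], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Print each group as "key,count", keys in sorted order.
        countByCountry(SAMPLE).forEach((k, v) -> System.out.println(k + "," + v));
        countByCountryAndGender(SAMPLE).forEach((k, v) -> System.out.println(k + "," + v));
    }
}
```

Run against the 20 sample rows, this yields the same six counts as the Flink job: 2 and 18 per country, and 1, 1, 7 and 11 for the country+gender groups.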