Why Streams

Created: November-22, 2018

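Let's examine the following two examples for reading a file's contents:

The first uses an asynchronous method to read the file and provides a callback function that is called once the file has been fully read into memory: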

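const fs = require('fs');

// handleError is assumed to be defined elsewhere
fs.readFile(`${__dirname}/utils.js`, (err, data) => {
  if (err) {
    handleError(err);
  } else {
    // data is a Buffer holding the entire file
    console.log(data.toString());
  }
});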

The second uses streams to read the file's contents piece by piece:

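const fs = require('fs');

// Read the file chunk by chunk instead of all at once
const fileStream = fs.createReadStream(`${__dirname}/file`);
let fileContent = '';
fileStream.on('data', data => {
  // Each 'data' event delivers the next chunk as a Buffer
  fileContent += data.toString();
});

fileStream.on('end', () => {
  // No more data: the whole file has been read
  console.log(fileContent);
});

fileStream.on('error', err => {
  // handleError is assumed to be defined elsewhere
  handleError(err);
});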

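It's worth mentioning that both examples do exactly the same thing. So what's the difference?

The first one is shorter and looks more elegant
The second lets you do some processing on the file while it is being read (!)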

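When the files you deal with are small, using streams makes no real difference. But what happens when the file is big? (So big that it takes 10 seconds to read it into memory.)

Without streams you'll be waiting, doing absolutely nothing (unless your process does other things), until the 10 seconds pass and the file is fully read. Only then can you start processing the file.

With streams, you get the file's contents piece by piece, as soon as each piece is available, and that lets you process the file while it is still being read.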
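
To make "processing the file while it is being read" concrete, here is a minimal sketch that counts lines chunk by chunk, so the whole file never has to sit in memory at once. The `countLines` helper and the sample file path are hypothetical names introduced only for this illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: counts newline characters as chunks arrive.
function countLines(filePath) {
  return new Promise((resolve, reject) => {
    let lines = 0;
    fs.createReadStream(filePath)
      .on('data', chunk => {
        // chunk is a Buffer; 10 is the byte value of '\n'
        for (const byte of chunk) {
          if (byte === 10) lines++;
        }
      })
      .on('end', () => resolve(lines))
      .on('error', reject);
  });
}

// Hypothetical sample file, created only so the sketch runs on its own.
const samplePath = path.join(os.tmpdir(), 'streams-sample.txt');
fs.writeFileSync(samplePath, 'first\nsecond\nthird\n');

countLines(samplePath).then(count => console.log(`${count} lines`));
```

The counting work happens inside the 'data' handler, so it overlaps with the disk reads instead of starting only after the last byte has arrived.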

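The example above does not show how streams can be used for work that cannot be done in the callback fashion, so let's look at another example:

I would like to download a gzip file, unzip it and save its contents to disk. Given the file's URL, this is what needs to be done:

Download the file
Unzip the file
Save it to disk

Here's a [small file][1], which is stored in my S3 storage. The following code does the above in the callback fashion.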

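const AWS = require('aws-sdk'); // assumes the AWS SDK for JavaScript v2
const fs = require('fs');
const zlib = require('zlib');

const s3 = new AWS.S3();
const startTime = Date.now();
s3.getObject({Bucket: 'some-bucket', Key: 'tweets.gz'}, (err, data) => {
  if (err) return console.error(err);
  // here, the whole file was downloaded

  zlib.gunzip(data.Body, (err, unzipped) => {
    if (err) return console.error(err);
    // here, the whole file was unzipped

    fs.writeFile(`${__dirname}/tweets.json`, unzipped, err => {
      if (err) return console.error(err);

      // here, the whole file was written to disk
      const endTime = Date.now();
      console.log(`${endTime - startTime} milliseconds`); // 1339 milliseconds
    });
  });
});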

// 1339 milliseconds

Here's how it looks using streams:

s3.getObject({Bucket: 'some-bucket', Key: 'tweets.gz'}).createReadStream()
  .pipe(zlib.createGunzip())
  .pipe(fs.createWriteStream(`${__dirname}/tweets.json`));

// 1204 milliseconds
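
The streams snippet above shows neither error handling nor how the elapsed time was measured, and `.pipe()` on its own does not forward errors from one stage to the next. One possible way to cover both is `stream.pipeline` (available since Node.js 10), sketched below. This is not the original benchmark code: the `gunzipToFile` helper is hypothetical, and a small local gzip file stands in for the S3 object so the sketch runs without AWS credentials.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');
const { pipeline } = require('stream');

// Hypothetical local files standing in for the S3 object and the output file.
const gzPath = path.join(os.tmpdir(), 'tweets-sample.gz');
const outPath = path.join(os.tmpdir(), 'tweets-sample.json');
fs.writeFileSync(gzPath, zlib.gzipSync('{"tweets": []}'));

// Hypothetical helper: read -> gunzip -> write, reporting the elapsed time.
function gunzipToFile(source, destination, callback) {
  const startTime = Date.now();
  pipeline(
    fs.createReadStream(source),
    zlib.createGunzip(),
    fs.createWriteStream(destination),
    err => {
      // Called once: with an error if any stage fails,
      // or without one after the write stream has finished.
      if (err) return callback(err);
      callback(null, Date.now() - startTime);
    }
  );
}

gunzipToFile(gzPath, outPath, (err, elapsed) => {
  if (err) return console.error(err);
  console.log(`${elapsed} milliseconds`);
});
```

With the AWS SDK v2, the first stage could presumably be replaced by `s3.getObject(...).createReadStream()` as in the snippet above, leaving the rest of the pipeline unchanged.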

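Yes, it's not noticeably faster when dealing with small files: the tested file weighs 80KB. Testing this on a bigger file, 71MB gzipped (382MB unzipped), shows that the streams version is much faster.

It took 20925 milliseconds to download 71MB, unzip it and then write 382MB to disk using the callback fashion.
In comparison, it took 13434 milliseconds to do the same using the streams version (about 35% faster, for a file that is not even that big).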