Splitting Article Text into Lines in Swift

qedqed6
5 min read · Jan 29, 2023

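The two most common families of operating systems use different newline sequences:

  • Windows: \r\n
  • Unix and all Unix-like systems: \n

Novels or articles collected from the web may therefore contain

  • a mix of \r\n and \n within the same text.

Stack Overflow: Difference between \n and \r?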
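One detail is worth checking before splitting: Swift treats a carriage return followed by a line feed as a single Character, so a Character-level comparison against "\n" alone does not match it. The short sketch below illustrates this; the variable names are only for illustration.

```swift
let crlf = "\r\n"
print(crlf.count)                 // 1: CR followed by LF forms a single Character
print(crlf.unicodeScalars.count)  // 2: U+000D and U+000A
print(Character("\r\n") == Character("\n"))  // false

// Character.isNewline is true for both "\n" and "\r\n"
let mixed = "a\r\nb\nc"
let lineBreakCount = mixed.filter { $0.isNewline }.count
print(lineBreakCount)  // 2
```

This is why a Character-based split has to test for both "\n" and "\r\n".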

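Suppose we want to split such a string into lines, using the newline characters as delimiters.

The example below uses a string with mixed line endings and splits it with several different methods.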

import Foundation

let text = "Line 1\nLine 2\r\nLine 3\n\nLine 4\r\n\r\nLine 5\r\n\nLine 6\n\r\nLine 7"
print(text)
// Line 1
// Line 2
// Line 3
//
// Line 4
//
// Line 5
//
// Line 6
//
// Line 7

// Case 1: split on Characters; "\r\n" is a single Character, so test for both
print(text.split { $0 == "\n" || $0 == "\r\n" }.map { String($0) })
// ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]

// Case 2: remove every "\r" first, then split on "\n"
print(text.replacingOccurrences(of: "\r", with: "").split(separator: "\n").map { String($0) })
// ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]

// Case 3: split on the newline character set, then drop the empty strings
print(text.components(separatedBy: .newlines).filter { !($0.isEmpty) })
// ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]

// Case 4: enumerateLines, keeping empty lines
print({
    var array = [String]()
    text.enumerateLines { line, _ in
        array.append(line)
    }
    return array
}())
// ["Line 1", "Line 2", "Line 3", "", "Line 4", "", "Line 5", "", "Line 6", "", "Line 7"]

// Case 5: enumerateLines, then drop empty lines
print({
    var array = [String]()
    text.enumerateLines { line, _ in
        array.append(line)
    }
    return array.filter { !($0.isEmpty) }
}())
// ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]

To summarize:

Printed to the screen, the original string looks like this:

Line 1
Line 2
Line 3

Line 4

Line 5

Line 6

Line 7

After splitting with each method (only Case 4 keeps the empty lines):

Case 1 : ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]
Case 2 : ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]
Case 3 : ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]
Case 4 : ["Line 1", "Line 2", "Line 3", "", "Line 4", "", "Line 5", "", "Line 6", "", "Line 7"]
Case 5 : ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6", "Line 7"]
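The difference between Case 4 and Case 5 is only whether empty lines are kept. As a minimal sketch, the two could be folded into one helper; the function name `lines(in:keepingEmptyLines:)` and its parameter are hypothetical and not part of the original code.

```swift
import Foundation

/// Hypothetical helper (not from the original article): splits `text` into
/// lines with enumerateLines and optionally drops the empty ones.
func lines(in text: String, keepingEmptyLines: Bool) -> [String] {
    var result = [String]()
    text.enumerateLines { line, _ in
        if keepingEmptyLines || !line.isEmpty {
            result.append(line)
        }
    }
    return result
}

let sample = "Line 1\nLine 2\r\n\r\nLine 3"
let withBlanks = lines(in: sample, keepingEmptyLines: true)
let withoutBlanks = lines(in: sample, keepingEmptyLines: false)
print(withBlanks)     // ["Line 1", "Line 2", "", "Line 3"]
print(withoutBlanks)  // ["Line 1", "Line 2", "Line 3"]
```

Keeping the empty lines can be useful when blank lines mark paragraph breaks in a novel; dropping them matches the output of Cases 1, 2, 3, and 5.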

The timings below were measured on the iPhone 14 Pro simulator.

Time to split a Chinese novel of about six million characters (6,192,595):

Case 1 Time : 1.738974928855896 seconds
Case 2 Time : 1.6501209735870361 seconds
Case 3 Time : 0.16579699516296387 seconds
Case 4 Time : 0.07893598079681396 seconds
Case 5 Time : 0.10182702541351318 seconds

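Time to split an English novel of roughly six hundred thousand characters (659,380):

Case 1 Time : 0.14956402778625488 seconds
Case 2 Time : 0.13497304916381836 seconds
Case 3 Time : 0.014245986938476562 seconds
Case 4 Time : 0.0065430402755737305 seconds
Case 5 Time : 0.008479952812194824 seconds

In both runs, Cases 3 to 5 were roughly 10 to 20 times faster than Cases 1 and 2, and Case 4 (enumerateLines without filtering) was the fastest.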
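The article does not show how these timings were taken. One simple way to collect comparable numbers is sketched below, assuming a wall-clock measurement with Date; the `measure` helper and the synthetic test string are hypothetical stand-ins, not the original benchmark or the novel text.

```swift
import Foundation

/// Hypothetical helper: runs `work` once, prints the elapsed wall-clock time,
/// and returns the result of `work`.
func measure<T>(_ label: String, _ work: () -> T) -> T {
    let start = Date()
    let result = work()
    let elapsed = Date().timeIntervalSince(start)
    print("\(label) Time : \(elapsed) seconds")
    return result
}

// Synthetic stand-in for a novel: mixed \r\n and \n line endings with blank lines
let text = String(repeating: "Line of text\r\nAnother line\n\n", count: 10_000)

let viaSplit: [String] = measure("Case 1") {
    text.split { $0 == "\n" || $0 == "\r\n" }.map { String($0) }
}
let viaComponents: [String] = measure("Case 3") {
    text.components(separatedBy: .newlines).filter { !$0.isEmpty }
}
print(viaSplit.count, viaComponents.count)  // 20000 20000
```

A single run like this is noisy, so absolute values will vary by device and build configuration; only the relative ordering between methods is meaningful.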
