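This one will do a pretty good job. My definition of a sentence: a sentence begins with a non-whitespace character and ends with a period, exclamation mark, or question mark (or the end of the string). A closing quote may follow the ending punctuation.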
[^.!?\s][^.!?]*(?:[.!?](?!['"]?\s|$)[^.!?]*)*[.!?]?['"]?(?=\s|$)
import java.util.regex.*;

public class TEST {
    public static void main(String[] args) {
        String subjectString = "This is a sentence. " +
            "So is \"this\"! And is \"this?\" " +
            "This is 'stackoverflow.com!' " +
            "Hello World";
        String[] sentences = null;
        Pattern re = Pattern.compile(
            "# Match a sentence ending in punctuation or EOS.\n" +
            "[^.!?\\s]          # First char is non-punct, non-ws\n" +
            "[^.!?]*            # Greedily consume up to punctuation.\n" +
            "(?:                # Group for unrolling the loop.\n" +
            "  [.!?]            # (special) inner punctuation ok if\n" +
            "  (?!['\"]?\\s|$)  # not followed by ws or EOS.\n" +
            "  [^.!?]*          # Greedily consume up to punctuation.\n" +
            ")*                 # Zero or more (special normal*)\n" +
            "[.!?]?             # Optional ending punctuation.\n" +
            "['\"]?             # Optional closing quote.\n" +
            "(?=\\s|$)",
            Pattern.MULTILINE | Pattern.COMMENTS);
        Matcher reMatcher = re.matcher(subjectString);
        while (reMatcher.find()) {
            System.out.println(reMatcher.group());
        }
    }
}
Here is the output:
This is a sentence.
So is "this"!
And is "this?"
This is 'stackoverflow.com!'
Hello World
Correctly matching all of these (with the last sentence having no ending punctuation) turns out to be not as easy as it seems!
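The code above declares a `sentences` array but only prints the matches. If you need the sentences as a collection instead, something like the following might work. It is only a sketch: the class name `SentenceSplitter` and the method `split` are hypothetical names of my own, and it uses the compact one-line form of the pattern, so only `Pattern.MULTILINE` is passed (the `COMMENTS` flag is not needed without the free-spacing comments).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
class SentenceSplitter {
    // The compact pattern from the answer, escaped as a Java string literal.
    private static final Pattern SENTENCE = Pattern.compile(
        "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
        Pattern.MULTILINE);

    // Returns every match of SENTENCE in text, in order of appearance.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "This is a sentence. So is \"this\"! Hello World";
        for (String s : split(text)) {
            System.out.println(s);
        }
    }
}
```

Returning a `List<String>` rather than a `String[]` avoids having to know the number of matches up front; call `toArray(new String[0])` on the result if an array is really required.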