Memo

Why the escape backslash is doubled in a strsplit separator regex... (reposted)

맘편한넘 2013. 8. 12. 13:38
> str = "AAC|Australia Acquisition Corp. - Ordinary Shares|S|N|D|100"
> strsplit(str, "\\|")
[[1]]
[1] "AAC"
[2] "Australia Acquisition Corp. - Ordinary Shares"
[3] "S"
[4] "N"
[5] "D"
[6] "100"

Is \\| equivalent to | ?
Or is \\| equivalent to \| ?
Why does strsplit(str, "\\|") work?


===

Since | has a special meaning in regular expressions, it needs to be escaped. To match a literal |, the actual regular expression is

\|

Since \ in turn is a special character when declaring string literals (you probably recognize it from \n etc.), the \ needs to be escaped itself. I.e., in order to create a string literal containing \| you need

\\|
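
To look at the string-literal layer on its own, here is a minimal sketch in R (the names pattern, n_chars and chars are only for illustration). It checks what the literal actually contains before any regex engine sees it.

```r
# R parses the literal "\\|" into a two-character string.
pattern <- "\\|"

# nchar() counts the characters actually stored: a backslash and a pipe.
n_chars <- nchar(pattern)

# writeLines() shows the raw contents; print() would re-escape the backslash.
writeLines(pattern)

# Splitting on "" breaks the string into its individual characters.
chars <- strsplit(pattern, "")[[1]]
```

Note that print(pattern) displays "\\|" again. That is only how R shows a backslash inside quotes, not a second backslash in the stored string.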
===
 
\\|

Because it's a quoted string. In a quoted string, you can include a " character by escaping it with a \. A \ itself then also has to be escaped to produce a single literal backslash. So your quoted string "\\|" holds the two characters \|.

Now, in a regular expression, | is a special character (alternation) that is not matched literally unless it is escaped. Regular expressions in R also escape with a backslash, so the string literal "\\|" yields the string \|, which is a regular expression matching exactly one |. That is why "\\|" works: it matches the | that separates the fields in the string you're splitting.
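
As a rough sketch of the regex layer, the following R snippet compares the escaped pattern with two common alternatives that avoid the backslash altogether. The variable names by_escape, by_class, by_fixed and by_bare are made up for this example; str is the string from the question.

```r
str <- "AAC|Australia Acquisition Corp. - Ordinary Shares|S|N|D|100"

# Escaped pipe: the regex \| matches a literal "|".
by_escape <- strsplit(str, "\\|")[[1]]

# Character class: inside [...] the pipe has no special meaning.
by_class <- strsplit(str, "[|]")[[1]]

# fixed = TRUE: the separator is treated as a plain string, not a regex.
by_fixed <- strsplit(str, "|", fixed = TRUE)[[1]]

# Bare pipe as a regex: an alternation of two empty patterns, which
# matches the empty string, so the input falls apart into single characters.
by_bare <- strsplit("a|b", "|")[[1]]
```

The first three calls return the same six fields. The last one shows why the escape is needed at all: without it the pipe is read as alternation rather than as a separator.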