I'm for adding regular expressions and other string-handling functions. String manipulation is one of the most common tasks in my programs. Are mzscheme's regular expressions good enough?
Can this tiny Ruby program that I recently wrote for my own use be easily converted to Arc? If not, what additions to the language would make it easy to convert this program? (Please keep the string containing the dates unchanged; don't pamper the language.)
This is largely a library issue - Arc could certainly do with better support for string ops, regexen, and especially datetime manipulation. But here's what I came up with (note that date-days is pretty inaccurate most of the time):
I think you've highlighted at least one gap in Arc's arsenal. Using Arc2:
arc> (ssplit " foo bar ")
Error: "reference to undefined identifier: _ssplit"
You needed to use ssplit, but Arc doesn't have it.
I don't think the importance of string ops should be underestimated. Strings ops are just as essential as numerical ops. A language that cannot effortlessly manipulate strings is a low-level language in my book. If people are supposed to be testing Arc by using it instead of the languages they were using, Arc needs string ops. Can't they be easily lifted from mzscheme?
Remember the thread on implementing Eliza in Lisp and Ruby? No one posted an Arc version.
Oh, sorry, I should have specified: I used Anarki-specific stuff in several places. Mostly the date-manipulation, but also ssplit (I actually hadn't realized that wasn't in arc2. Yikes). Using Anarki, it should work, though.
I totally agree that strings ops are important. If I recall, PG has also said something to this effect, so I wouldn't be surprised if more of them crop up in the next few releases.
A distinction between regexen and strings is actually very handy. I've done a fair bit of coding in Ruby, where this distinction is present, and a fair bit in Emacs Lisp, where it's not.
There are really two places where it's really important. First, if regexen are strings, then you have to double-escape everything. /\.foo/ becomes "\\.foo". /"([^"]|\\"|\\\\)+"/ becomes "\"([^\"]|\\\\"|\\\\\\\\)+\"". Which is preferable?
Second, it's very often useful to treat strings as auto-escaped regexps. For instance,
a_string.split("\D+")
is actually valid Ruby. It's equivalent to
a_string.split("D+")
because D isn't an escape char, which will split the string on the literal string "D+". For example
"BAD++".split("D+") #=> ["BA", "+"]
Now, I'm not convinced that regexen are necessary for nearly as many string operations as they're typically used for. But I think no matter how powerful a standard string library a language has, they'll still be useful sometimes, and then it's a great boon to have literal syntax for them.
Ok, so what it comes down to, is that you don't want escapes to be processed. Wouldn't providing a non-escapable string be far more general, then?
Since '\D+' clashes with quote, maybe /\D+/ is a good choice for the non-escapable string syntax. Only problem is that using it in other places might trigger some reactions as the slashes make everybody think of it as "regex syntax".
Escaping isn't the only thing. Duck typing is also a good reason to differentiate regular expressions and strings. foo.gsub("()", "nil") is distinct from foo.gsub(/()/, "nil"), and both are useful enough to make both usable. There are lots of similar issues - for instance, it would be very useful to make (/foo/ str) return some sort of match data, but that wouldn't be possible if regexps and strings were the same type.
Now we're getting somewhere :) For this argument to really convince me, though, Arc needs better support for user defined types. It should be possible to write special cases of existing functions without touching the core definition. Some core functions use case forms or similar to treat data types differently. Extending those is not really supported. PG has said a couple of times;
"We believe Lisp should let you define new types that are treated just like the built-in types-- just as it lets you define new functions that are treated just like the built-in functions."
Using annotate and rep doesn't feel "just like built-in types" quite yet.
If the "x" modifier is used, whitespace and comments in the regex are ignored.
re =
%r{
# year
(\d {4})
# separator is one or more non-digits
\D+
# month
(\d\d)
# separator is one or more non-digits
\D+
# day
(\d\d)
}x
p "the 1st date, 1984-08-08, was ignored".match(re).captures
--->["1984", "08", "08"]