Arc Forum
How would you code this in Arc?
4 points by lojic 6086 days ago | 8 comments
The following Ruby program parses the leaders HTML page and prints a simple report of the first 10 entries. I wasn't sure whether Arc could scrape a web page yet, so I assumed the input comes from a file. I'm just curious how Arc fares with simple parsing and formatting.

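  # /m lets . match newlines; captures: 1 = rank, 2 = username (inside <u>...</u>), 3 = score
  PATTERN = /(\d\d?)\..*?.+?<u>(.+?)<\/u>.+?(\d+)</m
  sum = count = 0
  File.read("arcleaders.html").scan(PATTERN) do |index, user, score|
    puts "%2.2s. %-15s = %4d" % [index, user, score.to_i]
    sum += score.to_i
    count += 1
    break if count >= 10 # exits scan itself, not just this call of the block
  end
  puts '='*26
  puts 'Total   = %4d' % sum
  puts 'Average = %4d' % (sum / count) # integer division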
Output is as follows (note alignment):

  brian@airstream:~/sync/code/ruby$ ruby arcleaders.rb
   1. almkglor        =  926
   2. sacado          =  600
   3. nex3            =  594
   4. kennytilton     =  454
   5. kens            =  411
   6. lojic           =  369
   7. eds             =  300
   8. cchooper        =  275
   9. absz            =  248
  10. drcode          =  209
  ==========================
  Total   = 4386
  Average =  438
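If you'd like to try the program without a saved copy of the page, here is a minimal Ruby sketch of the same logic wrapped in a method that takes the HTML as a string and returns the report lines. The method name leader_report and the two-row sample markup are made up for illustration (the real leaders page may be laid out differently), and it reports an average of 0 instead of raising when nothing matches.

```ruby
# Same pattern as the program above: rank, username inside <u>...</u>, score.
PATTERN = /(\d\d?)\..*?.+?<u>(.+?)<\/u>.+?(\d+)</m

# Hypothetical helper: builds the report as an array of lines
# instead of printing straight to stdout.
def leader_report(html, limit = 10)
  lines = []
  sum = count = 0
  html.scan(PATTERN) do |index, user, score|
    lines << "%2.2s. %-15s = %4d" % [index, user, score.to_i]
    sum += score.to_i
    count += 1
    break if count >= limit # stops scan itself after `limit` matches
  end
  lines << '=' * 26
  lines << 'Total   = %4d' % sum
  # Integer average; report 0 rather than raising ZeroDivisionError.
  lines << 'Average = %4d' % (count.zero? ? 0 : sum / count)
  lines
end

# Assumed, simplified markup for two rows of the leaders table.
sample = <<~HTML
  <tr><td>1.</td><td><a href="user?id=almkglor"><u>almkglor</u></a></td><td>926</td></tr>
  <tr><td>2.</td><td><a href="user?id=sacado"><u>sacado</u></a></td><td>600</td></tr>
HTML

puts leader_report(sample)
```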


5 points by sacado 6086 days ago

I think regex pattern matching is the one thing that shouldn't be missing from Arc. I can hardly do what you describe without it, and I don't want to implement a regex engine this evening, so let's assume one exists :)

  (= sum 0 count 0)

  (def strn (length str (o before t))
    (let result str
      (while (< (len result) length)
        (= result (if before (+ " " result) (+ result " "))))
      result))
'strn just pads with spaces where needed: if you want str displayed in 4 characters, it adds spaces on the left (or on the right if you pass nil as the extra argument):

  (strn 4 "10") ==> "  10"
  (strn 4 "10" nil) ==> "10  "
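For comparison, 'strn does the same job as Ruby's built-in String#rjust and String#ljust, which is also what the width fields in the Ruby program's format strings handle. A minimal Ruby sketch that mirrors the Arc argument order (borrowing the name strn purely for illustration):

```ruby
# Mirror of the Arc 'strn helper: pad str with spaces to `length` characters,
# on the left by default (right-justify) or on the right when before is false.
# Strings already at least `length` long are returned unchanged.
def strn(length, str, before = true)
  before ? str.rjust(length) : str.ljust(length)
end

p strn(4, "10")        # prints "  10"
p strn(4, "10", false) # prints "10  "
```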
Next, we redefine * so that we can write:

  (* "=" 5) ==> "====="

  (redef * args
    (if (and (is (len args) 2) (isa args.0 'string) (isa args.1 'int))
      (let result ""
        (for i 1 args.1
          (= result (+ result args.0)))
        result)
      (apply old args)))
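Ruby's String#* already repeats a string, which is why the Ruby program can simply write '='*26. The Arc redefinition builds the result by repeated concatenation; roughly the same loop in Ruby, with the built-in alongside for comparison (repeat is a hypothetical name):

```ruby
# Repeat str n times by concatenation, like the Arc redef of * above.
def repeat(str, n)
  result = ""
  n.times { result += str } # builds a new string each time, as (+ result args.0) does
  result
end

p repeat("=", 5) # prints "====="
p "=" * 5        # the built-in gives the same result
```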
Finally, the action callback (called once for each user):

  (def action (index user score)
    (prn (strn 2 index) ". " (strn 15 user nil) " = " (strn 4 score))
    (++ sum (coerce score 'int))
    (++ count))
And the "main" code:

  (w/infile f "arcleaders.html"
    (pmatch "(\\d\\d?)\\..*?.+?<u>(.+?)<\\/u>.+?(\\d+)" action))
  (prn (* "=" 26))
  (prn "Total   = " (strn 4 (coerce sum 'string)))
  ; / on two integers yields a rational (e.g. 2193/5), so truncate it first
  (prn "Average = " (strn 4 (coerce (trunc (/ sum count)) 'string)))
The implementation of pmatch is left as an exercise to the reader :)
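Since pmatch is hypothetical, here is one rough guess at its shape, written in Ruby because Ruby has regexes built in: it takes the text, a pattern and a callback, and calls the callback once per match with the captured groups. The name pmatch, its argument order, and returning the number of matches are all assumptions, not an existing Arc function.

```ruby
# Hypothetical pmatch: call `action` with the capture groups of each match
# of `pattern` in `text`; returns how many matches were found.
def pmatch(text, pattern, action)
  count = 0
  text.scan(pattern) do |captures|
    # With groups, scan yields an array of captures; without, the matched string.
    action.call(*Array(captures))
    count += 1
  end
  count
end

seen = []
action = ->(index, user, score) { seen << [index.to_i, user, score.to_i] }
pattern = /(\d\d?)\..*?.+?<u>(.+?)<\/u>.+?(\d+)</m
html = '<td>1.</td><td><u>almkglor</u></td><td>926</td>' \
       '<td>2.</td><td><u>sacado</u></td><td>600</td>'
pmatch(html, pattern, action)
p seen # prints [[1, "almkglor", 926], [2, "sacado", 600]]
```

One limitation of this callback style: a break inside a lambda only returns from the lambda, so the callback cannot stop pmatch early. That is the same question of exiting the iteration that comes up in the next reply.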

-----

3 points by lojic 6086 days ago

Thanks, but you forgot the important part about only retrieving the first 10 entries. I purposely included that to see how Arc would handle prematurely exiting an iteration.

Also, I think you may have forgotten to read the file. w/infile only opens the file, right?

I realize Arc doesn't have regexes yet, but some folks have been asserting that you can do just fine without them, so I was curious about an Arcy way to solve the parsing without regexes.
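As one possible answer to the no-regex question, here is a rough Ruby sketch that finds each username and score using only String#index, slicing, and character comparisons, the kind of operations that don't need a regex engine. It assumes each username is wrapped in <u>...</u> and that the first run of digits after </u> is the score; the rank comes from the match count rather than being parsed. parse_leaders is a hypothetical name.

```ruby
# Regex-free sketch: returns up to `limit` [rank, user, score] triples.
def parse_leaders(html, limit = 10)
  results = []
  pos = 0
  while results.size < limit
    u_start = html.index("<u>", pos) or break
    u_end = html.index("</u>", u_start) or break
    user = html[(u_start + 3)...u_end]
    # Skip to the first digit after </u>, then read the whole digit run.
    i = u_end + 4
    i += 1 while i < html.size && !html[i].between?("0", "9")
    j = i
    j += 1 while j < html.size && html[j].between?("0", "9")
    break if i == j # no score found after this username
    results << [results.size + 1, user, html[i...j].to_i]
    pos = j # continue searching after this score
  end
  results
end

html = '<tr><td>1.</td><td><u>almkglor</u></td><td>926</td></tr>' \
       '<tr><td>2.</td><td><u>sacado</u></td><td>600</td></tr>'
p parse_leaders(html) # prints [[1, "almkglor", 926], [2, "sacado", 600]]
```

Stopping after the first 10 entries is just the loop condition here, so no early-exit mechanism is needed.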

-----

1 point by sacado 6086 days ago

You're right. For the file's content, I just forgot to pass the file to 'pmatch (the file is its input). As for the 10-entry limit, that's quite trivial, provided you know there are at least 10 values in your file (and given your code, that's the case anyway). Now suppose pmatch matches only one value per call and returns it (instead of a list of all values). Since the file is a parameter, it is easy to know what's left to read:

  (w/infile f "arcleaders.html"
    (while (< count 10)
      (pmatch f "(\\d\\d?)\\..*?.+?<u>(.+?)<\\/u>.+?(\\d+)" action)))

-----

1 point by lojic 6086 days ago

I realize pmatch hasn't been written yet, but this seems odd. You're calling (pmatch f pat action) repeatedly, so where is the state info kept regarding the index of the last match, etc.? Your example reminds me of strtok in this regard, but I doubt that's what you had in mind.

In the Ruby example, scan invokes the supplied block repeatedly, but a break inside the block exits scan itself, which is the desired behavior.
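This is easy to check in isolation: a break inside the block ends the whole scan call, and scan then returns the break value instead of the receiver string. A small Ruby demonstration on made-up input (first_digits is a hypothetical helper):

```ruby
# Collect digits from str, stopping after `limit` of them via break.
def first_digits(str, limit)
  seen = []
  result = str.scan(/\d/) do |digit|
    seen << digit
    break :stopped if seen.size == limit # ends scan, not just this block call
  end
  [seen, result]
end

seen, result = first_digits("a1b2c3d4", 2)
p seen   # prints ["1", "2"]
p result # prints :stopped (without a break, scan with a block returns the receiver)
```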

It's somewhat moot until someone writes pmatch though.

-----

2 points by almkglor 6086 days ago

> so where is the state info kept regarding the index of the last match, etc.?

In the stream state?

-----

1 point by sacado 6086 days ago

Well, the stream f keeps track of its own read position anyway, so the function knows what is left to parse whenever you call it, I guess? Hmm, I'll try to write a dumb version of pmatch and see what happens...
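The "state lives in the stream" idea can be sketched in Ruby with the standard strscan library: a StringScanner remembers where its last match ended, so calling scan_until repeatedly with the same pattern walks through the text one match at a time, and stopping after ten is just a loop condition. This is only an analogy for how a one-match-per-call pmatch might behave; first_leaders is a hypothetical name and nothing here describes an actual Arc function.

```ruby
require 'strscan'

LEADER_PATTERN = /(\d\d?)\..*?.+?<u>(.+?)<\/u>.+?(\d+)</m

# Return up to `limit` [rank, user, score] triples, one scan_until per match.
def first_leaders(html, limit)
  ss = StringScanner.new(html)
  rows = []
  # scan_until advances ss.pos past each match (or returns nil when none is
  # left), so the scanner itself holds the "index of the last match".
  while rows.size < limit && ss.scan_until(LEADER_PATTERN)
    rows << [ss[1].to_i, ss[2], ss[3].to_i] # ss[n] is capture group n
  end
  rows
end

html = '<td>1.</td><td><u>almkglor</u></td><td>926</td>' \
       '<td>2.</td><td><u>sacado</u></td><td>600</td>'
p first_leaders(html, 1) # prints [[1, "almkglor", 926]]
```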

-----

2 points by kens 6086 days ago

As far as I can tell, there's no way to scrape a web page from Arc, as it doesn't let you open outgoing sockets. This is "almost perversely inconvenient" (http://www.paulgraham.com/arcll1.html).

Although I guess there's always (system "wget"). http://arclanguage.org/item?id=3522

-----

2 points by drcode 6084 days ago

Unfortunately, this is a good Arc anti-challenge.

-----