The line deriving(Eq,Show) means that you can now do this:
Prelude BST> Tree 'a' Empty Empty
Tree 'a' Empty Empty
Prelude BST> let a = Tree 'a' Empty Empty
Prelude BST> a
Tree 'a' Empty Empty
Prelude BST> let b = Tree 'b' Empty Empty
Prelude BST> b
Tree 'b' Empty Empty
Prelude BST> a == b
False
Prelude BST> let c = Tree 'c' a b
Prelude BST> let d = Tree 'c' a b
Prelude BST> c == d
True
All this is guaranteed at compile time. Moreover, the haskell version is much more readable. The use of modules removes unnecessary function prefixing, and there is no doubt about what the arguments to functions are when they are pattern matched.
Curiously, though, the arc version actually has less tokens by my measurement
but here is the interesting thing- now lets count just the unique tokens
bst.arc 41
bst.hs 47
This is in part due to haskell's pattern matching semantics where the same function definition is repeated multiple times, and the '|' character is repeatedly used. Also, there are more functions in the arc version, and in both pieces of code, variable names are usually one character and function names have multiple characters.
A more balanced tree will speed up 'find' and 'insert'. If those actions are performed far more often than 'remove', it would indeed be a good idea. Without knowing the use case, I cannot say which I would prefer.