{"id":1242,"date":"2010-12-19T21:51:06","date_gmt":"2010-12-20T02:51:06","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/?p=1242"},"modified":"2010-12-19T21:51:06","modified_gmt":"2010-12-20T02:51:06","slug":"apex-my-editor-project","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2010\/12\/19\/apex-my-editor-project\/","title":{"rendered":"Apex: My Editor Project"},"content":{"rendered":"<p> Lots of people were intrigued by my reference to my editor project. So I&#8217;m sharing the current language design with you folks. I&#8217;m calling it <em>Apex<\/em>, as a homage to the brilliant Acme, which is the editor that comes closest to what I&#8217;d like to be able to use.<\/p>\n<p> So. The language for Apex is sort of a combination of <a href=\"http:\/\/scientopia.org\/blogs\/goodmath\/2010\/11\/30\/the-glorious-horror-of-teco\/\">TECO<\/a> and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Sam_(text_editor)\">Sam<\/a>. Once the basic system is working, I&#8217;m planning on a UI that&#8217;s modeled on <a href=\"http:\/\/en.wikipedia.org\/wiki\/Acme_(text_editor)\">Acme<\/a>. For now, I&#8217;m going to focus on the language.<\/p>\n<p><!--more--><\/p>\n<h3>Goals<\/h3>\n<p> It&#8217;s always good to be clear about what you&#8217;re trying to do. So for the Apex command language, my goals are:<\/p>\n<ol>\n<li> <b>Conciseness<\/b>: since I&#8217;m planning on using this for all of my everyday programming, it&#8217;s really important for it to be concise. It doesn&#8217;t matter if it&#8217;s easy to read if I need to type something like <code>forall match in regexp.match(\"foo\") do match.replace(\"bar\") end<\/code>. It&#8217;s just too damned much typing for an everyday task. In what I describe below, a global search and replace is <code>g\/foo\/,{r'bar'}<\/code>.<\/li>\n<li> <b>Consistency<\/b>: everything works in roughly the same way.  Everything can succeed or fail, and all of the semantics are based around that idea. 
Everything that takes parameters takes parameters in the same way. If something works in one context, it&#8217;ll work in another. <\/li>\n<li> <b>Clarity<\/b>: if you look at the code fragments below, this one takes a bit of explanation. The conciseness of the syntax means that to someone who isn&#8217;t familiar with the language, it&#8217;s going to be absolutely impossible to read. But the way that things work is straightforward, and so once you understand the basic ideas of the syntax, you can easily read a program. It&#8217;s not like TECO where you need to know the specific command in order to have a clue of what it does. And the parser can look at an entire program, and tell you before executing any of it whether it&#8217;s got a syntax error.<\/li>\n<\/ol>\n<h3>Syntax<\/h3>\n<p> The syntax for commands is: <\/p>\n<pre>\nstmt: sub | command\n\nsub: 'sub'  sub_params FUN_IDENT sub_params '{' command '}'\n\nsub_params : '(' ( VAR_IDENT ( ',' VAR_IDENT )*  )?  ')'\n\ncommand : choice_command ( '?' simple_command ':' simple_command )?\n\nchoice_command : seq_command ('^' seq_command)*\n\nseq_command : simple_command ( '&' simple_command )*\n\nsimple_command : atomic_command\n               | '[' command ']'\n\n\natomic_command: ( params )? command_name ( post_params )?\n\t      |  params '!' VAR_IDENT\n\npost_params: post_param (',' post_param)*\n\npost_param: QUOTED_STRING\n          | PATTERN\n          | block\n          | '$(' expr ')'\n\nparams\t: NUMERIC_LITERAL\n        | '(' ( expr ( ',' expr )* )? 
')'\n\nquoted_param: QUOTE_CHAR  ( NON_QUOTE_CHAR )* QUOTE_CHAR\n            | '(' expr ')'\n\n\nexpr : NUMERIC_LITERAL\n     | QUOTED_STRING\n     | funcall\n     | block\n     | command\n     | VAR_IDENT\n\n\nfuncall: params FUN_IDENT ( quoted_param )?\n\nblock : '{' ( '|'  VAR_IDENT (',' VAR_IDENT)*  '|' )?\n            command '}'\n\nFUN_IDENT = '@' [A-Za-z_+-*\/=!%^&amp;&gt;&lt;]+\n\nVAR_IDENT = '$' [A-Za-z_]+\n<\/pre>\n<h3>Commands<\/h3>\n<p> This is a language focused on text editing, so the core of it is built around buffers. All of the language constructs implicitly work on a buffer. Within the buffer, you have a <em>focus<\/em>. The focus is the current location of the cursor. The interesting bit, though, is that the cursor isn&#8217;t necessarily <em>between<\/em> two characters. It can span over a range of text, all of which is under the cursor. In other words, the currently selected range of text and the cursor are the same thing.<\/p>\n<p> Commands all work in terms of either moving the cursor, or modifying the contents of the cursor. Most commands have a long name, and a short abbreviated name.<\/p>\n<dl>\n<dt><b>Cursor Motion<\/b><\/dt>\n<dd>\n<dl>\n<dt><b>Pattern Search<\/b>: <code>s+\/pattern\/<\/code><\/dt>\n<dd> Moves the cursor so that it covers the next instance of the pattern in the current buffer. Returns the start position of the match. There&#8217;s also a &#8220;s-&#8221; version, which looks for the previous instance of the pattern.<\/dd>\n<dt> <b>move<\/b>: <code><em>number<\/em> m <em>unit<\/em><\/code><\/dt>\n<dd> Moves the cursor by a specified distance. The units are <code>c<\/code> (for characters), <code>l<\/code> (for lines), or <code>p<\/code> (for pages). So <code>3ml<\/code> means &#8220;move the cursor forward three lines&#8221;. Returns the start position of the cursor after the move.<\/dd>\n<dt> <b>jump<\/b>: <code><em>number<\/em> j <em>unit<\/em><\/code><\/dt>\n<dd> Jumps the cursor to a specific position. 
The units are the same as for the <code>m<\/code> command, where &#8220;character&#8221; units specify column numbers. Returns the <\/dd>\n<dt><b>extend<\/b>: <code>e <em>motion-command<\/em><\/code> <\/dt>\n<dd> Extend cursor. The cursor is extended by the effect of the following command. So, for example, since <code>3mc<\/code> is a command that means &#8220;move the cursor forward three characters&#8221;, <code>3emc<\/code> is a command that means &#8220;extend the cursor forward by three characters&#8221; &#8211; it moves the end-point of the cursor forward by three, without changing the start. <code>-3eml<\/code> adds the previous three lines to the cursor. <code>es+\/foo\/<\/code> extends the cursor to include the next match for &#8220;foo&#8221;.<\/dd>\n<dt><b>pick<\/b>: <code>(expr, expr)p<\/code><\/dt>\n<dd> Selects a range of text as the current cursor. Each expression is interpreted as a location. <code>(3lj,4pj)p<\/code> covers the range from the beginning of the third line to the end of the fourth page. <code>(s+\/foo\/, s+\/bar\/)p<\/code> covers the range from the beginning of the first match of &#8220;foo&#8221; to the end of the first match of &#8220;bar&#8221;.<\/dd>\n<dt><b>selectall<\/b>: <code>*<\/code><\/dt>\n<dd> Makes the current cursor cover the entire buffer.<\/dd>\n<\/dl>\n<\/dd>\n<dt><b>Edits<\/b><\/dt>\n<dd>\n<dl>\n<dt><b>delete<\/b>: <code>d<\/code><\/dt>\n<dd> Delete the contents of the cursor. If it&#8217;s followed by a variable name, then the deleted text is stored in that variable.<\/dd>\n<dt><b>copy<\/b>: <code>c$var<\/code><\/dt>\n<dd> Copy the contents of the cursor into a variable.<\/dd>\n<dt><b>insert<\/b>: <code>i'text'<\/code> <\/dt>\n<dd> Inserts text <em>before<\/em> the cursor. 
The quote character can actually be any character: the first character after an <code>i<\/code> is the delimiter, and the insert string runs to the next instance of that delimiter.<\/dd>\n<dt><b>append<\/b>: <code>a'text'<\/code><\/dt>\n<dd> Appends text <em>after<\/em> the cursor. Quotes work just like <code>i<\/code>.<\/dd>\n<dt><b>replace<\/b>: <code>r'text'<\/code><\/dt>\n<dd> Replaces the current contents of the cursor with the new text.<\/dd>\n<\/dl>\n<\/dd>\n<dt><b>Control Flow<\/b><\/dt>\n<dd>\n<dl>\n<dt><b>global<\/b>: <code>g\/pattern\/,block<\/code><\/dt>\n<dd> A simple loop construct. For each match of the pattern within the current cursor, execute the block. So, for example, to do a global search and replace of foo with bar, you&#8217;d write <code>* g\/foo\/,{r'bar'}<\/code>.<\/dd>\n<dt><code>stmt ^ stmt<\/code><\/dt>\n<dd> Choice\/logical or statement: any statement can either succeed or fail. <code>^<\/code> allows you to combine statements so that the second one only executes if the first one fails. The statement as a whole succeeds if either the first or second statement succeeds. Returns the value of the statement that succeeds.<\/dd>\n<dt><code>stmt &amp; stmt<\/code><\/dt>\n<dd> Sequencing\/logical and. The second statement will only be executed if the first one succeeds, and the entire statement succeeds only if both succeed. Returns the value of the second statement.<\/dd>\n<dt><code>( stmt )<\/code><\/dt>\n<dd> Should be obvious, eh?<\/dd>\n<dt><code> stmt1 ? stmt2 : stmt3<\/code><\/dt>\n<dd>If-then-else. A simple if-then without an else is just a <code>&amp;<\/code> sequence. You can get an if-then-else effect without this, but it&#8217;s tricky enough to justify adding this.<\/dd>\n<dt><b>loop<\/b>: <code>l{block}<\/code><\/dt>\n<dd> A general loop. Executes the block over and over as long as it succeeds.<\/dd>\n<dt><b>execute<\/b>: <code>x <em>block<\/em><\/code><\/dt>\n<dd> Executes the block on the current cursor. 
The contents of the current cursor become the target buffer for the body of the block, and the cursor is set to position 0 of that target buffer.<\/dd>\n<\/dl>\n<\/dd>\n<dt><b>Variables<\/b><\/dt>\n<dd>\n<dl>\n<dt><code>$ident<\/code><\/dt>\n<dd> Any symbol starting with a <code>$<\/code> is a variable. In an expression, a variable name evaluates to its value.<\/dd>\n<dt><b>set!<\/b>: <code>expr!$ident<\/code><\/dt>\n<dd> Assign the result of executing the preceding expression to a variable. If the variable is already defined in this scope, or in any enclosing scope, update it; otherwise, create a new local variable.<\/dd>\n<\/dl>\n<\/dd>\n<dt><b>External Interaction<\/b><\/dt>\n<dd>\n<dl>\n<dt><code>&lt;'shellcommand'<\/code><\/dt>\n<dd>Execute <code>shellcommand<\/code> in an external shell, and insert the command&#8217;s standard output at the start of the current cursor; then set the cursor to cover the inserted text.<\/dd>\n<dt><code>&lt;&lt;'shellcommand'<\/code><\/dt>\n<dd>Same as the <code>&lt;<\/code> command, except that it also inserts the contents of stderr from the shell command.<\/dd>\n<dt><code>|'shellcommand'<\/code><\/dt>\n<dd>Execute <code>shellcommand<\/code>, with the current cursor as its standard input, and replace the contents of the cursor with the standard output.<\/dd>\n<dt><code>||'shellcommand'<\/code><\/dt>\n<dd>Same as <code>|<\/code>, except that it also inserts the contents of stderr.<\/dd>\n<\/dl>\n<\/dd>\n<dt><b>I\/O<\/b><\/dt>\n<dd>\n<dl>\n<dt><b>write<\/b>: <code>w<\/code><\/dt>\n<dd> Write the current buffer out to a file. If no filename is specified, then use the buffer&#8217;s associated filename. 
If a filename is specified, then write the buffer to that file, and update the buffer&#8217;s filename to match the written name.<\/dd>\n<dt><b>open<\/b>: <code>o'filename'<\/code><\/dt>\n<dd> Open a file in a new buffer.<\/dd>\n<dt><b>revert<\/b>: <code>v<\/code><\/dt>\n<dd> Discard all changes to this buffer.<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<h3>Expressions<\/h3>\n<p> In general, any command is also usable as an expression. Every command returns a value: motion commands return the new cursor position; edit commands return any deleted text, or the size of the change.<\/p>\n<p> Control statements don&#8217;t depend on true and false values; instead, they&#8217;re defined in terms of success and failure. Any statement can succeed or fail.<\/p>\n<p> Arithmetic is done using built-in functions.<\/p>\n<h3>Blocks<\/h3>\n<p> A lot of statements take <em>block<\/em> parameters. A block is an executable code fragment. Blocks are enclosed in braces. They always implicitly take the current cursor as a parameter. In the case of the &#8220;x&#8221; and &#8220;g&#8221; commands, the block is executed using the current selection as if it were the entire buffer. In addition to the selection, a block can take additional parameters. They&#8217;re written by enclosing them in &#8220;|&#8221;s at the beginning of the block. For example, you could define a block that returned the sum of its parameters by writing:<\/p>\n<pre>\n  {|$x, $y| ($x,$y)@+ }\n<\/pre>\n<p> Parameters for a block <em>precede<\/em> its call. So to invoke the block above, you could use: <code>(3,2)x{|$x,$y| ($x,$y)@+}<\/code>, which would then return 5.<\/p>\n<p> Blocks are lexically scoped; a block declared inside of another block can access variables from that enclosing block.<\/p>\n<p> You can declare named subroutines. A named subroutine is mostly syntactic sugar for a block. 
The main difference is that if you go to the trouble of creating a named subroutine, then you can declare both prefix and postfix parameters. The names of named subroutines always start with an &#8220;@&#8221; symbol. A named subroutine just associates a global name with a block. <\/p>\n<pre>\n  fun ($x) @fact {($x,0)@= ? {1} : { ($x, ($x,1)@-@fact)@* } }\n<\/pre>\n<p> When calling a block, the parameters precede it. So to get the factorial of 10, you&#8217;d write <code>10@fact<\/code>.<\/p>\n<p> For numeric arguments to commands, you can just put the expression before the command instead of a number. For example, to jump to line fact(4), you&#8217;d write: <code>4@fact jl<\/code>. For string parameters that appear in quoted positions, if you use a &#8220;$()&#8221; instead of a quote character, then the contents are evaluated as an expression, and the result is used as the string parameter value. So to insert the <em>string<\/em> &#8220;5@fact&#8221;, you could write <code>i'5@fact'<\/code>. To insert the result of evaluating it, you&#8217;d write &#8220;<code>i$(5@fact)<\/code>&#8221;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Lots of people were intrigued by my reference to my editor project. So I&#8217;m sharing the current language design with you folks. I&#8217;m calling it Apex, as a homage to the brilliant Acme, which is the editor that comes closest to what I&#8217;d like to be able to use. So. 
The language for Apex is [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[54],"tags":[103,146,190,226,242],"class_list":["post-1242","post","type-post","status-publish","format-standard","hentry","category-programming","tag-apex","tag-editor","tag-language","tag-sam","tag-teco"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-k2","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/1242","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=1242"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/1242\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=1242"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=1242"},{"taxonomy":"post_tag","embeddable":true,"href":
"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=1242"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}