-
Notifications
You must be signed in to change notification settings - Fork 7
Expand file tree
/
Copy pathchapter3.tex
More file actions
1203 lines (985 loc) · 50.3 KB
/
chapter3.tex
File metadata and controls
1203 lines (985 loc) · 50.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
%% Thinking Forth
%% Copyright (C) 2004 Leo Brodie
%% Initial transcription by Joseph Knapka, based on OCR output
%% provided by John Hogerhuis
%%
%\chapternum{THREE}
\chapter{Preliminary Design/\hy Decomposition}\Chapmark{3}%
\index{P!Preliminary Design|(}%
\initial Assuming you have some idea of what your program should
accomplish, it's time to begin the design. The first stage,
preliminary design, focuses on shrinking your mountainous problem into
manageable molehills.
In this chapter we'll discuss two ways to decompose your \Forth{}
application.
\section{Decomposition by Component}%
\index{C!Components:!decomposition by|(}%
\index{D!Decomposition!by component|(}%
\index{P!Preliminary Design!decomposition by component|(}%
Has this sort of thing ever happened to you? You've been planning for
three months to take a weekend vacation to the mountains. You've been
making lists of what to bring, and daydreaming about the slopes.
Meanwhile you're deciding what to wear to your cousin's wedding
next Saturday. They're informal types, and you don't want to overdress.
Still, a wedding's a wedding. Maybe you should rent a tuxedo anyway.
For all this planning, it's not until Thursday that you realize the
two events coincide. You have expletives for such moments.
How is such a mental lapse possible, in one so intelligent as
yourself? Apparently the human mind actually makes links between
memories. New ideas are somehow added onto existing paths of related
thoughts.
\wepsfiga{fig3-1}{Pools of thought not yet linked}
In the mishap just described, no connection was ever made between the
two separately-linked pools of thought until Thursday. The conflict
probably occurred when some new input (something as trivial as hearing
Saturday's weather report) got linked into both pools of thought. A
lightning flash of realization arced between the pools, followed
inexorably by thunderous panic.
A simple tool has been invented to avoid such disasters. It's called a
calendar. If you had recorded both plans in the same calendar, you would
have seen the other event scheduled, something your brain failed to do
for all its intricate magnificence.
\begin{tip}
To see the relationship between two things, put them close
together. To remind yourself of the relationship, \emph{keep} them
together.
\end{tip}
These truisms apply to software design, particularly to the preliminary
design phase. This phase is traditionally the one in which the designer
dissects a large application into smaller, programmer-sized modules.
In \Chap{1} we discovered that applications can be
conveniently decomposed into components.
\begin{tip}
The goal of preliminary design is to determine what components are
necessary to accomplish the requirements.
\end{tip}
For instance, you might have an application in which events must occur
according to some predetermined schedule. To manage the scheduling,
you might first design a few words to constitute a ``schedule-building
lexicon.'' With these words you'll be able to describe the order of events
that must occur within your application.
Thus within a single component, you'll not only share information,
but also work out potential conflicts. The wrong approach would be to let
each functional module ``know'' things about its schedule that could
potentially conflict with another module's schedule.
How can you know, in designing a component, what commands the
using components will need? Admittedly, this is something of a ``chicken
vs. egg'' problem. But \Forth{} programmers handle it the same way
chickens and eggs do: iteratively.
If the component is well-designed, completeness doesn't matter. In
fact, a component need only suffice for the current iteration's design. No
component should be considered a ``closed book'' until the application
has been completed---which, in the case of maintained applications, is
never.
As an example, imagine that your product needs to ``talk'' to other
machines in the outside world via a universal I/O chip that is part of your
system. This particular chip has a ``control register'' and a ``data
register.'' In a badly designed application, pieces of code throughout the
program would access the communication chip by simply invoking the
\code{OUT} instruction to put an appropriate command byte into the command
register. This makes the entire application needlessly dependent on that
particular chip---very risky.
Instead, \Forth{} programmers would write a component to control the I/O
chip. These commands would have logical names and a convenient
interface (usually \Forth{}'s stack) to allow usage by the rest of the
application.
For any iteration of your product's design, you would implement only
the commands needed so far---not all the valid codes that could be
sent to the ``control register.'' If later in the project cycle you
realize that you need an additional command, say one to change the
baud rate, the new command would be added to the I/O chip lexicon, not
to the code needed to set the baud rate. There's no penalty for
making this change except the few minutes (at most) it takes to edit
and recompile.
\begin{tip}
Within each component, implement only the commands needed for the
current iteration. (But don't preclude future additions.)
\end{tip}
What goes on inside a component is pretty much its own business. It's
not necessarily bad style for definitions within the component to share
redundant information.
For instance, a record in a certain data structure is fourteen bytes
long. One definition in the component advances a pointer 14 bytes to
point to the next record; another definition decrements the pointer 14
bytes.
As long as that number 14 remains a ``secret'' to the component and
won't be used elsewhere, you don't need to define it as constant. Just use
the number 14 in both definitions:
\begin{Code}
: +record 14 record# +! ;
: -record -14 record# +! ;
\end{Code}
On the other hand, if the value will be needed outside of the component,
or if it's used several times within the component and there's a good
chance that it will change, you're better off hiding it behind a name:
\begin{Code}
14 Constant /record
: +record /record record# +! ;
: -record /record negate record# +! ;
\end{Code}
(The name \forth{/RECORD}, by convention, means ``\/bytes per record.'')
\section{Example: A Tiny Editor}\program{editor1}
Let's apply decomposition by component to a real problem. It would be
nice to design a large application right here in \Chap{3}, but alas, we
don't have the room and besides, we'd get sidetracked in trying to
understand the application.
Instead, we'll take a component from a large application that has
already been decomposed. We'll design this component by decomposing it
further, into subcomponents.
Imagine that we must create a tiny editor that will allow users to
change the contents of input fields on their terminal screen. For instance,
the screen might look like this:
%!! no bf cmtt font (use Courier?)
\begin{quote}
\textsf{\textbf{Name of Member}\quad \fbox{Justine Time\underline{~}\quad}}
\end{quote}
The editor will provide three modes for users to change the contents of
the input field:
%!! original style: label in italics, no indentation
\begin{description}
\item[Overwrite.] Typing ordinary characters overwrites any characters
that were there before.
\item[Delete.] Pressing the combination of keys ``Ctrl D'' deletes the
character under the cursor and slides the remaining characters
leftwards.
\item[Insert.] Pressing the combination of keys ``Ctrl I'' switches the
editor into ``Insert Mode,'' where subsequently typing ordinary
characters inserts them at the cursor position, sliding the remaining
characters rightwards.
\end{description}
As part of the conceptual model we should also consider the error or
exception-handling; for instance, what is the limit of the field? what
happens in insert mode when characters spill off the right? etc.
That's all the specification we have right now. The rest is up to us.
Let's try to determine what components we'll need. First, the editor
will react to keys that are typed at the keyboard. Therefore we'll
need a keystroke interpreter---some kind of routine that awaits
keystrokes and matches them up with a list of possible operations. The
keystroke interpreter is one component, and its lexicon will consist
of a single word. Since that word will allow the editing of a field,
let's call the word \forth{EDIT}.
The operations invoked by the keystroke interpreter will comprise a
second lexicon. The definitions in this lexicon will perform the
various functions required. One word might be called \forth{DELETE}, another
\forth{INSERT}, etc. Since each of these commands will be invoked by the
interpreter, each of them will process a single keystroke.
Below these commands should lie a third component, the set of
words that implement the data structure to be edited.
\wepsfiga{fig3-2}{Generalized decomposition of the Tiny Editor problem.}
Finally, we'll need a component to display the field on the video
screen. For the sake of simplicity, let's plan on creating one word only,
\forth{REDISPLAY}, to redisplay the entire field after each key is pressed.
\begin{Code}
: editor BEGIN key revise redisplay ... UNTIL ;
\end{Code}
This approach separates revising the buffer from updating the display.
For now, we'll only concentrate on revising the buffer.
Let's look at each component separately and try to determine the words
each will need. We can begin by considering the events that must occur
within the three most important editing functions: overwriting,
deleting, and inserting. We might draw something like the following on
the back of an old pizza menu (we won't pay much attention to
exception-handling in the present discussion):
\begin{description}
%!!small layout differences between the '84 and '94 edition; using '94 layout
\item[To Overwrite:]~\\[\smallskipamount]
\begin{tabular}{@{}p{2.1in}>{\ttfamily}p{2.1in}}
Store new character into byte pointer to by pointer.
Advance pointer (unless at end of field).
& \parbox[t]{2.1in}{
F U N \fwbox{\pointto{K}} T I O N A L I T Y\\
F U N \fwbox{\pointto{C}} T I O N A L I T Y\\
F U N C \pointto{T} I O N A L I T Y}\\
\end{tabular}
\item[To Delete:]~\\[\smallskipamount]
\begin{tabular}{@{}p{2.1in}>{\ttfamily}p{2.1in}}
Copy leftwards, by one place, the string
beginning one place to the right of the pointer.
Store a ``blank'' into the last
position on the line.
& \parbox[t]{2.1in}{
F U N C T I O N \pointto{S} \fwbox{A L I T Y}\\
F U N C T I O N \fwbox{\pointto{A} L I T Y} Y\\
F U N C T I O N \pointto{A} L I T Y \fwbox{\newlength{\charheight}\settoheight{\charheight}{K}\rule{0cm}{\charheight} }}\\
\end{tabular}
% Note, pointers-and-boxes omitted in this diagram for some reason.
\item[To Insert:]~\\[\smallskipamount]
\begin{tabular}{@{}p{2.1in}>{\ttfamily}p{2.1in}}
Copy rightwards, by one place,
the string beginning at the pointer.
%!!'84 ed: the string beginning one place\\ to the right of the pointer.
Store new character into
byte pointed to by pointer.
Advance pointer (unless at end of field).
& \parbox[t]{2.1in}{
F U N \fwbox{\pointto{T} I O N A L I T Y}\\
F U N \pointto{T} \fwbox{T I O N A L I T Y}\\
F U N \fwbox{\pointto{C}} T I O N A L I T Y\\
F U N C \pointto{T} I O N A L I T Y} \\
\end{tabular}
\end{description}
We've just developed the algorithms for the problem at hand.
Our next step is to examine these three essential procedures, looking
for useful ``names''---that is procedures or elements which can
either:
\begin{enumerate}
\item possibly be reused, or
\item possibly change
\end{enumerate}
We discover that all three procedures use something called a ``pointer.''
We need two procedures:
\begin{enumerate}
\item to get the pointer (if the pointer itself is relative, this
function will perform some computation).
\item to advance the pointer
%!!next line not part of enumeration (not indented in '94 edition).
\suspend{enumerate}
Wait, three procedures:
\resume{enumerate}
\item to move the pointer backwards
\end{enumerate}
because we will want ``cursor keys'' to move the cursor forward and back
without editing changes.
These three operators will all refer to a physical pointer somewhere
in memory. Where it is kept and how (relative or absolute) should be
hidden within this component.
Let's attempt to rewrite these algorithms in code:
\begin{Code}
: key# ( returns value of key last pressed ) ... ;
: position ( returns address of character pointed-to) ;
: forward ( advance pointer, stopping at last position) ;
: backward ( decrement pointer, stopping at first position) ;
: overwrite key# position c! forward ;
: insert slide> overwrite ;
: delete slide< blank-end ;
\end{Code}
To copy the text leftwards and rightwards, we had to invent two new
names as we went along, \forth{SLIDE<} and \forth{SLIDE>} (pronounced
``slide-backwards'' and ``slide-forwards'' respectively). Both of them
will certainly use \forth{POSITION}, but they also must rely on an element
we've deferred considering: a way to ``know'' the length of the
field. We can tackle that aspect when we get to writing the third
component. But look at what we found out already: we can describe
``Insert'' as simply ``\forth{SLIDE> OVERWRITE}''.
In other words, ``Insert'' actually \emph{uses} ``Overwrite'' even though
they appear to exist on the same level (at least to a Structured
Programmer).
Instead of probing deeper into the third component, let's lay out
what we know about the first component, the key interpreter. First we
must solve the problem of ``insert mode.'' It turns out that ``insert'' is
not just something that happens when you press a certain key, as
delete is. Instead it is a \emph{different way of interpreting}
some of the possible keystrokes.
For instance in ``overwrite'' mode, an ordinary character gets stored
into the current cursor position; but in ``insert mode'' the remainder
of the line must first be shifted right. And the backspace key works
differently when the editor is in Insert Mode as well.
Since there are two modes, ``inserting'' and ``not-inserting,'' the
keystroke interpreter must associate the keys with two possible sets of
named procedures.
We can write our keystroke interpreter as a decision table (worrying
about the implementation later):
\bigskip
\begin{tabular}{>{\ttfamily}l>{\ttfamily}l>{\ttfamily}l}
\emph{\textrm{Key}} &\emph{\textrm{Not-inserting}}& \emph{\textrm{Inserting}}\\
Ctrl-D & DELETE & INSERT-OFF\\
Ctrl-I & INSERT-ON & INSERT-OFF\\
backspace & BACKWARD & INSERT< \\
left-arrow & BACKWARD & INSERT-OFF\\
right-arrow & FORWARD & INSERT-OFF\\
return & ESCAPE & INSERT-OFF\\
any printable & OVERWRITE & INSERT \\
\end{tabular}
\bigskip
\noindent We've placed the possible types of keys in the left column,
what they do normally in the middle column, and what they do in
``insert mode'' in the right column.
To implement what happens when ``backspace'' is pressed while in
Insert Mode, we add a new procedure:
\begin{Code}
: insert< backward slide< ;
\end{Code}
(move the cursor backwards on top of the last character typed, then slide
everything to the right leftward, covering the mistake).
This table seems to be the most logical expression of the problem at
the current level. We'll save the implementation for later (\Chap{8}).
Now we'll demonstrate the tremendous value of this approach in
terms of maintainability. We'll throw ourselves a curve---a major change
of plans!
\section{Maintaining a Component-based Application}
How well will our design fare in the face of change? Envision the following
scenario:
We originally assumed that we could refresh the video display simply
by retyping the field every time a key is pressed. We even implemented
the code on our personal computer, with its memory-mapped video that
refreshes an entire line in the blink of a scan cycle. But now our
customer wants the application to run on a telephone-based network,
with all I/O being done at a not-so-fast baud rate. Since some of our
input fields are almost as wide as the video screen, maybe 65
characters, it just takes too long to refresh the entire line on every
key stroke.
We've got to change the application so that we only refresh that
part of the field that actually changes. In ``insert'' and ``delete,'' this
would mean the text to the right of the cursor. In ``overwrite'' it would
mean changing just the single character being overwritten.
This change is significant. The video refresh function, which we
cavalierly relegated to the key interpreter, now must depend on which
editing functions occur. As we've discovered, the most important names
needed to implement the key interpreter are:
\begin{Code}[fontfamily=cmss]
FORWARD
BACKWARD
OVERWRITE
INSERT
DELETE
INSERT<
\end{Code}
None of their descriptions make any reference to the video refresh
process, because that was originally assumed to happen later.
But things aren't as bad as they seem. Looking at it now, the process
\forth{OVERWRITE} could easily include a command to type the new
character where the terminal's cursor is. And \forth{SLIDE<} and
\forth{SLIDE>} could include commands to type everything to the right
of, and including, \forth{POSITION}, then reset the terminal's cursor
to its current position.
Here are our revised procedure names. The commands just added
are in boldface:
%!! "KEY# EMIT" and "RETYPE" in boldface
\begin{Code}[commandchars=\&\{\}]
: overwrite key# position c! &poorbf{key# emit} forward ;
: &poorbf{retype ( type from current position to}
&poorbf{end of field and reset cursor) ;}
: insert slide> &poorbf{retype} overwrite ;
: delete slide< blank-end &poorbf{retype} ;
\end{Code}
Since these are the only three functions that change memory, they are
the only three functions that need to refresh the screen. This idea is
critical. We must be able to make such assertions to assure program
correctness. The assertion is intrinsic to the nature of the problem.
Note that the additional problem of video refresh adds an additional
``pointer'': the current cursor position on the screen. But decomposition
by component has encouraged us to view the \forth{OVERWRITE} process as
changing both the data field and the video vision of it; similarly with
\forth{SLIDE<} and \forth{SLIDE>}. For this reason it seems natural
now to maintain only one real pointer---a relative one---from which we
can compute either the data address in memory, or the column number on
the screen.
Since the nature of the pointer is wholly hidden within the three
processes \forth{POSITION}, \forth{FORWARD}, and \forth{BACKWARD}, we
can readily accommodate this approach, even if it wasn't our first
approach.
This change may have seemed simple enough here---even obvious. If so,
it's because the technique ensures flexible design. If we had used a
traditional approach---if we had designed according to structure, or
according to data transformation through sequential processes---our
brittle design would have been shattered by the change.
To prove this assertion, we'll have to start all over again from
scratch.
\section{Designing and Maintaining a Traditional Application}
Let's pretend we haven't studied the Tiny Editor problem yet, and
we're back with a minimal set of specs. We'll also start with our
initial assumption, that we can refresh the display by retyping the
entire field after each keystroke.
According to the dictum of top-down design, let's take the
widest-angle view possible and examine the problem. \Fig{fig3-3}
depicts the program in its simplest terms. Here we've realized that
the editor is actually a loop which keeps getting keystrokes and
performing some editing function, until the user presses the return
key.
\wepsfigb{fig3-3}{The traditional approach: view from the top.}
Inside the loop we have three modules: getting a character from the
keyboard, editing the data, and finally refreshing the display to match
the data.
Clearly most of the work will go on inside ``Process a Keystroke.''
Applying the notion of successive refinement, \fig{fig3-4} shows the
editor problem redrawn with ``Process a Keystroke'' expanded. We find it
takes several attempts before we arrive at this configuration. Designing
this level forces us to consider many things at once that we had deferred
till later in the previous try.
\wepsfigt{fig3-4}{A structure for ``Process a Keystroke.''}
For instance, we must determine all the keys that might be pressed.
More significantly, we must consider the problem of ``insert mode.'' This
realization forces us to invent a flag called \forth{INSERT-MODE} which gets
toggled by the ``Ctrl I'' key. It's used within several of the structural
lines to determine how to process a type of key.
A second flag, called \forth{ESCAPE}, seems to provide a nice structured
way of escaping the editor loop if the user presses the return key while
not in insert mode.
Having finished the diagram, we're bothered by the multiple tests
for Insert Mode. Could we test for Insert Mode once, at the beginning?
Following this notion, we draw yet another chart (\fig{fig3-5}).
As you can see, this turns out even more awkward than the first
figure. Now we're testing for each key twice. It's interesting though,
how the two structures are totally different, yet functionally
equivalent. It's enough to make one wonder whether the control
structure is terribly relevant to the problem.
\wepsfigt{fig3-5}{Another structure for ``Process a Keystroke.''}
Having decided on the first structure, we've finally arrived at the
most important modules---the ones that do the work of overwriting,
inserting, and deleting. Take another look at our expansion of
``Process a Character'' in \fig{fig3-4}. Let's consider just one of the
seven possible execution paths, the one that happens if a printable
character is pressed.
In \fig{fig3-6}(a) we see the original structural path for a printable
character.
Once we figure out the algorithms for overwriting and inserting
characters, we might refine it as shown in \fig{fig3-6}(b). But look at
that embarrassing redundancy of code (circled portions). Most
competent structured programmers would recognize that this redundancy
is unnecessary, and change the structure as shown in \fig{fig3-6}(c).
Not too bad so far, right?
\subsection{Change in Plan}
Okay, everyone, now act surprised. We've just been told that this
application won't run on a memory-mapped display. What does this
change do to our design structure?
\wepsfigt{fig3-6}{The same section, ``refined'' and ``optimized.''}
Well, for one thing it destroys ``Refresh Display'' as a separate
module. The function of ``Refresh Display'' is now scattered among the
various structural lines inside ``Process a Keystroke.'' The structure of
our entire application has changed. It's easy to see how we might have
spent weeks doing top-down design only to find we'd been barking down
the wrong tree.
What happens when we try to change the program? Let's look again
at the path for any printable character.
\fig{fig3-7} (a) shows what happens to our first-pass design when we
add refresh. Part (b) shows our ``optimized'' design with the refresh
modules expanded. Notice that we're now testing the Insert flag twice
within this single leg of the outer loop.
But worse, there's a bug in this design. Can you find it?
In both cases, overwriting and inserting, the pointer is incremented
\emph{before} the refresh. In the case of overwrite, we're displaying
the new character in the wrong position. In the case of insert, we're
typing the remainder of the line but not the new character.
Granted, this is an easy problem to fix. We need only move the refresh
modules up before ``Increment Pointer.'' The point here is: How did we
miss it? By getting preoccupied with control flow structure, a
superficial element of program design.
\wepsfigt{fig3-7}{Adding refresh.}
In contrast, in our design by components the correct solution fell
out naturally because we ``used'' the refresh component inside the editing
component. Also we used \forth{OVERWRITE} inside \forth{INSERT}.
By decomposing our application into components which use one another,
we achieved not only \emph{elegance} but a more direct path to
\emph{correctness}.\program{editor2}
\section{The Interface Component}%
\index{I!Interface component|(}
In computer science terminology, interfacing between modules has two
aspects. First, there's the way other modules invoke the module; this is
the control interface. Second, there's the way other modules pass and
receive data to and from the module; this is the data interface.
Because of \Forth{}'s dictionary structure, control is not an issue.
Definitions are invoked by being named. In this section, when we use the
term ``interface'' we're referring to data.
When it comes to data interfaces between modules, traditional wisdom
says only that ``interfaces should be carefully designed, with a
minimum of complexity.'' The reason for the care, of course, is that
each module must implement its own end of the interface (\fig{fig3-8}).
This means the presence of redundant code. As we've seen, redundant
code brings at least two problems: bulky code and poor maintainability.
A change to the interface of one module will affect the interface
of the opposite module.
\wepsfiga{fig3-8}{Traditional view of the interface as a junction.}
There's more to good interface design than that. Allow me to introduce
a design element which I call the ``interface component.'' The purpose
of an interface component is to implement, and
\emph{hide information about},%
\index{I!Information-hiding}
the data interface between two or more other components
(\fig{fig3-9}).
\wepsfiga{fig3-9}{Use of the interface component.}
\begin{tip}
Both data structures and the commands involved in the communication of
data between modules should be localized in an interface component.
\end{tip}
Let me give an example from my own recent experience. One of my hobbies
is writing text formatter/editors. (I've written two of them, including
the one on which I am writing this book.)
In my latest design the formatter portion contains two components.
The first component reads the source document and decides where to
make line and page breaks, etc. But instead of sending the text directly to
the terminal or printer, it saves up a line's worth at a time in a ``line
buffer.''
Similarly, instead of sending printer-control commands---for
bold-facing, underlining, etc.---as the text is being formatted, it
defers these commands until the text is actually sent. To defer the
control commands, I have a second buffer called the ``attribute
buffer.'' It corresponds, byte-for-byte, with the line buffer, except
that each byte contains a set of flags that indicate whether the
corresponding character should be underlined, boldfaced, or whatever.
The second component displays or prints the contents of the line
buffer. The component knows whether it is transmitting to the terminal
or to the printer, and outputs the text according to the attributes
indicated by the attribute buffer.
Here we have two well-defined components---the line-formatter and
the output component---each one shouldering part of the function of the
formatter as a whole.
The data interface between these two components is fairly complex.
The interface consists of two buffers, a variable that indicates the
current number of valid characters, and finally a ``knowledge'' of what
all those attribute patterns mean.
In \Forth{} I've defined these elements together in a single screen. The
buffers are defined with \forthb{CREATE}, the count is an ordinary
\forthb{VARIABLE}, and the attribute patterns are defined as
\forthb{CONSTANT}s, such as:
\begin{Code}
1 Constant underness ( bit mask for underlining)
2 Constant boldness ( bit mask for boldface)
\end{Code}
The formatting component uses phrases like \forth{UNDERNESS SET-FLAG}
to set bits in the attribute buffer. The output component uses phrases
like \forth{UNDERNESS AND} to read the attribute buffer.
\subsection{A Design Mistake}
In designing an interface component, you should ask yourself ``What is
the set of structures and commands that must be shared by the
communicating components?'' It's important to determine what elements
belong to the interface and what elements should remain within a
single component.
In writing my text formatter, I failed to answer this question fully
and found myself with a bug. The problem was this:
I allow different type widths to be used: condensed, double width,
etc. This means not only sending different signals to the printer, but
changing the number of characters allowed per line.
I keep a variable, called \forth{WALL}, for the formatter.
\forth{WALL} indicates the right margin: the point beyond which no
more text can be set. Changing to a different type width means
changing the value of \forth{WALL} proportionately. (Actually, this
turns out to be a mistake in itself. I should be using a finer unit of
measurement, the number of which remains constant for the line.
Changing type widths would mean changing the number of units per
character. But getting back to the mistake at hand\dots)
Alas, I was also using \forth{WALL} inside the output component to
determine how many characters to display. My reasoning was that this value
would change depending on what type-width I was using.
I was right---99\% of the time. But one day I discovered that, under
a certain condition, a line of condensed text was being somehow cut
short. The final couple of words were just missing. The reason turned out
to be that \forth{WALL} was getting changed before the output component had
a chance to use it.
Originally I had seen nothing wrong with letting the output component
blithely use the formatter's \forth{WALL} as well. Now I realized that the
formatter had to leave a separate variable for the output component,
to indicate how many valid characters were in the buffers. This would
leave any subsequent font commands free to change \forth{WALL}.
It was important that the two buffers, the attribute commands, and the
new variable were the \emph{only} elements that could be shared
between the two modules. Reaching into either module from the other
one spells trouble.
The moral of this story is that we must distinguish between data
structures that are validly used only within a single component and those
that may be shared by more than one component.
A related point:
\begin{tip}
Express in objective units any data to be shared by components.
\end{tip}
For example:
\begin{itemize}
\item Module A measures the temperature of the oven.
\item Module B controls the burner.
\item Module C makes sure the door is locked if the oven is too hot.
\end{itemize}
The information of global interest is the temperature of the oven,
expressed objectively in degrees. While Module A might receive a value
representing the voltage from a heat sensor, it should convert this
value to degrees before presenting it to the rest of the application.%
\index{C!Components:!decomposition by|)}%
\index{D!Decomposition!by component|)}%
\index{I!Interface component|)}%
\index{P!Preliminary Design!decomposition by component|)}
\section{Decomposition by Sequential Complexity}%
\index{D!Decomposition!by sequential complexity|(}%
\index{P!Preliminary Design!decomposition by sequential complexity|(}%
\index{S!Sequential complexity, decomposition by|(}
We've been discussing one way to do decomposition: according to
components. The second way is according to sequential complexity.
One of \Forth{}'s rules is that a word must already have been defined to
be invoked or referred to. Usually the sequence in which words are
defined parallels the order of increasing capabilities which the words
must possess. This sequence leads to a natural organization of the
source listing. The powerful commands are simply added on top of the
elementary application (\fig{fig3-10}a).
Like a textbook, the elementary stuff comes first. A newcomer to
the project would be able to read the elementary parts of the code before
moving on the advanced stuff.
\wepsfigt{fig3-10}{Two ways to add advanced capabilities.}
But in many large applications, the extra capabilities are best
implemented as an enhancement to some private, root function in the
elementary part of the application (\fig{fig3-10}b). By being able to
change the root's capability, the user can change the capability of all the
commands that use the root.
Returning to the word processor for an example, a fairly primitive
routine is the one that starts a new page. It's used by the word that
starts a new line; when we run out of lines we must start a new
page. The word that starts a new line, in turn, is used by the routine
that formats words on the line; when the next word won't fit on the
current line, we invoke \forth{NEWLINE}. This ``uses'' hierarchy demands that
we define \forth{NEWPAGE} early in the application.
The problem? One of the advanced components includes a routine that
must be invoked by \forth{NEWPAGE}. Specifically, if a figure or table
appears in the middle of text, but at format time won't fit on what's
left of the page, the formatter defers the figure to the next page
while continuing with the text. This feature requires somehow
``getting inside of'' \forth{NEWPAGE}, so that when \forth{NEWPAGE} is
next executed, it will format the deferred figure at the top of the
new page:
\begin{Code}
: newpage ... ( terminate page with footer)
( start new page with header) ... ?holdover ... ;
\end{Code}
How can \forth{NEWPAGE} invoke \forth{?HOLDOVER}, if \forth{?HOLDOVER} is
not defined until much later?
While it's theoretically possible to organize the listing so that the
advanced capability is defined before the root function, that approach is
bad news for two reasons.
First, the natural organization (by degree of capability) is
destroyed. Second, the advanced routines often use code that is defined
amid the elementary capabilities. If you move the advanced routines to
the front of the application, you'll also have to move any routines they
use, or duplicate the code. Very messy.
\index{V!Vectored execution|(}
You can organize the listing by degree of complexity using a technique
called ``vectoring.'' You can allow the root function to invoke (point
to) any of various routines that have been defined after the root
function itself. In our example, only the \emph{name} of the routine
\forth{?HOLDOVER} need be created early; its definition can be given later.%
\index{V!Vectored execution|)}
\Chap{7} treats the subject of vectoring in \Forth{}.%
\index{D!Decomposition!by sequential complexity|)}%
\index{P!Preliminary Design!decomposition by sequential complexity|)}%
\index{S!Sequential complexity, decomposition by|)}
\section{The Limits of Level Thinking}%
\index{L!Level@``Level'' thinking, limits of|(}
Most of us are guilty of over-emphasizing the difference between
``high-level'' and ``low-level.'' This notion is an arbitrary one. It
limits our ability to think clearly about software problems.
``Level'' thinking, in the traditional sense, distorts our efforts in
three ways:
\begin{enumerate}
\item It implies that the order of development should follow a
hierarchical structure
\item It implies that levels should be segregated from each
other, prohibiting the benefits of reusability
\item It fosters syntactical differences between levels (e.g.,
assembler vs. ``high-level'' languages) and a belief that the
nature of programming somehow changes as we move further from
machine code.
\end{enumerate}
Let's examine each of these misconceptions one by one.
\subsection{Where to Begin?}
\begin{interview}
I asked \person{Moore}\index{M!Moore, Charles|(} how he would go about
developing a particular application, a game for children. As the child
presses the digits on the numeric keypad, from zero to nine, that same
number of large boxes would appear on the screen.
\person{Moore}:
\begin{tfquot}
I don't start at the top and work down. Given that exact
problem, I would write a word that draws a box. I'd start at
the bottom, and I'd end up with a word called \forth{GO},
which monitored the keyboard.
\end{tfquot}
\noindent How much of that is intuitive?
\begin{tfquot}
Perhaps some degree of it. I know where I'm going so I don't
have to start there. But also it's more fun to draw boxes than
to program a keyboard. I'll do the thing that's most fun in
order to get into the problem. If I have to clean up all those
details later, that's the price I pay.
\end{tfquot}
\noindent Are you advocating a ``fun-down'' approach?%
\index{F!Fundown@``Fun-down'' approach|(}
\begin{tfquot}
Given that you're doing it in a free-spirit fashion, yes. If
we were giving a demonstration to a customer in two days, I'd
do it differently. I would start with the most visible thing,
not the most fun thing. But still not in that hierarchical
sequence, top down. I base my approach on more immediate
considerations such as impressing the customer, getting
something to work, or showing other people how it's going to
work to get them interested.
If you define a level as ``nesting,'' then yes, it's a good
way to decompose a problem. But I've never found the notion of
``level'' useful. Another aspect of levels is languages,
metalanguages, meta-metalanguages. To try and split hairs as
to which level you are on---assembler level, first integration
level, last integration level---it's just tedious and not
helpful. My levels get all mixed up hopelessly.
\end{tfquot}\index{M!Moore, Charles|)}%
\index{F!Fundown@``Fun-down'' approach|)}
\end{interview}
Designing by components makes where you start less important. You
could start with the key interpreter, for instance. Its goal is to
receive keystrokes and convert them to numbers, passing these numbers
to an internally invoked word. If you substitute the \Forth{} word
\forth{.} (``dot,'' which prints a number from the stack), then we can
implement the key interpreter, test it, and debug it without using
routines that have anything to do with drawing squares.
%!! original had a typo: s/it without using/it. without using/
% that was fixed in the '94 edition and here
On the other hand, if the application required hardware support (such as a
graphics package) that we didn't have on hand, we might want to
substitute something available, such as displaying an asterisk, just to
get into the problem. Thinking in terms of lexicons is like painting a
huge mural that spans several canvases. You work on all the canvases at
once, first sketching in the key design elements, then adding splashes of
color here and there\dots{} until the entire wall is complete.
\begin{tip}
In deciding where to start designing, look for:\medskip
\begin{itemize}
\item areas where the most creativity is required (the areas where change
is most likely)
\item areas that give the most satisfying feedback (get the juices
flowing)
\item areas in which the approach decided upon will greatly affect other
areas, or which will determine whether the stated problem can be solved at
all
\item things you should show the customer, for mutual understanding
\item things you can show the investors, if necessary for the rent.
\end{itemize}
\end{tip}
\subsection{No Segregation Without Representation}%
\index{O!Object|(}%
The second way in which levels can interfere with optimal solutions is
by encouraging segregation of the levels. A popular design construct
called the ``object'' typifies this dangerous
philosophy.\ifeightyfour\else\footnote{Editor's note: But see the
recant in the 1994 Preface on page \pageref{preface94}\ifofour, and
the clairification in the 2004 Preface on page
\pageref{preface2004}\fi. Think of something like Windows COM
``objects'' or CORBA.
Real object oriented programming, as it originates in Smalltalk, does
not hide information from the programmer. Adding a ``scrambled''
method to the ``egg master object'' is no problem. Smalltalk works by adding
methods to known classes, you don't even need to subclass them. You
can look inside an object and its source code whenever you want. And
table driven method dispatching can be quite efficient.
\hfill\person{Bernd Paysan}}\fi
\begin{tfnote}{Bernd Paysan}
Object oriented programming within the \Forth{} context needs better
coverage here.
\end{tfnote}
An object is a portion of code that can be invoked by a single name,
but that can perform more than one function. To select a particular
function you have to invoke the object and pass it a parameter or a
group of parameters. You can visualize the parameters as representing
a row of buttons you can push to make the object do what you want.
The benefit of designing an application in terms of objects is that,
like a component, the object hides information from the rest of the
application, making revision easier.
There are several problems, though. First, the object must contain a
complicated decision structure to determine which function it must
perform. This increases object size and decreases performance. A
lexicon, on the other hand, provides all usable functions by name for
you to invoke directly.
Second, the object is usually designed to stand alone. It can't take
advantage of tools provided by supporting components. As a result, it
tends to duplicate code inside itself that will appear elsewhere in
the application. Some objects are even required to parse text in order
to interpret their parameters. Each may even use its own syntax. A
shameless waste of time and energy!
%!! should be captionless but framed
\wepsfigp{no-scrambled}{``No scrambled?''}
Finally, because the object is constructed to recognize a finite set
of possibilities, it's difficult to make additions to the row of
buttons when a new function is needed. The tools inside the object
have not been designed for reuse.%
\index{O!Object|)}
The idea of levels pervades the design of the IBM Personal
Computer. Besides the processor itself (with its own machine
instruction set, of course), there are these software levels:
\begin{itemize}
\item the set of utilities written in assembler and burned into the
system's ROM
\item the disk operating system, which invokes the utilities
\item the high-level language of choice, which invokes the operating
system and the utilities
\item and finally, any application using the language.
\end{itemize}
\noindent The ROM utilities provide the hardware-dependent routines:
those that handle the video screen, disk drives, and keyboard. You
invoke them by placing a control code in a certain register and
generating the appropriate software interrupt.
For instance, software interrupt \code{10H} causes entry to the video
routines. There are 16 of these routines. You load register \code{AH} with
the number of the video routine you want.
Unfortunately, in all 16 routines there is not one that displays a text
string. To do that, you must repeat the process of loading registers and
generating a software interrupt, which in turn must make a decision
about which routine you want, and do a few other things you don't
need---for \emph{every single character}.
Try writing a text editor in which the entire screen may need to be
refreshed with each keystroke. Slow as mail! You can't improve the speed
because you can't reuse any of the information within the video routines
except for what's provided on the outside. The stated reason for this is to
``insulate'' the programmer from device addresses and other details of
the hardware. After all, these could change with future upgrades.