%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%
%%% File: utthesis2.doc, version 2.0jab, February 2002
%%%
%%% Based on: utthesis.doc, version 2.0, January 1995
%%% =============================================
%%% Copyright (c) 1995 by Dinesh Das. All rights reserved.
%%% This file is free and can be modified or distributed as long as
%%% you meet the following conditions:
%%%
%%% (1) This copyright notice is kept intact on all modified copies.
%%% (2) If you modify this file, you MUST NOT use the original file name.
%%%
%%% This file contains a template that can be used with the package
%%% utthesis.sty and LaTeX2e to produce a thesis that meets the requirements
%%% of the Graduate School of The University of Texas at Austin.
%%%
%%% All of the commands defined by utthesis.sty have default values (see
%%% the file utthesis.sty for these values). Thus, theoretically, you
%%% don't need to define values for any of them; you can run this file
%%% through LaTeX2e and produce an acceptable thesis, without any text.
%%% However, you probably want to set at least some of the macros (like
%%% \thesisauthor). In that case, replace "..." with appropriate values,
%%% and uncomment the line (by removing the leading %'s).
%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[11pt]{report} %% LaTeX2e document.
\usepackage{utthesis2} %% Preamble.
\usepackage[bookmarks=true,bookmarksopen=true]{hyperref}
\usepackage{url}
\usepackage{datetime}
\usepackage{enumerate}
\usepackage{graphicx}
\usepackage{proof}
\usepackage{tocbibind}
\usepackage{bcprules}
\synctex=1
\include{notation}
\include{definitions}
%\setcounter{secnumdepth}{3}
%\setcounter{tocdepth}{3}
% \mastersthesis %% Uncomment one of these; if you don't
\phdthesis %% use either, the default is \phdthesis.
% \thesisdraft %% Uncomment this if you want a draft
%% version; this will print a timestamp
%% on each page of your thesis.
\leftchapter %% Uncomment one of these if you want
% \centerchapter %% left-justified, centered or
% \rightchapter %% right-justified chapter headings.
%% Chapter headings includes the
%% Contents, Acknowledgments, Lists
%% of Tables and Figures and the Vita.
%% The default is \centerchapter.
% \singlespace %% Uncomment one of these if you want
% \oneandhalfspace %% single-spacing, space-and-a-half
% \doublespace %% or double-spacing; the default is
%% \oneandhalfspace, which is the
%% minimum spacing accepted by the
%% Graduate School.
\renewcommand{\thesisauthor}{Ian Anthony Wehrman} %% Your official UT name.
\renewcommand{\thesismonth}{December} %% Your month of graduation.
\renewcommand{\thesisyear}{2012} %% Your year of graduation.
\renewcommand{\thesistitle}{Weak-Memory Local Reasoning} %% The title of your thesis; use
%% mixed-case.
\renewcommand{\thesisauthorpreviousdegrees}{B.S.; M.S.C.S.}
%% Your previous degrees, abbreviated;
%% separate multiple degrees by commas.
\renewcommand{\thesissupervisor}{Warren A.\ Hunt, Jr.}
%% Your thesis supervisor; use mixed-case
%% and don't use any titles or degrees.
\renewcommand{\thesiscosupervisor}{J Strother Moore}
%% Your PhD. thesis co-supervisor; if any.
%% Use mixed case and don't use any titles
%% or degrees. Uncomment if you
%% have a co-supervisor.
%% (Ignored for Master's)
\renewcommand{\thesiscommitteemembera}{Josh Berdine}
\renewcommand{\thesiscommitteememberb}{E.\ Allen Emerson}
\renewcommand{\thesiscommitteememberc}{Donald S. Fussell}
\renewcommand{\thesiscommitteememberd}{C.\ A.\ R.\ Hoare}
% \renewcommand{\thesiscommitteemembere}{}
% \renewcommand{\thesiscommitteememberf}{}
% \renewcommand{\thesiscommitteememberg}{}
% \renewcommand{\thesiscommitteememberh}{}
% \renewcommand{\thesiscommitteememberi}{}
%% Define your other committee members here;
%% use mixed case and don't use any titles
%% or degrees. Uncomment as many
%% as necessary. (Ignored for Master's)
\renewcommand{\thesisauthoraddress}{\url{ian@wehrman.org}}
%% Your permanent address; use "\\" for
%% linebreaks.
\renewcommand{\thesisdedication}{For my mother, Sally Marie Lorino, from whom I learned to learn.}
%% Your dedication, if you have one; use
%% "\\" for linebreaks.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%
%%% The following commands are all optional, but useful if your requirements
%%% are different from the default values in utthesis.sty. To use them,
%%% simply uncomment (remove the leading %) the line(s).
\renewcommand{\thesiscommitteesize}{6}
%% Uncomment this only if your thesis
%% committee does NOT have 5 members
%% for \phdthesis or 2 for \mastersthesis.
%% Replace the "..." with the correct
%% number of members.
% \renewcommand{\thesisdegree}{...} %% Uncomment this only if your thesis
%% degree is NOT "DOCTOR OF PHILOSOPHY"
%% for \phdthesis or "MASTER OF ARTS"
%% for \mastersthesis. Provide the
%% correct FULL OFFICIAL name of
%% the degree.
% \renewcommand{\thesisdegreeabbreviation}{...}
%% Use this if you also use the above
%% command; provide the OFFICIAL
%% abbreviation of your thesis degree.
% \renewcommand{\thesistype}{...} %% Use this ONLY if your thesis type
%% is NOT "Dissertation" for \phdthesis
%% or "Thesis" for \mastersthesis.
%% Provide the OFFICIAL type of the
%% thesis; use mixed-case.
% \renewcommand{\thesistypist}{...} %% Use this to specify the name of
%% the thesis typist if it is anything
%% other than "the author".
%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\thesiscopyrightpage %% Generate the copyright page.
\thesiscertificationpage %% Generate the PhD. certification page.
\thesistitlepage %% Generate the title page.
%\thesissignaturepage %% Generate the Master's signature page.
\thesisdedicationpage %% Generate the dedication page.
\begin{thesisacknowledgments}
This dissertation would not have been possible without the help of many people. I thank Warren Hunt, who has been a wellspring of support throughout my doctoral studies. I am certain that I would not have completed this project without his encouragement and generous technical, moral and financial support. I also thank Josh Berdine, who has been a true intellectual mentor to me. If I have contributed anything to our field of interest, then for that I have him to thank.
%
% Many other mentors have guided me throughout the degree program.
%
I am privileged to have had the opportunity to work with and learn from Tony Hoare, whose insight into the science of programming and the broader scientific process has been invaluable. I am in debt to J Moore for accepting me as an advisee at a difficult moment and for guiding me back to a successful path.
% Jay Misra taught me that ideas are not well developed until they admit simple explanations.
I also thank Allen Emerson and Don Fussell for their participation on my dissertation committee. Finally, I warmly acknowledge the support of my family and friends. My mother has always been my unflagging champion; I do not know how I could have made it through good times and bad without her love. While in Austin, my wonderful friends---in particular Joel Brandt, Richard Chang, Benjamin Delaware, Anne Proctor and Chelsea Weathers---have broadened my horizons and brightened my life. Their compassion, camaraderie and loyalty mean more to me than they likely know.
\end{thesisacknowledgments}
\begin{thesisabstract}
Program logics are formal logics designed to facilitate specification and correctness reasoning for software programs. Separation logic, a recent program logic for C-like programs, has found great success in automated verification due in large part to its embodiment of the principle of local reasoning, in which specifications and proofs are restricted to just those resources---variables, shared memory addresses, locks, etc.---used by the program during execution.
Existing program logics make the strong assumption that all threads agree on the values of shared memory at all times. But, on modern computer architectures, this assumption is unsound for certain shared-memory concurrent programs: namely, those with races. Typically races are considered to be errors, but some programs, like lock-free concurrent data structures, are necessarily racy. Verification of these difficult programs must take into account the weaker models of memory provided by the architectures on which they execute.
This dissertation project seeks to explicate a local reasoning principle for x86-like architectures. The principle is demonstrated with a new program logic for concurrent C-like programs that incorporates ideas from separation logic. The goal of the logic is to allow verification of racy programs like concurrent data structures for which no general-purpose high-level verification techniques exist.
\end{thesisabstract}
\tableofcontents %% Generate table of contents.
% \listoftables %% Uncomment this to generate list
%% of tables.
\listoffigures %% Uncomment this to generate list
%% of figures.
% \chapter{...} %% Begin your thesis text here; follow
% %% the report style and group your text
% %% in chapters, sections, etc.
% \chapter{...} %% More chapters.
% \addcontentsline {toc}{chapter}{Bibliography}
% %% Force Bibliography to appear in contents
% \begin{thebibliography}{..} %% Start your bibliography here; you can
% \bibitem{...} ... %% also use the \bibliography command
% \end{thebibliography} %% to generate your bibliography.
\chapter{Introduction}
\label{ch:introduction}
Most concurrent software verification techniques rely on a
surprisingly strong assumption: namely, that all processes agree on
the value of shared memory at all times. This is, of course, not
generally true, but it is often a \emph{safe} assumption because of
implicit guarantees provided by the memory models of modern computer
architectures, which guarantee that programs without races will not
observe such inconsistencies. The soundness of most concurrent
software verification techniques therefore relies on race-freedom of
the program under study. This is not considered a major shortcoming,
though, because races usually indicate a program error.
There are, however, useful and interesting programs for which races do not indicate an error. For example, concurrent data structures, which optimize for speed and throughput by using locks and memory fence instructions sparingly, are often racy by design. Their correctness is demonstrated by relating the executions of the comparatively daring implementation to those of its simpler, abstract counterpart. Constructing such a relation therefore requires a technique that is tolerant of races. But that requirement comes with a serious consequence: any technique that tolerates races soundly must also admit that processes may observe the inconsistencies in the value of shared memory that result from the peculiarities of the architecture's memory model.
The verification literature offers little insight into the problem of verifying concurrent data structures and other inherently racy programs. This is partly because a model of contemporary memory adds serious complication to an already difficult problem, but also because until recently formal specifications of common architectures' memory models did not exist publicly. (Or, perhaps, privately.) Fortunately, the latter problem has been alleviated with recent safety specifications for the x86, Power and ARM memory models \cite{DBLP:conf/tphol/OwensSS09,DBLP:conf/popl/2009damp}. So, for these architectures, the path toward a solution to the correctness problem of concurrent data structures and other important programs now lies essentially unimpeded.
\section{An Illustrative Example}
\label{sec:intro-ex}
\begin{figure}[h]
\centering
%\resizebox{\textwidth}{!}{
\begin{tabular}{cc||cc}
\multicolumn{4}{c}{\textit{// initially:} $f_0 \mapsto 0 \fsep f_1 \mapsto
0$}\vspace{0.5em}\\
\begin{minipage}{0.25\textwidth}
$\left[f_0\right] := 1;$ \\
if $(\left[f_1\right] == 0)$ then\\
\textit{// ~~critical section}
\end{minipage} & \hspace{0.5cm} & \hspace{0.5cm} &
\begin{minipage}{0.25\textwidth}
$\left[f_1\right] := 1;$ \\
if $(\left[f_0\right] == 0)$ then\\
\textit{// ~~critical section}
\end{minipage}\vspace{1em}\\
\multicolumn{2}{c}{Process $P_0$} & \multicolumn{2}{c}{Process
$P_1$}\\
\end{tabular}
%}
\caption{\label{fig:dekker} Dekker's algorithm}
\end{figure}
To motivate study of a local reasoning principle for weak memory models, we consider the problem of reasoning about programs executing on such models. Some of the issues involved can be illustrated by considering the small pseudocode program in Figure~\ref{fig:dekker}.\footnote{The result of dereferencing a pointer variable $x$ is indicated by $\left[ x \right]$.}
The initial condition states that the two pointer variables, $f_0$ and $f_1$, have distinct values, each of which is an address into shared memory at which the value is 0. The program is a concurrent composition: process $P_i$, for $i \in \left\lbrace 0,1 \right\rbrace$, sets its flag by storing the value 1 at address $f_i$, then optionally enters its critical section if the result of loading the address of the other flag, $f_{1-i}$, is 0.\footnote{Load and store are assumed to be atomic operations when operating on integer-valued data.}
This is a simplification of Dekker's algorithm for mutual exclusion; it should not be possible for both processes to enter their critical sections simultaneously. An informal correctness argument can be made that relies on a widely assumed property of the underlying memory model, \emph{sequential consistency}, defined by Lamport \cite{DBLP:journals/tc/Lamport79} to mean that, \begin{quotation}\noindent the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. \end{quotation}
The informal correctness argument for the program in Figure~\ref{fig:dekker} is as follows. In any execution, either process $P_0$ sets flag $f_0$ before process $P_1$ sets flag $f_1$, or conversely, because we may assume all events are totally (sequentially) ordered. In the first case, $f_0$ is set before $f_1$, which happens before $P_1$ loads $f_0$ because the total order of events respects the program orders. So, if $P_0$ sets its flag first, the load of $f_0$ returns 1 and $P_1$ will not enter its critical section. A symmetric argument shows that if $P_1$ sets $f_1$ before $P_0$ sets $f_0$, then $P_0$ will not enter its critical section. In both cases, at most one of the two processes may enter its critical section.\footnote{This program does not, of course, preclude deadlock.}
Sequential consistency is crucial to this argument. If the events are not totally ordered, the case split is not exhaustive. If the program orders are not included in the total order, we may not conclude that the first store precedes the other process' load, despite the fact that the first store precedes the second store (in the total order) and the second store precedes the load (in the program order). In either case, mutual exclusion may fail.
Unfortunately, common multiprocessor architectures do not generally guarantee sequential consistency, and so neither the informal argument above nor more rigorous arguments based on formalizations of sequential consistency are valid. And although the memory models of various architectures all seem to be strictly weaker than sequentially consistent models, they are not individually comparable. Roughly, the Power and ARM architectures guarantee the first part of sequential consistency (a total order on memory events), but not the second (that this order includes the program orders) \cite{DBLP:conf/popl/2009damp,DBLP:conf/asplos/ChongI08}. Such memory models are called \emph{weakly consistent}. Conversely, the x86 architecture does not guarantee a total order on memory events, but does guarantee that the observed partial order is consistent with the program orders \cite{DBLP:conf/tphol/OwensSS09}. These memory models are said to have the \emph{total-store ordering} (TSO) property.
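To make the failure concrete, consider how the program in Figure~\ref{fig:dekker} can go wrong on an x86-like model. (The trace notation below is informal and ours, used only for illustration; the model is described precisely in Section~\ref{sec:memory-models}.) Each processor may buffer its store locally and perform its load before either store becomes visible to the other:
\begin{enumerate}
\item $P_0$ buffers its store $\left[f_0\right] := 1$, and $P_1$ buffers its store $\left[f_1\right] := 1$;
\item $P_0$ loads 0 from $f_1$, and $P_1$ loads 0 from $f_0$, since neither store has yet reached shared memory;
\item both buffered stores drain to shared memory.
\end{enumerate}
Both loads return 0, so both processes enter their critical sections and mutual exclusion fails.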
There are, however, conditions under which these architectures guarantee a program's executions to be sequentially consistent---namely, in the absence of data races. These so-called ``data-race free'' (DRF) guarantees provide sufficient conditions under which sequential consistency can be recovered and used for a correctness argument. If the memory-accessing commands in the program in Figure~\ref{fig:dekker} are guarded with synchronization primitives like locks to eliminate races, the previous correctness argument again becomes valid. Such a program transformation might make sense for a conservative programmer concerned with correctness, but it does not constitute a helpful verification strategy because the transformation does not preserve the original program's semantics.
A less drastic semantics-altering transformation is to add fence instructions directly after the processes' store operations, as sketched below; this too results in an implementation of Dekker's algorithm that preserves mutual exclusion. The fences ensure that the processes do not attempt their loads until after their respective stores have completed. But proving that this modified program is correct requires an argument quite different from the one above: the loads in this program race with their opposite stores, and so DRF guarantees cannot be applied to recover sequential consistency. Hence, any correctness argument for this modified program must be cognizant of the peculiarities of the underlying memory model. Indeed, a correctness argument for an x86-like memory model is completely different from one for an ARM-like memory model.
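The fenced variant can be rendered in the same pseudocode as Figure~\ref{fig:dekker}; the sketch below writes $\mathsf{fence}$ for the fence instruction:
\begin{center}
\begin{tabular}{c@{\hspace{2cm}}c}
\begin{minipage}{0.25\textwidth}
$\left[f_0\right] := 1;$ \\
$\mathsf{fence};$ \\
if $(\left[f_1\right] == 0)$ then\\
\textit{// ~~critical section}
\end{minipage} &
\begin{minipage}{0.25\textwidth}
$\left[f_1\right] := 1;$ \\
$\mathsf{fence};$ \\
if $(\left[f_0\right] == 0)$ then\\
\textit{// ~~critical section}
\end{minipage}
\end{tabular}
\end{center}
On an x86-like model, each fence ensures that the processor's buffered store has drained to shared memory before the subsequent load executes, ruling out the problematic trace sketched above.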
\section{Project Description}
\label{sec:project-description}
This dissertation project does not, by far, solve the verification problem for racy concurrent programs executing on weak memory models in general. Its more modest goal is instead to step toward this by developing a \emph{program logic} for the verification, by proof, of partial correctness properties of certain programs that interact with a particular weak model of memory. Implicit in this is the goal to explicate a \emph{local reasoning principle} for a weak memory model. Traditional correctness reasoning and specification is global: the entire system must be accounted for, which makes scaling to large programs difficult. Local reasoning dictates instead that reasoning and specification be restricted to just those resources---program variables, shared-memory addresses, locks, etc.---that are accessed or modified by the program during execution.
The particular programs studied in this dissertation are structured, C-like programs with pointers, pointer arithmetic and concurrency constructs, including memory fences. The language was chosen based on the level of detail with which racy concurrent programs are typically described in the literature: i.e., with structured imperative control-flow constructs like loops and if-statements, but with memory fences and locking explicitly specified.
The particular weak memory model explored in this dissertation is x86-like; in particular, based on the x86-TSO memory model as defined by Owens, Sarkar and Sewell \cite{DBLP:conf/tphol/OwensSS09}. This model was chosen for a variety of reasons. First, x86 multiprocessors are now in extremely widespread use, commonly found in servers, desktops, laptops, tablets, smartphones and many other small computing appliances. Second, unlike most other modern multiprocessor architectures (e.g., ARM and Power \cite{DBLP:conf/popl/AlglaveFIMSSN09,DBLP:conf/pldi/SarkarSAMW11}), the memory model is both well understood and has, thanks to Owens et~al., a clear, simple and formal specification.
In this project, local reasoning is explored in the context of an x86-like memory model by developing two program logics that embody such a principle: a logic for purely sequential programs that execute on a single processor, and a more general logic for concurrent programs that execute on multiple processors in parallel. The sequential fragment of this logic is loosely based on separation logic, a recent Hoare-style logic which has spurred a revolution in high-level program reasoning due to the simplicity with which it handles pointers using a local reasoning principle. The concurrent extension of the logic is similarly based on a concurrent extension to separation logic.
\subsection{Components and Dependencies}
\label{sec:components}
The various components of the program logics described in this dissertation and their explicit dependencies are pictured in the (transitively reduced) graph of Figure~\ref{fig:dependency-graph}. The components are represented by shapes that indicate their approximate type: semantic objects by octagons; formal languages by squares; semantic relationships by ovals; deduction systems by hexagons; and key properties by trapezoids.
\begin{figure}[ht]
%\begin{center}
\includegraphics[scale=0.5]{dependency-graph/dg-reduced}
\caption{\label{fig:dependency-graph}Dependencies among the components of the project}
%\end{center}
\end{figure}
The memory model is shown in a double-lined octagon, which reflects the assumption in this project that it is complete and correct, and is not further modified from its operational definition in \cite{DBLP:conf/tphol/OwensSS09}, which is summarized in Section~\ref{sec:memory-models}.
The machine model---in particular, the notion of machine state---depends on, but is distinct from, the memory model. For example, the memory model dictates that each processor has a private set of named registers, whereas in the machine models that we shall define, a single shared set of variable names is assumed. We will also take the liberty in the machine model to relax other restrictions on the notion of state from the memory model, such as the requirement that writes buffered by a single processor are totally ordered.
The programming language is a structured C-like language with concurrent composition. It does not explicitly depend on any other components of the project. The programming language semantics relates the programming language to the machine model, and hence depends on them both. The uniprocessor machine model and sequential programming language are described in Section~\ref{sec:sequential-programs}; the multiprocessor machine model and concurrent programming language are described in Section~\ref{sec:programming-language}.
The assertion language also does not explicitly depend on any other components of the project.\footnote{Implicitly, of course, it depends significantly on the memory and machine models.} The assertion language semantics associates the assertion language to sets of machine states (as defined by the machine model) with a particular structure. Assertions about uniprocessor machine states are described in Section~\ref{sec:uniprocessor-assertions}; assertions about multiprocessor machine states are described in Section~\ref{sec:assertions}.
Ideally there would also be a proof theory of assertions and a corresponding soundness theorem. We have chosen not to focus on a proof theory of assertions in this project, but will indicate some semantic implications and equivalences that would be relevant to that end in Section~\ref{sec:sequential-algebra} for the uniprocessor case, and in Section~\ref{sec:algebra} in the multiprocessor case.
The specification language encompasses the programming and assertion languages, and its semantics is given in terms of the semantics of programs and assertions. The proof theory of specifications relies on the existence of a suitable proof theory of assertions for determining entailments. The soundness of the specification logic relies on the soundness of the proof theory of assertions as well as the semantics of programs and assertions. Specifications of sequential programs are described in Section~\ref{sec:sequential-specifications}; specifications of concurrent programs are described in Section~\ref{sec:specifications}.
\subsection{Contributions and Status}
\label{sec:contributions}
The significant contributions of this dissertation project are as follows:
\begin{enumerate}
\item An x86-like operational semantics based on state transitions for a C-like programming language with pointers and pointer arithmetic, memory fences and concurrency constructs. The model is novel, and is expressive enough to describe both processor-parallel and interleaved thread executions.
\item An assertion language for describing, naturally and concisely, x86-like system configurations, and a formal semantics in terms of the aforementioned states. Both the language and the model are novel.
\item A program specification logic---i.e., a formal language of specifications and a proof system---for describing and deducing partial correctness of the aforementioned C-like programs in terms of the aforementioned assertions, as well as a formal semantics of specifications that relates the x86-like execution of C-like programs among states described by assertions. The specification logic additionally embodies an x86-specific principle of local reasoning, which allows proofs to be constructed by describing the interaction between the program and the small portion of system resources relevant to its behavior, and subsequently generalizing the proof to describe the interaction of the program with more complete system descriptions. This is the first known local reasoning principle for a weak memory model.
\end{enumerate}
The languages, models and deduction system described above are completely and formally defined, having undergone hundreds of revisions. There are however two major components left unfinished:
\begin{enumerate}
\item There is no proof system for the assertion language. Omitting this component of the project was an early, intentional decision, for two reasons. First, a proof system is necessary for practical application of the specification logic---and in particular for its automation---but is not critical for the study of the specification language: syntactic entailment is an important practical approximation of semantic entailment, but in principle a well-defined notion of semantic entailment suffices. Second, it is hypothesized that the traditional inference rules of first-order logic are sound for the assertion language defined here---as is the case for related theories of similar logics---and also that there is no complete set of rules, as the language encompasses, e.g., arithmetic. The task of finding a suitable proof system for the assertion language can thus be seen as a purely practical issue.
\item The soundness proof for the specification logic is incomplete. Although the logic is hypothesized to be sound w.r.t.\ the model described later in the document, this is of course a serious drawback. Most of the relevant lemmas have been proved previously for various preliminary versions of the model. There are no known problems with the model blocking a soundness proof beyond the sheer size of the proof, which derives from the complexity of the model. Indeed, the proofs are mostly straightforward inductions. Throughout the document, key intermediate properties (marked as propositions) that are expected to hold are noted, along with, in some cases, proof sketches.
\end{enumerate}
Finally, beyond these technical omissions, it must be admitted that this dissertation gives little indication of how proofs ought to proceed in the logic; only a handful of small examples will be given later on. This is because the logic has evolved with soundness with respect to the x86 memory model foremost in mind, rather than as a logical system of interest independent of its potential models. Although the examples presented do indicate that the logic is capable of highly non-trivial program reasoning, the extent of its capability is not yet well understood, and remains perhaps the most important aspect of future work.
\chapter{Background}
\label{ch:background}
This chapter describes background material necessary to understand the technical content of this dissertation, beginning with mathematical preliminaries and frequently used notation in Section~\ref{sec:preliminaries}, followed by overviews of research on program logics in Section~\ref{sec:program-logics} and memory consistency models in Section~\ref{sec:memory-models}.
\section{Mathematical Preliminaries}
\label{sec:preliminaries}
Terms are defined throughout this document with the following meta-notation: \[ \mathit{object} \eqdef \mathit{definition}.\] We additionally use the following meta-notation for defining predicates: \[ \mathit{predicate} \iffdef \mathit{definition}.\] For example, for a set $S$ and object $o \notin S$ we define \[S^o \eqdef S \uplus \set{o}\] as the extension of set $S$ by object $o$. Using this notation, a domain $S$ can be lifted to its \emph{optional domain} by writing $S^\bot$ as shorthand for $S \uplus \set{\bot}$, assuming $\bot \notin S$.
\subsection{Relations}
\label{sec:relations}
For any set $A$, we write $\ident{A}$ for the identity relation on $A$. For a binary relation $R$ on $A$ and $n \in \setnaturals$, we write $\relexp{R}{n}$ for the $n$-fold iteration of $R$, defined by induction on $n$ as follows: \begin{align*} \relexp{R}{0} \eqdef &\ident{A} \\
\relexp{R}{n+1} \eqdef & \relexp{R}{n} \rcomp R.\end{align*} We write $\tcl{R}$ and $\rtcl{R}$ for the transitive and reflexive-transitive closures of $R$, respectively: \begin{align*}
\tcl{R} \eqdef \bigcup_{n \in \setpositives} \relexp{R}{n} \andif
\rtcl{R} \eqdef \bigcup_{n \in \setnaturals} \relexp{R}{n}
\end{align*}
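As a small worked instance of these definitions, take $A = \set{1,2,3}$ and $R = \set{(1,2),(2,3)}$. Then $\relexp{R}{1} = R$, $\relexp{R}{2} = \set{(1,3)}$ and $\relexp{R}{n} = \nil$ for $n \geq 3$, so
\begin{align*}
\tcl{R} = \set{(1,2),(2,3),(1,3)} \andif \rtcl{R} = \tcl{R} \cup \ident{A}.
\end{align*}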
\subsection{Functions}
\label{sec:functions}
For a (possibly partial) function $f : A \pfun B$ and $a \in A$ and $b \in B$, we write $\funup{f}{\ptup{a}{b}}$ for the updated function: \[ \funup{f}{\ptup{a}{b}} \eqdef \fun{x} \begin{cases}
b & \text{if $x = a$} \\
f(x) & \text{otherwise.}
\end{cases}\]
We write $f(a) = \bot$ if the partial function $f$ is not defined at point $a$, i.e. if $a \notin \dom{f}$, and $\defined{f(a)}$ otherwise. The everywhere-undefined function is indicated by $\nil$. The partial sum $f \uplus g$ of partial functions $f$ and $g$ is defined, when $\dom{f} \cap \dom{g} = \nil$, as follows: \[ f \uplus g \eqdef \fun{x}\begin{cases}
f(x) & \text{if $x \in \dom{f}$} \\
g(x) & \text{else if $x \in \dom{g}$} \\
\bot & \text{otherwise.}
\end{cases}\]
When convenient, we also write $f_a$ as shorthand for $f(a)$. $a \mapsto b$ is shorthand for the unique partial function $f$ such that $f(a) = b$ and is undefined otherwise. For $A' \subseteq A$, $\restrict{f}{A'}$ is the restriction of $f$ to domain $A'$.
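For a concrete instance of this notation, let $f \eqdef (1 \mapsto 10) \uplus (2 \mapsto 20)$. Then
\[ \funup{f}{\ptup{2}{5}}(2) = 5, \quad \funup{f}{\ptup{2}{5}}(1) = 10, \quad \restrict{f}{\set{1}} = 1 \mapsto 10 \andif f(3) = \bot. \]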
For (possibly partial) functions $f,g : A \pfun B$, we write $f \override g$ for the result of \emph{overriding} $f$ with $g$: \[ f \override g \eqdef \fun{x} \begin{cases}
g(x) & \text{if $x \in \dom{g}$} \\
f(x) & \text{otherwise.}
\end{cases} \] An obvious property is that, if $g(a) = b$, for some $b \in B$, then also $(f \override g)(a) = b$. Some additional properties follow in Proposition~\ref{lem:override}.
\begin{proposition}
\label{lem:override}
Let $f,g,h : A \pfun B$.
\begin{enumerate}
\item $f \override \nil = \nil \override f = f$.
\item $f \override (g \override h) = (f \override g) \override h$.
\item $\dom{f \override g} = \dom{f} \cup \dom{g}$.
\item If $\dom{f} \cap \dom{g} = \nil$ then $f \override g = f \uplus g$, and hence $f \override g = g \override f$.
\end{enumerate}
\end{proposition}
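For example, overriding differs from the partial sum exactly when the domains overlap: taking $f \eqdef (1 \mapsto 10) \uplus (2 \mapsto 20)$ and $g \eqdef 1 \mapsto 30$, the sum $f \uplus g$ is undefined because $\dom{f} \cap \dom{g} \neq \nil$, whereas
\[ (f \override g)(1) = 30 \andif (f \override g)(2) = 20. \]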
\subsection{Lists}
\label{sec:lists}
The empty list is denoted by $\lnil$, the literal list by $\llit{o,\ldots, o'}$, list construction by $o \lcons l$, and list concatenation by $l \lapp l'$, for objects $o$ and lists $l$. We write $\listof{\mathcal{T}}$ to indicate lists of elements drawn from the set $\mathcal{T}$, and $\arnil$ for the function $(\fun{x} \lnil)$.
For a list $l : \listof{(A \times B)}$, we write $\funof{l}$ for the corresponding partial lookup function: \[ \funof{l} \eqdef \fun{x} \begin{cases}
b &\text{ if $l = l' \lapp \llit{(x,b)}$}\\
\funof{l'}(x) &\text{ if $l = l' \lapp \llit{(y,b)}$ with $x \neq y$}\\
\bot & \text{otherwise.}
\end{cases}\] For $A' \subseteq A$, $\restrict{l}{A'}$ is the sublist restriction of $l$ to domain $A'$.
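Note that lookup returns the \emph{last} binding for a given key. For example, taking $l \eqdef \llit{(1,10),(2,20),(1,30)}$,
\[ \funof{l}(1) = 30, \quad \funof{l}(2) = 20 \andif \funof{l}(3) = \bot. \]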
For convenience, we lift these function definitions pointwise to sets of lists. For example, for a set $L$ of lists, $a \lcons L \eqdef \setof{a \lcons l}{l \in L}$.
The set of \emph{interleavings} of lists $m,n$, written $m \merge n$, is defined by recursion on the structure of $m$ and $n$: \begin{align*}
m \merge \lnil &\eqdef \set{m} \\
\lnil \merge n &\eqdef \set{n} \\
%a \lcons m' \merge b \lcons n' & \eqdef \bigcup_{l_b \in m' \merge (b\lcons n')} \, \bigcup_{l_a \in (a \lcons m') \merge n'} (a \lcons l_b) \merge (b \lcons l_a) \\
a \lcons m' \merge b \lcons n' & \eqdef a \lcons (m' \merge (b\lcons n')) \cup b \lcons ((a \lcons m') \merge n').
\end{align*}
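For example, interleaving a two-element list with a singleton yields three results:
\[ \llit{a,b} \merge \llit{c} = \set{\llit{a,b,c},\, \llit{a,c,b},\, \llit{c,a,b}}. \]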
We now define a subset of the interleavings of lists of pairs from $A \times B$ that plays an analogous role to function overriding. The result of overriding a list $m$ with another $n$, written $m \override n$, is defined as follows: \[ l \in m \override n \iffdef \funof{l} = \funof{m} \override \funof{n}.\] As with function overriding, list overriding has the property that, if $\funof{n}(a) = b$, for some $b \in B$, and $l \in m \override n$, then $\funof{l}(a) = b$. Because all elements of $m \override n$ have the same lookup function, we may safely extend the list lookup notation as follows: \[ \funof{m \override n} \eqdef \fun{x} \funof{l}(x),\] for arbitrary $l \in m \override n$.
In particular, if $a \in \dom{\funof{n}}$, then $\funof{n}(a) = \funof{m \override n}(a)$. List overriding also satisfies properties analogous to those of function overriding, as noted by Proposition~\ref{lem:list-override}.
\begin{proposition}
\label{lem:list-override}
Let $l,m,n : \listof{A \times B}$.
\begin{enumerate}
\item $l \override \lnil = \lnil \override l = l$.
\item $l \override (m \override n) = (l \override m) \override n$.
\item $\dom{\funof{m \override n}} = \dom{\funof{m}} \cup \dom{\funof{n}}$.
\item If $\dom{\funof{m}} \cap \dom{\funof{n}} = \nil$ then $m \override n = m \merge n$, and hence $m \override n = n \override m$.
\item $(m \lapp n) \in (m \override n) \subseteq (m \merge n)$.
\end{enumerate}
\end{proposition}
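To illustrate, let $m \eqdef \llit{(1,10)}$ and $n \eqdef \llit{(1,20)}$. Of the two interleavings of $m$ and $n$, only the one in which the binding from $n$ comes last has the overridden lookup function, so
\[ m \lapp n = \llit{(1,10),(1,20)} \in m \override n \andif \funof{m \override n}(1) = 20, \]
while $\llit{(1,20),(1,10)} \notin m \override n$.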
\subsection{Universes}
\label{sec:universes}
The various universal sets are declared and in some cases defined in Figure~\ref{fig:universes}. Note that, in the case of memory locations (i.e., addresses into memory) and processor identifiers, 0 is excluded. Also note that we shall later use the single set of identifiers $\setidentifiers$ to represent both program variables and logical variables.
\begin{figure}[ht]
\centering
\begin{tabular}{rcl|l}
\multicolumn{3}{c}{Set} & Description \\ \hline
\multicolumn{3}{l|}{$\setidentifiers$} & Identifiers \\
$\setvalues$ & $=$ & $\setintegers$ & Values \\
$\setlocations$ & $\subseteq$ & $\setpositives$ & Memory locations \\
$\setprocessors$ &$\subseteq$ & $\setpositives$ & Processor identifiers
\end{tabular}
\caption{\label{fig:universes}Universal Sets}
\end{figure}
\section{Program Logics}
\label{sec:program-logics}
Given that the motivating problem is to reason about concurrent programs executing on a particular memory model, how might one approach the correctness of the program in Figure~\ref{fig:dekker} and others like it, such as concurrent data structures? One solution---perhaps, for now, the best---is to reason directly about the program semantics in the following way:
\begin{enumerate}
\item formalize the semantics of the programming language using a general purpose logic (e.g., first-order logic, higher-order logic, type theory);
\item prove that the semantics agrees with the memory model;
\item characterize the intended program property using the general logic;
\item prove using the general logic that the semantic object which represents the program at hand possesses this property.
\end{enumerate}
This is a perfectly reasonable strategy and, by using a proof assistant for a selected general purpose logic (e.g., ACL2 \cite{DBLP:journals/tse/KaufmannM97}, Isabelle/HOL \cite{DBLP:books/sp/NipkowPW02}, or Coq \cite{CoqBook}), is within the realm of feasibility for many programs and some experts.\footnote{As an alternative to defining a high-level programming language semantics that comports with a high-level specification of a memory model, it is also possible to formally define the machine that implements the memory directly, and give the semantics of programs in terms of possible executions of this machine. This tack was taken, e.g., in work on reasoning about the behavior of multi-threaded Java programs w.r.t.\ a detailed model of the Java Virtual Machine \cite{DBLP:conf/jvm/MooreP01}.} But, due to the generality of the logic and complexity of the semantics of programs under study, one expects such formalizations and proofs to be exceptionally complex. And although experts are certainly able to develop methodologies and abstractions to tame this complexity, the desire to reason at a higher and more intuitive level is manifest.
This is just the purpose of a \emph{program logic}, which allows high-level formal reasoning that codifies the programmer's intuition about the behavior and correctness of the program under study. Ideally, the program logic incorporates those methodologies and abstractions that have been most useful to expert users reasoning directly about semantic objects in more general logics.
The situation is analogous to the use of temporal logics for studying reactive systems. Though it is technically possible to reason about such systems using a general-purpose logic, both human reasoning (e.g., Unity \cite{unity-book}) and automation (e.g., model checking \cite{model-checking}) were facilitated by specialized logics.
\subsection{Hoare Logic}
\label{sec:hoare-logic}
Hoare introduced the first program logic for an Algol-like language in his seminal 1969 paper \cite{DBLP:journals/cacm/Hoare69}. A program $c$ is specified with a pair of assertions $P$ and $Q$, written in first-order logic, that describe pre- and post-execution system states, respectively. Hoare described two related logics, which differ in the style of specification: in the \emph{total correctness} logic specifications are written $\left\langle P \right\rangle c \left\langle Q \right\rangle$ and require program termination as a necessary condition for satisfaction; in the \emph{partial correctness} logic specifications are written $\htriple{P}{c}{Q}$ and allow divergent executions of the program to satisfy \emph{any} specification. In the sequel, we focus on logics of partial correctness.
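The difference between the two styles is starkest for divergent programs: a program that never terminates satisfies \emph{every} partial correctness specification and no total correctness specification with a satisfiable pre-condition. For instance, assuming a standard while-loop construct (written here informally), the specification
\[ \htriple{\mathsf{true}}{\,\mathsf{while}\ \mathsf{true}\ \mathsf{do}\ \mathsf{skip}\,}{\mathsf{false}} \]
holds under partial correctness, because no execution of the loop terminates.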
The axioms and inference rules are either \emph{structural}, directed by the program syntax, or \emph{logical}, directed by the logical operations of the assertion language. The \textsc{choice} rule, for reasoning about nondeterministic choice command $\cchoice{c}{c'}$, is an example of a structural rule: \infrule[choice]{\htriple{P}{c}{Q} \andif \htriple{P}{c'}{Q} }{\htriple{P}{\cchoice{c}{c'}}{Q}} According to this rule, in order to prove a specification $\htriple{P}{\cchoice{c}{c'}}{Q}$, it suffices to prove the same specification of each of the constituent commands: $\htriple{P}{c}{Q}$ and $\htriple{P}{c'}{Q}$. This is because the nondeterministic choice command may execute either $c$ or $c'$, but not both; and so if each constituent command satisfies the specification, then so must the composed command.
The \textsc{disj} rule, for reasoning about specifications in which the primary connective of the pre-condition is a logical disjunction, is an example of a logical rule: \infrule[disj]{\htriple{P}{c}{Q} \andif \htriple{P'}{c}{Q}}{\htriple{P \disj P'}{c}{Q}} According to this rule, to prove a specification of an arbitrary command in which the pre-condition is a disjunction, it suffices to prove the specifications in which the pre-conditions are the respective disjuncts. Intuitively, regardless of whether the command is executed in a state that satisfies $P$ or $P'$, it shall terminate in a state that satisfies $Q$, or diverge.
The structural and logical rules of Hoare logic make proof construction partially mechanical. But determining whether a program meets its specifications remains generally undecidable due to the expressiveness of the assertion language. For example, the \emph{rule of consequence} allows for the relaxation of specifications by the arbitrary strengthening of pre-conditions and weakening of post-conditions: \infrule[cons]{P' \Rightarrow P \andif \htriple{P}{c}{Q} \andif Q \Rightarrow Q'}{\htriple{P'}{c}{Q'}} For correct application of the rule, validity of the first-order logic implications must be proved. But first-order validity is, of course, undecidable in general, so while Hoare logic does ease some of the pain of proof construction, it is not a panacea.
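For instance, given a proved specification $\htriple{x = 1}{c}{x = 2}$ for some command $c$, the rule of consequence derives a weaker one by strengthening the pre-condition and weakening the post-condition:
\[ \infer[\textsc{cons}]{\htriple{x = 1 \conj y = 3}{c}{x \geq 2}}{x = 1 \conj y = 3 \Rightarrow x = 1 & \htriple{x = 1}{c}{x = 2} & x = 2 \Rightarrow x \geq 2} \]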
%% Hoare logic has been thoroughly studied since its
%% introduction. Notably, it has been shown to be complete, in the sense
%% that true program specifications are provable if validity of
%% implications between assertions can be determined.
\subsection{Separation Logic}
\label{sec:separation-logic}
As successful as Hoare logic has been, a significant drawback is its inability to soundly cope with pointer variables and dynamically allocated memory, which severely complicates its application to low-level systems programs. To illustrate the problem, consider a simple program that updates the value at an address $x$ by writing to a dereferenced pointer: $\cstore{x}{2}$. To be clear, this program does not update the value $x$, but instead the value in memory at the address whose value is $x$. If we write $\mathit{heap}(x) = 2$ to mean that the value in memory address $x$ is equal to 2, then the following specification is clearly true: \[ \htriple{\mathit{heap}(x) = 1}{\cstore{x}{2}}{\mathit{heap}(x)=2}.\] We might then wish to prove a stronger specification, which describes the behavior of this program in a larger memory, with two allocated addresses $x$ and $y$: \[ \htriple{\mathit{heap}(x) = 1 \conj \mathit{heap}(y) = 1}{\cstore{x}{2}}{\mathit{heap}(x)=2 \conj \mathit{heap}(y) = 1}. \] This is easily derivable in Hoare logic using the \emph{rule of constancy}, which allows an arbitrary\footnote{Actually, there is a simple syntactic restriction on $F$: namely, that $\fv{F} \cap \mod{c} = \nil$; i.e., that the free variables of the frame are not modified by the command. This holds in the example because no variables are modified by dereferencing assignment.} assertion $F$---called a \emph{frame}---to be conjoined uniformly onto the pre- and post-conditions of a derived specification: \infrule[const]{\htriple{P}{c}{Q}}{\htriple{P \conj F}{c}{Q \conj F}.} The problem, of course, is that this strengthened specification is not true. In case the pointer variables $x$ and $y$ are \emph{aliased}---i.e., if $x = y$---after the pointer update it shall certainly not be the case that $\mathit{heap}(y) = 1$, for the update to the memory at $x$ also implicitly updated the memory at $y$.
In the presence of pointer variables, the rule of constancy---which is crucial to the scalability of Hoare logic---is only sound in case the memory locations referred to by the frame are disjoint or separate from those in the \emph{footprint} of the command; i.e., the part of the system state referenced by the program during its execution. To maintain soundness, therefore, proofs in Hoare logic about such programs require a variety of ad hoc extensions and onerous side conditions. (See, e.g., Richard Bornat's 2000 paper \cite{DBLP:conf/mpc/Bornat00}, which incorporates many sophisticated ideas into Hoare logic.) But after more than forty years of research, a major breakthrough finally came with the invention of \emph{separation logic} \cite{DBLP:conf/lics/Reynolds02}, generally credited to John Reynolds and Peter O'Hearn. Separation Logic is a Hoare-style program logic insofar as the axioms and inference rules are similar; the crucial difference between it and Hoare logic is the choice of assertion language. Instead of the classical first-order logic assertions used by Hoare logic specifications, separation logic makes use of a different logic---a theory of the logic of \emph{bunched implications} pioneered by O'Hearn, Pym and others \cite{DBLP:journals/bsl/OHearnP99,DBLP:journals/tcs/PymOY04}---for describing system states with a notion of a \emph{heap}---a finite partial function, into which pointers point, that represents part of memory---and disjointness of said states. The salient formulas that capture these notions are the \emph{points-to assertion}, $\ell \mapsto v$, and the \emph{separating conjunction}, $P \fsep Q$. Models of points-to assertion $\ell \mapsto v$ are heaps with exactly one address allocated, given by $\ell$, and with value $v$ stored at that address: i.e., $h \sentails \ell \fpointsto v$ iff $h = \set{(\ell,v)}$. Models of the separating conjunction $P \fsep Q$ are heaps that can be partitioned by address into two subheaps, one of which models the formula $P$ and the other $Q$: i.e., $h \sentails P \fsep Q$ iff $h = h_P \uplus h_Q$ such that $h_P \sentails P$ and $h_Q \sentails Q$.
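These semantic clauses can be checked on small heaps. The two-cell heap $\set{(1,5)} \uplus \set{(2,7)}$ satisfies a separating conjunction of points-to assertions by splitting into its singleton subheaps:
\[ \set{(1,5),(2,7)} \sentails (1 \fpointsto 5) \fsep (2 \fpointsto 7). \]
By contrast, $(1 \fpointsto 5) \fsep (1 \fpointsto 5)$ is unsatisfiable: no heap partitions into two \emph{disjoint} subheaps each of which allocates address 1.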
Besides soundness w.r.t.\ a sequential C-like language with pointers and dynamic memory management, separation logic is important because it embodies the principle of \emph{local reasoning}. Unlike with Hoare logic, reasoning may be restricted to a program component's footprint, from which one may generalize to complete system states. O'Hearn, Reynolds and Yang informally describe local reasoning in the context of sequential pointer programs as follows \cite{DBLP:conf/csl/OHearnRY01}: \begin{quotation}\noindent To understand how a program works, it should be possible for reasoning and specification to be confined to the [memory addresses] that the program actually accesses. The value of any other [addresses] will automatically remain unchanged.\end{quotation}
Separation logic embodies the principle of local reasoning with its \emph{small axioms} and its \emph{frame rule}. The small axioms describe the programming language's primitive commands, specifying only their respective footprints. For example, the small axiom\footnote{Actually, an axiom schema parametrized by the expressions $e$ and $e'$.} for the pointer assignment command (i.e., store command) requires with its pre-condition only that the relevant location be allocated with some value,\footnote{$e \mapsto -$ is shorthand for $\exists v\,.\,e \mapsto v$.} and the resulting post-condition describes only the result of updating this location: \infax[store]{\left\lbrace e \mapsto -\right\rbrace \,\left[e\right] := e' \,\left\lbrace e \mapsto e'\right\rbrace} The local specification can then be generalized to a global specification using the \emph{frame rule}:\footnote{As in the rule of constancy, the frame rule also requires that $\fv{F} \cap \mod{c} = \nil$.} \infrule[frame]{\htriple{P}{c}{Q}}{\htriple{P \fsep F}{c}{Q \fsep F}} The frame rule of separation logic replaces the rule of constancy of Hoare logic. It can be used to soundly derive the desired specification for the aforementioned program as follows: \[ \infer[\textsc{frame}]{\htriple{x \fpointsto 1 \fsep y \fpointsto 1}{\cstore{x}{2}}{x \fpointsto 2 \fsep y \fpointsto 1}}{\infer[\textsc{cons}]{\htriple{x \fpointsto 1}{\cstore{x}{2}}{x \fpointsto 2}}{\infer[\textsc{store}]{\htriple{x \fpointsto - }{\cstore{x}{2}}{x \fpointsto 2}}{}}}\]
Besides having been used to give human-readable proofs to a variety of algorithms that manipulate complex pointer-based data structures (e.g., the Schorr-Waite graph-marking algorithm \cite{Yang-thesis}), useful fragments of separation logic have been automated as part of program verifiers and static analyses, which have been successfully applied to programs with tens of thousands of lines of source code~\cite{DBLP:conf/fmco/BerdineCO05,DBLP:conf/cav/YangLBCCDO08,DBLP:conf/cav/BerdineCI11}.
\subsection{Concurrent Program Logics}
\label{ch:concurrent-program-logics}
Research into logics for concurrent programs has progressed independently from research into logics for increasingly expressive sequential programs. Major efforts are summarized below.
\paragraph{The Owicki-Gries Logic} Early attempts at handling concurrency within a program logic culminated in an extension of Hoare logic by Owicki and Gries \cite{DBLP:journals/cacm/OwickiG76}, which adds a rule for parallel composition of two program components with the cumbersome side condition that every assertion in one component's specification be invariant under the operation of each atomic command executed by the other component. While elegant in its simplicity, the side condition restricts the logic's usefulness. First, every application of the parallel composition rule requires a number of invariant proofs quadratic in the size of the components, making scalability difficult. Second, the side condition creates dependencies on the \emph{proofs} of the component specifications, not just on the specifications themselves. This effectively rules out independent proof construction for the individual components and yields a highly non-compositional logic.
\paragraph{Rely/Guarantee} Some relief from these problems was provided by Jones in his rely/guarantee logic \cite{JonesRelyGuar}, in which Hoare-style specifications are augmented with two additional assertions: the \emph{rely} condition, which bounds the interference from the environment that a component can tolerate while still meeting its pre- and post-specifications; and the \emph{guarantee} condition, which bounds the interference the program itself may inflict upon the environment. Application of the rely/guarantee parallel composition rule requires proofs that each process' rely condition subsumes the others' guarantee conditions.
While a considerable improvement over the Owicki-Gries logic---subsumption proofs linear in the number of components versus quadratic in their size, and dependence only among components' specifications instead of their proofs---rely/guarantee still has shortcomings. The logic cannot be considered truly compositional because each component may be specified with a variety of interference conditions, and it is not clear which are appropriate until attempting the parallel composition. Furthermore, it can be laborious to specify sufficiently strong guarantee conditions. For example, the guarantee condition for a component with three variables ($x,y,z$) that updates only one ($x := x+1$) must describe not just the relevant variable change ($x' = x + 1$), but also that all others remain the same ($\ldots \wedge y' = y \wedge z' = z$)---the latter condition being difficult because it suggests a sort of quantification over variable names not possible in first-order logic, instead requiring an explicit enumeration of state variables.
\paragraph{RG-Sep} A partial solution to this problem of specification has recently appeared via separation logic. A new concurrent program logic from Vafeiadis and Parkinson, dubbed RG-Sep \cite{DBLP:conf/concur/VafeiadisP07} (``a marriage of rely/guarantee and separation logic''), mates a generalization of separation logic's assertion language and frame rules with the rely/guarantee logic. Besides inheriting the local reasoning features of separation logic, RG-Sep eases the pain of specifying environmental interference by semantically partitioning the system state into private and shared parts. A new class of boxed assertions $\boxed{P}$ is used to describe shared state, and the logical operations are adjusted so that, e.g., a separated conjunction of boxed assertions $\boxed{P} * \boxed{Q}$ allows their footprints to overlap.\footnote{An interesting consequence is that $\boxed{P} * \boxed{Q}$ is logically equivalent to $\boxed{P \wedge Q}$.} The proof rules are then modified so that, e.g., parallel composition requires only that each component be tolerant of environmental interference to shared state, not private state. This could be used in the previous example to obviate the explicit enumeration needed to describe invariance of the irrelevant state variables.
Embodied within RG-Sep (along with its successors \cite{DBLP:conf/esop/DoddsFPV09}) are some of the most advanced ideas about high-level reasoning techniques for fine-grained concurrent shared-memory programs, including local reasoning. Vafeiadis' 2008 dissertation \cite{VafeiadisDissertation} includes correctness proofs for a variety of complex, racy concurrent data structures using the logic. RG-Sep is by no means simple, but does yield relatively concise, readable proofs about difficult algorithms. Indeed, the only significant criticism leveled here is that, as with all the other concurrent logics discussed, RG-Sep is not sound w.r.t.\ weak memory models for racy programs.
\paragraph{Concurrent Separation Logic} A simplified variant of the original Owicki-Gries logic does away with the complicated interference side condition, instead restricting its scope to well locked\footnote{The exact definition of ``well locked'' is technical and rather complicated, but purely syntactic, and hence preferable to the logical interference side conditions imposed by the full Owicki-Gries logic.} and race-free programs. For this smaller (but still large and useful) class of programs, it is possible to use the Owicki-Gries logic to perform \emph{invariant-based} reasoning, in which threads' interaction with shared state may rely only on a specified invariant, and must always preserve that same invariant.
As with Jones' original rely/guarantee logic, this simplified Owicki-Gries logic suffers from the difficulty of constantly having to specify, at each step of the proof, not just what has changed in the program state, but also what has not changed. Once again, the solution is to incorporate the concurrent reasoning of the simplified Owicki-Gries logic into a logic in the style of separation logic. This logic, called Concurrent Separation Logic (CSL), originally developed by Peter O'Hearn and Stephen Brookes \cite{DBLP:journals/tcs/OHearn07,DBLP:journals/tcs/Brookes07}, does just that. Hoare-style specifications are elaborated with an invariant $I$: \[ \spec{I}{P}{c}{Q},\] which indicates informally that, from a state which can be partitioned into a shared part satisfying $I$ and a private part satisfying $P$: 1) the program $c$ never encounters a memory error; 2) throughout execution some portion of the state always satisfies $I$; and 3) if the program terminates, then it does so in a state which can be partitioned into a shared part that satisfies $I$ and a private part that satisfies $Q$. The \textsc{par} rule for reasoning about the parallel composition of program components $c$ and $c'$ is, by comparison with the other approaches, extremely simple: \infrule[par]{\spec{I}{P}{c}{Q} \andif \spec{I}{P'}{c'}{Q'}}{\spec{I}{P \fsep P'}{c \oppar c'}{Q \fsep Q'}} This rule asserts that if $c$ and $c'$ can be proved to maintain the same invariant along with their own private specifications, then the parallel combination $c \oppar c'$ must also maintain that invariant and, assuming the private portions of state accessed by the two components are disjoint, maintains their private specifications as well.
\section{Memory Consistency Models}
\label{sec:memory-models}
A \emph{memory consistency model} (or just \emph{memory model}) is a specification of a concrete implementation of a shared memory system. Memory consistency models bound the possible results of reading an address in shared memory according to the history of writes and reads that have already taken place or are in progress. For the sake of reasoning about the behavior of concurrent software programs interacting with a shared memory, memory consistency models are a crucial abstraction: not only are concrete implementations of shared memory in modern computer systems incredibly complex, but their details may additionally be considered proprietary and secret.\footnote{As a consequence of this complexity and secrecy, it can be a challenge to determine whether or not a memory implementation is soundly described by a particular memory consistency model. This interesting problem is beyond the scope of this project.}
Many memory consistency models have been defined and studied \cite{SteinkeNutt,higham98weak,DBLP:journals/computer/AdveG96}. Below, we discuss (informally) two of particular interest: sequential consistency, the most commonly used abstraction of memory for well-behaved programs; and the x86-TSO memory model, which describes the interaction between memory and arbitrary software programs on x86-like multiprocessor computers.
\subsection{Sequential Consistency}
\label{sec:sc}
Sequential consistency was originally defined by Leslie Lamport in 1979 \cite{DBLP:journals/tc/Lamport79}. For a sequentially consistent program, he wrote: \begin{quotation}\noindent the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. \end{quotation}
As an example, consider again the program in Figure~\ref{fig:dekker}, in which thread 0 writes to $f_0$ and then reads from $f_1$; and thread 1 writes to $f_1$ and reads from $f_0$. If we write $(p,o,\ell,v)$ to indicate that thread $p$ performs operation $o$ (where $o \in \set{r,w}$ is either a read or a write) on location $\ell$ with value $v$ (indicating the value written or read), then: \[ (0,w,f_0,1), (1,w,f_1,1), (1,r,f_0,1), (0,r,f_1,1)\] is a sequentially consistent execution, because the operations of each thread occur, within the execution, in the same order as the operations appear in their respective programs (with the writes preceding the reads), and the results of the loads clearly are consistent with the total order of operations given.
On the other hand, the following two executions are not sequentially consistent: \begin{align*}
(0,r,f_1,0), (1,w,f_1,1), (1,r,f_0,0), (0,w,f_0,1) \\
(0,w,f_0,1), (0,r,f_1,0), (1,w,f_1,1), (1,r,f_0,0),
\end{align*} In the first execution, it is not the case that the operations of thread 0 take place in the order specified by the program: the read precedes the write. In the second execution, the order of operations is consistent with the order specified by the program, but the result is not the same as if the operations were executed in that order: the final read of $f_0$ by thread 1 should not result in value 0, because it is preceded in the sequence by the write of $1$ to $f_0$.
Operationally, this constraint on the possible execution traces of a program can be realized by modeling memory as a simple location-value map, in which reads and writes to memory happen atomically. That is, memory is represented by a (possibly partial) function $\setlocations \pfun \setvalues$; and the semantics of the program is defined such that no more than one thread may read or update the memory at once. This, of course, is the most widespread model of memory used in the semantics of imperative, shared-memory programs, and consequently the vast majority of program reasoning and verification techniques tacitly assume that programs are sequentially consistent; i.e., that all of a program's executions can be described by its interactions with this simple model of memory.
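To make this operational reading concrete, the following OCaml sketch replays a trace of events $(p,o,\ell,v)$ against a single location-value map. It is purely illustrative: the \texttt{event} record, the function names and the assumption that every location initially holds $0$ are ours, not part of any formal development in this document. A trace is sequentially consistent in the sense above when each thread's subsequence of the trace equals its program and every read returns the most recently written value.
\begin{verbatim}
(* A minimal sequential-consistency check: events are replayed, in the
   given total order, against a single location-value map. *)
type op = R | W
type event = { proc : int; op : op; loc : int; value : int }

(* Every read must return the value of the most recent preceding write
   to the same location (locations are assumed to hold 0 initially). *)
let reads_consistent (trace : event list) : bool =
  let mem = Hashtbl.create 16 in
  List.for_all
    (fun e ->
      match e.op with
      | W -> Hashtbl.replace mem e.loc e.value; true
      | R ->
        let current = try Hashtbl.find mem e.loc with Not_found -> 0 in
        current = e.value)
    trace

(* Program order is respected iff, for each thread, the subsequence of
   its events in the trace equals its program. *)
let program_order_ok (programs : (int * event list) list)
    (trace : event list) : bool =
  List.for_all
    (fun (p, prog) -> List.filter (fun e -> e.proc = p) trace = prog)
    programs

let sequentially_consistent programs trace =
  program_order_ok programs trace && reads_consistent trace
\end{verbatim}
Under this check, the valid execution displayed above passes both tests, while the second invalid execution respects program order but fails \texttt{reads\_consistent}, mirroring the informal analysis.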
\subsection{The x86-TSO Memory Model}
\label{sec:x86-TSO}
After a great deal of investigation, experimentation and discussion with manufacturers, Owens, Sarkar and Sewell published a specification \cite{DBLP:conf/tphol/OwensSS09} of the x86 memory model. This model has become widely accepted as accurate\footnote{An earlier paper \cite{DBLP:conf/popl/SarkarSNORBMA09} suggested that the x86 memory model more closely resembled \emph{causal consistency}---a model in which there is a single agreed-upon order not just for the store events, but for all causally related memory events---but this was later contradicted by counterexamples that led to the current proposed TSO model.} and forms the basis for a great deal of related research---e.g., the in-progress CompCert project \cite{Leroy-Compcert-CACM}, which attempts to construct a realistic (i.e., one that performs realistic optimizations) and formally verified compiler for C.
The model identified by Owens et al.~is essentially a \emph{total-store order} (TSO) model which, as the name implies, indicates that there is a single total order, agreed upon by all processors, that organizes store events. Their x86-TSO model is described as a collection of legal traces of memory events, defined both axiomatically and operationally. The latter, informally, is described in terms of \emph{write buffers}: per-processor FIFO queues of ``writes'' (i.e., location-value pairs). The informal semantics of store events entails buffering a new write in the processor's write buffer; the semantics of load events entails returning the value of the most recent buffered write to the intended location in the processor's write buffer or, if no such buffered write exists, of the shared memory.
Formally, the operational model is given with a labeled transition relation between machine states. These states are four-tuples $(R,m,B,l)$ in which: \begin{itemize}
\item $R : \setprocessors \tfun \setidentifiers \pfun \setvalues$ represents a register file for each processor;
\item $m : \setlocations \pfun \setvalues$ represents a shared memory;
\item $B : \setprocessors \tfun \listof{(\setlocations \times \setvalues)}$ represents a write buffer for each processor;
\item $l : \setprocessors^\bot$ represents a global lock.
\end{itemize}
Transitions (labeled by memory events) between states indicate the possibility and effect of those memory events. For example, in any state $(R,m,B,l)$, processor $p$ may write a value $v$ into its register $i$; i.e., it may update the state such that $R_p(i) = v$. Similarly, if $R_p(i) = v$ then $p$ may read value $v$ from its register $i$. A summary of the other events processor $p$ may perform is as follows: \begin{itemize}
\item it may load from its write buffer the most recent value of a location---or, if a write to that location is not found in its write buffer, from memory---if the lock is either available (i.e., the lock value is $\bot$) or is held by $p$, but not if some other processor $q \neq p$ holds the lock;
\item it may store a value to a memory location by adding a new write to the head of its write buffer regardless of the status of the lock;
\item it may flush (or, synonymously in this document, commit) the least recent write in its buffer to memory if it holds the lock or the lock is available;
\item it may fence, flushing all writes buffered on $p$ to memory, resulting in an empty write buffer;
\item it may acquire the lock (i.e., change the lock value in the current state to $p$) if the lock is available;
\item it may release the lock (i.e., change the lock value in the current state to $\bot$) if it holds the lock.
\end{itemize}
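The transition system just described can also be rendered as executable code. The following OCaml sketch is our own illustration, not the formalization of Owens et al.: processor registers are omitted, the shared memory and the buffers are association lists, and \texttt{None} models a transition that is disallowed in the current state (for loads, one blocked by the lock). It captures the essential buffer discipline: stores enqueue, loads prefer the most recent buffered write to the requested location, and flushes commit the least recent buffered write.
\begin{verbatim}
(* Sketch of the x86-TSO machine state (R, m, B, l), without registers. *)
type proc = int
type state = {
  mem  : (int * int) list;               (* shared memory: loc -> value *)
  bufs : (proc * (int * int) list) list; (* FIFO buffer per processor   *)
  lock : proc option;                    (* global lock, None = free    *)
}

let buffer st p = try List.assoc p st.bufs with Not_found -> []
let set_buffer st p b = (p, b) :: List.remove_assoc p st.bufs
let lock_ok st p = match st.lock with None -> true | Some q -> q = p

(* Store: enqueue the write; new writes go at the tail of the list,
   so the least recent write sits at the head. Always allowed. *)
let store st p loc v =
  { st with bufs = set_buffer st p (buffer st p @ [ (loc, v) ]) }

(* Load: the most recent buffered write to loc wins; otherwise read
   memory. Disallowed if another processor holds the lock. *)
let load st p loc =
  if not (lock_ok st p) then None
  else
    let buffered =
      List.fold_left
        (fun acc (l, v) -> if l = loc then Some v else acc)
        None (buffer st p)
    in
    (match buffered with
     | Some v -> Some v
     | None -> (try Some (List.assoc loc st.mem) with Not_found -> None))

(* Flush: commit the least recent buffered write of p to memory. *)
let flush st p =
  if not (lock_ok st p) then None
  else
    match buffer st p with
    | [] -> None
    | (l, v) :: rest ->
      Some { st with
             bufs = set_buffer st p rest;
             mem = (l, v) :: List.remove_assoc l st.mem }

(* Fence: flush until p's buffer is empty. *)
let rec fence st p =
  match buffer st p with
  | [] -> Some st
  | _ :: _ ->
    (match flush st p with None -> None | Some st' -> fence st' p)

(* Store buffering: locations f0, f1 are encoded as 0 and 1. *)
let demo =
  let st0 = { mem = [ (0, 0); (1, 0) ]; bufs = []; lock = None } in
  let st1 = store st0 0 0 1 in  (* thread 0: f0 := 1 (buffered) *)
  let r0  = load st1 0 1 in     (* thread 0: reads f1 = Some 0  *)
  let st2 = store st1 1 1 1 in  (* thread 1: f1 := 1 (buffered) *)
  let r1  = load st2 1 0 in     (* thread 1: reads f0 = Some 0  *)
  (r0, r1)
\end{verbatim}
The \texttt{demo} value anticipates the second execution discussed below: because neither buffered write has been committed when the loads execute, both loads return 0, an outcome that no sequentially consistent execution of this program allows.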
The x86-TSO memory model includes all the sequentially consistent executions, as well as others. For example, the sequentially consistent execution from the previous section: \[ (0,w,f_0,1), (1,w,f_1,1), (1,r,f_0,1), (0,r,f_1,1), \] is shown valid (informally) under the x86-TSO model as follows. Assume in the sequel that no processor holds the lock. The first thread may perform its store operation by enqueuing a write $(f_0,1)$ in its buffer, and then immediately flushing that write to memory; and then similarly for the second thread, which enqueues a write $(f_1,1)$ and then immediately flushes it to memory. If the second thread then reads location $f_0$ it finds value 1 in memory (because its buffer contains no writes to $f_0$); and similarly if the first thread then reads location $f_1$ it finds value 1 for the same reason.
Next, consider again the two non-sequentially consistent executions of Figure~\ref{fig:dekker} from the previous section: \begin{align*}
(0,r,f_1,0), (1,w,f_1,1), (1,r,f_0,0), (0,w,f_0,1) \\
(0,w,f_0,1), (0,r,f_1,0), (1,w,f_1,1), (1,r,f_0,0),
\end{align*} The first non-sequentially consistent execution is invalid w.r.t.\ the x86-TSO model as well, because the operations of a single thread again do not take place in the order specified by the program. The second non-sequentially consistent execution, however, is valid under the x86-TSO model: the first process enqueues a write $(f_0,1)$ in its buffer, and then reads $0$ from location $f_1$ in memory (because its buffer contains no writes to $f_1$); then the second process enqueues a write $(f_1,1)$ in its buffer, and then reads $0$ from location $f_0$ in memory (because its own buffer contains no writes to $f_0$; the only write to $f_0$ remains uncommitted in the first process's buffer).
When combined with the earlier claim that the x86-TSO memory model also includes all sequentially consistent executions, this example shows that the x86-TSO memory model is strictly weaker than sequential consistency. Consequently, x86-TSO is considered to be a \emph{weak memory model}.
Note that, despite the completely informal examples above, this specification of the x86-TSO memory model only \emph{bounds} the sort of memory events that can occur in program executions; it does not give meaning to the programs of any particular language, like x86 assembly or C programs. It can, however, be used to give semantics to programs in those languages. Owens et al.\ use their model as a guide in assigning semantics to a significant subset of the x86 assembly language, for example. The semantics of a simple C-like programming language used in this dissertation project, described in Sections~\ref{sec:sequential-programs} and~\ref{sec:programming-language}, is also guided by the x86-TSO memory model. And although it should be possible to prove that this semantics respects the bounds of the model, that is not the focus of the project. But even without such a correspondence proof, it should be clear that the semantics of the programming language to be described later is manifestly weak.
\subsection{Data-Race-Freedom Guarantees}
\label{sec:drf}
Although the memory models of most modern computer architectures---including, as we have seen, x86-like computer architectures---are relatively weak, allowing additional executions compared to a sequentially consistent memory model, the tacit assumption that programs are sequentially consistent made by most verification techniques is actually, in many cases, perfectly reasonable. This is because modern computer architectures, including x86, guarantee sequentially consistent execution for a large class of programs: namely, the data-race-free programs.\footnote{An x86-specific notion of data-race, as well as theorems about data-race-freedom guarantees for the x86, are described in \cite{DBLP:conf/ecoop/Owens10}.} Because of these so-called data-race-freedom (DRF) guarantees, verification techniques which assume sequential consistency are perfectly sound provided the program under consideration is race-free. And because races are generally considered to be errors, a reasonable verification workflow is to first check that a program is race-free before proceeding with the verification of more elaborate properties; or, alternatively, to leverage techniques which succeed only for race-free programs. But not all correct programs are race-free; for these programs, verification techniques may need to account for the full complexities of the underlying memory model.
\chapter{A Sequential Program Logic}
\label{ch:uniprocessor}
In this chapter we describe a program logic for a sequential, single-processor, weak-memory programming model. The semantics of programs and assertions will be given in terms of system states with a single write buffer, and the logic is tailored to reason about the behavior of sequential programs w.r.t.\ these states. The logic developed in this chapter is not especially useful; the behavior of sequential program execution on this model is essentially the same as on the typical, strong-memory model (in which memory is modeled as a single array of addresses), and existing logics for reasoning about this behavior are certainly simpler than the one developed here. But the sequential program logic is a pedagogical stepping stone toward the concurrent program logic, for which the behavior of programs may well be significantly different from that of a strong-memory model. The development of the sequential logic is vastly simpler than the weak-memory, multi-processor logic, and many of the more difficult issues in that task can be introduced and explained more easily in a single-processor setting.
A detailed description and soundness proof of an earlier single-processor, weak-memory program logic was given previously \cite{wmsldetails}.
\section{An Example Sequential Proof}
\label{sec:ex-seq-proof}
Consider the following simple sequential program $c$, which loads the value at an address $x$ into variable $t$, writes the value $t+1$ into address $y$, and then flushes that write back to memory with a fence instruction: \[ c \eqdef \cload{t}{x} \opseq \cstore{y}{t+1} \opseq \cfence.\] This sequential program will fail with a memory error unless both the value of $x$, from which the program loads, and the value of $y$, to which it stores, are properly allocated memory addresses. Any provable specification of this program must require this of $x$ and $y$ initially. If the values of $x$ and $y$ are indeed allocated addresses, and the value in memory at address $x$ is initially some value $z$, then upon termination the write buffer will be empty and the value in memory at location $y$ will be $z+1$.
In the program logic we develop throughout this chapter, we express this informal specification of $c$ with the following formal specification: \begin{align}\label{eq:seq-ex-spec} \triple{x \fpointsto z \fsep y \fpointsto -}{c}{x \fpointsto z \fsep y \fpointsto z+1}.\end{align}
We write $e \fpointsto f$ to mean that the value in memory at the address given by the integer-valued expression $e$ is equal to the value given by expression $f$, or $e \fpointsto -$ if the value in memory at $e$ is irrelevant. We also write the \emph{separating conjunction} $e \fpointsto f \fsep e' \fpointsto f'$ to mean that $e$ and $e'$ are distinct, allocated memory addresses, with values in memory given by $f$ and $f'$ respectively. Hence, the pre-condition on the left asserts that $x$ and $y$ are allocated memory locations, the former with value $z$, and the post-condition on the right asserts that, upon executing the command $c$, the value at $x$ remains $z$ and the value at $y$ has been set to $z+1$.
Note that the specification in Equation~\ref{eq:seq-ex-spec} above would not be true if the traditional \emph{additive conjunction} $\conj$ were used instead of the separating conjunction. That is, the following specification: \[ \triple{x \fpointsto z \conj y \fpointsto -}{c}{x \fpointsto z \conj y \fpointsto z+1}.\] is not true because it allows for $x$ and $y$ to be \emph{aliased}---i.e., to denote the same memory address. For in that case the store to $y$ would change the value in memory at $y$ as well as at $x$, leaving $x \fpointsto z+1$ instead of $x \fpointsto z$ as specified. Also note that the specification in Equation~\ref{eq:seq-ex-spec} would not be true without the trailing $\cfence$ command because the buffered write will not necessarily have committed to memory at the moment the commands have completed their execution.
A \emph{proof sketch} of the program specification in Equation~\ref{eq:seq-ex-spec} is as follows, the details of which will be explained shortly:
\Sketch{
\aconn{$x \fpointsto z \fsep y \fpointsto -$}
\acomm{$\cload{t}{x}$}
\aconn{$x \fpointsto z \fsep (y \fpointsto - \conj t = z)$}
\acomm{$\cstore{y}{t+1}$}
\aconn{$x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1) $}
\acomm{$\cfence$}
\aconnl{$x \fpointsto z \fsep y \fpointsto z+1$}
}
As the name indicates, a proof sketch provides a skeleton of a complete proof of a program specification. For a program $c_1 \opseq c_2$, a proof sketch may take the form:
\Sketch{
\aconn{$P$}
\acomm{$c_1$}
\aconn{$R$}
\acomm{$c_2$}
\aconnl{$Q$}
} which indicates that a complete proof will include proofs of the sub-specifications $\triple{P}{c_1}{R}$ and $\triple{R}{c_2}{Q}$ that will be combined, in a final step, using a rule of inference (\textsc{seq}) for composing specifications of sequentially combined commands:
\[ \infer[{\textsc{seq}}]{\triple{P}{c_1 \opseq c_2}{Q}}{\infer{\triple{P}{c_1}{R}}{\vdots} & \infer{\triple{R}{c_2}{Q}}{\vdots}}\] where the vertical dots represent the remaining proofs to be supplied.
Returning to the proof sketch of the program specification in Equation~\ref{eq:seq-ex-spec}, we see that to complete the proof we must derive three sub-specifications, one for each primitive command: \begin{enumerate}
\item $\triple{x \fpointsto z \fsep y \fpointsto -}{\cload{t}{x}}{x \fpointsto z \fsep (y \fpointsto - \conj t = z)}$
\item $\triple{x \fpointsto z \fsep (y \fpointsto - \conj t = z)}{\cstore{y}{t+1}}{x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1)}$
\item $\triple{x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1)}{\cfence}{x \fpointsto z \fsep y \fpointsto z+1}$
\end{enumerate}
The first specification asserts that the result of loading the value $z$ in memory at address $x$ into variable $t$ results in a state that is otherwise unchanged except that $t = z$. The second specification asserts that the result of storing the value $t+1$ to the address $y$ results in the addition of a new buffered write to the state, which may later be committed to memory, overwriting the current, unspecified value. The third specification asserts that the result of explicitly flushing this write to memory overwrites the existing value in memory at $y$ with the value $z+1$.
% When necessary for space reasons, we shall abbreviate the pre- and post-conditions of the three specifications above by $P_n$ and $Q_n$, as follows: \begin{align*}
% P_1 \eqdef & x \fpointsto z \fsep y \fpointsto - & Q_1 \eqdef & x \fpointsto z \fsep (y \fpointsto - \conj t = z) \\
% P_2 \eqdef & x \fpointsto z \fsep (y \fpointsto - \conj t = z) & Q_2 \eqdef & x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1) \\
% P_3 \eqdef & x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1) & Q_3 \eqdef & x \fpointsto z \fsep y \fpointsto z+1
% \end{align*}
We shall now derive these three specifications. For the first specification, we start by using an instance of the \textsc{load} axiom scheme to show that the result of evaluating the load command $\cload{t}{x}$ from pre-condition $x \fpointsto z$ yields the post-condition $x \fpointsto z \conj t = z$.
\[
\infer[\textsc{load}]{\triple{x \fpointsto z}{\cload{t}{x}}{x \fpointsto z \conj t = z}}{}
\] We then use the \emph{spatial frame rule} \textsc{frame-sp} to extend the pre- and post-condition with a description of the value of an additional, distinct memory address $y$: \[
\infer[\textsc{frame-sp}]{\triple{y \fpointsto - \fsep x \fpointsto z}{\cload{t}{x}}{y \fpointsto - \fsep (x \fpointsto z \conj t = z)}}{
\infer[\textsc{load}]{\triple{x \fpointsto z}{\cload{t}{x}}{x \fpointsto z \conj t = z}}{}
}
\] Finally, we observe that the derived pre- and post-conditions are not syntactically equal to those of the desired specification, but are logically equivalent: \begin{align*}
y \fpointsto - \fsep x \fpointsto z \sequiv & x \fpointsto z \fsep y \fpointsto - \\
y \fpointsto - \fsep (x \fpointsto z \conj t = z) \sequiv & x \fpointsto z \fsep (y \fpointsto - \conj t = z)
\end{align*} More generally, the derived pre-condition is logically implied by the desired pre-condition, and the derived post-condition logically implies the desired post-condition. Hence, we may strengthen the pre-condition and weaken the post-condition accordingly with the rule of consequence \textsc{cons}: \[
\infer[\textsc{cons}]{\triple{x \fpointsto z \fsep y \fpointsto -}{\cload{t}{x}}{x \fpointsto z \fsep (y \fpointsto - \conj t = z)}}{
\infer[\textsc{frame-sp}]{\triple{y \fpointsto - \fsep x \fpointsto z}{\cload{t}{x}}{y \fpointsto - \fsep (x \fpointsto z \conj t = z)}}{
\infer[\textsc{load}]{\triple{x \fpointsto z}{\cload{t}{x}}{x \fpointsto z \conj t = z}}{}
}
}
\] This completes the derivation of the first specification.
For the second specification, we begin with an instance of the axiom scheme for the store command in which the pre-condition describes just the value of $y$ in memory: \[ \infer[\textsc{store}]{\triple{y \fpointsto -}{\cstore{y}{t+1}}{y \fpointsto - \fseq y \fwriteu t+1}}{}\] The post-condition describes, with the \emph{leads-to} assertion $y \fwriteu t+1$, the addition of a new, buffered write to address $y$ with value $t+1$. The \emph{temporal separating conjunction} $y \fpointsto - \fseq y \fwriteu t+1$ indicates that the writes described on the left side precede those on the right side.
Next, we again apply the spatial frame rule \textsc{frame-sp} to extend the specification with a description of the value of memory at $x$ and the value of variable $t$, namely $(x \fpointsto z \conj t = z)$, which we abbreviate for space reasons as $F_2$ below:
\[ \infer[\textsc{frame-sp}]{\triple{F_2 \fsep y \fpointsto -}{\cstore{y}{t+1}}{F_2 \fsep (y \fpointsto - \fseq y \fwriteu t+1)}}{
\infer[\textsc{store}]{\triple{y \fpointsto -}{\cstore{y}{t+1}}{y \fpointsto - \fseq y \fwriteu t+1}}{}
} \]
The derived pre-condition is logically equivalent to---and, hence, is implied by---the desired pre-condition, and the derived post-condition logically implies the desired post-condition \begin{align*}
(x \fpointsto z \conj t = z) \fsep y \fpointsto - \sequiv & x \fpointsto z \fsep (y \fpointsto - \conj t = z)\\
(x \fpointsto z \conj t = z) \fsep (y \fpointsto - \fseq y \fwriteu t+1) \sentails & x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1)
\end{align*} Hence, we finish the derivation of the second specification with an application of the rule of consequence \textsc{cons}:
\[ \infer[\textsc{cons}]{\triple{x \fpointsto z \fsep (y \fpointsto - \conj t = z)}{\cstore{y}{t+1}}{x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1)}}{\infer[\textsc{frame-sp}]{\triple{F_2 \fsep y \fpointsto -}{\cstore{y}{t+1}}{F_2 \fsep (y \fpointsto - \fseq y \fwriteu t+1)}}{
\infer[\textsc{store}]{\triple{y \fpointsto -}{\cstore{y}{t+1}}{y \fpointsto - \fseq y \fwriteu t+1}}{}
}}
\]
For the third and final specification, we begin with the axiom for the $\cfence$ command: \[ \infer[\textsc{fence}]{\triple{\femp}{\cfence}{\fbaru}}{}\] This small axiom allows for the introduction, from an empty system description, of the $\fbaru$ assertion. Intuitively, this assertion describes the effect of a barrier on the system state, in the sense that any writes that precede $\fbaru$ must necessarily be committed to memory. For example, we have the following logical equivalence: \[ y \fpointsto t+1 \sequiv y \fwriteu t+1 \fseq \fbaru \] This means that the description of $y$ having value $t+1$ in memory is equivalent to the description of a buffered write to address $y$ with value $t+1$ followed by a flush of that write to memory.\footnote{In fact, later in the description of the sequential logic, the points-to assertion will be made definitionally equal to the temporal conjunction of the analogous leads-to assertion and $\fbaru$.} We wish to describe the effect of $\fbaru$ on the post-condition from the previous specification, so we extend the specification with the \emph{temporal frame rule} \textsc{frame-tm} and the frame assertion $x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1)$, which we abbreviate for space below as $F_3$ (distinct from the frame $F_2$ used above):
\[ \infer[\textsc{frame-tm}]{\triple{F_3 \fseq \femp}{\cfence}{F_3 \fseq \fbaru}}{
\infer[\textsc{fence}]{\triple{\femp}{\cfence}{\fbaru}}{}}\] The empty assertion is a unit w.r.t.\ both the spatial and temporal separating conjunctions, so the pre-condition assertion is equivalent to one without the final temporal conjunct $\femp$. And the post-condition is equivalent to one in which the buffered write has been flushed to memory, replacing the previous value in memory: \[ (x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1)) \fseq \fbaru \sequiv x \fpointsto z \fsep y \fpointsto z+1\] So, once again, the derivation can be completed by an application of the rule of consequence: \[ \infer[\textsc{cons}]{\triple{x \fpointsto z \fsep (y \fpointsto - \fseq y \fwriteu z+1)}{\cfence}{x \fpointsto z \fsep y \fpointsto z+1}}{\infer[\textsc{frame-tm}]{\triple{F_3 \fseq \femp}{\cfence}{F_3 \fseq \fbaru}}{
\infer[\textsc{fence}]{\triple{\femp}{\cfence}{\fbaru}}{}}}\]
This completes the proof of the program specification of Equation~\ref{eq:seq-ex-spec}. In the following sections, the notions of program, system state, assertion, model, specification and proof are formally defined and discussed.
\section{Expressions and Stacks}
\label{sec:expressions}
Expressions are terms that denote values, which in this development are just integers. Hence, they are also used later on to denote memory locations and processor identifiers.
The language of expressions, written $\exprs$, is given by the following grammar: \[ \exprs~e \bnfdef v \bnfbar x \bnfbar (e + e') \bnfbar (e - e') \bnfbar \ldots, \] where $v \in \setvalues$ and $x \in \setidentifiers$. The possibility of additional operations is left open, as knowledge of the complete set of expressions is not particularly important for the purpose of describing the program logic.
The semantics of expressions is given w.r.t.\ \emph{stacks}, which are total functions from $\setidentifiers$ to $\setvalues$. The collection of functions $\setidentifiers \tfun \setvalues$ is abbreviated $\setstacks$. The interpretation of an expression w.r.t.\ a stack $s$ is given by the \emph{extension} of a stack, written $\ext{s}$, which is a total function from $\exprs$ to $\setvalues$ defined as follows: \begin{align*}
\ext{s}(v) \eqdef & v \\
\ext{s}(x) \eqdef & s(x) \\
\ext{s}(e + e') \eqdef & \ext{s}(e) + \ext{s}(e') \\
\ext{s}(e - e') \eqdef & \ext{s}(e) - \ext{s}(e')
\end{align*}
Boolean expressions are terms that denote truth values. Their language is given by the following grammar: \[ \bexprs~b \bnfdef \bexpf \bnfbar \bexpt \bnfbar (!b) \bnfbar (e = e') \bnfbar \ldots, \] where $e,e' \in \exprs$. For convenience, we represent truth values by the set $\set{0,1}$ so the extension of a stack can also be used to interpret boolean expressions: \begin{align*}
\ext{s}(\bexpf) \eqdef & 0 \\
\ext{s}(\bexpt) \eqdef & 1 \\
\ext{s}(!b) \eqdef & 1 - \ext{s}(b) \\
\ext{s}(e = e') \eqdef & \begin{cases}
1 &\text{ if } \ext{s}(e) = \ext{s}(e')\\
0 &\text{ otherwise.}
\end{cases}
\end{align*}
The set of \emph{free variables} of a (boolean) expression $e$, written $\fv{e}$, is defined as usual: \begin{align*}
\fv{v} \eqdef & \nil \\
\fv{x} \eqdef & \set{x} \\
\fv{e + e'} \eqdef & \fv{e} \cup \fv{e'} \\
\fv{e - e'} \eqdef & \fv{e} \cup \fv{e'} \\
\fv{\bexpf} \eqdef & \nil \\
\fv{\bexpt} \eqdef & \nil \\
\fv{!b} \eqdef & \fv{b} \\
\fv{e = e'} \eqdef & \fv{e} \cup \fv{e'}
\end{align*}
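These definitions translate almost directly into code. The following OCaml sketch is our own rendering, with constructor names of our choosing and truth values encoded as the integers 0 and 1 exactly as in the text; it implements the extension $\ext{s}$ and the free-variable function for the fragment of the grammar shown.
\begin{verbatim}
(* Expressions and boolean expressions, following the grammars above. *)
type expr =
  | Val of int
  | Var of string
  | Add of expr * expr
  | Sub of expr * expr

type bexpr =
  | False
  | True
  | Not of bexpr
  | Eq of expr * expr

(* A stack is a total function from identifiers to values. *)
type stack = string -> int

(* The extension of a stack, interpreting expressions. *)
let rec ext (s : stack) : expr -> int = function
  | Val v -> v
  | Var x -> s x
  | Add (e, e') -> ext s e + ext s e'
  | Sub (e, e') -> ext s e - ext s e'

(* Boolean expressions denote 0 or 1, as in the text. *)
let rec bext (s : stack) : bexpr -> int = function
  | False -> 0
  | True -> 1
  | Not b -> 1 - bext s b
  | Eq (e, e') -> if ext s e = ext s e' then 1 else 0

(* Free variables, as a duplicate-free list. *)
let rec fv : expr -> string list = function
  | Val _ -> []
  | Var x -> [ x ]
  | Add (e, e') | Sub (e, e') ->
    List.sort_uniq compare (fv e @ fv e')
\end{verbatim}
In this rendering, Lemma~\ref{lem:exp-stcong} amounts to the observation that \texttt{ext s e} inspects \texttt{s} only at the identifiers in \texttt{fv e}.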
For stacks $s,s'$ and $X \subseteq \setidentifiers$, we write $s \stcong{X} s'$ if, for all $x \in X$, $s(x) = s'(x)$. The following basic lemma uses this relation to connect the static and dynamic semantics of expressions.
\begin{lemma}
\label{lem:exp-stcong}
If $s \stcong{\fv{e}} s'$ then $\ext{s}(e) = \ext{s'}(e)$.
\end{lemma}
\begin{proof}
By induction on the structure of the expression $e$.
\end{proof}
\section{Sequential Programs}
\label{sec:sequential-programs}
In this chapter, programs are identified with \emph{sequential commands}, which consist of compositions of \emph{primitive commands} for testing, accessing and modifying state.
The syntax of primitive commands is given by the following grammar: \begin{align*} \pcomms~p \bnfdef & \cskip \bnfbar \cassume{b} \bnfbar \cassert{b} \bnfbar \cassign{x}{e} \bnfbar \cload{x}{e} \bnfbar \\
& \cstore{e}{e'} \bnfbar \cfence
\end{align*}
The informal meaning of the primitive commands is as follows. \begin{itemize}
\item $\cskip$ takes no evaluation steps;
\item $\cassume{b}$ evaluates to $\cskip$ if $b$ holds and becomes stuck otherwise;
\item $\cassert{b}$ evaluates to $\cskip$ if $b$ holds and aborts otherwise;
\item $\cassign{x}{e}$ assigns $e$ to identifier $x$;
\item $\cload{x}{e}$ assigns the value at memory address $e$ to identifier $x$;
\item $\cstore{e}{e'}$ stores the value $e'$ to memory address $e$; and
\item $\cfence$ commits any buffered writes to memory.
\end{itemize}
The formal semantics of the successful execution of a primitive command $p$ is given as a transition relation between machine states $\sigma,\sigma'$: \[ p : \sigma \step \sigma'. \] Informally, a triple $p,\sigma,\sigma'$ belongs to this relation if the successful execution of $p$ in state $\sigma$ may yield state $\sigma'$. (A formal interpretation will be given later in the context of a formal semantics of full commands.) A primitive command may alternatively execute unsuccessfully, indicated as follows: \[ p : \sigma \step \abort. \] Executions may be unsuccessful as a result of failed assertions---e.g., $\cassert{\bexpf}$ is unsuccessful from any state---or attempts to access or modify a memory location outside a command's address space. We say that a primitive command \emph{aborts} in such unsuccessful executions.
To define these relations formally, we must first define the notion of a uniprocessor memory system and a machine state.
\begin{definition}
A \emph{uniprocessor memory system} is a pair $(h,b)$, where: \begin{itemize}
\item $h : \setlocations \pfun \setvalues$ is a \emph{heap}, i.e., a partial function that represents the allocated locations of shared memory and their values; and
\item $b : \listof{(\setlocations \times \setvalues)}$ is a write buffer;
\end{itemize}
\end{definition}
The set of uniprocessor memory systems is abbreviated as $\setmemorysystems$.
The pair that consists of a stack $s$, as defined in Section~\ref{sec:expressions}, which assigns values to identifiers, and a memory system $\mu$ is called a \emph{state}, typically abbreviated by $\sigma$. The collection of states is written $\setstates$. We often abuse notation by interchanging memory systems and states in definitions for which the stack is irrelevant.
Note that the notion of machine state given here differs from that used to define the memory model. First and most obviously, there is only a single write buffer, because in this section we are discussing only the execution of sequential programs on uniprocessor machines. Second, the global lock value from the memory model is omitted because there is only one write buffer, and hence commands have no need to claim sole ownership of the memory system. Third, the set of names (i.e., ``registers,'' ``variables,'' ``identifiers,'' etc.) is global instead of local to each processor. This is for convenience only, and is not a technical restriction. The specification logic will be restricted to programs for which the names are partitioned among processes, except for those that are never modified. Another reasonable choice would have been to use local names only, and to share read-only values among processes in the shared memory. This has the advantage of codifying the above healthiness condition on programs directly into the model of the language and logic; it has the disadvantage of perhaps making the description of access to shared values slightly more awkward.
The definition of the semantic relation for primitive commands is given in Figure~\ref{fig:sequential-primitive-semantics}. By \textsc{p-assume}, the primitive $\cassume{b}$ takes an evaluation step only if the boolean expression $b$ evaluates to $1$ in the current state. Otherwise, it is effectively stuck. Later, when we describe the specification logic, we will see that such stuck executions are irrelevant to the truth of specifications. In fact, stuck executions are equivalent to diverging executions w.r.t.\ the specification logic. By \textsc{p-assert} and \textsc{p-assert-a}, $\cassert{b}$ evaluates normally if the boolean expression $b$ evaluates to 1 in the current state, and aborts otherwise. An aborting command will later be shown to not satisfy any specification. By \textsc{p-assign}, the assignment primitive $\cassign{x}{e}$ always evaluates successfully by assigning to $x$ the value of the expression $e$ in the current state. By \textsc{p-load}, the load primitive $\cload{x}{e}$ assigns to $x$ the value of the most recent buffered write to the address that is the value of expression $e$, if such a write exists, and otherwise gives the value in the current heap at that address. If the address that is the value of expression $e$ neither has any buffered writes nor is defined in the heap (i.e., if $\ext{s}(e) \notin \dom{h \override \funof{b}}$), then by \textsc{p-load-a} the load primitive aborts. By \textsc{p-store}, the store primitive $\cstore{e}{e'}$ appends to the write buffer a new write whose address is the value of expression $e$ and whose value is that of $e'$, but only if the value of $e$ is an address that is already allocated (i.e., if $\ext{s}(e) \in \dom{h \override \funof{b}}$). Otherwise, by \textsc{p-store-a}, the store primitive aborts. Finally, by \textsc{p-fence}, the $\cfence$ primitive evaluates without changing the state if the write buffer is empty. Later, when we describe the semantics of full commands, we will explain how the fence primitive can be thought to flush a non-empty buffer.
\begin{figure}[p]
\centering
\begin{minipage}{\columnwidth}
\infrule[p-assume]{\text{ if $\ext{s}(b) = 1$}}{\cassume{b} : (s,\mu) \step (s,\mu)}
\vspace{1em}
\infrule[p-assert]{\text{ if $\ext{s}(b) = 1$}}{\cassert{b} : (s,\mu) \step (s,\mu)}
\vspace{1em}
\infrule[p-assert-a]{\text{ if $\ext{s}(b) = 0$}}{\cassert{b} : (s,\mu) \step \abort}
\vspace{1em}
\infrule[p-assign]{}{\cassign{x}{e} :(s,h,b) \step (\funup{s}{\ptup{x}{\ext{s}(e)}},h,b)}
\vspace{1em}
\infrule[p-load]{\text{if $(h \override \funof{b})(\ext{s}(e)) = v$}}{\cload{x}{e} :(s,h,b) \step (\funup{s}{\ptup{x}{v}},h,b)}
\vspace{1em}
\infrule[p-load-a]{\text{if $(h \override \funof{b})(\ext{s}(e)) = \bot$}}{\cload{x}{e} :(s,h,b) \step \abort}
\vspace{1em}
\infrule[p-store]{\text{if $\ext{s}(e) \in \dom{h \override \funof{b}}$}}{\cstore{e}{e'} :(s,h,b) \step (s,h,b \lapp \llit{\ext{s}(e),\ext{s}(e')})}
\vspace{1em}
\infrule[p-store-a]{\text{if $\ext{s}(e) \notin \dom{h \override \funof{b}}$}}{\cstore{e}{e'} :(s,h,b) \step \abort}
\vspace{1em}
\infrule[p-fence]{\text{if $b = \lnil$}}{\cfence :(s,h,b) \step (s,h,b)}
\end{minipage}
\caption{\label{fig:sequential-primitive-semantics} Semantics of sequential primitive commands}
\end{figure}
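The buffer-overriding lookup $h \override \funof{b}$ that drives \textsc{p-load} and \textsc{p-store} can be sketched in OCaml as follows. This is an illustration under our own representation choices: the heap is an association list, the buffer is a list with the least recent write at its head, and \texttt{None} plays the role of $\bot$, i.e., an aborting access.
\begin{verbatim}
(* Heap and buffer; the least recent buffered write is at the head. *)
type heap = (int * int) list
type buf = (int * int) list

(* [b] as a lookup function: the most recent matching write wins,
   because fold_left lets later (more recent) writes overwrite acc. *)
let buf_lookup (b : buf) (loc : int) : int option =
  List.fold_left
    (fun acc (l, v) -> if l = loc then Some v else acc)
    None b

(* h overridden by [b]: buffered writes shadow the heap. *)
let lookup (h : heap) (b : buf) (loc : int) : int option =
  match buf_lookup b loc with
  | Some v -> Some v
  | None -> (try Some (List.assoc loc h) with Not_found -> None)

(* p-load / p-load-a: None models abort. *)
let step_load (h : heap) (b : buf) (loc : int) : int option =
  lookup h b loc

(* p-store / p-store-a: a store to an unallocated address aborts;
   otherwise the new write is appended as the most recent. *)
let step_store (h : heap) (b : buf) (loc : int) (v : int) : buf option =
  match lookup h b loc with
  | Some _ -> Some (b @ [ (loc, v) ])
  | None -> None
\end{verbatim}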
Structured commands consist of either a primitive command; a sequential composition of commands; a nondeterministic (internal) choice between commands; or an iteration of a command. The language of commands is defined by the following grammar: \[ \comms~c \bnfdef p \bnfbar (\cseq{c}{c'}) \bnfbar (\cchoice{c}{c'}) \bnfbar \cloop{c},\] where $p$ is a primitive command.
The formal semantics of the successful execution of a command is given as a binary transition relation between command-state pairs: \[ c,\sigma \step c',\sigma'.\] But a command's execution may abort unsuccessfully as well, as with primitive commands. Unsuccessful executions are modeled as a transition relation between command-state pairs and an erroneous pseudo-state, as for primitive commands: \[ c,\sigma \step \abort. \] We refer collectively to command-state pairs and the erroneous state $\abort$ as \emph{configurations}, and use $\mathcal{C}$ to indicate a configuration. A configuration $\mathcal{C}$ is considered \emph{safe} if it does not abort: $\mathcal{C} \nostep \abort$.
The semantics of commands also encompasses ``silent'' transitions, which represent the flushing of buffered writes to the shared memory as allowed by the memory model. This flushing commits the least recent buffered write to the heap, and is described by a relation $\taustep$ between memory systems\footnote{As noted earlier, we abuse notation by interchanging the concept of state and memory system in definitions for which the stack is irrelevant. Hence, the definition of the relation between memory systems also constitutes the definition of a relation between states such that $(s,\mu) \taustep (s,\mu')$ iff $\mu \taustep \mu'$.} defined as follows: \begin{align*} (h,\llit{(\ell,v)} \lapp b) \taustep (\funup{h}{\ptup{\ell}{v}},b)
\end{align*} We write $\taurefines$ as shorthand for the converse of the reflexive-transitive closure of $\taustep$: \[ \mu \taurefines \mu' \iffdef \mu' \rtcl{\taustep} \mu.\]
The complete relation that defines the semantics of commands is given in Figure~\ref{fig:sequential-command-semantics} below. The semantics of primitive commands is lifted directly to the level of commands by \textsc{c-prim} and \textsc{c-prim-a}. The silent transitions as defined by the $\taustep$ relation are also lifted directly to the level of commands by \textsc{c-tau}. Sequential compositions $\cseq{c_1}{c_2}$ evaluate from left to right: by \textsc{c-seq} and \textsc{c-seq-a}, if the left command $c_1$ evaluates or aborts, then so does the sequential composition; and by \textsc{c-seq-s}, once the left command $c_1$ has evaluated fully to $\cskip$ it is dropped so that evaluation of the right side $c_2$ may proceed. By \textsc{c-ch-1} and \textsc{c-ch-2}, the nondeterministic choice command $\cchoice{c_1}{c_2}$ may evaluate to either $c_1$ or $c_2$, afterward continuing evaluation of the chosen command. Finally, by \textsc{c-loop}, a looping command $\cloop{c}$ may always evaluate without modifying the state by expanding to a nondeterministic choice between doing nothing (i.e., exiting the loop) and a sequential composition that consists of evaluating the loop body $c$ once and then continuing with the loop.
The reflexive-transitive closure of the command evaluation relation, written $c,\sigma \rtstep \mathcal{C}$, is defined as usual. The \emph{range} of a configuration $\mathcal{C}$, written $\range{\mathcal{C}}$, is defined as the set of states $\sigma$ for which $\mathcal{C} \rtstep \cskip,\sigma$.
\begin{figure}[ht]
\centering
\vspace{1em}
\infrule[c-prim]{\text{ if $p : \sigma \step \sigma'$ and $\sigma' \in \setstates$}}{p,\sigma \step \cskip,\sigma'}
\vspace{1em}
\infrule[c-prim-a]{\text{ if $p : \sigma \step \abort$}}{p,\sigma \step \abort}
\vspace{1em}
\infrule[c-tau]{\text{if $\sigma \taustep \sigma'$}}{c,\sigma \step c,\sigma'}
\vspace{1em}
\infrule[c-seq]{c,\sigma \step c_0,\sigma'}{(\cseq{c}{c'}),\sigma \step (\cseq{c_0}{c'}),\sigma'}
\vspace{1em}
\infrule[c-seq-a]{c,\sigma \step \abort}{(\cseq{c}{c'}),\sigma \step \abort}
\vspace{1em}
\infrule[c-seq-s]{}{(\cseq{\cskip}{c'}),\sigma \step c',\sigma}
\vspace{1em}
\infrule[c-ch-1]{}{(\cchoice{c}{c'}),\sigma \step c,\sigma}
\vspace{1em}
\infrule[c-ch-2]{}{(\cchoice{c}{c'}),\sigma \step c',\sigma}
\vspace{1em}
\infrule[c-loop]{}{\cloop{c},\sigma \step (\cchoice{\cskip}{(\cseq{c}{\cloop{c}})}),\sigma}
\caption{\label{fig:sequential-command-semantics} Semantics of sequential commands}
\end{figure}
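The rules of Figure~\ref{fig:sequential-command-semantics} can be read as a function from a configuration to the list of its possible successors. The following OCaml sketch is a deliberately simplified illustration: the stack is omitted, primitive commands are represented semantically as partial state transformers rather than as syntax, and \texttt{Abort} stands for the erroneous pseudo-state. Note that the silent $\tau$ transition, which commits the least recent buffered write, is available at every step, for any command.
\begin{verbatim}
(* Simplified states and configurations: a heap paired with a buffer. *)
type state = { heap : (int * int) list; buf : (int * int) list }

type comm =
  | Prim of (state -> state option) (* None models abort *)
  | Skip
  | Seq of comm * comm
  | Choice of comm * comm
  | Loop of comm

type config = Conf of comm * state | Abort

(* c-tau: commit the least recent buffered write to the heap. *)
let tau (st : state) : state option =
  match st.buf with
  | [] -> None
  | (l, v) :: rest ->
    Some { heap = (l, v) :: List.remove_assoc l st.heap; buf = rest }

(* All one-step successors of the configuration (c, st). *)
let rec step (c : comm) (st : state) : config list =
  let silent =
    match tau st with None -> [] | Some st' -> [ Conf (c, st') ]
  in
  silent
  @
  match c with
  | Skip -> []
  | Prim f ->                                   (* c-prim / c-prim-a *)
    (match f st with
     | None -> [ Abort ]
     | Some st' -> [ Conf (Skip, st') ])
  | Seq (Skip, c2) -> [ Conf (c2, st) ]         (* c-seq-s *)
  | Seq (c1, c2) ->                             (* c-seq / c-seq-a *)
    List.map
      (function
        | Abort -> Abort
        | Conf (c1', st') -> Conf (Seq (c1', c2), st'))
      (step c1 st)
  | Choice (c1, c2) ->                          (* c-ch-1 / c-ch-2 *)
    [ Conf (c1, st); Conf (c2, st) ]
  | Loop c0 ->                                  (* c-loop *)
    [ Conf (Choice (Skip, Seq (c0, Loop c0)), st) ]
\end{verbatim}
Exhaustively enumerating successors in this way makes the nondeterminism of choice, loop unfolding and buffer flushing explicit; iterating \texttt{step} to a fixed point would compute $\rtstep$ and hence the range of a configuration.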
\paragraph{Sequential Command Abbreviations} A few standard command abbreviations are shown in Figure~\ref{fig:sequential-command-abbreviations}. Some would benefit greatly from local variable declarations, which have not yet been added to the language.
\begin{figure}[ht]
\centering
\resizebox{\columnwidth}{!}{
\begin{minipage}{\columnwidth}
\begin{align*}
\cifthenelse{b}{c}{c'} \eqdef & \cchoice{(\cseq{\cassume{b}}{c})}{(\cseq{\cassume{!b}}{c'})} \\
\cifthen{b}{c} \eqdef & \cchoice{(\cseq{\cassume{b}}{c})}{(\cseq{\cassume{!b}}{\cskip})} \\
\cwhile{b}{c} \eqdef & \cseq{\cloop{(\cseq{\cassume{b}}{c})}}{\cassume{!b}}
\end{align*}
\end{minipage}}
\caption{\label{fig:sequential-command-abbreviations} Sequential command abbreviations}
\end{figure}
\paragraph{Static Semantics} The static semantics of expressions, primitive commands and commands, embodied here by functions $\fv{-}$ and $\mod{-}$ associating these objects to their sets of free and modified variables, respectively, are completely standard. (Especially so because there are no name-hiding operations in the language, like the aforementioned missing local variable declaration command.) For example, $\fv{\cload{x}{y+1}} = \set{x,y}$ and $\mod{\cload{x}{y+1}} = \set{x}$.
The following lemma notes some basic facts about the relationship between the step relation $\step$ and the static semantics of commands.
\begin{lemma}
\label{lem:step-stcong}
If $c,s,\mu \step c',s',\mu'$ then:
\begin{itemize}
\item $\fv{c'} \subseteq \fv{c}$
\item $\mod{c'} \subseteq \mod{c}$
\item $s \stcong{\setidentifiers \setminus \mod{c}} s'$.
\end{itemize}
\end{lemma}
\begin{proof}
By a straightforward induction on the derivation of $c,s,\mu \step c',s',\mu'$.
\end{proof}
\section{Locality and Separation}
\label{sec:locality}
This section gives an informal introduction to the notion of local reasoning, including a rough sketch of the idea of a local command. The latter will be codified formally later in the semantics of program specifications in Section~\ref{sec:sequential-specifications}.
The idea of local reasoning is as follows. Perhaps we wish to show that the result of evaluating a command $c$ in a particular state $\sigma$ is always included in a set of states $S$---i.e., that the configuration $c,\sigma$ does not abort, and that if $c,\sigma \rtstep \cskip,\sigma'$ then $\sigma' \in S$. Suppose the elements of the set $S$ are all composed of elements constructed from some other sets $S_0$ and $S_1$---i.e., $\sigma \in S$ iff, for some $\sigma_0 \in S_0$ and $\sigma_1 \in S_1$, $\sigma = \sigma_0 \bullet \sigma_1$ (where $\bullet$ indicates some unspecified state-constructing function)---and the initial state $\sigma = \sigma_0 \bullet \sigma_1$ with $\sigma_0 \in S_0$. In that case, we may wish to reduce the original problem to the potentially simpler task of showing that the result of evaluating $c$ in state $\sigma_1$ is always included in $S_1$---i.e., showing that the configuration $c,\sigma_1$ does not abort and that if $c,\sigma_1 \rtstep \cskip,\sigma'_1$ then $\sigma'_1 \in S_1$.
Under what circumstances is this reduction sound? First, it should be the case that if the command does not abort in the local state $\sigma_1$ then neither does it in the global state $\sigma = \sigma_0 \bullet \sigma_1$. Or, contrapositively, if $c,(\sigma_0 \bullet \sigma_1) \step \abort$ then $c,\sigma_1 \step \abort$. This is called the \emph{safety monotonicity} property. Second, it should be the case that if $c,(\sigma_0 \bullet \sigma_1) \rtstep \cskip,\sigma'$, then there exists $\sigma'_1$ such that $\sigma' = \sigma_0 \bullet \sigma'_1$ and $c,\sigma_1 \rtstep \cskip,\sigma'_1$. For then, by assumption $\sigma_0 \in S_0$, by the reduction $\sigma'_1 \in S_1$, and by the structure of $S$, $\sigma_0 \bullet \sigma'_1 = \sigma' \in S$. This is called the \emph{frame} property. A command that satisfies both the safety monotonicity and frame properties is called a \emph{local command} \cite{DBLP:conf/fossacs/YangO02}.
For example, $c$ is perhaps a command that reads and writes a particular set of memory addresses, and the starting state $\sigma$ perhaps describes the value of some addresses that are superfluous to the execution of $c$. We may decompose $\sigma$ by partitioning the memory addresses it describes into states $\sigma_0$ and $\sigma_1$ such that $\sigma = \sigma_0 \bullet \sigma_1$, with $\sigma_1$ describing just the memory locations accessed by $c$ and the remainder described by $\sigma_0$. Because $(\sigma_0 \bullet \sigma_1)$ contains all the memory addresses of $\sigma_1$ and more, if $c$ can execute successfully (i.e., without aborting) from state $\sigma_1$ then it can also execute successfully from $\sigma$. And because the addresses of $\sigma_0$ are irrelevant to the evaluation of $c,\sigma$, those addresses will remain unchanged in the resultant state $\sigma'$, and hence $c,\sigma_1$ also evaluates to $\sigma'_1$ with $\sigma' = \sigma_0 \bullet \sigma'_1$.
This is the essence of local reasoning: leveraging some local property of a command in a local state to a related global property of the command in a global state. Of course, local reasoning is not possible for all properties---it relies crucially on the notion of decomposition $\bullet$---but when it is possible, it offers an especially direct path to showing the desired property. If the commands of a programming language are local, then the principle of local reasoning can be codified into a program logic by way of a \emph{frame rule}, which allows inference from local to global program specifications. This will be considered in more detail in Section~\ref{sec:specifications}.
In the example above, the state was separated by memory address. But in the programming model described in Section~\ref{sec:sequential-programs}, memory systems consist of both committed and buffered writes. How should we decompose (or separate) a memory system with buffered writes? Or, alternatively, how and when can we compose two partial memory systems? The goal of the next few sections is to carefully define a handful of different, useful notions of separation for which the sequential commands of the programming language are local. Each notion of separation will eventually yield a different frame rule, and hence a different principle of local reasoning about program specifications.
\subsection{Spatial Separation}
\label{sec:sequential-spatial-separation}
In this section we carefully define a notion of separation for uniprocessor states analogous to the one described in the previous section, in which states are decomposed according to memory address. This is accomplished by defining a notion of separation for memory systems, and then lifting that function to states that have identical stacks: i.e., given a definition of $\mu \bullet \mu'$, the lifted partial function on states $(s,\mu) \bullet (s',\mu')$ is defined as follows \[ (s,\mu) \bullet (s',\mu') \eqdef (s,\mu \bullet \mu') \text{~if $s = s'$ and $\mu \bullet \mu'$ is defined.}\] In the sequel, we shall ignore the distinction between the function on memory systems and the lifted function on states.
States are typically separated by the resources they describe. In the traditional, strong-memory heap model of separation logic, the resource is a flat shared memory and the heaps are separated according to address: \[ h_0 \ssep h_1 \eqdef \begin{cases}
h_0 \uplus h_1 & \text{if $\dom{h_0} \cap \dom{h_1} = \nil$} \\
\bot & \text{otherwise.}
\end{cases}\] Note that the partial function is defined only when the heaps have disjoint domains. It is also, very obviously, commutative.
We wish to define an analogous notion of separation for memory systems, which consist of heap--write buffer pairs. As in the case of separation logic, we add heaps with disjoint domains. But how should we combine the write buffers? To ensure commutativity of the operation, a natural choice is to \emph{interleave} the writes of the buffers: \[ (h_0,b_0) \ssep (h_1,b_1) \eqdef \bigcup_{b \in b_0 \merge b_1}
\setof{(h_0 \uplus h_1, b)}{\dom{h_0,b_0} \cap \dom{h_1,b_1} = \nil}
\] Above, we write $\dom{h,b}$ as shorthand for $\dom{h} \cup \dom{b}$. The set $\mu_0 \ssep \mu_1$ is called the \emph{spatial separation} of $\mu_0$ and $\mu_1$ because it requires disjointness of the constituent domains, and does not constrain the order of the buffered writes. (In a later section, we will weaken the disjointness requirement to yield a weaker notion of separation.)
Interleaving the write buffers results in a notion of separation in which the relative ordering between the writes in the constituent buffers, which necessarily have distinct memory locations, is irrelevant. For example, for $\mu_0 = (\nil,\llit{(\ell,v)})$ and $\mu_1 = (\nil,\llit{(m,u)})$, with $\ell \neq m$, we have both $(\nil,\llit{(\ell,v),(m,u)}) \in (\mu_0 \ssep \mu_1)$ and $(\nil,\llit{(m,u),(\ell,v)}) \in (\mu_0 \ssep \mu_1)$.
Unlike for the heap separation function, the heap-buffer separation function maps into the power-domain\footnote{Such algebras are called \emph{non-deterministic monoids} \cite{DBLP:conf/fsttcs/GalmicheL06}.} of memory systems: for compatible pairs of memory systems, the separation yields a memory system for each possible interleaving of the individual write buffers. The resulting set is non-empty if and only if the domains of the constituent memory systems are disjoint.\footnote{This is more convenient than defining a partial function into the power-domain. In that case, $\bot$ and $\nil$ would apparently both indicate incompatibility.} This necessitates a slight change to the definition of a local command, as explained in Section~\ref{sec:locality}. Given a notion of separation that yields a set of memory systems, the safety monotonicity and frame properties must hold for each possible result: i.e., for every $\mu \in \mu_0 \ssep \mu_1$, if $c,\mu_1$ is safe then so is $c,\mu$; and if $c,\mu \rtstep \cskip,\mu'$ then there exists $\mu'_1$ such that $c,\mu_1 \rtstep \cskip,\mu'_1$ with $\mu' \in \mu_0 \ssep \mu'_1$.
We can informally observe locality as follows. For safety monotonicity, note that the domains of the memory systems in $(\mu_0 \ssep \mu_1)$ are supersets of the domain of $\mu_1$, and so if, e.g., a load command does not abort with a memory error in $\mu_1$ then neither will it in $(\mu_0 \ssep \mu_1)$. For the frame property, a load in $\mu_0 \ssep \mu_1$ does not change the memory system, and since we have assumed that it does not abort in $\mu_1$ alone, it will have the same result in $\mu_1$ as in $(\mu_0 \ssep \mu_1)$. For example, with $\mu_0 = (\nil,\llit{(\ell,v)})$, $\mu_1 = (\nil,\llit{(m,u)})$ and the load command $c = \cload{x}{m}$, it is clear that, for any $\mu \in \mu_0 \ssep \mu_1$, if $c,(s,\mu) \rtstep \cskip,(s',\mu')$ then also $c,(s,\mu_1) \rtstep \cskip,(s',\mu_1)$.
It is useful to lift the definition of spatial separation from a partial function on states up to a total function on sets of states as follows: \[ S_0 \ssep S_1 \eqdef \bigcup_{\sigma_0 \in S_0} \bigcup_{\sigma_1 \in S_1} (\sigma_0 \ssep \sigma_1).\] By overloading notation in this way, this allows us to write, e.g., $\sigma_0 \ssep (\sigma_1 \ssep \sigma_2)$ as shorthand for $\bigcup \setof{\sigma_0 \ssep \sigma_{12}}{\sigma_{12} \in \sigma_1 \ssep \sigma_2}$. It also allows us to assert associativity of the lifted function: \[ \sigma_0 \ssep (\sigma_1 \ssep \sigma_2) = (\sigma_0 \ssep \sigma_1) \ssep \sigma_2 \] Spatial separation and its lifting are also commutative; they have as units $(\nil,\lnil)$ and $\set{(\nil,\lnil)}$, respectively.
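The interleaving operation $b_0 \merge b_1$ at the heart of spatial separation is easy to state as code. In the following OCaml sketch (again illustrative: association-list heaps, and a list of results standing in for the set $\mu_0 \ssep \mu_1$), \texttt{ssep} enumerates all interleavings of the two buffers and applies the domain-disjointness check.
\begin{verbatim}
(* Memory systems: a heap paired with a write buffer. *)
type mem = { heap : (int * int) list; buf : (int * int) list }

(* All interleavings of two lists, preserving each list's own order. *)
let rec merge (xs : 'a list) (ys : 'a list) : 'a list list =
  match (xs, ys) with
  | [], _ -> [ ys ]
  | _, [] -> [ xs ]
  | x :: xs', y :: ys' ->
    List.map (fun b -> x :: b) (merge xs' ys)
    @ List.map (fun b -> y :: b) (merge xs ys')

(* dom(h, b): the locations mentioned by a memory system. *)
let dom (m : mem) : int list =
  List.map fst m.heap @ List.map fst m.buf

let disjoint d0 d1 = not (List.exists (fun l -> List.mem l d1) d0)

(* Spatial separation: one memory system per interleaving, or the
   empty list when the domains overlap. *)
let ssep (m0 : mem) (m1 : mem) : mem list =
  if disjoint (dom m0) (dom m1) then
    List.map
      (fun b -> { heap = m0.heap @ m1.heap; buf = b })
      (merge m0.buf m1.buf)
  else []
\end{verbatim}
On the example above, \texttt{ssep} returns exactly two memory systems, one for each relative ordering of $(\ell,v)$ and $(m,u)$.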
\subsection{Temporal Separation}
\label{sec:sequential-temporal-separation}
Spatial separation results in a set of states that encompasses all possible interleavings of the composed write buffers. Consequently, it is unsuitable for composing states with a particular interleaving in mind. Temporal separation does just this. Given memory systems $\mu_0$ and $\mu_1$, the \emph{strong temporal separation} $\mu_0 \ssseq \mu_1$ is the element of the set $\mu_0 \ssep \mu_1$ in which the writes of $\mu_0$ all precede the writes of $\mu_1$. For example, for $\mu_0 = (\nil,\llit{(\ell,v)})$ and $\mu_1 = (\nil,\llit{(m,u)})$, the strong temporal separation $\mu_0 \ssseq \mu_1$ is given by: \[\mu_0 \ssseq \mu_1 = (\nil,\llit{(\ell,v),(m,u)}).\] Instead of interleaving the constituent write buffers as in spatial separation, they are \emph{concatenated} by temporal separation. For another example, let $\mu'_0 = (\ell \mapsto v, \lnil)$. Then $\mu'_0 \ssseq \mu_1 = (\ell \mapsto v, \llit{(m,u)})$. Again the writes of $\mu'_0$ (which are committed) precede the writes of $\mu_1$ in the composed state $\mu'_0 \ssseq \mu_1$. As a final example, consider $\mu'_1 = (m \mapsto u, \lnil)$ and the temporal separation $\mu_0 \ssseq \mu'_1$. The presumed result of this composition is $(m \mapsto u, \llit{(\ell,v)})$. But this violates the property of having the writes of the left-hand side precede the writes of the right-hand side because, in the composition, the committed write $m \mapsto u$ implicitly precedes the buffered write $(\ell,v)$. Consequently, in the definition of $\ssseq$ we explicitly rule out this case by requiring that either the left-side buffer or the right-side heap be empty.
The complete definition of $\mu_0 \ssseq \mu_1$ is as follows: \[ (h,b) \ssseq (h',b') \eqdef \begin{cases}
(h \uplus h', b \lapp b') & \text{if $\dom{h,b} \cap \dom{h',b'} = \nil$} \\ & \text{and $h' = \nil \vee b = \lnil$} \\
\bot & \text{otherwise.}
\end{cases}\] Clearly, whenever $\mu_0 \ssseq \mu_1$ is defined, the set $\mu_0 \ssep \mu_1$ is non-empty and, because $b \lapp b' \in (b \merge b')$, $\mu_0 \ssseq \mu_1 \in (\mu_0 \ssep \mu_1)$. The argument for locality w.r.t.\ temporal separation is consequently similar to that for spatial separation.
The requirement that the constituent domains of the temporal separation $\mu_0 \ssseq \mu_1$ be disjoint is rather strong, however, and it is possible to eliminate this condition entirely. We refer, in the sequel, to $\mu_0 \ssseq \mu_1$ as the \emph{strong temporal separation} of $\mu_0$ and $\mu_1$, and now define a more relaxed operation $\mu_0 \sseq \mu_1$, which we call a \emph{weak temporal separation}. As before, we illustrate this separation with a few examples before showing the complete definition.
First, consider $\mu_0 = (\nil,\llit{(\ell,v)})$ and $\mu_1 = (\nil,\llit{(\ell,u)})$. Their weak temporal separation $\mu_0 \sseq \mu_1$ simply concatenates the constituent write buffers, giving $(\nil,\llit{(\ell,v),(\ell,u)})$. Note that the semantics of load ensures that the result of loading $\ell$ in the context of $\mu_1$ is the same as in the context of $\mu_0 \sseq \mu_1$, because only the value of the most recent write to a particular location is returned.
Next, consider also $\mu'_0 = (\ell \mapsto v,\lnil)$ and $\mu'_1 = (\ell \mapsto u, \lnil)$. The weak temporal separation $\mu'_0 \sseq \mu_1$ is computed just as in the strong case: $(\ell \mapsto v, \llit{(\ell,u)})$ (though here the strong variant itself is undefined, the constituent domains sharing $\ell$). And the weak temporal separation $\mu_0 \sseq \mu'_1$ is undefined, just as for the strong variant, because the ostensibly more recent committed write from $\mu'_1$ would implicitly precede the buffered write of $\mu_0$.
Finally, consider the weak temporal separation $\mu'_0 \sseq \mu'_1$. For the strong variant, as well as for spatial separation, this is of course undefined because the constituent domains are not disjoint. That restriction is necessary there because the result of adding the maps that represent heaps is undefined in this case: what would be the result of applying the hypothetical map $(\ell \mapsto v) \uplus (\ell \mapsto u)$ to $\ell$? Neither $u$ nor $v$ seems like a suitable answer in general, but in the case of \emph{temporal} separation we can answer confidently: the result should be $u$, because the committed $u$ write is more recent than the committed $v$ write. Consequently we use the map overriding operation to combine heaps in the definition of weak temporal separation, so that $\mu'_0 \sseq \mu'_1 = (\ell \mapsto u, \lnil)$.
\[ (h,b) \sseq (h',b') \eqdef \begin{cases}
(h \override h', b \lapp b') & \text{if $h' = \nil \disj b = \lnil$} \\
\bot & \text{otherwise.}
\end{cases}\]
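The sketch of the weak variant simply drops the disjointness check and combines heaps by overriding. \texttt{Data.Map.union} is left-biased, so listing the right-hand heap first implements $h \override h'$.
\begin{verbatim}
-- Weak temporal separation: no disjointness requirement; the right
-- heap overrides the left on any overlapping locations.
sseqW :: Mem -> Mem -> Maybe Mem
sseqW (h, b) (h', b')
  | Map.null h' || null b = Just (Map.union h' h, b ++ b')
  | otherwise             = Nothing
\end{verbatim}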
One way to understand the choice of the overriding operation on heaps is w.r.t.\ an alternative, more concrete state model in which the heap is represented by a list of writes $l$ instead of a partial function. The list is intended to capture the complete history of committed writes in the same way that the buffer captures the history of uncommitted writes. The model of state given in Section~\ref{sec:sequential-programs} uses a partial function $h$ instead of a list of committed writes because only the most recent committed write to a particular location is relevant to the operational semantics. We can think of this model of state as an abstraction of the more concrete model in which committed writes are represented by lists. The abstraction function $\alpha$ that maps a concrete memory system $(l,b)$ into an abstract memory system $\alpha(l,b)$ is defined as follows: \[ \alpha(l,b) \eqdef (\funof{l},b),\] where $\funof{l}$ is the lookup function for list $l$, as defined in Section~\ref{sec:lists}.
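In the sketch, the lookup function of a write list is the left fold that inserts writes in order, so that a later write to a location overrides an earlier one; the abstraction function then simply replaces the list by this map.
\begin{verbatim}
-- Lookup function of a write list: later writes win.
funOf :: [(Loc, Val)] -> Heap
funOf = foldl (\m (l, v) -> Map.insert l v m) Map.empty

-- Abstraction from the concrete model to the abstract one.
alpha :: ([(Loc, Val)], Buffer) -> Mem
alpha (l, b) = (funOf l, b)
\end{verbatim}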
Let us again consider the definition of weak temporal separation. In the context of concrete states, the definition is completely natural: \[ (l,b) \sseq_\gamma (l',b') \eqdef \begin{cases}
(l \lapp l', b \lapp b') & \text{if $l' = \lnil \disj b = \lnil$} \\
\bot & \text{otherwise.}
\end{cases} \] This definition and the abstraction function given above provide a correctness criterion for a candidate definition of weak temporal separation on abstract states, namely that: \[ \alpha((l,b) \sseq_\gamma (l',b')) = \alpha(l,b) \sseq \alpha(l',b').\] It is easy to see that the definition given above for weak temporal separation for abstract states satisfies this criterion:
\Calc{
$\alpha((l,b) \sseq_\gamma (l',b'))$
\Conn{definition of $\sseq_\gamma$}
$\alpha(l \lapp l', b \lapp b')$
\Conn{definition of $\alpha(-)$}
$(\funof{l \lapp l'}, b \lapp b')$
\Conn{$l \lapp l' \in l \override l'$ and $m \in l \override l' \iffdef \funof{m} = \funof{l} \override \funof{l'}$}
$(\funof{l} \override \funof{l'}, b \lapp b')$
\Conn{definition of $\sseq$}
$(\funof{l}, b) \sseq (\funof{l'}, b')$
\Conn{definition of $\alpha(-)$}
$\alpha(l, b) \sseq \alpha(l', b')$.
}
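This calculation can also be checked mechanically. The sketch below transcribes the concrete separator $\sseq_\gamma$ and states the correctness criterion as a Boolean property, suitable for a QuickCheck-style test harness (which we do not include); both names are ours.
\begin{verbatim}
-- Concrete weak temporal separation on lists of committed writes.
sseqC :: ([(Loc, Val)], Buffer) -> ([(Loc, Val)], Buffer)
      -> Maybe ([(Loc, Val)], Buffer)
sseqC (l, b) (l', b')
  | null l' || null b = Just (l ++ l', b ++ b')
  | otherwise         = Nothing

-- Correctness criterion: alpha commutes with weak temporal
-- separation, the two sides undefined in exactly the same cases.
propAlpha :: ([(Loc, Val)], Buffer) -> ([(Loc, Val)], Buffer) -> Bool
propAlpha c0 c1 =
  fmap alpha (sseqC c0 c1) == sseqW (alpha c0) (alpha c1)
\end{verbatim}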
It is easy to see that both temporal separators are associative and, as for spatial separation, have $(\nil,\lnil)$ as a unit. Furthermore, the weak variant is defined whenever the strong variant is defined; and when both are defined, they are equal. We may also lift these functions up to the power domain, as we did with spatial separation: \begin{align*}
S_0 \ssseq S_1 &\eqdef \bigcup_{\sigma_0 \in S_0} \bigcup_{\sigma_1 \in S_1} \setof{\sigma_0 \ssseq \sigma_1}{\defined{\sigma_0 \ssseq \sigma_1}} \\
S_0 \sseq S_1 &\eqdef \bigcup_{\sigma_0 \in S_0} \bigcup_{\sigma_1 \in S_1} \setof{\sigma_0 \sseq \sigma_1}{\defined{\sigma_0 \sseq \sigma_1}}
\end{align*}
Finally, we note that the strong temporal separation can be defined in terms of spatial separation and weak temporal separation: \[ \mu_0 \ssseq \mu_1 = \mu \,\,\iff\,\, \mu_0 \sseq \mu_1 = \mu \conj \mu \in \mu_0 \ssep \mu_1.\] We will leverage this fact in the sequel to simplify the rest of the development.
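This characterization is likewise directly checkable in the sketch, by comparing the strong separator against the weak separator filtered by membership in the spatial separation:
\begin{verbatim}
-- Strong = weak + membership in the spatial separation.
propStrong :: Mem -> Mem -> Bool
propStrong m0 m1 =
  sseqS m0 m1 == case sseqW m0 m1 of
    Just m | m `elem` ssep m0 m1 -> Just m
    _                            -> Nothing
\end{verbatim}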
\subsection{Spatiotemporal Separation}
\label{sec:sequential-spatiotemporal-separation}
Both spatial and temporal separation are restrictions of a more general, unifying notion of separation. We write $\mu_0 \shash \mu_1$ for the \emph{spatiotemporal separation} of memory systems $\mu_0$ and $\mu_1$, defined as follows: \[ (h_0,b_0) \shash (h_1,b_1) \eqdef \bigcup_{b \in b_0 \override b_1} \setof{(h_0 \override h_1, b)}{\dom{b_0} \cap \dom{h_1} = \nil}.\]
For example, consider $\mu_0 = (\nil,\llit{(\ell,1),(m,2)})$ and $\mu_1 = (\nil,\llit{(\ell,3),(n,4)})$. Note that $\mu_0$ and $\mu_1$ are not strongly disjoint, and hence $\mu_0 \ssep \mu_1$ and $\mu_0 \ssseq \mu_1$ are undefined. The weak temporal separation $\mu_0 \sseq \mu_1$ is, on the other hand, defined and equal to $(\nil,\llit{(\ell,1),(m,2),(\ell,3),(n,4)})$. The spatiotemporal separation $\mu_0 \shash \mu_1$ additionally includes the following states: \begin{itemize} \item $(\nil,\llit{(\ell,1),(\ell,3),(m,2),(n,4)})$
\item $(\nil,\llit{(\ell,1),(\ell,3),(n,4),(m,2)})$
\end{itemize} Note in particular that the later write to $\ell$ (with value $3$) never precedes the earlier write to $\ell$ (with value $1$); otherwise, all interleavings are included. This is crucial to a locality argument because it ensures that a load in state $\mu_1$, if safe, will have the same result as a load in any element of $\mu_0 \shash \mu_1$.
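A sketch of spatiotemporal separation makes this example executable. Here $b_0 \override b_1$ is rendered by filtering the interleavings of the tagged buffers so that no right-hand write to a location precedes a left-hand write to the same location; this filtering-based rendering is our own choice, made for clarity rather than efficiency.
\begin{verbatim}
-- List override: interleavings in which, per location, all writes
-- from the left buffer precede those from the right buffer.
listOverride :: Buffer -> Buffer -> [Buffer]
listOverride b0 b1 =
    [ map snd m | m <- merge (tag 1 b0) (tag 2 b1), wellOrdered m ]
  where
    tag :: Int -> Buffer -> [(Int, (Loc, Val))]
    tag i = map (\w -> (i, w))
    wellOrdered m =
      and [ not (iA == 2 && iB == 1 && lA == lB)
          | (pA, (iA, (lA, _))) <- zip [0 :: Int ..] m
          , (pB, (iB, (lB, _))) <- zip [0 :: Int ..] m
          , pA < pB ]

-- Spatiotemporal separation: the left buffer must avoid the right
-- heap's domain; heaps combine by override, buffers by list override.
shash :: Mem -> Mem -> [Mem]
shash (h0, b0) (h1, b1)
  | Set.null (Set.fromList (map fst b0)
                `Set.intersection` Map.keysSet h1) =
      [ (Map.union h1 h0, b) | b <- listOverride b0 b1 ]
  | otherwise = []
\end{verbatim}
Applied to the buffers of the example, \texttt{shash} returns exactly the three states enumerated above.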
It is easy to see that spatiotemporal separation generalizes both spatial and temporal separation. In the spatial case, the strongly disjoint definedness condition obviously implies the weakly disjoint definedness condition for spatiotemporal separation. And when the memory systems are strongly disjoint, $h_0 \override h_1 = h_0 \uplus h_1$ by Lemma~\ref{lem:override} and $b_0 \override b_1 = b_0 \merge b_1$ by Lemma~\ref{lem:list-override}. For the strong temporal case, the definedness conditions are again obviously stronger, and $b_0 \lapp b_1 \in b_0 \override b_1$. In the weak case, $b_0 = \lnil \disj h_1 = \nil$ implies $\dom{b_0} \cap \dom{h_1} = \nil$, and again $b_0 \lapp b_1 \in b_0 \override b_1$.
As with spatial separation, we lift spatiotemporal separation up to a function on the power domain of memory systems (and states), and abuse notation to refer to whichever function is appropriate in context: \[ S_0 \shash S_1 \eqdef \bigcup_{\sigma_0 \in S_0} \bigcup_{\sigma_1 \in S_1} (\sigma_0 \shash \sigma_1).\] Like the other separators, spatiotemporal separation is associative and has $(\nil,\lnil)$ as a unit, but, like the temporal separators and unlike the spatial one, it is not commutative.
\subsection{Flushing Closure}
\label{sec:uniprocessor-predicates}
In Section~\ref{sec:locality} we considered the task of showing, for some configuration $C$ and set of states $S$, that $C$ is safe and, if it evaluates to a configuration $\cskip,\sigma$ then $\sigma \in S$. The second part of this task is equivalently restated as requiring that $\range{C} \subseteq S$. The set $\range{C}$ has a special structure worth noting: namely, it is \emph{down-closed w.r.t.\ the flushing order}. That is, if $\sigma \in \range{C}$ and $\sigma \taustep \sigma'$ then $\sigma' \in \range{C}$ as well. This is a consequence of the fact that the nondeterministic flushing of buffered writes is incorporated in the evaluation semantics of programs; from \textsc{c-tau}, if $\sigma \taustep \sigma'$ then $c,\sigma \step c,\sigma'$. Hence, if $C \rtstep \cskip,\sigma$ (by assumption that $\sigma \in \range{C}$) and $\sigma \taustep \sigma'$, then \[ C \rtstep \cskip,\sigma \step \cskip,\sigma'.\] By transitivity, $C \rtstep \cskip,\sigma'$ and so by definition of the range of a configuration, $\sigma' \in \range{C}$.
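Down-closure can be phrased executably as well. The sketch below commits the oldest buffered write, mirroring the flushing relation of Section~\ref{sec:sequential-programs} under the assumption that writes commit in buffer order, and checks closure of a finite set of memory systems (stacks are elided, since flushing leaves them unchanged).
\begin{verbatim}
-- One flushing step: commit the oldest buffered write to the heap.
flushStep :: Mem -> [Mem]
flushStep (_, [])         = []
flushStep (h, (l, v) : b) = [(Map.insert l v h, b)]

-- A finite set of memory systems is down-closed w.r.t. flushing
-- if no flushing step leads outside the set.
isClosed :: [Mem] -> Bool
isClosed ms = and [ m' `elem` ms | m <- ms, m' <- flushStep m ]
\end{verbatim}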
% We call sets of states that are down-closed w.r.t.\ the flushing order \emph{predicates}.
For example, the set $S_0 \eqdef \setof{(s,\nil,\llit{(\ell,v)})}{s \in \setstacks}$, which consists of states that have a single buffered write, is not closed because that write, according to the definition of the flushing relation $\taustep$ in Section~\ref{sec:sequential-programs}, may nondeterministically commit to memory as follows: \[(\nil,\llit{(\ell,v)}) \taustep (\ell \mapsto v,\lnil),\] but there is no stack $s$ such that $(s, \ell \mapsto v, \lnil) \in S_0$. On the other hand, the set $S_1 \eqdef \setof{(s,\ell \mapsto v,\lnil)}{s \in \setstacks}$ is closed because each included state is completely flushed, and so no included state may take a flushing step beyond the bounds of $S_1$. The set $S_0 \cup S_1$ is also closed because the elements of $S_0$ may step only to elements of $S_1$. Furthermore, it is easy to see that both the empty set and the set of all states are closed, and that closure is preserved by union and intersection.
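Instantiated with a hypothetical location and value standing in for $\ell$ and $v$, the memory-system parts of $S_0$ and $S_1$ behave exactly as described:
\begin{verbatim}
s0, s1 :: [Mem]
s0 = [ (Map.empty,                [("l", 42)]) ]  -- one buffered write
s1 = [ (Map.fromList [("l", 42)], [])          ]  -- fully flushed

-- isClosed s0          == False  (the write may still commit)
-- isClosed s1          == True   (no flushing step applies)
-- isClosed (s0 ++ s1)  == True   (s0 steps into s1)
\end{verbatim}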
Because the sets $\range{C}$ are closed, we may focus our attention on showing the correctness of $C$ w.r.t.\ sets $S$ that are also closed. For if $S$ is not closed, then either it will not be the case that $\range{C} \subseteq S$ (if, e.g., $C \rtstep \cskip,\sigma \step \cskip,\sigma'$ with $\sigma \in S$ but $\sigma' \notin S$), or it will be possible to demonstrate a stronger property of the configuration; namely, that $\range{C} \subseteq S'$ for some closed set of states $S' \subset S$.
In Section~\ref{sec:locality} we also described a particular strategy for showing $\range{C} \subseteq S$ that we referred to as local reasoning. That strategy relied on the ability to consider the set $S$ as a composition of sets $S_0$ and $S_1$; i.e., for some notion of separation $\bullet$, it must be the case that $S = S_0 \bullet S_1$. Because we choose to restrict our attention to closed sets, it should be the case that the notion of separation preserves the property of being closed: if $S_0$ and $S_1$ are down-closed w.r.t.\ the flushing order, then $S$ ought to be as well. And, in fact, for the three notions of separation introduced in Sections~\ref{sec:sequential-spatial-separation},~\ref{sec:sequential-temporal-separation}~and~\ref{sec:sequential-spatiotemporal-separation}, this turns out to be the case.
\begin{proposition}
\label{lem:separation-preserves-closure}
If $S_0$ and $S_1$ are closed w.r.t.\ the flushing order, then so are:
\begin{enumerate}
\item $S_0 \ssep S_1$;
\item $S_0 \sseq S_1$; and