Episode 11: Interview with Don Chamberlin, designer of SQL database language

October 12, 2017

Pramod Shashidhara

 

Don Chamberlin holds a B.S. degree from Harvey Mudd College and a Ph.D. from Stanford University. For many years, he worked at Almaden Research Center, researching database languages and systems. He was a member of the System R research team that developed much of today’s relational database technology and, together with Ray Boyce, he designed the original SQL database language.

More recently, he was a member of the W3C Working Group on XML Query Languages and an editor of the XPath 2.0 and XQuery language specifications, which became W3C Recommendations in 2007. With Jonathan Robie and Dana Florescu, he designed the Quilt language, which became the basis for the design of XQuery.

He likes to teach and recently taught a Java programming class at the University of California, Santa Cruz. For the last several years he has been a judge and problem contributor for the ACM International Collegiate Programming Contest.

Interview…

Pramod: Hello, listeners, and welcome to the 11th episode of Mapping The Journey. Every computer engineer today knows the importance of databases and database management systems. These systems manage massive amounts of data efficiently and let users perform many tasks with ease. SQL (Structured Query Language) is used to communicate with a database, and it has made that communication easy.

In the early 1970s, Don Chamberlin and Raymond Boyce, two IBM researchers, developed SQL. I have always wondered how these brilliant men designed this beautiful and natural language. Well, today I have the opportunity to speak to Don Chamberlin, the original designer of the SQL database language. He is an ACM, IBM, and IEEE Fellow and has won many prestigious awards for his contributions. With great honor, I welcome Don Chamberlin to the show.

Pramod: Don, tell us about your early life. When were you introduced to computers?

Don Chamberlin: The story started in the 1960s, when I was a college student at Harvey Mudd College in southern California. There was precisely one computer on the Harvey Mudd campus, an IBM 1620 in the basement of the science building. I loved programming that machine and wrote a program to play tic-tac-toe against a human on a keyboard. Programs were written on punched cards. You could sign up for a session on the computer; with luck, you might get one session per day.

During my college years, I had a summer job at a service bureau, where I would hang tapes for input to an IBM 7090 mainframe computer, and manage the line-printer that printed the output.

Pramod: Yes, I can imagine what computers were like in the 1960s. Awesome. Tell us about the early days of relational database systems. In your view, why is the relational database management system an important idea?

Don Chamberlin: In the 1960s, computer time was very expensive, costing hundreds of dollars per minute. As a result, computers were used only by big organizations. Data was stored on tapes, and each tape was dedicated to a specific application. So a company might have a tape for inventory control, and another for purchasing, and another for accounts receivable. Each tape was in a specialized format that was designed for a specific program.

Charlie Bachman first proposed the idea of a database management system. Charlie developed a system called the Integrated Data Store, IDS, at General Electric in 1963. Charlie’s idea was that data should be treated as a corporate resource, stored permanently on a disk and shared among many applications. Disks permitted access to data records by their keys rather than reading them sequentially from a tape. A record on a disk could contain pointers to other related records, forming a network or “data space” of information. Charlie thought that the way to retrieve information from a database should be to write a program that navigates through this data space, following pointers until you have the desired information. For this insight, Charlie received the ACM Turing Award, the highest honor in computer science, in 1973. Charlie’s Turing Lecture was titled “The Programmer as Navigator.”

The next significant change in data management was brought about by Ted Codd at IBM Research. Ted proposed what he called the Relational model of data, in which data was organized into rows and columns, like a table. Relationships could be represented by matching values in the tables, rather than by physical pointers between records. Codd said that data should be accessed by a high-level, nonprocedural language that is independent of the physical details of how data is stored. His motto was: “Tell me what you want, not how to find it.”
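As an illustration of the contrast Don describes between Bachman's navigational approach and Codd's relational one, here is a minimal sketch in Python. The supplier and part records, the table names, and both helper functions are hypothetical and do not come from IDS or any IBM system; the relational half uses Python's built-in `sqlite3` module as a stand-in for a relational engine.

```python
import sqlite3

# Navigational style: each record holds direct references ("pointers") to
# related records, and the program walks them step by step.
supplier = {"name": "Acme", "parts": []}
bolt = {"name": "bolt", "supplier": supplier}
nut = {"name": "nut", "supplier": supplier}
supplier["parts"].extend([bolt, nut])


def parts_navigational(supplier_record):
    """Follow the stored pointers from a supplier record to its parts."""
    return sorted(part["name"] for part in supplier_record["parts"])


# Relational style: plain tables. The relationship is expressed by the
# matching values in supplier.id and part.supplier_id, not by a pointer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part (name TEXT, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO part VALUES ('bolt', 1), ('nut', 1), ('gear', 2);
""")


def parts_relational(supplier_name):
    """State what is wanted; the engine decides how to find it."""
    rows = conn.execute(
        """
        SELECT part.name
        FROM part, supplier
        WHERE part.supplier_id = supplier.id
          AND supplier.name = ?
        ORDER BY part.name
        """,
        (supplier_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(parts_navigational(supplier))
print(parts_relational("Acme"))
```

Both calls return the same part names, but only the first depends on how the records happen to be linked in storage.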

In Codd’s vision, an optimizing compiler would find an efficient execution plan for your high-level query. This idea was much like the earlier development of high-level programming languages like Fortran, which were compiled down into machine language. High-level languages make queries easier to write and maintain. But they raised some of the same controversies as high-level programming languages: could an optimizing compiler generate a program that would run as efficiently as a handwritten one? For his development of the relational data model, Ted Codd received the ACM Turing Award in 1981.

Pramod: That’s a great explanation. Thanks to Charlie Bachman for proposing the database management system and to Ted Codd for the relational model. Such revolutionary ideas. So now there was a need for a language for relational databases. Take us on that journey. When did you get the idea for SEQUEL?

Don Chamberlin: The summer when Codd published his paper on RDBMS, I had just finished my Ph.D. at Stanford, and I was on my way to join IBM Research in New York. The mission of IBM Research is to explore promising technologies that may influence future IBM products. IBM Research was investigating Codd’s relational idea, to find out whether an optimizing compiler for an RDBMS could be made efficient. To investigate this idea, they planned to build a prototype and experiment with it. The project was centered at Codd’s location in San Jose, CA. I was sent back from New York to CA to be part of this group. I was joined there by my friend Ray Boyce, who had just finished his Ph.D. at Purdue.

Ray and I thought that Codd’s relational ideas were great, but they had one flaw. Codd was trained as a mathematician, and he framed all his concepts in mathematical terms. He never talked about tables. He called them relations. He defined a “relation” as “a subset of the Cartesian product of a set of domains.” He proposed two languages, called the Relational Algebra and Relational Calculus. Both were full of mathematical symbols, like existential and universal quantifiers. These symbols were puzzling to non-mathematicians, and couldn’t be typed on a keyboard.

Ray and I thought that the essential beauty and simplicity of Codd’s ideas were obscured by all this math terminology.

We designed a language that we called SEQUEL: Structured English Query Language. We called the data structures tables, not relations. We used simple English keywords like SELECT FROM WHERE that are easy to understand and type. The basic ideas were Codd’s, but we framed them in terminology that anyone could understand.
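To make the SELECT FROM WHERE idea concrete, here is a small sketch of the kind of request that relational algebra would write as a projection applied to a selection. It is written in present-day SQL and run through Python's `sqlite3` module, not in the original 1974 SEQUEL syntax, and the `emp` table, its columns, and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('Alice', 'TOY', 12000),
        ('Bob', 'TOY', 8000),
        ('Carol', 'SHOE', 15000);
""")

# SELECT names the columns wanted, FROM names the table,
# and WHERE states the condition the rows must satisfy.
QUERY = """
    SELECT name
    FROM emp
    WHERE dept = 'TOY' AND salary > 10000
"""


def well_paid_toy_employees():
    """Return the names of TOY department employees earning over 10000."""
    return [name for (name,) in conn.execute(QUERY)]


print(well_paid_toy_employees())
```

The query uses only keywords and comparison operators that can be typed on an ordinary keyboard, which is the property the paragraph above emphasizes.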

Ray and I wrote a paper about the SEQUEL language. Part of the mission of IBM Research is to contribute to science, and IBM had no product plans based on SEQUEL at the time, so they allowed Ray and me to publish our SEQUEL paper at a technical conference in Ann Arbor, Michigan in May 1974. Ray and I flipped a coin to decide who would get to go to Ann Arbor, and Ray won the toss. At the conference in Ann Arbor, Ray got to see the famous debate between Ted Codd and Charlie Bachman, comparing Codd’s relational approach with Bachman’s navigational approach. I would have loved to see this discussion. I think it was the last time that anyone seriously tried to defend the concept of a navigational database. Starting in 1974, I think the writing was on the wall that relational databases were the future of database management.

One month after the Ann Arbor conference, my friend Ray died suddenly of a brain aneurysm at age 25. He left behind a wife and one-year-old daughter. My family has stayed in touch with Ray’s wife and daughter for the last forty years, camping together every summer. Ray’s daughter Kristin is now a manager at a data security company.

Pramod: I’m so sorry to hear about Raymond Boyce. He made such an impact in his early 20s. Thank you for taking us through the journey of SEQUEL; it was wonderful to hear about it. As you mentioned, relational databases were the future. Now tell us about the early implementations of relational database systems. You were one of the managers of IBM’s System R project, one of the first RDBMS prototypes.

Don Chamberlin: Work continued at IBM Research to prove the concept of an RDBMS. The project to build a relational prototype was called “System R.” I was one of the managers of this project. We had a talented team of about 12 IBM programmers. A similar team was put together at UC Berkeley, called the Ingres project, under the leadership of Prof. Michael Stonebraker. Their objective was the same as ours: to prove the concept of an RDBMS. The Berkeley team based their prototype on another high-level query language called QUEL.

Both the System R project and the Ingres project at Berkeley were successful and resulted in commercial products that became available in the late 1970s and early 1980s. The commercial version of SEQUEL took the shorter name, SQL.

The 1970s and 80s were an extraordinary time in the history of computing. The price of computers had fallen to the point where nearly all companies and many other organizations were converting their records from paper and putting them into computers. As a result, there was a large market for database systems, and relational systems were ready to fill the bill. Tables were simple and easy to understand, and English-keyword languages like SQL were easy to learn. Developing applications for relational systems was fast and easy compared with earlier approaches. In the early 1980s, either SQL, developed at IBM, or QUEL, developed at UC Berkeley, could have become the dominant relational language.

Pramod: Fantastic. As you said, SQL was developed at IBM and QUEL at UC Berkeley. But SQL went on to be a very successful language and was later adopted as an ANSI and ISO standard. Why do you think SQL became the most successful relational language?

Don Chamberlin: I think one thing tipped the balance in favor of SQL. A young man named Larry Ellison founded a startup company in San Jose, named Software Development Laboratories, and that company took an interest in the language specification that Ray and I had published. They developed their own implementation of the language on a DEC minicomputer. The startup company was small and nimble, and their SEQUEL-based system beat IBM’s to the market by nearly two years. Also, running on a minicomputer, it was more affordable than a mainframe system. Software Development Laboratories marketed their product under the name Oracle, and later took on that name for the whole company. Oracle became very successful selling SQL database systems to the U.S. Government, so successful, in fact, that the government released something called FIPS 127, a Federal Information Processing Standard, that specified the use of SQL in federal database systems.

SQL was also adopted by ANSI as an American National Standard, and by ISO as an International Standard. The existence of these standards did a lot to make applications portable from one database system to another, and to make SQL ubiquitous in the database industry. During the 1980s, every major supplier of database software brought an SQL product to market.

Pramod: Wow. It’s nice to know that Larry Ellison, the founder of Oracle, played an important role in the success of SQL, and Oracle is a very successful company today. For decades after the introduction of the RDBMS, it was the way to go for storing data. Why is there a new wave of database systems called “NoSQL” systems? Will these systems replace the SQL systems that are currently in use?

Don Chamberlin: These days you hear a lot about “No SQL” systems. These are systems that are designed to relax some of the constraints that traditionally apply to relational systems. In a relational database, all the data is organized into rows and columns. That is an excellent way to store a large number of data items that all share the same simple structure, like bank accounts or airline reservations or auction items. Tables are very well adapted for storing these kinds of data, and a lot of business data falls into this category. So tables and the SQL language are alive and well and are going to be with us for a long time.

But there’s a class of applications for which data items have a more complex structure, and in which the data structure varies from one item to another. For example, imagine that you are storing information about insurance policies. Some policies have only one driver, and others have many drivers. Some drivers have had no accidents, and others have had many. Some accidents have no witnesses and others have more than one. It’s possible to represent all this in rows and columns of tables, but you have to split it up and store it in several different tables, and that can make applications more complicated to write. A non-relational, or “No SQL” system, might allow you to store all the data from an insurance policy in one place, using a more complex data format like JSON. Non-relational systems are also often designed for very large-scale distributed applications, in which data changes do not need to be immediately effective on all nodes, but are allowed some time to propagate through the network. This feature is called “eventual consistency” in contrast to the “immediate consistency” of relational systems.
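The insurance example can be sketched both ways. In the Python below, one policy is stored first as a single nested JSON document and then split across four tables joined by matching ids. All names, dates, table layouts, and helper functions are hypothetical, chosen only to mirror the drivers, accidents, and witnesses Don mentions; `sqlite3` again stands in for a relational system, and the JSON half does not model any particular NoSQL product.

```python
import json
import sqlite3

# Document style: one policy, with its nested drivers, accidents and
# witnesses, kept together as a single JSON value.
policy_doc = {
    "policy_id": 1,
    "drivers": [
        {"name": "Ann", "accidents": []},
        {"name": "Raj", "accidents": [
            {"date": "2016-03-01", "witnesses": ["Lee", "Kim"]},
            {"date": "2017-07-15", "witnesses": []},
        ]},
    ],
}


def witness_count_document(doc_text):
    """Count witnesses by walking the nested structure of one document."""
    doc = json.loads(doc_text)
    return sum(
        len(accident["witnesses"])
        for driver in doc["drivers"]
        for accident in driver["accidents"]
    )


# Relational style: the same facts split across four tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policy (policy_id INTEGER PRIMARY KEY);
    CREATE TABLE driver (driver_id INTEGER PRIMARY KEY,
                         policy_id INTEGER, name TEXT);
    CREATE TABLE accident (accident_id INTEGER PRIMARY KEY,
                           driver_id INTEGER, date TEXT);
    CREATE TABLE witness (accident_id INTEGER, name TEXT);
    INSERT INTO policy VALUES (1);
    INSERT INTO driver VALUES (1, 1, 'Ann'), (2, 1, 'Raj');
    INSERT INTO accident VALUES (1, 2, '2016-03-01'), (2, 2, '2017-07-15');
    INSERT INTO witness VALUES (1, 'Lee'), (1, 'Kim');
""")


def witness_count_relational(policy_id):
    """Count witnesses by joining the tables on their matching ids."""
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM driver
        JOIN accident ON accident.driver_id = driver.driver_id
        JOIN witness ON witness.accident_id = accident.accident_id
        WHERE driver.policy_id = ?
        """,
        (policy_id,),
    ).fetchone()
    return count


print(witness_count_document(json.dumps(policy_doc)))
print(witness_count_relational(1))
```

Both versions answer the same question. The document keeps the policy in one place, while the tables need two joins to reassemble it, which is the extra application complexity described above.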

So I interpret “No SQL” as “N. O. SQL”, meaning “Not only SQL.” I don’t think there’s a single data structure or language that is best for all applications. I think that relational databases and SQL systems are still going to be with us for a long time. But I also believe that other kinds of database systems, with features like eventual consistency and JSON data structures, are going to be increasingly important for very large-scale, high-performance applications.

Pramod: Very well said. RDBMS and NoSQL systems serve entirely different classes of applications. For transactions, it’s definitely RDBMS. Looking back on your career, what has been the most satisfying part of your technological journey?

Don Chamberlin: It’s very gratifying to me that, 43 years after Ray Boyce and I published the first SEQUEL paper, SQL is still the world’s most widely used database query language. I’m also very gratified to see that SQL is being used in a vast variety of exciting applications, far more than Ray and I ever anticipated. When we started work on SQL, the internet did not exist, but nevertheless, SQL has turned out to be an essential tool for commerce on the internet. Several open-source SQL systems are available, so anyone who wants to can use SQL for free.

In developing SQL, Ray Boyce and I were standing on the shoulders of giants: Turing Award winners Charlie Bachman, who invented the concept of an integrated database system, and Ted Codd, who invented the relational model of data. What Ray and I were able to do was to couch some of these technical ideas in plain English terms that were easy for everybody to understand. I’m grateful to the System R team at IBM and the Ingres team at U.C. Berkeley, who built the first prototype relational database systems, to Larry Ellison at Oracle, who was the first to establish the market for SQL systems, and to Jim Melton, who dedicated his career to creating and editing the international SQL standard. It has been a privilege for me to work with all these people.

Pramod: Thank you, Don, for your contributions to computer science, and thanks for being on the show. It was a pleasure talking to you.

I enjoyed speaking to Don Chamberlin in this episode; it was very informative. I learned about the creation of databases, database management systems, and SQL, and Don explained it all very well. I hope you enjoyed listening to him. In the next episode I will be speaking to Andrey Breslav, the lead language designer of Kotlin. I have many questions for Andrey, and it should be interesting to talk to him. I have a request for all the listeners: I have completed ten episodes, which is a significant milestone for me, so please give me your feedback. Leave a review on iTunes or rate me on the Facebook page. Until then, bye. Have a good day, everyone.

5 Comments on "Episode 11: Interview with Don Chamberlin, designer of SQL database language"

fish

exceptional

Michael Gorman

Wow! What an interesting article. Too bad it largely didn’t happen that way at all. When you’re an employee of IBM you get opportunities to make up your version of history long after it happened.

John the Scott

Not so sure Codd “invented” relations. Set theory goes way back to Boole, Cantor, and more recently Peirce.

http://boole.stanford.edu/pub/ocbr.pdf

nisha

It’s really a great and useful piece of information. I’m glad that you shared this helpful information with us.
Please keep us up to date like this. Thanks for sharing.

Bob Sanderson

It is wonderful to read the story of wonderful people. Thanks for this article. Please us again