It’s an excellent paper, about 18 pages, and great reading for folks looking for a primer.
We use data in many different ways, and the volume, variety, and velocity of that data increase every day. Because of this, organizations rely on lots of different data technologies. Taken as a group, these technologies make up a data platform.
One way to think about the technologies in a data platform is to divide them into three categories based on the kind of data they work with. Those categories are:
- Operational data, such as transactional data used by a banking system, an online retailer, or an ERP application. This data is typically both read and written by applications, commonly in response to user requests. A banking application might read your account balance, for instance, then write a new value to reflect a deposit you make. And while operational data was once almost entirely relational, the increasing volume and variety of data have changed this. Today, working with unstructured operational data can be just as important.
- Analytical data, such as the information kept in a data warehouse. This data is typically read-only, and it usually includes historical information extracted over time from other data sources, such as operational databases. Analytical data is commonly used for things such as business intelligence and machine learning, and like operational data, it can be either relational or unstructured.
- Streaming data, such as data produced by sensors. The defining characteristic of streaming data is velocity; if the data isn’t processed quickly, it can lose a large share of its value. Many streaming scenarios today relate to the Internet of Things (IoT), where the focus is on interacting with data provided by lots of devices. Streaming data is also used in other situations, such as analyzing financial transactions as they happen. In both cases, the challenge is to work effectively with large amounts of data being produced in real time.
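To make the three categories a little more concrete, here is a minimal Python sketch of the access pattern each one implies. It is only an illustration, not something taken from the paper: the names `accounts`, `deposit`, `history`, `average_monthly_sales`, and `rolling_alerts`, along with the sample values, are all hypothetical, and in-memory Python structures stand in for real databases, warehouses, and event streams.

```python
from collections import deque
from statistics import mean

# Operational data: an application reads a value, then writes a new one,
# typically in response to a user request (hypothetical in-memory "table").
accounts = {"alice": 100.0}

def deposit(account_id, amount):
    balance = accounts[account_id]            # read the current balance
    accounts[account_id] = balance + amount   # write the updated balance
    return accounts[account_id]

# Analytical data: historical records that are only read, never updated,
# usually to compute aggregates (hypothetical monthly sales figures).
history = [("2023-01", 120.0), ("2023-02", 80.0), ("2023-03", 100.0)]

def average_monthly_sales(rows):
    return mean(amount for _, amount in rows)

# Streaming data: each reading is handled as it arrives, keeping only a
# small window of recent values rather than the full history.
def rolling_alerts(readings, window=3, threshold=50.0):
    recent = deque(maxlen=window)
    for value in readings:
        recent.append(value)
        if mean(recent) > threshold:
            yield value  # flag the reading that pushed the average too high
```

For example, `deposit("alice", 50.0)` reads and then rewrites a single record, `average_monthly_sales(history)` scans stored records without changing them, and `rolling_alerts(sensor_values)` reacts to values one at a time as they are produced.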
The Microsoft data platform provides technologies for all three categories, along with connections among the three. Figure 1 summarizes the platform’s offerings in each area. [See graphic to the right]
This paper focuses on the middle column in the figure, Microsoft’s offerings for working with analytical data. (For more on the other two categories, see the companion papers “Operational Data Scenarios Using the Microsoft Data Platform” and “Streaming Data Scenarios Using the Microsoft Data Platform.”) And don’t be confused by the diagram: these technologies aren’t layered, in the sense that none of them depends on the ones shown below it. Instead, think of each column as a group of technologies for working with data in a particular way. Also, realize that the lines between the columns are permeable: these technologies can be used together in various combinations. For example, the analytical technologies in the center column are often used together with both the operational technologies in the left column and the streaming technologies in the right column.
Download the whitepaper here:
- WHITEPAPER: David Chappell’s “Analytical Scenarios Using the Microsoft Data Platform”