Friday, April 12, 2019

Data Persistence

The role of Data in information systems

In computing, data is information that has been translated into a form that is efficient for movement or processing. Relative to today's computers and transmission media, data is information converted into binary digital form. It is acceptable for data to be used as a singular subject or a plural subject. Raw data is a term used to describe data in its most basic digital format.

Persistence is "the continuance of an effect after its cause is removed". In the context of storing data in a computer system, this means that the data survives after the process that created it has ended. In other words, for a data store to be considered persistent, it must write to non-volatile storage.
• Information systems process data and convert it into information.
• The data should persist for later use:
  • to maintain state
  • for logging purposes
  • for further processing to derive knowledge (data science)
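As a minimal sketch of what "persisting" means in practice, the snippet below writes a small piece of state to a file and reads it back. The file name, the state fields, and the helper names (save_state, load_state, persistence_demo) are assumptions made up for this illustration.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state to disk so it can outlive the current process."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the storage device


def load_state(path):
    """Read the state back, typically from a later run of the program."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def persistence_demo():
    # Hypothetical location and contents, chosen only for the example.
    path = os.path.join(tempfile.mkdtemp(), "state.json")
    save_state(path, {"last_order_id": 42, "status": "shipped"})
    # In a real system a different process would call load_state() here.
    return load_state(path)
```

If the data lived only in a Python variable, it would be gone as soon as the process exited; the file is what makes it persistent.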



Data


Information in raw or unorganized form (such as letters, numbers, or symbols) that refers to, or represents, conditions, ideas, or objects.



Database


A database is a collection of information that is organized so that it can be easily accessed, managed and updated.


Database Server

A database server is the back-end system of a database application that uses a client/server architecture. It performs tasks such as data analysis, storage, data manipulation, archiving, and other non-user-specific tasks.



Database Management System

A database management system (DBMS) is system software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data.
A DBMS makes it possible for end users to create, read, update and delete data in a database. The DBMS essentially serves as an interface between the database and end users or application programs, ensuring that data is consistently organized and remains easily accessible.
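The create, read, update and delete operations can be sketched with SQLite, a small DBMS that ships with Python's standard library. The table and column names (customers, name) are assumptions for illustration only.

```python
import sqlite3


def crud_demo():
    # ":memory:" keeps the demo self-contained; pass a file path to persist the data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

    # Create
    conn.execute("INSERT INTO customers (name) VALUES (?)", ("Alice",))
    conn.execute("INSERT INTO customers (name) VALUES (?)", ("Bob",))

    # Update
    conn.execute("UPDATE customers SET name = ? WHERE name = ?", ("Alicia", "Alice"))

    # Delete
    conn.execute("DELETE FROM customers WHERE name = ?", ("Bob",))

    # Read
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    conn.close()
    return rows
```

The application never touches the storage format directly; it only sends statements to the DBMS, which is the "interface" role described above.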
________________________________________________________________________________________



Pros and Cons of File System vs Database


Pros of the File system:
  • Performance can be better than with a database. If you store large files in a database, a simple query that lists files or file names will also load the file data when it uses SELECT *, which slows things down. Accessing a file through the file system is simple and lightweight.
  • Saving and downloading files is simpler than with a database: a plain "Save as" function stores the file, and a download can be served from a URL that points to the saved file's location.
  • Migrating the data is easy. You can copy the folder to the desired destination, making sure write permissions are granted there.
  • Cost effective. In most cases it is cheaper to expand your web server's storage than to pay for certain databases.
  • Easy to migrate later to cloud storage such as Amazon S3, or to a CDN.
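A common way to get these benefits is to keep the file itself on disk and store only its name and path in the database. The sketch below assumes a hypothetical uploads table and made-up file names; it is one possible arrangement, not the only one.

```python
import os
import sqlite3
import tempfile


def store_upload(conn, folder, filename, content):
    """Save the bytes on the file system and record only the metadata in the database."""
    path = os.path.join(folder, filename)
    with open(path, "wb") as f:
        f.write(content)
    conn.execute("INSERT INTO uploads (filename, path) VALUES (?, ?)", (filename, path))


def uploads_demo():
    folder = tempfile.mkdtemp()  # stands in for the web server's upload folder
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, filename TEXT, path TEXT)")
    store_upload(conn, folder, "report.pdf", b"%PDF-1.4 fake bytes")
    store_upload(conn, folder, "photo.jpg", b"\xff\xd8 fake bytes")

    # Listing files reads small rows only; no file content is loaded.
    names = [row[0] for row in conn.execute("SELECT filename FROM uploads ORDER BY id")]

    # The content is read from disk only when it is actually needed.
    (path,) = conn.execute(
        "SELECT path FROM uploads WHERE filename = ?", ("report.pdf",)
    ).fetchone()
    with open(path, "rb") as f:
        content = f.read()
    conn.close()
    return names, content
```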


Cons of the File system:
  • Loosely coupled. There are no ACID (Atomicity, Consistency, Isolation, Durability) guarantees and no relational mapping between the files and the records that refer to them. If files are deleted from their location, manually or by an attacker, you might not know whether a given file still exists.
  • Lower security. Files are saved in a folder that must have write permissions, which makes it prone to security issues such as unauthorized access. Avoid storing files in the file system if you cannot afford to compromise on security.
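Because nothing ties the files to the records that refer to them, an application usually has to check for missing files itself. A rough sketch of such a check follows; the helper name find_missing and the file names are invented for the example.

```python
import os
import tempfile


def find_missing(recorded_paths):
    """Return the recorded paths whose files no longer exist on disk."""
    return [p for p in recorded_paths if not os.path.exists(p)]


def missing_demo():
    folder = tempfile.mkdtemp()
    kept = os.path.join(folder, "kept.txt")
    removed = os.path.join(folder, "removed.txt")
    for path in (kept, removed):
        with open(path, "w", encoding="utf-8") as f:
            f.write("data")

    os.remove(removed)  # simulates a manual or malicious deletion

    # The application's records still list both files.
    return [os.path.basename(p) for p in find_missing([kept, removed])]
```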



Pros of Database:
  • ACID consistency, including the ability to roll back an update, which is complicated when the files are stored outside the database.
  • Files stay in sync with the database and cannot be orphaned from it, which makes transactions easier to track.
  • Backups automatically include the file binaries.
  • Typically more secure than saving in a file system, since access goes through the database's own permissions.
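To show what a rollback looks like, here is a small sketch using SQLite from Python's standard library. The accounts table, the names and the amounts are assumptions for the example. Using the connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move money between accounts; undo both updates if the source would go negative."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def rollback_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()

    ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
    failed = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return ok, failed, balances
```

The failed transfer leaves no trace: both updates are undone together, which is hard to guarantee when part of the data lives in files outside the database.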


Cons of Database:

  • You may have to convert the files to BLOBs in order to store them in the database.
  • Database backups become much larger.
  • Memory inefficient. RDBMSs are often RAM driven, so data has to pass through RAM before it can be searched or sorted. The RDBMS tracks every data page that is read or written, however small, including whether the page is in memory or on disk and whether it is indexed or physically sorted.
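Storing a file as a BLOB can be sketched with SQLite as follows. The files table and the payload are assumptions for the example. Note that the listing query names its columns instead of using SELECT *, which avoids pulling the binary data along with the file names.

```python
import sqlite3


def blob_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT, data BLOB)")

    payload = bytes(range(256)) * 4  # 1 KiB of stand-in file content
    conn.execute("INSERT INTO files (filename, data) VALUES (?, ?)", ("image.bin", payload))

    # Listing names: select only the small column, not the BLOB.
    names = [row[0] for row in conn.execute("SELECT filename FROM files")]

    # Fetch the binary content only for the file that is requested.
    (blob,) = conn.execute(
        "SELECT data FROM files WHERE filename = ?", ("image.bin",)
    ).fetchone()
    conn.close()
    return names, blob == payload, len(blob)
```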
________________________________________________________________________________


Data arrangement


Unstructured Data

Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly in a database.


Semi-structured

Semi-structured data is information that doesn't reside in a relational database but that does have some organizational properties that make it easier to analyze. Examples of semi-structured data might include XML documents and NoSQL databases.
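As a small illustration of "some organizational properties", the XML below has tags that make it easy to parse, yet each contact carries a different set of fields, which would not fit one fixed table without changes. The document and the helper name fields_per_contact are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: same element type, different fields per record.
RAW = """<contacts>
  <contact id="1"><name>Alice</name><email>alice@example.com</email></contact>
  <contact id="2"><name>Bob</name><phone>555-0100</phone><phone>555-0101</phone></contact>
</contacts>"""


def fields_per_contact(raw):
    """Map each contact id to the sorted set of field names it actually contains."""
    root = ET.fromstring(raw)
    summary = {}
    for contact in root.findall("contact"):
        summary[contact.get("id")] = sorted({child.tag for child in contact})
    return summary
```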


_______________________________________________________________________________


Big Data vs Data Warehouses







_______________________________________________________________________________

ORM Tools

Irwsoft Data Framework

Irwsoft Data Framework is a lightweight ORM that integrates directly with Visual Studio to generate table, view, function and procedure classes directly from a database.


LLBLGen Pro

A sort of uber-ORM you might want to evaluate for more complex development situations is LLBLGen Pro, which provides both data model development and ORM functionality in one package.


nHydrate ORM Modeler

Inspired by NHibernate, an open source ORM solution for mapping relational databases to .NET objects, nHydrate now uses Entity Framework as its internal data access layer and provides a visual modeler of data relationships.


ORMapster

ORMapster for Visual Studio 2013 is a simple ORM data mapper and code generator extension that does one thing: it reads your data source and creates a data access layer with LINQ self-tracking entities. That's it. The resulting data entities can then be used in your projects, including WCF, ASP.NET MVC, Web Forms, Windows Forms and so on.
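The tools listed here target .NET, but the core idea behind any ORM, turning table rows into objects, can be sketched in a few lines of Python. This is a hand-rolled toy, not how any of the tools above work internally; the Product class, the products table and the fetch_all helper are all assumptions for illustration.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Product:
    id: int
    name: str
    price: float


def fetch_all(conn, cls, table):
    """Build a SELECT from the class's field names and map each row to an object.

    The table name is interpolated into the SQL, so it must come from trusted code.
    """
    columns = ", ".join(f.name for f in fields(cls))
    return [cls(*row) for row in conn.execute(f"SELECT {columns} FROM {table}")]


def orm_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", ("Pen", 1.5))
    products = fetch_all(conn, Product, "products")
    conn.close()
    return products
```

Real ORM tools add much more on top of this, such as change tracking, relationships and code generation.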


________________________________________________________________________________


NoSQL

Benefits of NoSQL 

When compared to relational databases, NoSQL databases are often more scalable and can provide better performance for some workloads, and their data model addresses several issues that the relational model is not designed to address:

• Large volumes of rapidly changing structured, semi-structured, and unstructured data
• Agile sprints, quick schema iteration, and frequent code pushes
• Object-oriented programming that is easy to use and flexible
• Geographically distributed scale-out architecture instead of expensive, monolithic architecture


NoSQL Database Types

• Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.

• Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4j and Giraph.

• Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality.

• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
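The four data models above can be pictured with plain Python structures. This is only a toy picture of the shapes of the data, under made-up names and values; it does not reflect how Neo4j, Redis, Cassandra or HBase store data internally.

```python
# Document: one key paired with a nested structure (key-value pairs, arrays, sub-documents).
document = {
    "_id": "u1",
    "name": "Alice",
    "tags": ["admin", "dev"],
    "address": {"city": "Colombo"},
}

# Key-value: every item is a key together with its value.
kv_store = {}
kv_store["session:42"] = "alice"
kv_store["counter"] = 0
kv_store["counter"] += 1  # a typed value (integer) allows operations such as increment

# Graph: who is connected to whom, here as an adjacency list.
graph = {"alice": {"bob"}, "bob": {"alice", "carol"}, "carol": {"bob"}}


def friends_of_friends(g, person):
    """People two hops away who are not already direct connections."""
    direct = g[person]
    return {fof for friend in direct for fof in g[friend]} - direct - {person}


# Columns stored together: an aggregate over one column touches one list only.
columns = {"id": [1, 2, 3], "price": [10, 20, 30]}
total_price = sum(columns["price"])
```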

________________________________________________________________________________

Hadoop

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
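One of the "simple programming models" associated with Hadoop is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in a single Python process to count words. It is only an illustration of the model and does not use Hadoop's API; the function names and the sample chunks are assumptions.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    for chunk in chunks:  # on a cluster, each chunk would be mapped on a different machine
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))
```

Because each chunk is mapped independently, the work can be spread across many machines, and a chunk whose machine fails can simply be processed again elsewhere.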

________________________________________________________________________________
