
Read the text and write a short brief using the following phrases: The article is devoted … The
article deals (is concerned) with … The article touches upon the issues of… The article is about… The purpose (aim) of the article is… Much attention is given to… It is reported that.. It is spoken in detail about… The article gives a detailed analysis of.. The following conclusions are drawn… To avoid a data swamp, set out to build a data reservoir, based on a systematic approach, sound architecture and a set of best practices. THE SYSTEMATIC APPROACH How do most people approach the creation of a data lake? They say, “We’ll figure it out as we go along.” But, what happens is that as soon as the word gets out about a data lake’s existence, employees start adding data to the lake. Data will come in very rapidly, and each user will do things his or her own way. Before you know it, you will have the proverbial data swamp. A better approach is one in which major problems are anticipated, solutions are defined in advance and users work together. Let’s take the example of sharing data. Imagine you want to have a copy of all the social media tweets relevant to your business. If the tweets are obtained for one employee, you don’t want someone else to have to go out to another vendor and buy them again. Effective data sharing is fundamental to the business value of the data reservoir. It’s also important for users to be able to find what data is already present in the data lake and learn about the data to tell if it’s suitable for use. This effort requires an architecture and best practices that take data sharing into account. You will also need tools to make it easy to search data and their associated metadata. Organizations should anticipate the importance of data sharing from the beginning of a data lake project so that data is shared and reused throughout the organization. All things considered, a successful data lake approach will identify fundamental issues up front and address them with an integrated architecture and essential best practices. SOUND ARCHITECTURE As mentioned above, one component of data reservoir architecture is the management of metadata to encourage and support data reuse. Here are other key capabilities: Data ingestion: It must be straightforward and efficient to bring data from a new source into the data reservoir. In particular, custom coding and scripting should be avoided. Archiving data as sourced: Many data reservoirs require that a copy of the data as originally received is available for audit, traceability, reproducibility and some data science techniques. Thus, an automatic and efficient way to archive a copy of the source data, usually with lossless compression and sometimes in encrypted form is required. Data transformation: To prepare data for analytics, the set of all necessary transformations in a given data reservoir is likely to be large. A minimum amount of custom programming should be the goal. Data publication: Data ingestion, data transformation and metadata capture should be completed so that data can be used. The act of “publication” makes data ready for a specific class of reports, dashboards or queries. Security: A data reservoir should manage access to data objects, certain s services (such as Hive or HBase), specific applications and to the Hadoop cluster itself. A security architecture and strategy should protect the perimeter, handle identification and authorization of users, control access to data, address needs for encryption, masking, and tokenization, and comply with requirements for logging, reporting and auditing. Operations and management: When a data reservoir is operating at full scale, there will be many data pipelines in operation at the same time, each ingesting, transforming and publishing data. There will also be processes for consuming data, either via extract and download or via interactive reporting and query. BEST PRACTICES In the creation of data pipelines – ingestion, transformation and publication – a best practice is correctly performing the necessary steps to make data consistently usable. This means capturing metadata at every touch point, handling exceptions correctly at every step, completing appropriate data quality checks, efficiently handling incorrect data as well as correct data, and performing any data conversions or normalizations according to specified standards. ---->продолжение статьи в комментарии

Ответы на вопрос

Ответ:
The article is devoted to building a data reservoir, based on a systematic approach, sound architecture and a set of best practices.
The article deals with an approach in which major problems are anticipated, solutions are defined in advance and users work together.
The article touches upon the issues of effective data sharing, which is fundamental to the business value of the data reservoir.
The article is about the importance of data sharing from the beginning of a data lake project so that data is shared and reused throughout the organization. All things considered, a successful data lake approach will identify fundamental issues up front and address them with an integrated architecture and essential best practices.
The purpose of the article is to introduce the audience to effective data reservoir architecture, metadata management and data reuse.
Much attention is given to key capabilities such as data injection, archiving data as sourced, traceability and reproducibility, data transformation, data publication and security.
It is reported that when a data reservoir is operating at full scale, there will be many data pipelines in operation at the same time, each ingesting, transforming and publishing data. There will also be processes for consuming data, either via extract and download or via interactive reporting and query.
It is spoken in detail about the best practices.
The article gives a detailed analysis of the ways your data will provide an appropriate foundation for rigorous analysis to enable significant business decisions and actions.
The following conclusions are drawn: instead of a data swamp that no one uses, you can have a successful data reservoir in which data is well organized and provides increasing value.
Объяснение:



The article is devoted to the creation and management of a data reservoir, specifically focusing on a systematic approach, sound architecture, and best practices. It deals with the issues of avoiding a data swamp by implementing a structured strategy in building and maintaining a data lake. The article is about the importance of anticipating major problems, defining solutions in advance, and fostering collaboration among users to prevent chaotic data accumulation.
The purpose of the article is to provide guidance on establishing an effective data reservoir that promotes data sharing, reuse, and accessibility. Much attention is given to the concept of data sharing as a fundamental aspect of deriving business value from the data reservoir. The article emphasizes that users should be able to locate existing data within the reservoir and determine its suitability for use.
The article gives a detailed analysis of the systematic approach required for successful data lake implementation. It discusses the significance of identifying issues early in the process and addressing them through a cohesive architecture and best practices. It is reported that taking a proactive stance on data sharing and reuse contributes to the long-term success of a data reservoir project.
In terms of architecture, the article emphasizes the need for a sound approach to managing metadata to encourage data reuse. Key capabilities, including data ingestion, archiving of original data, data transformation, and data publication, are highlighted. The article speaks in detail about the importance of minimizing custom coding and scripting, while also stressing the automatic archiving of source data for audit and reproducibility purposes.
In conclusion, the article draws the following conclusions: A successful data reservoir requires a systematic approach that anticipates challenges, an architecture that encourages data sharing and reuse, and adherence to essential best practices. The integration of these elements is essential for achieving a well-organized and valuable data reservoir.


Похожие вопросы
Топ вопросов за вчера в категории Английский язык
Последние заданные вопросы в категории Английский язык
-
Математика
-
Литература
-
Алгебра
-
Русский язык
-
Геометрия
-
Английский язык
-
Химия
-
Физика
-
Биология
-
Другие предметы
-
История
-
Обществознание
-
Окружающий мир
-
География
-
Українська мова
-
Информатика
-
Українська література
-
Қазақ тiлi
-
Экономика
-
Музыка
-
Право
-
Беларуская мова
-
Французский язык
-
Немецкий язык
-
МХК
-
ОБЖ
-
Психология
-
Физкультура и спорт
-
Астрономия
-
Кыргыз тили
-
Оʻzbek tili