Season of Docs 2020 Proposal for Apache Flink (Hezeh)


Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators. The project selection
form is at this link: <https://bit.ly/gsod-tw-projectselection>. The form
is also available in the guide for organization administrators
<https://developers.google.com/season-of-docs/docs/admin-guide#tech-writer-application-phase>.


The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline
<https://developers.google.com/season-of-docs/docs/timeline> on the Season
of Docs website.

If you have any questions about the program, please email the Season of
Docs team at [hidden email].

Best,
The Google Season of Docs team


Title: Restructure the Table API and SQL documentation
Project length: Standard length (3 months)
Writer information
*Name:* Hezeh
*Email:* [hidden email]


Writing experience:
Experience 1:
*Title:* Wrote Documentation for Samba AD DC Cockpit Plugin
*Date:* May 2020 - present
*Description:* Wrote documentation for the Samba Active Directory Domain
Controller Cockpit plugin.
The project involved developing a plugin that Active Directory
administrators can use to perform various functions usually done with the
samba-tool command-line utility.
My responsibilities were writing the required code for the plugin, testing
the code, and documenting the UI designs and code functionality, including
future development needed for the project.
*Summary:* At the time of writing this proposal, the project has yet to be
merged into Samba, but I do hope it will be accepted and merged.

*Sample:* https://gitlab.com/HezekiahM/samba-ad-dc
*Additional information:* Over the past couple of months I have become
really interested in working with data processing engines. Streaming
frameworks, especially Apache Flink and Apache Spark, seemed the natural
way for me to dive into stream processing and batch processing. I decided
to go with Flink because it is a true streaming engine, and I must admit it
has good documentation. I believe that stream processing is the future of
big data processing, and giving Apache Flink users great documentation will
be key to wider adoption. The first dive into any open source software has
always been through its documentation. Though I still have a gap to close
in understanding Apache Flink, I believe that through this project I will
come to understand more deeply how it works under the hood.

Project Description
The documentation for the Table & SQL API has evolved over time. In the
meantime, more streaming concepts, a SQL Client, connectors, catalogs,
etc. were added. The vision of a unified API for both batch and streaming
is also making progress.

It is time to rework the structure in order to attract and target both pure
SQL users and programming users, make features easier to find, and
establish a clear reading flow/order of topics.

The work to be done can be segmented into the following areas:

An Overview
The overview should describe what the Table and SQL ecosystem is. It should
also give an executive summary covering the main features, including schema
awareness, abstraction, connections, and catalogs, and explain how unified
data processing is achieved through Dynamic Tables. It should further cover
the advantages and disadvantages compared to the DataStream API, an E2E
example for SQL, an E2E example for the Table API in Java/Scala/Python, an
E2E example for the Table API in Java/Scala/Python combined with the
DataStream API, and a short presentation of the SQL Client.
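For illustration, the E2E example for SQL could look something like this
minimal sketch, assuming the Flink 1.10-era unified TableEnvironment (the
Orders table and its connector options are hypothetical placeholders):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    public class OverviewSqlExample {
        public static void main(String[] args) {
            // One unified entry point for both batch and streaming programs.
            EnvironmentSettings settings = EnvironmentSettings.newInstance()
                    .useBlinkPlanner()
                    .inStreamingMode()
                    .build();
            TableEnvironment tEnv = TableEnvironment.create(settings);

            // Register a source table via DDL; the WITH options are
            // placeholders and depend on the connectors available.
            tEnv.sqlUpdate(
                "CREATE TABLE Orders (" +
                "  user_id BIGINT," +
                "  amount  DOUBLE" +
                ") WITH (" +
                "  'connector.type' = '...'" +
                ")");

            // A query over a dynamic table: the result keeps updating as
            // new rows arrive on the stream.
            Table totals = tEnv.sqlQuery(
                "SELECT user_id, SUM(amount) AS total FROM Orders GROUP BY user_id");
        }
    }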

Table API
The goal of the Table API documentation is to show the main table features
early and link to concepts where necessary.
It should give an overview that includes a short getting-started section
with a link to the more detailed execution section, explain the most
important methods of the unified TableEnvironment, and present
sqlUpdate/sqlQuery as well as the querying, execution, and optimization
internals behind the API.
A full reference should describe the available operations in the API. This
location allows the page to be split further in the future if we think an
operation needs more space, without affecting the top-level structure. It
should also take into account features added in the future.
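As a sketch of what such a reference entry could demonstrate, here is the
same query expressed both through the Table API and through sqlQuery,
assuming the string-based expression syntax of Flink 1.10 and an already
registered Orders table (all names are hypothetical):

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    public class TableApiReferenceExample {
        // Assumes a table "Orders" with fields user_id and amount has
        // already been registered in the given TableEnvironment.
        public static Table totalsPerUser(TableEnvironment tEnv) {
            // Table API formulation, using string-based expressions.
            Table viaTableApi = tEnv.from("Orders")
                    .groupBy("user_id")
                    .select("user_id, amount.sum as total");

            // Equivalent SQL formulation via sqlQuery.
            Table viaSql = tEnv.sqlQuery(
                "SELECT user_id, SUM(amount) AS total " +
                "FROM Orders GROUP BY user_id");

            return viaTableApi;
        }
    }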

SQL
The goal of the SQL documentation is to show users the main features early
and link to concepts if necessary. The documentation is intended for users
with SQL knowledge.
The following should be present: an overview with a getting-started section
linking to the more detailed execution section, a full reference listing
the available operations in SQL as a table, a data definition section
explaining the special SQL syntax around DDL, and pattern matching, with
more features to be added in the future.
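For instance, the pattern matching part could be illustrated with a
MATCH_RECOGNIZE query along these lines (a sketch only; the Ticker table,
its fields, and the rowtime attribute are hypothetical, and Flink requires
ordering on a time attribute):

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    public class PatternMatchingExample {
        // Detects a price drop: an event A followed by a cheaper event B.
        public static Table priceDrops(TableEnvironment tEnv) {
            return tEnv.sqlQuery(
                "SELECT * FROM Ticker " +
                "MATCH_RECOGNIZE (" +
                "  PARTITION BY symbol " +
                "  ORDER BY rowtime " +
                "  MEASURES A.price AS start_price, B.price AS end_price " +
                "  PATTERN (A B) " +
                "  DEFINE " +
                "    A AS A.price > 0, " +
                "    B AS B.price < A.price " +
                ") AS T");
        }
    }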

Setup and Execution
The setup and execution section should describe how to set up a project and
submit a job. This includes the dependency structure, table environments,
the features and limitations of table environments, setting up Python
projects, and how to use the SQL Client.
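To make the table environment discussion concrete, the section could open
with a sketch like the following, assuming the Flink 1.10-era
EnvironmentSettings builder (planner and mode options vary by version):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class SetupExample {
        public static void main(String[] args) {
            // A table environment in streaming mode...
            TableEnvironment streamEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance()
                    .useBlinkPlanner()
                    .inStreamingMode()
                    .build());

            // ...and one in batch mode. The same Table API and SQL run on
            // both, which is what the unified API means in practice.
            TableEnvironment batchEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance()
                    .useBlinkPlanner()
                    .inBatchMode()
                    .build());
        }
    }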

Connect to External Systems
This section should describe how to connect to other systems for data or
metadata, giving an overview of the available sources and sinks and
explaining what catalogs are and what their use cases look like. The
available connectors, catalogs, and the Hive integration should also be
documented.
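A sketch of how a connector page might introduce a DDL-based source (the
Kafka option keys follow the Flink 1.10 style and differ in later versions;
the topic and field names are hypothetical):

    import org.apache.flink.table.api.TableEnvironment;

    public class ConnectorExample {
        // Declares an external Kafka topic as a table. The WITH options
        // are illustrative and depend on the connector and Flink version.
        public static void registerKafkaSource(TableEnvironment tEnv) {
            tEnv.sqlUpdate(
                "CREATE TABLE user_actions (" +
                "  user_id BIGINT," +
                "  action  STRING" +
                ") WITH (" +
                "  'connector.type' = 'kafka'," +
                "  'connector.version' = 'universal'," +
                "  'connector.topic' = 'user_actions'," +
                "  'format.type' = 'json'" +
                ")");
        }
    }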

Other sections to be documented include built-in functions, configuration,
user-defined extensions and concepts that any user should know about.
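As an example of what the user-defined extensions section could show, here
is a minimal scalar function sketch using the registration call available
around Flink 1.10 (registerFunction; the function itself is a made-up
example):

    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.functions.ScalarFunction;

    public class UdfExample {
        // A user-defined scalar function: Flink calls eval() per row.
        public static class HashCode extends ScalarFunction {
            public int eval(String s) {
                return s == null ? 0 : s.hashCode();
            }
        }

        public static void register(TableEnvironment tEnv) {
            // Makes the function usable from Table API and SQL,
            // e.g. SELECT hashCode(action) FROM user_actions.
            tEnv.registerFunction("hashCode", new HashCode());
        }
    }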