Season of Docs 2020 Proposal for Apache Flink (Kartik Khare)


Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators. The project selection
form is at this link: <https://bit.ly/gsod-tw-projectselection>. The form
is also available in the guide for organization administrators
<https://developers.google.com/season-of-docs/docs/admin-guide#tech-writer-application-phase>.


The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline
<https://developers.google.com/season-of-docs/docs/timeline> on the Season
of Docs website.

If you have any questions about the program, please email the Season of
Docs team at [hidden email].

Best,
The Google Season of Docs team


Title: Extend the Table API & SQL Documentation
Project length: Standard length (3 months)

Writer information
*Name:* Kartik Khare
*Email:* [hidden email]
*Résumé/CV:*
https://drive.google.com/file/d/1pCRIriLtwOnG5RJ2BXY4AeCiekAKLRdX/view?usp=sharing


Writing experience

Experience 1:
*Title:* Participated in GSoD 2019 for Apache Airflow
*Date:* Aug 2019 - Dec 2019
*Description:* My project involved writing documentation on how to create
workflows in Airflow.
*Summary:* I submitted four PRs for documentation, covering creating a
custom operator in Airflow, creating a new DAG, and a best-practices guide
for production. I also wrote a blog post on their site summarizing my
experience.
https://airflow.apache.org/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/
Here are the docs that were published on the official website:
https://airflow.apache.org/docs/stable/best-practices.html
https://airflow.apache.org/docs/stable/howto/custom-operator.html


Experience 2:
*Title:* Documentation for Apache Pinot
*Date:* Jun 2020 - Present
*Description:* I am currently revamping the documentation for Apache
Pinot. This involves restructuring the whole documentation and migrating it
from GitBook to a proper reStructuredText-based format. This work is in
progress.
*Summary:* The work has not yet been published; the discussions are
happening on our Notion page.

*Sample:*
https://airflow.apache.org/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/
*Additional information:* I participated in GSoD 2019, where I contributed
documentation for Apache Airflow, and I am currently writing and rewriting
the documentation for Apache Pinot end to end. I am a data engineer by
profession and have been using Apache Flink for the last 3 years. I have
written multiple articles on Flink on my blog and contributed the guide to
unit testing to the official Flink blog. My blog at
https://medium.com/@kharekartik has 300K+ reads. I follow the Google
developer documentation style guide (https://developers.google.com/style)
for most of my writing. Two years ago I started working on a DSL for Flink
for my company; however, by the time I completed it, Flink's SQL had
matured to production grade. Since then I have been experimenting with
Flink SQL and have wanted to contribute back to the community. I see GSoD
as an opportunity to engage deeply with the organization, write some docs,
have fun, and become a long-lasting member of the community.
Project Description

Flink SQL shows staggering promise. The idea that you can work on streams
using just SQL is simple to state but complicated to deliver. I believe
Flink SQL can ultimately remove the need for a data engineer for trivial
analyses of real-time data, such as counting ride requests for Uber/Lyft
in various cities in real time. The first step towards this goal is to
simplify the documentation so that someone with an analytics background
can also get started.
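
As a sketch of that use case (the table and column names here are
hypothetical), a per-city, per-minute request count in Flink SQL could
look like:

```sql
-- Hypothetical source table of ride requests with an event-time attribute.
-- Counts requests per city over one-minute tumbling windows.
SELECT
  city,
  TUMBLE_START(request_time, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*) AS request_cnt
FROM ride_requests
GROUP BY
  city,
  TUMBLE(request_time, INTERVAL '1' MINUTE);
```

An analyst who knows SQL can read this; the documentation should make it
just as easy to write.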

Here are some of the improvements I am planning for the docs; we can
discuss more of them once we actually get started:
* The overview page doesn't contain any examples. It should contain easy
examples to provide an intro to Table API.
* The Concepts & Common API page should be split up.
* The functions should have a separate page for each category. The
functions should also be listed in a tabular fashion with name, input data,
parameters (if any), and return value as columns.
* The Data Types page should contain proper mappings of Java data type →
SQL data type. The current representation is in paragraph form, but
ideally it should be a table.
* The Expression Syntax section on the Table API page should have better
formatting.
* A completely new page on Planners and how users can take advantage of
them.
* A page on optimizations that are right now scattered throughout multiple
sections.
* The SQL Overview page also needs to be reformatted.
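
To illustrate the tabular layout proposed above (the entries are examples
of existing built-in functions; the exact columns are open for
discussion):

```
Function            | Input               | Return value | Description
--------------------|---------------------|--------------|-------------------------------
CHAR_LENGTH(string) | STRING              | INT          | Number of characters in string
UPPER(string)       | STRING              | STRING       | string converted to uppercase
CONCAT(s1, s2, ...) | STRING, STRING, ... | STRING       | Concatenation of the arguments
```

A format like this lets readers scan for a function by name instead of
reading through prose.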

The content also needs a rework, as it currently feels a bit overwhelming.
We also need to add a section on where the Table API can be used instead
of the Streaming API, with comparisons such as a decrease in lines of
code, better implicit optimizations, better readability, and easier
debugging.