Subpar Trees for Unclosed Blocks in tree-sitter-go: A Discussion
Hey everyone! Let's dive into a peculiar issue we've encountered with `tree-sitter-go` when dealing with unclosed blocks. Specifically, we're seeing subpar parse trees when a function body is left unclosed.
The Problem: `ERROR` Node and Misplaced Comments
Consider this Go snippet:
```go
package pkg

func foo() {
// bar
```
When `tree-sitter-go` parses this, we get an `ERROR` node instead of a `MISSING "}"` node. The really odd part is that the `comment` node ends up outside the `function_declaration`. Check out the tree structure:
```
source_file [0, 0] - [5, 0]
  package_clause [0, 0] - [0, 11]
    package [0, 0] - [0, 7]
    package_identifier [0, 8] - [0, 11]
  [0, 11] - [2, 0]
  function_declaration [2, 0] - [2, 10]
    func [2, 0] - [2, 4]
    name: identifier [2, 5] - [2, 8]
    parameters: parameter_list [2, 8] - [2, 10]
      ( [2, 8] - [2, 9]
      ) [2, 9] - [2, 10]
  ERROR [2, 11] - [2, 12]
    { [2, 11] - [2, 12]
  comment [3, 0] - [3, 6]
```
This is problematic because any tooling that relies on the parse tree will have difficulty associating the comment with the function. The same unexpected behavior occurs with completely empty function bodies: we get an `ERROR` instead of the expected `MISSING` node.
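To see why association fails, here is a minimal Go sketch of the kind of range check a tool might use to decide whether a node belongs to a function. It treats each node as a `[start] - [end]` span of `(row, column)` points and tests containment, using positions copied from the trees in this post. The `Point` and `Range` types and the `contains` helper are hypothetical and are not part of any tree-sitter binding.

```go
package main

import "fmt"

// Point is a hypothetical (row, column) pair, matching the [row, col]
// positions printed in the parse trees.
type Point struct {
	Row, Col int
}

// Range is a hypothetical node span from Start to End.
type Range struct {
	Start, End Point
}

// less reports whether a comes strictly before b in (row, column) order.
func less(a, b Point) bool {
	if a.Row != b.Row {
		return a.Row < b.Row
	}
	return a.Col < b.Col
}

// contains reports whether inner lies entirely within outer.
func contains(outer, inner Range) bool {
	return !less(inner.Start, outer.Start) && !less(outer.End, inner.End)
}

func main() {
	// Comment-only body: function_declaration [2, 0] - [2, 10],
	// comment [3, 0] - [3, 6].
	brokenFunc := Range{Point{2, 0}, Point{2, 10}}
	comment := Range{Point{3, 0}, Point{3, 6}}
	fmt.Println(contains(brokenFunc, comment)) // false

	// Body containing bar(): function_declaration [2, 0] - [5, 0],
	// expression_statement [3, 0] - [3, 5].
	fullFunc := Range{Point{2, 0}, Point{5, 0}}
	stmt := Range{Point{3, 0}, Point{3, 5}}
	fmt.Println(contains(fullFunc, stmt)) // true
}
```

With the comment-only body, the `function_declaration` ends at `[2, 10]`, so anything on row 3 falls outside it. Once the body contains a statement, the declaration extends to `[5, 0]` and covers row 3.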
Why is This Happening?
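It seems the parser struggles to recover gracefully from the missing closing brace when the block contains no statements. Note that `comment` is an extra in tree-sitter-go's grammar, not a statement. A body holding only `// bar` therefore looks just as empty to the parser as `{}`, which would explain why both cases behave the same way. The result is the `ERROR` node and the misplaced comment. Understanding exactly why this happens requires digging into the grammar and into how tree-sitter handles error recovery. Tree-sitter can either insert a zero-width `MISSING` token or wrap tokens it cannot fit into an `ERROR` node, and here it picks the latter. The current grammar rules may not explicitly define how to handle an unclosed block, which leaves error recovery to settle on this less useful outcome.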
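Tooling can at least detect which recovery path the parser took. The sketch below walks a tree and lists every `ERROR` node and `MISSING` token it finds. The `Node` type is a hypothetical stand-in so that the example runs without a binding. Real bindings expose equivalent checks; the C API, for instance, has `ts_node_is_missing` and `ts_node_has_error`. The two trees are abbreviated hand-built versions of the ones in this post.

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for a parse-tree node; it is not
// the API of any tree-sitter binding.
type Node struct {
	Type      string // e.g. "function_declaration", "ERROR", "}"
	IsMissing bool   // true for zero-width tokens the parser inserted
	Children  []*Node
}

// recoveryNodes walks the tree depth-first and describes every ERROR node
// and every MISSING token it encounters.
func recoveryNodes(root *Node) []string {
	var found []string
	var walk func(*Node)
	walk = func(cur *Node) {
		switch {
		case cur.Type == "ERROR":
			found = append(found, "ERROR")
		case cur.IsMissing:
			found = append(found, fmt.Sprintf("MISSING %q", cur.Type))
		}
		for _, c := range cur.Children {
			walk(c)
		}
	}
	walk(root)
	return found
}

func main() {
	// Abbreviated tree for the comment-only body.
	commentOnly := &Node{Type: "source_file", Children: []*Node{
		{Type: "function_declaration"},
		{Type: "ERROR", Children: []*Node{{Type: "{"}}},
		{Type: "comment"},
	}}
	// Abbreviated tree for the body containing bar().
	withStatement := &Node{Type: "source_file", Children: []*Node{
		{Type: "function_declaration", Children: []*Node{
			{Type: "block", Children: []*Node{
				{Type: "{"},
				{Type: "expression_statement"},
				{Type: "}", IsMissing: true},
			}},
		}},
	}}
	fmt.Println(recoveryNodes(commentOnly))   // [ERROR]
	fmt.Println(recoveryNodes(withStatement)) // [MISSING "}"]
}
```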
Impact on Tooling
The ramifications extend to any tool that depends on an accurate parse tree. Linters, code formatters, and static analysis tools could misread the code structure, producing incorrect diagnostics or unexpected behavior. In the first tree, the `function_declaration` has no `body` field at all. A linter that analyzes function bodies therefore has nothing to inspect, and the comment is attached to the file rather than to `foo`. This could lead to missed warnings or errors, ultimately hurting code quality.
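As a concrete illustration, here is a minimal Go sketch of a lint-style check that flags function declarations without a `body` field. The `Node` type, its `Fields` map (mirroring the `name:` and `body:` labels in the trees), and `bodylessFuncs` are all hypothetical. The inputs are hand-built from the top-level children of the two trees in this post.

```go
package main

import "fmt"

// Node is a hypothetical, minimal parse-tree node. Fields maps field names
// such as "name" or "body" to child nodes; Text is set only for leaves.
type Node struct {
	Type   string
	Text   string
	Fields map[string]*Node
}

// bodylessFuncs returns the names of top-level function declarations that
// have no body field, i.e. functions whose statements a linter cannot see.
func bodylessFuncs(topLevel []*Node) []string {
	var names []string
	for _, n := range topLevel {
		if n.Type != "function_declaration" {
			continue
		}
		if n.Fields["body"] != nil {
			continue
		}
		name := ""
		if id := n.Fields["name"]; id != nil {
			name = id.Text
		}
		names = append(names, name)
	}
	return names
}

func main() {
	foo := &Node{Type: "identifier", Text: "foo"}
	// Top-level children of the comment-only tree: foo has no body field.
	commentOnly := []*Node{
		{Type: "package_clause"},
		{Type: "function_declaration", Fields: map[string]*Node{"name": foo}},
		{Type: "ERROR"},
		{Type: "comment", Text: "// bar"},
	}
	// Top-level children of the tree whose body contains bar().
	withStatement := []*Node{
		{Type: "package_clause"},
		{Type: "function_declaration", Fields: map[string]*Node{
			"name": foo,
			"body": {Type: "block"},
		}},
	}
	fmt.Println(bodylessFuncs(commentOnly))   // [foo]
	fmt.Println(bodylessFuncs(withStatement)) // []
}
```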
A More Useful Tree with Content
Interestingly, the situation improves when we add some content inside the function body. Consider this example:
```go
package pkg

func foo() {
bar()
```
Now, we get a more informative tree:
```
source_file [0, 0] - [5, 0]
  package_clause [0, 0] - [0, 11]
    package [0, 0] - [0, 7]
    package_identifier [0, 8] - [0, 11]
  [0, 11] - [2, 0]
  function_declaration [2, 0] - [5, 0]
    func [2, 0] - [2, 4]
    name: identifier [2, 5] - [2, 8]
    parameters: parameter_list [2, 8] - [2, 10]
      ( [2, 8] - [2, 9]
      ) [2, 9] - [2, 10]
    body: block [2, 11] - [5, 0]
      { [2, 11] - [2, 12]
      expression_statement [3, 0] - [3, 5]
        call_expression [3, 0] - [3, 5]
          function: identifier [3, 0] - [3, 3]
          arguments: argument_list [3, 3] - [3, 5]
            ( [3, 3] - [3, 4]
            ) [3, 4] - [3, 5]
      [3, 5] - [5, 0]
      MISSING "}" [5, 0] - [5, 0]
```
We now have a `MISSING